Eisenbeiss Corpus of monolingual German

The Eisenbeiss corpus provides data from unimpaired monolingual German children and adults and combines spontaneous speech with semi-structured elicitation games. Parts of the data have been transcribed in CHAT format and video-linked using ELAN. The corpus was funded by the MPI society and is hosted in their archive: https://corpus1.mpi.nl/ds/asv/;jsessionid=61F22C0F05E35EF331E603C0A846B3FD?0 . The semi-structured elicitation tasks and the transcription conventions are described here:

Eisenbeiss, Sonja and Sonnenstuhl, Ingrid (2011b) A CHAT-based annotation scheme for case and noun-phrase inflection in child language data. Essex Research Reports in Linguistics 60,3.

Eisenbeiss, Sonja (2011c) CEGS: An elicitation tool kit for studies on case marking and its acquisition. Essex Research Reports in Linguistics 60,1.

Eisenbeiss, Sonja and Sonnenstuhl, Ingrid (2011d) Transcription conventions for the Eisenbeiss German child language corpora. Essex Research Reports in Linguistics 60,2.

Eisenbeiss, Sonja (2009) Contrast is the Name of the Game: Contrast-Based Semi-Structured Elicitation Techniques for Studies on Children’s Language Acquisition. Essex Research Reports in Linguistics (ERRL) 57.7.

The transcriptions are available for collaborative projects. The corpus consists of two sub-corpora.

Sub-Corpus 1: The L-Family Corpus

The L-Family corpus comprises more than 1000 recordings from a two-year observation of a monolingual German family with four children and two adults. Table 1 gives an overview of the children:

Table 1. Children involved in the L-Family Corpus (ages given as years;months)

Child  Gender  Age      Year of birth  Day-care   School
L1     Male    5;2-7;8  1993           1997-2000  from 2000
L2     Male    2;0-4;6  1996           from 1997
L3     Male    0-2;5    1999           from 2000
L4     Female  0-0;4    2001


Data collection started in December 1998 with irregular recordings. From June 1999 until June 2001, several recordings of varying length were made each week. The corpus combines (1) spontaneous speech of children, parents, and guests collected during meals and free play, and (2) semi-structured elicitation games targeted at case marking and noun-phrase-internal agreement marking.

Sub-Corpus 2: The Case Elicitation Corpus

This sub-corpus contains semi-structured elicitation data from more than 40 two- to five-year-old monolingual German children. It was collected using a combination of elicitation tasks for case and noun-phrase-internal agreement.