
Conference: CORPUS CONSTITUTION AND LANGUAGE KNOWLEDGE

Dates: Friday, January 30 and Saturday, January 31, 2015

Location: Paris

Partners: SHESL (Society for the History and Epistemology of Language Sciences), HTL "History of Linguistic Theories" (UMR 7597), LLL "Laboratoire Ligérien de Linguistique" (UMR 7270)

Organizers: Gabriel Bergounioux, Bernard Colombat, Jacqueline Leo

Scientific Committee: Emily Aussant, Olivier Baude, Gabriel Bergounioux, Danielle Candel, Bernard Colombat, Pascal Cordereix, Anne Grondeux, Bernard Laks, Jacqueline Leo, Franck Neveu, Jean-Marie Pierrel, Valerie Raby, Beno Sagot

Organizing Committee: Emily Aussant, Gabriel Bergounioux, Valentina Bisconti, Danielle Candel, Bernard Colombat, Chloe Laplantine, Jacqueline Leo, Pascale Rabault-Feuerhahn, Valerie Raby, Audrey Viault

Calendar:

June 1: Strict deadline for submission of proposals

July 10: Notification to proposers

Proposals for papers of 500 words, plus bibliography and keywords, should be sent (with the subject line SHESL_HTL 2015) to the three organizers:
- Gabriel Bergounioux
- Bernard Colombat
- Jacqueline Leo

Information about the conference is available at https://shesl htl2015.sciencesconf.org/

Presentation

Linguistic corpora today

Reference to the corpus has become one of the major methodological orientations of contemporary linguistics, in connection with the development of digitization and the use of processing tools. To take an example from recent scientific developments: within two years in France, this field has received a Research Infrastructure (RI Corpus) organized into several consortia, an Equipex (Ortolang) and a call for proposals from the ANR (Corpus HSS). With Huma-Num and the establishment of the Dariah and CLARIN projects, the issue has been carried over to the European level.

Possible questions

Does the current interest in corpora correspond to an epistemological reflection? Can a corpus-based approach be described as strictly empirical, or does it meet specific design requirements? What is the status of corpora as producers of data in the construction of a linguistic representation? How do they contribute to the cumulativeness of knowledge? To what extent does corpus development involve an instrumentation of languages (e.g. transcription tools), and how does it contribute to the development of grammars, the writing of dictionaries and advances in NLP? What are the effects of corpus data on theories and schools? What gaps result from their absence? Are these data central, essential and exclusive or, conversely, additional, ancillary and peripheral? What evidence of this can be found in current research? Do the growth and diversification of the data provided by corpora contribute to improving theory?

History of work on data

Work on data for the preparation, collation, verification and analysis of linguistic facts is an ancient practice. It is first of all a philological and exegetical tradition, unbroken from antiquity to the present, which remains linked to the founding of libraries and archives as well as to the drafting of compilations (e.g. the Alexandrians, the Benedictines). This relationship of scholars to the classification and exploitation of documents appears to be found in most civilizations, especially in the East.

Possible questions

What meaning should be given to the change of scale, depending on whether work on languages relies on limited samples or on sizeable corpora? In other words, can we speak of "corpus grammars", which offer an extensional representation of a language, the product itself constituting a corpus object? If the existence of such objects is established, since when have they existed, and in which traditions? Conversely, how should we assess the justification of theories that dispense with such recourse to data? How and why were corpora of inscriptions constituted? Do epigraphic data simply respond to a need for inventory and completeness, or do they pose linguistic problems for those who compiled them and/or those who use them?

Extension to the world's languages

With European expansionism, accumulation (which also exists in other traditions) extended to the work of describing languages. That work was transformed by the use of recording techniques (at the end of the nineteenth century) and by the application, to the stories collected, of methods of transcription and segmentation for which the Handbook of American Indian Languages remains emblematic. In its modern sense, corpus linguistics seems to undergo a revival and a redefinition in a linguistics driven by anthropological concerns (the United States) or by language planning (Russia).

Possible questions

At the time of the massive grammatization of vernaculars, how was the transition made from small samples (collections of the Lord's Prayer, for example) to larger bodies of data? What principles governed the creation of new data? How were these data organized, and what was their status? What is their linguistic (rather than folkloric or ethnographic) purpose? How do they differ from other tools such as cartography?

The constitution of the great modern databases

The automated processing of corpora begins in the 1960s and raises questions of sampling (vs. full texts) and of the systematic search for structures. From the late 1980s, large amounts of data became available thanks to advances in computer technology and software development.

Possible questions

What are the criteria for defining these large amounts of data as "corpora"? How do they bring about a change in the ecology of linguistic practice, especially in the division of scientific labor? What are the epistemological foundations of reference corpora, and what are the principles of their legitimacy? Can the growing importance given to spoken corpora in languages with a written tradition lead to a change of perspective?

The "metacorpus"

Meanwhile, the digitization of books legitimizes the accumulation of written sources and documents on the representations that societies form of languages, as exemplified by CTLF and Corpus of French grammar, thus reaching, after language, the metalanguage.

Possible questions

What is the role of these tools in the construction of the representation of language? Do they modify, adjust or refine this representation?