What We Are Talking about and What We Are Saying about It
Abstract. In view of the relationships between theoretical, computational and corpus linguistics, their mutual contributions are discussed and illustrated on the issue of the aspect of language related to the information structure of the sentence, distinguishing “what we are talking about” and “what we are saying about it”.
1 Introduction
The name of the research domain of Computational Linguistics seems to be self-explanatory; however, there has always been a dispute about what exactly ‘computational’ means (especially with regard to the relation between its theoretical and applied aspects, and to its supposedly narrowing scope due to the prevalent use of statistical methods). In addition, with the expansion of computer-based linguistic studies drawing on very large empirical language material and, consequently, the appearance of an allegedly new domain, corpus linguistics, the question has emerged of how corpus linguistics is positioned with regard to computational linguistics. After a summary of some of the issues related to the problem of ‘how many linguistics there are’ (Sect. 2), we briefly sketch in which respects the different ‘linguistics’ can contribute to each other (Sect. 3). The main objective of our paper is to illustrate, on the example of a linguistically based multi-layered annotation scenario (Sect. 4) and of a selected linguistic phenomenon, namely the information structure of the sentence (Sect. 5.1), how linguistic theory can contribute to building an integrated scenario of corpus annotation (Sect. 5.2) and, in the other direction, how a consistent application of such a scenario to a large corpus of continuous texts can provide useful feedback for the theory (Sect. 5.3). In Section 6, some conclusions are drawn from personal experience of working with the given theory and scenario.
2 How Many Linguistics?
If the terms computational linguistics and corpus linguistics are understood rather broadly, as covering those domains of linguistics that are based on the use of computers and on the creation and use of corpora, respectively, then it can be seen that the intersection of the two domains is very large. (Speaking of corpora, what we have in mind are corpora implemented in computers and patterned as databases.) However, it is important to be aware also of a third domain that develops along with the two mentioned ones, and this is theoretical linguistics. Certainly, there is no universally accepted descriptive framework; there are many different trends in linguistics, as there were a hundred years ago.¹ This diversity, which is perhaps even growing, offers certain advantages, among them the possibility of fruitful discussions. Different points of view help to throw light on the problems discussed and to make a choice between the available approaches or their parts. However, the diversity of views also con
E. Hajičová, in A. Gelbukh (Ed.): CICLing 2008, LNCS 4919, pp. 241–262, 2008. © Springer-Verlag Berlin Heidelberg 2008