Building A National Corpus
Author | : Dawn Knight |
Publisher | : Springer Nature |
Total Pages | : 192 |
Release | : 2021-10-08 |
Genre | : Language Arts & Disciplines |
ISBN | : 3030818586 |
This book aims to provide a micro-level, working model of a methodological approach and practical guidelines for building a corpus, informed by the work on the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes - the National Corpus of Contemporary Welsh). It focuses specifically on the development of detailed design frames for corpora across communicative modes (spoken, written and e-language), and the practical processes involved in the planning, collection, transcription, collation and (re)presentation of language data. The book is designed to be of significant value and relevance to those interested in critically engaging with corpus methodology. Although Welsh is the language under discussion, the processes and approaches discussed in the building of CorCenCC can be applied to a lesser or greater extent to other language contexts. This book provides a working model and an account of how to build a corpus dataset, from which step-by-step guidelines for creating other linguistic corpora in any language can be easily extrapolated. It will be of value to students and scholars of minority languages and corpus linguistics.
Author | : Martin Wynne |
Publisher | : Oxbow Books Limited |
Total Pages | : 100 |
Release | : 2005 |
Genre | : Language Arts & Disciplines |
ISBN | : |
A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.
Author | : Robbie Love |
Publisher | : Routledge |
Total Pages | : 183 |
Release | : 2020-01-06 |
Genre | : Language Arts & Disciplines |
ISBN | : 0429771096 |
This volume offers a critical examination of the construction of the Spoken British National Corpus 2014 (Spoken BNC2014) and points the way forward toward a more informed understanding of corpus linguistic methodology more broadly. The book begins by situating the creation of this second corpus, a compilation of new, publicly accessible Spoken British English from the 2010s, within the context of the first, created in 1994, talking through the need to balance backward compatibility and optimal practice for today’s users. Chapters subsequently use the Spoken BNC2014 as a focal point around which to discuss the various considerations taken into account in corpus construction, including design, data collection, transcription, and annotation. The volume concludes by reflecting on the successes and limitations of the project, as well as the broader utility of the corpus in linguistic research, both in current examples and future possibilities. This exciting new contribution to the literature on linguistic methodology is a valuable resource for students and researchers in corpus linguistics, applied linguistics, and English language teaching.
Author | : Charles F. Meyer |
Publisher | : Cambridge University Press |
Total Pages | : 188 |
Release | : 2002-06-13 |
Genre | : Computers |
ISBN | : 0521808790 |
English Corpus Linguistics is a step-by-step guide to creating and analyzing linguistic corpora. It begins with a discussion of the role that corpus linguistics plays in linguistic theory, demonstrating that corpora have proven to be very useful resources for linguists who believe that their theories and descriptions of English should be based on real rather than contrived data. Charles F. Meyer goes on to describe how to plan the creation of a corpus, how to collect and computerize data for inclusion in a corpus, how to annotate the data that are collected, and how to conduct a corpus analysis of a completed corpus. The book concludes with an overview of the challenges that corpus linguists face to make both the creation and analysis of corpora much easier undertakings than they currently are. Clearly organized and accessibly written, this book will appeal to students of linguistics and English language.
Author | : Peter Spyns |
Publisher | : Springer Science & Business Media |
Total Pages | : 414 |
Release | : 2013-02-26 |
Genre | : Language Arts & Disciplines |
ISBN | : 3642309100 |
The book provides an overview of more than a decade of joint R&D efforts in the Low Countries on human language technology (HLT) for Dutch. It not only presents the state of the art of HLT for Dutch in the areas covered but, even more importantly, describes the resources (data and tools) for Dutch that have been created and are now available to both academia and industry worldwide. The contributions cover many areas of human language technology (for Dutch): corpus collection (including IPR issues) and building (in particular one corpus aiming at a collection of 500M word tokens), lexicology, anaphora resolution, a semantic network, parsing technology, speech recognition, machine translation, text (summaries) generation, web mining, information extraction, and text-to-speech, to name the most important ones. The book also shows how a medium-sized language community (spanning two territories) can create a digital language infrastructure (resources, tools, etc.) as a basis for subsequent R&D. At the same time, it bundles contributions of almost all the HLT research groups in Flanders and the Netherlands, and hence offers a view of their recent research activities. Targeted readers are mainly researchers in human language technology, in particular those focusing on Dutch. It concerns researchers active in larger networks such as CLARIN, META-NET and FLaReNet, and those participating in conferences such as ACL, EACL, NAACL, COLING, RANLP, CICling, LREC, CLIN and DIR (both in the Low Countries), InterSpeech, ASRU, ICASSP, ISCA, EUSIPCO, CLEF, TREC, etc. In addition, some chapters are of interest to human language technology policy makers and even to science policy makers in general.
Author | : Paul Baker |
Publisher | : Bloomsbury Publishing |
Total Pages | : 281 |
Release | : 2023-08-24 |
Genre | : Language Arts & Disciplines |
ISBN | : 1350083771 |
How can you carry out discourse analysis using corpus linguistics? What research questions should you ask? Which methods should you use and when? What is a collocational network or a key cluster? Introducing the major techniques, methods and tools for corpus-assisted analysis of discourse, this book answers these questions and more, showing readers how to best use corpora in their analyses of discourse. Using carefully tailored case studies, each chapter is devoted to a central technique, including frequency, concordancing and keywords, going step by step through the process of applying different analytical procedures. Introducing a wide range of different corpora, from holiday brochures to political debates, the book considers the key debates and latest advances in the field. Fully revised and updated, this new edition includes:
- A new chapter on how to conduct research projects in corpus-based discourse analysis
- Completely rewritten chapters on collocation and advanced techniques, using a corpus of jihadist propaganda texts and covering topics such as social media and visual analysis
- Coverage of major tools, including CQPweb, AntConc, Sketch Engine and #LancsBox
- Discussion of newer techniques including the derivation of lockwords and the comparison of multiple data sets for diachronic analysis
With exercises, discussion questions and suggested further readings in each chapter, this book is an excellent guide to using corpus linguistics techniques to carry out discourse analysis.
Author | : Vaclav Brezina |
Publisher | : Cambridge University Press |
Total Pages | : 317 |
Release | : 2018-09-20 |
Genre | : Foreign Language Study |
ISBN | : 1107125707 |
A comprehensive and accessible introduction to statistics in corpus linguistics, covering multiple techniques of quantitative language analysis and data visualisation.
Author | : Sandra Kuebler |
Publisher | : Bloomsbury Publishing |
Total Pages | : 321 |
Release | : 2014-12-18 |
Genre | : Language Arts & Disciplines |
ISBN | : 1441119809 |
Linguistically annotated corpora are becoming a central part of the corpus linguistics field. One of their main strengths is the level of searchability they offer, but with the annotation come problems of the initial complexity of queries and query tools. This book gives a full, pedagogic account of this burgeoning field. Beginning with an overview of corpus linguistics, its prerequisites and goals, the book then introduces linguistically annotated corpora. It explores the different levels of linguistic annotation, including morphological, parts of speech, syntactic, semantic and discourse-level, as well as advantages and challenges for such annotations. It covers the main annotated corpora for English, the Penn Treebank, the International Corpus of English, and OntoNotes, as well as a wide range of corpora for other languages. In its third part, search strategies required for different types of data are explored. All chapters are accompanied by exercises and by sections on further reading.
Author | : Stefanowitsch, Anatol |
Publisher | : Language Science Press |
Total Pages | : 510 |
Release | : 2020 |
Genre | : Language Arts & Disciplines |
ISBN | : 3961102244 |
Corpora are used widely in linguistics, but not always wisely. This book attempts to frame corpus linguistics systematically as a variant of the observational method. The first part introduces the reader to the general methodological discussions surrounding corpus data as well as the practice of doing corpus linguistics, including issues such as the scientific research cycle, research design, extraction of corpus data and statistical evaluation. The second part consists of a number of case studies from the main areas of corpus linguistics (lexical associations, morphology, grammar, text and metaphor), surveying the range of issues studied in corpus linguistics while at the same time showing how they fit into the methodology outlined in the first part.
Author | : Antoinette Renouf |
Publisher | : Rodopi |
Total Pages | : 408 |
Release | : 2016-08 |
Genre | : Computers |
ISBN | : 940120179X |
Preliminary Material /Antoinette Renouf and Andrew Kehoe -- The corpus-user's chorus: (Based on The Major General's Song from Gilbert and Sullivan's The Pirates of Penzance) /Antoinette Renouf and Andrew Kehoe -- Introduction: The changing face of corpus linguistics /Antoinette Renouf and Andrew Kehoe -- Oh Canada! Towards the Corpus of Early Ontario English /Stefan Dollinger -- Favoring Americanisms? vs. before and in Early English in Australia: A corpus-based approach /Clemens Fritz -- Computing the Lexicons of Early Modern English /Ian Lancashire -- EFL dictionaries, grammars and language guides from 1700 to 1850: testing a new corpus on points of spokenness /Manfred Markus -- The Old English Apollonius of Tyre in the light of the Old English Concordancer /Antonio Miranda García, Javier Calle Martín, David Moreno Olalla and Gustavo Muñoz González -- Prediction with SHALL and WILL: a diachronic perspective /Maurizio Gotti -- Circumstantial adverbials in discourse: a synchronic and a diachronic perspective /Anneli Meurman-Solin and Päivi Pahta -- Changes in textual structures of book advertisements in the ZEN Corpus /Caren auf dem Keller -- “Curtains like these are selling right in the city of Chicago for USD 1.50” - The mediopassive in American 20th-century advertising language /Marianne Hundt -- Recent grammatical change in written English 1961-1992: some preliminary findings of a comparison of American with British English /Geoffrey Leech and Nicholas Smith -- Social variation in the use of apology formulae in the British National Corpus /Mats Deutschmann -- How recent is recent? On overcoming interpretational difficulties /Göran Kjellmer -- Looking at looking: Functions and contexts of progressives in spoken English and 'school' English /Ute Römer -- Ditransitives, the Given Before New principle, and textual retrievability: a corpus-based study using ICECUP /Gabriel Ozón -- The Spanish pragmatic marker pues and its English equivalents /Anna-Brita Stenström -- WebCorp: A tool for online linguistic information retrieval and analysis /Barry Morley -- Diachronic linguistic analysis on the web with WebCorp /Andrew Kehoe -- New ways of analysing ESL on the WWW with WebCorp and WebPhraseCount /Josef Schmied -- I'm like, “Hey, it works!”: Using GlossaNet to find attestations of the quotative (be) like in English-language newspapers /Cédrick Fairon and John V. Singler -- Corpus linguistics and English reference grammars /Joybrato Mukherjee -- Tracking ongoing grammatical change and recent diversification in present-day standard English: the complementary role of small and large corpora /Christian Mair -- but it will take time...points of view on a lexical grammar of English /Michaela Mahlberg -- Corpus linguistics, grammar and theory: Report on a panel discussion at the 24th ICAME conference /Jan Aarts.