Lemmatising Old English on a relational database

Author: Laura García Fernández
Publisher: utzverlag GmbH
Total Pages: 432
Release: 2020-08-20
Genre: Language Arts & Disciplines
ISBN: 3831648212

This work contributes to research on the linguistic analysis of Old English by means of corpus-based lexical databases. Old English exhibits extensive morphological variation and lacks a written standard, so a lemmatised corpus is needed. The aim of this work is therefore to lemmatise part of the verbal lexicon of Old English, combining aspects of Morphology, Lexicography and Corpus Analysis. The scope is restricted to the most morphologically complex verbal classes of Old English, the irregular and reduplicative verbs, which comprise the preterite-present, anomalous, contracted and strong VII verbs. This aim requires, firstly, the selection and management of the sources of data and the verification of results; and secondly, the design and sequencing of the steps of the lemmatisation tasks. The research also addresses the automation of the lemmatisation of Old English verbs, a question on which little previous literature exists. In conclusion, this work offers an inventory of inflectional forms and lemmas for the verbs under analysis. On the applied side, it presents procedures of automatic and manual lemmatisation that can be applied in Lexicography and Corpus Linguistics.
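The relational approach to lemmatisation can be pictured as a lookup that maps attested inflectional forms to lemmas. The sketch below is a minimal illustration of that idea in Python with SQLite; the table layout, column names, and the handful of Old English example forms are assumptions for the example, not the book's actual database schema or data.

```python
import sqlite3

# Minimal sketch: lemmatisation as a relational lookup.
# Table layout and example Old English forms are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inflections (form TEXT, lemma TEXT, verb_class TEXT)")
conn.executemany(
    "INSERT INTO inflections VALUES (?, ?, ?)",
    [
        ("eom", "beon", "anomalous"),     # 'am'  -> lemma 'to be'
        ("waes", "beon", "anomalous"),    # 'was' -> lemma 'to be'
        ("heht", "hatan", "strong VII"),  # reduplicative preterite of 'to command'
    ],
)

def lemmatise(form: str):
    """Return (lemma, verb_class) pairs attested for an inflectional form."""
    return conn.execute(
        "SELECT lemma, verb_class FROM inflections WHERE form = ?", (form,)
    ).fetchall()

print(lemmatise("heht"))  # [('hatan', 'strong VII')]
```

Ambiguous forms simply return more than one row, which is one reason a purely automatic pass is usually followed by manual verification against the sources.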

What's in a Word-list?

Author: Dawn Archer
Publisher: Routledge
Total Pages: 214
Release: 2016-02-24
Genre: Language Arts & Disciplines
ISBN: 1134761481

The frequency with which particular words are used in a text can tell us something meaningful about that text and about its author, because a writer's choice of words is seldom random. A researcher might, for example, generate word frequency lists for a number of texts and focus on their most frequent lexical items to help determine whether all the texts were written by the same author. Alternatively, they might wish to determine whether the most frequent words of a given text (captured by its word frequency list) are suggestive of potentially meaningful patterns that could have been overlooked had the text been read manually. This edited collection brings together cutting-edge research written by leading experts in the field on the construction of word-lists for the analysis of both frequency and keyword usage. Taken together, these papers provide a comprehensive and up-to-date survey of the most exciting research being conducted in this subject.
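For readers unfamiliar with the object of study, a word frequency list can be produced in a few lines. The sketch below counts word forms in a short invented sentence using a simple regular-expression tokenisation; both the sample text and the tokenisation rule are placeholders, not the methodology of any chapter in the volume.

```python
import re
from collections import Counter

def word_frequency_list(text: str, top_n: int = 10):
    """Build a simple word frequency list: lower-case, split on non-letters, count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

sample = "The cat sat on the mat. The mat was flat, and the cat liked the mat."
print(word_frequency_list(sample))
# e.g. [('the', 5), ('mat', 3), ('cat', 2), ...]
```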

Supervised Machine Learning for Text Analysis in R

Author: Emil Hvitfeldt
Publisher: CRC Press
Total Pages: 402
Release: 2021-10-22
Genre: Computers
ISBN: 1000461971

Text data is important for many domains, from healthcare to marketing to the digital humanities, but specialized approaches are necessary to create features for machine learning from language. Supervised Machine Learning for Text Analysis in R explains how to preprocess text data for modeling, train models, and evaluate model performance using tools from the tidyverse and tidymodels ecosystem. Models like these can be used to make predictions for new observations, to understand what natural language features or characteristics contribute to differences in the output, and more. If you are already familiar with the basics of predictive modeling, use the comprehensive, detailed examples in this book to extend your skills to the domain of natural language processing. This book provides practical guidance and directly applicable knowledge for data scientists and analysts who want to integrate unstructured text data into their modeling pipelines. Learn how to use text data for both regression and classification tasks, and how to apply more straightforward algorithms like regularized regression or support vector machines as well as deep learning approaches. Natural language must be dramatically transformed to be ready for computation, so we explore typical text preprocessing and feature engineering steps like tokenization and word embeddings from the ground up. These steps influence model results in ways we can measure, both in terms of model metrics and other tangible consequences such as how fair or appropriate model results are.
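The book itself works in R with the tidyverse and tidymodels ecosystem; purely to illustrate the kind of pipeline it describes (tokenise the text, weight the features, fit a regularized linear model), here is a rough Python sketch using scikit-learn. The tiny labelled dataset is invented for the example and the pipeline is not code from the book.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical two-class dataset for illustration only.
texts = [
    "refund my order immediately",      # complaint
    "the package never arrived",        # complaint
    "great service, very happy",        # praise
    "lovely product, will buy again",   # praise
]
labels = ["complaint", "complaint", "praise", "praise"]

# Tokenise and weight terms (tf-idf), then fit a regularized logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["my order was damaged and I want a refund"]))
```

Every preprocessing choice here (tokenisation rule, weighting scheme) changes the resulting model, which is exactly the kind of effect the book measures.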

WordNet

Author: Christiane Fellbaum
Publisher: MIT Press
Total Pages: 452
Release: 1998
Genre: Computers
ISBN: 9780262061971

WordNet, an electronic lexical database, is considered to be the most important resource available to researchers in computational linguistics, text analysis, and many related areas. English nouns, verbs, adjectives, and adverbs are organized into synonym sets, each representing one underlying lexicalized concept. Different relations link the synonym sets. The purpose of this volume is twofold. First, it discusses the design of WordNet and the theoretical motivations behind it. Second, it provides a survey of representative applications, including word sense identification, information retrieval, selectional preferences of verbs, and lexical chains.
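A quick way to see the synonym-set organization described above is through NLTK's WordNet interface. The snippet assumes NLTK is installed and downloads the WordNet data; the query word "car" is just an arbitrary example.

```python
import nltk
nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data
from nltk.corpus import wordnet as wn

# Each synset is one lexicalized concept; relations (here, hypernymy) link synsets.
for synset in wn.synsets("car", pos=wn.NOUN)[:3]:
    print(synset.name(), synset.lemma_names())
    print("  hypernyms:", [h.name() for h in synset.hypernyms()])
```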

Introduction to Information Retrieval

Author: Christopher D. Manning
Publisher: Cambridge University Press
Total Pages:
Release: 2008-07-07
Genre: Computers
ISBN: 1139472100

Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
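One of the basic building blocks covered in the book, the inverted index, can be sketched in a few lines: map each term to the set of documents containing it, then intersect postings to answer a conjunctive query. The toy documents below are made up for illustration and are not taken from the text.

```python
from collections import defaultdict

docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}

# Build the inverted index: term -> set of document ids (the postings).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """Answer a conjunctive (AND) query by intersecting postings sets."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(sorted(boolean_and("home", "sales", "july")))  # [2, 3]
```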

Applied Natural Language Processing in the Enterprise

Author: Ankur A. Patel
Publisher: "O'Reilly Media, Inc."
Total Pages: 336
Release: 2021-05-12
Genre: Computers
ISBN: 1492062545

NLP has exploded in popularity over the last few years. But while Google, Facebook, OpenAI, and others continue to release larger language models, many teams still struggle with building NLP applications that live up to the hype. This hands-on guide helps you get up to speed on the latest and most promising trends in NLP. With a basic understanding of machine learning and some Python experience, you'll learn how to build, train, and deploy models for real-world applications in your organization. Authors Ankur Patel and Ajay Uppili Arasanipalai guide you through the process using code and examples that highlight the best practices in modern NLP.
- Use state-of-the-art NLP models such as BERT and GPT-3 to solve NLP tasks such as named entity recognition, text classification, semantic search, and reading comprehension
- Train NLP models with performance comparable or superior to that of out-of-the-box systems
- Learn about Transformer architecture and modern tricks like transfer learning that have taken the NLP world by storm
- Become familiar with the tools of the trade, including spaCy, Hugging Face, and fast.ai
- Build core parts of the NLP pipeline, including tokenizers, embeddings, and language models, from scratch using Python and PyTorch
- Take your models out of Jupyter notebooks and learn how to deploy, monitor, and maintain them in production
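As a taste of the tooling mentioned above, the sketch below runs spaCy's small English pipeline for named entity recognition. It assumes spaCy and the en_core_web_sm model are installed, and it is an illustrative example rather than code from the book.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("OpenAI released GPT-3 in 2020, and Hugging Face hosts many models.")

# Print each named entity the model finds, with its predicted label.
for ent in doc.ents:
    print(ent.text, ent.label_)
```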

The People’s Web Meets NLP

Author: Iryna Gurevych
Publisher: Springer Science & Business Media
Total Pages: 394
Release: 2013-04-03
Genre: Language Arts & Disciplines
ISBN: 3642350852

Collaboratively Constructed Language Resources (CCLRs) such as Wikipedia, Wiktionary, Linked Open Data, and various resources developed using crowdsourcing techniques such as Games with a Purpose and Mechanical Turk have substantially contributed to research in natural language processing (NLP). Various NLP tasks utilize such resources to substitute for or supplement conventional lexical semantic resources and linguistically annotated corpora. These resources also provide an extensive body of texts from which valuable knowledge is mined. There is also an increasing number of community efforts to link and maintain multiple linguistic resources. This book offers comprehensive coverage of CCLR-related topics, including their construction, their use in NLP tasks, and their interlinkage and management. Bachelor's, Master's, and Ph.D. programs in natural language processing, computational linguistics, and knowledge discovery can use this book both as a main text and as supplementary reading. It also provides a valuable reference guide on these topics for researchers and professionals.