Populating a Linked Data Entity Name System

Author: M. Kejriwal
Publisher: IOS Press
Total Pages: 190
Release: 2016-12-09
Genre: Computers
ISBN: 161499692X

Resource Description Framework (RDF) is a graph-based data model used to publish data as a Web of Linked Data. RDF is an emergent foundation for large-scale data integration, the problem of providing a unified view over multiple data sources. An Entity Name System (ENS) is a thesaurus for entities, and is a crucial component in a data integration architecture. Populating a Linked Data ENS is equivalent to solving an Artificial Intelligence problem called instance matching, which concerns identifying pairs of entities referring to the same underlying entity. This publication presents an instance matcher with 4 properties, namely automation, heterogeneity, scalability and domain independence. Automation is addressed by employing inexpensive but well-performing heuristics to automatically generate a training set, which is employed by other machine learning algorithms in the pipeline. Data-driven alignment algorithms are adapted to deal with structural heterogeneity in RDF graphs. Domain independence is established by actively avoiding prior assumptions about input domains, and through evaluations on 10 RDF test cases. The full system is scaled by implementing it on cloud infrastructure using MapReduce algorithms.

Knowledge Graphs

Author: Mayank Kejriwal
Publisher: MIT Press
Total Pages: 559
Release: 2021-03-30
Genre: Computers
ISBN: 0262045095

A rigorous and comprehensive textbook covering the major approaches to knowledge graphs, an active and interdisciplinary area within artificial intelligence. The field of knowledge graphs, which allows us to model, process, and derive insights from complex real-world data, has emerged as an active and interdisciplinary area of artificial intelligence over the last decade, drawing on such fields as natural language processing, data mining, and the semantic web. Current projects involve predicting cyberattacks, recommending products, and even gleaning insights from thousands of papers on COVID-19. This textbook offers rigorous and comprehensive coverage of the field. It focuses systematically on the major approaches, both those that have stood the test of time and the latest deep learning methods.

Identity of Long-tail Entities in Text

Author: F. Ilievski
Publisher: IOS Press
Total Pages: 229
Release: 2019-11-29
Genre: Computers
ISBN: 1643680439

The digital era has generated a huge amount of data on the identities (profiles) of people, organizations and other entities in a digital format, largely consisting of textual documents such as news articles, encyclopedias, personal websites, books, and social media. Identity has thus been transformed from a philosophical to a societal issue, one requiring robust computational tools to determine entity identity in text. Computational systems developed to establish identity in text often struggle with long-tail cases. This book investigates how Natural Language Processing (NLP) techniques for establishing the identity of long-tail entities – which are all infrequent in communication, hardly represented in knowledge bases, and potentially very ambiguous – can be improved through the use of background knowledge. Topics covered include: distinguishing tail entities from head entities; assessing whether current evaluation datasets and metrics are representative for long-tail cases; improving evaluation of long-tail cases; accessing and enriching knowledge on long-tail entities in the Linked Open Data cloud; and investigating the added value of background knowledge (“profiling”) models for establishing the identity of NIL entities. Providing novel insights into an under-explored and difficult NLP challenge, the book will be of interest to all those working in the field of entity identification in text.

The Semantic Web – ISWC 2014

Author: Peter Mika
Publisher: Springer
Total Pages: 588
Release: 2014-10-09
Genre: Computers
ISBN: 331911915X

The two-volume set LNCS 8796 and 8797 constitutes the refereed proceedings of the 13th International Semantic Web Conference, ISWC 2014, held in Riva del Garda in October 2014. The International Semantic Web Conference is the premier forum for Semantic Web research, where cutting-edge scientific results and technological innovations are presented, where problems and solutions are discussed, and where the future of this vision is being developed. It brings together specialists in fields such as artificial intelligence, databases, social networks, distributed computing, Web engineering, information systems, human-computer interaction, natural language processing, and the social sciences. Part 1 (LNCS 8796) contains a total of 38 papers which were presented in the research track. They were carefully reviewed and selected from 180 submissions. Part 2 (LNCS 8797) contains 15 papers from the 'Semantic Web In Use' track which were accepted from 46 submissions. In addition, it presents 16 contributions of the RBDS track and 6 papers of the doctoral consortium.

Semantic Data Mining

Author: A. Ławrynowicz
Publisher: IOS Press
Total Pages: 210
Release: 2017-04-18
Genre: Computers
ISBN: 1614997462

Ontologies are now increasingly used to integrate and organize data and knowledge, particularly in data and knowledge-intensive applications in both research and industry. The book is devoted to semantic data mining – a data mining approach where domain ontologies are used as background knowledge, and where the new challenge is to mine knowledge encoded in domain ontologies and knowledge graphs, rather than only purely empirical data. The introductory chapters of the book provide theoretical foundations of both data mining and ontology representation. Taking a unified perspective, the book then covers several methods for semantic data mining, addressing tasks such as pattern mining, classification and similarity-based approaches. It attempts to provide state-of-the-art answers to specific challenges and peculiarities of data mining with the use of ontologies, in particular: How to deal with incompleteness of knowledge and the so-called Open World Assumption? What is a truly “semantic” similarity measure? The book contains several chapters with examples of applications of semantic data mining. The examples start from a scenario with moderate use of lightweight ontologies for knowledge graph enrichment and end with a full-fledged scenario of an intelligent knowledge discovery assistant using complex domain ontologies for meta-mining, i.e., an ontology-based meta-learning approach to full data mining processes. The book is intended for researchers in the fields of semantic technologies, knowledge engineering, data science, and data mining, and developers of knowledge-based systems and applications.

Managing and Consuming Completeness Information for RDF Data Sources

Author: F. Darari
Publisher: IOS Press
Total Pages: 194
Release: 2019-11-12
Genre: Computers
ISBN: 1643680358

The increasing amount of structured data available on the Web is laying the foundations for a global-scale knowledge base. But the ever-increasing amount of Semantic Web data gives rise to the question – how complete is that data? Though data on the Semantic Web is generally incomplete, some of it may indeed be complete. In this book, the author deals with how to manage and consume completeness information about Semantic Web data. In particular, the book explores how completeness information can guarantee the completeness of query answering. The book provides optimization techniques for completeness reasoning, experimental evaluations showing the feasibility of the approaches, and a technique for checking the soundness of queries with negation via reduction to query completeness checking. Other topics covered include completeness information with timestamps, and two demonstrators – CORNER and COOL-WD – are provided to show how a completeness framework can be realized. Finally, the book investigates an automated method to generate completeness statements from text on the Web. The book will be of interest to anyone whose work involves dealing with Web-data completeness.

Multi-modal Data Fusion based on Embeddings

Author: S. Thoma
Publisher: IOS Press
Total Pages: 174
Release: 2019-11-06
Genre: Computers
ISBN: 1643680293

Many web pages include structured data in the form of semantic markup, which can be transferred to the Resource Description Framework (RDF) or provide an interface to retrieve RDF data directly. This RDF data enables machines to automatically process and use the data. When applications need data from more than one source the data has to be integrated, and the automation of this can be challenging. Usually, vocabularies are used to concisely describe the data, but because of the decentralized nature of the web, multiple data sources can provide similar information with different vocabularies, making integration more difficult. This book, Multi-modal Data Fusion based on Embeddings, describes how similar statements about entities can be identified across sources, independent of the vocabulary and data modeling choices. Previous approaches have relied on clean and extensively modeled ontologies for the alignment of statements, but the often noisy data in a web context does not necessarily adhere to these prerequisites. In this book, the use of RDF label information of entities is proposed to tackle this problem. In combination with embeddings, the use of label information allows for a better integration of noisy data, something that has been confirmed empirically. The book presents two main scientific contributions: the vocabulary- and modeling-agnostic fusion approach on the purely textual label information, and the combination of three different modalities into one multi-modal embedding space for a more human-like notion of similarity. The book will be of interest to all those faced with the problem of processing data from multiple web-based sources.

Exploiting Semantic Web Knowledge Graphs in Data Mining

Author: P. Ristoski
Publisher: IOS Press
Total Pages: 246
Release: 2019-06-28
Genre: Computers
ISBN: 1614999813

Data Mining and Knowledge Discovery in Databases (KDD) is a research field concerned with deriving higher-level insights from data. The tasks performed in this field are knowledge intensive and can benefit from additional knowledge from various sources, so many approaches have been proposed that combine Semantic Web data with the data mining and knowledge discovery process. This book, Exploiting Semantic Web Knowledge Graphs in Data Mining, aims to show that Semantic Web knowledge graphs are useful for generating valuable data mining features that can be used in various data mining tasks. In Part I, Mining Semantic Web Knowledge Graphs, the author evaluates unsupervised feature generation strategies from types and relations in knowledge graphs used in different data mining tasks such as classification, regression, and outlier detection. Part II, Semantic Web Knowledge Graphs Embeddings, proposes an approach that circumvents the shortcomings of the approaches in Part I by embedding complete Semantic Web knowledge graphs in a low-dimensional feature space, where each entity and relation in the knowledge graph is represented as a numerical vector. Finally, Part III, Applications of Semantic Web Knowledge Graphs, describes a list of applications that exploit Semantic Web knowledge graphs, such as classification and regression, showing that the approaches developed in Part I and Part II can be used in applications in various domains. The book will be of interest to all those working in the field of data mining and KDD.

Query Processing over Graph-structured Data on the Web

Author: M. Acosta Deibe
Publisher: IOS Press
Total Pages: 244
Release: 2018-10-12
Genre: Computers
ISBN: 1614999163

In recent years, Linked Data initiatives have encouraged the publication of large graph-structured datasets using the Resource Description Framework (RDF). Due to the constant growth of RDF data on the web, more flexible data management infrastructures must be able to efficiently and effectively exploit the vast amount of knowledge accessible on the web. This book presents flexible query processing strategies over RDF graphs on the web using the SPARQL query language. The work shows how query engines can change plans on-the-fly with adaptive techniques to cope with unpredictable conditions and to reduce execution time. Furthermore, it investigates the application of crowdsourcing in query processing, where engines are able to contact humans to enhance the quality of query answers. The theoretical and empirical results presented in this book indicate that flexible techniques allow for querying RDF data sources efficiently and effectively.

Semantic Sentiment Analysis in Social Streams

Author: H. Saif
Publisher: IOS Press
Total Pages: 310
Release: 2017-06-12
Genre: Computers
ISBN: 1614997519

Microblogs and social media platforms are now considered among the most popular forms of online communication. Through a platform like Twitter, much information reflecting people’s opinions and attitudes is published and shared among users on a daily basis. This has recently brought great opportunities to companies interested in tracking and monitoring the reputation of their brands and businesses, and to policy makers and politicians to support their assessment of public opinions about their policies or political issues. A wide range of approaches to sentiment analysis on social media have recently been built. Most of these approaches rely mainly on the presence of affect words or syntactic structures that explicitly and unambiguously reflect sentiment. However, these approaches are semantically weak, that is, they do not account for the semantics of words when detecting their sentiment in text. In order to address this problem, the author investigates the role of word semantics in sentiment analysis of microblogs. Specifically, Twitter is used as a case study of microblogging platforms to investigate whether capturing the sentiment of words with respect to their semantics leads to more accurate sentiment analysis models on Twitter. To this end, the author proposes several approaches in this book for extracting and incorporating two types of word semantics for sentiment analysis: contextual semantics (i.e., semantics captured from words’ co-occurrences) and conceptual semantics (i.e., semantics extracted from external knowledge sources). Experiments are conducted with both types of semantics by assessing their impact in three popular sentiment analysis tasks on Twitter: entity-level sentiment analysis, tweet-level sentiment analysis and context-sensitive sentiment lexicon adaptation. The findings from this body of work demonstrate the value of using semantics in sentiment analysis on Twitter. The proposed approaches, which consider word semantics for sentiment analysis at both entity and tweet levels, surpass non-semantic approaches in most evaluation scenarios. This book will be of interest to students, researchers and practitioners in the semantic sentiment analysis field.