Managing Gigabytes

Managing Gigabytes
Author: Ian H. Witten
Publisher: Morgan Kaufmann
Total Pages: 572
Release: 1999-05-03
Genre: Business & Economics
ISBN: 9781558605701

"This book is the Bible for anyone who needs to manage large data collections. It's required reading for our search gurus at Infoseek. The authors have done an outstanding job of incorporating and describing the most significant new research in information retrieval over the past five years into this second edition." Steve Kirsch, Cofounder, Infoseek Corporation "The new edition of Witten, Moffat, and Bell not only has newer and better text search algorithms but much material on image analysis and joint image/text processing. If you care about search engines, you need this book: it is the only one with full details of how they work. The book is both detailed and enjoyable; the authors have combined elegant writing with top-grade programming." Michael Lesk, National Science Foundation "The coverage of compression, file organizations, and indexing techniques for full text and document management systems is unsurpassed. Students, researchers, and practitioners will all benefit from reading this book." Bruce Croft, Director, Center for Intelligent Information Retrieval at the University of Massachusetts In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading--an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.

Information Retrieval

Information Retrieval
Author: Stefan Buttcher
Publisher: MIT Press
Total Pages: 633
Release: 2016-02-12
Genre: Computers
ISBN: 0262528878

An introduction to information retrieval, the foundation for modern search engines, that emphasizes implementation and experimentation. Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus—a multiuser open-source information retrieval system developed by one of the authors and available online—provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.

Understanding Compression

Understanding Compression
Author: Colt McAnlis
Publisher: "O'Reilly Media, Inc."
Total Pages: 241
Release: 2016-07-13
Genre: Computers
ISBN: 1491961503

If you want to attract and retain users in the booming mobile services market, you need a quick-loading app that won’t churn through their data plans. The key is to compress multimedia and other data into smaller files, but finding the right method is tricky. This witty book helps you understand how data compression algorithms work—in theory and practice—so you can choose the best solution among all the available compression tools. With tables, diagrams, games, and as little math as possible, authors Colt McAnlis and Aleks Haecky neatly explain the fundamentals. Learn how compressed files are better, cheaper, and faster to distribute and consume, and how they’ll give you a competitive edge. Learn why compression has become crucial as data production continues to skyrocket Know your data, circumstances, and algorithm options when choosing compression tools Explore variable-length codes, statistical compression, arithmetic numerical coding, dictionary encodings, and context modeling Examine tradeoffs between file size and quality when choosing image compressors Learn ways to compress client- and server-generated data objects Meet the inventors and visionaries who created data compression algorithms

Applied Data Science

Applied Data Science
Author: Martin Braschler
Publisher: Springer
Total Pages: 464
Release: 2019-06-13
Genre: Computers
ISBN: 3030118215

This book has two main goals: to define data science through the work of data scientists and their results, namely data products, while simultaneously providing the reader with relevant lessons learned from applied data science projects at the intersection of academia and industry. As such, it is not a replacement for a classical textbook (i.e., it does not elaborate on fundamentals of methods and principles described elsewhere), but systematically highlights the connection between theory, on the one hand, and its application in specific use cases, on the other. With these goals in mind, the book is divided into three parts: Part I pays tribute to the interdisciplinary nature of data science and provides a common understanding of data science terminology for readers with different backgrounds. These six chapters are geared towards drawing a consistent picture of data science and were predominantly written by the editors themselves. Part II then broadens the spectrum by presenting views and insights from diverse authors – some from academia and some from industry, ranging from financial to health and from manufacturing to e-commerce. Each of these chapters describes a fundamental principle, method or tool in data science by analyzing specific use cases and drawing concrete conclusions from them. The case studies presented, and the methods and tools applied, represent the nuts and bolts of data science. Finally, Part III was again written from the perspective of the editors and summarizes the lessons learned that have been distilled from the case studies in Part II. The section can be viewed as a meta-study on data science across a broad range of domains, viewpoints and fields. Moreover, it provides answers to the question of what the mission-critical factors for success in different data science undertakings are. The book targets professionals as well as students of data science: first, practicing data scientists in industry and academia who want to broaden their scope and expand their knowledge by drawing on the authors’ combined experience. Second, decision makers in businesses who face the challenge of creating or implementing a data-driven strategy and who want to learn from success stories spanning a range of industries. Third, students of data science who want to understand both the theoretical and practical aspects of data science, vetted by real-world case studies at the intersection of academia and industry.

Medical Informatics

Medical Informatics
Author: Hsinchun Chen
Publisher: Springer Science & Business Media
Total Pages: 656
Release: 2006-07-19
Genre: Medical
ISBN: 038725739X

Comprehensively presents the foundations and leading application research in medical informatics/biomedicine. The concepts and techniques are illustrated with detailed case studies. Authors are widely recognized professors and researchers in Schools of Medicine and Information Systems from the University of Arizona, University of Washington, Columbia University, and Oregon Health & Science University. Related Springer title, Shortliffe: Medical Informatics, has sold over 8000 copies The title will be positioned at the upper division and graduate level Medical Informatics course and a reference work for practitioners in the field.

Human-computer Interaction, INTERACT '99

Human-computer Interaction, INTERACT '99
Author: Martina Angela Sasse
Publisher: IOS Press
Total Pages: 744
Release: 1999
Genre: Computers
ISBN: 9780967335506

This text provides an overview of leading-edge developments in the field of human-computer interaction. It includes contributions from many key areas that are influencing the use of computers. Sections include speech technology, interaction with mobile and hand-held computers, e-business, web-based systems, virtual reality and haptic interfaces.

How to Build a Digital Library

How to Build a Digital Library
Author: Ian H. Witten
Publisher: Morgan Kaufmann
Total Pages: 655
Release: 2009-11-09
Genre: Computers
ISBN: 0080890393

How to Build a Digital Library reviews knowledge and tools to construct and maintain a digital library, regardless of the size or purpose. A resource for individuals, agencies, and institutions wishing to put this powerful tool to work in their burgeoning information treasuries. The Second Edition reflects developments in the field as well as in the Greenstone Digital Library open source software. In Part I, the authors have added an entire new chapter on user groups, user support, collaborative browsing, user contributions, and so on. There is also new material on content-based queries, map-based queries, cross-media queries. There is an increased emphasis placed on multimedia by adding a "digitizing" section to each major media type. A new chapter has also been added on "internationalization," which will address Unicode standards, multi-language interfaces and collections, and issues with non-European languages (Chinese, Hindi, etc.). Part II, the software tools section, has been completely rewritten to reflect the new developments in Greenstone Digital Library Software, an internationally popular open source software tool with a comprehensive graphical facility for creating and maintaining digital libraries. - Outlines the history of libraries on both traditional and digital - Written for both technical and non-technical audiences and covers the entire spectrum of media, including text, images, audio, video, and related XML standards - Web-enhanced with software documentation, color illustrations, full-text index, source code, and more

Digital Watermarking and Steganography

Digital Watermarking and Steganography
Author: Ingemar Cox
Publisher: Morgan Kaufmann
Total Pages: 623
Release: 2007-11-23
Genre: Computers
ISBN: 0080555802

Digital audio, video, images, and documents are flying through cyberspace to their respective owners. Unfortunately, along the way, individuals may choose to intervene and take this content for themselves. Digital watermarking and steganography technology greatly reduces the instances of this by limiting or eliminating the ability of third parties to decipher the content that he has taken. The many techiniques of digital watermarking (embedding a code) and steganography (hiding information) continue to evolve as applications that necessitate them do the same. The authors of this second edition provide an update on the framework for applying these techniques that they provided researchers and professionals in the first well-received edition. Steganography and steganalysis (the art of detecting hidden information) have been added to a robust treatment of digital watermarking, as many in each field research and deal with the other. New material includes watermarking with side information, QIM, and dirty-paper codes. The revision and inclusion of new material by these influential authors has created a must-own book for anyone in this profession. - This new edition now contains essential information on steganalysis and steganography - New concepts and new applications including QIM introduced - Digital watermark embedding is given a complete update with new processes and applications

Semantic Search over the Web

Semantic Search over the Web
Author: Roberto De Virgilio
Publisher: Springer Science & Business Media
Total Pages: 418
Release: 2012-08-04
Genre: Computers
ISBN: 3642250084

The Web has become the world’s largest database, with search being the main tool that allows organizations and individuals to exploit its huge amount of information. Search on the Web has been traditionally based on textual and structural similarities, ignoring to a large degree the semantic dimension, i.e., understanding the meaning of the query and of the document content. Combining search and semantics gives birth to the idea of semantic search. Traditional search engines have already advertised some semantic dimensions. Some of them, for instance, can enhance their generated result sets with documents that are semantically related to the query terms even though they may not include these terms. Nevertheless, the exploitation of the semantic search has not yet reached its full potential. In this book, Roberto De Virgilio, Francesco Guerra and Yannis Velegrakis present an extensive overview of the work done in Semantic Search and other related areas. They explore different technologies and solutions in depth, making their collection a valuable and stimulating reading for both academic and industrial researchers. The book is divided into three parts. The first introduces the readers to the basic notions of the Web of Data. It describes the different kinds of data that exist, their topology, and their storing and indexing techniques. The second part is dedicated to Web Search. It presents different types of search, like the exploratory or the path-oriented, alongside methods for their efficient and effective implementation. Other related topics included in this part are the use of uncertainty in query answering, the exploitation of ontologies, and the use of semantics in mashup design and operation. The focus of the third part is on linked data, and more specifically, on applying ideas originating in recommender systems on linked data management, and on techniques for the efficiently querying answering on linked data.