Evolutionary Decision Trees in Large-Scale Data Mining

Author: Marek Kretowski
Publisher: Springer
Total Pages: 180
Release: 2019-06-05
Genre: Computers
ISBN: 3030218511

This book presents a unified framework, based on specialized evolutionary algorithms, for the global induction of various types of classification and regression trees from data. The resulting univariate or oblique trees are significantly smaller than those produced by standard top-down methods, an aspect that is critical for the interpretation of mined patterns by domain analysts. The approach presented here is extremely flexible and can easily be adapted to specific data mining applications, e.g. cost-sensitive model trees for financial data or multi-test trees for gene expression data. The global induction can be efficiently applied to large-scale data without the need for extraordinary resources. With a simple GPU-based acceleration, datasets composed of millions of instances can be mined in minutes. In the event that the size of the datasets makes the fastest memory computing impossible, the Spark-based implementation on computer clusters, which offers impressive fault tolerance and scalability potential, can be applied.

Evolutionary Computation in Data Mining

Author: Ashish Ghosh
Publisher: Springer
Total Pages: 279
Release: 2006-06-22
Genre: Computers
ISBN: 3540323589

Data mining (DM) consists of extracting interesting knowledge from real-world, large and complex data sets, and is the core step of a broader process called knowledge discovery from databases (KDD). In addition to the DM step, which actually extracts knowledge from data, the KDD process includes several preprocessing (or data preparation) and post-processing (or knowledge refinement) steps. The goal of data preprocessing methods is to transform the data to facilitate the application of one or more given DM algorithms, whereas the goal of knowledge refinement methods is to validate and refine discovered knowledge. Ideally, discovered knowledge should be not only accurate, but also comprehensible and interesting to the user. The whole process is highly computation-intensive. Automatically discovering knowledge from databases is a very attractive and challenging task, both for academia and for industry. Hence, there has been a growing interest in data mining in several AI-related areas, including evolutionary algorithms (EAs). The main motivation for applying EAs to KDD tasks is that they are robust and adaptive search methods, which perform a global search in the space of candidate solutions (for instance, rules or another form of knowledge representation).

Parallel Processing and Applied Mathematics

Author: Roman Wyrzykowski
Publisher: Springer Nature
Total Pages: 487
Release: 2023-04-27
Genre: Computers
ISBN: 303130442X

This two-volume set, LNCS 13826 and LNCS 13827, constitutes the proceedings of the 14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022, held in Gdansk, Poland, in September 2022. The 77 regular papers presented in these volumes were selected from 132 submissions. For regular tracks of the conference, 33 papers were selected from 62 submissions. The papers were organized in topical sections named as follows. Part I: numerical algorithms and parallel scientific computing; parallel non-numerical algorithms; GPU computing; performance analysis and prediction in HPC systems; scheduling for parallel computing; environments and frameworks for parallel/cloud computing; applications of parallel and distributed computing; soft computing with applications; and special session on parallel EVD/SVD and its application in matrix computations. Part II: 9th Workshop on Language-Based Parallel Programming (WLPP 2022); 6th Workshop on Models, Algorithms and Methodologies for Hybrid Parallelism in New HPC Systems (MAMHYP 2022); First Workshop on Quantum Computing and Communication; First Workshop on Applications of Machine Learning and Artificial Intelligence in High Performance Computing (WAML 2022); 4th Workshop on Applied High Performance Numerical Algorithms for PDEs; 5th Minisymposium on HPC Applications in Physical Sciences; 8th Minisymposium on High Performance Computing Interval Methods; 7th Workshop on Complex Collective Systems.

Mining of Massive Datasets

Author: Jure Leskovec
Publisher: Cambridge University Press
Total Pages: 480
Release: 2014-11-13
Genre: Computers
ISBN: 1107077230

Now in its second edition, this book focuses on practical algorithms for mining data from even the largest datasets.

Data Mining with Decision Trees

Author: Lior Rokach
Publisher: World Scientific
Total Pages: 263
Release: 2008
Genre: Computers
ISBN: 9812771727

This is the first comprehensive book dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique. Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science and technology of exploring large and complex bodies of data in order to discover useful patterns. The area is of great importance because it enables modeling and knowledge extraction from the abundance of data available. Both theoreticians and practitioners are continually seeking techniques to make the process more efficient, cost-effective and accurate. Decision trees, originally implemented in decision theory and statistics, are highly effective tools in other areas such as data mining, text mining, information extraction, machine learning, and pattern recognition. This book invites readers to explore the many benefits in data mining that decision trees offer: self-explanatory and easy to follow when compacted; able to handle a variety of input data (nominal, numeric and textual); able to process datasets that may have errors or missing values; high predictive performance for a relatively small computational effort; available in many data mining packages over a variety of platforms; useful for various tasks, such as classification, regression, clustering and feature selection.
Contents: Introduction to Decision Trees; Growing Decision Trees; Evaluation of Classification Trees; Splitting Criteria; Pruning Trees; Advanced Decision Trees; Decision Forests; Incremental Learning of Decision Trees; Feature Selection; Fuzzy Decision Trees; Hybridization of Decision Trees with Other Techniques; Sequence Classification Using Decision Trees.
Readership: Researchers, graduate and undergraduate students in information systems, engineering, computer science, statistics and management.

Computational Science – ICCS 2020

Author: Valeria V. Krzhizhanovskaya
Publisher: Springer Nature
Total Pages: 648
Release: 2020-06-19
Genre: Computers
ISBN: 3030504204

The seven-volume set LNCS 12137, 12138, 12139, 12140, 12141, 12142, and 12143 constitutes the proceedings of the 20th International Conference on Computational Science, ICCS 2020, held in Amsterdam, The Netherlands, in June 2020.* A total of 101 papers and 248 workshop papers presented in this book set were carefully reviewed and selected from 719 submissions (230 submissions to the main track and 489 submissions to the workshops). The papers were organized in topical sections named as follows. Part I: ICCS Main Track. Part II: ICCS Main Track. Part III: Advances in High-Performance Computational Earth Sciences: Applications and Frameworks; Agent-Based Simulations, Adaptive Algorithms and Solvers; Applications of Computational Methods in Artificial Intelligence and Machine Learning; Biomedical and Bioinformatics Challenges for Computer Science. Part IV: Classifier Learning from Difficult Data; Complex Social Systems through the Lens of Computational Science; Computational Health; Computational Methods for Emerging Problems in (Dis-)Information Analysis. Part V: Computational Optimization, Modelling and Simulation; Computational Science in IoT and Smart Systems; Computer Graphics, Image Processing and Artificial Intelligence. Part VI: Data Driven Computational Sciences; Machine Learning and Data Assimilation for Dynamical Systems; Meshfree Methods in Computational Sciences; Multiscale Modelling and Simulation; Quantum Computing Workshop. Part VII: Simulations of Flow and Transport: Modeling, Algorithms and Computation; Smart Systems: Bringing Together Computer Vision, Sensor Networks and Machine Learning; Software Engineering for Computational Science; Solving Problems with Uncertainties; Teaching Computational Science; UNcErtainty QUantIficatiOn for ComputationAl modeLs.
*The conference was canceled due to the COVID-19 pandemic.

Artificial Intelligence and Soft Computing

Author: Leszek Rutkowski
Publisher: Springer
Total Pages: 796
Release: 2018-05-24
Genre: Computers
ISBN: 3319912534

The two-volume set LNAI 10841 and LNAI 10842 constitutes the refereed proceedings of the 17th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2018, held in Zakopane, Poland in June 2018. The 140 revised full papers presented were carefully reviewed and selected from 242 submissions. The papers included in the first volume are organized in the following three parts: neural networks and their applications; evolutionary algorithms and their applications; and pattern classification.

Parallel Problem Solving from Nature – PPSN XVI

Author: Thomas Bäck
Publisher: Springer Nature
Total Pages: 717
Release: 2020-09-02
Genre: Computers
ISBN: 3030581152

This two-volume set LNCS 12269 and LNCS 12270 constitutes the refereed proceedings of the 16th International Conference on Parallel Problem Solving from Nature, PPSN 2020, held in Leiden, The Netherlands, in September 2020. The 99 revised full papers were carefully reviewed and selected from 268 submissions. The topics cover classical subjects such as automated algorithm selection and configuration; Bayesian- and surrogate-assisted optimization; benchmarking and performance measures; combinatorial optimization; connection between nature-inspired optimization and artificial intelligence; genetic and evolutionary algorithms; genetic programming; landscape analysis; multiobjective optimization; real-world applications; reinforcement learning; and theoretical aspects of nature-inspired optimization.

Automatic Design of Decision-Tree Induction Algorithms

Author: Rodrigo C. Barros
Publisher: Springer
Total Pages: 184
Release: 2015-02-04
Genre: Computers
ISBN: 3319142313

This book presents a detailed study of the major design components that constitute a top-down decision-tree induction algorithm, including aspects such as split criteria, stopping criteria, pruning and the approaches for dealing with missing values. Whereas the strategy still employed nowadays is to use a 'generic' decision-tree induction algorithm regardless of the data, the authors argue for the benefits that a bias-fitting strategy could bring to decision-tree induction, in which the ultimate goal is the automatic generation of a decision-tree induction algorithm tailored to the application domain of interest. To that end, they discuss how one can effectively discover the most suitable set of components of decision-tree induction algorithms to deal with a wide variety of applications through the paradigm of evolutionary computation, following the emergence of a novel field called hyper-heuristics. "Automatic Design of Decision-Tree Induction Algorithms" would be highly useful for machine learning and evolutionary computation students and researchers alike.

Parallel Problem Solving from Nature – PPSN XV

Author: Anne Auger
Publisher: Springer
Total Pages: 515
Release: 2018-08-30
Genre: Computers
ISBN: 3319992597

This two-volume set LNCS 11101 and 11102 constitutes the refereed proceedings of the 15th International Conference on Parallel Problem Solving from Nature, PPSN 2018, held in Coimbra, Portugal, in September 2018. The 79 revised full papers were carefully reviewed and selected from 205 submissions. The papers cover a wide range of topics in natural computing including evolutionary computation, artificial neural networks, artificial life, swarm intelligence, artificial immune systems, self-organizing systems, emergent behavior, molecular computing, evolutionary robotics, evolvable hardware, parallel implementations and applications to real-world problems. The papers are organized in the following topical sections: numerical optimization; combinatorial optimization; genetic programming; multi-objective optimization; parallel and distributed frameworks; runtime analysis and approximation results; fitness landscape modeling and analysis; algorithm configuration, selection, and benchmarking; machine learning and evolutionary algorithms; and applications. Also included are the descriptions of 23 tutorials and 6 workshops which took place in the framework of PPSN XV.