Machine Learning with scikit-learn

In the chain of processes that make up data analysis, predictive models are built and validated with a powerful library called scikit-learn. In this chapter, you see some examples that illustrate the basic construction of predictive models using several different methods.

The scikit-learn Library

scikit-learn is a Python module that integrates many machine learning algorithms. The library was initially developed by Cournapeau in 2007, but the first real release was in 2010. It is part of the SciPy (Scientific Python) group, a set of libraries created for scientific computing and especially for data analysis, many of which are discussed in this book. These libraries are generally known as SciKits, hence the first part of the library's name. The second part is derived from machine learning, the discipline to which the library pertains.

Machine Learning

Machine learning is a discipline that studies methods for recognizing patterns in the datasets under analysis. In particular, it deals with the development of algorithms that learn from data and make predictions. Each methodology is based on building a specific model.

© Fabio Nelli 2018 F. Nelli, Python Data Analytics, https://doi.org/10.1007/978-1-4842-3913-1_8

Many methods belong to machine learning, each with its own characteristics, suited to the nature of the data and to the predictive model that you want to build. Choosing which method to apply is called a learning problem. The data examined for patterns in the learning phase can be arrays in which each element is a single value or a multivariate value. These values are often referred to as features or attributes.
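The difference between single-valued and multivariate elements can be sketched with NumPy arrays. This is only an illustration: the numbers, the variable names, and the three-feature layout are assumptions made for the example, not data from the chapter.

```python
import numpy as np

# Single value per element: four samples, one feature each.
# scikit-learn estimators expect 2-D input, so the single column is made explicit.
x_single = np.array([1.2, 3.4, 5.6, 7.8]).reshape(-1, 1)

# Multivariate elements: four samples, each described by three features
# (attributes). The values are made up for illustration.
x_multi = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
])

print(x_single.shape)  # (4, 1): 4 samples, 1 feature
print(x_multi.shape)   # (4, 3): 4 samples, 3 features
```

In both cases the rows are the samples and the columns are the features, which is the layout scikit-learn estimators work with.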

Supervised and Unsupervised Learning

Depending on the type of data and the model to be built, you can separate learning problems into two broad categories:

Supervised Learning

These are the methods in which the training set contains additional attributes that you want to predict (the target). Thanks to these values, you can instruct the model to provide similar values when you submit new values (the test set).

• Classification: The data in the training set belong to two or more classes or categories. Because the data are already labeled, they allow you to teach the system to recognize the characteristics that distinguish each class. When you present a new value unknown to the system, the system evaluates its class according to its characteristics.

• Regression: The value to be predicted is a continuous variable. The simplest case to understand is finding the line that describes the trend of a series of points in a scatterplot.
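The two supervised cases above can be sketched in a few lines. The choice of estimators (k-nearest neighbors for classification, ordinary least squares for regression) and all the data points are assumptions made for illustration; the text does not prescribe a particular method here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# --- Classification: labeled training data from two classes (0 and 1) ---
# Made-up points forming two well-separated groups.
X_train = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
                    [4.0, 4.2], [4.1, 3.9], [3.9, 4.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])  # the target: one label per sample

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)  # learning phase

# New values unknown to the system: the model assigns each one a class.
new_points = np.array([[1.1, 1.0], [4.0, 4.0]])
predicted_classes = clf.predict(new_points)
print(predicted_classes)

# --- Regression: the target is a continuous variable ---
# Made-up points lying exactly on the line y = 2x + 1.
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 * x.ravel() + 1.0

reg = LinearRegression()
reg.fit(x, y)  # finds the line that describes the trend of the points
slope, intercept = reg.coef_[0], reg.intercept_
print(slope, intercept)
```

Both estimators follow the same pattern: `fit` learns from the training set together with its targets, and `predict` supplies values for new inputs.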

Unsupervised Learning

These are the methods in which the training set consists of a series of input values x without any corresponding target value.

• Clustering: The g
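As a rough sketch of unsupervised learning, the example below groups input values with no target supplied. K-means is used only as one possible clustering method, and the data, the number of clusters, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Input values only: no labels or targets are supplied.
# Made-up points forming two well-separated groups.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [8.1, 7.9], [7.9, 8.0]])

# n_clusters=2 is an assumption about the data; random_state fixes the seed
# so that repeated runs give the same result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Cluster numbers are arbitrary; what matters is which samples share a label.
print(labels)
```

Unlike the supervised case, `fit_predict` receives only the input values, and the groups are discovered from the data alone.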
