Machine Learning with scikit-learn
In the chain of processes that make up data analysis, predictive models are built and validated with a powerful library called scikit-learn. In this chapter, you will see examples that illustrate the basic construction of predictive models with several different methods.
The scikit-learn Library
scikit-learn is a Python module that integrates many machine learning algorithms. The library was initially developed by Cournapeau in 2007, but the first real release came in 2010. It is part of the SciPy (Scientific Python) group, a set of libraries created for scientific computing and especially for data analysis, many of which are discussed in this book. These libraries are generally known as SciKits, hence the first part of the library’s name. The second part of the name derives from machine learning, the discipline to which the library pertains.
Machine Learning
Machine learning is a discipline that studies methods for recognizing patterns in the datasets under analysis. In particular, it deals with the development of algorithms that learn from data and make predictions. Each methodology is based on building a specific model.
© Fabio Nelli 2018 F. Nelli, Python Data Analytics, https://doi.org/10.1007/978-1-4842-3913-1_8
Many methods belong to machine learning, each with its own characteristics, suited to the nature of the data and to the predictive model that you want to build. Choosing which method to apply is called a learning problem. The data examined for patterns during the learning phase can be arrays composed of a single value per element or of a multivariate value. These values are often referred to as features or attributes.
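As a minimal sketch of the two data layouts just described, the snippet below builds one array with a single value per element and one with a multivariate value per element. The numbers and the variable names (heights_cm, X_single, X_multi) are made up for illustration. The reshape step reflects the two-dimensional layout (samples as rows, features as columns) that scikit-learn estimators generally expect.

```python
import numpy as np

# Hypothetical data: a single value (one feature) per element.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Arrange the single feature as one column: shape (n_samples, 1).
X_single = heights_cm.reshape(-1, 1)

# Hypothetical multivariate data: each row is one element (sample),
# each column is one feature (here height in cm and weight in kg).
X_multi = np.array([
    [150.0, 50.0],
    [160.0, 58.0],
    [170.0, 66.0],
    [180.0, 80.0],
])

print(X_single.shape)  # (4, 1)
print(X_multi.shape)   # (4, 2)
```

In both cases the first dimension counts the elements and the second counts the features per element.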
Supervised and Unsupervised Learning
Depending on the type of data and the model to be built, you can separate learning problems into two broad categories:
Supervised Learning
These are the methods in which the training set contains additional attributes that you want to predict (the target). Thanks to these values, you can instruct the model to provide similar values when you submit new values (the test set).
• Classification: The data in the training set belong to two or more classes or categories. Because the data are already labeled, they allow you to teach the system to recognize the characteristics that distinguish each class. When you need to consider a new value unknown to the system, the system will evaluate its class according to its characteristics.
• Regression: The value to be predicted is a continuous variable. The simplest case to understand is when you want to find the line that describes the trend from a series of points represented in a scatterplot.
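The two supervised cases above can be sketched with a pair of toy examples. This is an illustration under stated assumptions, not the chapter's own code: the data are invented, and the choice of KNeighborsClassifier for classification and LinearRegression for regression is only one of many possible choices in scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# --- Classification: the targets are discrete class labels ---
# Hypothetical training set: two well-separated groups of 2-D points.
X_train = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 0
    [4.0, 4.2], [4.1, 3.9], [3.8, 4.0],   # class 1
])
y_train = np.array([0, 0, 0, 1, 1, 1])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# New values unknown to the system: the model assigns each one a class.
X_new = np.array([[1.1, 0.9], [4.0, 4.0]])
predicted_classes = clf.predict(X_new)
print(predicted_classes)  # [0 1]

# --- Regression: the target is a continuous variable ---
# Hypothetical points lying exactly on the line y = 2x + 1.
x_points = np.array([[0.0], [1.0], [2.0], [3.0]])
y_points = np.array([1.0, 3.0, 5.0, 7.0])

reg = LinearRegression()
reg.fit(x_points, y_points)

# Slope and intercept of the fitted line (approximately 2.0 and 1.0).
print(reg.coef_[0], reg.intercept_)

# Predict the continuous value for a new input x = 4.
y_pred = reg.predict(np.array([[4.0]]))
print(y_pred)
```

Both models follow the same pattern: fit on a training set that includes the target, then predict on new values. Real scatterplot data would contain noise, so the fitted line would only approximate the points.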
Unsupervised Learning
These are the methods in which the training set consists of a series of input values x without any corresponding target value.
• Clustering: The g