This repository holds the code for the forthcoming book 'Introduction to Machine Learning with Python' by Andreas Mueller and Sarah Guido. You can find details about the book on the O'Reilly website.
The book requires the current stable version of scikit-learn, that is, 0.20.0. Most of the book can also be used with previous versions of scikit-learn, though you need to adjust the imports for everything from the `model_selection` module, mostly `cross_val_score`, `train_test_split` and `GridSearchCV`.

This repository provides the notebooks from which the book is created, together with the `mglearn` library of helper functions to create figures and datasets.

For the curious ones, the cover depicts a hellbender.
All datasets are included in the repository, with the exception of the aclImdb dataset, which you can download from the page of Andrew Maas. See the book for details.
If you get `ImportError: No module named mglearn`, you can try to install mglearn into your Python environment using the command `pip install mglearn` in your terminal or `!pip install mglearn` in a Jupyter Notebook.

## Errata
Please note that the first print of the book is missing the following line when listing the assumed imports:
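Based on the surrounding text (the error involves `display`, and the line belongs with the assumed imports), the missing line is presumably:

```python
from IPython.display import display
```

`display` renders objects such as pandas DataFrames nicely inside Jupyter notebooks, which is why several code listings in the book rely on it.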
Please add this line if you see an error involving `display`.

The first print of the book used a function called `plot_group_kfold`. This has been renamed to `plot_label_kfold` because of a rename in scikit-learn.
## Setup
To run the code, you need the packages `numpy`, `scipy`, `scikit-learn`, `matplotlib`, `pandas` and `pillow`. Some of the visualizations of decision trees and neural network structures also require `graphviz`. The chapter on text processing also requires `nltk` and `spacy`.

The easiest way to set up an environment is by installing Anaconda.
## Installing packages with conda
If you already have a Python environment set up and you are using the `conda` package manager, you can get all packages with a single `conda install` command. For the chapter on text processing you also need to install `nltk` and `spacy`.
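Assuming the package names match those on the default conda channels (where the Python bindings for graphviz are packaged as `python-graphviz`), the two steps above can be sketched as:

```shell
# core packages used throughout the book
conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz python-graphviz

# additional packages for the text-processing chapter
conda install nltk spacy
```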
## Installing packages with pip
If you already have a Python environment and are using pip to install packages, you need to run the equivalent `pip install` command.
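A sketch of that command, assuming the package names on PyPI match:

```shell
# note: the PyPI "graphviz" package only provides the Python bindings;
# the graphviz C library must be installed separately (see below)
pip install numpy scipy scikit-learn matplotlib pandas pillow graphviz
```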
You also need to install the graphviz C library, which is easiest using a package manager. If you are on OS X and use homebrew, you can run `brew install graphviz`. If you are on Ubuntu or Debian, you can run `apt-get install graphviz`. Installing graphviz on Windows can be tricky, so using conda / Anaconda is recommended.

For the chapter on text processing you also need to install `nltk` and `spacy`.
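Assuming standard PyPI package names, that is roughly:

```shell
# packages for the text-processing chapter
pip install nltk spacy
```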
## Downloading English language model
For the text processing chapter, you need to download the English language model for spacy using the `spacy download` command.
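The exact model name depends on your spacy version; for the versions current when the book was written, the command was roughly:

```shell
# download spacy's English model
# (newer spacy versions name the small English model "en_core_web_sm")
python -m spacy download en
```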
## Submitting Errata
If you have errata for the (e-)book, please submit them via the O'Reilly website. You can submit fixes to the code as pull requests here, but I'd appreciate it if you would also submit them there, as this repository doesn't hold the 'master notebooks'.