**NOTE** The following is a draft of minireviews of some machine learning books in the style of the Chicago math bibliography. They are designed to be short and pithy.

*Nicolo Cesa-Bianchi, Gabor Lugosi*

This book sits somewhere between machine learning and game theory, but you should still buy it. Most of the interesting problems machine learning researchers and practitioners solve exist at this border. Computational advertising and portfolio optimization both exist in these adversarial settings where the thing you are trying to learn doesn’t want to be learned. With good coverage of bandit algorithms and sequential learning, it is a good book to complement the more Bayesian bunch here.

It is pitched at the researcher level so be prepared for some math, but the writing is great. You will use the material in this book more often than many of the Bayesian tricks the other books cover.

*Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar*

Either this or the Cesa-Bianchi book makes a good complement to an actual book on Machine Learning. The title is a bit of a misnomer. Still, the Computational Learning Theory and the material on sequential prediction are top notch. This is definitely more a monograph for graduate students than a textbook, so be prepared for lots of theorems and proofs.

Normally that would be a bad thing, but these proofs are important for understanding why online learning works, and most big data problems use online learning.

*Ethem Alpaydin*

This book is probably the most accessible on the list. It was the first book I read on Machine Learning, so I should have fond memories of it. I consider it a reasonable book for someone who wants to use the algorithms. It’s basically a survey, which means you won’t find any new methods here. I would hold off buying this one until you are certain you need something this basic.

*David Barber*

This is a remarkably good book on graphical models. It is filled with examples and really takes its time building intuition. Seriously, consider this book if the idea of going through Koller’s 1400-page tome feels intimidating.

The book is gentler than the others listed here but still manages to cover a remarkable amount.

Note it says Bayesian in the title, so non-Bayesian algorithms and models are mostly ignored. You will need to supplement with a book like EoSL or Murphy’s book. There is a free copy on the author’s website, so be sure to preview it and see if it’s at the right level for you.

*Kevin Murphy*

Short answer: Buy this book. It covers essentially everything you will want to learn about machine learning and genuinely seems to want you to learn the content. It isn’t as equation heavy as Bishop’s book and the exercises are really good. The book is also loaded with insights into how many of the popular methods fail.

If I had to say anything bad about this book, it is that the organization could be better. The last section in many chapters seems arbitrary and out of place. Yes, I love proximal point methods, but why are they in the sparse linear models chapter? Why is Learning To Rank in the linear models chapter? Also, some of the content could be better motivated.

*Trevor Hastie, Robert Tibshirani, Jerome Friedman*

The sole machine learning book from statistics on the list. I really like this book, mostly because the authors stay practical all the way through. There are lots of examples and lots of charts showing how the models compare. This book also features lots of algorithms that only statisticians use.

The one complaint other people have is that the book doesn’t seem designed to be read front to back. There are frequent references both to material that has already been covered and to material that is yet to come. As someone who likes to skim, I didn’t mind, but I can see how it could be grating. This one is also available for free online, so read the first few chapters to see how you like it.

*Christopher Bishop*

Up until Murphy’s book came out, this was the book people recommended for machine learning. I have always been hesitant to recommend this book to anybody. The coverage is good, but the difficulty is all over the map. It swings wildly from explaining what a Gaussian distribution is to pages of matrix algebra where many of the steps are skipped. The book is filled with cryptic remarks that will only make sense after you run into the issue he covers.

The book is oddly insistent on taking the Bayesian stance on nearly every algorithm. This reaches high comedy when a section is dedicated to relevance vector machines, the Bayesian SVM. The later chapters (10-14) are actually really good, but overall this book is just too hard to read.

*Daphne Koller, Nir Friedman*

I am not sure how to feel about this book. It’s gigantic and says essentially everything that can be said about graphical models. The problem is it takes too many pages to say everything. The book does assume you are an absolute beginner and goes from there, but I think graphical models make the most sense as a synthesizing subject, something you enjoy once you’ve cut your teeth on hidden Markov models, Kalman filters, and maybe some naive Bayes.

With this book you won’t learn how to learn a graphical model until two-thirds of the way into the book. The exercises are good, but the structure sometimes makes the subject feel more theoretical than it actually is. I hesitate to recommend this as anything beyond a reference.

*David MacKay*

Yes, ostensibly it is a book about Information Theory, but that pretense stops about halfway through the book. From that point on, it is essentially a book on probability and inference on graphical models. Near the end, as you play with Boltzmann Machines and Gaussian Processes, you essentially have a full-on Machine Learning book.

The content is great, the exercises are nice, and it’s available to download for free. My big complaint is that it is sometimes hard to jump around the book, since the dependency structure between the chapters is not straightforward (yes, read the preface carefully). For its age, it is still remarkably current and readable.

*Richard O. Duda, Peter E. Hart and David G. Stork*

While the second edition was released in 2000, nothing in this book is newer than 1980. The material that is covered is very well-written. Unfortunately, without having read some papers it can be hard to tell what is worth remembering from this book. This book works best as something to visit if you don’t like how these other books explain an idea. I usually find that either this book or EoSL has the easiest to understand explanations.

*Stuart Russell and Peter Norvig*

It’s the AI book, but the 3rd edition devotes a fairly solid portion of the book to Machine Learning. I like their coverage, but at times it feels rushed. The reason I mention it is that they have a chapter on Reinforcement Learning, a topic that never seems to get enough coverage in any of these books.

This list doesn’t include all the books out there, so feel free to comment on which other books you would want minireviews for.