F# for Machine Learning Essentials
Format: PDF / Kindle (mobi) / ePub
- Design algorithms in F# to tackle complex computing problems
- Be a proficient F# data scientist using this simple-to-follow guide
- Solve real-world, data-related problems with robust statistical models, built for a range of datasets
The F# functional programming language enables developers to write simple code to solve complex problems. With F#, developers create consistent and predictable programs that are easier to test and reuse, simpler to parallelize, and less prone to bugs.
If you want to learn how to use F# to build machine learning systems, then this is the book you want.
Starting with an introduction to the several categories of machine learning, you will quickly learn to implement time-tested, supervised learning algorithms. You will gradually move on to predicting housing prices using regression analysis. You will then learn to use Accord.NET to implement SVM techniques and clustering. You will also learn to build a recommender system for your e-commerce site from scratch. Finally, you will dive into advanced topics such as implementing neural network algorithms while performing sentiment analysis on your data.
What you will learn
- Use F# to find patterns in raw data
- Build a set of classification systems using Accord.NET, Weka, and F#
- Run machine learning jobs on the Cloud with MBrace
- Perform mathematical operations on matrices and vectors using Math.NET
- Use a recommender system for your own problem domain
- Identify tourist spots across the globe using inputs from the user with decision tree algorithms
About the Author
Sudipta Mukherjee was born in Kolkata and migrated to Bangalore. He is an electronics engineer by education and a computer engineer/scientist by profession and passion. He graduated in 2004 with a degree in electronics and communication engineering.
He has a keen interest in data structures, algorithms, text processing, natural language processing tools development, programming languages, and machine learning at large. His first book on Data Structure using C has been received quite well. Parts of the book can be read on Google Books at http://goo.gl/pttSh. The book was also translated into simplified Chinese and is available from Amazon.cn at http://goo.gl/lc536. This is Sudipta's second book with Packt Publishing. His first book, .NET 4.0 Generics (http://goo.gl/MN18ce), was also received very well. During the last few years, he has been hooked on the functional programming style. His book on functional programming, Thinking in LINQ (http://goo.gl/hm0lNF), was released last year. Last year, he also gave a talk at @FuConf based on his LINQ book (https://goo.gl/umdxIX). He lives in Bangalore with his wife and son.
Sudipta can be reached via e-mail at firstname.lastname@example.org and via Twitter at @samthecoder.
Table of Contents
- Introduction to Machine Learning
- Linear Regression
- Classification Techniques
- Information Retrieval
- Collaborative Filtering
- Sentiment Analysis
- Anomaly Detection
select to fit the model. In vectorized form, this can be written as: $Y = \theta' X$. Theta can be calculated by the following formula: $\theta = (X'X)^{-1} X'Y$. So using the MathNet.Fsharp package, this can be calculated as follows:
Previously, in Chapter 1, Introduction to Machine Learning, I mentioned a car's miles per gallon (mpg) dataset. The question I want to solve with multiple linear regression is: what is the relationship between miles per gallon
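As an illustration of the theta formula above, here is a minimal F# script sketch, not the book's own listing. It assumes the MathNet.Numerics.FSharp NuGet package and an F# Interactive version that supports `#r "nuget: ..."`. The data is made up, and each row of the design matrix is taken to be one observation with a leading 1 for the intercept, so predictions are the design matrix times theta.

```fsharp
#r "nuget: MathNet.Numerics.FSharp"

open MathNet.Numerics.LinearAlgebra

// Made-up data generated from y = 1 + 2x.
// Each row is one observation; the first column of ones models the intercept.
let x =
    matrix [ [ 1.0; 0.0 ]
             [ 1.0; 1.0 ]
             [ 1.0; 2.0 ]
             [ 1.0; 3.0 ] ]

let y = vector [ 1.0; 3.0; 5.0; 7.0 ]

// Normal equation: theta = (X'X)^-1 X'Y
let theta = (x.Transpose() * x).Inverse() * x.Transpose() * y

// theta.[0] is the fitted intercept, theta.[1] the fitted slope.
printfn "theta = %A" (theta.ToArray())
```

Explicitly inverting X′X is fine for a small example like this one. For larger or ill-conditioned problems, a decomposition-based solver is usually the safer choice.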
algorithms implemented earlier in the chapter: https://gist.github.com/sudipto80/606418978f4a86fe93aa Once you generate this array, you can then plug it into the algorithms described earlier.
Summary
In this chapter, the most commonly used memory-based approaches for recommendations were discussed. There are several other approaches to building recommender systems that have not been discussed here, such as model-based and hybrid recommendation systems that take cues from several other
for the review of a bank called Bank2: The following call calculates the semantic orientation for all the reviews in the list of reviews; in this case, there are two reviews for two banks.
soPMI reviews
The above call produces the following result in F# Interactive:
val it : (string list list * float) list =
  [([["positive"; "outlook"]; ["good"; "service"]; ["nice"; "people"]; ["bad"; "location"]], 5.545177444);
   ([["nasty"; "behaviour"]; ["unfortunate"; "outcome"]; ["poor"; "quality"]],
If you want to understand what all the other fields mean, take a look at https://archive.ics.uci.edu/ml/machine-learning-databases/breastcancer-wisconsin/wdbc.names. Now the question is: given a new entry with all the other fields except the tag M or B, can we predict that tag? In ML terminology, this value, "M" or "B", is sometimes referred to as the "class tag" or just the "class". The task of a classification algorithm is to determine this class for a new data point. k-NN does this in the following way: it
measures the distance from the given data point to all the training data and then considers the classes of only the k nearest neighbors to determine the class of the new entry. So for the current case, if more than 50% of the k nearest neighbors are of class "B", then k-NN will conclude that the new entry is of type "B".
Distance metrics
The distance metric used is generally the Euclidean distance that you learnt in high school. For example, given two points p and q in 3D:
$d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2}$
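The k-NN procedure described above, with Euclidean distance as the metric, can be sketched in plain F# as follows. This is a minimal sketch rather than the book's implementation: the names `euclidean` and `knnClassify` are hypothetical, the training data is made up, and the vote picks the most frequent class among the k nearest neighbors, which for two classes and odd k is the same as the "more than 50%" rule.

```fsharp
/// Euclidean distance between two points of the same dimension.
let euclidean (p: float list) (q: float list) =
    List.zip p q
    |> List.sumBy (fun (a, b) -> (a - b) * (a - b))
    |> sqrt

/// Predict the class of `query` by a vote among its k nearest training points.
let knnClassify (k: int) (training: (float list * string) list) (query: float list) =
    training
    |> List.sortBy (fun (features, _) -> euclidean features query) // nearest first
    |> List.truncate k                                             // keep the k nearest
    |> List.countBy snd                                            // count each class tag
    |> List.maxBy snd                                              // most frequent class
    |> fst

// Made-up toy data: two features per sample, tagged "B" or "M".
let training =
    [ ([ 1.0; 1.0 ], "B")
      ([ 1.2; 0.8 ], "B")
      ([ 0.9; 1.1 ], "B")
      ([ 5.0; 5.0 ], "M")
      ([ 5.2; 4.9 ], "M") ]

let predicted = knnClassify 3 training [ 1.1; 1.0 ]
printfn "Predicted class: %s" predicted // prints "Predicted class: B"
```

Sorting the whole training set costs O(n log n) per query, which is acceptable for small datasets. In practice, features are usually scaled first so that no single field dominates the distance.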