
What is Machine Learning?

Mohammad Shaddad

I can't count the number of times I have gotten into my car after midnight, after a long, exhausting day at the office, and said: "I'm too tired to drive. I wish I could just tell my car: take me home!"

Well, thanks to the amazing work being done by Tesla, Uber, Google, and others in the auto industry, that day is getting closer. And we all have to thank the recent advances in machine learning for that.

So, what is machine learning and how does it make that possible?

What is Machine Learning?

Machine learning, aka ML, is the process of feeding data to a piece of software and letting it analyze the data, detect patterns, and come up with its own findings about the data using predefined, general-purpose algorithms.

For example, when a driverless car is working, it collects data from its surroundings, such as the distance from nearby objects, the type of nearby objects (cars, humans, etc.), and the terrain (smooth or rough), and uses this data to set its speed, come to a stop, and so on. This is what machine learning is about.

Of course, the data and the problem that we are trying to solve must be related. You can't provide data that contains terrain, speed limit, and car weight to an algorithm and expect it to figure out if today is a good day for a BBQ or not. Each machine we build can solve only one problem, and for that, we need to provide it with the right data.

But how can machine learning make the right decisions?

The two types of Machine Learning

Like we said, machine learning uses data to reach conclusions and make decisions. It can be raw data, where we don't know how it's structured or how its dimensions relate to one another. It can also be structured data that we understand, where we know how it affects the problem we are trying to solve.

These two scenarios present us with two types of machine learning: Unsupervised Learning, and Supervised Learning. Can you guess which scenario is which?

Supervised Learning

Supervised learning is when you have structured data and an understanding of how its dimensions affect the problem that you are trying to solve. In practice, this means the data comes with known answers (labels) from past experience. For example, we know that when the terrain is smooth and the car is driving in a straight line, we can drive fast, while if the terrain is rough and bumpy, even when driving in a straight line, we have to drive at a relatively slow speed. Teaching the machine to respond to its environment and decide whether to drive slow or fast on a terrain based on these past experiences is known as Supervised Learning.

Unsupervised Learning

Unsupervised learning is when you have data, but you have no idea how that data correlates or how it can be used to solve your problem. There are no known answers to learn from, so a decision has to be worked out from the data alone. For example, when you are driving down a slope for the first time, you guess that if you drive fast, gravity might work against you and get you in trouble, so you decide to drive slowly. Working out decisions like this, without past examples to lean on, is loosely what Unsupervised Learning is about.

In unsupervised learning, the algorithm makes the best use of the available data and tries to group it by other, not-so-obvious characteristics of the data (slope and gravity) to reach a decision.
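
As a rough illustration of finding structure without labels, the sketch below groups a handful of made-up driving observations with scikit-learn's KMeans. The observations, the feature meanings (slope in degrees, speed in km/h), and the choice of two groups are all assumptions for this example, not data from a real car.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical observations: [slope_degrees, speed_kmh]. No labels are given.
observations = np.array([
    [1, 90], [2, 95], [0, 100], [1, 85],      # flat road, fast driving
    [12, 30], [15, 25], [14, 35], [13, 28],   # steep slope, slow driving
])

# Ask for two groups; the algorithm decides which observation goes where.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(observations)

print(cluster_ids)
```

The group numbers themselves are arbitrary. What matters is that the flat-road and steep-slope observations end up in different groups without anyone labelling them.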

In this post and the series that follows, we'll be using a variety of supervised learning algorithms and techniques to solve problems. We'll leave unsupervised learning for another round.

P.S. If you have experience with Unsupervised Learning, please feel free to contribute some material 😉

Words that you will hear a lot

These are words that you will hear a lot in any machine learning conversation: accuracy, attribute, classifier, dimension, feature, fit, labels, model, overfitting, prediction, regression, test data, train.

For now, I'll let you guess what these words mean, but they will all start to make sense as we go.

What do I need to get started with Machine Learning?

There are many tools and languages for creating machine learning enabled software, with the most popular languages being Python and R. Choosing between them comes down to a combination of personal preference and project decisions. There are different schools of thought on this. You can check out what Udacity and EliteDataScience have to say, and you can always reach out to me if you want help choosing one. I started with Python, but plan on learning R as well in the future. In this article and the coming series, I'll be using Python.

One of the most widely used Python machine learning libraries is scikit-learn, an open source library that provides simple and efficient tools for data mining and data analysis.

You can also use a number of Python IDEs and editors for development on Linux/Mac/Windows. I use Visual Studio on Windows.

Our first Machine Learning program

For our first program, let's take a simple problem to solve. This will allow us to get a quick handle on how to use Python and scikit-learn for writing machine learning software. And we'll work with more complex problems as we go.

If you know me well, you would know that I love BBQs. And since it's currently winter and I miss the BBQ season, let's write a program that will tell us if today is a good day for a BBQ or not.

I usually like to BBQ when the weather is nice and warm, with a little breeze. There are other factors that determine if a day is BBQ-kinda-day, but for simplicity, let's use only two: temperature and wind speed.

Since we are using supervised learning, we have the following simple data based on past experience:

TEMPERATURE   WIND_SPEED   BBQ_DAY
12            3            0
25            3            1
23            27           0
13            27           0
31            9            1
23            10           1
24            17           1
26            19           1
22            21           0
18            7            1
15            3            0
14            9            0
21            13           1
15            15           0
18            20           0



If we were to "humanly" figure out if a day is good for a BBQ or not, we would look at the data above and make an informed prediction. To make that easier, let's visualize the data:

Figure: BBQ Day Classification (training data)

You can see the days that are good for a BBQ in green, and the days that are not ideal in red. Now, if I ask you whether a day with a temperature of 22 degrees and a wind speed of 5 is a good day for a BBQ, what will your answer be? My guess is that it will be a good day for a BBQ.

But we are talking about machine learning. How can we get a machine to predict it for us?
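
One way to mimic that "eyeball" guess in code is to look at the most similar past days and take a majority vote. The sketch below does that in plain Python using the rows of the table above. It is a simple nearest-neighbour vote, not the algorithm used later in this post, and the helper name guess_bbq_day and the choice of three neighbours are assumptions for illustration.

```python
from collections import Counter

# Past days from the table above: (temperature, wind_speed, bbq_day)
past_days = [
    (12, 3, 0), (25, 3, 1), (23, 27, 0), (13, 27, 0), (31, 9, 1),
    (23, 10, 1), (24, 17, 1), (26, 19, 1), (22, 21, 0), (18, 7, 1),
    (15, 3, 0), (14, 9, 0), (21, 13, 1), (15, 15, 0), (18, 20, 0),
]

def guess_bbq_day(temperature, wind_speed, k=3):
    """Majority vote among the k most similar past days (hypothetical helper)."""
    def squared_distance(day):
        return (day[0] - temperature) ** 2 + (day[1] - wind_speed) ** 2

    nearest = sorted(past_days, key=squared_distance)[:k]
    votes = Counter(day[2] for day in nearest)
    return votes.most_common(1)[0][0]

print(guess_bbq_day(22, 5))
```

For a temperature of 22 and a wind speed of 5, the three closest past days are all BBQ days, so the vote agrees with the guess above.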

Step 1 - Install the scikit-learn library

You can install the scikit-learn library with either pip or conda, using the following commands:

for pip use

pip install -U scikit-learn

for conda use

conda install scikit-learn
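
A quick way to check that the installation worked is to import the library and print its version. This is only a sanity check and assumes you run it with the same Python environment you installed into.

```python
import sklearn

# Prints the installed scikit-learn version; an ImportError here means
# the library is not available in this Python environment.
print(sklearn.__version__)
```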

Step 2 - Import the algorithm

As mentioned, machine learning uses a lot of different algorithms that can solve problems. Choosing the right algorithm depends on the problem at hand. For this example, we will use one of the simplest: the Gaussian Naive Bayes algorithm.

    from sklearn.naive_bayes import GaussianNB
    gnb = GaussianNB()

The code above imports the GaussianNB class implementing the algorithm, and initializes an instance of it. We will use the instance to make predictions.
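
To give a feel for what a Gaussian Naive Bayes classifier does internally, here is a simplified from-scratch sketch in plain Python. It is not scikit-learn's implementation (which, among other things, adds a small smoothing term to the variances). The class name TinyGaussianNB is made up for this example, and the data is the same training set used in Step 3 below.

```python
import math

class TinyGaussianNB:
    """Simplified Gaussian Naive Bayes, for illustration only."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, target in zip(X, y) if target == label]
            prior = len(rows) / len(X)
            columns = list(zip(*rows))
            # Per-feature mean and variance for this class.
            means = [sum(col) / len(col) for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / len(col)
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (prior, means, variances)
        return self

    def _log_score(self, x, label):
        prior, means, variances = self.stats[label]
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            # Log of the normal density, one feature at a time. Treating the
            # features as independent is the "naive" part. Assumes var > 0.
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score

    def predict(self, x):
        # Pick the class with the highest log score.
        return max(self.stats, key=lambda label: self._log_score(x, label))

X = [[12, 3], [25, 3], [23, 27], [13, 27], [31, 9], [23, 10], [24, 17],
     [26, 19], [22, 20], [18, 7], [14, 3], [14, 9], [21, 13], [15, 15], [18, 20]]
y = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

model = TinyGaussianNB().fit(X, y)
print(model.predict([22, 5]))
```

The idea is that each class gets its own "typical" temperature and wind speed, and a new day is assigned to the class whose typical values it resembles most, weighted by how common that class is.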

Step 3 - Train the machine

Since we are using supervised learning to make predictions, we have to train the machine with our past experiences so that it can make a well-informed guess, as we did manually above. To do that, we need to pass our training data to the algorithm using the fit method:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Each row is one past day: [temperature, wind_speed]
    training_data = np.array([[12, 3], [25, 3], [23, 27], [13, 27], [31, 9], [23, 10], [24, 17], [26, 19], [22, 20], [18, 7], [14, 3], [14, 9], [21, 13], [15, 15], [18, 20]])
    # 1 = good BBQ day, 0 = not a good BBQ day, in the same order as the rows above
    training_result = np.array([0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0])

    gnb = GaussianNB()
    gnb.fit(training_data, training_result)



The fit method accepts two required parameters:

  • X : array-like, shape (n_samples, n_features). The training vectors, where n_samples is the number of samples and n_features is the number of features.
  • y : array-like, shape (n_samples,). The target values.
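
To make those shapes concrete, the short check below builds the same arrays used in this step and prints their shapes. It only uses NumPy and is meant as a sanity check rather than part of the program itself.

```python
import numpy as np

training_data = np.array([[12, 3], [25, 3], [23, 27], [13, 27], [31, 9],
                          [23, 10], [24, 17], [26, 19], [22, 20], [18, 7],
                          [14, 3], [14, 9], [21, 13], [15, 15], [18, 20]])
training_result = np.array([0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0])

print(training_data.shape)    # (n_samples, n_features)
print(training_result.shape)  # (n_samples,)

# fit expects exactly one target value per training row
assert training_data.shape[0] == training_result.shape[0]
```

Here n_samples is 15 (one per past day) and n_features is 2 (temperature and wind speed).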

Step 4 - Make a prediction

Now this is the fun part: can the machine make the same prediction as we did? To find out, we need to call the predict method, passing it the data for today's weather:

    is_bbq_day = gnb.predict([[22, 5]])

Let's print the value of is_bbq_day to see if it matches the value we guessed earlier:

    print("Is today a bbq day: {}".format(is_bbq_day))

This gives us an output of:

Is today a bbq day: [1]

Great! This matches the guess we made earlier.
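
If you are curious how sure the model is about that answer, GaussianNB also offers a predict_proba method that returns a probability for each class. The sketch below repeats the training step so that it can run on its own. The variable name probabilities is just a choice for this example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

training_data = np.array([[12, 3], [25, 3], [23, 27], [13, 27], [31, 9],
                          [23, 10], [24, 17], [26, 19], [22, 20], [18, 7],
                          [14, 3], [14, 9], [21, 13], [15, 15], [18, 20]])
training_result = np.array([0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0])

gnb = GaussianNB()
gnb.fit(training_data, training_result)

# One row per sample, one column per class, in the order given by gnb.classes_
probabilities = gnb.predict_proba([[22, 5]])
print(gnb.classes_)
print(probabilities)
```

The second column is the probability the model assigns to "BBQ day" for a temperature of 22 and a wind speed of 5.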

Step 5 - Measure our prediction accuracy

One of the key aspects of building a machine learning program is achieving a high level of accuracy. We can score the algorithm against the data from our prior experiences by letting it make a prediction for each row and comparing that prediction with the known result. Keep in mind that this scores the model on the same data it was trained on, so it tells us how well the model fits our past experiences rather than how it will do on new days. This can be done by calling the score method:

    accuracy = gnb.score(training_data, training_result)
    print("We are {} accurate".format(accuracy))


This results in a prediction accuracy of 86.67%, as in the output below:

We are 0.8666666666666667 accurate

That is not bad, given the limited data set that we have.
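
To see what that number means, the sketch below computes the same kind of accuracy by hand: predict every training row, then take the fraction of predictions that match the known answers. The variable names predictions and manual_accuracy are choices made for this example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

training_data = np.array([[12, 3], [25, 3], [23, 27], [13, 27], [31, 9],
                          [23, 10], [24, 17], [26, 19], [22, 20], [18, 7],
                          [14, 3], [14, 9], [21, 13], [15, 15], [18, 20]])
training_result = np.array([0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0])

gnb = GaussianNB()
gnb.fit(training_data, training_result)

predictions = gnb.predict(training_data)
# Fraction of training rows where the prediction matches the known answer
manual_accuracy = np.mean(predictions == training_result)
print(manual_accuracy)
```

For a classifier like this one, the value should agree with what gnb.score returns for the same data.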

How can we increase our prediction accuracy? This is where machine learning becomes an art. You need to carefully select the relevant data points, the size of the training data set, and the right algorithm to tackle your problem, and you have to continuously re-learn as your machine makes predictions, by having someone provide input on the accuracy of those predictions.

Figure: Continuous Supervised Learning
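
One common way to get a less optimistic estimate than scoring on the training data is to hold some of the data back for testing. The sketch below uses scikit-learn's cross_val_score to do that three times over different splits. The number of folds is an arbitrary choice for this tiny data set, and with only 15 rows the resulting numbers will be noisy.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

training_data = np.array([[12, 3], [25, 3], [23, 27], [13, 27], [31, 9],
                          [23, 10], [24, 17], [26, 19], [22, 20], [18, 7],
                          [14, 3], [14, 9], [21, 13], [15, 15], [18, 20]])
training_result = np.array([0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0])

# Train on two thirds of the rows and score on the remaining third,
# repeated so that every row is used for testing exactly once.
scores = cross_val_score(GaussianNB(), training_data, training_result, cv=3)

print(scores)
print(scores.mean())
```

Each score is the accuracy on rows the model did not see during training, which is closer to how it would behave on a new day.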

If you got here, then I hope you liked the post and found some value in it. In future posts, I'll be discussing some of the common algorithms that are used in machine learning, as well as tackling more complex problems and solving them with supervised learning. In the meantime, you can play around with scikit-learn, get your hands dirty, and come up with a different problem/solution to tackle.

Did I miss anything or get something wrong? Or do you need further elaboration? Please leave your feedback in the discussion section below.

Oh, and check out this cool podcast on scikit-learn!


About the author


Mohammad Shaddad

Mohammad is a technology consultant and entrepreneur with a big passion for technology and living systems, aka software communities. He enjoys mentoring other programmers and entrepreneurs, as well as creating new things. Mohammad is the founder of @barmijly and @nafaqati. He currently spends his time between Amman.JO and Dubai.AE.