In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into “regression” and “classification” problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Example 1:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house “sells for more or less than the asking price.” Here we are classifying the houses based on price into two discrete categories.

Example 2:

(a) Regression – Given a picture of a person, we have to predict their age on the basis of the given picture

(b) Classification – Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

Unsupervised Learning

e.g. of Clustering

Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)

Given email labeled as spam/not spam, learn a spam filter.

Un-selected is correct

is not selected.This is correct.

Given a set of news articles found on the web, group them into sets of articles about the same stories.

Correct

is selected.This is correct.

Given a database of customer data, automatically discover market segments and group customers into different market segments.

Correct

is selected.This is correct.

Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

Un-selected is correct

So you might look at

an Unsupervised Learning algorithm like

this and ask how

complicated this is to implement this, right?

It seems like in order to,

you know, build this application, it seems

like to do this audio processing you

need to write a ton of code

or maybe link into like a

bunch of synthesizer Java libraries that

process audio, seems like

a really complicated program, to do

this audio, separating out audio and so on.

It turns out the algorithm, to

do what you just heard, that

can be done with one line

of code – shown right here.

It take researchers a long

time to come up with this line of code.

I’m not saying this is an easy problem,

But it turns out that when you

use the right programming environment, many learning

algorithms can be really short programs.

So this is also why in

this class we’re going to

use the Octave programming environment.

Octave, is free open source

software, and using a

tool like Octave or Matlab,

many learning algorithms become just

a few lines of code to implement.

Later in this class, I’ll just teach

you a little bit about how to

use Octave and you’ll be

implementing some of these algorithms in Octave.

Or if you have Matlab you can use that too.

It turns out the Silicon Valley, for

a lot of machine learning algorithms,

what we do is first prototype

our software in Octave because software

in Octave makes it incredibly fast

to implement these learning algorithms.

Here each of these functions

like for example the SVD

function that stands for singular

value decomposition; but that turns

out to be a

linear algebra routine, that is just built into Octave.

If you were trying to do this

in C++ or Java,

this would be many many lines of

code linking complex C++ or Java libraries.

So, you can implement this stuff as

C++ or Java

or Python, it’s just much

more complicated to do so in those languages.

What I’ve seen after having taught

machine learning for almost a

decade now, is that, you

learn much faster if you

use Octave as your

programming environment, and if

you use Octave as your

learning tool and as your

prototyping tool, it’ll let

you learn and prototype learning algorithms much more quickly.