
Module 1: What is Machine Learning

In this module we will be introduced to what machine learning is and does. We will build the necessary vocabulary for working with data and models and develop an understanding of the different types of machine learning. We will conclude with a critical discussion of what machine learning can do well and cannot (or should not) do.

Learning Objectives

  • Describe what machine learning is and does
  • Explain why we should care about machine learning
  • Identify the common types of machine learning tasks
  • Define common ML terms to be able to understand articles and conversations about ML

Course Overview


Video: Specialization Overview

About the Specialization

  • Goal: Increase the success rate of AI-based product development within organizations.
  • Focus: Combines an intuitive understanding of machine learning with best practices for managing AI projects.
  • Structure: Three courses:
    1. Machine Learning Foundations: Theory, algorithms, and model building process.
    2. Managing Machine Learning Projects: Data science process, team structure, system design, and deployment.
    3. Human Factors in AI: User experience (UX), ethics, privacy, and legal considerations.

Who It’s For

This specialization targets professionals across industries who work with AI products, including:

  • Product managers
  • Customer support and sales
  • Engineering and team leaders
  • Executives

Learning Objectives

  • Communicate effectively using data science and machine learning terminology.
  • Lead successful AI projects by applying best practices.
  • Design AI products that prioritize user experience, ethics, and privacy.

Welcome to the AI Product Management specialization. My name is Jon. I’m the director of
the Master of Engineering in AI for Product Innovation program at Duke University, and
I’ll be your instructor. The specialization contains three courses, each focusing on a different aspect
of working with AI products. The first course, Machine Learning
Foundations for Product Managers, provides an intuitive introduction to
the theory behind machine learning. We’ll focus on building your intuition for
what machine learning is and how it works. We’ll cover several of
the key algorithms used, including both the classical algorithms
as well as deep learning. And we’ll discuss the process for
building, training, evaluating, and interpreting machine learning models. The second course in the specialization,
Managing Machine Learning Projects, focuses on how to organize
machine learning development. We’ll discuss how to implement the data
science process to organize machine learning projects,
talk about team structure, and how to overcome common pitfalls that make so many
machine learning projects unsuccessful. And we’ll cover important topics such
as machine learning system design and model deployment and maintenance. In the third course
in the specialization, Human Factors in AI, we’ll focus on how
we as humans interact with AI systems. We’ll discuss important aspects of
user experience design in building AI systems, and we’ll dive into
the important ethical, legal, and privacy considerations in
using AI within products. Companies in every industry
today are either already using AI or planning to embed AI
within their products. This is creating huge demand for workers who are comfortable building AI-based
products and managing such products. This includes not only data scientists and
software engineers but workers across other functions
such as product management, customer service, sales,
engineering, team management, and even executives. Yet
building successful AI products is hard. The failure rate of machine
learning projects is very high; there is much more complexity,
more uncertainty, and a much higher degree of technical risk
relative to normal software projects. This specialization provides two key
elements to help improve the success rate of machine learning projects. The first is an intuitive introduction
to machine learning: how it works, how to apply it to solve problems, the most popular machine learning
algorithms, and the process of building and training models. The second is a set of best practices in
building machine learning products. We’ll discuss how to manage projects
using the data science process, how to design machine learning systems, how to deploy and
manage models in production, and we’ll talk about the important human
considerations such as user experience design, privacy, and ethics. We have three main learning objectives for
the specialization. At the conclusion of the specialization, you should be able to communicate
in the language of data science and machine learning so that everybody on
the team is speaking the same language. You should be capable of leading AI and
machine learning development projects, applying the best practices we
discuss throughout the courses. And you should be capable of considering
and making decisions on how to integrate the key human factors in
designing AI product experiences.

Video: Instructor Introduction

About Jon Reifschneider

  • Instructor for the AI Product Management Specialization.
  • Director of the Master of Engineering in AI for Product Innovation at Duke University.
  • Previous Experience:
    • 15+ years in industry leadership, building analytics products.
    • Senior VP at a major weather prediction services company.
    • Built the company’s data science team.
    • Developed predictive analytics and ML products used by large companies across various industries (utilities, airlines, etc.).

Key Takeaway: Jon brings extensive real-world experience in building and managing AI products to the courses for the benefit of learners.

Hi. My name is Jon
Reifschneider, and I’m the instructor
for the courses in the AI product management
specialization. I’m also the Director
of the Master of Engineering Program in
AI for Product Innovation at Duke University, a graduate engineering
program that teaches engineers and
scientists to apply the tools of machine
learning and artificial intelligence to solve problems in their industries. Prior to joining
the Duke Faculty, I spent 15 years leading industry teams
building analytics products. Most recently, I was the
Senior Vice President leading an organization that is the world’s largest
commercial provider of weather prediction services. In this role, I oversaw
the creation of the company’s data science team and led the development of predictive analytics in
machine learning products in use today by many of the largest companies
in multiple industries, including many of the major
US electric utilities and many of the global airlines. Throughout our time together,
I’ll share a number of real experiences from
my past in building machine
learning products.

Video: Course Overview

Why Take This Course?

  • AI is everywhere! It’s transforming products and services across industries.
  • Building successful AI products requires a team effort, including non-technical roles.
  • This course provides a shared understanding of AI concepts and terminology for cross-functional collaboration.

Course Overview

  • Course 1: Machine Learning Foundations – Intuitive understanding of ML, how it works, model building, and evaluation.
  • Course 2: Managing ML Projects – Best practices, team structures, system design, deployment.
  • Course 3: Human Factors in AI – Ethical considerations, privacy, and user experience design.

Learning Objectives

  1. Explain ML concepts and types.
  2. Understand the challenges of applying ML and strategies to overcome them.
  3. Gain an intuitive grasp of common ML algorithms, their strengths, and use cases.
  4. Understand deep learning (neural networks) fundamentals.
  5. Learn best practices for evaluating and implementing ML models.

Course Structure (6 Modules)

  1. ML Fundamentals and the modeling process.
  2. Model evaluation and interpretation.
  3. Linear models (for regression and classification).
  4. Tree models and ensemble methods.
  5. Unsupervised learning (clustering).
  6. Deep learning (neural networks).

Welcome to Machine Learning Foundations for
Product Managers. We’re excited that you’re
joining us for this course. Let’s start by
talking a little bit about why you should
take this course. Machine learning
applications are all around us in
the world today. Companies across every industry are using machine learning and artificial intelligence to make their products and
services more predictive, more personalized,
and more automated. AI is also opening up a
world of possibilities to solve difficult problems that
previously were unsolved. But building machine learning
products and successfully bringing them to market
requires a team effort, involving people
throughout many functions in a business organization. Not only data scientists
and software engineers, but people from other functions including customer service, marketing and sales, product
management, executives, etc. Therefore it’s important that
everyone in an organization
shares a fundamental
understanding of what machine learning is, what it does, and how
it can be applied, and that everyone has the same vocabulary and a common understanding
of language when we talk about machine
learning based products. The focus of this course
is to provide people, regardless of your role in an organization or
your background, a fundamental understanding
of machine learning, how it works, and how it can be applied within your industry. This course is the first course in the AI Product Management
specialization. In this course, we’ll
focus on building an intuitive understanding
of what machine learning is, how to build and train
and evaluate models. We’ll develop an understanding
of the basic theory behind many of the most popular
machine learning algorithms, including both the
classical techniques as well as what’s
called deep learning. In course 2 in this
specialization, we’ll focus on best practices in managing machine
learning projects, including the application of
the data science process, team organization, and designing AI systems and deploying and managing
models in production. In course 3 in this specialization, we’ll focus on human
factors in AI, including the
important privacy and ethical considerations
of working with AI, as well as designing user
experiences for AI products. There are five main
learning objectives for this course in
this specialization. At the end of the course, you should be able to explain how machine learning
works and understand the different types of
machine learning and the different types
of problems that machine learning can help solve. You should have an
appreciation of the key challenges of
applying machine learning on problems and understand
some basic strategies for overcoming these challenges. We’re going to focus on building an intuitive understanding of some of the main
algorithms that are used for machine learning
tasks and understand when each algorithm is commonly used and the strengths
and weaknesses of each. We’ll spend a module on
deep learning or the use of neural networks and develop an understanding of not
only how they work, but also the strengths
and challenges relative to other forms
of machine learning. Finally, we’ll build
an understanding of some best practices in evaluating and implementing
machine learning models. No matter which algorithm
you choose to use, having a good set of
skills in evaluating models and comparing
outputs is critical. This course is organized
into six modules. Module 1 starts with building a fundamental
understanding of what machine learning is
and how it works. We’ll then talk
about the modeling process and walk through each of the key steps in building a machine
learning model. In module 3, we’ll focus
on how to evaluate models and how to interpret
the output of models. In modules 4 and 5,
we’ll dive into some of the common machine
learning algorithms, starting with the
discussion on a set of algorithms called
linear models, which are commonly used for regression and
classification tasks. We’ll then switch
to another type of algorithm called tree models. We’ll talk about ensemble
models, and finally, we’ll wrap up our
discussion on algorithms by focusing on
unsupervised learning and particularly what’s
called clustering. In module 6, we’ll focus on deep learning or
neural networks. We’ll build an
intuition for what they are, how they work, and why they’re so powerful in solving challenging problems
in machine learning.

Reading: Module 1 Slides

Introduction to Machine Learning


Video: Module 1 Introduction & Objectives

Module Focus: A beginner-friendly introduction to machine learning (ML).

Key Topics

  • History: A brief look at the development of ML.
  • Definitions: Clear explanations of what ML is and how it works.
  • Applications: Why ML is an exciting field with many real-world uses.
  • ML Tasks: Exploring types like supervised, unsupervised, and reinforcement learning.
  • Vocabulary: Building essential ML terminology for better understanding.

Goal: This module aims to lay the groundwork for understanding machine learning concepts, enabling you to read articles and engage in conversations about ML.

Welcome to Module 1 of the Machine Learning Foundations for Product Managers course. In this module, we’ll provide a gentle introduction
to machine learning. We’ll focus on
answering the question, what is machine learning? We’ll start with a
little bit of history. We’ll cover some
common definitions for machine learning so
that you understand what it is and what it does. We’ll then talk about the
types of problems that machine learning
can be applied to, and why it’s such an
exciting field with so many applications of machine learning out
there in the world today. We’ll then dive into a breakdown of the
machine learning tasks, focusing on the
different types of tasks that machine
learning is applied to, such as supervised learning, unsupervised learning, and
reinforcement learning. We’ll also begin to build our vocabulary around data
science and machine learning terminology so that
we’re able to start to understand articles and have conversations about
machine learning.

Video: Introduction to Machine Learning

What is Machine Learning (ML)?

  • Core Idea: Instead of giving computers explicit rules to solve a problem, we show them examples and let them figure out the rules themselves.
  • Example: A computer learns to recognize hamburgers by being shown lots of hamburger pictures, not by being given a definition of what a hamburger looks like.

How ML Differs from Traditional Software

  • Traditional: Input data + Rules = Output
  • ML: Input data + Previous Outputs = Model (which figures out the rules) -> Model then generates new outputs
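The contrast above can be sketched in code. This is only a toy illustration (a spam check rather than image recognition, and the “learning” step is just set arithmetic invented for this sketch, not a real ML algorithm), but it shows the shift from hand-written rules to rules derived from examples:

```python
# Traditional software: the programmer writes the rule by hand.
def is_spam_rule_based(message: str) -> bool:
    # Hand-coded rule supplied by the programmer
    return "free money" in message.lower()

# Machine-learning style (toy sketch): we supply examples
# (inputs plus known outputs) and derive a "rule" from them.
# Here the learned rule is simply the set of words that appear
# only in spam examples and never in legitimate ones.
def learn_spam_words(examples: list[tuple[str, bool]]) -> set[str]:
    spam_words, ham_words = set(), set()
    for text, label in examples:
        (spam_words if label else ham_words).update(text.lower().split())
    return spam_words - ham_words

examples = [
    ("win free money now", True),
    ("meeting at noon", False),
    ("free gift claim now", True),
    ("lunch at noon?", False),
]
learned = learn_spam_words(examples)

def is_spam_learned(message: str) -> bool:
    # The "model" applies the rule it derived from the examples.
    return any(word in learned for word in message.lower().split())

print(is_spam_rule_based("FREE MONEY inside"))   # rule written by hand
print(is_spam_learned("claim your free gift"))   # rule derived from examples
```

Note that nobody ever told the second function what spam looks like; the rule came entirely from the labeled examples, which is the core idea behind “input data + previous outputs → model.”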

ML as Part of AI

  • Artificial Intelligence (AI): The broad goal of making machines mimic human intelligence.
  • Machine Learning (ML): A subset of AI, a collection of tools and methods to help achieve AI’s goals.
  • Deep Learning: A subfield of ML using ‘neural networks’ for complex tasks.

Brief History of ML

  • Origins: Early 1800s with statistical methods still used today.
  • AI Term Coined: 1940s-50s, initial hype and progress.
  • Disillusionment: Late 60s-70s, overpromising led to funding cuts.
  • Revival: Late 70s-90s, development of many key algorithms.
  • Deep Learning Boom: 2000s-Today, focus on neural networks enabling amazing achievements.

Why is ML So Popular Now?

  • Massive Data: The internet, sensors, and connected devices generate huge amounts of data to fuel ML models.
  • Deep Learning Power: Neural networks can now solve previously impossible problems.
    • More Computing Power: Especially with graphics processing units (GPUs)
    • Huge Datasets: Organized collections of data to train complex models.
    • Algorithmic Breakthroughs: New techniques to build better models.

Where We See ML Today

  • Product Recommendations: Personalized suggestions on shopping, music, or movie sites.
  • Spam Filters: Keeping your inbox clean.
  • Mail Routing: Optical character recognition to read addresses.
  • Fraud Detection: Protecting your credit card.

Let’s start with the basics. First off, what is
machine learning? Arthur Samuel, an IBM engineer, first defined
machine learning in 1959 as a field of study that gives computers the ability to learn without being
explicitly programmed. The main idea here is
that instead of providing a computer with exact
instructions to solve a problem, we show it examples of the
problem to solve and we let the computer figure out for itself how to solve the problem. An example I’d like to give
is let’s say we’d like to train a model to
recognize hamburgers. In the traditional method
of programming computers, we would provide
explicit instructions to the computer of what
a hamburger is. We tell the computer
a hamburger consists of two buns with a darker
patty in the middle, possibly with a piece
of cheese or lettuce on it. That’s a hamburger.
Using machine learning, rather than providing
the computer with any instructions or definition
of what a hamburger is, we would simply show
the computer many, many pictures of
different hamburgers. Over time, the
computer would be able to learn for itself and
recognize a hamburger. Let’s now take a look
at the difference between how a
traditional software generates predictions and how machine learning
generates predictions. With traditional
software systems, we take a set of input
data and a set of rules that we provide
to the software system, which the system can then
use to generate new outputs. In the previous
example of classifying an image of food as a
hamburger or not a hamburger, we would feed our
software system a set of input data or images
of food in this case, and a set of rules for what
constitutes a hamburger. The system would then
be able to apply those rules to identify whether each image in the input data was a
hamburger or not a hamburger. Let’s now take a look
at how machine learning would solve this problem. With machine learning systems, we provide the input data and
a set of previous outputs. The machine learning model
then figures out for itself the rules or patterns behind what constitutes
a hamburger. It can then use these
self-learned patterns
hamburger or not hamburger. One common question
that we see is what is the difference
between artificial intelligence and
machine learning? Artificial intelligence is
the broad field of trying to replicate aspects of human
intelligence in machines. Machine learning can
be considered as a subset of artificial
intelligence, which is focused on a set of
methods and tools to help realize the goals of the field of artificial
intelligence. Another common term,
deep learning, can be defined as a sub-field within the broader
field of machine learning. Deep learning is
focused on the use of a specific machine learning
model called a neural network to accomplish the
goals of machine learning. Finally, we have some other smaller sub-fields
focused on accomplishing specific tasks using
machine learning or possibly deep learning, such as computer
vision to detect objects in images, or
natural language processing. There are also recommendation systems, such as the type of
systems you might see on product websites of your favorite store providing personalized
recommendations to you. Let’s cover a very
brief history of the field of artificial intelligence
and machine learning. The origins of machine
learning actually date back to the early 1800s. Things like least
squares regression or Bayes’ theorem are
machine-learning algorithms which are actually
still in use today, and their origins date back
to statistical principles discovered and published
in the early 1800s. The field of AI was really
launched and the term AI was coined in the
1940s and 1950s. That began in 1943
with the proposition of a simple artificial model of a neuron that
exists in the brain. That neuron was then extended
to the idea or the concept of a full neural network
throughout the early 1950s. However, by the late
1960s and early 1970s, many folks throughout
government and scientific research had become disillusioned with the field. It was hyped up during the ’50s and ’60s with the significant
progress that was made. The field,
unfortunately, was not able to live up to the lofty
expectations that were set. As a result, funding was cut and machine learning
and AI researchers turned their
attention elsewhere. In the late ’70s to 1980s, research again picked
up and many of the algorithms that are
most commonly used today, both in the broader field
of machine learning, as well as specific
to the field of deep learning or the
use of neural networks, were developed through the
late ’80s and the 1990s. Finally in the 2000s, starting around 2009
until the present day, we’ve been going through what’s called the deep learning boom. The attention of the
broader field of machine learning has really
focused on the application of deep learning or neural
networks to achieve incredibly difficult tasks that previously were
considered impossible. We’ve made tremendous
progress over the last 15 years or so in
applying deep learning. Let’s talk a bit more about machine learning and where
it is as a field today. Machine learning has become a pervasive part of our world. As consumers, we may interact with machine
learning models dozens of times a day through a variety of different products and systems that
we interact with. Often we don’t even know
that we’re interacting with a machine learning model or an AI technology
in the background. The popularity of
machine learning, in particular, in recent times, is due to a couple
of main factors. The first one is that
there’s been an explosion in the amount of data that
we have access to. That’s been driven by ubiquitous
Internet connectivity, meaning we’re now connected
to more and more people, computers and even devices. We’ve seen drastic advances in sensor technology that allows us to collect massive
amounts of data from sensors of many
different types. Finally, our devices themselves have become
intelligent and connected, meaning that they’re
able to produce data about their environment, about us, and about our usage, all of which can be used to build machine
learning models. Secondly, the field of deep
learning or the use of neural networks has
made things that previously were considered
impossible, now possible. Key drivers of that have
been a massive increase in computational
power, in particular, through what’s called
graphics processing units, which are used to
train very large, very complex machine
learning models. Secondly, we’ve seen efforts by scientists and researchers and other people working throughout academia and the corporate world to assemble huge sets of
what’s called labeled data, which is available for training very large,
very complex models. Thirdly, we’ve had
significant breakthroughs and algorithmic advances, which have allowed us to build innovative machine
learning architectures to accomplish very difficult tasks. Let’s look a little bit
more at where we find machine learning
in our world today. One common example
of machine learning is in product recommendations. If you go to a website for your favorite store
or online retailer, or if you’re looking to listen to a song or watch
a movie, generally, you interact with an AI based product recommendation
system which learns about you and your personal
preferences and is then able to provide
personalized recommendations. Another example of machine
learning in our world today is spam filters
for email clients, which are able to distinguish
between real messages, which we’ve received,
versus spam messages. The routing of mail to us at home is accomplished
through the use of machine learning via
what’s called optical character recognition
to recognize digits as well as handwritten
letters and words on envelopes and help the postal service get our
mail to us at home. Finally, things like credit
card fraud detection are driven by machine
learning algorithms, identifying and distinguishing
between normal patterns of spending versus when there’s a likely chance of
credit card fraud.

Video: Data Terminology

What is Data?

  • Characteristics or information, usually numerical, collected by observation.
  • Examples: measurements, text, images, sound, video (all can be represented numerically).
  • Types of relationships: Spatial (location-based) and Temporal (time-based).

Structured vs. Unstructured Data

  • Structured:
    • Pre-defined fields and format (like Excel spreadsheets)
    • Often stored in relational databases.
    • Easier to analyze with common tools.
  • Unstructured:
    • No pre-defined format or length (images, text, etc.)
    • Requires specialized tools.
    • Comprises roughly 80% of organizational data.

Types of Data

  • Continuous: Infinite values between given points (temperature, height)
  • Categorical: Finite categories, sometimes with ranking (gender, color)
  • Discrete: Numeric, but countable (age, number of items) – often treated as categorical in machine learning.
  • Time Series: Ordered by time, with closer points assumed more related (sensor data, stock prices)

Structured Data Terminology (in machine learning)

  • Observation/Instance/Example: A single row of data (one house).
  • Feature/Factor/Predictor: A column of data (square footage, # bedrooms).
  • Target/Label/Response: The value we’re trying to predict (sale price).
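The housing example from this lesson maps onto code as follows. The rows and prices below are invented for illustration, and plain Python is used rather than a data-frame library:

```python
# Each dict is one observation / instance / example (one house).
rows = [
    {"neighborhood": "Oakwood", "sqft": 1800, "bedrooms": 3,
     "year_built": 1995, "sale_price": 310_000},
    {"neighborhood": "Hilltop", "sqft": 2400, "bedrooms": 4,
     "year_built": 2008, "sale_price": 455_000},
]

target = "sale_price"                            # target / label / response (y)
features = [k for k in rows[0] if k != target]   # features / predictors (X columns)

# One feature vector per observation, and the values a model should predict.
X = [[row[f] for f in features] for row in rows]
y = [row[target] for row in rows]

print(features)  # ['neighborhood', 'sqft', 'bedrooms', 'year_built']
print(y)         # [310000, 455000]
```

This is exactly the layout a modeling library expects: a table of feature columns plus one target column, split into X and y before training.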

Would data from a weather station containing sensors for temperature, humidity and wind speed be an example of structured or unstructured data?

Structured data

Correct, the data is organized in a structure with defined fields and could be stored in a relational database

In this lesson, we’re going to define
some commonly used terminology in data analytics and machine learning. First of all,
what does the word data mean? The OECD defines data as characteristics
or information, usually numerical, that are collected through observation. Data can come in many forms, because almost anything can be
turned into numerical values. Data can be measurements of an object or
its dimensions. It can be text, such as words,
sentences, or documents. It can be images or sound,
or even video, because images, sound and video, even though on the surface they may not appear to be numeric,
actually consist of numbers. For example,
the values of pixels within an image. Data can also have different
types of relationships. So common relationships that are found
in data are spatial relationships where data points are related
through some concept of nearness or farness in space, or
location within space. Data can also have temporal relationships,
where points and data are related through time and through how near or how
far within time they are to each other. We commonly break data down
into two primary types of data, structured versus unstructured data. So structured data
follows a set structure, which is based on a set
of pre-defined fields. So we have various records
within structured data and each record includes a number
of pre-defined fields. So for those of you who have used
a spreadsheet program like Excel, you’ll be very familiar with this. Excel is based around
the idea of structured data, where we have a series of
rows that are the records and each row can have multiple columns
that are the pre-defined fields. Often structured data is stored in
what’s called relational databases and it’s nice to work with, because it’s
very easy to enter and organize and the structure that it has makes it
easy to search and to analyze it. It also works very well
with commonly used tools, not only by data science and
machine learning practitioners, but folks who work in a variety of
different roles within an organization can often interact with structured data through
programs such as Microsoft Excel. Unstructured data does not follow
any pre-defined format of fields. So examples of unstructured data
would be things like images, videos, sounds or text where
there are no pre-defined fields or perhaps not even a pre-defined length. When we think about text, for example, a sentence can consist
of any arbitrary number of words. A document can consist of any
arbitrary number of sentences, so it’s hard to pre-define a structure
to capture something like text. Unstructured data generally requires
a set of specialist tools in order to work with it. So it’s a little bit more challenging for
individuals in an organization to be able to work with it without
the right set of tools. However, if we look at
a typical organization’s data, roughly 80% of it is considered
unstructured data and this would be things like images, might be video or
text, such as emails or slides. The other 20% of the data is what’s
considered structured data. When we work with machine learning,
we’ll work with both types of data, structured and unstructured, and we often use different
algorithms or different approaches depending on the type of data
that we’re working with. Continuous data means numeric variables
that can take an infinite number of possible values between
any two given values. So an example of this would be
the length of a part, the temperature, a person’s height or weight or even time, which can be represented by
an infinite possible number of values. On the other hand, categorical data
can be classified into a finite number of categories or
distinctive groups. Sometimes these have a logical order or
ranking and sometimes they don’t. So examples could be gender, a student’s
major, colors, or a type of material. And we have a third type
called discrete data. Discrete data are numeric variables
that have a countable number of values. So examples of this might be age, the
number of parts, or the year something was made. Even though they’re numeric in nature,
because we have a finite number of them, often when we’re doing machine learning
we’ll consider these to be categorical variables, because they fall into a finite
number of possible categories or groups. Another type of data that we’ll
commonly use in machine learning
modeling is what’s
called time series data. Time series data is
are equally spaced by time. So we may have points representing
measurements from a sensor for example every second or every minute,
every hour or every day. We may be working with stock prices where
we have ticker prices every 15 minutes or daily opening and closing prices. Or we might be working with data for example from a smart meter where we have
continuous readings and we have daily, monthly and annual aggregate numbers
that represent usage over time. The assumptions behind time series data
are, number one, that time is considered one-way: we don’t go back in time,
it only goes forward in one direction. Secondly, we assume that points that are
closer together in time are generally more relevant or more related to each other than points
which are further away in time. So we now introduce some terminology
that’s specific to structured data. So we have an example here on the slide
showing a number of houses for sale in a neighborhood in the local area; each
house has a number of characteristics, such as what neighborhood it’s in,
what school district it’s in, the square footage of the house,
the number of bedrooms and the year built. And then for each house we have a recorded
market sale price of the house. So this would be a perfect
example of structured data. When we’re working with this type of data, we use some specific terminology
when we apply machine learning. And so
let’s go through the details of that. First of all, each row of our data, so each house in this case, is what’s
called an observation of our data. You’ll also see it referred to as
an instance of the data, an example, or a feature vector. Each column of our data is what is
commonly called a feature of our data. Also referred to as a factor,
a predictor, an X variable, independent variable, an attribute, or
even a dimension and machine learning. We’d like to use a lot of different words
to represent the same thing sometimes. Finally, the last column is a bit
different than the other columns, because the last column is what
we are trying to predict, right? So the last column we call our target,
because the objective of a model that we like to build is to
predict the sale price. So this can be called the target,
also called the label and annotation, response, a Y variable or
even a dependent variable.
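The observation/feature/target vocabulary above can be sketched with a tiny, made-up version of the housing table (all values below are hypothetical, chosen only to illustrate the terms):

```python
# Hypothetical structured data: each row is an observation (a house),
# each column is a feature, and y holds the target (sale price).
feature_names = ["neighborhood", "school_district", "sqft", "bedrooms", "year_built"]

X = [  # one row (observation / instance / feature vector) per house
    ["Downtown", "District A", 1850, 3, 1995],
    ["Suburbs",  "District B", 2400, 4, 2005],
    ["Suburbs",  "District A", 1600, 2, 1988],
]
y = [310_000, 425_000, 265_000]  # target / label / response, one per row

assert len(X) == len(y)  # every observation has exactly one target
assert all(len(row) == len(feature_names) for row in X)
print(f"{len(X)} observations, {len(feature_names)} features")
```

The rows are the observations, the columns of `X` are the features, and `y` is the target column kept separate because it is what the model will predict.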

Video: What is a Model?

What is a Model?

  • Simplified Representation: A model approximates how input variables (x) relate to output variables (y).
  • Formula: y = f(x) + error (recognizing that models are never perfect)

Example: Fuel Efficiency

  • Input (x): Engine horsepower
  • Output (y): Miles per gallon (MPG)
  • The model can estimate MPG based on horsepower, but there will always be some error due to other factors.
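The y = f(x) + error idea can be sketched in a few lines of Python. The horsepower/MPG numbers below are invented for illustration; we fit a straight line as f and inspect the leftover error:

```python
import numpy as np

# Made-up horsepower (x) and MPG (y) values, for illustration only.
hp  = np.array([70.0, 95.0, 110.0, 150.0, 200.0, 225.0])
mpg = np.array([33.0, 28.0, 25.0, 20.0, 16.0, 14.0])

# Approximate y = f(x) + error with a straight line fit by least squares.
slope, intercept = np.polyfit(hp, mpg, deg=1)
predicted = slope * hp + intercept
residuals = mpg - predicted  # the error term: what the model misses

print(f"f(x) = {slope:.3f} * hp + {intercept:.1f}")
print("residual error per point:", np.round(residuals, 2))
```

The residuals are not all zero, which is the point: even a reasonable model approximates the relationship but leaves some error behind.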

Building a Model to Predict Home Prices

  1. Features (x): Characteristics like school district, square footage, number of bedrooms.
  2. Target Values (y): Historical sale prices.
  3. Model Creation: We use this data to establish the relationship between features and target values.

Four Key Components of Model Creation

  1. Features: Select the relevant data attributes.
  2. Algorithm: Choose a suitable machine learning algorithm (this provides the model’s basic structure).
  3. Hyperparameters: Fine-tune the algorithm by adjusting ‘knobs’ to control complexity.
  4. Loss Function: Defines what it means for the model to be wrong; the goal of training is to minimize this error.

Training a Model

  • Data Input: Use historical data with known inputs and outputs.
  • Optimization: The algorithm, guided by hyperparameters, learns values that minimize the loss function, resulting in a trained model.
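To make the four components concrete, here is a minimal sketch (with invented numbers) of training a one-feature linear model by gradient descent: the feature is x, the algorithm is the line w*x + b, the hyperparameters are the learning rate and iteration count, and the loss is mean squared error:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # feature (e.g. scaled square footage)
y = np.array([1.9, 4.1, 6.0, 8.1])   # target (e.g. scaled sale price)

lr, n_iters = 0.01, 2000              # hyperparameters: the "knobs"
w, b = 0.0, 0.0                       # model values learned during training

for _ in range(n_iters):
    pred = w * x + b                  # the algorithm: a linear model
    err = pred - y
    loss = np.mean(err ** 2)          # loss function: mean squared error
    # gradient steps that move w and b toward the minimum of the loss
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(f"w={w:.2f}, b={b:.2f}, final loss={loss:.4f}")
```

Training here is exactly the optimization described above: the historical (x, y) pairs drive repeated updates to the model values until the loss is (approximately) minimized.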

Which of the following must we define in order to create and train a machine learning model (select all that apply)?
  • An algorithm
  • A set of features of our data to use in our model
  • A loss function to use in optimizing our model
  • Values for the hyperparameters of the algorithm

So what exactly is a model? A model is simply an approximation
of the relationship between two or more variables. Typically with a model, we have one or
more input variables, which we call x, and we have one or more output variables
of our model, which we generally call y. A model simply approximates
the relationship between x and y in the form y = f(x)
plus an error term, in recognition that we generally can never create
a perfect model which can fully predict the values of y without error. So we always add an additional
error term to recognize that. Let's take an example of a model. Suppose we've collected a bunch
of data on car engines and we'd like to try to approximate the fuel
efficiency of a car in miles per gallon from data that we've collected on
the horsepower of the car's engine. You can see on the screen a simple
plot where we've plotted horsepower on the horizontal axis and
miles per gallon on the vertical axis. In this case, horsepower is serving as
the input variable to our model, and the variable we're trying to predict,
miles per gallon, is our output variable. We can create a model where miles per
gallon is a function of horsepower plus some error term. As you can see on the screen,
the model can roughly approximate miles per gallon given
information on horsepower, but it's not able to perfectly capture
the randomness in the data that we have on miles per gallon. It's therefore important to
keep in mind that you'll always have some amount of error in your model. Let's now go back to the example we were
using previously where we’d like to create a model to predict the sale
price of homes for sale. In this example,
the characteristics of homes for sale, such as the school district,
the square footage of the home, and the number of bedrooms, are the attributes
of each home that would be the x values, what we call the features of our model. We have a number of historical
observations of these features, represented by data we've collected
on past homes which have been for sale. Likewise, for each of those observations, we also have our y value, which is
the actual sale price of that home. We can use these historical
observations of our input data and these historical y values, or
target values, to create a model which can approximate
the relationship between the input data represented by our features and
the output targets. In order to create a model,
there’s four things we need to define. The first is the set of features to use or the attributes of our data which we
like to use as inputs to our model. Second thing we need to define
is our choice of algorithm. In machine learning, there are a variety
of different algorithms we can choose. And the algorithm acts as a general form
or a template for the model that we’re creating to define the rough shape and
structure of the model. Each algorithm also has a set of
hyperparameter values that we then need to define. You can think of hyperparameter values as
knobs that we can turn on the algorithm to make our algorithm more simple or
more complex to best fit our problems. And the fourth thing we need to
find is a choice of loss function, which we’re seeking to optimize. And this is how we train our model. A loss function is a way to quantify
the error in our model and as we build and train our model,
we seek to minimize the error. That’s when we define the loss function. Our job is to minimize the loss function
or minimize the amount of error and to adjust the values of our model to result
in that minimum of the loss function. When we train our model, we use
historical data generally on the inputs, as well as the outputs. Our algorithm and our hyperparameters,
provide an overall form or structure for a model. And as we train our model
using that historical data, we’re learning values for the model
which minimize our loss function or minimize the amount of error in
the end model that we create.

Video: Types of Machine Learning

The Three Main Types

  1. Supervised Learning
    • Goal: Predict a target variable (price, category, etc.)
    • Needs: Training data with both features AND target values for past observations.
    • Examples: Detecting pneumonia from X-rays, predicting house prices.
  2. Unsupervised Learning
    • Goal: Find patterns and structure within data.
    • Needs: Observations, but NO target values.
    • Examples: Customer segmentation (no predefined groups), anomaly detection.
  3. Reinforcement Learning
    • Goal: Learn a strategy through trial and error to accomplish a goal.
    • Example: Training a computer to play chess.

Supervised Learning: A Closer Look

  • Regression: Predicts numerical targets (e.g., house price, product demand).
  • Classification: Predicts categories (e.g., spam vs. not spam, flower type).
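One way to see the regression/classification split in code: the same kind of labeled data drives both tasks, and only the type of the target changes. A toy sketch (all example values invented):

```python
# Toy supervised-learning targets (invented examples).
regression_targets = [310_000, 425_000, 265_000]       # house prices: numbers
classification_targets = ["spam", "not spam", "spam"]  # email labels: categories

def task_type(targets):
    """Rough rule of thumb: numeric targets suggest regression,
    a fixed set of category labels suggests classification."""
    if all(isinstance(t, (int, float)) for t in targets):
        return "regression"
    return "classification"

print(task_type(regression_targets))      # regression
print(task_type(classification_targets))  # classification
```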

A model which is trained to identify several types of diseases in the lungs using a training dataset consisting of chest x-ray images and corresponding target labels indicating a human radiologist's diagnosis is an example of what type of machine learning?

Supervised learning

Correct. We are training a model using both historical observations and corresponding target values

We are building a model to detect instances of "fake news" within social media feeds. What type of supervised learning task is this?

Classification

Correct. We are trying to predict whether an item is one of two categories: “fake news” or “not fake news”

In this section, we'll talk about
the primary types of machine learning. There are three main types of machine
learning: supervised learning, unsupervised learning, and
reinforcement learning. In supervised learning,
our objective is to predict a target variable given
a set of observations. We accomplish two primary tasks:
one is classification, or recognizing a category or
class of an object; the second is called regression, or
predicting some sort of numerical variable, such as the price of a home for
sale. In supervised learning, we generally have a large set of past
observational data to use for training, as well as the associated target
variable for each of those observations. So examples of supervised learning
would be using X-ray images to identify pneumonia in the lungs or
predicting real estate prices. In unsupervised learning,
our objective is a little bit different: here we're organizing data by
some sort of inherent structure. We generally have a set of observations
from the past available to us, but we don't have the associated target
variables for each of those observations. Common types of unsupervised
learning would be clustering and anomaly detection, and they're used for
things such as market segmentation, where there are no commonly agreed-upon
definitions of market segments. But our objective is to take
the potential buyers or customers for a certain product and break them into
logical groups based on some sort of inherent pattern or
order in those customers. The third type of machine learning
is called reinforcement learning. Here our objective is to learn a certain
strategy through interaction in order to achieve a goal. This is the type of machine learning
that's often used to teach computers to learn how to play
things like chess or checkers. We'll now dive in a little bit deeper on
supervised versus unsupervised learning. In supervised learning, we have some set
of past observations of the features and we have the associated target for
each of those observations. We can use those to develop a model. So let’s say we want to build
a model to recognize apples. We have a set of pictures of apples,
and for each of those pictures, we have a label that
says this is an apple. We build a model using that
set of pictures of apples, and then our model is capable of
recognizing new pictures of apples: we can feed it a picture
of a different apple and it should be able to recognize that
this is also in fact an apple. In unsupervised learning,
we have observations but we generally don’t have target values which
are associated with those observations. So keeping with our fruit example here, we might provide a large set of pictures
of different types of fruits to our model. Our model would then
be able to organize or group those fruits based on some sort
of inherent pattern or structure. It might use color or shape, for example,
to separate the apples from the oranges from the bananas, but it wouldn’t
necessarily know that these are apples or these are bananas because we have not
provided the target information for each of the observations to
be able to learn that target. Our objective again is different. We’re trying to group or organize things,
not predict a specific output target. There are two primary types of supervised
learning: regression and classification. In regression, our objective is to
predict some numerical target variable. We've already talked about
the example using home prices. Other types of regression might be used
for things like predicting demand for a certain new product that's being
launched, or predicting power outages or demand for power over time,
all of which take numerical values. In classification, rather than
predicting a continuous numerical value, we're predicting or identifying a class or
category for a certain observation. Examples of this might be
detecting lung diseases, identifying different types of plants or
flowers, or detecting whether an email message is
a spam message or not. In all of these cases, we have a set
of categories which may be binary (either yes or no, one or zero)
or one of many possible categories, such as types of flowers. And our objective is to predict
the category that a new input observation falls into.

Video: What ML Can and Cannot Do

What Machine Learning Does Well

  • Assumes you have good quality data.
  • Automation: Excels at routine tasks like mail sorting (using image recognition) or automated speech transcription.
  • Predictions: Learns relationships between inputs and outputs, predicting things like product demand or student grades.
  • Personalization: Tailors recommendations to individuals (think Netflix or product suggestions).

What Machine Learning Doesn’t Do Well

  • Understanding Context: It misses subtleties like sarcasm or humor, taking everything literally.
  • Identifying Causation: Machine learning finds correlations, but it cannot prove that one thing causes another (the ice cream sales vs. crime example).
  • Explaining the “Why”: Machine learning reveals patterns, but not the underlying reasons behind them.
  • Proposing Solutions: A predictive model won’t tell you how to fix the problem, like reducing crime. It can predict, but not provide actionable insights on how to intervene.

Key Takeaway

Machine learning is a powerful tool, but it’s important to understand its capabilities and limitations for responsible and effective use.

We are building a machine learning model to help us better understand a recent drop in demand for our product. Which of the following would be difficult (or impossible) for our machine learning model to do (select all that apply)?
  • Determine the root cause in the drop in demand
  • Identify the impact that we could have on demand if we run a new advertising campaign that we are building

To really be capable of applying machine
learning in real-life situations, we need to understand not only how
the technology works and how to apply it, but also when it works
well and we should apply it, and when it doesn't work well and
we shouldn't apply it. To that end, we're going to focus this
lesson on situations where machine learning can work well and
situations where it doesn't work well. Let's start with discussing what
machine learning can do well. And we have to caveat this with the fact
that machine learning really can only work well if it's given a sufficient quantity
and quality of data. If we don't have enough data about the specific
problem that we're trying to solve, or our data is not clean (we have a lot of
noise, heavy outliers, or missing data), we're not going to be able to
make machine learning work well, even if it's a simple,
straightforward task. So let's assume for the moment that
we have sufficient quantity and quality of data. Some of the things that machine learning
is particularly adept at include the automation of straightforward tasks,
making predictions by learning input and output relationships, and personalizing
services or products for individual users. Let's talk a bit about what each of those means. First of all,
automation of straightforward tasks. An example of this might be
automation of mail routing based on optical character recognition. As mail comes into the post office, it's
automatically sorted, filtered, and sent on its route based on machine
learning that recognizes handwritten words and digits and automatically
routes that mail to its destination. Another example of automation of
a straightforward task would be automated transcription. When I teach here at Duke, all of my lectures are recorded through
the webinar software we use, called Zoom. Zoom includes an automatic transcription
feature, so after every session I receive a written transcript of my
lecture, which I can then post and make available to my students. Zoom uses a machine learning model behind
the scenes to automatically transcribe my speech to text to provide
that automated transcription. The second thing machine learning can do particularly well
is making predictions by learning simple input output relationships. Examples of this might
include predicting demand for a product based on things like time of day
or season of year or temperature outside. It could include things
like predicting grades for students in my class based on
how often they show up to class, whether they do their homework or not and
their quiz scores throughout the semester. Finally, personalization for
individual users. If you're a Netflix or
Amazon Prime subscriber, for example, you should be very familiar with this. Netflix provides personalized
recommendations of movies to watch based on a machine learning model. Likewise, many online retailers
today use machine learning models to recommend products to buy based
on your buying history and the buying history of users
who are similar to you. So now let's talk about some of the things
that machine learning cannot do very well. The first is understanding context. A good example here would
be machine translation. As humans, we have the capability to understand
context about conversations that we have. We can understand when a sentence is
intended to be a joke, or when someone says something that is intended to be sarcastic
in nature rather than taken literally. Machine learning models today are unable
to understand things like jokes or sarcasm, so everything the machine
learning model hears, it interprets as a literal statement. The second thing machine learning cannot do particularly
well is determine causation. It's really important to understand
that machine learning identifies patterns and correlations in data but
does not determine cause, or causation, whether one thing
caused another thing. To take an example of this, we could
show that over the course of a year, ice cream sales and
violent crimes are actually correlated: as ice cream sales go up,
violent crimes also go up. This is an example of
correlation between two things. However, it should be obvious that ice
cream sales going up is not a cause of violent crimes going up. There are other variables that come into
play, things like seasonality and weather. It can be shown that in summertime
demand for ice cream goes up, but violent crimes also
rise in the summertime. So if we were to build
a machine learning model, we would identify a strong correlation
between ice cream sales and crimes, but we have to be careful not to misinterpret
that correlation as causation. The third thing machine learning cannot do particularly
well is explaining why things happen. So machine learning again
identifies patterns but doesn’t attempt to explain why
these patterns are occurring. It can explain outputs in terms of
correlations with given input features. And again, it doesn’t explain why this
combination of input features results in a certain output. And finally, machine learning is
not capable of determining the impact of what we call interventions, or
possible solutions, on the problem. It's also not capable of finding
solutions to a given problem. To take an example of this, suppose we were building a machine
learning model that predicts crime. It's likely that we would be able to
accurately predict crime given a certain number of different features, things like
location, seasonality, and other factors. We could build a model that could
effectively predict crime, but this model wouldn't necessarily tell us
anything about how to reduce the crime. So, for example, if we wanted to analyze
whether enacting a law to ban guns in a certain area would reduce crime,
a machine learning model that we've built really wouldn't be able to tell us
anything about whether that was a good or bad solution, or
how that might impact the predicted crime. Likewise, our machine learning model
wouldn't be able to answer the question of what we should do to reduce crime, or
to suggest or propose solutions to the problem
that we're trying to model.

Review


Video: Module Wrap-up

What is Machine Learning?

  • A way to teach computers to learn from data, enabling them to complete tasks without being explicitly programmed.
  • Requires substantial data to uncover patterns and insights.

Types of Machine Learning

  • Supervised Learning: Learning from labeled data (input and output provided).
  • Unsupervised Learning: Finding patterns in unlabeled data.
  • Reinforcement Learning: Learning by interacting with an environment and receiving rewards or penalties.

How Machine Learning is Used

  • Automation: Streamlining various tasks.
  • Prediction: Forecasting future outcomes.
  • Personalization: Tailoring experiences for individual users.

Limitations of Machine Learning

  • No Causation: Identifies correlations, not the reasons behind them.
  • No Problem Solving: Cannot propose solutions or explain how to achieve a desired change.

Next Up

The next module will delve into the steps involved in building and training effective machine learning models.

So let’s summarize what we’ve learned so
far in this module, we started off with a discussion on what
machine learning is and what it does. She learned, again, as a way to program
computers to learn from experience, to complete a task without providing explicit
instructions on how to perform that task. The key to doing this is
providing sufficient data, so that the computer is able to learn
from the data that we provide it. We also talked about the different
types of machine learning. We covered an introduction to supervised
learning versus unsupervised learning, and briefly touched on what’s
called reinforcement learning. We discussed some of the ways that machine
learning is used in the world today, specifically to automate processes to
be able to generate predictions and to personalize products and
services for individual users. We then wrapped up with the discussion of
where machine learning is not useful, and some of the limitations of
applying machine learning. Specifically, machine learning is not
capable of explaining why something happens, so it's not used for determining causation, right? And it's also not able to
explain how to fix something. Machine learning can tell us about
patterns and correlations in data, and can exploit those patterns to generate
predictions and identify things, but doesn’t really understand the context
of why things are as they are, or how to change things,
if we desire to change something. In the next module, we’ll dive further
into the machine learning process and we’ll start to discuss
the key steps in building and training a machine learning model.

Quiz: Module Wrap-up

What is the difference between “machine learning” and “deep learning”?

Which of the following are characteristics of structured data (select all that apply)?

We are building a model for a nationwide fast food retailer to predict the daily sales from one of their restaurants around the country. Which of the following features would we most likely treat as categorical (rather than continuous) variables in our model?

What is the purpose of the “algorithm” in building a machine learning model?

Building a model which uses historical data to predict the future demand for electricity within a certain utility territory would likely be an application of which type of machine learning?

Building a model which organizes news articles from the daily paper into groups by subject (e.g. sports, business, politics) using only the text of the articles, without being trained on previous labeled articles, would likely be an application of which type of machine learning?

Which of the following are things that machine learning cannot do well (select all that apply)?

Building a model to identify whether a patient has skin cancer based on images of the patient’s skin is an example of which type of supervised learning task?

In the matrix of input variable data to a supervised learning model (often referred to as X), the rows of the matrix represent the ____ and the columns of the matrix represent the ____.

What is the main difference between regression and classification?