Home » IBM » IBM AI Engineering Professional Certificate » Machine Learning with Python » Week 1: Introduction to Machine Learning

Week 1: Introduction to Machine Learning

In this module, you will learn about applications of Machine Learning in different fields such as health care, banking, telecommunication, and so on. You’ll get a general overview of Machine Learning topics such as supervised vs. unsupervised learning, and the usage of each algorithm. You’ll also understand the advantages of using Python libraries for implementing Machine Learning models.

Learning Objectives

  • Provide examples of Machine Learning in various industries.
  • Outline the steps machine learning uses to solve problems.
  • Provide examples of various techniques used in machine learning.
  • Describe the Python libraries for Machine Learning.
  • Explain the differences between Supervised and Unsupervised algorithms.
  • Describe the capabilities of various algorithms.

Welcome


Video: Course Introduction

Course Overview:

  • Focus: Introduction to machine learning fundamentals using Python.
  • Topics: Supervised/unsupervised learning, classification, regression, clustering, and their real-world applications.
  • Format: Modules with videos and hands-on labs using Jupyter Lab, Python, and libraries like Pandas, NumPy, and Scikit-Learn.
  • Projects: Apply algorithms to datasets such as automobile emissions, real estate prices, customer behavior, cancer detection, and Australian rainfall prediction.

Instructors:

  • Saeed Aghabozorgi: AI/ML Customer Engineer at Google, industry expertise.
  • Joseph Santarcangelo: Ph.D. in Electrical Engineering, ML research background.
  • Azim Hirjani: Data Science Intern at IBM, BS in Computer Science.

Key Takeaways:

By completing the course, you will:

  • Understand key differences between machine learning concepts.
  • Describe the mechanics of various machine learning algorithms.
  • Gain hands-on experience applying machine learning in Python.

Hello and welcome to the Fundamentals of Machine Learning with Python course. In this course, you will be introduced to machine learning and learn how to apply various machine learning algorithms.

The instructors for the course are Saeed Aghabozorgi, Joseph Santarcangelo, and Azim Hirjani. Saeed Aghabozorgi, Ph.D., is a senior AI/ML Customer Engineer at Google, with a track record of developing enterprise-level solutions that increase customers’ ability to turn their data into actionable knowledge. He has worked at IBM and Amazon Web Services. Saeed is also a researcher in the artificial intelligence and machine learning field. Joseph Santarcangelo has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D. And Azim Hirjani is a Data Scientist Intern at IBM, creating content for various IBM Data Science courses. He is pursuing a BS in Computer Science from the University of Toronto.

Machine learning is present in many fields and industries. It is used heavily in the self-driving car industry to classify objects that a car might encounter while driving, for example, people, traffic signs, and other cars. Many cloud computing service providers like IBM and Amazon use machine learning to protect their services. It is used to detect and prevent attacks like distributed denial-of-service attacks or suspicious and malicious usage. Machine learning is also used to find trends and patterns in stock data that can help decide which stocks to trade or at which prices to buy and sell. Another use for machine learning is to help identify cancer in patients. Using an X-ray scan of the target area, machine learning can help detect potential tumors.

This course consists of four modules: Introduction and Regression, Classification, Clustering, and the Final Project. Each module comprises videos with hands-on labs to apply what you have learned. The hands-on labs use Jupyter Lab, which is hosted on Skills Network Labs, and use the Python programming language and various Python libraries like Pandas, NumPy, and Scikit-Learn.

You will explore different machine learning algorithms in this course and work with a variety of datasets to help you apply machine learning. With linear regression, you will work with an automobile dataset to estimate the CO2 emissions of cars using various features, and then predict the CO2 emissions of cars that haven’t even been produced yet. With regression trees, you will work with real estate data to predict the price of houses. With logistic regression, you will work with customer data for telecommunication companies and see how machine learning is used to predict customer loyalty. With K-nearest neighbors, you will use telecommunication customer data to classify customers. With support vector machines, you will classify human cell samples as benign or malignant. In multiclass prediction, you will work with the popular iris dataset to classify types of flowers. With decision trees, you will build a model to determine which drugs to prescribe to patients. And finally, with K-means, you will learn to segment a customer dataset into groups of individuals with similar characteristics. In the last module, you will complete the final project, where you will use many of the classification algorithms to predict rain in Australia.

After completing this course, you will be able to explain, compare, and contrast various machine learning topics and concepts like supervised learning, unsupervised learning, classification, regression, and clustering. You will also be able to describe how the various machine learning algorithms work. And finally, you will learn how to apply these machine learning algorithms in Python using various Python libraries.

What is Machine Learning?


Video: Welcome

What You’ll Learn

  • Diverse Applications: Discover how machine learning drives innovation in fields like:
    • Healthcare: Detecting cancer, aiding in diagnosis and treatment.
    • Banking: Loan approvals and customer segmentation.
    • E-commerce: Personalized product recommendations.
  • Practical Skills: Learn to use popular Python libraries like scikit-learn and SciPy to:
    • Analyze automobile data for CO2 emission prediction.
    • Help businesses predict customer churn.
  • Hands-On Practice: Experiment with code directly in your browser using built-in labs. No setup is required.

Key Takeaways

This course will give you:

  • Job-Ready Skills: Build your resume with in-demand machine learning techniques.
  • Portfolio Projects: Showcase your abilities with real-world examples.
  • Certificate of Completion: Earn proof of your machine learning competency.

Hello and welcome to Machine Learning with Python. In this course, you’ll learn how Machine Learning is used in many key fields and industries. For example, in the healthcare industry, data scientists use Machine Learning to predict whether a human cell that is believed to be at risk of developing cancer is benign or malignant. As such, Machine Learning can play a key role in determining a person’s health and welfare. You’ll also learn about the value of decision trees and how building a good decision tree from historical data helps doctors prescribe the proper medicine for each of their patients. You’ll learn how bankers use Machine Learning to decide whether to approve loan applications, and you will learn how to use Machine Learning to do bank customer segmentation, a task that is not usually easy to carry out by hand on huge volumes of varied data. In this course, you’ll see how machine learning helps websites such as YouTube, Amazon, or Netflix develop recommendations for their customers about various products or services, such as which movies they might be interested in seeing or which books to buy. There is so much that you can do with Machine Learning.
Here you’ll learn how to use popular Python libraries to build your model. For example, given an automobile dataset, we can use the scikit-learn library to estimate the CO2 emissions of cars using their engine size or number of cylinders. We can even predict what the CO2 emissions will be for a car that hasn’t been produced yet, and we’ll see how the telecommunications industry can predict customer churn. You can run and practice the code of all these samples using the built-in lab environment in this course. You don’t have to install anything on your computer or do anything on the cloud; all you have to do is click a button to start the lab environment in your browser.
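As a minimal sketch of the CO2 example, assuming a tiny hypothetical dataset (the engine sizes and emissions below are invented so that they lie exactly on a line, and are not taken from the course’s automobile dataset), scikit-learn’s `LinearRegression` can fit emissions against engine size and then estimate emissions for an engine size it has never seen:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: engine size in litres (one feature per row)
# and CO2 emissions in g/km for five imaginary cars
engine_size = np.array([[1.5], [2.0], [2.4], [3.0], [3.5]])
co2_emissions = np.array([150.0, 180.0, 204.0, 240.0, 270.0])

# Fit CO2 ≈ intercept + slope * engine_size
model = LinearRegression()
model.fit(engine_size, co2_emissions)

# Estimate emissions for a car that has not been built yet (4.0 L engine)
new_car = np.array([[4.0]])
predicted = model.predict(new_car)[0]

print(f"slope={model.coef_[0]:.1f}, intercept={model.intercept_:.1f}")
print(f"predicted CO2 for a 4.0 L engine: {predicted:.1f} g/km")
```

Because the toy points are exactly linear, the fit recovers a slope and intercept of 60; real emissions data is noisy, so a fitted line there would only approximate it.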
The code for the samples is already written in Python in Jupyter notebooks, and you can run it to see the results or change it to understand the algorithms better. So what will you be able to achieve by taking this course? Well, by putting in just a few hours a week over the next few weeks, you’ll get new skills to add to your resume, such as regression, classification, clustering, scikit-learn, and SciPy. You’ll also get new projects that you can add to your portfolio, including cancer detection, predicting economic trends, predicting customer churn, recommendation engines, and many more. You’ll also get a certificate in Machine Learning to prove your competency and share it anywhere you like online or offline, such as on LinkedIn profiles and social media. So let’s get started. [music]

Video: Introduction to Machine Learning

What is Machine Learning?

  • Human Analogy: Machine learning (ML) enables computers to learn and make predictions like humans do. Think of a child learning to identify animals – they learn by observing patterns and features. ML algorithms work similarly.
  • Real-World Example: ML can analyze medical data (e.g., cell characteristics) to aid in early cancer diagnosis.
  • Formal Definition: ML is a subfield of computer science that allows computers to learn from data without being explicitly programmed for every single rule.

Why Machine Learning Matters

  • Old vs. New: Traditional programming required hand-coding rules for specific tasks. ML automates pattern discovery within data.
  • Impact: ML powers many everyday applications:
    • Netflix/Amazon recommendations
    • Bank loan approvals
    • Customer targeting in telecom
    • Chatbots, face recognition, etc.
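The contrast between hand-coding rules and learning them from data can be sketched as follows. This is an illustration only, assuming made-up feature vectors `[number_of_legs, has_wings, has_retractable_claws]` for cats, dogs, and birds; `rule_based` is a hypothetical hand-written classifier, while scikit-learn’s `DecisionTreeClassifier` derives its own decision rules from the labeled examples:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per animal: [number_of_legs, has_wings, has_retractable_claws]
features = [
    [4, 0, 1], [4, 0, 1],  # cats
    [4, 0, 0], [4, 0, 0],  # dogs
    [2, 1, 0], [2, 1, 0],  # birds
]
labels = ["cat", "cat", "dog", "dog", "bird", "bird"]


def rule_based(animal):
    """Traditional approach: every decision is written by hand."""
    legs, has_wings, has_claws = animal
    if has_wings:
        return "bird"
    if legs == 4 and has_claws:
        return "cat"
    return "dog"


# Machine learning approach: the tree learns its rules from labeled examples
tree = DecisionTreeClassifier(random_state=0)
tree.fit(features, labels)

new_animals = [[4, 0, 1], [2, 1, 0], [4, 0, 0]]
hand_coded = [rule_based(a) for a in new_animals]
learned = [str(label) for label in tree.predict(new_animals)]

print(hand_coded)
print(learned)
```

With three tidy features the hand-written rule is easy to write; the point made in the videos is that real images need many more features, and hand-maintained rules stop scaling, whereas the learned model is simply retrained on new examples.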

Types of Machine Learning Techniques

  • Regression: Predicts continuous values (e.g., house price).
  • Classification: Predicts categories (e.g., malignant/benign cell).
  • Clustering: Groups similar data points (e.g., patient similarities).
  • And more: Association analysis, anomaly detection, recommendation systems…
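Recommendation systems, listed above, can be sketched with one simple approach, user-based collaborative filtering: find the user whose ratings are most similar (here measured by cosine similarity) and suggest items that user rated highly. The ratings matrix and the 4-star threshold are hypothetical, and production recommenders use far more elaborate methods:

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 = not rated yet
ratings = np.array([
    [5, 4, 0, 1],  # user 0: the user we recommend for
    [5, 5, 4, 1],  # user 1: similar taste to user 0
    [1, 0, 5, 5],  # user 2: different taste
])


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


target = 0
others = [u for u in range(len(ratings)) if u != target]

# Find the user whose ratings point in the most similar direction
sims = {u: cosine_similarity(ratings[target], ratings[u]) for u in others}
most_similar = max(sims, key=sims.get)

# Recommend movies the target has not rated but the similar user rated >= 4
unrated = np.where(ratings[target] == 0)[0]
recommendations = [int(m) for m in unrated if ratings[most_similar, m] >= 4]

print(most_similar, recommendations)
```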

AI, ML, and Deep Learning

  • AI: The overarching field of making computers intelligent.
  • ML: A subset of AI that focuses on statistical learning from data.
  • Deep Learning: A type of ML with deeper automation, making computers learn more independently.

Which Machine Learning technique is proper for grouping of similar cases in a dataset, for example to find similar patients, or for customers segmentation in a bank?

Clustering

Hello, and welcome! In this video I will give you a high-level introduction to Machine Learning. So let’s get started. This is a human cell sample extracted from a patient, and this cell has characteristics. For example, its clump thickness is 6, its uniformity of cell size is 1, its marginal adhesion is 1, and so on. One of the interesting questions we can ask at this point is: Is this a benign or malignant cell? In contrast with a benign tumor, a malignant tumor is a tumor that may invade its surrounding tissue or spread around the body, and diagnosing it early might be the key to a patient’s survival. One could easily presume that only a doctor with years of experience could diagnose that tumor and say if the patient is developing cancer or not. Right? Well, imagine that you’ve obtained a dataset containing characteristics of thousands of human cell samples extracted from patients who were believed to be at risk of developing cancer. Analysis of the original data showed that many of the characteristics differed significantly between benign and malignant samples. You can use the values of these cell characteristics in samples from other patients to give an early indication of whether a new sample might be benign or malignant. You should clean your data, select a proper algorithm for building a prediction model, and train your model to understand patterns of benign or malignant cells within the data. Once the model has been trained by going through the data iteratively, it can be used to predict the class of a new or unknown cell with rather high accuracy. This is machine learning! It is the way that a machine learning model can do a doctor’s task or at least help that doctor make the process faster. Now, let me give a formal definition of machine learning.
Machine learning is the subfield of computer science that gives “computers the ability to learn without being explicitly programmed.” Let me explain what I mean when I say “without being explicitly programmed.” Assume that you have a dataset of images of animals such as cats and dogs, and you want to have software or an application that can recognize and differentiate them. The first thing that you have to do here is represent each image as a set of features. For example, does the image show the animal’s eyes? If so, what is their size? Does it have ears? What about a tail? How many legs? Does it have wings? Prior to machine learning, each image would be transformed into a vector of features. Then, traditionally, we had to write down some rules or methods in order to get computers to be intelligent and detect the animals. But it was a failure. Why? Well, as you can guess, it needed a lot of rules, was highly dependent on the current dataset, and was not general enough to detect out-of-sample cases. This is when machine learning entered the scene. Machine learning allows us to build a model that looks at all the feature sets and their corresponding types of animals, and learns the pattern of each animal. It is a model built by machine learning algorithms, and it detects animals without being explicitly programmed to do so. In essence, machine learning follows the same process that a 4-year-old child uses to learn, understand, and differentiate animals. So machine learning algorithms, inspired by the human learning process, iteratively learn from data and allow computers to find hidden insights. These models help us in a variety of tasks, such as object recognition, summarization, recommendation, and so on. Machine Learning impacts society in a very influential way. Here are some real-life examples. First, how do you think Netflix and Amazon recommend videos, movies, and TV shows to their users? They use Machine Learning to produce suggestions that you might enjoy!
This is similar to how your friends might recommend a television show to you, based on their knowledge of the types of shows you like to watch. How do you think banks make a decision when approving a loan application? They use machine learning to predict the probability of default for each applicant, and then approve or refuse the loan application based on that probability. Telecommunication companies use their customers’ demographic data to segment them, or to predict whether they will unsubscribe from their company the next month. There are many other applications of machine learning that we see every day in our daily life, such as chatbots, face recognition for logging into our phones, or even computer games. Each of these uses different machine learning techniques and algorithms. So, let’s quickly examine a few of the more popular techniques. The Regression/Estimation technique is used for predicting a continuous value, for example, predicting the price of a house based on its characteristics, or estimating the CO2 emission from a car’s engine. A Classification technique is used for predicting the class or category of a case, for example, whether a cell is benign or malignant, or whether or not a customer will churn. Clustering groups similar cases; for example, it can find similar patients, or it can be used for customer segmentation in the banking field. The Association technique is used for finding items or events that often co-occur, for example, grocery items that are usually bought together by a particular customer. Anomaly detection is used to discover abnormal and unusual cases; for example, it is used for credit card fraud detection. Sequence mining is used for predicting the next event, for instance, the click-stream in websites. Dimension reduction is used to reduce the size of data. And finally, recommendation systems associate people’s preferences with others who have similar tastes, and recommend new items to them, such as books or movies.
We will cover some of these techniques in the next videos. By this point, I’m quite sure this question has crossed your mind: “What is the difference between these buzzwords that we keep hearing these days, such as Artificial Intelligence (or AI), Machine Learning, and Deep Learning?” Well, let me explain what is different between them. In brief, AI tries to make computers intelligent in order to mimic the cognitive functions of humans. So, Artificial Intelligence is a general field with a broad scope, including Computer Vision, Language Processing, Creativity, and Summarization. Machine Learning is the branch of AI that covers the statistical part of artificial intelligence. It teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations. And Deep Learning is a very special field of Machine Learning where computers can actually learn and make intelligent decisions on their own. Deep learning involves a deeper level of automation in comparison with most machine learning algorithms. Now that we’ve completed the introduction to Machine Learning, subsequent videos will focus on reviewing two main components: first, you’ll learn about the purpose of Machine Learning and where it can be applied in the real world; and second, you’ll get a general overview of Machine Learning topics, such as supervised vs. unsupervised learning, model evaluation, and various Machine Learning algorithms. So now that you have a sense of what’s in store on this journey, let’s continue our exploration of Machine Learning! Thanks for watching! (Music)
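The workflow described in this video (prepare the data, choose an algorithm, train, then predict for a new sample) can be sketched with scikit-learn. As an assumption, this uses the Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn (`load_breast_cancer`) as a stand-in; it has different features from the clump-thickness dataset described above, and logistic regression is just one reasonable choice of algorithm:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the bundled dataset; in this dataset target 0 = malignant, 1 = benign
data = load_breast_cancer()
X, y = data.data, data.target

# Hold out 20% of the samples to play the role of new, unknown cells
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Data preparation (feature scaling) chained with the chosen algorithm
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the held-out samples, then classify a single unseen sample
accuracy = model.score(X_test, y_test)
new_sample = X_test[:1]
predicted_label = data.target_names[model.predict(new_sample)[0]]

print(f"held-out accuracy: {accuracy:.3f}")
print(f"prediction for the new sample: {predicted_label}")
```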

Video: Python for Machine Learning

Why Python is Ideal for Machine Learning:

  • Powerful General Language: Python is easy to learn and versatile, great for both beginners and experienced programmers.
  • Specialized Libraries: Python offers a wealth of libraries designed specifically for machine learning tasks:
    • NumPy: Efficiently handles arrays and mathematical computations.
    • SciPy: Provides advanced scientific computing tools.
    • Matplotlib: Excellent for creating visualizations and plots.
    • Pandas: Simplifies data manipulation and analysis.
    • Scikit-learn: The core library, with tools for:
      • Preprocessing data
      • Implementing various machine learning algorithms (classification, regression, etc.)
      • Model training, testing, and evaluation
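A minimal sketch of two of these libraries, assuming small made-up car data (the makes `A`/`B` and all numbers are hypothetical): NumPy applies arithmetic to whole arrays at once, and pandas adds labeled tables with grouping and summary operations:

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies element-wise to whole arrays, no explicit loop
engine_size = np.array([1.5, 2.0, 3.5])  # litres (hypothetical)
cylinders = np.array([4, 4, 6])
litres_per_cylinder = engine_size / cylinders

# pandas: a labeled table with built-in grouping and summary statistics
cars = pd.DataFrame({
    "make": ["A", "A", "B"],
    "engine_size": engine_size,
    "co2": [150.0, 180.0, 270.0],  # g/km (hypothetical)
})
mean_co2_by_make = cars.groupby("make")["co2"].mean()

print(litres_per_cylinder)
print(mean_co2_by_make)
```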

Scikit-learn Advantages:

  • Ease of Use: Building and evaluating machine learning models can be done in just a few lines of code.
  • Efficiency: Scikit-learn is built on top of optimized libraries like NumPy, making computations fast.
  • Documentation: Clear explanations make it easy to learn.

Key Takeaway: Python’s readability and powerful libraries streamline the entire machine learning process, from data preparation to model deployment.


Why is scikit-learn a suitable library for Machine Learning (select all the options that are correct)?
  • Scikit-learn is a free machine learning library that works with Numpy and Scipy.
  • Scikit-learn has most of machine learning algorithms.

Hello and welcome. In this video, we’ll talk about how to use Python for machine learning. So let’s get started. Python is a popular and powerful general-purpose programming language that has recently emerged as the preferred language among data scientists. You can write your machine learning algorithms using Python, and it works very well. However, there are a lot of modules and libraries already implemented in Python that can make your life much easier. We will introduce these Python packages in this course and use them in the labs to give you better hands-on experience. The first package is NumPy, which is a math library for working with N-dimensional arrays in Python. It enables you to do computation efficiently and effectively, and for this kind of work it is better than regular Python because of its capabilities. For example, for working with arrays, dictionaries, functions, data types, and images, you need to know NumPy. SciPy is a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics, and much more. SciPy is a good library for scientific and high-performance computation. Matplotlib is a very popular plotting package that provides 2D plotting as well as 3D plotting. Basic knowledge about these three packages, which are built on top of Python, is a good asset for data scientists who want to work with real-world problems. If you’re not familiar with these packages, I recommend that you take the Data Analysis with Python course first. That course covers most of the useful topics in these packages. The pandas library is a very high-level Python library that provides high-performance, easy-to-use data structures. It has many functions for data importing, manipulation, and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
Scikit-learn is a collection of algorithms and tools for machine learning, which is our focus here and which you’ll learn to use within this course. As we’ll be using scikit-learn quite a bit in the labs, let me explain more about it and show you why it is so popular among data scientists. Scikit-learn is a free machine learning library for the Python programming language. It has most of the classification, regression, and clustering algorithms, and it’s designed to work with the Python numerical and scientific libraries NumPy and SciPy. It also includes very good documentation. On top of that, implementing machine learning models with scikit-learn is really easy with a few lines of Python code. Most of the tasks that need to be done in a machine learning pipeline are already implemented in scikit-learn, including pre-processing of data, feature selection, feature extraction, train/test splitting, defining the algorithms, fitting models, tuning parameters, prediction, evaluation, and exporting the model. Let me show you an example of what scikit-learn looks like when you use this library. You don’t have to understand the code for now; just see how easily you can build a model with just a few lines of code. Basically, machine learning algorithms benefit from standardization of the dataset. If there are outliers or fields with different scales in your dataset, you have to fix them. The pre-processing package of scikit-learn provides several common utility functions and transformer classes to change raw feature vectors into a form suitable for modeling. You have to split your dataset into train and test sets to train your model and then test the model’s accuracy separately. Scikit-learn can split arrays or matrices into random train and test subsets for you in one line of code. Then you can set up your algorithm. For example, you can build a classifier using a support vector classification algorithm.
We call our estimator instance clf and initialize its parameters. Now you can train your model: by passing the training set to the fit method, the clf model learns to classify unknown cases. Then we can use our test set to run predictions, and the result tells us the class of each unknown value. You can also use different metrics to evaluate your model’s accuracy, for example, a confusion matrix to show the results. And finally, you save your model. You may find all or some of these machine learning terms confusing, but don’t worry; we’ll talk about all of these topics in the following videos. The most important point to remember is that the entire process of a machine learning task can be done simply, in a few lines of code, using scikit-learn. Please notice that, though it is possible, it would not be that easy to do all of this using the NumPy or SciPy packages. And of course, it needs much more coding if you use pure Python programming to implement all of these tasks. Thanks for watching. (Music)
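The steps described in this video (standardize, split, create an estimator called `clf`, fit, predict, evaluate with a confusion matrix, save) can be sketched as below. This is not the code shown in the video; it assumes the iris dataset bundled with scikit-learn, default `SVC` hyperparameters, and Python’s `pickle` module for saving the model to bytes:

```python
import pickle

from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

# Split into random train and test subsets in one line
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Pre-processing: learn the scaling from the training set only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Set up the algorithm: a support vector classifier with default parameters
clf = svm.SVC()
clf.fit(X_train_std, y_train)

# Predict the class of each test sample and summarize the results
y_pred = clf.predict(X_test_std)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Save the trained model (here to bytes) and load it back
saved = pickle.dumps(clf)
restored = pickle.loads(saved)
```

The scaler is fitted on the training set only, so that no information from the test set leaks into training; the held-out accuracy then better reflects performance on truly new data.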

Video: Supervised vs Unsupervised

Supervised Learning

  • Concept: As the name suggests, the model is supervised: it is “taught” how to make predictions from examples whose outcomes are already known.
  • Data: Uses a labeled dataset, where each data point has a known outcome or class.
  • Examples:
    • Classification: Predicting a category (e.g., if a tumor is benign or malignant)
    • Regression: Predicting a continuous value (e.g., CO2 emission based on car engine size)
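To make the idea of a labeled dataset concrete, here is a sketch with pandas. The rows are hypothetical values in the shape of the cancer dataset described in this module: each row is an observation, the attribute columns form the features `X`, and the `class` column holds the known label `y` that supervised learning trains on:

```python
import pandas as pd

# Hypothetical observations (one row per cell sample); values are illustrative only
cells = pd.DataFrame({
    "clump_thickness": [5, 3, 8],
    "uniformity_of_cell_size": [1, 1, 10],
    "marginal_adhesion": [1, 1, 8],
    "class": ["benign", "benign", "malignant"],
})

# Features (attributes) are numerical; the label is categorical
X = cells.drop(columns="class")
y = cells["class"]

print(X.dtypes)
print(y.value_counts())
```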

Unsupervised Learning

  • Concept: The model finds patterns on its own, without pre-defined labels.
  • Data: Uses an unlabeled dataset.
  • Techniques:
    • Clustering: Grouping similar data points together.
    • Dimensionality Reduction: Simplifying data by removing redundant features.
    • Density Estimation: Understanding the distribution of data.
    • Market Basket Analysis: Finding items frequently bought together.
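Two of these unsupervised techniques can be sketched with scikit-learn on synthetic, unlabeled data (an assumption: three features, one of which is a near-copy of another and therefore redundant). `PCA` performs dimensionality reduction, and `KernelDensity` estimates how densely populated different regions are; neither ever sees a label:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Unlabeled synthetic data: the third column nearly duplicates the first (redundant)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + 0.01 * rng.normal(size=200)])

# Dimensionality reduction: project 3 features down to 2 components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
kept_variance = pca.explained_variance_ratio_.sum()

# Density estimation: fit a Gaussian kernel density to the reduced data
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_reduced)
log_density = kde.score_samples(np.array([[0.0, 0.0], [5.0, 5.0]]))

print(f"variance kept with 2 of 3 features: {kept_variance:.4f}")
print("log-density near the center vs. far away:", log_density)
```

Because the redundant column adds almost no new information, two components keep nearly all of the variance, and the density estimate is much higher near the center of the data than far from it.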

Key Differences

  • Labeled vs. Unlabeled Data: Supervised learning uses known labels, unsupervised doesn’t.
  • Control: Supervised learning is more controlled, as you guide the model towards specific outcomes. Unsupervised provides less control, with the model finding patterns independently.
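The labeled-versus-unlabeled difference shows up directly in scikit-learn’s API. Sketched here on synthetic blobs from `make_blobs` (an assumption, chosen because it also returns the true groups, so the result can be checked): a supervised classifier is fitted with features and labels, `fit(X, y)`, while `KMeans` is fitted with features only, `fit(X)`:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score

# Synthetic data: 150 points around 3 centers; y holds the true group of each point
X, y = make_blobs(n_samples=150, centers=3, random_state=42)

# Supervised: the model is given the known labels and learns to reproduce them
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_accuracy = clf.score(X, y)

# Unsupervised: the model sees only X and assigns its own group ids
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Cluster ids are arbitrary (cluster 0 need not match class 0), so compare the
# groupings with a permutation-invariant score instead of plain accuracy
agreement = adjusted_rand_score(y, km.labels_)

print(f"classifier training accuracy: {train_accuracy:.2f}")
print(f"cluster/label agreement (ARI): {agreement:.2f}")
```

In a real unsupervised problem there is no `y` to compare against, which is the reduced control and evaluation described above.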
Which technique/s is/are considered as Supervised learning?

Regression, Classification

Hello, and welcome. In this video we’ll introduce supervised algorithms versus unsupervised algorithms. So, let’s get started. An easy way to begin grasping the concept of supervised learning is by looking directly at the words that make it up. Supervise means to observe and direct the execution of a task, project, or activity. Obviously we aren’t going to be supervising a person; instead, we’ll be supervising a machine learning model that might be able to produce classification regions like we see here. So, how do we supervise a machine learning model? We do this by teaching the model; that is, we load the model with knowledge so that we can have it predict future instances. But this leads to the next question: how exactly do we teach a model? We teach the model by training it with some data from a labeled dataset. It’s important to note that the data is labeled. And what does a labeled dataset look like? Well, it could look something like this. This example is taken from the cancer dataset. As you can see, we have some historical data for patients, and we already know the class of each row. Let’s start by introducing some components of this table. The names up here, such as clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, and so on, are called attributes. The columns are called features, which include the data. If you plot this data and look at a single data point on a plot, it’ll have all of these attributes, which would make a row on this chart, also referred to as an observation. Looking directly at the values of the data, you can have two kinds. The first is numerical. When dealing with machine learning, the most commonly used data is numeric. The second is categorical; that is, it’s non-numeric because it contains characters rather than numbers. In this case, the class is categorical because this dataset is made for classification. There are two types of supervised learning techniques.
They are classification and regression. Classification is the process of predicting a discrete class label, or category. Regression is the process of predicting a continuous value, as opposed to predicting a categorical value in classification. Look at this dataset. It is related to the CO2 emissions of different cars. It includes engine size, cylinders, fuel consumption, and CO2 emission of various models of automobiles. Given this dataset, you can use regression to predict the CO2 emission of a new car by using other fields, such as engine size or number of cylinders. Since we know the meaning of supervised learning, what do you think unsupervised learning means? Yes, unsupervised learning is exactly as it sounds. We do not supervise the model; we let the model work on its own to discover information that may not be visible to the human eye. It means the unsupervised algorithm trains on the dataset and draws conclusions on unlabeled data. Generally speaking, unsupervised learning has more difficult algorithms than supervised learning, since we know little to no information about the data or the outcomes that are to be expected. Dimension reduction, density estimation, market basket analysis, and clustering are the most widely used unsupervised machine learning techniques. Dimensionality reduction and/or feature selection play a large role in this by reducing redundant features to make the classification easier. Market basket analysis is a modeling technique based upon the theory that if you buy a certain group of items, you’re more likely to buy another group of items. Density estimation is a very simple concept that is mostly used to explore the data to find some structure within it. And finally, clustering: clustering is considered to be one of the most popular unsupervised machine learning techniques, used for grouping data points or objects that are somehow similar.
Cluster analysis has many applications in different domains, whether it be a bank’s desire to segment its customers based on certain characteristics, or helping an individual to organize and group their favorite types of music. Generally speaking, though, clustering is used mostly for discovering structure, summarization, and anomaly detection. So, to recap, the biggest difference between supervised and unsupervised learning is that supervised learning deals with labeled data while unsupervised learning deals with unlabeled data. In supervised learning, we have machine learning algorithms for classification and regression. In unsupervised learning, we have methods such as clustering. In comparison to supervised learning, unsupervised learning has fewer models and fewer evaluation methods that can be used to ensure that the outcome of the model is accurate. As such, unsupervised learning creates a less controllable environment, as the machine is creating outcomes for us. Thanks for watching. (Music)
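Market basket analysis, mentioned in this video, is often expressed through support (how often items appear together across all transactions) and confidence (how often a rule “if A is bought, then B is bought” holds). The sketch below computes both for item pairs in plain Python; the transactions are hypothetical, and real tools use algorithms such as Apriori to search larger item sets efficiently:

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

# Count single items and every unordered pair of items bought together
item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(
    pair for basket in transactions for pair in combinations(sorted(basket), 2)
)


def support(pair):
    """Fraction of all transactions that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)


def confidence(antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]


top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count, support(top_pair))
print(confidence("butter", "bread"), confidence("bread", "butter"))
```

Confidence is asymmetric: in this toy data every basket with butter also contains bread, but only three of the four bread baskets contain butter.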

End of Module Review & Evaluation


Practice Quiz: Intro to Machine Learning

Supervised learning deals with unlabeled data, while unsupervised learning deals with labelled data.

The “Regression” technique in Machine Learning is a group of algorithms that are used for:

When comparing Supervised with Unsupervised learning, is this sentence True or False?
In contrast to Supervised learning, Unsupervised learning has more models and more evaluation methods that can be used in order to ensure the outcome of the model is accurate.

Quiz: Graded Quiz: Intro to Machine Learning

In a dataset, what do the columns represent?

What is a major benefit of unsupervised learning over supervised learning?

What’s the correct order for using a model?

Which of the following is suitable for an unsupervised learning?

The main purpose of the NumPy library is to:

