You’ll learn the basic rules for calculating probability for single events. Next, you’ll discover how data professionals use methods such as Bayes’ theorem to describe more complex events. Finally, you’ll learn how probability distributions such as the binomial, Poisson, and normal distribution can help you better understand the structure of data.
Learning Objectives
- Use Python to model data with a probability distribution
- Describe the significance and use of z-scores
- Define the Empirical Rule
- Describe the features and uses of continuous probability distributions such as the normal distribution
- Describe the features and uses of discrete probability distributions such as the binomial and Poisson distributions
- Explain the difference between discrete and continuous random variables
- Describe Bayes’ theorem and its applications
- Define dependent events
- Describe conditional probability and its applications
- Define different types of events such as mutually exclusive and independent events
- Apply basic rules of probability such as the complement, addition, and multiplication rules
- Describe basic probability in mathematical terms
- Explain the difference between objective and subjective probability
- Basic concepts of probability
- Video: Welcome to module 2
- Video: Objective versus subjective probability
- Video: The principles of probability
- Reading: Fundamental concepts of probability
- Video: The basic rules of probability and events
- Reading: The probability of multiple events
- Practice Quiz: Test your knowledge: Basic concepts of probability
- Conditional probability
- Discrete probability distributions
- Continuous probability distributions
- Probability distributions with Python
- Lab: Annotated follow-along guide: Work with probability distributions in Python
- Video: Work with probability distributions in Python
- Lab: Activity: Explore probability distributions
- Lab: Exemplar: Explore probability distributions
- Practice Quiz: Test your knowledge: Probability distributions with Python
- Review: Probability
Basic concepts of probability
Video: Welcome to module 2
This text provides an overview of the upcoming lessons on probability, highlighting its applications in data-driven decision making.
Key takeaways:
- Probability: Measures the likelihood of events, used for data-driven decisions under uncertainty.
- Types of Probability: Objective (based on data) and Subjective (based on personal belief).
- Topics Covered:
- Basic rules (complement, addition, multiplication)
- Conditional probability and Bayes’ theorem
- Probability distributions (discrete and continuous)
- Common distributions (binomial, Poisson, normal)
- Z-scores and their application
- Applying probability distributions using Python’s SciPy library
This information sets the stage for further exploration of probability concepts and their practical applications in various fields, especially data analysis.
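As a preview of the distribution topics listed above, a binomial probability can already be computed with Python's standard library alone. This is a minimal sketch; `binomial_pmf` is a helper name of my own, not a course or SciPy function:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (the binomial distribution)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 fair coin tosses
print(round(binomial_pmf(5, 10, 0.5), 4))  # about 0.2461
```

Later in the module, the same calculation is done with SciPy's stats module, which also covers the Poisson and normal distributions.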
Hey there. I really enjoyed exploring descriptive
statistics with you and I’m excited for what’s next: probability. Probability
is the branch of mathematics that deals with measuring and
quantifying uncertainty. In other words, probability uses math to describe
the likelihood of something happening. For example, the chance of rain
tomorrow or of winning the lottery. Data professionals use probability to
help business leaders make data-driven decisions in situations of uncertainty. No one can know the outcome of future
events with complete certainty. What data professionals can do is use all
the available data to make reasonable predictions based on probability. For instance, imagine you’re working with a
stakeholder at a large aerospace company. They need to decide whether to
invest in a new technology to improve the production process for
their jet engines. As a data professional, you can estimate
the probability that the new technology will have a positive impact and
predict what its potential costs and benefits might be. The stakeholder can use this information
to make an informed decision about what’s best for the organization. We’ll start by reviewing the two main
types of probability: objective and subjective. We’ll cover basic
rules of probability, like the complement rule, the addition
rule and the multiplication rule. Then we’ll go over
conditional probability and how to describe the relationship
between dependent events. We’ll check out Bayes’ theorem, a key
formula for conditional probability and the basis for
more advanced Bayesian analysis. You’ll also learn about probability
distributions. Probability distributions describe the likelihood of the possible
outcomes of a random event and can be discrete or continuous. We’ll check out discrete probability
distributions such as the binomial and Poisson and find out how they can help
you model specific kinds of data. Then we will explore continuous
probability distributions and focus on the normal distribution, the most widely
used distribution in all statistics. You’ll discover its main features and how
it applies to many different data sets. Next, we’ll also discuss how z-scores
can help you better understand the relationship between data values
in a standard normal distribution. Finally, you’ll learn how to use
Python’s SciPy stats module to apply probability distributions to your data. When you’re ready to start
learning about probability, join me in the next video.
Video: Objective versus subjective probability
Summary of Probability and its Applications
This video explores the concept of probability and its use in data-driven decision making:
Key Points:
- Probability: Measures the likelihood of uncertain events, aiding informed decisions. (e.g., wearing appropriate clothing based on weather forecast)
- Applications: Predicting product sales, investment returns, election outcomes, and medical test accuracy.
Types of Probability:
- Objective Probability:
- Based on data, experiments, and mathematical calculations.
- Two types:
- Classical: Applies to events with equally likely outcomes. (e.g., flipping a coin)
- Calculated by: Favorable outcomes / Total possible outcomes
- Empirical: Based on historical data or experiments. (e.g., taste test for ice cream preference)
- Calculated by: Number of times event occurs / Total number of events
- Subjective Probability:
- Based on personal feelings, experience, or judgment. (e.g., predicting a horse race winner)
- Not based on formal analysis or experiments.
- Can vary significantly between individuals.
Importance of Distinguishing Probability Types:
- Objective probability: Crucial for data analysis and making informed decisions.
- Subjective probability: Can be unreliable and should be used cautiously when evaluating predictions or making decisions.
Example: A CEO’s subjective feeling about a new technology’s success might be inaccurate. Data science based on objective probability can provide a more reliable prediction, enabling a data-driven decision.
Overall:
Understanding probability, both objective and subjective, is crucial for making informed decisions in various fields, especially data analysis. The next video will delve deeper into fundamental probability concepts.
Tutorial: Summary of Probability and its Applications
This tutorial provides an overview of probability, its different types, and its applications in various fields, especially data analysis.
What is Probability?
Probability is a branch of mathematics that deals with measuring the likelihood of events occurring. It allows us to quantify uncertainty and make informed decisions in situations where we cannot know the outcome with complete certainty.
Example: We might use probability to decide what to wear based on the weather forecast. A 70% chance of rain suggests it’s more likely to rain than not, so wearing a raincoat might be a wise choice.
Types of Probability
There are two main types of probability:
- Objective Probability:
- Based on data, experiments, or mathematical calculations.
- Two main types:
- Classical Probability: Applies to events with equally likely outcomes.
- Formula: Favorable outcomes / Total possible outcomes
- Example: Flipping a coin (heads or tails) has a 50% chance (1/2) of landing on either side.
- Empirical Probability: Based on historical data or observations.
- Formula: Number of times event occurs / Total number of events
- Example: A taste test reveals 80 out of 100 people prefer strawberry ice cream. The probability of someone preferring strawberry is 80/100 = 80%.
- Subjective Probability:
- Based on personal beliefs, feelings, or experiences.
- Not based on formal calculations or data.
- Can vary significantly from person to person.
- Example: You might feel strongly that your favorite team will win the game, but this belief is not based on data or analysis.
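The two objective-probability formulas above translate directly into code. Here is a minimal sketch in plain Python (no external libraries; the function names are mine, not from the course):

```python
def classical_probability(desired_outcomes, total_outcomes):
    """Classical probability: assumes all outcomes are equally likely."""
    return desired_outcomes / total_outcomes

def empirical_probability(event_count, total_count):
    """Empirical probability: based on observed or historical data."""
    return event_count / total_count

# Coin flip: 1 desired outcome (heads) out of 2 equally likely outcomes
print(classical_probability(1, 2))     # 0.5

# Taste test: 80 of 100 people preferred strawberry
print(empirical_probability(80, 100))  # 0.8
```

Subjective probability, by contrast, has no formula at all, which is exactly why it is hard to evaluate and compare.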
Applications of Probability
Probability plays a crucial role in various fields, including:
- Data Analysis: Predicting product sales, investment returns, election outcomes, and medical test accuracy.
- Machine Learning: Training algorithms to make predictions based on patterns in data.
- Finance: Assessing risk and making investment decisions.
- Insurance: Determining premiums based on the likelihood of claims.
- Quality Control: Setting standards and monitoring manufacturing processes.
Why is understanding probability important?
Understanding probability allows us to:
- Make informed decisions in situations with uncertainty.
- Interpret data and draw meaningful conclusions.
- Evaluate the reliability of predictions and claims.
- Reduce risk by understanding the likelihood of potential outcomes.
Conclusion
Probability is a powerful tool that can be applied in various fields to quantify uncertainty and make informed decisions. Recognizing the difference between objective and subjective probability is crucial for evaluating information and making sound judgments. By studying probability, you equip yourself with a valuable skill for navigating the world of uncertainty and making data-driven decisions.
Additional Resources:
- You can find various online resources and interactive simulations to practice and visualize probability concepts.
- Consider exploring introductory statistics courses or textbooks for a deeper understanding of probability and its applications.
Probability helps you
measure and quantify uncertainty and make informed decisions about
uncertain outcomes. For example, you might use probability to decide what
to wear on a given day. Today’s weather
forecast says there’s a 70 percent chance of snow. Based on this data, you
decide to wear your hat, gloves, and snow boots. When the snow falls,
you stay warm and dry. Data professionals might use probability to
predict the chances that a company will sell a certain amount of product
in a given time period, a financial investment will
have a positive return, a political candidate
will win an election, or a medical test
will be accurate. In this video, we’ll explore the two main types of probability: objective
and subjective. Objective probability
is based on statistics, experiments, and
mathematical measurements. Subjective probability
is based on personal feelings,
experience, or judgment. Let’s start with
objective probability. Data professionals use
objective probability to analyze and interpret data. There are two types of
objective probability: classical and empirical. Classical probability is
based on formal reasoning about events with
equally likely outcomes. To calculate classical
probability for an event, you divide the number
of desired outcomes by the total number
of possible outcomes. For example, if you flip a coin, the result will be
either heads or tails. Heads and tails are
terms commonly used to refer to the two sides of the coin. There are only two
possible outcomes, and both outcomes
are equally likely. The chance that you get heads is one out of two, or 50 percent; the same goes for tails. Or take playing cards. There are 52 cards
in a standard deck. Choosing a card gives
you a one-in-52 chance, or 1.9 percent of getting
any card in the deck, whether it’s the ace of hearts, 10 of clubs, or four of spades. But most events are
more complex and do not have equally
likely outcomes. Usually, the weather isn’t a 50 percent chance
of rain or snow, there might be an 80
percent chance of rain tomorrow and a 20 percent
chance of some other outcome. While classical
probability applies to events with equally
likely outcomes, data professionals use
empirical probability to describe more complex events. Empirical probability
is based on experimental or historical data; it represents the likelihood of an event occurring based on the previous results of an
experiment or past events. To calculate empirical
probability, you divide the number
of times a specific event occurs by the
total number of events. For example, say you conduct a taste test with 100 people to find out whether they
prefer strawberry or mint chip-flavored ice cream. You want to know the
probability that a person prefers
strawberry ice cream. Your taste test reveals that 80 people prefer
strawberry ice cream. To calculate probability, you
divide the number of times the event of preferring
strawberry ice cream occurs, 80, by the total
number of events, 100. 80 divided by 100 equals
0.8 or 80 percent. So the probability
that a person prefers strawberry over mint
chip is 80 percent. Earlier, you learned about inferential statistics and
how data professionals use sample data to make inferences or predictions about
larger populations. Inferential statistics
uses probability too. For instance, a retail company might survey a
representative sample of 100 customers to predict the shopping preferences
of all their customers. Data professionals rely on
empirical probability to help them make accurate
predictions based on sample data. For example, in an A/B
test of a website, you test a sample
of users to make a prediction about the future
behavior of all users. Say the sample of users prefer a green add-to-cart
button over a blue one. You may infer from this data that the larger population of future users will probably
share their preference. An A/B test lets you make a reasonable prediction about future users based on
empirical probability. This probability can
help an online business make smarter decisions
and increase sales. In contrast, the results of subjective probability
are based on personal feeling,
experience, or judgment. This type of probability does not involve formal calculations, statistical analysis, or
scientific experiments. For instance, you may have an overwhelming feeling that a certain horse will
win a horse race, or that your favorite team will win the championship game. You may have good
reasons for your belief, but your reasons are
personal or subjective. Your belief is not based on statistical analysis or
scientific experiments. For this reason, the
subjective probability of an event may differ widely
from person to person. It’s important to know
the difference between subjective and objective
probability when you evaluate a prediction
or make a decision. For example, the CEO of an auto company might
feel confident that using a new technology to
manufacture their pickup truck will cut costs and
increase profits. But if their prediction
is only based on personal feeling or
subjective probability, it may not be reliable. Data science based on statistical analysis or
objective probability can help accurately predict the
potential impact of the new technology and help
the CEO make an informed, data-driven decision about
adopting the technology. That’s all for now. Coming up, we’ll check out some fundamental
concepts of probability.
Video: The principles of probability
Summary of Probability Concepts:
- Probability: A number between 0 and 1 representing the likelihood of an event occurring.
- 0: No chance (event won’t occur)
- 1: Certain (event will occur)
- Values between 0 and 1 indicate varying degrees of likelihood.
- Examples: Flipping coins, rolling dice, drawing cards (used for historical and educational reasons).
- Random Experiment: A process with uncertain outcomes.
- Has multiple possible outcomes.
- Each outcome can be identified beforehand.
- Outcome depends on chance (cannot be predicted with certainty).
- Calculating Probability:
- Divide the number of desired outcomes by the total number of possible outcomes.
- Example: Probability of getting heads from a coin toss = 1 (desired outcome) / 2 (total outcomes) = 0.5 (50%)
This summary provides a basic understanding of probability and its calculation for single events. The video mentions exploring more complex scenarios in future lessons.
If the probability of an event equals 1, what is the chance that the event will occur?
100%
If the probability of an event equals 1, there is a 100% chance that the event will occur. Probability is expressed as a number between 0 and 1. If the probability of an event is close to zero, there is a small chance that it will occur. If the probability is close to 1, there is a strong chance that it will occur.
Recently, you learned that probability uses
math to deal with uncertainty or to determine how likely it is that
an event will occur. In this video, you’ll learn some fundamental
concepts of probability. We’ll discuss the
mathematical definition of probability and how to calculate probability for
single random events. First, I want to give
you some context about the types of
examples we’ll be using. In this part of the course, we’re going to
continue to reference examples of events
like flipping coins, rolling dice, and drawing cards. There are a couple
of reasons for this. One is historical. The modern theory of probability originates in the
analysis of games of chance in the 16th
and 17th centuries. Second, and more importantly, these are events with
clearly defined outcomes that most people
are familiar with. They’re just super
useful examples of basic probability concepts. That’s why they’re used in stats classes around the world. Later on in the course, we will explore probability for more complex events like the ones
you’ll encounter in your future work as
a data professional. Let’s talk about the fundamental
concepts of probability. First, the probability
that an event will occur is expressed as a
number between 0 and 1. If the probability of
an event equals zero, there’s a zero percent chance
that the event will occur. If the probability of
an event equals one, there’s a 100 percent chance
that the event will occur. There are lots of possibilities
in between 0 and 1. If the probability of
an event equals 0.5, there is a 50
percent chance that the event will
occur or not occur. If the probability of an
event is close to zero, there’s a small chance
that the event will occur. If the probability of an
event is close to one, there’s a strong chance
that the event will occur. For example, if the chance of a stock price
going up this year is 0.05, or five percent, then you probably
don’t want to buy it. If it’s 0.95 or 95 percent, then it’s probably
a good investment. Probability measures the
likelihood of random events. The result of a random event cannot be predicted
with certainty. Before flipping a coin
or rolling a die, you do not know the outcome. The coin could turn
up heads or tails, and a die could show any
number one through six. These are examples of what statisticians call a
random experiment, also known as a
statistical experiment. A random experiment is a process whose outcome cannot be
predicted with certainty. All random experiments have
three things in common. The experiment can have more
than one possible outcome, you can represent each
possible outcome in advance, and the outcome of the
experiment depends on chance. Let’s take the example
of flipping a coin. There’s more than one
possible outcome. You can represent
each possible outcome in advance: heads or tails, and the outcome
depends on chance. Until you actually
toss the coin, you can’t know whether it
will be heads or tails, or think about rolling
a six-sided die. There’s more than one
possible outcome, and all outcomes can be
represented in advance, 1, 2, 3, 4, 5, and 6. The outcome of any roll
depends on chance. Until you roll the die, you can’t know which
number will turn up. To calculate the probability
of a random experiment, you divide the number
of desired outcomes by the total number
of possible outcomes. You may recall that this is also the formula for
classical probability. The probability of tossing a coin and getting heads
is one chance in two. This is 1 divided by 2
equals 0.5 or 50 percent. The probability of rolling
a die and getting two, is one chance out of six. This is 1 divided by 6, which equals 0.166 repeating, or
about 16.7 percent. Now, let’s conduct a
different random experiment. Imagine a jar
contains 10 marbles, two marbles are red, three are green,
and five are blue. You decide to take one
marble from the jar. You want to know the probability that the marble will be green. First, count the number
of possible outcomes. You have an equal
chance of choosing any one of the 10 marbles. Next, figure out how many of these outcomes refer to
what you want to know: the chance of choosing
a green marble. Of the 10 total marbles,
three are green. Therefore, the
probability of choosing a green marble is 3
out of 10, or 0.3. In other words, you have
a 30 percent chance of choosing a green marble. Now you know how to calculate the probability of a
single random event. This knowledge will be useful as a building block for more complex calculations
of probability.
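The marble experiment above can be worked through in a few lines of Python. This is a sketch with the jar contents hard-coded from the example:

```python
# Jar from the example: 2 red, 3 green, and 5 blue marbles
jar = ["red"] * 2 + ["green"] * 3 + ["blue"] * 5

# P(green) = number of green marbles / total number of marbles
p_green = jar.count("green") / len(jar)
print(p_green)  # 0.3
```

The same count-and-divide pattern applies to any event with equally likely outcomes.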
Reading: Fundamental concepts of probability
Recently, you learned that probability uses math to quantify uncertainty, or to describe the likelihood of something happening. For example, there might be an 80% chance of rain tomorrow, or a 20% chance that a certain candidate wins an election.
In this reading, you’ll learn more about fundamental concepts of probability. We’ll discuss the concept of a random experiment, how to represent and calculate the probability of an event, and basic probability notation.
Probability fundamentals
Foundational concepts: Random experiment, outcome, event
Let’s begin with three concepts at the foundation of probability theory:
- Random experiment
- Outcome
- Event
Probability deals with what statisticians call random experiments, also known as statistical experiments. A random experiment is a process whose outcome cannot be predicted with certainty.
For example, before tossing a coin or rolling a die, you can’t know the result of the toss or the roll. The result of the coin toss might be heads or tails. The result of the die roll might be 3 or 6.
All random experiments have three things in common:
- The experiment can have more than one possible outcome.
- You can represent each possible outcome in advance.
- The outcome of the experiment depends on chance.
In statistics, the result of a random experiment is called an outcome. For example, if you roll a die, there are six possible outcomes: 1, 2, 3, 4, 5, 6.
An event is a set of one or more outcomes. Using the example of rolling a die, an event might be rolling an even number. The event of rolling an even number consists of the outcomes 2, 4, 6. Or, the event of rolling an odd number consists of the outcomes 1, 3, 5.
In a random experiment, an event is assigned a probability. Let’s explore how to represent and calculate the probability of a random event.
The probability of an event
The probability that an event will occur is expressed as a number between 0 and 1. Probability can also be expressed as a percent.
- If the probability of an event equals 0, there is a 0% chance that the event will occur.
- If the probability of an event equals 1, there is a 100% chance that the event will occur.
There are different degrees of probability between 0 and 1. If the probability of an event is close to zero, say 0.05 or 5%, there is a small chance that the event will occur. If the probability of an event is close to 1, say 0.95 or 95%, there is a strong chance that the event will occur. If the probability of an event equals 0.5, there is a 50% chance that the event will occur—or not occur.
Knowing the probability of an event can help you make informed decisions in situations of uncertainty. For example, if the chance of rain tomorrow is 0.1 or 10%, you can feel confident about your plans for an outdoor picnic. However, if the chance of rain is 0.9 or 90%, you may want to think about rescheduling your picnic for another day.
Calculate the probability of an event
To calculate the probability of an event in which all possible outcomes are equally likely, you divide the number of desired outcomes by the total number of possible outcomes. You may recall that this is also the formula for classical probability:
# of desired outcomes ÷ total # of possible outcomes
Let’s explore the coin toss and die roll examples to get a better idea of how to calculate the probability of a single random event.
Example: Coin toss
Tossing a fair coin is a classic example of a random experiment:
- There is more than one possible outcome.
- You can represent each possible outcome in advance: heads or tails.
- The outcome depends on chance. The toss could turn up heads or tails.
Say you want to calculate the probability of getting heads on a single toss. For any given coin toss, the probability of getting heads is one chance out of two. This is 1 ÷ 2 = 0.5, or 50%.
Now imagine that you were to toss a specially designed coin that had heads on both sides. Every time you toss this coin it will turn up heads. In this case, the probability of getting heads is 100%. The probability of getting tails is 0%.
Note that when you say the probability of getting heads is 50%, you aren’t claiming that any actual sequence of coin tosses will result in exactly 50% heads. For example, if you toss a fair coin ten times, you may get 4 heads and 6 tails, or 7 heads and 3 tails. However, if you continue to toss the coin, you can expect the long-run frequency of heads to get closer and closer to 50%.
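The long-run behavior described here is easy to check with a quick simulation. The sketch below uses Python's random module with a fixed seed so the run is reproducible:

```python
import random

random.seed(42)  # fixed seed for reproducibility

def heads_frequency(num_tosses):
    """Simulate fair coin tosses and return the fraction that come up heads."""
    heads = sum(random.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# The observed frequency drifts toward 0.5 as the number of tosses grows
for n in (10, 1000, 100_000):
    print(n, heads_frequency(n))
```

With only 10 tosses the frequency can stray well away from 0.5, but at 100,000 tosses it lands very close, illustrating the long-run tendency described above.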
Example: Die roll
Rolling a six-sided die is another classic example of a random experiment:
- There is more than one possible outcome.
- You can represent all possible outcomes in advance: 1, 2, 3, 4, 5, and 6.
- The outcome depends on chance. The roll could turn up any number 1–6.
Say you want to calculate the probability of rolling a 3. For any given die roll, the probability of rolling a 3 is one chance out of six. This is 1 ÷ 6 = 0.1666, or about 16.7%.
Probability notation
It helps to be familiar with probability notation as it’s often used to symbolize concepts in educational and technical contexts.
In notation, the letter P indicates the probability of an event. The letters A and B represent individual events.
For example, if you’re dealing with two events, you can label one event A and the other event B.
- The probability of event A is written as P(A).
- The probability of event B is written as P(B).
- For any event A, 0 ≤ P(A) ≤ 1. In other words, the probability of any event A is always between 0 and 1.
- If P(A) > P(B), then event A has a higher chance of occurring than event B.
- If P(A) = P(B), then event A and event B are equally likely to occur.
Key takeaways
Data professionals use probability to help stakeholders make informed decisions about uncertain events. Your knowledge of fundamental concepts of probability will be useful as a building block for more complex calculations of probability.
Resources for more information
To learn more about fundamental concepts of probability, refer to the following resources:
- These lecture notes from Richland Community College provide a useful summary of the fundamental concepts and basic rules of probability.
Video: The basic rules of probability and events
Key Concepts
- Probability Notation: Using P(A) to represent the probability of event A. The complement (event not occurring) is denoted as P(A’).
- Complement Rule: The probability of an event occurring and its complement must add up to 1. In other words, P(A) + P(A’) = 1.
- Mutually Exclusive Events: Events that cannot happen simultaneously (e.g., rolling a 2 and a 4 on a single die roll).
- Addition Rule: For mutually exclusive events, the probability of either A or B happening is the sum of their individual probabilities: P(A or B) = P(A) + P(B).
- Independent Events: Events where the outcome of one doesn’t affect the other (e.g., a coin toss and the weather).
- Multiplication Rule: For independent events, the probability of both A and B happening is the product of their individual probabilities: P(A and B) = P(A) * P(B).
Important Distinctions
- Addition Rule: Use for mutually exclusive events (can’t happen at the same time).
- Multiplication Rule: Use for independent events (one doesn’t influence the other).
Examples
- Complement Rule: 30% chance of rain means a 70% chance it won’t rain.
- Addition Rule: Probability of rolling a 2 or 4 on a die: 1/6 + 1/6 = 1/3.
- Multiplication Rule: Probability of tails, then heads on two coin flips: 1/2 * 1/2 = 1/4.
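The three rules, applied to the examples above, can be verified in a few lines using Python's Fraction type for exact arithmetic. This is a sketch; the variable names are mine:

```python
from fractions import Fraction

# Complement rule: P(A') = 1 - P(A)
p_rain = Fraction(3, 10)       # 30% chance of rain
p_no_rain = 1 - p_rain
print(p_no_rain)               # 7/10

# Addition rule (mutually exclusive events): P(A or B) = P(A) + P(B)
p_two = Fraction(1, 6)
p_four = Fraction(1, 6)
print(p_two + p_four)          # 1/3

# Multiplication rule (independent events): P(A and B) = P(A) * P(B)
p_tails = Fraction(1, 2)
p_heads = Fraction(1, 2)
print(p_tails * p_heads)       # 1/4
```

Using fractions rather than floats keeps results like 1/3 exact instead of approximated.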
Fill in the blank: The addition rule states that, if the events A and B are ____, then the probability of A or B happening is the sum of the probabilities of A and B.
mutually exclusive
The addition rule states that, if the events A and B are mutually exclusive, then the probability of A or B happening is the sum of the probabilities of A and B. Two events are mutually exclusive if they cannot occur at the same time.
So far, we’ve been focusing on calculating the probability
of single events. Many situations, both in everyday life and
in data analytics, involve more than one event. As a future data professional, you’ll often deal with
probability for multiple events. In this video, we’ll cover three basic rules
of probability: the complement rule,
the addition rule, and the multiplication rule. These rules help you
better understand the probability of
multiple events. We’ll also discuss two
different types of events: mutually exclusive events
and independent events. Then you’ll learn
how to calculate probability for each of them. First, let’s discuss
probability notation, which is the standard way to symbolize probability concepts. As we go along, I’ll share some useful notations that will help us communicate
more efficiently when it comes to
basic probability. The letter P indicates the
probability of an event. For example, if you’re
dealing with two events, you can label one event A, and the other event B. The notation for the
probability of event A is the letter P followed by the
letter A in parentheses. For the probability of event B, it’s the letter P, followed by the letter B in parentheses. If you want to talk
about the probability of event A not occurring, add an apostrophe
after the letter A. You can also say this is
the probability of not A. Now, let’s check out
our first basic rule, the complement rule. In stats, the complement of an event is the
event not occurring. For example, either it rains or it does not rain. Either you win the lottery or you don’t win the lottery. The complement of
rain is no rain. The complement of
winning is not winning. The important thing to note is that the two probabilities, the probability of an
event happening and the probability of
it not happening, must add to one. Recall that a probability of
one is the same as saying there’s 100 percent certainty
of an event occurring. Another way to think
about it is that there is a 100 percent chance of one event or the other
event happening. There may be a 30 percent
chance of rain tomorrow, but there is a 100 percent
chance that it will either rain or not
rain tomorrow. The complement rule says that the probability
that event A does not occur is 1 minus the
probability of event A. For example, if the
weather forecast says there’s a 30 percent
chance of rain tomorrow, there’s a probability of 0.3. You can use the
complement rule to calculate the probability that
it does not rain tomorrow. The probability
of no rain equals 1 minus the probability of rain. This is 1 minus 0.3
equals 0.7 or 70 percent. Both the complement rule
and our next rule, the addition rule, apply to events that are
mutually exclusive. Two events are
mutually exclusive if they cannot occur
at the same time. For example, you can’t visit both Argentina and
China at the same time, or turn left and right
at the same time. The addition rule says that if the events A and B are
mutually exclusive, then the probability
of A or B happening is the sum of the
probabilities of A and B. Let’s check out an example
using a six-sided die. Say you want to find out
the probability of rolling either a two or a four on
a single roll of the die. These two events are
mutually exclusive. You can roll a two or a four, but not both at the same time. The addition rule
says that to find the probability of
either event happening, you should sum up
their probabilities. The odds of rolling any single
number on a six-sided die are 1/6. The probability of
rolling a two is 1/6, and the probability of
rolling a four is 1/6. 1/6 plus 1/6 equals 1/3. The probability of rolling
either a two or a four is 1/3 or 33 percent. The addition rule applies to
mutually exclusive events. If you want to calculate probability for
independent events, you can use the
multiplication rule. Two events are independent
if the occurrence of one event does not change the probability of
the other event. This means that
one event does not affect the outcome
of the other event. For example, checking
out a book from your local library does not
affect tomorrow’s weather. Drinking coffee in
the morning does not affect the delivery of your
mail in the afternoon. These events are separate
and independent. The multiplication
rule says that if the events A and B
are independent, then the probability of
both A and B happening is the probability of A multiplied
by the probability of B. For instance, imagine
two consecutive coin tosses. Say you want to know the
probability of tails on the first toss and heads
on the second toss. First, figure out what
events you’re dealing with and then apply
the appropriate rule. Two coin tosses are
independent events. The first toss does not affect the outcome
of the second toss. For any toss, the probability
of getting either heads or tails always remains
1/2 or 50 percent. You would use the multiplication
rule for this event. The probability of
getting tails and heads is the probability
of getting tails, multiplied by the
probability of getting heads. The probability of each
event is 0.5 or 50 percent. Now, plug in the
numbers, 0.5 times 0.5 equals 0.25 or 25 percent. The probability of getting
tails on the first toss and heads on the second
toss is 25 percent. To recap, let’s compare
the addition and multiplication rules and
list their differences. It will be helpful to keep
these differences in mind, so you know when to
use the two rules. The addition rule sums up
the probabilities of events, and the multiplication rule
multiplies the probabilities. The addition rule
applies to events that are mutually exclusive. The multiplication
rule applies to events that are independent. The basic rules of
probability help you describe events that are mutually
exclusive or independent. In an upcoming video, we’ll check out
conditional probability, which applies to
dependent events.
Reading: The probability of multiple events
So far, you’ve been learning about calculating the probability of single events. Many situations, both in daily life and in data work, involve more than one event. As a future data professional, you’ll often deal with probability for multiple events.
In this reading, you’ll learn more about multiple events. You’ll learn three basic rules of probability: the complement rule, the addition rule, and the multiplication rule. These rules help you better understand the probability of multiple events. First, we’ll discuss two different types of events that these rules apply to: mutually exclusive and independent. Then, you’ll learn how to calculate probability for both types of events.
Two types of events
The three basic rules of probability apply to different types of events. Both the complement rule and the addition rule apply to events that are mutually exclusive. The multiplication rule applies to independent events.
Mutually exclusive events
Two events are mutually exclusive if they cannot occur at the same time.
For example, you can’t be on the Earth and on the moon at the same time, or be sitting down and standing up at the same time.
Or, take two classic examples of probability theory. If you toss a coin, you cannot get heads and tails at the same time. If you roll a die, you cannot get a 2 and a 4 at the same time.
Independent events
Two events are independent if the occurrence of one event does not change the probability of the other event. This means that one event does not affect the outcome of the other event.
For example, watching a movie in the morning does not affect the weather in the afternoon. Listening to music on the radio does not affect the delivery of your new refrigerator. These events are separate and independent.
Or, take two consecutive coin tosses or two consecutive die rolls. Getting heads on the first toss does not affect the outcome of the second toss. For any given coin toss, the probability of any outcome is always 1 out of 2, or 50%. Getting a 2 on the first roll does not affect the outcome of the second roll. For any given die roll, the probability of any outcome is always 1 out of 6, or 16.7%.
Three basic rules
Now that you know more about the difference between mutually exclusive and independent events, let’s review three basic rules of probability:
- Complement rule
- Addition rule
- Multiplication rule
Complement rule
The complement rule deals with mutually exclusive events. In statistics, the complement of an event is the event not occurring. For example, either it snows or it does not snow. Either your soccer team wins the championship or it does not win the championship. The complement of snow is no snow. The complement of winning is not winning.
The probability of an event occurring and the probability of it not occurring must add up to 1. Recall that a probability of 1 is the same as 100%.
Another way to think about it is that there is a 100% chance of one event or the other event occurring. There may be a 40% chance of snow tomorrow. However, there is a 100% chance that it will either snow or not snow tomorrow.
The complement rule states that the probability that event A does not occur is 1 minus the probability of A. In probability notation, you can write this as:
P(A’) = 1 – P(A)
Note: In probability notation, an apostrophe (‘) symbolizes negation. In other words, if you want to indicate the probability of event A NOT occurring, add an apostrophe after the letter A: P(A’). You can say this as “the probability of not A.”
So, if you know there is a 40% chance of snow tomorrow, or a probability of 0.4, you can use the complement rule to calculate the probability that it does not snow tomorrow. The probability of no snow equals one minus the probability of snow.
P(no snow) = 1 – P(snow) = 1 – 0.4 = 0.6.
So, the probability of no snow tomorrow is 0.6, or 60%.
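You can verify this with a quick Python calculation. This is a sketch, not part of the course materials; the variable names are ours.

```python
# Complement rule: P(A') = 1 - P(A)
p_snow = 0.4            # P(snow) = 40%, from the example above
p_no_snow = 1 - p_snow  # probability that it does not snow

print(p_no_snow)  # 0.6, or 60%
```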
Addition rule (for mutually exclusive events)
The addition rule states that if events A and B are mutually exclusive, then the probability of A or B occurring is the sum of the probabilities of A and B. In probability notation, you can write this as:
P(A or B) = P(A) + P(B)
Note that there is also an addition rule for mutually inclusive events. In this course, we focus on the rule for mutually exclusive events.
Let’s explore our example of rolling a die.
Die roll (rolling either a 2 or a 4)
Say you want to find the probability of rolling either a 2 or a 4 on a single roll. These two events are mutually exclusive. You can roll a 2 or a 4, but not both at the same time.
The addition rule says that to find the probability of either event occurring, you sum up their probabilities. The odds of rolling any single number on a die are 1 out of 6, or 16.7%.
P(rolling 2 or rolling 4) = P(rolling 2) + P(rolling 4) = ⅙ + ⅙ = ⅓
So, the probability of rolling either a 2 or a 4 is one out of three, or 33%.
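Here is how you might compute this in Python, using the standard `fractions` module to keep exact values. This is a sketch; the variable names are ours.

```python
from fractions import Fraction

# Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
p_roll_2 = Fraction(1, 6)
p_roll_4 = Fraction(1, 6)
p_2_or_4 = p_roll_2 + p_roll_4

print(p_2_or_4)         # 1/3
print(float(p_2_or_4))  # about 0.333, or 33%
```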
Multiplication rule (for independent events)
The multiplication rule states that if events A and B are independent, then the probability of both A and B occurring is the probability of A multiplied by the probability of B. In probability notation, you can write this as:
P(A and B) = P(A)×P(B)
Note that there is also a multiplication rule for dependent events. In this course, we focus on the rule for independent events.
Let’s continue with our example of rolling a die.
Die roll (rolling a 1 and then rolling a 6)
Now imagine two consecutive die rolls. Say you want to know the probability of rolling a 1 and then rolling a 6. These are independent events as the first roll does not affect the outcome of the second roll.
The probability of rolling a 1 and then a 6 is the probability of rolling a 1 multiplied by the probability of rolling a 6. The probability of each event is ⅙, or 16.7%. You can write this as:
P(rolling 1 on the first roll and rolling 6 on the second roll) = P(rolling 1 on the first roll)×P(rolling 6 on the second roll) = ⅙×⅙ = 1/36
So, the probability of rolling a 1 and then a 6 is one out of thirty-six, or about 2.8%.
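The same calculation can be sketched in Python. As before, this is an illustration rather than part of the course materials, and the variable names are ours.

```python
from fractions import Fraction

# Multiplication rule for independent events: P(A and B) = P(A) * P(B)
p_roll_1 = Fraction(1, 6)
p_roll_6 = Fraction(1, 6)
p_1_then_6 = p_roll_1 * p_roll_6

print(p_1_then_6)         # 1/36
print(float(p_1_then_6))  # about 0.028, or 2.8%
```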
Key takeaways
The basic rules of probability help you describe events that are mutually exclusive or independent. Understanding basic rules of probability is an essential foundation for more complex analyses you will perform as a future data professional.
Resources for more information
To learn more about probability, refer to the following interactive guide: Seeing Theory
Practice Quiz: Test your knowledge: Basic concepts of probability
Objective probability is based on personal feeling, experience, or judgment.
False
Subjective probability is based on personal feeling, experience, or judgment. Objective probability is based on statistics, experiments, and mathematical measurements.
Fill in the blank: In statistics, a number between _____ is used to express the probability that an event will occur.
0 and 1
The probability that an event will occur is expressed as a number between 0 and 1. If the probability of an event equals 0, there is a 0% chance that the event will occur. If the probability of an event equals 1, there is a 100% chance that the event will occur.
The probability of no snow tomorrow equals 1 minus the probability of snow tomorrow. This is an example of what rule of probability?
Complement rule
This is an example of the complement rule, which states that the probability that event A does not occur is 1 minus the probability of A. In statistics, the complement of an event is the event not occurring.
Conditional probability
Video: Conditional probability
Conditional Probability
- Definition: The probability of one event happening given that another event has already occurred.
- Dependence: This applies when the first event influences the probability of the second event.
- Applications: Used in finance, insurance, science, machine learning, and everyday decision-making.
Examples of Dependence
- Needing internet to visit a website
- Needing a passport for international travel
- Drawing two aces in a row from a deck of cards (the first draw affects the second draw’s probability).
Conditional Probability Formula
- P(A and B) = P(A) * P(B given A)
- Where P(B given A) means the probability of event B happening if event A has already occurred.
Examples:
- Cards: Probability of drawing two aces in a row is very low (about 0.5%).
- College: Probability of being admitted and getting a scholarship is even lower (about 0.2%) due to the dependence of the events.
Key Takeaway: Conditional probability helps us understand relationships between dependent events, allowing for more accurate predictions.
Fill in the blank: Two events are _____ if the occurrence of one event changes the probability of the other event.
dependent
Two events are dependent if the occurrence of one event changes the probability of the other event.
So far, you’ve learned how
to calculate probability for a single event and for two
or more independent events. Remember, two events are independent if one
event does not affect the outcome
of the other event, like two coin flips. In this video, you’ll
learn how to calculate probability for two or
more dependent events. This type of probability is known as conditional
probability. Conditional
probability refers to the probability of an event occurring given that another
event has already occurred. Conditional probability is
used in many different fields, like finance, insurance,
science, and machine learning. For example, an agency that sells life
insurance might use conditional probability
to decide how risky it is to insure someone who
skydives for a living. Data professionals,
like those who work on machine
learning models, use conditional probability to make accurate predictions
about complex data sets. Before we get into calculating
conditional probability, let’s go over the
concept of dependence. Two events are dependent
if the occurrence of one event changes the
probability of the other event. This means that the
first event affects the outcome of the second event. For instance, if you
want to visit a website, you first need Internet access. Visiting a website depends on you having access
to the Internet. If you want to travel
to another country, you first need to
get a passport. Traveling to another country depends on you
having a passport. In each instance, we can
say that the second event is dependent on or conditional
on the first event. Let’s check out an
example of dependence that’s closer to
probability theory. Imagine you have two events. The first event is drawing an ace from a standard
deck of playing cards, and the second event is drawing another ace from the same deck. There are four aces in
a deck of 52 cards. For the first draw, the chance of getting
an ace is four out of 52, or about 7.7 percent. But for the second draw, the probability
of getting an ace changes because you’ve
removed a card from the deck. Now, there are three aces
in a deck of 51 cards. For the second draw, the chance of getting an ace is three out of 51, or about 5.9 percent. Getting an ace is
now less likely. These two events are dependent
because getting an ace on the first draw changes the probability of getting
an ace on the second draw. Now you have a better
understanding of dependent events. Let’s return to conditional probability and check
out the formula. You don’t need to memorize
the formula, but personally, I find that reviewing
the formula often helps me understand
the concept better. That’s why I’m
sharing it with you. The formula says that for
two dependent events, the probability of event A
and event B occurring equals the probability of event A times the probability of
event B given A. You may notice that we have a new notation in this formula, the vertical bar between
the letters B and A means that event B depends
on event A happening. We say this as the
probability of B given A. The formula can also
be expressed as the probability of
event B given event A equals the probability
that both A and B occur divided by the
probability of A. These are just two ways of representing the same equation. Depending on the situation or what information you
are given upfront, it may be easier to
use one or the other. We can apply the conditional
probability formula to our example of drawing
an ace from a deck of playing cards. The probability of
event A or getting an ace on the first
draw is four out of 52. The probability of
event B given event A, or of getting an ace
on the second draw, is three out of 51. Let’s enter these numbers
into the formula. The probability of
event A and event B, or of getting two aces in a row, is 4 over 52 multiplied
by three over 51. If you do the math, this
equals one over 221. The probability of
getting two aces in a row equals one over 221, or about 0.5 percent. Let’s check out another example. Imagine you’re
applying for college. The college accepts 10 out of every 100 applicants
or 10 percent. If you’re accepted, you also hope to receive an
academic scholarship. The college awards academic
scholarships to two out of every 100 accepted
students or two percent. You want to calculate the
probability that you get accepted and you
get a scholarship. Getting a scholarship depends
on first getting accepted. So this is a
conditional probability because it deals with
two dependent events. Let’s call getting accepted
event A, and getting a scholarship event B. You want to calculate
the probability of event A and event B. According to the formula, to find the probability
of event A and event B, you can multiply the
probability of event A by the probability of
event B given event A. The probability of event A, getting accepted
is 10 out of 100. The probability of
event B given event A, or getting a scholarship given that you are
first accepted, is two out of 100. Ten divided by 100
times 2 divided by 100 equals 1 divided by 500. The probability of getting accepted and getting
a scholarship is one out of 500 or 0.2 percent. Conditional probability
helps you better understand the relationship
between dependent events. As a data professional, I often use conditional
probability to predict how an event, like an ad campaign, will
impact sales revenue. Then I share my findings
with stakeholders so they can make more
informed business decisions.
Reading: Calculate conditional probability for dependent events
Recently, you learned that conditional probability refers to the probability of an event occurring given that another event has already occurred. Conditional probability allows you to describe the relationship between dependent events, or how the occurrence of the first event affects the likelihood of the second event.
In this reading, you’ll learn how to calculate conditional probability for two or more dependent events. Before we discuss calculating conditional probability, we’ll go over the concept of dependence.
Conditional probability
Previously, you calculated probability for a single event, and for two or more independent events, such as two consecutive coin flips. Conditional probability applies to two or more dependent events.
Dependent events
Earlier, you learned two events are independent if the first event does not affect the outcome of the second event, or change its probability. For example, two consecutive coin tosses are independent events. Getting heads on the first toss doesn’t affect the outcome of the second toss.
In contrast, two events are dependent if the occurrence of one event changes the probability of the other event. This means that the first event affects the outcome of the second event.
For instance, if you want to get a good grade on an exam, you first need to study the course material. Getting a good grade depends on studying. If you want to eat at a popular restaurant without waiting for a table, you have to arrive early. Avoiding a wait depends on arriving early. In each instance, you can say that the second event is dependent on, or conditional on, the first event.
Now that you have a better understanding of dependent events, let’s return to conditional probability and review the formula.
Formula for conditional probability
The formula says that for two dependent events A and B, the probability of event A and event B occurring equals the probability of event A occurring, multiplied by the probability of event B occurring, given event A has already occurred.
P(A and B) = P(A) * P(B|A)
In probability notation, the vertical bar between the letters B and A indicates dependence, or that the occurrence of event B depends on the occurrence of event A. You can say this as “the probability of B given A.”
The formula can also be expressed as the probability of event B given event A equals the probability that both A and B occur divided by the probability of A.
P(B|A) = P(A and B) / P(A)
These are just two ways of representing the same equation. Depending on the situation, or what information you are given up front, it may be easier to use one or the other.
Note: The conditional probability formula also applies to independent events. When A and B are independent events, P(B|A) = P(B). So, the formula becomes P(A and B) = P(A) * P(B). This formula is also the multiplication rule that you learned about earlier in the course.
Example: playing cards
Let’s explore an example of conditional probability that deals with a standard deck of 52 playing cards.
Imagine two events:
- The first event is drawing a heart from the deck of cards.
- The second event is drawing another heart from the same deck.
Say you want to find out the probability of drawing two hearts in a row. These two events are dependent because getting a heart on the first draw changes the probability of getting a heart on the second draw.
A standard deck includes four different suits: hearts, diamonds, spades, and clubs. Each suit has 13 cards. For the first draw, the chance of getting a heart is 13 out of 52, or 25%. For the second draw, the probability of getting a heart changes because you’ve already picked a heart on the first draw. Now, there are 12 hearts in a deck of 51 cards. For the second draw, the chance of getting a heart is 12 out of 51, or about 23.5%. Getting a heart is now less likely—the probability has gone from 25% to 23.5%.
Now, let’s apply the conditional probability formula:
P(A and B) = P(A) * P(B|A)
You want to calculate the probability of both event A and event B occurring. Let’s call event A 1st heart, which refers to getting a heart on the first draw. Let’s call event B 2nd heart, which refers to getting a heart on the second draw, given a heart was drawn the first time. The probability of event A is 13/52, or 25%. The probability of event B is 12/51, or 23.5%.
Let’s enter these numbers into the formula:
P(1st heart and 2nd heart) = P(1st heart) * P(2nd heart | 1st heart) = 13/52 * 12/51 = 1/17 = 0.0588, or about 5.9%
So, there is a 5.9% chance of drawing two hearts in a row from a standard deck of playing cards.
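The two-hearts calculation can be written out in Python. This is a sketch; the variable names are ours, and exact fractions are used so no precision is lost.

```python
from fractions import Fraction

# Conditional probability: P(A and B) = P(A) * P(B|A)
p_first_heart = Fraction(13, 52)               # P(1st heart)
p_second_heart_given_first = Fraction(12, 51)  # P(2nd heart | 1st heart)
p_two_hearts = p_first_heart * p_second_heart_given_first

print(p_two_hearts)         # 1/17
print(float(p_two_hearts))  # about 0.059, or 5.9%
```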
Example: online purchases
Let’s explore another example. Imagine you are a data professional working for an online retail store. You have data that tells you 20% of the customers who visit the store’s website make a purchase of $100 or more. If a customer spends $100, they are eligible to receive a free gift card. The store randomly awards gift cards to 10% of the customers who spend at least $100.
You want to calculate the probability that a customer spends $100 and receives a gift card. Receiving a gift card depends on first spending $100. So, this is a conditional probability because it deals with two dependent events.
Let’s apply the conditional probability formula:
P(A and B) = P(A) * P(B|A)
You want to calculate the probability of both event A and event B occurring. Let’s call event A $100 and event B gift card. The probability of event A is 0.2, or 20%. The probability of event B is 0.1, or 10%.
P($100 and gift card) = P($100) * P(gift card given $100) = 0.2 * 0.1 = 0.02, or 2%
So, the probability of a customer spending $100 or more and receiving a free gift card is 0.2 * 0.1 = 0.02, or 2%.
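You can also sanity-check this result with a short simulation. This is a sketch: the 20% and 10% rates come from the example above, and the variable names are ours.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
trials = 100_000
gift_cards = 0

for _ in range(trials):
    spends_100 = random.random() < 0.2  # P($100) = 20%
    # A gift card is only possible after a $100 purchase: P(gift card | $100) = 10%
    if spends_100 and random.random() < 0.1:
        gift_cards += 1

print(gift_cards / trials)  # close to 0.02, or 2%
```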
Key takeaways
Conditional probability helps you describe the relationship between dependent events. Data professionals often use conditional probability in a business context. For example, they might use conditional probability to predict how an event like a new ad campaign will impact sales revenue. This helps stakeholders make intelligent decisions about the best way to invest their company’s resources.
Resources for more information
To learn more about conditional probability, refer to the following resource:
Video: Discover Bayes’ theorem
Conditional Probability Recap
- Conditional probability is the probability of one event happening given another event has already happened (like the chance of getting a second ace after drawing one).
What is Bayes’ Theorem?
- A formula used to calculate conditional probability.
- Allows us to update our estimate of a probability based on new information.
Key Terms
- Prior Probability: The base probability of an event occurring before any new data.
- Posterior Probability: The revised probability of an event after considering new information.
Bayes’ Theorem Applications
- Widely Used: Finance, marketing, medical testing, artificial intelligence – all use Bayesian approaches.
- Example: A medical test’s accuracy can be refined using Bayes’ Theorem based on the patient’s age or other factors.
Example: Planning an Outdoor Event
- Prior Probability: Overall, there’s a 10% chance of rain.
- New Information: The morning is cloudy.
- Bayes’ Theorem: Used to update the chance of rain based on cloud data.
- Posterior Probability: Calculation yields a 12.5% chance of rain (a slight increase).
Key Takeaway
Bayes’ Theorem is a powerful tool to revise our understanding of probabilities as new information becomes available.
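The rain-given-clouds example above can be written out in Python. This is a sketch; the variable names are ours.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_rain = 0.10               # prior probability of rain
p_cloudy = 0.40             # probability of a cloudy morning
p_cloudy_given_rain = 0.50  # P(cloudy | rain)

# Posterior probability: P(rain | cloudy)
p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(p_rain_given_cloudy)  # 0.125, or 12.5%
```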
What does Bayes’ theorem enable data professionals to calculate?
Posterior probability
Bayes’ theorem enables data professionals to calculate posterior probability, or the updated probability of an event based on new data.
Earlier, you learned that conditional probability
refers to the probability of an event occurring given that another event
has already occurred. For example, when you draw an ace from a deck
of playing cards, this changes the probability of drawing a second ace
from the same deck. In this video,
you’ll learn how to calculate conditional probability
using Bayes’ theorem. Bayes’ theorem, also
known as Bayes’ rule, is a math formula for determining conditional
probability. It’s named after Thomas Bayes, an 18th century mathematician
from London, England. Bayes’ theorem provides a way to update the probability of an event based on new
information about the event. In Bayesian statistics,
prior probability refers to the probability of an event before new data is collected. Posterior probability is
the updated probability of an event based on new data. Posterior means occurring after. Posterior probability
is calculated by updating the prior probability
using Bayes’ theorem. For example, let’s say a medical condition
is related to age. You can use Bayes’ theorem
to more accurately determine the probability
that a person has the condition based on age. The prior probability would be the probability of a person
having the condition. The posterior or
updated probability would be the probability of a person having the condition if they’re in a
certain age group. Bayes’ theorem is the foundation for the field of
Bayesian statistics, also known as
Bayesian inference, which is a powerful
method for analyzing and interpreting data in
modern data analytics. Data professionals
applied Bayes’ theorem in a wide variety of fields from artificial
intelligence to medical testing. For instance,
financial institutions use Bayesian analysis to rate the risk of lending
money to borrowers or to predict the success
of an investment. Online retailers use Bayesian
algorithms to predict whether or not users will like certain products and services. Marketers rely on Bayes’
theorem for identifying positive or negative responses
in customer feedback. Let’s check out the
theorem itself. As always, don’t worry
about memorizing it. Bayes’ theorem is a bit complicated and this is
the basic version of it. Bayes’ theorem says that
for any two events A and B, the probability of A given B equals the probability
of A multiplied by the probability of B given A divided by the
probability of B. In math terms, prior probability is the probability of event A. Posterior probability or what you’re ultimately
trying to figure out is the probability of
event A given event B. The key for Bayes’
theorem is that it includes both the
conditional probability of B given A and the conditional
probability of A given B. If you know one of
these probabilities, Bayes’ theorem can help
you determine the other. Let’s check out an example. Say you’re planning
a big outdoor event like a graduation party. The success of the event
depends on good weather. On the day of the event, you notice that the
morning is cloudy. You want to find out
the chance of rain, given that this day
starts off cloudy. If there’s a high
probability of rain, you may decide to move the event indoors or even cancel it. You know the following
information: at this time of year, the overall chance of
rain is 10 percent. However, cloudy
mornings are common. About 40 percent of all
days start off cloudy and 50 percent of all rainy
days start off cloudy. In this example, your
prior probability is the overall probability
of a rainy day. New data will update
this probability, in this case, the knowledge that the morning is cloudy and
that rain may be coming. What you ultimately
want to find out is the probability that it will
rain given that it’s cloudy. This is your posterior
probability. You can use Bayes’ theorem to update the prior probability that it rains based on the new data that the
morning is cloudy. When you work with
Bayes’ theorem, it’s helpful to first figure out what event A is and
what event B is. This makes it easier
to understand the relationship between
events and use the formula. Let’s use the word rain
to refer to event A, the probability of rain. This is your prior probability. Event B is the probability
that the day will be cloudy. Let’s use the word cloudy
to refer to event B. Now, you can rewrite the
probability of event B given event A as the
probability that it’s cloudy, given that it rains. Finally, the
probability of event A, given event B is the
probability that it rains, given that it’s cloudy. This is your posterior
probability or the updated probability that Bayes’ theorem will
help you calculate. Finally, enter what you
know into the formula. The probability of
rain is 10 percent, the probability that it’s
cloudy is 40 percent, and the probability that it’s cloudy given that it rains
is 50 percent. The probability of rain given
that it’s cloudy equals 0.1 times 0.5, divided by 0.4. This equals 0.125
or 12.5 percent. There’s a 12.5 percent
chance of rain today. This is your posterior
probability or the updated probability based on the data that the
morning is cloudy. The odds are still
in your favor. You decide to proceed with your outdoor party.
Hope it’s a fun one.
Video: The expanded version of Bayes’s theorem
When to Use the Expanded Bayes’ Theorem
- Situations with Unknowns: The basic Bayes’ Theorem requires you to know the probability of event B. The expanded version is useful when you don’t have this information.
- Evaluating Tests: A common use case is determining the accuracy of tests (medical tests, spam filters, quality control checks) where false positives and false negatives are a concern.
Key Terms
- False Positive: A test wrongly indicates something is present when it’s not. (e.g., a spam filter misclassifying an email)
- False Negative: A test wrongly indicates something is absent when it’s present. (e.g., missing a defective part)
Example: Peanut Allergy Test
- Prior Probability: 1% of the population has the allergy.
- Test Accuracy:
- 95% chance of true positive (test positive if the allergy is present)
- 2% chance of false positive (test positive even if the allergy is absent)
- Posterior Probability (what we want): Given a positive test result, what’s the chance the allergy is actually present?
Using the Expanded Bayes’ Theorem By plugging in the known probabilities, we calculate that if the test is positive, there’s only a 32.4% chance the person actually has the allergy. This highlights the impact of false positives when the overall condition is rare.
Key Takeaway: The expanded Bayes’ Theorem helps us understand test accuracy more deeply when dealing with uncertainties and potential errors.
You’ve already
learned that Bayes’ theorem tells you how to update the probability of an event based on new data
about the event. But there are several
different versions of Bayes theorem. They’re written in
different ways and used for different
types of problems. In this video, you’ll learn
about an expanded version of Bayes theorem and
how to use it to predict the accuracy of a test. The expanded version of
Bayes theorem is long. If you’re not an
experienced statistician, it may seem quite intimidating. You don’t need to worry about
memorizing this formula. What’s important
to know is that the expanded version
works better than the basic version in
certain situations. The theorem goes like this. The probability of event A given event B equals the
probability of B given A, multiplied by the
probability of A divided by the
probability of B given A, times the probability of A plus the probability
of B given not A, multiplied by the
probability of not A. Well, that was a lot. You can use the two versions of Bayes’ theorem to deal with
different types of problems. Sometimes for instance, you don’t know the
probability of event B, which is in the denominator of the equation for the
basic Bayes’ theorem. In that case, you can use the expanded version
of Bayes’ theorem, because you don’t need to
know the probability of event B to use the
expanded version. This longer version of Bayes’
theorem is often used to evaluate tests such as
medical diagnostic tests, quality control tests or software tests such
as spam filters. When evaluating the
accuracy of a test, Bayes’ theorem can take into account the probability
for testing errors known as false
positives and false negatives. A false positive is a test
result that indicates something is present
when it really is not. For example, a spam filter may incorrectly identify a
legitimate email as spam. False positives are often discussed in connection with medical testing, but they also apply to other areas like software testing. For instance, antivirus software may indicate that a
computer file is a virus, even though the file is normal. A false negative is
a test result that indicates something is not
present when it really is. For example, a spam filter may incorrectly identify a spam
email as legitimate. False negatives also apply
to all kinds of tests. In manufacturing for instance, a quality control
test may incorrectly identify a defective part
as an acceptable part. Next, let’s explore
a detailed example of how to use the expanded Bayes'
theorem to evaluate a test. Let’s say you want to
evaluate the accuracy of a diagnostic test that checks for the presence of
a peanut allergy. Suppose that one percent of the population is
allergic to peanuts. Based on historical data, if a person has the allergy, there is a 95 percent chance
that the test is positive. If a person doesn’t
have the allergy, there is still a
two percent chance that the test is positive. This is a false positive, because it’s a
positive result for a person who does not
actually have the allergy. You want to know given that
a person tests positive, what are the chances that they
actually have the allergy? You can also think of
the situation in terms of prior and posterior
probability, which you learned about
earlier in connection with a basic version
of Bayes theorem. You start off with
the prior probability that a person has the
allergy, this is one percent. Then you’ll update
this prior probability with new data based on testing; the probability of getting true positive and false
positive test results. Finally, you want to figure out the posterior probability that the allergy is present given
that the test is positive. There are two main events
in this situation. First, actually
having the allergy. Second, testing positive. Let’s call having the allergy, Event A and testing
positive, Event B. Remember, these two
events are different, because you can test positive
and not have the allergy, which is a false positive. Now, let’s review what you know. First, there is the
probability that a person actually
has the allergy, which is one percent. The probability of Event
A equals one percent. Next, there is a 95
percent chance that a test is positive if the
person has the allergy. This is a conditional
probability for two dependent events. The probability of
a positive test given that the
allergy is present, so the probability of Event B given Event A equals 95 percent. Then there is the
false positive result. The two percent chance
that the test is positive given that the
allergy is not present. This is another
conditional probability; the probability of Event B
given not A equals 2 percent. Finally, if you use
the complement rule,
out one more probability. The probability of not
having the allergy. The complement rule says that the probability that
Event A does not occur, is 1 minus the
probability of Event A. If the probability of Event A, actually having the allergy
is one percent or 0.01, then the probability
of not having the allergy is 1 minus 0.01. This equals 0.99 or 99 percent, so the probability of
not A equals 99 percent. These are the
probabilities you know. What you don’t know is the
probability of Event B, the probability that a person gets a positive test result. This is where you’d have trouble using the basic version
of Bayes theorem, because the probability of Event B is part of the formula. Instead, you can use the
expanded version since you don’t need to know the probability of Event
B for that formula. Now you can enter what you
know into the formula. The probability of A is
one percent or 0.01. The probability of not A
equals 99 percent or 0.99. The probability of B given A
equals 95 percent or 0.95. The probability of B given not a equals two
percent or 0.02. If you do the math, the result
is 0.324 or 32.4 percent. The probability of
Event A given Event B, or the probability that the
allergy is present given that the test is positive
is 32.4 percent. If 32.4 percent
seems low to you, it’s because the allergy
is rare to begin with. It’s not very likely that
a random person will both test positive and
have the allergy. The expanded version
of Bayes theorem gives you a better understanding
of the accuracy of the test by taking into account multiple probabilities.
That’s all for now.
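The allergy calculation can be reproduced in a few lines of Python. This is a minimal sketch using illustrative variable names (they are not from the course materials); the probabilities are the ones given in the video.

```python
# Expanded Bayes' theorem:
# P(A|B) = P(B|A)*P(A) / (P(B|A)*P(A) + P(B|not A)*P(not A))
p_allergy = 0.01                 # P(A): prior, 1% of the population
p_no_allergy = 1 - p_allergy     # P(not A), from the complement rule
p_pos_given_allergy = 0.95       # P(B|A): true positive rate
p_pos_given_no_allergy = 0.02    # P(B|not A): false positive rate

numerator = p_pos_given_allergy * p_allergy
denominator = numerator + p_pos_given_no_allergy * p_no_allergy
p_allergy_given_pos = numerator / denominator
print(round(p_allergy_given_pos, 3))  # 0.324, or 32.4%
```

Notice that the denominator is just the two ways a positive test can happen (true positives plus false positives), which is why you don't need to know P(B) in advance.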
Reading: Calculate conditional probability with Bayes’s theorem
Reading
Recently, you learned that Bayes’s theorem is a math formula for determining conditional probability. The theorem is named after Thomas Bayes, an 18th-century mathematician from London, England. Recall that conditional probability refers to the probability of an event occurring given that another event has already occurred. For example, when you draw an ace from a deck of playing cards, this changes the probability of drawing a second ace from the same deck.
In this reading, you’ll learn more about the different parts of Bayes’s theorem, and how you can use the theorem to calculate conditional probability.
Bayes’s theorem
Bayes’s theorem provides a way to update the probability of an event based on new information about the event.
Posterior and prior probability
In Bayesian statistics, prior probability refers to the probability of an event before new data is collected. Posterior probability is the updated probability of an event based on new data.
Bayes’s theorem lets you calculate posterior probability by updating the prior probability based on your data.
For example, let’s say a medical condition is related to age. You can use Bayes’s theorem to more accurately determine the probability that a person has the condition based on age. The prior probability would be the probability of a person having the condition. The posterior, or updated, probability would be the probability of a person having the condition if they are in a certain age group.
The theorem
Let’s examine the theorem itself.
Bayes’s theorem states that for any two events A and B, the probability of A given B equals the probability of A multiplied by the probability of B given A divided by the probability of B.
Bayes’s theorem
\(P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}\)
In the theorem, prior probability is the probability of event A. Posterior probability, or what you’re trying to calculate, is the probability of event A given event B.
- P(A): Prior probability
- P(A|B): Posterior probability
Sometimes, statisticians and data professionals use the term “likelihood” to refer to the probability of event B given event A, and the term “evidence” to refer to the probability of event B.
- P(B|A): Likelihood
- P(B): Evidence
Using these terms, you can restate Bayes’s theorem as:
- Posterior = Likelihood * Prior / Evidence
![](https://i0.wp.com/stackfolio.xyz/wp-content/uploads/2024/03/bayes-theorem.png?resize=1024%2C491&ssl=1)
It can be helpful to think about the calculation from these different perspectives, which makes it easier to map your problem onto the equation.
One way to think about Bayes’s theorem is that it lets you transform a prior belief, P(A), into a posterior probability, P(A|B), using new data. The new data are the likelihood, P(B|A), and the evidence, P(B).
Note: This reading provides an introduction to the basic concepts and terms associated with Bayes’s theorem. A detailed examination of Bayesian statistics is beyond the scope of this course. As you progress in your career as a data professional, you’ll have the opportunity to further explore Bayes’s theorem and its various applications.
For now, a key point to remember is that Bayes’s theorem includes both the conditional probability of B given A and the conditional probability of A given B. If you know one of these probabilities, Bayes’s theorem can help you determine the other.
Let’s explore an example to get a better understanding of how the theorem works.
Example: spam filter
A well-known application of Bayes’s theorem in the digital world is spam filtering, or predicting whether an email is spam or not. In practice, a sophisticated spam filter deals with many different variables, including the content of the email, its title, whether it has an attachment, the domain type of the sender address (.edu or .org), and more. However, we can use a simplified version of a Bayesian spam filter for our example.
Let’s say you want to determine the probability that an email is spam given a specific word appears in the email. For this example, let’s use the word “money.”
You discover the following information:
- The probability of an email being spam is 20%.
- The probability that the word “money” appears in an email is 15%.
- The probability that the word “money” appears in a spam email is 40%.
In this example, your prior probability is the probability of an email being spam. Your posterior probability, or what you ultimately want to find out, is the probability that an email is spam given that it contains the word “money.” The new data you will use to update your prior probability is the probability that the word “money” appears in an email and the probability that the word “money” appears in a spam email.
When you work with Bayes’s theorem, it’s helpful to first figure out what event A is and what event B is—this makes it easier to understand the relationship between events and use the formula.
Let’s call event A a spam email and event B the appearance of the word “money” in an email. Now, you can re-write Bayes’s theorem using the word “spam” for event A and the word “money” for event B.
P(A|B) = P(B|A) * P(A) / P(B)
P (Spam | Money) = P(Money | Spam) * P(Spam) / P(Money)
You want to find out the following:
- P(Spam | Money), or posterior probability: the probability that an email is spam given that the word “money” appears in the email
Now, enter your data into the formula:
- P(Spam), or prior probability: the probability of an email being spam = 0.2, or 20%
- P(Money), or evidence: the probability that the word “money” appears in an email = 0.15, or 15%
- P(Money | Spam), or likelihood: the probability that the word “money” appears in an email given that the email is spam = 0.4, or 40%
P (Spam | Money) = P(Money | Spam) * P(Spam) / P(Money) = 0.4 * 0.2 / 0.15 = 0.53333, or about 53.3%.
So, the probability that an email is spam given that the email contains the word “money” is 53.3%.
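The spam calculation above can be checked with a short Python sketch. The variable names are illustrative; the probabilities are the ones stated in the example.

```python
# Bayes' theorem applied to the spam example:
# P(Spam | Money) = P(Money | Spam) * P(Spam) / P(Money)
p_spam = 0.20              # P(Spam): prior
p_money = 0.15             # P(Money): evidence
p_money_given_spam = 0.40  # P(Money | Spam): likelihood

p_spam_given_money = p_money_given_spam * p_spam / p_money
print(round(p_spam_given_money, 3))  # 0.533, or about 53.3%
```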
Key takeaways
Bayes’s theorem is the foundation for the field of Bayesian statistics, also known as Bayesian inference, which is a powerful method for analyzing and interpreting data in modern data analytics. Data professionals use Bayes’s theorem in a wide variety of fields, from artificial intelligence to medical testing.
Having a basic understanding of Bayes’s theorem will enable you to learn more about Bayesian statistics as you advance in your career as a data professional.
Resources for more information
To learn more about Bayes’s Theorem, refer to the following resource:
For an interesting discussion of the “prosecutor’s fallacy,” check out this page:
Practice Quiz: Test your knowledge: Conditional probability
What is conditional probability?
The probability of an event occurring given that another event has already occurred
Conditional probability refers to the probability of an event occurring given that another event has already occurred.
Suppose two events occur: The first event is drawing an ace from a standard deck of playing cards, and the second event is drawing another ace from the same deck. Note that the first ace is not reinserted into the deck after it is drawn. What term is used to describe these two events?
Dependent
These two events are described as dependent because drawing the first ace changes the probability of drawing the second ace. Two events are dependent if the occurrence of one event changes the probability of the other.
Fill in the blank: _____ probability is the updated probability of an event based on new data.
Posterior
Posterior probability is the updated probability of an event based on new data. It is calculated by updating the prior probability using Bayes’ theorem.
Discrete probability distributions
Video: Introduction to probability distributions
What are Probability Distributions?
- They describe how likely different outcomes of a random event are.
- Used to model data and find patterns within it.
- Example: Probability of a drug curing a disease, or the results of dice rolls.
Random Variables
- Represent the possible outcomes of an event.
- Types:
- Discrete: Countable values, often whole numbers (e.g., number of times a coin lands heads-up).
- Continuous: Measured values along a range (e.g., height, time, temperature) – infinite decimal possibilities.
Distributions for Each Type of Variable
- Discrete Probability Distributions:
- Describe probabilities for each specific outcome (e.g., the exact probability of rolling a 3 on a die is 1/6).
- Can be shown as tables or bar graphs (histograms).
- Continuous Probability Distributions:
- Describe probability of an outcome falling within a range of values (e.g., probability of a tree being between 15-16 ft).
- Can’t get the exact probability of a single value (it’s essentially zero).
- Visualized as curves, the most common being the bell curve (normal distribution).
Key Points
- To determine if a variable is discrete or continuous: Can you count the outcomes or do you need to measure them?
- Sample Space: All possible outcomes of an event.
Fill in the blank: A _____ random variable has a countable number of possible values.
discrete
A discrete random variable has a countable number of possible values.
So far we’ve covered a lot of key
concepts in basic probability. What you’ve learned about basic
probability will help you better understand probability distributions, our
main topic for this part of the course. In my job as a data professional,
I use probability distributions to model different kinds of data sets and to
identify significant patterns in my data. A probability distribution describes
the likelihood of the possible outcomes of a random event. Probability distributions
can represent the possible outcomes of simple random events. Like tossing a coin or rolling a die. They can also represent
more complex events. Like the probability of a new medication
successfully treating a medical condition. A random variable
represents the values for the possible outcomes of a random event. There are two types of random
variables: discrete and continuous. A discrete random variable has
a countable number of possible values. Often discrete variables are whole
numbers that can be counted. For example, if you roll a die five times you can count
the number of times the die lands on two. If you toss a coin five times you can
count the number of times it lands on heads. A continuous random variable takes all the
possible values in some range of numbers. When it comes to continuous variables, you’re dealing with decimal
values rather than whole numbers. For instance,
all the decimal values between one and two, such as 1.1, 1.12, 1.125 and so on. These values are not countable since there
is no limit to the possible number of decimal values between one and two. Typically these are decimal values
that can be measured such as height, weight, time or temperature. For example, if you measure
the height of a person or object, you can keep on making your
measurement more accurate. The height of a person could
be 70.2 inches, 70.23 inches, 70.237 inches, 70.2375 inches and so on. There is no limit to
the number of possible values. It’s not always immediately obvious if
a variable is discrete or continuous. To help choose between the two, you can
use the following general guidelines. If you can count the number of outcomes
you are working with a discrete random variable. For example, counting the number
of times a coin lands on heads. If you can measure the outcome, you are working with
a continuous random variable. For example, measuring the time it
takes for a person to run a marathon. Now that we’ve explored random variables, let’s return to the topic of probability
distributions, which describe the
value of the random variable. Discrete distributions represent
discrete random variables and continuous distributions represent
continuous random variables. Once you know the sample
space of a random variable, you can assign probabilities to
each of the possible values. In statistics you can use the term
sample space to describe the set of all possible values for
a random variable. For example, a single coin toss is a
random variable with two possible values. Heads and tails. So the sample space is heads and tails. If you roll a six sided die, you have
a random variable with six possible values or a sample space of one,
two, three, four, five and six. Let’s check out an example of
a discrete probability distribution. Take the familiar random
event of a single die roll. The sample space for a single die roll
is one, two, three, four, five and six. The probability of each
outcome is the same. One out of six or 16.7%. You can display a discrete probability
distribution as a table or a graph. The distribution table summarizes
the probability for each possible outcome. The top row list each
outcome of the die roll and the bottom list
the corresponding probability. The bar graph or histogram shows
the same probability distribution but in a different form. For a discrete probability the random
variable is plotted along the X axis and the corresponding probability
is plotted along the Y axis. In this case the X axis
represents each possible outcome of a single die roll one through six. The Y axis represents
the probability of each outcome. Continuous probability distributions and their graphs work a little differently
from discrete distributions. This is due to the difference between
discrete and continuous random variables. The probability distribution for
a discrete random variable can tell you the exact probability for
each possible value of the variable. For instance,
the probability of rolling a die and getting a three is one out of six or
about 16.7%. The probability distribution for
a continuous random variable can only tell you the probability that the variable
takes on a range of values. Let’s check out an example to learn more. A continuous random variable may have
an infinite number of possible values. Imagine you want to measure the height
of an Oak tree you picked at random from a nearby forest. In this example, the height of the tree
is a continuous random variable. The tree’s height could be say 15 ft or 15.2 ft or 15.2187 ft and so on. You can keep on adding another decimal
place to the measurement without limit. Now say you want to know the probability
that the height of the oak tree is exactly 15.2 ft. Because the height of the tree could
be any decimal value between the range of 15 ft and 16 ft. The probability that the tree is exactly
any single value is essentially zero. In this example you’ll need to use
a continuous probability distribution to tell you the probability that the height
of the oak tree is in a certain range or interval. Such as between 15 ft and 16 ft. The probability of any
specific value is zero, so it only makes sense to talk about
the probabilities of intervals. A convenient way to show
the probabilities of a range or interval of values is with a curve. On a graph,
continuous distributions appear as curves. You may have heard of the bell curve,
which refers to the graph for a continuous distribution
called the normal distribution. On the curve the X axis refers to the
value of the variable you’re measuring, in this case oak tree height. The Y axis refers to something
called probability density. This is a math function that deals
with the values of intervals. You don’t need to focus on
the math part right now, just know that probability density is
not the same thing as probability. There’s a lot more to learn about
probability distributions and how they can help you model
different kinds of data. These topics are complex, so feel free to revisit the video
to go over this part again.
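As a sketch of the die-roll example, a discrete probability distribution can be represented in Python as a simple mapping from each outcome in the sample space to its probability. The names here are illustrative, not from the course materials.

```python
from fractions import Fraction

# Discrete distribution for a single die roll: each outcome in the
# sample space {1, 2, 3, 4, 5, 6} has probability 1/6.
die_distribution = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# The probabilities of a valid distribution must sum to 1.
assert sum(die_distribution.values()) == 1

# Probability of rolling a 3: 1/6, or about 16.7%
print(float(die_distribution[3]))
```

A continuous random variable has no such table: its probabilities are defined only over intervals, which is why its distribution is drawn as a curve instead.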
Video: The binomial distribution
What is the Binomial Distribution?
- Models the probability of events with two outcomes: success or failure.
- Used in fields like medicine, finance, and machine learning.
Key Requirements for a Binomial Experiment
- Fixed Number of Trials: The experiment is repeated a set number of times (e.g., 10 coin flips).
- Two Outcomes: Each trial has only ‘success’ or ‘failure’ (the labels are up to you).
- Consistent Success Probability: The chance of ‘success’ is the same for every trial.
- Independence: One trial’s outcome doesn’t affect the others.
Examples
- Coin flips
- Percentage of customers making a return
- Machine learning image classification (cat or not cat)
Why it Matters
- If your data fits a binomial experiment, you can use the binomial distribution formula to calculate probabilities.
- Example: What’s the probability of getting 2 ‘heads’ out of 10 coin flips?
Key Takeaway: Understanding the binomial distribution helps you model and predict outcomes in a wide range of situations.
Fill in the blank: The binomial distribution models the probability of events with _____ possible outcomes.
two
The binomial distribution models the probability of events with two possible outcomes.
Recently, you learned about discrete probability
distributions, which represent
discrete random events like tossing a coin
or rolling a die. Often, the outcomes of
discrete events are expressed as whole numbers
that can be counted. For example, the number of times a coin lands on
heads in 10 tosses. In this video, you’ll
learn about one of the most widely used discrete
probability distributions, the binomial distribution. The binomial distribution is a discrete distribution
that models the probability of events with only
two possible outcomes, success or failure. This definition assumes
that each event is independent or does not affect the probability
of the others, and that the probability of success is the same
for each event. For example, the binomial
distribution applies to an event like tossing the
same coin 10 times in a row. Keep in mind that
success and failure are labels used for convenience. For example, each toss has only two possible
outcomes, heads or tails. You could choose to label
either heads or tails as a successful outcome based on the needs
of your analysis. Whatever label you
apply to the outcomes, it’s important to know that they must be mutually exclusive. As a quick refresher, two outcomes are mutually exclusive if they cannot
occur at the same time. You can’t get both heads and
tails in a single coin toss. It’s either one or the other. Data professionals use the
binomial distribution to model data in different
fields such as medicine, banking, investing,
and machine learning. For example, data professionals use the binomial distribution to model the probability that a new medication
generates side effects, a credit card transaction
is fraudulent, or a stock price rises
or falls in value. In machine learning, the
binomial distribution is often used to classify data. For example, a data professional may
train an algorithm to recognize whether
a digital image of an animal is or is not a cat. The binomial distribution
represents a type of random event called a
binomial experiment. A binomial experiment is a
type of random experiment. You may recall that a
random experiment is a process whose outcome cannot be predicted
with certainty. All random experiments have
three things in common. The experiment can have more
than one possible outcome. You can represent each
possible outcome in advance, and the outcome of the
experiment depends on chance. On the other hand, a
binomial experiment has the following attributes. The experiment consists of a
number of repeated trials. Each trial has only
two possible outcomes. The probability of success
is the same for each trial, and each trial is independent. An example of a
binomial experiment is tossing a coin
10 times in a row. This is a binomial experiment because it has the
following features. The experiment consists of 10 repeated trials
or coin tosses. Each trial has only two possible outcomes, heads or tails. The probability of success
is the same for each trial. If you define success as heads, then the probability
of success for each toss is the
same, 50 percent. Each trial is independent. The outcome of one
coin toss does not affect the outcome of
any other coin toss. Let’s check out another example
of a binomial experiment. Suppose you want to
know how many customers return an item to a department
store on a given day. Say 100 customers visit
the store each day. Ten percent of all customers who visit the store make a return. You label a return as a success. This is a binomial
experiment because there are 100 repeated trials
or customer visits. Each trial only has
two possible outcomes, return or not return. If you label return a success, the probability of success for each customer visit is
the same, 10 percent. Each trial is independent. The outcome of one customer
visit does not affect the outcome of any
other customer visit. It’s important to understand the features of a binomial
experiment because the binomial distribution can only model data for
this type of event. If you’re working with data for a different type of event, you need to use a different type of probability distribution, like the Poisson
to model the data. Once you’ve determined that your distribution is binomial, you can apply the binomial
distribution formula to calculate the probability. No need to memorize it. You can use your computer
to make the calculations. If you want to learn more, feel free to check out the
relevant reading. In brief, the binomial distribution formula
helps you determine the probability of getting
a certain number of successful outcomes in a
certain number of trials. For example, getting
a certain number of heads in a certain
number of coin flips. In this formula, k refers
to the number of successes, n refers to the
number of trials, p refers to the probability
of success on a given trial, and n choose k refers
to the number of ways to obtain k
successes in n trials. Let’s explore our
departments, for example, to better understand
how the formula works. This time, suppose 10 percent of all customers who visit
the store make a return. Imagine that three
customers visit the store. You label a return as a success. You can use the formula to determine the probability
of getting 0, 1, 2, and 3 returns among
the three customers. In the calculation, X refers
to the number of returns. I'll skip the calculations and go directly to the results. If you plug in the values for
the probability that X equals 0 returns, the result is 0.729.
that X equals 1 return, the result is 0.243. For the probability that
X equals 2 returns, the result is 0.027. For the probability that
X equals 3 returns, the result is 0.001. You can then use a histogram to visualize this
probability distribution. For a discrete
probability distribution, like the binomial distribution, the random variable is
plotted along the x-axis and the corresponding probability is plotted along the y-axis. In this case, the x-axis
shows the number of returns: 0, 1, 2, and 3. The y-axis shows the probability
of getting each result. The binomial distribution lets you model the probability of events with only two
possible outcomes, success or failure. Identifying the distribution of your data is a key step in any analysis and helps you make informed predictions
about future outcomes.
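The binomial probabilities from the department store example can be reproduced with a short Python sketch of the formula described above. The helper `binom_pmf` is illustrative, not a course function; the numbers match the video's results.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p (the binomial formula)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Department store example: 3 customer visits, each with a 10%
# chance of a return (a return is labeled a "success").
for k in range(4):
    print(k, round(binom_pmf(k, n=3, p=0.1), 3))
# 0 -> 0.729, 1 -> 0.243, 2 -> 0.027, 3 -> 0.001
```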
Video: The Poisson distribution
The Poisson Distribution
- Models the probability of a specific number of events happening within a defined time period.
- Examples:
- Customer calls per hour at a call center
- Website visitors per hour
- Severe storms per month in a city
Prerequisites for a Poisson Experiment
- Events can be counted
- You know the average number of events in a set time period.
- Events are independent (one doesn’t affect another’s chance of happening).
Example: Fast Food Orders
- A restaurant averages 2 drive-through orders per minute.
- The Poisson distribution can calculate the probability of getting 0, 1, 2, 3… orders in a given minute.
- This helps with staff planning.
Formula: (It gets a little mathy, the key is understanding the concept!)
- Lambda (λ) = Average number of events in the time period
- k = Number of events you’re interested in
- e = Mathematical constant (~2.71828)
- ! = Factorial function
Poisson vs. Binomial
- Poisson: You know the average rate of events per time, and want to find the probability of a specific number of events within that time.
- Binomial: You know the exact probability of a single event and want to find the probability of it happening a certain number of times in repeated trials.
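To make the formula concrete, here is a minimal Python sketch applied to the fast-food example (an average of 2 orders per minute). The helper `poisson_pmf` is illustrative, not a course function.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events in a time period where the
    average number of events is lam (lambda)."""
    return lam**k * exp(-lam) / factorial(k)

# Drive-through example: an average of 2 orders per minute.
for k in range(4):
    print(k, round(poisson_pmf(k, lam=2), 3))
# 0 -> 0.135, 1 -> 0.271, 2 -> 0.271, 3 -> 0.18
```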
The Poisson distribution can model the probability that a certain number of events will occur during a specific time period.
True
The Poisson distribution can model the probability that a certain number of events will occur during a specific time period.
As a data professional, knowing about probability
distributions is super useful because different types of distributions help you model different
kinds of data. Every time I work
with a new dataset, I try to understand if there is a pattern present in
the distribution of the data. Knowing the probability
distribution of my data also helps me choose the machine learning model that works best. This way, I’m able to get a
better result in less time. Data professionals work with many different types of
probability distributions. As you advance in your career
and continue to learn, you can explore
different distributions and discover how they
apply to your work. In this part of the course, we’re focusing on two of the most common discrete
probability distributions, the Binomial and the Poisson. Earlier, you learned that the binomial distribution
represents experiments with repeated trials that each have two possible outcomes,
success or failure. In this video,
you’ll learn about the main features of the
Poisson distribution. The Poisson distribution is a probability
distribution that models the probability that
a certain number of events will occur during
a specific time period. The Poisson distribution
can also be used to represent the number
of events that occur in a specific space, such as a distance,
area, or volume, but in this course
we’ll focus on time. The French mathematician Baron Siméon Denis Poisson originally derived the Poisson distribution in 1830. He developed the distribution to describe the number of times a gambler would win in a difficult game of chance in a large
number of tries. Data professionals use the Poisson distribution
to model data, such as the expected number of calls per hour for a customer
service call center, visitors per hour for a website, customers per day
at a restaurant, and severe storms
per month in a city. The Poisson distribution
represents a type of random experiment called
a Poisson experiment. A Poisson experiment has
the following attributes: the number of events in the
experiment can be counted. The mean number of events
that occurred during a specific time period is known, and each event
is independent. Let’s explore an example. Imagine you’re a data
professional working for a large restaurant chain
that serves fast food. You know that the drive-through
service at a restaurant receives an average of
two orders per minute. You want to determine the
probability that a restaurant will receive a certain number of orders in a given minute. This is a Poisson experiment because the number of events in the experiment
can be counted. You can count the
number of orders. The mean number of events that occur during a specific
time period is known. There is an average of
two orders per minute. Each outcome is independent. The probability of one person
placing an order does not affect the probability of another person placing an order. Once you know that
you’re working with the Poisson distribution, you can apply the Poisson
distribution formula to calculate the probability. In brief, the formula helps
you determine the probability of a certain number of events occurring during a
specific time period. In this formula, the Greek
letter lambda refers to the mean number of events that occur during a
specific time period. k refers to the
number of events. e is a constant equal to
approximately 2.71828. The exclamation point
stands for factorial, a function that
multiplies a number by every whole number
below it down to one. For example, two factorial
is two multiplied by one. Let’s continue with our
restaurant chain example to better understand
how the formula works. Recall that the
drive-through service at a restaurant receives an average of two orders per minute. You can use the Poisson
formula to determine the probability of the
restaurant receiving 0, 1, 2 or 3 orders
in a given minute. Knowing this
information may help the restaurant organize staffing for the drive-through. I’ll skip the calculations and go directly to the results. If you plug in the values for
the probability that X equals 0 orders, the result is 0.1353. For the probability that
X equals one order, the result is 0.2707. For the probability that
X equals two orders, again, the result is 0.2707. For the probability that
X equals three orders, the result is 0.1805. You can then use a histogram to visualize the probability
distribution. The x-axis shows the
number of events, in this case, orders per minute. The y-axis shows the
probability of occurrence. For example, the probability of getting 0 orders
in a minute is about 0.1353 or 13.5 percent. The probability of one order
is 0.2707 or 27.07 percent. The probability of
two orders is also 0.2707 or 27.07 percent. The probability of
three orders is 0.1805 or 18.05 percent. Before we finish up, let’s compare the two discrete
probability distributions you recently learned about, the binomial and the Poisson. Sometimes it can be challenging to figure
out if you should use a binomial distribution
or a Poisson distribution. To help you choose
between the two, you can use the following
general guidelines. Use the Poisson distribution
if you are given the average probability
of an event happening for a
specific time period. You want to find out
the probability of a certain number of events
happening in that time period. For example, if a call center averages 10 customer service
calls per hour, you can use the
Poisson distribution to find the probability of getting 12 calls between
2:00 P.M. and 3:00 P.M. Use the binomial
distribution if you are given the exact probability
of an event happening, and you want to find out the
probability of the event happening a certain number of
times in a repeated trial. For example, if the
probability of getting heads for any coin
toss is 50 percent, you can use the binomial
distribution to find the probability of getting
8 heads in 10 coin tosses. That’s all for discrete
probability distributions. In your future career
as a data professional, you’ll use discrete
distributions like the binomial
and the Poisson to better understand
your data and make informed predictions
about future outcomes.
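The probability histogram described in this video can be reproduced with matplotlib. A minimal sketch (the output filename and the range of k values are illustrative choices):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
from scipy import stats

lam = 2  # average orders per minute
k_values = list(range(8))
probabilities = [stats.poisson.pmf(k, lam) for k in k_values]

# x-axis: number of events (orders per minute); y-axis: probability of occurrence
plt.bar(k_values, probabilities)
plt.xlabel("Orders per minute")
plt.ylabel("Probability")
plt.title("Poisson distribution (lambda = 2)")
plt.savefig("poisson_orders.png")  # use plt.show() in an interactive session
```

Because the distribution is discrete, a bar chart over the counted outcomes is all that is needed; the bars for 1 and 2 orders come out equal, as in the video.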
Reading: Discrete probability distributions
Practice Quiz: Test your knowledge: Discrete probability distributions
Which of the following statements describe continuous random variables? Select all that apply.
Continuous random variables are typically decimal values.
Continuous random variables take all the possible values in some range of numbers. Typically, these are decimal values that can be measured, such as height, weight, or time.
Continuous random variables take all the possible values in some range of numbers.
Continuous random variables take all the possible values in some range of numbers. Typically, these are decimal values that can be measured, such as height, weight, or time.
What probability distribution represents experiments with repeated trials that each have two possible outcomes: success or failure?
The binomial distribution
The binomial distribution represents experiments with repeated trials that each have two possible outcomes: success or failure.
Continuous probability distributions
Video: The normal distribution
From Discrete to Continuous
- Discrete: Outcomes are whole numbers (like rolling 2 or 3 on a die).
- Continuous: Outcomes can take on any decimal value within a range (like time, height, or temperature).
Introducing the Normal Distribution (a.k.a. the Bell Curve)
- Key Features:
- Bell-shaped curve
- Symmetrical around the mean (center)
- Total area under the curve equals 1 (100% of possible outcomes)
- Why It Matters: Many real-world datasets follow this pattern (e.g., test scores, heights, salaries), making it essential in statistics and machine learning.
Example: Honeycrisp Apples
- Assume weights are normally distributed with a mean of 100 grams and a standard deviation of 15 grams.
- Key Points Illustrated:
- Mean is at the peak of the curve (most likely weight).
- Symmetry: 50% of apples are heavier than the mean, 50% are lighter.
- Farther from the mean = less likely weights.
Standard Deviations and the Empirical Rule
- Standard Deviation: Measures how spread out the data is around the mean.
- Empirical Rule: For normal distributions…
- 68% of data is within 1 standard deviation of the mean.
- 95% is within 2 standard deviations.
- 99.7% is within 3 standard deviations.
Applications of the Empirical Rule
- Estimating Data: Quickly understand how your data is distributed.
- Outlier Detection: Values beyond 3 standard deviations might be errors.
Example: Plant Heights
- Knowing the normal distribution and the empirical rule helps you determine the percentage of plants meeting your landscape design criteria.
Key Takeaway: The normal distribution is a powerful tool for understanding and analyzing continuous data throughout various fields.
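As a sketch of the plant-height example (mean 32.1 inches, standard deviation 2.2 inches), the percentage of plants taller than 29.9 inches can also be computed exactly with `scipy.stats.norm` rather than estimated with the empirical rule:

```python
from scipy import stats

mean, sd = 32.1, 2.2  # plant heights, in inches
threshold = 29.9      # exactly one standard deviation below the mean

# Survival function sf(x) = 1 - cdf(x): probability of a height above x
taller = stats.norm.sf(threshold, loc=mean, scale=sd)
print(f"{taller:.1%}")  # 84.1%, close to the empirical-rule estimate of 84%
```

The small difference (84.1% vs. 84%) comes from the empirical rule's rounding of 68.27% to 68%.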
What shape is the graph of a normal distribution?
Bell-shaped
The normal distribution is a continuous probability distribution that is symmetrical on both sides of the mean and bell-shaped. It is often called the bell curve because its graph has the shape of a bell, with a peak at the center and two downward sloping sides.
So far, we’ve been talking about discrete
probability distributions where the outcomes
of experiments are represented by countable
whole numbers. For example, rolling a die
can result in a two or three, but not decimal values
such as 2.178 or 3.394. Now, we’ll move from discrete to continuous probability
distributions. Recall that continuous
probability deals with outcomes that can take on all the
values in a range of numbers. Typically, these are decimal values that can be
measured such as: height, weight, time, or temperature. For example, you can keep
on measuring time with more accuracy: 1.1 seconds, 1.12 seconds, 1.1257
seconds, and so on. In this video, we’ll explore the most widely used
probability distribution in statistics, the
normal distribution. The normal distribution is a continuous probability
distribution that is symmetrical on both sides of the
mean and bell-shaped. The normal distribution
is often called the bell curve because its
graph has the shape of a bell with a peak at the center and two
downward-sloping sides. It is also known as the
Gaussian distribution after the German mathematician
Carl Gauss who first described the formula
for the distribution. While we’re on the
subject of formulas, if you want to learn
more about this formula, please check out the
relevant reading where it’s discussed
in more detail. The normal distribution is the most common probability
distribution in statistics because so many
different kinds of data sets display a
bell-shaped curve. For example, if you
randomly sample 100 people you’ll discover a normal distribution curve
for continuous variables, such as: height, weight, blood pressure, IQ scores,
salaries, and more. For example, think of the typical results of
standardized tests. The majority of
people will score close to the average
score or mean. Fewer numbers of people
will score below or above average farther
out from the mean. A very small percentage
of people will score extremely high or extremely low, very far away from the mean. This distribution of scores
generates a bell curve. Most of the data values are
relatively close to the mean. The farther a value is
away from the mean, the less likely it is to occur. On a normal curve the x-axis refers to the value of
the variable you’re measuring and the y-axis refers to how likely you are
to observe the value. In the case of test
scores, the x-axis is the raw score and the y-axis is the percentage of the population
that gets that score. Data professionals use the
normal distribution to model all kinds of different data sets in
the fields of business, science, government, machine
learning, and others. Understanding the
normal distribution is also important for more advanced statistical
methods such as hypothesis testing and
regression analysis which you’ll learn
more about later. Plus many machine
learning algorithms assume that data is
normally distributed. All normal distributions have the following features:
the shape is a bell curve, the mean is located at
the center of the curve, the curve is symmetrical on
both sides of the center, and the total area under
the curve equals 1. To clarify the features of
the normal distribution, let’s graph the weights of Honeycrisp apples. Assume that the weights of Honeycrisp apples are approximately normally
distributed with a mean of 100 grams and a standard
deviation of 15 grams. First, find the mean at
the center of the curve. This is also the highest
point or peak of the curve. This data point represents the most probable
outcome in the data set, the mean weight of 100 grams. Second, notice that the curve is symmetrical on each
side of the mean. Fifty percent of the
data is above the mean, and 50 percent is
below the mean. Third, the farther a point
is away from the mean, the lower the probability
of those outcomes. The points farthest
from the mean represent the least probable
outcomes in the data set. These are apples that have
more extreme weights, either low or high. Finally, the area under
the curve is equal to 1. This means that the area
under the curve accounts for 100 percent of the possible
outcomes in the distribution. On a normal distribution, the distance of a data
point from the mean is often measured in
standard deviations. As a refresher, the standard
deviation calculates the typical distance of a data point from the
mean of your data set. While the mean refers to
the center of your data, the standard deviation
measures spread. As standard deviations
become larger, data values become more
spread out from the mean. In our apple example,
the mean weight is 100 grams and the standard
deviation is 15 grams, so an apple that is
one standard deviation above the mean will
weigh 115 grams: the mean
weight of 100 grams plus the standard
deviation of 15 grams. An apple that is one
standard deviation below the mean will weigh 85 grams or 100 grams
minus 15 grams. An apple that’s two
standard deviations above the mean will
weigh 130 grams, and an apple that’s two
standard deviations below the mean will
weigh 70 grams. The values on a normal
curve are distributed in a regular pattern based on
their distance from the mean. This is known as
the empirical rule. It states that for a given data set with a normal distribution, 68 percent of values fall within one standard
deviation of the mean, 95 percent of values fall within two standard
deviations of the mean, and the 99.7 percent of values fall within three
standard deviations of the mean. The empirical rule can give you a clear idea of
how the values in your data set are
distributed which helps you save time and better
understand your data. Let’s continue with
our apple example. The empirical rule
tells you that most apples, or 68 percent, will fall within one
standard deviation of the mean weight of 100 grams. This means that 68 percent
of apples will weigh between 85 grams which is one standard deviation
below the mean, and 115 grams, one standard deviation
above the mean. Ninety-five percent
of apples will weigh between 70 grams and 130 grams or within two standard
deviations from the mean. Almost all apples or
99.7 percent will weigh between 55 grams
and 145 grams, or within three standard
deviations of the mean. The empirical rule is useful for estimating data, especially for large data sets like height and weight data for an
entire population. You can use the
empirical rule to get an initial estimate of
the distribution of values in your data set such as what percentage of values
will fall within one, two, or three standard
deviations of the mean. This saves time and helps
you better understand your data. Plus, knowing the location of your values on a normal distribution is
useful for detecting outliers. Recall that an outlier
is a value that differs significantly from
the rest of the data. Typically, data professionals
consider values that lie more than three
standard deviations below or above the
mean to be outliers. It’s important to
identify outliers because some extreme values
may be due to errors in data collection
or data processing. These false values may skew
the results of your analysis. Let’s explore another
example of how the empirical rule can help you better
understand your data. Imagine you have a garden. The heights of your plants are normally distributed
with a mean of 32.1 inches and a standard
deviation of 2.2 inches. Let’s say you want to find
out what percentage of plants are greater than
29.9 inches tall. You want your plants to be
at least this tall as part of your landscape design
plan for your backyard. First, find out where the value 29.9 is located on
the distribution. Twenty-nine point nine is
located one standard deviation below the mean. The empirical rule tells
you that 68 percent of values fall within one standard
deviation of the mean. Half of these values or 34
percent fall below the mean. Now you know that 34 percent
of values are between 29.9 and the mean because 29.9 is one standard
deviation below the mean. Plus, 50 percent of all values in a normal distribution fall above the mean, or
center of the curve. The sum of these two
percentages will tell you the overall percentage of
values greater than 29.9. 34 percent plus 50 percent
equals 84 percent, so 84 percent of your plants are greater than 29.9 inches tall. The empirical rule
helps you quickly understand the overall
distribution of your data values. Now you know that most
of your plants are tall enough for your
landscape design plan. As a future data professional, you’ll use the normal
distribution to identify significant patterns in a
wide variety of data sets.
Reading: Model data with the normal distribution
Categorize: Probability distributions
Reading
![](https://i0.wp.com/stackfolio.xyz/wp-content/uploads/2024/03/Categorize_-Probability-distributions.png?resize=1024%2C829&ssl=1)
Video: Standardize data using z-scores
What is a Z-Score?
- Definition: A z-score tells you how many standard deviations a specific data point is away from the mean of a normally distributed dataset.
- Significance: It standardizes data, allowing you to compare values from different datasets that might have different scales or units.
Interpreting Z-Scores
- Z = 0: The data point is equal to the mean.
- Z > 0: The data point is above the mean.
- Z < 0: The data point is below the mean.
Why Z-Scores are Useful
- Comparing Datasets: Even with different scales, z-scores let you compare how individual points perform across different datasets.
- Anomaly Detection: Z-scores help find unusual data points (outliers) that might signal fraud, errors, etc.
Z-Score Formula
Z = (x - μ) / σ
Where:
- x = Data point (raw score)
- μ = Population mean
- σ = Population standard deviation
Example
You score 133 on a test with a mean of 100 and a standard deviation of 15. Your z-score is:
(133 – 100) / 15 = 2.2
This means your score is 2.2 standard deviations above average (a great score!)
Key Points
- Z-scores are typically used with normally distributed data.
- Data analysts often use programming languages (like Python) to compute z-scores for large datasets.
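A sketch of both computations: the single-value formula applied to the test-score example, and `scipy.stats.zscore` applied to a whole dataset (the sample scores below are illustrative):

```python
import numpy as np
from scipy import stats

# Single value: z = (x - mu) / sigma
z = (133 - 100) / 15
print(z)  # 2.2

# Whole dataset: scipy.stats.zscore standardizes every value at once
scores = np.array([85, 90, 95, 100, 105, 110, 115])
print(stats.zscore(scores))  # [-1.5 -1.  -0.5  0.   0.5  1.   1.5]
```

Note that `stats.zscore` uses the population standard deviation by default (`ddof=0`), matching the formula shown above.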
What is the z-score of a data value equal to the mean?
0
The z-score is 0 if the data value is equal to the mean. A z-score is a measure of how many standard deviations below or above the population mean a data point is.
Recently, you learned about
the normal distribution and how it applies to many
different kinds of data sets. In this video, you’ll learn about z-scores
and how they can help you compare values from different types of
normally distributed data sets. A z-score is a measure of how
many standard deviations below or above the population mean a data point is. A z-score gives you an idea of how
far from the mean a data point is. For example, the z-score is 0,
if the value is equal to the mean. The z-score is positive if
the value is greater than the mean. The z-score is negative if
the value is less than the mean. Z-scores help you standardize your data. In statistics, standardization is the process of putting
different variables on the same scale. There is a formula for
this, which we’ll check out a little later. Z-scores are also called standard
scores because they’re based on what’s called the standard
normal distribution. A standard normal distribution is just
a normal distribution with a mean of 0 and a standard deviation of 1. Z-scores typically range from -3 to +3. Standardization is useful because it
lets you compare scores from different data sets that may have different units,
mean values and standard deviations. Data professionals use z-scores to better
understand the relationship between data values within a single dataset and
between different data sets. For example, data professionals often
use z-scores for anomaly detection, which finds outliers in datasets. Applications of anomaly detection include
finding fraud in financial transactions, flaws in manufacturing products,
intrusions in computer networks and more. For example, different customer satisfaction surveys
may have different rating scales. One survey could rate a product or
service from 1 to 20, another from 500 to 1,500,
and a third from 130 to 180. Let’s say the same product got
a score of 9 on the first survey, 850 on the second and 142 on the third. These numbers don’t mean much
by themselves, but if you know they all have a z-score of 1, or
one standard deviation above the mean, you can meaningfully compare
ratings across surveys. A z-score for an individual value
can be interpreted as follows. A z-score of 1 is one standard
deviation above the mean. A z-score of 1.5 is 1.5 standard
deviations above the mean. A z-score of -2.3 is 2.3 standard
deviations below the mean. You can use the following
formula to calculate a z-score, Z equals x minus mu divided by sigma. In this formula, x refers to
a single data value or raw score. The Greek letter mu refers
to population mean. The Greek letter sigma refers to
the population standard deviation. So we can also say that Z
equals the raw score or data value minus the mean divided
by the standard deviation. For example,
let’s say you take a standardized test, you have a test score of 133. The test has a mean score of 100 and
a standard deviation of 15. Assuming a normal distribution, you can
use the formula to calculate your z-score. Your z-score is your raw score,
133 minus the mean score, 100 divided by the standard deviation 15. This is 133 minus 100 divided by 15
equals 33 divided by 15 equals 2.2. Your Z-score of 2.2 tells you that your
test score is 2.2 standard deviations above the mean or average score. That’s a really good score. Recall that the empirical rule tells
you that 95% of values fall within two standard deviations of the mean. Your score of 2.2 is more than two
standard deviations above the mean. Z-scores are useful because
they give us an idea of how an individual value compares
to the rest of the distribution. Let’s take a different exam
with a different grading scale. Say you score an 85. You want to find out if that’s a good
score relative to the rest of the class. Whether or not it’s a good
score depends on the mean and standard deviation of all exam scores. Suppose the exam scores are normally
distributed with a mean score of 90 and a standard deviation of 4, you can use the formula to calculate
the z-score of a raw score of 85. Your z-score is your raw score, 85 minus the mean score 90, divided by the standard deviation 4. This is 85 minus 90 divided by 4 equals -5, divided by 4 equals -1.25. Your z-score of -1.25 tells you
that your exam score of 85 is 1.25 standard deviations below the mean or
average exam score. Z-scores give you an idea of how
individual values compare to the mean. As a data professional, you’ll use
z-scores to help you better understand the relationship between
specific values in your data set. You’ll most likely use a programming
language like Python to calculate z-scores on your computer, as you’ll
learn in an upcoming video.
Practice Quiz: Test your knowledge: Continuous probability distributions
The normal distribution has which of the following features? Select all that apply.
- The curve is symmetrical on both sides of the center
- The mean is located at the center of the curve
- The shape is a bell curve
The normal distribution has the following features: the shape is a bell curve, the mean is located at the center of the curve, and the curve is symmetrical on both sides of the center. The normal distribution is the most common probability distribution in statistics because so many different kinds of datasets display a bell-shaped curve.
What does the empirical rule state?
For a dataset with a normal distribution, 68% of values fall within 1 standard deviation of the mean, 95% of values fall within 2 standard deviations of the mean, and 99.7% of values fall within 3 standard deviations of the mean.
A data value is 2 standard deviations above the mean. What is its z-score?
2
Its z-score is 2. A z-score of 2 is 2 standard deviations above the mean. Z-score is a measure of how many standard deviations below or above the population mean a data point is.
Probability distributions with Python
Lab: Annotated follow-along guide: Work with probability distributions in Python
Reading
Instructions
All of the instructional videos with onscreen coding demonstrations have a corresponding follow-along guide that is available to you. The follow-along guide is an annotated Jupyter notebook organized to match the content from each module. It contains the same code shown in the videos for that module. This guide is provided for your reference; you do not need to add any text or run the code yourself. If you would like to run the code, you will need to run each cell sequentially for the code to function as intended.
In addition to content that is identical to what is covered in the videos, you’ll often find additional information throughout the guide to explain the purpose of each concept covered, why the code is written in a certain way, and tips for running the code.
The landing page for each follow-along notebook also provides information about data sources (when relevant) and tips on how to access and use these guides.
Data dictionary
This notebook uses a file called education_districtwise.csv. This dataset represents a list of school districts in an anonymous country. The data includes district and state names, total population, and the literacy rate.
The dataset contains:
634 rows – each row is a different school district
7 columns
Column name | Type | Description |
---|---|---|
DISTNAME | str | The names of an anonymous country’s school districts |
STATNAME | str | The names of an anonymous country’s states where school districts are located |
BLOCKS | int64 | The number of blocks in the school district. Blocks are the smallest organizational structure in the education system of the anonymous country. |
VILLAGES | int64 | The number of villages in each district |
CLUSTERS | int64 | The number of clusters in the school district. Clusters are the second smallest organizational structure in the education system of the anonymous country. |
TOTPOPULAT | int64 | The population of each district |
OVERALL_LI | int64 | The literacy rate of each district |
Note: The data in this project was derived from this data set on Kaggle. However, it has been modified for pedagogical purposes.
Remember, you can access and download the data for any Jupyter notebook activity from within the notebook itself by navigating to the Lab Files dropdown menu at the top of the page, clicking into the /home/jovyan/work folder, selecting the relevant data file, and clicking Download.
Video: Work with probability distributions in Python
Background
- The video teaches how to model data with distributions (like the normal distribution) and find outliers using Z-scores.
- Scenario: You’re analyzing district literacy rates for a Department of Education.
Key Python Libraries
- NumPy, pandas, matplotlib.pyplot: Essential data analysis tools.
- SciPy stats: Specifically designed for statistical work.
- statsmodels: Provides statistical modeling and testing functions.
Analyzing Your Data
- Visualize with Histograms: Plotting a histogram helps you see the shape of data, suggesting what type of distribution might fit.
- Check the Empirical Rule (Normal Distribution):
- The data’s histogram was bell-shaped, indicating a normal distribution.
- The empirical rule states that for a normal distribution:
- 68% of data falls within 1 standard deviation of the mean.
- 95% within 2 standard deviations.
- 99.7% within 3 standard deviations.
- The Python calculations closely matched these percentages, confirming the normal distribution.
- Why this Matters: Many statistical tests assume a normal distribution.
- Calculating Z-Scores
- Z-Score shows how many standard deviations a data point is from the mean.
- Python’s
stats.zscore
function makes the calculation easy.
- Detecting Outliers
- Z-scores outside of +/- 3 standard deviations are often considered outliers.
- In this example, two districts were discovered with unusually low literacy rates, meriting further investigation.
Key Takeaway: Python’s statistical libraries equip you to analyze data distributions effectively, guiding your analytical choices and pinpointing unusual data.
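A condensed sketch of that workflow. Since the lab's education_districtwise.csv file isn't reproduced here, this uses synthetic literacy rates with the source's stated mean (~73.4) and standard deviation (~10); the seed, sample size, and planted outlier are illustrative:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for district literacy rates (the real notebook
# loads education_districtwise.csv and uses its overall_li column)
rng = np.random.default_rng(seed=0)
literacy = pd.Series(rng.normal(loc=73.4, scale=10, size=600))
literacy.iloc[0] = 25.0  # plant one artificially low rate

# Standardize, then flag values more than 3 standard deviations from the mean
z = stats.zscore(literacy)
outliers = literacy[(z < -3) | (z > 3)]
print(outliers)  # districts meriting further investigation
```

The same two-step pattern (standardize with `stats.zscore`, filter on |z| > 3) is what the video applies to the real district data.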
When I deal with a new dataset, I first go through the
process of EDA and compute descriptive stats to get a basic understanding
of my data. After that, I try to
determine if my data fits a certain type of
probability distribution, like the binomial, Poisson, and normal distributions
you recently learned about. Knowing the distribution of
my data helps me decide what statistical test or
machine learning model will work best for my analysis. Python has a great selection of function libraries
for data analysis. Using Python to work
with distributions saves time and improves the overall efficiency
of my analysis. In this video,
you’ll use Python to model your data with the
normal distribution. Then you’ll compute Z-scores to find any outliers
in your data. We’ll continue with our previous scenario
in which you’re a data professional working for
the Department of Education of a large nation. Recall that you’re
analyzing data on the literacy rate
for each district, and you’ve already computed descriptive statistics
to summarize your data. You’ll continue to use the
dataset you worked with before. If you need to access
the data, do so now. Along with pandas, NumPy,
and matplotlib.pyplot, you’ll use two Python packages
that may be new to you: SciPy stats and statsmodels. SciPy is open-source software you can use for
solving mathematical, scientific, engineering,
and technical problems. It allows you to
manipulate and visualize data with a wide range
of Python commands. SciPy stats is a module designed specifically
for statistics. Statsmodels is a Python package that lets you explore data, work with statistical models, and perform statistical tests. It includes an extensive list of stats functions for
different types of data. Now that you know more about the packages you’ll
be working with, let’s open up a Jupyter
Notebook and load them up. To start, import the Python
packages you will use: NumPy, pandas, and
statsmodels.api and the library you will
use matplotlib.pyplot. To save time, rename each package and library with an abbreviation: np, pd, plt, and sm. To load the SciPy stats module, write: from scipy import stats. For the next part
of your analysis, you want to find
out if the data on literacy rate fits a specific type of
probability distribution. The first step in trying
to model your data with a probability distribution
is to plot a histogram. This will help you visualize
the shape of your data and determine if it
resembles the shape of a specific distribution. Use matplotlib’s histogram
function to plot a histogram of the district
literacy rate data. Recall that the overall_li
column contains this data. The x-axis of your
plot refers to the literacy rate
of each district, and the y-axis refers to count or to the
number of districts. The histogram shows that
the distribution of your literacy rate data is bell-shaped and symmetric
about the mean. Recall that the normal
distribution is a continuous probability
distribution that is bell-shaped and symmetrical
on both sides of the mean. The mean literacy rate, which is around 73 percent, is located in the
center of the plot. The shape of your
histogram suggests that the normal distribution might be a good modeling
option for your data. To verify that your data
is normally distributed, you can use Python
to find out if your data follows
the empirical rule. Recall that the empirical rule says that for every
normal distribution, about 68 percent of values fall within one standard
deviation of the mean, 95 percent fall within two standard deviations
of the mean, and 99.7 percent fall within three standard
deviations of the mean. Since the normal
distribution seems like a good fit for the district
literacy rate data, you can expect the
empirical rule to apply relatively well. In other words, you
can expect that about 68 percent of literacy rates will fall within one standard deviation
of the mean, 95 percent will fall within
two standard deviations, and 99.7 percent will fall within three
standard deviations. First, name two new variables
to store the values for the mean and
standard deviation of the district literacy rate. Name your first variable
mean_overall_li and compute the mean. Display the value
of your variable. The mean district literacy
rate is about 73.4 percent. Name your second variable std_overall_li and compute the standard deviation and display the value
of your variable. The standard deviation
is about 10 percent. If your data follows
the empirical rule, you can expect
roughly 68 percent of your values to fall within one standard deviation of the mean district literacy
rate of 73 percent. One standard deviation below
the mean is 63 percent, or 73 minus 1 times 10. One standard deviation above
the mean is 83 percent, or 73 plus 1 times 10. So you can expect roughly
68 percent of your values to fall within this
range of 63-83 percent. Now compute the
actual percentage of district literacy
rates that fall within one standard
deviation of the mean. To do this, first name
two new variables, lower_limit and upper_limit. The lower limit will be one standard deviation
below the mean, or the mean minus 1 times
the standard deviation. The upper limit will be one standard deviation
above the mean, or the mean plus 1 times
the standard deviation. To write the code for
the calculations, use your two previous variables mean_overall_li
and std_overall_li for the mean and
standard deviation. Next, add a line of code that tells the
computer to decide if each value in the
overall literacy column is between the lower
limit and upper limit. In other words, to decide if each value is greater
than or equal to one standard deviation
below the mean and less than or equal to one standard deviation
above the mean. To do this, use the relational operators
greater than or equal to and less than or equal to and the bitwise operator AND. Finally, use the
mean function to divide the number of
values that are within one standard deviation
of the mean by the total number of
values and run the code. The output shows you
that about 0.664 or 66.4 percent of your
district literacy rates fall within one standard
deviation of the mean. This is very close
to the roughly 68 percent that the
empirical rule suggests. You can use the
same code structure to determine how many values of your literacy rate
values fall within two and three standard
deviations of the mean. Just multiply the
standard deviation by two or three instead of one. About 0.954 or 95.4 percent of your values fall within two standard
deviations of the mean, and about 0.996 or 99.6
percent of your values fall within three standard
deviations of the mean. Here, values of 66.4, 95.4, and 99.6 percent are very close to what the
empirical rule suggests, roughly 68, 95,
and 99.7 percent. At this point, it’s safe to say your data follows a
normal distribution. Knowing that your
data is normally distributed is useful
for analysis because many statistical tests and machine learning models
assume a normal distribution. Plus, when your data follows
a normal distribution, you can use Z-scores to measure the relative position
of your values and find outliers in your data. Let’s explore how to calculate
Z-scores in Python now. Recall that a Z-score
is a measure of how many standard
deviations below or above the population
mean a data point is. A Z-score is useful
because it tells you where a value lies
in a distribution. For example, if I tell you a literacy rate is 80 percent, this doesn’t give you
much information about where the value lies
in the distribution. However, if I tell you the literacy rate has
a Z-score of two, then you know that the value is two standard deviations
above the mean. Data professionals often use Z-scores for outlier detection. Typically, they
consider observations with the Z-score smaller than a negative 3 or larger than
positive 3 as outliers. These are values that lie more than three
standard deviations below or above the mean. To find outliers in your data, first create a new
column called Z_SCORE that includes the Z-scores for each district literacy
rate in your dataset. Then compute the Z-scores with
the function stats.zscore. Python takes care of
all the calculations. Now, write some code to identify outliers or districts
with Z-scores that are greater than or less than three standard
deviations from the mean. Use the relational operators
greater than and less than and the bitwise operator OR. Using Z-scores, you identify
two outlying districts: District 461 and District 429. The literacy rates in
these two districts are more than three
standard deviations below the overall mean, which means they have
unusually low rates. Your analysis gives you important
information to share. The government may want to provide more funding
and resources to these two districts in the hopes of significantly
improving literacy. Probability
distributions are useful for modeling your data and help you determine
which statistical test to use for an analysis. In addition to the
normal distribution, Python can help you work with a wide range of
probability distributions.
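The setup and histogram steps described above can be sketched as follows. Synthetic normally distributed values stand in for the course dataset’s overall_li column here, so the numbers are illustrative only:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; use plt.show() in a notebook
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

# Synthetic stand-in for the district literacy rates; the real lab
# loads these from the dataset's overall_li column.
rng = np.random.default_rng(seed=42)
education = pd.DataFrame({"overall_li": rng.normal(73.4, 10.0, 634)})

# Plot a histogram to inspect the shape of the distribution.
plt.hist(education["overall_li"])
plt.xlabel("District literacy rate (%)")
plt.ylabel("Number of districts")
```

With real data, a bell-shaped, symmetric histogram like this one suggests trying the normal distribution as a model.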
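The empirical-rule check can be sketched like this, again with synthetic normal data standing in for overall_li:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the district literacy rates (illustrative only).
rng = np.random.default_rng(seed=0)
overall_li = pd.Series(rng.normal(73.4, 10.0, 634))

mean_overall_li = overall_li.mean()
std_overall_li = overall_li.std()

# Limits one standard deviation below and above the mean.
lower_limit = mean_overall_li - 1 * std_overall_li
upper_limit = mean_overall_li + 1 * std_overall_li

# mean() of the boolean comparison gives the share of values in range;
# multiply the standard deviation by 2 or 3 to check the other bands.
within_one = ((overall_li >= lower_limit) & (overall_li <= upper_limit)).mean()
print(within_one)  # roughly 0.68 for normally distributed data
```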
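Z-score outlier detection can be sketched as follows. The two artificially low values appended here play the role of the outlying districts; this is illustrative data, not the real districts 461 and 429:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic literacy rates, with two artificially low districts appended
# so there is something for the outlier filter to find.
rng = np.random.default_rng(seed=1)
values = np.append(rng.normal(73.4, 10.0, 632), [30.0, 32.0])
districts = pd.DataFrame({"overall_li": values})

# New column of Z-scores; stats.zscore handles the calculation.
districts["Z_SCORE"] = stats.zscore(districts["overall_li"])

# Outliers lie more than three standard deviations from the mean.
outliers = districts[(districts["Z_SCORE"] > 3) | (districts["Z_SCORE"] < -3)]
print(outliers)
```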
Lab: Activity: Explore probability distributions
Reading
Instructions
In this lab, you will learn how data across a dataset is distributed and identify data points that need further examination. You will be presented with a business scenario and a dataset to use as you explore the distribution. You will practice determining the type of probability distribution that fits the data, calculating z-scores, and detecting outliers to understand how the data is distributed.
To complete this lab, you will open a Jupyter notebook and follow instructions to enter code and written responses where prompted.
Data Dictionary
This activity uses air quality index (AQI) data from the United States Environmental Protection Agency (EPA). The data was taken from the EPA’s own repository of AQI data and assembled for pedagogical purposes.
This activity uses a modified version of the dataset called modified_c4_epa_air_quality.csv. The variables in the dataset have been adjusted to suit the objectives of this lab, and may vary from other versions of the dataset you have worked with previously. It contains air quality readings for various counties in the United States. For more information about AQI data, refer to AirNow.
The dataset contains:
260 rows – each row is a different AQI reading
8 columns
Column name | Type | Description |
---|---|---|
date_local | datetime | Date of AQI reading |
state_name | str | The name of the state |
county_name | str | The name of the U.S. county |
city_name | str | U.S. City where monitoring occurred |
local_site_name | str | The identifier of the site in the owning agency’s (e.g., not US EPA) nomenclature |
parameter_name | str | The description assigned in the air quality system to the parameter measured by the monitor. Parameters may be pollutants or non-pollutants (e.g., wind speed). |
units_of_measure | str | The unit of measure for all data in that row. Every parameter has a standard unit of measure. |
aqi_log | float | AQI readings that were transformed logarithmically to suit the objectives of this lab. |
You can download the csv file from within the notebook itself by navigating to the Lab Files dropdown menu at the top of the page, clicking into the /home/jovyan/work folder, selecting the file, and clicking Download.
Lab: Exemplar: Explore probability distributions
Practice Quiz: Test your knowledge: Probability distributions with Python
A data professional is working with a dataset that has a normal distribution. To test out the empirical rule, they want to find out if roughly 68% of the data values fall within 1 standard deviation of the mean. What Python functions will enable them to compute the mean and standard deviation?
mean() and std()
To compute the mean, they would use the mean() function; to compute the standard deviation, they would use the std() function.
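A quick illustration with hypothetical values:

```python
import pandas as pd

# Hypothetical data values.
sample = pd.Series([10, 12, 14, 16, 18])

print(sample.mean())  # 14.0
print(sample.std())   # sample standard deviation, about 3.16
```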
What Python function is used to compute z-scores for data?
stats.zscore()
The Python function stats.zscore() is used to compute z-scores for data. This function is part of the stats module in the SciPy package.
Review: Probability
Video: Wrap-up
What You’ve Learned
- Probability in Data Science: It’s the foundation for making predictions, understanding patterns, and making data-driven decisions.
- Types of Probability: Objective (data-based) is crucial for data analysis, while subjective involves personal judgment.
- Probability Rules: Complement, addition, multiplication, and conditional probability govern how events relate.
- Bayes’ Theorem: Helps you update probabilities as you gain new information.
- Probability Distributions: Models for understanding the likelihoods of different outcomes.
- Discrete (e.g., binomial, Poisson): For countable outcomes.
- Continuous (e.g., normal distribution): For measurements on a spectrum.
- Z-scores: Help standardize data within the normal distribution.
- SciPy Stats: A powerful Python tool for working with probability distributions.
Why This Matters
Probability is essential for:
- Deeper statistical analysis (hypothesis testing, regression)
- Identifying patterns within data
- Machine learning
Next Steps
- Prepare for the graded assessment!
- Review the new terminology introduced.
- Revisit any concepts that were challenging for a stronger foundation.
You’ve come to the end of your introduction to probability. You’ve learned a lot of important concepts along the way. Great work. We’ve explored how data professionals use probability to make reasonable predictions about uncertain events and help people and organizations make data-driven decisions. Basic probability is a foundational part of data science, and it also informs more advanced statistical methods, such as hypothesis testing and regression analysis, which we’ll explore later in the program. In your future career as a data professional, you’ll use probability distributions to discover significant patterns in your data. Plus, a working knowledge of probability distributions is key for machine learning and an essential tool in modern data science.
We started off this part of the course by reviewing the two main types of probability: objective and subjective. Data professionals use objective probability to analyze and interpret data. From there, we reviewed the basic rules of probability, such as the complement, addition, and multiplication rules. Then you learned about conditional probability, which helps you better understand the relationship between dependent events. We also discussed Bayes’ theorem, which updates the probability of an event based on new data about the event.
After that, we moved from basic probability to probability distributions. Probability distributions describe the likelihood of the possible outcomes of a random event and can be discrete or continuous. Data professionals use probability distributions to find meaningful patterns in complex datasets. Next, we explored discrete probability distributions, such as the binomial and Poisson, and discovered how they can help you model different types of data. Then we moved on to continuous probability distributions. We focused on the normal distribution, or bell curve, the most widely used distribution in statistics. We also discussed how Z-scores can help you better understand where values fall within a standard normal distribution. Finally, you learned that the SciPy stats module is a powerful tool for working with probability distributions. You used the normal distribution to model your data and gain useful insights.
Coming up, you have a graded assessment. To prepare, check out the reading that lists all the new terms you’ve learned, and feel free to revisit videos, readings, and other resources that covered key concepts. Until we meet again, good luck!
Reading: Glossary terms from module 2
Terms and definitions from Course 4, Module 2
Addition rule (for mutually exclusive events): The concept that if the events A and B are mutually exclusive, then the probability of A or B happening is the sum of the probabilities of A and B
Bayes’ theorem: A math formula for stating that for any two events A and B, the probability of A given B equals the probability of A multiplied by the probability of B given A divided by the probability of B; Also referred to as Bayes’ rule
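Written as a formula, the definition above is:

```latex
P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(B)}
```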
Bayes’ rule: (Refer to Bayes’ theorem)
Bayesian inference: (Refer to Bayesian statistics)
Bayesian statistics: A powerful method for analyzing and interpreting data in modern data analytics; Also referred to as Bayesian inference
Binomial distribution: A discrete distribution that models the probability of events with only two possible outcomes: success or failure
Classical probability: A type of probability based on formal reasoning about events with equally likely outcomes
Complement of an event: In statistics, refers to an event not occurring
Complement rule: A concept stating that the probability that event A does not occur is one minus the probability of A
Conditional probability: Refers to the probability of an event occurring given that another event has already occurred
Continuous random variable: A variable that takes all the possible values in some range of numbers
Dependent events: The concept that two events are dependent if one event changes the probability of the other event
Discrete random variable: A variable that has a countable number of possible values
Empirical probability: A type of probability based on experimental or historical data
Empirical rule: A concept stating that the values on a normal curve are distributed in a regular pattern, based on their distance from the mean
False positive: A test result that indicates something is present when it really is not
Independent events: The concept that two events are independent if the occurrence of one event does not change the probability of the other event
Multiplication rule (for independent events): The concept that if the events A and B are independent, then the probability of both A and B happening is the probability of A multiplied by the probability of B
Mutually exclusive: The concept that two events are mutually exclusive if they cannot occur at the same time
Normal distribution: A continuous probability distribution that is symmetrical on both sides of the mean and bell-shaped
Objective probability: A type of probability based on statistics, experiments, and mathematical measurements
Poisson distribution: A probability distribution that models the probability that a certain number of events will occur during a specific time period
Posterior probability: Refers to the updated probability of an event based on new data
Prior probability: Refers to the probability of an event before new data is collected
Probability: The branch of mathematics that deals with measuring and quantifying uncertainty
Probability distribution: A function that describes the likelihood of the possible outcomes of a random event
Random experiment: A process whose outcome cannot be predicted with certainty
Random variable: A variable that represents the values for the possible outcomes of a random event
Sample space: The set of all possible values for a random variable
Standard deviation: A statistic that calculates the typical distance of a data point from the mean of a dataset
Standardization: The process of putting different variables on the same scale
Subjective probability: A type of probability based on personal feelings, experience, or judgment
Z-score: A measure of how many standard deviations below or above the population mean a data point is
Quiz: Module 2 challenge
An investor believes there is a 90% chance that the price of a certain stock will increase in the next year. The investor’s prediction is based exclusively on intuition. What type of probability are they using?
Subjective
Gemini
The correct answer is Subjective. Here’s why:
- Subjective Probability: This relies on personal beliefs, opinions, or gut feelings. The investor’s 90% estimate isn’t based on any data or calculations.
Let’s analyze the other options:
- Classical Probability: Involves equally likely outcomes (like coin flips). Not applicable to complex stock predictions.
- Objective Probability: Based on historical data or known frequencies. The investor doesn’t seem to be using any such data.
- Empirical Probability: Derived from observations of experiments or events. Again, the investor’s estimate doesn’t appear to be based on such observations.
Let me know if you’d like more examples to distinguish between probability types!
ChatGPT
The investor is using subjective probability because their prediction is based on intuition rather than any objective data or statistical analysis.
The probability of an event is close to 1. Which of the following statements best describes the likelihood that the event will occur?
The event is likely to occur
Gemini
The best answer is: The event is likely to occur.
Here’s why:
- Probability Scale: Probabilities range from 0 (impossible) to 1 (certain).
- Close to 1: An event with a probability very close to 1 indicates a high likelihood of happening, though it’s not absolutely guaranteed.
Let’s look at the other options:
- Certain to occur: This would require a probability of exactly 1.
- Certain not to occur: This would be a probability of 0.
- Unlikely to occur: This would imply a probability closer to 0.
ChatGPT
The event is likely to occur, because a probability close to 1 indicates a high likelihood, though not certainty.
The probability of rain tomorrow is 40%. What is the probability of the complement of this event?
The probability of no rain tomorrow is 60%
Gemini
The correct answer is: The probability of no rain tomorrow is 60%.
Here’s why:
- Complement Rule: The probability of an event happening and the probability of its complement (not happening) always add up to 100%.
- Calculation: If the probability of rain is 40%, the probability of no rain (the complement) is 100% – 40% = 60%.
ChatGPT
The probability of no rain tomorrow is 60%.
Which of the following events are mutually exclusive? Select all that apply.
- Getting heads and tails on the same coin toss
- Getting a 4 and a 6 on the same die roll
Gemini
Here’s the breakdown of events that are mutually exclusive:
- Getting heads and tails on the same coin toss: This is impossible! Only one outcome can occur.
- Getting a 4 and a 6 on the same die roll: Similar to the coin toss, you can’t get both numbers simultaneously with a single roll.
Let’s look at the events that are NOT mutually exclusive:
- Getting heads on a first coin toss and tails on a second coin toss: These events can occur in sequence. They don’t exclude each other.
- Getting a 4 on a first die roll and a 6 on a second die roll: It’s possible to roll a 4, then roll a 6. These events are independent.
Key takeaway: Mutually exclusive events CANNOT happen at the same time.
ChatGPT
The events that are mutually exclusive are:
- Getting heads and tails on the same coin toss
- Getting a 4 and a 6 on the same die roll
What concept refers to the probability of an event occurring given that another event has already occurred?
Conditional probability
Gemini
The correct answer is Conditional probability.
Here’s why:
- Conditional Probability: It deals with the likelihood of an event (let’s call it B) happening, given the knowledge that another event (let’s call it A) has already occurred.
Let’s look at the other options:
- Empirical Probability: This is based on observing actual experiments or events.
- Classical Probability: This involves equally likely outcomes, like coin flips or dice rolls.
- Subjective Probability: This is based on personal beliefs or opinions.
Let me know if you’d like an example of conditional probability!
ChatGPT
The concept that refers to the probability of an event occurring given that another event has already occurred is conditional probability.
Which of the following are examples of continuous random variables? Select all that apply.
- The weight of a polar bear
- The time it takes for a person to run a race
- The height of a redwood tree
Gemini
Here’s the breakdown of continuous random variables:
- The weight of a polar bear: Weight is measured on a continuous scale (e.g., 800.35 lbs, 912.7 lbs).
- The time it takes for a person to run a race: Time is continuous (e.g., 10.56 seconds, 12.392 seconds).
- The height of a redwood tree: Height is measured on a continuous scale.
Let’s analyze the incorrect option:
- The number of students in a math class: This is a discrete random variable. You can only have whole numbers of students, not fractions of them.
Key Takeaway: Continuous random variables can take on any value within a given range (even with decimals).
ChatGPT
The examples of continuous random variables are:
- The weight of a polar bear
- The time it takes for a person to run a race
- The height of a redwood tree
What probability distribution can model the probability of getting a certain number of defective products in a sample of 15 products?
Binomial distribution
Gemini
The correct answer is the Binomial distribution. Here’s why:
- Binomial Distribution Requirements:
- Fixed number of trials (in this case, sampling 15 products)
- Two possible outcomes on each trial (defective or not defective)
- Independent trials (one product’s outcome doesn’t affect the next)
- Constant probability of “success” (the probability of a defective product remains the same).
- Why other options aren’t the best fit:
- Poisson Distribution: Models the number of events occurring in a fixed time or space, not a fixed number of trials.
- Normal/Standard Normal Distributions: These are continuous distributions, not suitable for counting discrete events like defective products.
Let me know if you’d like a more detailed explanation of the binomial distribution!
ChatGPT
The probability distribution that can model the probability of getting a certain number of defective products in a sample of 15 products is the Binomial distribution.
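As a sketch with SciPy’s binomial distribution object (the 5% defect rate here is an assumed value for illustration):

```python
from scipy import stats

# Probability of exactly 2 defective products in a sample of n = 15,
# assuming a hypothetical 5% chance that any one product is defective.
n, p = 15, 0.05
p_two_defective = stats.binom.pmf(2, n, p)
print(round(p_two_defective, 4))  # about 0.135
```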
A data professional working for a smartphone manufacturer is analyzing sample data on the weight of a specific smartphone. The data follows a normal distribution, with a mean weight of 150g and a standard deviation of 10g. What data value lies 3 standard deviations below the mean?
120 grams
Gemini
In a normal distribution, we can find the data value that lies a certain number of standard deviations below the mean by subtracting the product of the standard deviation and the number of standard deviations from the mean.
Given that the mean weight of the smartphone is 150 grams and the standard deviation is 10 grams, we can calculate the data value that lies 3 standard deviations below the mean as follows:
Mean – (Number of standard deviations * Standard deviation) = Data value below the mean
150 grams – (3 standard deviations * 10 grams/standard deviation) = 120 grams
Therefore, the data value that lies 3 standard deviations below the mean is approximately 120 grams.
Number of standard deviations below the mean | Data value |
---|---|
3 | 120 grams |
ChatGPT
To find the data value that lies 3 standard deviations below the mean in a normal distribution, we need to subtract three times the standard deviation from the mean.
Mean weight = 150g
Standard deviation = 10g
Data value = Mean – (3 * Standard deviation)
= 150g – (3 * 10g)
= 150g – 30g
= 120g
So, the data value that lies 3 standard deviations below the mean is 120g. Therefore, the correct answer is 120g.
The mean and the standard deviation of a standard normal distribution always equal what values?
Mean = 0; standard deviation = 1
Gemini
The correct answer is: Mean = 0; standard deviation = 1
Here’s why:
- Standard Normal Distribution (Z-Distribution): A specific type of normal distribution with a mean centered at 0 and a standard deviation of 1. This standardization allows you to compare data from different normal distributions.
ChatGPT
The mean and standard deviation of a standard normal distribution are fixed values.
Mean = 0
Standard deviation = 1
Therefore, the correct answer is:
Mean = 0; Standard deviation = 1
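This is also why standardizing any dataset with z-scores yields a mean of 0 and a standard deviation of 1. A quick check with hypothetical values:

```python
import numpy as np
from scipy import stats

# Hypothetical smartphone weights in grams.
weights = np.array([150.0, 140.0, 160.0, 155.0, 145.0])
z = stats.zscore(weights)

# After standardization, the mean is 0 and the standard deviation is 1.
print(z.mean(), z.std())
```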
A data analytics team at a water utility works with a dataset that contains information about local reservoirs. They determine that the data follows a normal distribution. What Python function can they use to compute z-scores for the data?
stats.zscore()
Gemini
The correct answer is stats.zscore(). Here’s why:
- SciPy’s stats module: The stats module in the SciPy library provides tools for statistical analysis, including the zscore function.
- How it works: stats.zscore() takes your data as input and calculates standardized z-scores for each data point, indicating how many standard deviations each value is away from the mean.
Let’s look at the other options:
- median.zscore(), mean.zscore(): These aren’t valid Python functions. Z-scores are calculated for individual data points, not for summary statistics like mean or median.
- describe(): This function (part of Pandas) provides descriptive statistics like mean, standard deviation, etc., but doesn’t directly compute z-scores.
ChatGPT
The Python function that can be used to compute z-scores for the data is stats.zscore() from the scipy.stats module.
So, the correct answer is: stats.zscore()