Home » Google Career Certificates » Google Advanced Data Analytics Professional Certificate » Go Beyond the Numbers: Translate Data into Insights » Module 1: Find and share stories using data

Module 1: Find and share stories using data

You’ll learn how to find the stories within data and share them with your audience. You’ll learn about the methods and benefits of data cleaning and how it can help you discover those stories. You’ll also go over the steps of the EDA process and learn how EDA can help you quickly understand data. Finally, you’ll explore different ways to visualize data to communicate key insights.

Learning Objectives

  • Recognize the importance of ethics and accessibility in visualizing data
  • Explain how EDA helps data professionals share stories from raw data sources
  • Recognize the importance of ethics and accessibility in sharing stories with data
  • Explain the process of exploratory data analysis (EDA) and the benefits of understanding data
  • Explain the importance of aligning EDA methods with business purposes while using the PACE framework
  • Identify the six different parts of the EDA process: discovering, structuring, cleaning, joining, validating, presenting
  • Explain how data analysis helps data professionals tell stories from raw data sources

Get started with the course


Video: Introduction to Course 3

Excited data archaeologist Robb invites you to explore hidden stories within data, like unearthing ancient artifacts.

  • Data analysis is like archaeology: Discover hidden insights and compelling trends amidst numbers.
  • Stories in data: Learn a 6-step process (EDA) to find and sculpt data into impactful narratives.
  • PACE framework: Apply planning, analysis, construction, and execution to data exploration.
  • Python skills in action: Expand your coding knowledge through hands-on data analysis exercises.
  • Data wrangling: Dive into data sources, types, structuring, cleaning, and ethical considerations.
  • Visual storytelling: Enhance your data insights with Tableau and audience-specific visualizations.
  • Real-world impact: Data-driven stories can change lives, as Robb’s healthcare example demonstrates.
  • Ready to explore? Grab your tools and join the data archaeology adventure!

Key takeaways:

  • Data analysis is about uncovering hidden stories and communicating them effectively.
  • This course equips you with tools and techniques to explore data, find insights, and craft compelling narratives.
  • By learning to tell data stories, you can make a real-world impact in various fields.

Imagine you’re an archaeologist, someone who unearths artifacts to study ancient civilizations and preserve the stories of history. You’re excited because you’re the first person to explore a new dig site. The rising sun creates a warm glow on the orange and yellow rock of the ancient riverbed in front of you. You forget your early morning tiredness and yesterday’s 15-hour flight as you breathe in the crisp, clean air. You pause as you remember the words of your site leader the previous evening: “This spot has never been studied, but it’s absolutely perfect for preservation. You’re guaranteed to find something, and anything we find we’ll be able to present at the International Archaeological Research Institute next summer.” What could be hidden under that rock? What ancient mysteries will be uncovered? What stories will be unearthed? The possibilities seem endless.
Hi there, my name is Robb. I’m a consumer product marketing leader. I work on marketing projects here at Google, and every time I get the opportunity to analyze data, it feels like I could be an archaeologist on the verge of an incredible discovery. Some people see a data sheet or a table of disorganized numbers and they think, “This is so lame.” But data professionals know better, don’t we? We know that hidden inside the numbers, columns, and rows are golden nuggets of information: never-before-seen insights or compelling trends. These interesting bits of hidden knowledge are stories waiting to be shared, and stories are one of the most impactful ways to communicate ideas. The amazing truth for us data professionals is that all data have stories to tell.

So whether you’re an aspiring data professional, or you want to learn to tell a good story (preferably both), welcome to this course. By now you have a fairly good idea of the scope of this program and the basics of Python coding. Now it’s time to dig deep into unexplored data and try to make sense of it. Are you ready to explore?

We will begin with how to find and sculpt stories using a six-part process called exploratory data analysis. Then we’ll discuss how PACE applies to telling stories using data. Do you remember the data professional workflow acronym PACE? That is: plan, analyze, construct, and execute. We’ll talk about how PACE applies to exploring data and learn about the necessity of visualizations in understanding the data. Finally, I’ll show you how to perform exploratory data analysis on a dataset in Python, expanding on those Python skills you learned previously.

Later in this course, you’ll learn about data sources, data types, data structuring, and data cleaning. Using Python notebooks, we’ll discuss how to work with missing values, outliers, and categorical data. We’ll talk about the ethics of exploring and cleaning raw data, and how to communicate your questions and findings to various audiences. You’ll also learn more about the visual analytics platform Tableau. We’ll talk about how to enhance your data story with visuals and presentations. Along the way, I’ll share tips on shaping your data stories for different audiences, like how to build data visualizations that meet a target audience’s needs. Data professionals at Google identified the essential skills that are fundamental to working in data, and throughout this course, there will be opportunities to practice finding and telling stories with data.

In short, stories can change lives, and data-driven stories are particularly compelling because they’re based on numbers. They can also communicate principles, concepts, cautions, and new ideas to others. To give you an example from my own life, I actually started my career as a data analyst for a healthcare consulting firm. I was responsible for identifying and recommending the most effective treatments for patients with serious medical conditions ranging from asthma to diabetes to cancer. To develop my recommendations, I analyzed the medical records of millions of patients and compared and contrasted the outcomes, side effects, and medical costs of each treatment. Through my findings, we were able to recommend treatments that would not only help these patients heal, but also help improve their quality of life and reduce their medical bills. The stories are out there just waiting to be found, so get out your shovels and magnifying glasses and let’s start exploring.

Video: Robb: Obstacles and achievements

  • Faced racism and stereotypes regarding math/science in adolescence.
  • Rejected math/science in high school, favoring humanities and sports.
  • Surprised by data analysis work at healthcare consulting firm.
  • Took initiative to learn data analysis skills (SAS programming) after hours.
  • Transitioned to data analyst role due to acquired skills and passion.
  • Regrets initial avoidance of math/science but proud of later success.
  • Highlights lifelong learning and self-discovery possibilities.
  • Emphasizes diverse learning paths and personal pace.
  • Encourages individual learning approach and self-acceptance.

I’m Robb. I’m a product marketing manager at Google. I grew up in the late 90s and early 2000s in Boston, and as an Asian American I was often ridiculed with all the standard stereotypes of being Asian, among them being good at or passionate about math or science. So when I got to high school, I found myself rejecting math and science, trying to avoid those classes and avoid studying them, and leaning more into things like humanities or even sports, just because I wanted to fit in so badly.

I was lucky enough to be hired as essentially a literature review specialist at an economic healthcare consulting firm. What I found when I was first hired there was that there was this whole other arm of the company devoted to analyzing millions and millions of medical records to really understand and help test the efficacy and safety of pharmaceutical medications, to help people with really severe issues. They would do this by analyzing data for, I think, over 30 million users. When I heard about this I was so excited. I said, wow, that is a really cool field to potentially get into.

So I had a chat with my manager, who was really open to me taking on some side projects, of course after I finished my day-to-day job. I remember I would spend time after work, after hours, with a textbook, reading up on standard data analytics and statistics, but also something called SAS programming, a statistical analysis software used by the analysts at our company. I dove straight and head-on into that, and eventually I became so fluent at it that I was able to transition to becoming a data analyst. During my time as a data analyst, I was so passionate about understanding more and more about statistics and math in general that I ended up taking night courses at a local community college, because it was my goal to study more statistics. Eventually I applied to a lot of master’s programs in statistics, and given my background and my experience, I was fortunate enough to get into one.

I really wish that I hadn’t listened to the other people making fun of me, and had just done what I wanted to do. I do, to a degree, regret that I didn’t have that foundation. But what makes me almost proud, to a degree, is that I was able as an adult to transition into this field. It really highlights, in my mind, that you can do whatever you want and accomplish whatever you want at any time in your life. Take a step back and realize that we’re all different people on different paths at the end of the day. Take your time. Each person learns in so many different ways, whether it be through these online courses, enrolling in a university program, or talking to their friends and working with them. It’s okay, whichever way you are best at learning. The point is: take your time and do what you need to do to learn. You shouldn’t put yourself under pressure and compare yourself to anybody else. Just be you, and it’s okay.

Video: Welcome to module 1

  • Topic: Foundations of data analysis and storytelling with data.
  • Focus: Six practices of exploratory data analysis (EDA).
  • Content:
    • Defining and understanding the six EDA practices.
    • Integrating EDA into the PACE workflow (plan, analyze, construct, execute).
    • Exploring the role and types of data visualizations in EDA.
  • Goal: Laying the foundation for storytelling using data.
  • Tone: Enthusiastic and welcoming.


Hello again. Welcome to the first section of this course. In the next few lessons, we’ll be exploring the foundations of data analysis and how the six practices of exploratory data analysis help us find and tell stories using data. We’ll begin by discussing each of the six practices of exploratory data analysis: you’ll learn what they are and why they’re important. Then we’ll learn how our data professional workflow, PACE (plan, analyze, construct, and execute), fits into the process of exploratory data analysis. Finally, you’ll learn the importance of data visualizations in the data exploration process. We’ll consider what data visualizations are and how data professionals use them to learn about and share data. The next few lessons will lay the foundation for how to tell stories using data. I hope you’re as excited as I am to get started.

Video: Find stories using the six exploratory data analysis practices

Data Exploration Analogy: Cleaning up an old warehouse full of antiques becomes a metaphor for exploring and making sense of raw data.

Six main practices of EDA:

  1. Discovering: Familiarizing with the data, asking initial questions.
  2. Structuring: Organizing and formatting data for clarity and analysis.
  3. Cleaning: Removing errors and inconsistencies.
  4. Joining: Adding information from other data sources.
  5. Validating: Checking for accuracy and consistency.
  6. Presenting: Sharing the cleaned data and insights through visualizations.
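
As a sketch of how these six practices can look in code, here is a minimal pandas example on a tiny invented inventory dataset (all column names and values are made up for illustration; this is not from the course):

```python
import pandas as pd

# Tiny invented "warehouse inventory" dataset -- illustrative only.
raw = pd.DataFrame({
    "item": ["gear", "gear", "chain", "Tricycle", None, "pole"],
    "material": ["metal", "metal", "metal", "metal", "metal", "metal"],
    "count": [30, 30, 17, 12, 5, 7],
})

# 1. Discovering: get familiar with the data and its shape.
print(raw.head())
print(raw.shape)

# 2. Structuring: normalize the category labels for consistent grouping.
raw["item"] = raw["item"].str.lower()

# 3. Cleaning: remove duplicate rows and rows with a missing item label.
clean = raw.drop_duplicates().dropna(subset=["item"])

# 4. Joining: enrich with a second (made-up) source, e.g. appraisal years.
years = pd.DataFrame({"item": ["gear", "chain", "tricycle", "pole"],
                      "year_made": [1921, 1921, 1935, 1928]})
enriched = clean.merge(years, on="item", how="left")

# 5. Validating: confirm cleaning didn't introduce new problems.
assert enriched["count"].ge(0).all()
assert enriched["item"].notna().all()

# 6. Presenting: a summary others can review and give feedback on.
summary = enriched.groupby("item")["count"].sum()
print(summary)
```

In practice the steps are iterative: a problem surfaced during validating often sends you back to cleaning or joining.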

Key Points:

  • EDA is an iterative process, often returning to different practices as needed.
  • Bias awareness is crucial in structuring and interpreting data.
  • Data visualizations enhance understanding and sharing insights.
  • Ethical data communication involves avoiding misrepresentation and providing context.
  • Uncovering stories within data is the ultimate goal of EDA.

Example: Inventorying the warehouse reveals food vendor supplies, hinting at a historical story.


Imagine a vast ocean of data, each wave a piece of information, each current a trend waiting to be discovered. As a data explorer, you set sail with six trusty tools – the EDA practices – to uncover compelling stories hidden within. This tutorial equips you with the map and compass to navigate this data odyssey.

I. Charting Your Course: Discovering and Structuring

  1. Dip Your Toes In (Discovering): Familiarize yourself with the data. What are the variables? What questions can you ask? Think of it as skimming the horizon, noting landmarks and currents.
  2. Organize the Deck (Structuring): Tidy up the data. Create categories, format dates, and ensure consistency. Imagine coiling ropes and hoisting sails, preparing for smooth sailing.
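
A minimal pandas sketch of these two steps, using invented sales records (the column names are hypothetical):

```python
import pandas as pd

# Invented sales records -- purely illustrative.
df = pd.DataFrame({
    "sale_date": ["2023-01-15", "2023-04-02", "2023-07-19", "2023-11-30"],
    "amount": [120.0, 85.5, 240.0, 99.9],
})

# Discovering: skim the data's columns, types, and summary statistics.
print(df.dtypes)
print(df.describe())

# Structuring: parse dates consistently and derive a quarter category,
# e.g. to analyze by quarter rather than by raw date.
df["sale_date"] = pd.to_datetime(df["sale_date"])
df["quarter"] = df["sale_date"].dt.quarter
print(df[["sale_date", "quarter"]])
```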

II. Battling the Data Kraken: Cleaning and Joining

  1. Scrub the Decks (Cleaning): Remove errors and inconsistencies. Missing values, typos, and outliers are the krakens of data analysis, so patch up any leaks!
  2. Expand Your Horizons (Joining): Combine data sets like merging islands on your map. This enriches your story and reveals deeper connections.
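
Continuing the sketch in pandas, cleaning and joining might look like this on invented customer and order tables (all names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical customer records with common data problems.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "city": ["London", "Leeds", "Leeds", None, "York"],
    "age": [34.0, 29.0, 29.0, 41.0, 999.0],  # 999 is an obvious outlier
})

# Cleaning: drop duplicates, fill missing values, flag impossible ages.
customers = customers.drop_duplicates()
customers["city"] = customers["city"].fillna("Unknown")
customers.loc[customers["age"] > 120, "age"] = np.nan

# Joining: enrich with a second (made-up) dataset of orders.
orders = pd.DataFrame({"customer_id": [1, 2, 3],
                       "order_total": [50.0, 75.0, 20.0]})
merged = customers.merge(orders, on="customer_id", how="left")
print(merged)
```

Note the `how="left"` merge keeps every customer even when there is no matching order, which makes the gaps themselves visible for the next pass of exploration.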

III. Verifying Your Treasure: Validating and Presenting

  1. Double-Check Your Course (Validating): Ensure your data is accurate and reliable. Think of it as checking your compass and sextant for true north.
  2. Share Your Bounty (Presenting): Visualize your findings through graphs, charts, and dashboards. Tell the story of your data with clarity and impact, like a captain presenting their map to the crew.
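
As a hedged sketch of these last two steps, here is a pandas and matplotlib example with invented download counts (the dataset and file name are made up):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. on a server
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, already-cleaned monthly download counts.
downloads = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "count": [1200, 1350, 1100, 1500],
})

# Validating: sanity-check the data before sharing anything built on it.
assert downloads["count"].ge(0).all(), "counts must be non-negative"
assert downloads["month"].is_unique, "months must not repeat"

# Presenting: a simple bar chart saved to an image file for sharing.
fig, ax = plt.subplots()
ax.bar(downloads["month"], downloads["count"])
ax.set_xlabel("Month")
ax.set_ylabel("Downloads")
ax.set_title("Monthly downloads (illustrative data)")
fig.savefig("downloads.png")
```

When you share a chart like this, remember accessibility: pair it with alt text or a descriptive caption so the audience can explore the data without relying on the visual alone.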

Remember:

  • Iteration is key: Don’t be afraid to revisit steps if needed. The data ocean is dynamic, so adjust your course as insights emerge.
  • Bias be aware: Recognize and mitigate biases in data collection and analysis. Your map should be objective, not skewed by hidden currents.
  • Visualize, visualize, visualize!: Data visualizations are your lighthouses, guiding you and others towards the heart of the story.

Now, grab your EDA compass and set sail! Explore different data sets, from weather patterns to movie ratings, and let the stories hidden within unfold. Remember, the most captivating tales are often found not on the surface, but through the meticulous and adventurous spirit of data exploration.

Bonus Tip: Practice makes perfect! Start with smaller data sets and gradually navigate towards larger, more complex oceans. Each successful story you find fuels your skills and confidence as a data explorer.

So, raise the anchor, unfurl the sails of curiosity, and embark on your journey to find stories in the vast ocean of data!

What is the process data professionals use to investigate, organize, and analyze datasets in order to summarize the data’s main characteristics?

Exploratory data analysis

Exploratory data analysis, or EDA, is the process data professionals use to investigate, organize, and analyze datasets in order to summarize the data’s main characteristics.

Imagine you work for a company that’s been hired to clean out a very old one-room warehouse full of antiques. Your manager says no one has used the building for decades and doesn’t know what the owners used it for over the years. You’ve been assigned to clean it out and create a detailed inventory of what you find inside so that the items can be sold at auction. The manager suspects there will be interesting antiques inside. Since many items are covered by blankets, you don’t know what the room contains.
you don’t know what the room contains. You start cleaning from one side
of the room to the other and you keep a digital inventory
of the items you find. As you explore,
you find a pile of metal gears chains and connection parts of all
different sizes and shapes. You count them and write metal parts 47. You also find five large metal sheets,
12 large tricycles and seven metal poles, among other things. As you write these new items, you start
to wonder if they’re related or not. This simple example demonstrates how
data professionals explore data and learn more about the stories and
trends along the way. As you uncover and
clean up what is in front of you. The individual parts and pieces within the
data will start to tell a larger story. Of course data requires
a few extra techniques and practices that in the warehouse example. But the general idea is the same. You’ll need to make sense of raw data
content by reordering categorizing and reshaping it. You’ll find a lot of different
terms in the world of this process. Data wrangling data remediation
data munching and data cleaning. They’re all the most common. We combine all of these practices
into a familiar term that most data professionals are familiar with. Exploratory data analysis or EDA. Exploratory data analysis or
EDA is the process of investigating, organizing and analyzing data sets. And summarizing their main characteristics
often employing data wrangling and visualization methods. The six main practices of EDA
are discovering, structuring, cleaning, joining,
validating and presenting. These practices do not necessarily
have to go in this order. And depending on the needs
of the data team and the type of data they study they
may perform EDA in different ways. You’ll also find that often the EDA
process is iterative which means you’ll go through the six practices
multiple times in no particular order to prepare the data for further use. Let’s spend time on each
of the six practices, so that you have a better
idea of what I mean. Discovering is typically
EDA first practice. During this practice, data professionals
familiarize themselves with the data so they can start conceptualizing
how to use it. They review the data and
ask questions about it. During this phase data professionals
may ask what are the column headers and what do they mean? How many total data points are there? In the example of the old warehouse at
the beginning of the video the discovering practice might involve
walking through the room. And removing coverings to get an idea
of the amount and types of items. After some initial discovering
the next step is to start organizing, this EDA practice is called structuring. Structuring is the process of
taking raw data and organizing or transforming it to be more easily
visualized, explained or modeled. Structuring refers to categorizing and organizing data columns based on
the data already in the data set. In terms of the calendar data for
example it might look like categorizing data into months or
quarters rather than years. From the old warehouse analogy, structuring could be categorizing
the items into metal and non metal categories and
getting a total count for each type. Before we move to
the next practice of EDA, let’s take a moment to talk about bias. Bias in the context of data structuring is
organizing data in groupings categories or variables that don’t accurately
represent the whole data set. Most experts would agree that eliminating
all bias from how data is structured is almost impossible. Because each person’s individual ideas,
training and experiences are different. As professionals however it’s important to
try to avoid bias while structuring data. For example imagine you want to know
what percent of the population has a college degree in the UK. But your model data only
contains London residents. The data would be considered bias until
you add data from the rest of the UK. In our warehouse example trying
to avoid bias might involve remaining flexible with the categories, potentially creating new categories
as you uncover more and more items. The next EDA practice is data cleaning. Cleaning is the process of removing
errors that may distort your data or make it less useful. Missing values, misspellings, duplicate
entries or extreme outliers are all fairly common issues that need to be
addressed during the data set cleaning. In the warehouse example you
might decide to put broken or unusable items in a separate
box away from other items. We’ll move to another
practice called joining. Joining is the process of augmenting or adjusting data by adding
values from other data sets. In other words,
you might add more value or context to the data by adding more
information from other data sources. For example, you might find during
the discovering structuring. Or cleaning processes that the data
set doesn’t have enough data for you to complete a specific project. In that case you should enrich
the data by adding more to it. As an example remember when
I talked about the UK and college degrees to help understand bias. In that instance, joining would be
adding the data from the rest of the UK rather than just including
data on London residents. Going back to our warehouse analogy,
imagine a museum manager sorts through the items and
gives you the dates of when each was made. The information from the museum manager
acts as a different group of data. You can join with your own
as you inventory the items. Next on the list of EDA
practices is validating. Validating refers to the process of
verifying that the data is consistent and high quality. Validating data is the process for
checking for misspellings and inconsistent number or date formats. And checking that the data cleaning
process didn’t introduce more errors. Data professionals typically use digital
tools such as R, JavaScript or python to check for inconsistencies and
errors in a data set and its data types. As for our warehouse analogy validating
could be like using the museum manager’s knowledge to get
an idea of how old the items are. The last EDA practice is presenting. Presenting involves making
your cleaned data set or data visualizations available to others
for analysis or further modeling. In other words presenting practice is
sharing what you’ve learned through EDA. And asking for feedback whether in
the form of a clean data set or data visualization. We will be using the term
data visualization a lot. To be clear a data visualization
is a graph chart diagram or dashboard that is created as
a representation of information. You might think that presenting always
comes at the end of the EDA process. However presenting can come
at any point in the EDA. Data visualizations aren’t exclusive
to the presenting practice. They should be used throughout the EDA. They help you understand data and
point out trends and insights to others. In the warehouse analogy presenting
could mean showing your manager the progress on the warehouse and
how many different items were found. In the workplace as a data professional
presenting might look like preparing visuals and a slide presentation
to share with your team. As you begin to plan your presentation,
you should consider people with visual or auditory impairments by providing
robust descriptions of the data. You can use things like all text
descriptive text or captioned recording of the data so that your audience
can explore the data themselves. We will cover each of the EDA
practices in detail later on. But one of the most important things
to learn about the process is to ensure your EDA work does not
misrepresent the data itself. The story you uncover
should come from the data, not from your mind or biases in the data. It is your duty to convey your data in
both an ethical and accessible way. Consider the warehouse analogy to ensure
you communicate your data ethically. You should give an accurate
count of the materials and not overestimate the quantities
of the antiques. In the workplace communicating data
ethically would be presenting sales numbers in context year over year. So that rises and falls don’t appear
exaggerated in data visualizations. Once you complete the warehouse project,
you realize you’ve uncovered a story, you find a dusty pair of signs
that advertise food for sale. Suddenly, the items you found
start to make sense as a whole. These antiques appear to be a collection
of supplies used by a group of traveling food vendors. Now, how about that for
discovering a story? Of course, you should confirm this information
using historical records before sharing.

Reading: Case study: Deloitte


Video: Benj: Data science and storytelling

Benj, a product analyst at Google, leads a team focused on understanding Google Chrome through real-world usage and insights. Exploratory data analysis is crucial in his work, involving a deep examination of new data to grasp its context, sources, and potential biases. Storytelling is emphasized to effectively convey insights and drive change. Benj advocates for checking biases, approaching analyses with few preconceptions, and ensuring ethical handling of data. He emphasizes the importance of curiosity, encouraging continuous learning for effective data analysis.

Hi, I’m Benj. I’m a product analyst at Google. I lead a team of analysts attempting to understand Google Chrome through real-world usage and insights. Exploratory data analysis is a key piece of my work. In particular, when you get a new piece of data, or when you have a new question, the first thing you have to do is take a deep look through the data that’s there and understand what’s in front of you. You have to understand who the data is from. You have to understand why it was created, what is and isn’t there, and what important caveats are sitting on top of it.

Storytelling is the way that your insights make it to other people and really make change. It’s oftentimes the case that we’ll make reports that are tens of pages long, but what really changes the way people think is when you can tell a short and clear story about what’s going on. A really good way to tell a story with data is to think about categories of users, categories of devices, or categories of use cases. When you can tell people that a lot of people are using Chrome on low-end phones in India, it tells a story about who your user base is and what they’re doing. Knowing who they are and what that looks like allows them to think differently about who the audience is and what product they should build.

One thing I try to do to make sure that I’m really staying true to data is to check my own biases and make sure that I am giving back what I actually see. One way that I attempt to make sure that my reporting isn’t biased is to try to come into new analyses with as few preconceptions as I can. You need to know some things to be true and to verify them against the dataset (to make sure that users aren’t spending 27 hours a day on their phone, because there aren’t 27 hours in a day), but you should not assume that everyone is going to look the same, and not assume that all places in the world are going to act the same when it comes to new data.

It’s very important ethically to make sure that your data is working for everyone: to make sure that if you try to remove identifiers, or if you try to take data and depersonalize it, you’ve really done that well and effectively. The simplest methods aren’t always the best. It isn’t enough to just say, “I tried something,” and then stop. You have to explore a little bit more and make sure you’re really de-identifying data.

Curiosity is one of the most important traits for an analyst in general, and by trying to learn something new, you’re showing that curiosity already.

Practice Quiz: Test your knowledge: Tell stories with data

Fill in the blank: The presenting stage of exploratory data analysis involves sharing _____, which can include graphs, charts, diagrams, or dashboards.

During which exploratory data analysis practice might a data professional familiarize themself with the meaning of column headers in a dataset?

If sampled data is organized in such a way that it does not accurately represent its population as a whole, what problem will occur?

Use PACE to inform EDA and data visualizations


Video: Combine PACE and EDA practices

This video addresses the importance of targeted curiosity in data analysis, where exploration is guided by purpose. The PACE framework (plan, analyze, construct, execute) and the six practices of EDA (discovering, structuring, cleaning, joining, validating, presenting) are presented as tools to achieve this balance.

The key takeaways are:

  • Align EDA with PACE: Each EDA practice intersects with PACE stages. Planning guides exploration, cleaning aligns with construction, and presenting falls under execution.
  • Communication is crucial: Miscommunication between stakeholders and data professionals leads to confusion and wasted effort. Share PACE plans and analysis for feedback to ensure alignment.
  • Prioritize accurate data representation: Data professionals hold the responsibility to communicate data limitations and advocate for complete data sets when making projections, even under pressure.

Examples:

  • Office equipment company: Predicting sales based on 10 years of data requires extracting relevant columns, not all available data.
  • Hospital data: Grouping treatments by item type in data structuring helps prepare accurate purchase orders for supplies.
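
These two examples might look like the following in pandas (all tables and column names here are invented stand-ins for the scenarios above):

```python
import pandas as pd

# Hypothetical multi-year sales extract -- pull only the columns the
# plan actually needs, rather than everything available.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2015-03-01", "2016-03-01", "2017-03-01"]),
    "region": ["north", "south", "north"],
    "units_sold": [500, 650, 720],
    "rep_notes": ["n/a", "n/a", "n/a"],  # irrelevant to the forecast
})
relevant = sales[["date", "units_sold"]]

# Hypothetical hospital supply records, grouped by item type to help
# prepare accurate purchase orders.
supplies = pd.DataFrame({
    "item_type": ["bandage", "syringe", "bandage", "syringe"],
    "quantity_used": [40, 15, 25, 10],
})
order_totals = supplies.groupby("item_type")["quantity_used"].sum()
print(order_totals)
```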

Overall, the video emphasizes the importance of using PACE and EDA as complementary tools to ensure focused, ethical, and effective data analysis.

Data analysis can be exhilarating – diving into the unknown, unveiling hidden patterns, and crafting compelling stories. But without focus, this curiosity can lead us astray. This is where PACE and EDA come in, guiding us towards meaningful insights aligned with a specific goal.

PACE:

  • Plan: Establish the project’s purpose, stakeholders’ expectations, and desired outcomes.
  • Analyze: Explore the data using EDA practices to understand its characteristics and limitations.
  • Construct: Build models, visualizations, or other deliverables that address the plan’s objectives.
  • Execute: Share findings, communicate insights, and implement solutions or recommendations.

EDA:

  • Discover: Explore the data to identify its scope, format, and potential issues.
  • Clean: Address missing values, inconsistencies, and formatting errors.
  • Join: Integrate relevant datasets to create a comprehensive picture.
  • Validate: Check for data accuracy and ensure results are statistically sound.
  • Structure: Organize data into a format suitable for analysis and visualization.
  • Present: Communicate findings effectively through visualizations, reports, or narratives.

The Magic Mix:

Now, let’s see how PACE and EDA work together in practice:

Scenario: You’re a marketing analyst tasked with increasing app downloads for a travel booking platform.

Plan:

  • Goal: Increase app downloads by 20% within 6 months.
  • Stakeholders: Marketing team, app development team.
  • Data sources: User demographics, app usage data, marketing campaign data.

Analyze:

  • Discover: Explore user demographics, identify popular features, and analyze download trends.
  • Clean: Address missing locations in user data, standardize date formats, and remove outliers.
  • Join: Combine user data with campaign data to understand campaign effectiveness.
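As a rough sketch of how these Analyze steps might look in pandas (the tables, column names, and campaign labels below are invented for illustration):

```python
import pandas as pd

# Hypothetical user and campaign tables for the travel app scenario
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "location": ["Paris", None, "Tokyo"],
    "signup_date": ["2023-01-05", "2023-02-14", "2023-03-01"],
})
campaigns = pd.DataFrame({
    "user_id": [1, 2],
    "campaign": ["spring_sale", "summer_deals"],
})

# Clean: fill in missing locations and standardize dates to a datetime type
users["location"] = users["location"].fillna("unknown")
users["signup_date"] = pd.to_datetime(users["signup_date"])

# Join: combine user data with campaign data, keeping all users
merged = users.merge(campaigns, on="user_id", how="left")
```

A left join keeps users who were never reached by a campaign, which matters here: dropping them would bias any conclusion about campaign effectiveness.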

Construct:

  • Target group segmentation: Based on demographics and app usage, identify user groups with high download potential.
  • Personalized ad campaigns: Design targeted ads based on user segments and preferred travel destinations.
  • App feature optimization: Analyze popular features and prioritize improvements based on user feedback.

Execute:

  • Present findings: Create compelling visualizations and reports to showcase insights and proposed actions.
  • Collaborate with stakeholders: Discuss recommendations with both marketing and app development teams.
  • Track and measure results: Monitor campaign performance and app download metrics to evaluate the success of the initiative.

Remember:

  • Iteration is key: Be prepared to adapt your plan based on new findings and evolving goals.
  • Communication is crucial: Keep stakeholders informed throughout the process to ensure alignment and buy-in.
  • Data integrity is paramount: Ensure your analysis is based on accurate and reliable data.

By combining PACE and EDA, you can transform your curiosity into impactful data-driven decisions. So, go forth, explore, analyze, and craft stories that not only delight, but also deliver!

Bonus Tip: Use tools like Jupyter notebooks or RStudio to combine code, analysis, and visualizations in a single document, increasing transparency and reproducibility.

I hope this tutorial gives you a roadmap for navigating the exciting world of data-driven insights!

When a data professional discusses a project plan and company goals with stakeholders, which element of the PACE model are they engaged in?

Plan

When a data professional discusses a project plan and company goals with stakeholders, they are engaged in the plan element of the PACE model. During planning, data professionals define the scope of the project and identify the informational needs of the organization.

Have you ever walked
into a room to grab a specific item and then become distracted and
forget why you were there? As data professionals, our curiosity and
excitement for finding stories in data might
cause us to forget the original purpose for
our data exploration. Obviously, we want to maintain our natural curiosity,
but most importantly, we want to focus that
curiosity on what questions need to be answered or what problems
need to be solved. We seek a balanced mindset, one of targeted curiosity. This balance can be
achieved by using PACE. As you learned, PACE, or plan, analyze, construct, and execute, is the workflow some data professionals use to remain focused on the end goal of any given data set. Imagine you work for a
multinational company that manufactures
office equipment. The finance department asks you to use the last 10 years of data to predict sales for
the next six months. How might you approach your EDA of the data
given this task? In this video, you’ll
learn to combine PACE and EDA practices. Remember, EDA stands for
exploratory data analysis. You might think that because analysis is in its name, EDA falls only in the analyze part of the PACE workflow. The truth is, EDA applies to every part of PACE. You'll find the six practices of EDA all intersect with
the other parts of PACE. Discovering, for example, is in line with the planning
part of PACE and presenting can be a major part of the
executing part of PACE. As you will learn
data professionals don’t just find and tell any
random story from the data. Data insights are guided by a project’s purpose and goals. The plan may come from a
stakeholder or manager. You can think of it as the project plan or the stated goal of the company. For example, let's consider the sales prediction from the office equipment company earlier in the video. As you recall, your task is to predict the
next six months of sales performance based
on 10 years of sales data. When you start your EDA, you realize the datasets
you’ve been given contain a lot more
data than you need. It would benefit you
and your company to extract only the columns you need to predict sales, which would be date of sale, item, price, and sales rep, for
a total of four columns. While cleaning, joining, and validating, you can exclude data on material cost, item ID numbers, database cost center numbers, and vendor names. Cleaning 10 years of data with four columns is much more manageable than with eight. Of course, in a
typical workplace, it shouldn’t surprise
you that after you’ve done a lot of the work
to complete your EDA based on the plan, the finance department says they actually wanted to
predict the profit margin, not the sales targets. Miscommunication happens
in every workplace, even in data analysis. If stakeholders, engineers and data specialists are
not clear on the plan, the results won’t tell a
cohesive or effective story. Instead, the results will likely lead to confusion, disagreement, and wasted time. One example of good
communication would be sharing the PACE plan with
anyone who might be involved. Another example could be
sharing the analysis with a working group to get feedback before sharing the
analysis more broadly. Finally, it's important to understand stakeholders' most important goals for the company before presenting to them. We will address details about communication strategies
in upcoming videos. The point is, when you have
a dataset in front of you, remind yourself of the
reasons for your analysis. Once again, consider the office equipment company. You've been given two datasets: one with transaction ID number and date, the other with transaction ID number and total cost. You're asked to forecast the next six months' sales numbers. If you consider the
purpose of this task, what is something you might
do with the two datasets? You might consider the
EDA joining practice and include all data from the two datasets together: transaction ID, date, and total cost. For another example,
let’s say you have a hospital’s data on treatments performed
over the last year. If the goal is to
prepare data to make a purchase order
for the materials and supplies for
the upcoming year, what type of EDA
might you consider? It would be a good idea to start by using the
EDA practice of structuring to group the data by the type of items
needed for each treatment. Data professionals
are set up for success when they follow
a framework like PACE. When a professional
applies that framework to the way they perform the
six practices of EDA, they keep priorities
in order and they focus on achieving the
project’s purpose. Of course, as data
professionals, our first priority is to accurately represent
the data itself. If your company’s
project plan does not align with what the
data is telling you, it is your responsibility to communicate that
to stakeholders. Let me return to
the sales forecast requests at the office
equipment company. Imagine you're given data on sales revenue specifically from two geographic regions, but stakeholders are
requesting a global forecast. As a data professional, it is your responsibility
to ask for globally representative
data to complete the task, working with data
from only two regions would be inadequate to
make a global projection. Timelines, stakeholder pressure
or client needs should never cause a data
professional to bypass what is
required by the data, misrepresentation of
data isn’t never warranted. Let’s review, data
professionals should strive to align their work
with the plan part of PACE. Keeping the focus on PACE helps determine the most
effective ways to perform EDA practices while maintaining ethical
representations of data. Coming up, we’ll explore
how PACE can help guide the development of data visualizations.
I’ll meet you there.
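The joining example from this video, two datasets sharing a transaction ID column, can be sketched with a pandas merge. The data below is made up for illustration:

```python
import pandas as pd

# Dataset 1: transaction ID and date
dates = pd.DataFrame({
    "transaction_id": [101, 102, 103],
    "date": ["2023-01-05", "2023-01-06", "2023-01-07"],
})

# Dataset 2: transaction ID and total cost
costs = pd.DataFrame({
    "transaction_id": [101, 102, 103],
    "total_cost": [250.0, 120.0, 310.0],
})

# Join on the shared key so every row carries ID, date, and total cost
transactions = dates.merge(costs, on="transaction_id", how="inner")
```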

Reading: Reference guide: The EDA process

Video: PACE with data visualizations

Key Points:

  • Data visualizations are far more effective than tables for quickly understanding complex data.
  • Data professionals use visualizations throughout the EDA process, especially for presenting findings.
  • Visualizations help:
    • Understand large datasets with thousands of data points.
    • Identify trends, biases, and stories within the data.
    • Communicate insights to different audiences (manufacturing team vs. executive leadership).

Considerations:

  • Design visualizations based on your audience and their needs.
  • Use data ethically and avoid misrepresenting the data.
  • Ensure accessibility for individuals with color blindness.

Tools and Resources:

  • Digital visualization tools like Tableau and Python libraries like Matplotlib, Seaborn, and Plotly.
  • Upcoming videos will cover these tools in detail, alongside data ethics and accessibility.

Overall:

  • Data visualization is a crucial skill for data professionals to tell compelling and accurate stories with data.


Welcome to the world of PACE, where data visualizations become your superpower!

In this tutorial, you’ll discover how to effectively use visual tools to:

  • Plan your analysis:
    • Create mind maps to organize questions and hypotheses.
    • Use flowcharts to outline analysis steps.
    • Map relationships between variables with network diagrams.
  • Acquire and prepare data:
    • Visualize data distributions and identify anomalies with histograms and box plots.
    • Spot missing values and correlations with heatmaps and scatter plots.
    • Track data cleaning and transformation progress with process maps.
  • Clean and condition data:
    • Monitor data quality with bar charts and time series plots.
    • Evaluate feature engineering results with parallel coordinates plots and t-SNE visualizations.
  • Explore and visualize data:
    • Uncover hidden patterns and relationships with scatter plots, line plots, and bar charts.
    • Identify clusters and groups with heatmaps, dendrograms, and PCA visualizations.
    • Reveal trends and outliers with box plots, violin plots, and distribution plots.
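A minimal sketch of the distribution and outlier plots listed above, using Matplotlib with synthetic data (the dataset and output file name are placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
import numpy as np

# Synthetic daily sales figures standing in for a real dataset
rng = np.random.default_rng(seed=0)
daily_sales = rng.normal(loc=200, scale=30, size=365)

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(daily_sales, bins=20, color="tab:blue")  # distribution shape
ax_hist.set_title("Distribution of daily sales")
ax_box.boxplot(daily_sales)                           # outliers at a glance
ax_box.set_title("Outlier check")
fig.savefig("eda_overview.png")
```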

Key Principles:

  • Match the visualization to the task: Choose the right visual to highlight specific insights.
  • Design for clarity and impact: Use clear labels, annotations, and a focused message.
  • Guide the viewer’s eye: Use visual cues to direct attention to key points.
  • Prioritize accessibility: Ensure visualizations are usable by people with visual impairments.
  • Maintain data integrity: Avoid misleading or inaccurate representations.

Hands-On Activities:

  • Practice with real-world examples: Apply PACE principles to diverse datasets.
  • Experiment with different tools: Explore Tableau, Python visualization libraries (Matplotlib, Seaborn, Plotly), and other platforms.
  • Get feedback and iterate: Share visualizations with peers and refine based on feedback.

Additional Tips:

  • Tell a compelling story: Use visualizations to weave a narrative that engages your audience.
  • Highlight key findings: Emphasize important takeaways with visual cues.
  • Prompt questions and discussions: Use visualizations to spark curiosity and exploration.

Ready to unleash the power of PACE and data visualizations? Let’s dive in and bring your data to life!

Let's review a scenario, shall we? You're given a giant table of numbers and a data visualization. Which would you choose to most quickly communicate data insights? Data visualizations are far more effective
at quickly communicating complex information than data tables. Now, let’s consider how
this relates to EDA. Data visualizations are important tools
that data professionals use to tell stories with data throughout their workflow, particularly during the EDA practice of presenting. Plotting parts or all of your data
set on a bar graph, a scatter plot, a pie chart, or
a histogram will help you and others to understand the data no matter
where you are in the EDA process. As a data professional, you won't be working with simple, clean data sets that have only a few hundred lines of data. You'll be working with dataframes
that have thousands or hundreds of thousands of data points
spanning months, years or decades even. The more data you have, the more
visualizations you’ll need to create to understand how each
variable impacts the other. When you start looking at new data, a popular and valuable tactic is to visualize it, like plotting time series data on line charts to understand periodicity, or scatter plots to get a good idea of data distribution. Data visualization can also help you
explain your data set to stakeholders and other data professionals. It is your job to discover
the important points, trends, biases, and
stories that need to be shared, then design data visualizations in
an effective way for different audiences. For example, imagine you’re
a data professional who works for an appliance manufacturer. Let’s say, you perform an analysis for
the manufacturing team. During the analysis you discover
a delay in the manufacturing process. You need to communicate these
findings to two different audiences, the manufacturing supervisors and
the executive leadership team. When you share your findings, some
stakeholders may not understand the data insights without the help
of data visualizations. The ways data visualizations are designed
will communicate different messages to stakeholders with vested interests
in different parts of the project. For example, the manufacturing
supervisors might need to review time series data plotted out over time to
identify the manufacturing delays. Meanwhile, the executive leadership
team would likely be more interested in financial impact analysis. Those two data visualizations would
need to be designed differently based on the needs of your audience. We will go into detail on exactly how
to do that in the upcoming videos, but in the meantime, understand that
the balance of words and data visuals in a presentation impacts the business
decisions made from data insights. A carefully prepared data visualization can mean the difference between changing the mind of a stakeholder and
the data being ignored. However, visualizations can also cause
confusion or even misrepresent the data. For instance, imagine a data professional
develops a visualization and changes the scale of the axis or
the ratio of the graph height and width to make the line chart look flat or
steep. This example of skewing is the opposite of what data analysis is about. Because you will be the person most familiar with the data and its story, it is essential that your visualizations
not mislead your audiences. Returning again to the manufacturing equipment example, if you provide a data visualization
showing a sharp increase in sales over the last six months, your audience might
assume the company is doing very well. But if data from the last two years
shows the six-month increase comes right after a long 18-month decline, the last six months have a different context. Showing only the last six months, as opposed to the two years, misrepresents the sales data. Your data-driven storytelling
is an opportunity to present facts and visualizations that are ethical,
accessible, and representative of the data. Being an ethical presenter of the data you analyze means being honest and very clear
about what is and isn’t in the data. Along with that, remember to design your
data visualizations in a way that is accessible to everyone. For example,
avoid pairing red and green in data visualizations, as it can be difficult for people with color blindness to read. Generally, blue and orange are the better choice to use in data visualizations. We'll go into more detail on ethics and accessibility in upcoming videos. In other videos, we'll use digital
visualization tools like Tableau and Python packages like Matplotlib, Seaborn, and Plotly to understand how to present your data in graphs and charts in ethical and accessible ways. These tools will be part of your
everyday work as a data professional. Remember, creating visualizations
about your data sets will help you throughout the entire EDA process. There's often no better tool for telling a good data-driven story than a well-made visualization.
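The blue-and-orange guidance from the video can be put into practice with Matplotlib's named colors. A small sketch with made-up monthly figures:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt

# Made-up monthly figures; blue and orange remain distinguishable for
# readers with the most common forms of color blindness
months = ["Jan", "Feb", "Mar", "Apr"]
actual = [120, 135, 150, 160]
forecast = [118, 130, 155, 158]

fig, ax = plt.subplots()
ax.plot(months, actual, color="tab:blue", label="Actual sales")
ax.plot(months, forecast, color="tab:orange", linestyle="--", label="Forecast")
ax.set_ylabel("Units sold")
ax.legend()
fig.savefig("accessible_chart.png")
```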

Practice Quiz: Test your knowledge: How PACE informs EDA and data visualizations

What are the primary drivers of a data-driven story? Select all that apply.

Fill in the blank: In order to help avoid _____ in the workplace, data professionals share the PACE plan with stakeholders and team members.

Why is it important to maintain proper scale of a graph’s axes in a data visualization?

Review: Find and share stories using data


Video: Wrap-up

  • Focus on storytelling: The text encourages data professionals to not only find stories within data, but also craft their own story of becoming a data professional.
  • Beyond coding: While coding is important, data professionals need diverse skills like exploratory data analysis (EDA) for cleaning, structuring, and presenting data.
  • EDA in practice: The course emphasizes the importance of EDA in daily work and introduces the PACE workflow for effective data exploration.
  • Data ethics and visualization: Ethical data representation and accurate data visualization are crucial for creating impactful stories.
  • Future learning: The course will involve using Python tools for EDA and working with realistic data sets, preparing students for real-world scenarios.
  • Impactful stories: The data you analyze has the potential to impact companies and even the world.
  • Continuous journey: The instructor expresses excitement for the shared learning journey and encourages students to discover new concepts and their own potential as data professionals.

Key takeaways:

  • Data analysis is more than just technical skills; it involves creativity, communication, and ethical considerations.
  • EDA is a vital skill for effective data exploration and storytelling.
  • This course aims to equip students with the necessary skills and knowledge to become successful data professionals.

We’ve talked a lot about
finding stories within data and about how it is your job to tell those
stories to the best of your ability. There is however, another story weaving
its way through the readings, videos and quizzes and is still continuing to
unfold as you listen to this video. And if it isn’t clear, it’s
the story of you! Coming this far in the program shows you are determined
to become a data professional. I hope you feel inspired by
the idea of defining not only the stories of your data but
your own story as well. I’m sure you noticed that this part of the
course had very little coding instruction in it. That’s by design. The job of data professionals is about
more than coding and statistical modeling. The first part of the course reflects
that. It is also important to recognize how exploratory data analysis or EDA applies to finding, sculpting, and
telling data-driven stories. If you talk to data professionals in
the career space, many will share how beneficial EDA is in day-to-day work. So I took some time to talk
about the practices of EDA– discovering, structuring,
cleaning, joining, validating, and presenting–and
review the value of these practices. We talked about the PACE workflow and how it can apply to EDA in our search for
stories within the data. You also learned about the ethics
of working with data and the importance of representing data
accurately when we tell data stories. Lastly, you learned how data
visualizations are essential for understanding, forming, and
presenting data-driven stories. You learned that telling a data-driven story
that accurately represents the data and is inclusive of its audience is
an example of exceptional data analysis. The story doesn’t end here though. We have many more concepts
to learn in this course, including using Python tools and
coding blocks to help you in your EDA practices of discovering,
structuring, and cleaning raw data sets. In this program, you will work with
data sets that represent the type of raw data you’ll likely see every day in
your career as a data professional. Remember, all data has stories to tell. These stories are often hidden well, and
it will take an especially curious and determined professional to discover them. The data driven-stories you’ll be
discovering have the potential to change an entire company or
even the world. I am looking forward to continuing our
journey together through the rest of this course as you discover questions, solve problems,
and learn new concepts. I hope with each new concept you learn, you discover a
little more about your own story as well. I'll see you soon.

Reading: Glossary terms from module 1

Terms and definitions from Course 3, Module 1

Quiz: Module 1 challenge

Fill in the blank: The exploratory data analysis process is_____, which means data professionals often work through the six practices multiple times.

A data team leader at a clothing manufacturer reviews a dataset that will be used to decide where to open new retail stores. They conceptualize how their analytics team can most effectively use the dataset. Which exploratory data analysis process does this scenario describe?

What procedures take place during the structuring exploratory data analysis step? Select all that apply.

Which of the following statements correctly compare data cleaning to data validation during exploratory data analysis? Select all that apply.

Fill in the blank: In exploratory data analysis, _____ is the process of augmenting a dataset by adding values from other sources.

What steps may be involved with presenting data insights to others during exploratory data analysis? Select all that apply.

Fill in the blank: To avoid miscommunication in the workplace, data professionals can share _____ with a working group to get early feedback.

A data professional works on a project that uses data from a study about teachers in Australia. They apply the PACE framework to perform exploratory data analysis practices effectively. Which of the following objectives will this help them achieve? Select all that apply.