In this end-of-course project, you’ll practice using Python to perform EDA on a workplace scenario dataset. Then, you’ll use Python and Tableau to visualize the data.
Learning Objectives
- Describe key findings for a relevant audience member
- Create accessible visualizations with Tableau to summarize key insights from the practice of EDA
- Conduct Exploratory Data Analysis on a data science problem
- Analyze a given dataset to share the story, or key insights, for a given data science problem
- Demonstrate how to share stories with data by completing a portfolio project
Apply your skills to a workplace scenario
Video: Welcome to module 5
Summary of “Your Portfolio Project and Job Search”:
Key points for data professionals seeking job opportunities through portfolio projects:
- Showcase data-driven storytelling: Your portfolio should demonstrate your ability to use data to tell compelling stories and communicate insights effectively. This is a crucial skill for data professionals.
- Address potential interview questions: Use your portfolio to answer common interview questions about data cleaning, structuring, and validation by pointing to real data challenges you’ve resolved and the insights you presented.
- Prepare for data visualization presentations: Hone your skills and build confidence for interview presentations by creating visualizations based on provided datasets using the skills you learned in Python and Tableau.
- Experiential learning: Actively creating visualizations yourself strengthens your understanding and presentation skills compared to passively observing.
- Demonstrate industry relevance: Show potential employers how you stay current with data analytics trends and your ability to apply them to real-world scenarios.
- Project details: You’ll work with a database and business scenario, performing EDA in Python and creating 3-5 Tableau visualizations to answer the scenario’s questions.
- Outcome: A comprehensive portfolio presentation and documented workflow using the PACE strategy for explaining your work to hiring managers.
Overall message: This project is an excellent opportunity to practice and showcase the skills learned throughout the course, positioning you well for success in your job search as a data professional.
Hi, I’m Tiffany, and it’s great to be with you again. You’ve made a lot of progress in the program. I’m back to tell you more about your portfolio projects and how you can use them in your future job search. Remember, your portfolio will be a collection of materials you can use to showcase your approach to solving data-oriented problems.
In the portfolio project for this course, you will demonstrate your knowledge of telling stories using data. It’s an incredibly important skill for a data professional to know on the job, but it’s also critical for success in an interview. As potential employers assess you as a candidate, they might ask you for specific examples of how you approached cleaning, structuring, and validating data in the past. You can use your portfolio as a way to discuss actual data challenges you have resolved and real stories you have presented.
Additionally, some employers might ask you to create a presentation based on a dataset they provide. The skills you’ve learned in Python and Tableau to create data visualizations will help you feel more comfortable and prepare for those interviews, as well as build out your portfolio.
You already learned about experiential learning, which is when people gain understanding through doing. Watching an instructor create a visualization is one thing. Creating a visualization yourself to expand your understanding of the concepts and improve your skills at presenting to stakeholders is another. This portfolio project is also a great opportunity to discover how organizations are using data analytics every day and show off your knowledge of how to tell data-driven stories.
To complete the portfolio project, you’ll be working with a database and an accompanying business scenario. You’ll use instructions to complete a Jupyter Notebook showing your EDA work and 3-5 visualizations in Tableau in response to the scenario. By the time you complete this project, you’ll have a comprehensive presentation you can add to your data professional portfolio. In your PACE strategy document, you’ll also have documentation of the steps you took along the way, which you can use to explain your work to future hiring managers.
You’re almost finished with this course, which means you’re advancing your understanding of what it means to be a data professional. Now it’s time to demonstrate what you’ve learned. This portfolio project will help you practice and demonstrate the skills you’ve learned throughout the course. For example, you’ll be able to show how to perform the practices of EDA in Python. Using Tableau, you’ll demonstrate how to create data visualizations that accurately detail a dataset’s story, and you’ll be able to demonstrate how to prepare and document a comprehensive workflow strategy using PACE. Ready? Let’s go.
Video: Introduction to your Course 3 end-of-course portfolio project
Summary: Portfolio Project and Data Storytelling Journey
Key points:
- You’ve learned the 6-step EDA process for data storytelling.
- Apply these steps in a portfolio project: clean, analyze, and present data for various audiences.
- Refine your data cleaning and reframing skills to create compelling stories.
- Develop a document presenting key findings and showcasing your analysis.
- Learn additional skills to succeed in your data career.
- Data professionals transform messy data into clear stories for business decisions.
- This project shows employers your ability to create such stories from raw data.
- Data skill development is iterative, so continuous learning is crucial.
Overall message:
This project is your chance to apply your EDA skills, create impactful data stories, and demonstrate your potential as a data professional to employers. Embrace the ongoing learning journey to further your data expertise.
In this course, you’ve been learning about using the six steps of the EDA process to tell stories with data. Now it’s time for an exciting next step: putting all this to work for your portfolio project. In the previous course, you learned the basic formatting and structure for writing code in Python. These coding skills will be critical to the completion of your next portfolio project, which will require you to clean, analyze, and present data to technical and non-technical audiences.
Now that you have some practice completing portfolio projects, think about how to reframe the data you’re analyzing and cleaning into a well-thought-out story. In this part of the course, you’ll take a dataset and apply the six steps of EDA to formulate a useful document that can help you present key findings to stakeholders. In other sections of this program, you will work to develop additional skills to help you thrive in the data career space.
There’s so much more to learn about telling stories with data. As a data professional, a large part of your job is focused on transforming messy data into an organized, clear story that meets business goals and helps stakeholders understand important details needed for making business decisions. This portfolio project is a great opportunity to demonstrate to potential employers that you can do exactly that: turn messy data into a logical story. And remember, developing your skills as a data professional is an iterative process, so you can continue to improve as you have new ideas or learn new things.
Reading: Explore your Course 3 workplace scenarios
Overview
This certificate offers you a choice of several different workplace scenarios to use when completing each end-of-course project:
- Automatidata, featuring a fictional data consulting firm
- TikTok, created in partnership with the short-form video hosting company
- Waze, created in partnership with the realtime driving directions app
Each scenario offers you an opportunity to apply your skills and create work samples to share when applying for jobs, and you will practice similar skills regardless of the workplace scenario. It is recommended that you work with the same scenario for each end-of-course project to have a cohesive experience. However, you are welcome to investigate any of the workplace scenarios you are interested in as you progress through the program.
Reminder: We recommend that you choose one workplace scenario to follow for all end-of-course projects to ensure end-to-end project development.
The minimum requirement to earn your Advanced Data Analytics Certificate is to complete the end-of-course project, using one workplace scenario, for each course. You may complete the project for as many of the workplace scenarios as you wish. Completing the project for more than one workplace scenario in a single course offers you additional practice and work examples you can add to your portfolio and share with prospective employers during your job search.
This reading offers an overview of all available workplace scenarios. Before moving on, identify the scenario you would like to complete for the Course 3 end-of-course project.
Course 3 workplace scenarios
Automatidata
Project goal:
In this fictional scenario, the New York City Taxi and Limousine Commission (TLC) has approached the data consulting firm Automatidata to develop an app that enables TLC riders to estimate the taxi fares in advance of their ride.
Background:
Since 1971, TLC has been regulating and overseeing the licensing of New York City’s taxi cabs, for-hire vehicles, commuter vans, and paratransit vehicles.
Scenario:
The New York City TLC data is ready for exploratory data analysis (EDA) in Python. You will need to clean, join, validate, and create a visualization for the taxi commission data. The findings will be shared with internal stakeholders from different departments within Automatidata.
Course 3 tasks:
- Load data, explore, and extract the New York City TLC data with Python
- Use custom functions to organize the information within the New York City TLC dataset
- Build a dataframe for the New York City TLC project
- Create an executive summary for Automatidata for a general audience of internal professionals
Note: The story, all names, characters, and incidents portrayed in this project are fictitious. No identification with actual persons (living or deceased) is intended or should be inferred. And, the data shared in this project has been created for pedagogical purposes.
TikTok
Project goal:
The TikTok data team is developing a machine learning model for classifying claims made in videos submitted to the platform.
Background:
TikTok is the leading destination for short-form mobile video. The platform is built to help imaginations thrive. TikTok’s mission is to create a place for inclusive, joyful, and authentic content–where people can safely discover, create, and connect.
Scenario:
It is now time to begin the process of exploratory data analysis (EDA). As a data analyst on TikTok’s data team, you will complete the EDA process for the claims classification project. You’ll also use Tableau to create visuals for an executive summary to help non-technical stakeholders engage and interact with the data.
Course 3 tasks:
- Import relevant packages and TikTok data into Python
- EDA and cleaning
- Assess Tableau measures and dimensions
- Select and build visualization type(s)
- Create plots to visualize variables and relationships between variables
- Share your results with the TikTok team
Note: The story, all names, characters, and incidents portrayed in this project are fictitious. No identification with actual persons (living or deceased) is intended or should be inferred. And, the data shared in this project has been created for pedagogical purposes.
Waze
Project goal:
Waze leadership has asked your data team to develop a machine learning model to predict user churn. An accurate model will help prevent churn, improve user retention, and grow Waze’s business.
Background:
Waze’s free navigation app makes it easier for drivers around the world to get to where they want to go. Waze’s community of map editors, beta testers, translators, partners, and users helps make each drive better and safer.
Scenario:
Your team is still in the early stages of their user churn project. So far, you’ve completed a project proposal, and used Python to inspect and organize Waze’s user data. Now, the data is ready for exploratory data analysis (EDA) and further data visualization.
Course 3 tasks:
- Clean data
- Handle outliers
- Perform EDA
- Visualize data
- Share an executive summary with the Waze data team
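The "Handle outliers" task can take several forms. As a minimal sketch, assuming a hypothetical list of per-user drive counts (the values are made up and not taken from the Waze dataset), one common approach caps values above a threshold derived from the interquartile range (IQR). The helper name `cap_outliers_iqr` and the multiplier `k=1.5` are illustrative choices, not part of the project instructions.

```python
import statistics

def cap_outliers_iqr(values, k=1.5):
    """Cap values above Q3 + k * IQR at that threshold (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    upper = q3 + k * (q3 - q1)
    return [min(v, upper) for v in values], upper

# Made-up drive counts for illustration; not real Waze data.
drives = [10, 12, 15, 18, 20, 22, 25, 30, 400]
capped, upper = cap_outliers_iqr(drives)
print("upper threshold:", upper)
print("capped values:", capped)
```

Capping keeps every row in the dataset while limiting the influence of extreme values. Whether to cap, drop, or keep outliers depends on the question the analysis needs to answer.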
Note: The story, all names, characters, and incidents portrayed in this project are fictitious. No identification with actual persons (living or deceased) is intended or should be inferred. And, the data shared in this project has been created for pedagogical purposes.
Key Takeaways
In Course 3, Go Beyond the Numbers: Translate Data into Insights, you explored the process of exploratory data analysis (EDA). You learned to identify the core steps, basic methods, and benefits of structuring and cleaning data. Additionally, you investigated raw data using Python and created data visualizations using Tableau.
Course 3 skills:
- Conduct exploratory data analysis
- Create data visualization with Tableau
- Expand knowledge of Python coding
- Share insights and ideas with stakeholders
Course 3 end-of-course project deliverables:
- Complete EDA with workplace scenario dataset using Python
- Executive summary including a Tableau visualization
The end-of-course portfolio projects are designed for you to apply your data analytical skills within a workplace scenario. No matter which scenario you work with, you will practice your ability to discuss data analytic topics with coworkers, internal team members, and external clients.
As a reminder, you are required to complete one project for each course. To gain additional practice, or to add more samples to your portfolio, you may complete as many of the scenarios as you wish.
Automatidata scenario
Reading: Course 3 end-of-course portfolio project overview: Automatidata
Learn about the Course 3 Automatidata workplace scenario!
The end-of-course project in Course 3 focuses on your ability to use exploratory data analysis to organize and understand the data within a project. The end-of-course projects were designed with you in mind, offering an opportunity for you to practice and apply your data analytic skills. The materials provided here will guide you through discussions with co-workers, internal team members, and external stakeholders.
Learn more about the project, your role, and expectations in this reading.
Background on the Automatidata scenario
Automatidata works with its clients to transform their unused and stored data into useful solutions, such as performance dashboards, customer-facing tools, strategic business insights, and more. They specialize in identifying a client’s business needs and utilizing their data to meet those business needs.
Automatidata is consulting for the New York City Taxi and Limousine Commission (TLC). New York City TLC is an agency responsible for licensing and regulating New York City’s taxi cabs and for-hire vehicles. The agency has partnered with Automatidata to develop a regression model that helps estimate taxi fares before the ride, based on data that TLC has gathered.
The TLC data comes from over 200,000 taxi and limousine licensees, making approximately one million combined trips per day.
Note: This project’s dataset was created for pedagogical purposes and may not be indicative of New York City taxi cab riders’ behavior.
Project background
Automatidata is working on the TLC project. The following tasks are needed before the team can begin the data analysis process:
- EDA and cleaning
- Select and build visualization type(s)
- Create plots to visualize relationships between relevant variables
- Share your results with the Automatidata team
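Before building a plot of the relationship between two variables, a quick numeric check can help you decide which pairs are worth visualizing. The following is a minimal sketch in plain Python that computes Pearson's correlation coefficient. The values for `trip_distance` and `total_amount` are made up for illustration and are not drawn from the TLC data; in the lab notebook you would likely rely on a data analysis library instead.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration; not real TLC records.
trip_distance = [0.8, 1.5, 2.3, 5.0, 9.7]
total_amount = [6.3, 9.8, 12.5, 22.0, 38.4]

r = pearson_r(trip_distance, total_amount)
print(round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship that a scatterplot should make visible. A value near 0 does not rule out a nonlinear relationship, so it is still worth plotting.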
Your assignment
You will conduct exploratory data analysis on data for the TLC project. You’ll also use Tableau to create visuals for an executive summary to help non-technical stakeholders engage and interact with the data.
The members of Automatidata and the New York City TLC
Automatidata Team Members
- Udo Bankole, Director of Data Analysis
- Deshawn Washington, Data Analysis Manager
- Luana Rodriquez, Senior Data Analyst
- Uli King, Senior Project Manager
Your teammates at Automatidata have technical experience with data analysis and data science. However, you should always be sure to keep summaries and messages to these team members concise and to the point.
New York City TLC Team Members
- Juliana Soto, Finance and Administration Department Head
- Titus Nelson, Operations Manager
Note: The story, all names, characters, and incidents portrayed in this project are fictitious. No identification with actual persons (living or deceased) is intended or should be inferred. And, the data shared in this project has been created for pedagogical purposes.
The TLC team members are program managers who oversee operations at the organization. Their roles are not highly technical, so be sure to adjust your language and explanation accordingly.
Specific project deliverables
With this end-of-course project, you will gain valuable practice with your new skills as you complete the following deliverables:
- Course 3 PACE Strategy Document to consider questions, details, and action items for each stage of the project scenario
- Answer the questions in the Jupyter notebook project file
- Create a Jupyter notebook of full EDA
- Create a Tableau visualization showing two important variables
- Write an executive summary of results and include a visualization
Good luck in your role! Automatidata looks forward to seeing how you communicate your creative work and approach problem-solving!
Key takeaways
The end-of-course project is designed for you to practice and apply course skills in a fictional workplace scenario. By completing each course’s end-of-course project, you will have work examples that will enhance your portfolio and showcase your skills for future employers.
Practice Quiz: Activity: Create your Course 3 Automatidata project
Lab: Activity: Course 3 Automatidata project lab
In this lab portion of the end-of-course project, you will open a Jupyter Notebook and follow instructions to enter code and written responses where prompted.
Data dictionary
This project uses a dataset called 2017_Yellow_Taxi_Trip_Data.csv. It contains data gathered by the New York City Taxi & Limousine Commission and published by the city of New York as part of their NYC Open Data program. In order to improve the learning experience and shorten runtimes, a sample was drawn from the 113 million rows in the 2017 Yellow Taxi Trip Data table.
The dataset contains:
408,294 rows – each row represents a different trip
18 columns
Column name | Description |
---|---|
ID | Trip identification number |
VendorID | A code indicating the TPEP provider that provided the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc. |
tpep_pickup_datetime | The date and time when the meter was engaged. |
tpep_dropoff_datetime | The date and time when the meter was disengaged. |
Passenger_count | The number of passengers in the vehicle. This is a driver-entered value. |
Trip_distance | The elapsed trip distance in miles reported by the taximeter. |
PULocationID | TLC Taxi Zone in which the taximeter was engaged |
DOLocationID | TLC Taxi Zone in which the taximeter was disengaged |
RateCodeID | The final rate code in effect at the end of the trip. 1= Standard rate 2=JFK 3=Newark 4=Nassau or Westchester 5=Negotiated fare 6=Group ride |
Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store and forward trip |
Payment_type | A numeric code signifying how the passenger paid for the trip. 1= Credit card 2= Cash 3= No charge 4= Dispute 5= Unknown 6= Voided trip |
Fare_amount | The time-and-distance fare calculated by the meter. |
Extra | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges. |
MTA_tax | $0.50 MTA tax that is automatically triggered based on the metered rate in use. |
Improvement_surcharge | $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015. |
Tip_amount | Tip amount – This field is automatically populated for credit card tips. Cash tips are not included. |
Tolls_amount | Total amount of all tolls paid in trip. |
Total_amount | The total amount charged to passengers. Does not include cash tips. |
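The pickup and dropoff timestamps in the data dictionary can be combined into a trip duration, which is often a useful derived variable during EDA. The sketch below uses only the Python standard library. The rows are made up and merely reuse column names from the table above, and the timestamp format in `TIME_FORMAT` is an assumption, so check the actual CSV before parsing. The `summarize_trip` helper and the "suspect" rule are hypothetical.

```python
import csv
import io
from datetime import datetime

# Made-up rows that reuse column names from the data dictionary; not real TLC records.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,Trip_distance,Fare_amount
2017-03-25 08:55:43,2017-03-25 09:09:47,3.34,13.0
2017-04-11 14:53:28,2017-04-11 15:19:58,1.80,16.0
2017-12-15 07:26:56,2017-12-15 07:34:08,0.00,-4.5
"""

# Assumed timestamp format; the real file may use a different one.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def summarize_trip(row):
    """Derive trip duration in minutes and flag rows that need a closer look."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
    distance = float(row["Trip_distance"])
    fare = float(row["Fare_amount"])
    return {
        "duration_min": (dropoff - pickup).total_seconds() / 60,
        # Illustrative rule: zero distance or a negative fare is worth reviewing.
        "suspect": distance <= 0 or fare < 0,
    }

trips = [summarize_trip(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for trip in trips:
    print(trip)
```

Flagging suspect rows instead of deleting them right away keeps the decision about how to treat them visible in your notebook and your PACE strategy document.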
Remember, you can access and download the data for any Jupyter notebook activity from within the notebook itself by navigating to the Lab Files dropdown menu at the top of the page, clicking into the /home/jovyan/work folder, selecting the relevant data file, and clicking Download.
Refer to NYC Open Data for more information related to this dataset.
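Several columns in this dataset store numeric codes rather than readable labels. As a small sketch, the mappings below are copied from the data dictionary, while the list of `Payment_type` values is made up for illustration. The `decode` helper is hypothetical; it marks any code that the dictionary does not define so that unexpected values stand out instead of being silently dropped.

```python
from collections import Counter

# Code-to-label mappings copied from the data dictionary.
PAYMENT_TYPE = {
    1: "Credit card", 2: "Cash", 3: "No charge",
    4: "Dispute", 5: "Unknown", 6: "Voided trip",
}
RATE_CODE = {
    1: "Standard rate", 2: "JFK", 3: "Newark",
    4: "Nassau or Westchester", 5: "Negotiated fare", 6: "Group ride",
}

def decode(code, mapping):
    """Return the label for a code, or a marker for codes outside the dictionary."""
    return mapping.get(code, f"Unmapped ({code})")

# Made-up Payment_type values; 9 is deliberately outside the dictionary.
payment_codes = [1, 1, 2, 1, 3, 2, 9]
payment_counts = Counter(decode(code, PAYMENT_TYPE) for code in payment_codes)

print(payment_counts.most_common())
print(decode(2, RATE_CODE))
```

Readable labels also carry over to Tableau, where a legend that reads "Credit card" is easier for non-technical stakeholders to interpret than a bare code.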
Access the end-of-course project lab
Note: Click the Open Lab button to start your end-of-course project lab. Once you complete this activity, click Next to continue on to the exemplar reading.
(Lab) Activity Instructions:
The Jupyter Notebook will autosave as you work, or you can manually save it by clicking the Save and Checkpoint button or by selecting Save and Checkpoint from the File menu.
As you complete the end-of-course project lab, note the following features:
- Sections: Step-by-step instructions in each section lead you through the lab.
- Code blocks: Code blocks allow you to practice key Python coding concepts. Add code where prompted and then click the Run button to execute your code and view any possible output.
- Questions: Thought questions offer moments to pause and think about concepts and your output as you move through the lab.
To review how to work in Jupyter Notebooks, refer to the reading Practice Python skills in Jupyter Notebooks.
Be sure to complete this lab before moving on. The next course item will explain how to review an exemplar of a completed end-of-course project lab. You can compare the code and text responses in the exemplar to your own.
Reading: Activity Exemplar: Create your Course 3 Automatidata project
Completed Exemplars
To review the exemplars for the Course 3 executive summary and Tableau visualization, click the following links and select Use Template if applicable.
Links to exemplars:
Assessment of Exemplars
Course 3 Automatidata project lab
Compare the exemplar to the Python notebook you completed. Your responses may differ from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you progress through the end-of-course projects in the certificate.
Note: The exemplar represents one possible way to complete the Python notebook. Yours may differ in certain ways, such as your specific code input or responses to questions. What’s important is that you have an overall understanding of the purpose and functionality of a Python notebook for data analysis.
Your Python notebook should:
- Include the correct code for performing EDA and creating data visualizations
- Clearly communicate your responses to questions about code input and results
![](https://i0.wp.com/d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/2LLG3HB3TIuZh4jXHGd0yw_ab63ab97069b43d989a00c3db03764f1_image.png?w=1200&ssl=1)
Course 3 Automatidata Tableau visualization
Compare the exemplar of the Tableau visualization to the scatterplot you completed. Your work might differ in some respects from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you continue to progress through the course.
Note: The exemplar represents one possible way to complete the Tableau visualization. Yours might differ in certain ways, such as your choice of visualization colors. What’s important is that you have an overall understanding of the purpose and functionality of Tableau Public for data visualization.
Your Tableau visualization should:
- Use the same variables identified in your EDA practice with Python
- Enhance your scatterplot initially created with Python. Note: Using Tableau Public will naturally enhance your visualization. Ensure your data is shown clearly and accurately in this platform.
![](https://i0.wp.com/d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/fOn4LwEXQiqn3RLfS3X4Vg_507d244b263342d7afafb9bfd7ea66f1_image.png?w=1200&ssl=1)
Course 3 executive summary
Compare the exemplar to your completed executive summary. Your responses may differ from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you progress through the end-of-course projects in the certificate.
Note: The exemplar represents one possible way to complete the executive summary. Yours might differ in certain ways, such as your specific language, answers to questions or the layout you selected from the template offerings. What’s important is that you have an overall understanding of the purpose and organization of executive summaries for data projects.
Your executive summary should:
- Include key information that you want to share with teammates and/or stakeholders
- Use clear and concise language to effectively communicate your results
TikTok scenario
Reading: Course 3 end-of-course portfolio project overview: TikTok
Learn about the Course 3 TikTok workplace scenario!
The end-of-course project in Course 3 focuses on your ability to use exploratory data analysis to organize and understand the data within a project. As a reminder, in Course 1 you developed a project proposal that outlined milestones, which progress with each of the end-of-course projects. A visual representation is provided in the graphic shown here:
Learn more about the project, your role, and expectations in this reading.
Background on the TikTok scenario
At TikTok, our mission is to inspire creativity and bring joy. Our employees lead with curiosity and move at the speed of culture. Combined with our company’s flat structure, you’ll be given dynamic opportunities to make a real impact on a rapidly expanding company and grow your career.
TikTok users have the ability to submit reports that identify videos and comments that contain user claims. These reports identify content that needs to be reviewed by moderators. The process generates a large number of user reports that are challenging to consider in a timely manner.
TikTok is working on the development of a predictive model that can determine whether a video contains a claim or offers an opinion. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritize them more efficiently.
Project background
TikTok’s data team is working on the claims classification project. The following tasks are needed before the team can begin the data analysis process:
- EDA and cleaning
- Select and build visualization type(s)
- Create plots to visualize variables and relationships between variables
- Share your results with the TikTok team
Your assignment
You will conduct exploratory data analysis on data for the claims classification project. You’ll also use Tableau to create visuals for an executive summary to help non-technical stakeholders engage and interact with the data.
Team members at TikTok
Data team roles
- Willow Jaffey, Data Science Lead
- Rosie Mae Bradshaw, Data Science Manager
- Orion Rainier, Data Scientist
The members of the data team at TikTok are well versed in data analysis and data science. Messages to these more technical coworkers should be concise and specific.
Cross-functional team members
- Mary Joanna Rodgers, Project Management Officer
- Margery Adebowale, Finance Lead, Americas
- Maika Abadi, Operations Lead
Your TikTok team includes several managers, who oversee operations. It is important to adjust your general correspondence appropriately to their roles, given that their responsibilities are less technical in nature.
Note: The story, all names, characters, and incidents portrayed in this project are fictitious. No identification with actual persons (living or deceased) is intended or should be inferred. And, the data shared in this project has been created for pedagogical purposes.
Specific project deliverables
With this end-of-course project, you will gain valuable practice and apply your new skills as you complete the following:
- Course 3 PACE Strategy Document to consider questions, details, and action items for each stage of the project scenario
- Answer the questions in the Jupyter notebook project file
- Clean your data and perform exploratory data analysis (EDA)
- Create data visualizations
- Create an executive summary to share your results
Key takeaways
The Google Advanced Data Analytics Certificate end-of-course project is designed for you to practice and apply course skills in a fictional workplace scenario. By completing each course’s end-of-course project, you will have work examples that will enhance your portfolio and showcase your skills for future employers.
Practice Quiz: Activity: Create your Course 3 TikTok project
Lab: Activity: Course 3 TikTok project lab
In this lab portion of the end-of-course project, you will open a Jupyter Notebook and follow instructions to enter code and written responses where prompted.
Data dictionary
This project uses a dataset called tiktok_dataset.csv. It contains synthetic data created for this project in partnership with TikTok.
The dataset contains:
19,383 rows – Each row represents a different published TikTok video in which a claim/opinion has been made.
12 columns
Column name | Type | Description |
---|---|---|
# | int | TikTok assigned number for video with claim/opinion. |
claim_status | obj | Whether the published video has been identified as an “opinion” or a “claim.” In this dataset, an “opinion” refers to an individual’s or group’s personal belief or thought. A “claim” refers to information that is either unsourced or from an unverified source. |
video_id | int | Random identifying number assigned to video upon publication on TikTok. |
video_duration_sec | int | Length of the published video, measured in seconds. |
video_transcription_text | obj | Transcribed text of the words spoken in the published video. |
verified_status | obj | Indicates the status of the TikTok user who published the video in terms of their verification, either “verified” or “not verified.” |
author_ban_status | obj | Indicates the status of the TikTok user who published the video in terms of their permissions: “active,” “under scrutiny,” or “banned.” |
video_view_count | float | The total number of times the published video has been viewed. |
video_like_count | float | The total number of times the published video has been liked by other users. |
video_share_count | float | The total number of times the published video has been shared by other users. |
video_download_count | float | The total number of times the published video has been downloaded by other users. |
video_comment_count | float | The total number of comments on the published video. |
Remember, you can access and download the data for any Jupyter notebook activity from within the notebook itself by navigating to the Lab Files dropdown menu at the top of the page, clicking into the /home/jovyan/work folder, selecting the relevant data file, and clicking Download.
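A natural first question for the claims classification project is whether engagement differs between claims and opinions. The sketch below groups view counts by `claim_status` and compares the mean and median, using only the Python standard library. The records are made up and simply reuse column names from the data dictionary. Using `None` to stand in for a missing count is an assumption about how gaps might appear in the data.

```python
import statistics
from collections import defaultdict

# Made-up records that reuse column names from the data dictionary; not real TikTok data.
videos = [
    {"claim_status": "claim", "video_view_count": 343296.0},
    {"claim_status": "claim", "video_view_count": 140877.0},
    {"claim_status": "claim", "video_view_count": None},
    {"claim_status": "opinion", "video_view_count": 4980.0},
    {"claim_status": "opinion", "video_view_count": 2110.0},
    {"claim_status": "opinion", "video_view_count": 9500.0},
]

views_by_status = defaultdict(list)
for video in videos:
    views = video["video_view_count"]
    if views is not None:  # skip missing values rather than treating them as zero
        views_by_status[video["claim_status"]].append(views)

summary = {
    status: {
        "n": len(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
    }
    for status, counts in views_by_status.items()
}
print(summary)
```

Reporting the median next to the mean is a deliberate choice: view counts tend to be skewed by a few very popular videos, and a large gap between the two statistics is an early hint of that skew.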
Access the end-of-course project lab
Note: Click the Open Lab button to start your end-of-course project lab. Once you complete this activity, click Next to continue on to the exemplar reading.
Reading: Activity Exemplar: Create your Course 3 TikTok project
Reading
Completed Exemplars
To review the exemplars for the Course 3 executive summary and Tableau visualization, click the following links and select Use Template if applicable.
Assessment of Exemplars
Course 3 TikTok project lab
Compare the exemplar to the Python notebook you completed. Your responses may differ from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you progress through the end-of-course projects in the certificate.
Note: The exemplar represents one possible way to complete the Python notebook. Yours might differ in certain ways, such as your specific language, answers to questions or the layout you selected from the template offerings. What’s important is that you have an overall understanding of the purpose and functionality of a Python notebook for data analysis.
Your Python notebook should:
- Include the correct code for performing EDA and creating data visualizations
- Clearly communicate your responses to questions about code input and results
Course 3 TikTok Tableau visualizations
Compare the exemplar of the Tableau visualization to the one you completed. Your work might differ in some respects from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you continue to progress through the course.
Note: The exemplar represents one possible way to complete the Tableau visualization. Yours might differ in certain ways, such as your choice of visualization colors. What’s important is that you have an overall understanding of the purpose and functionality of Tableau Public for data visualization.
Your Tableau visualization should:
- Use the same variables identified in your EDA practice with Python
- Enhance the scatterplot you initially created with Python. Note: Using Tableau Public will naturally enhance your visualization. Ensure your data is shown clearly and accurately in this platform.
Course 3 executive summary
Compare the exemplar to your completed executive summary. Your responses may differ from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you progress through the end-of-course projects in the certificate.
Note: The exemplar represents one possible way to complete the executive summary. Yours might differ in certain ways, such as your specific language and visual design. What’s important is that you have an overall understanding of the purpose and organization of executive summaries for data projects.
Your executive summary should:
- Include key information that you want to share with teammates and/or stakeholders
- Use clear and concise language to effectively communicate your results
Lab: Exemplar: Course 3 TikTok project lab
Waze scenario
Reading: Course 3 end-of-course portfolio project overview: Waze
Reading
Learn about the Course 3 Waze workplace scenario!
The end-of-course project in Course 3 focuses on your ability to use exploratory data analysis to organize and understand the data within a project. As a reminder, in Course 1 you developed a project proposal that outlined milestones, which progress with each of the end-of-course projects. A visual representation is provided in the graphic shown here:
Learn more about the project, your role, and expectations in this reading.
Background on the Waze scenario
Waze’s free navigation app makes it easier for drivers around the world to get to where they want to go. Waze’s community of map editors, beta testers, translators, partners, and users helps make each drive better and safer. Waze partners with cities, transportation authorities, broadcasters, businesses, and first responders to help as many people as possible travel more efficiently and safely.
You’ll collaborate with your Waze teammates to analyze and interpret data, generate valuable insights, and help leadership make informed business decisions. Your team is about to start a new project to help prevent user churn on the Waze app. Churn quantifies the number of users who have uninstalled the Waze app or stopped using the app. This project focuses on monthly user churn.
This project is part of a larger effort at Waze to increase growth. Typically, high retention rates indicate satisfied users who repeatedly use the Waze app over time. Developing a churn prediction model will help prevent churn, improve user retention, and grow Waze’s business. An accurate model can also help identify specific factors that contribute to churn and answer questions such as:
- Who are the users most likely to churn?
- Why do users churn?
- When do users churn?
For example, if Waze can identify a segment of users who are at high risk of churning, Waze can proactively engage these users with special offers to try and retain them. Otherwise, Waze may lose these users without knowing why.
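One simple way to look for a higher-risk segment is to compare churn rates across the values of a categorical column. The sketch below assumes the label and device column names from the Waze data dictionary later in these notes. The device values, the sample rows, and the helper name churn_rate_by are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical sample; the real data has one row per user.
users = pd.DataFrame({
    "device": ["iPhone", "iPhone", "iPhone", "iPhone",
               "Android", "Android", "Android", "Android"],
    "label": ["churned", "retained", "retained", "retained",
              "churned", "churned", "retained", "retained"],
})


def churn_rate_by(df, segment_col, label_col="label"):
    """Share of users labeled "churned" within each value of segment_col."""
    churned = df[label_col].eq("churned")  # boolean Series: True if churned
    rates = churned.groupby(df[segment_col]).mean()
    return rates.sort_values(ascending=False)


rates = churn_rate_by(users, "device")
print(rates)
```

A gap between segments in a table like this is only a starting point for EDA. It shows where churn is concentrated, not why those users churn.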
Your insights will help Waze leadership optimize the company’s retention strategy, enhance user experience, and make data-driven decisions about product development.
Project background
Waze’s data team is working on the churn project. The following tasks are needed before the team can begin the data analysis process:
- EDA and cleaning
- Select visualization type(s) and build the visualizations
- Create plots to visualize variables and relationships between variables
- Share your results with the data team
Your assignment
You will conduct exploratory data analysis on data for the churn project. You’ll also use tools to create visuals for an executive summary to help non-technical stakeholders engage and interact with the data.
Team members at Waze
Data team roles
- Harriet Hadzic – Director of Data Analysis
- May Santner – Data Analysis Manager
- Chidi Ga – Senior Data Analyst
- Sylvester Esperanza – Senior Project Manager
Data team members have technical experience with data analysis and data science. However, you should always be sure to keep summaries and messages to these team members concise and to the point.
Cross-functional team members
- Emrick Larson – Finance and Administration Department Head
- Ursula Sayo – Operations Manager
Your Waze team includes several managers overseeing operations. It is important to adapt your communication to their roles since their responsibilities are less technical.
Note: The story, all names, characters, and incidents portrayed in this project are fictitious. No identification with actual persons (living or deceased) is intended or should be inferred. The data shared in this project has also been created for pedagogical purposes.
Specific project deliverables
With this end-of-course project, you will gain valuable practice and apply your new skills as you complete the following:
- Complete the questions in the Course 3 PACE strategy document
- Answer the questions in the Jupyter notebook project file
- Clean your data and perform exploratory data analysis (EDA)
- Create data visualizations
- Create an executive summary to share your results
Good luck with this project! Your Waze team members are looking forward to seeing how you communicate your creative work and approach problem-solving!
Key takeaways
The Google Advanced Data Analytics Certificate end-of-course project is designed for you to practice and apply course skills in a fictional workplace scenario. By completing each course’s end-of-course project, you will have work examples that will enhance your portfolio and showcase your skills for future employers.
Practice Quiz: Activity: Create your Course 3 Waze project
Lab: Activity: Course 3 Waze project lab
Reading
In this lab portion of the end-of-course project, you will open a Jupyter Notebook and follow instructions to enter code and written responses where prompted.
Data dictionary
This project uses a dataset called waze_dataset.csv. It contains synthetic data created for this project in partnership with Waze.
The dataset contains:
14,999 rows – each row represents one unique user
13 columns
Column name | Type | Description |
---|---|---|
ID | int | A sequential numbered index |
label | obj | Binary target variable (“retained” vs “churned”) indicating whether a user churned at any time during the month |
sessions | int | The number of occurrences of a user opening the app during the month |
drives | int | The number of occurrences of driving at least 1 km during the month |
device | obj | The type of device a user starts a session with |
total_sessions | float | A model estimate of the total number of sessions since a user has onboarded |
n_days_after_onboarding | int | The number of days since a user signed up for the app |
total_navigations_fav1 | int | Total navigations since onboarding to the user’s favorite place 1 |
total_navigations_fav2 | int | Total navigations since onboarding to the user’s favorite place 2 |
driven_km_drives | float | Total kilometers driven during the month |
duration_minutes_drives | float | Total duration driven in minutes during the month |
activity_days | int | Number of days the user opens the app during the month |
driving_days | int | Number of days the user drives (at least 1 km) during the month |
Remember, you can access and download the data for any Jupyter notebook activity from within the notebook itself by navigating to the Lab Files dropdown menu at the top of the page, clicking into the /home/jovyan/work folder, selecting the relevant data file, and clicking Download.
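A first cleaning pass on a dataset shaped like this often starts with two questions: which columns have missing values, and which numeric values sit far from the rest. The sketch below uses the common 1.5 × IQR rule for outliers, which is one convention among several and not necessarily the method used in the exemplar. The sample rows, including the missing label and the extreme distance, are invented for illustration, and iqr_outlier_mask is a hypothetical helper name.

```python
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical sample using three columns from the data dictionary above.
users = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "label": ["retained", "churned", "retained", None, "retained", "retained"],
    "driven_km_drives": [2600.0, 3100.0, 2900.0, 3300.0, 2800.0, 21000.0],
})

# Count missing values per column.
missing_per_column = users.isna().sum()

# Keep only the rows whose monthly distance is flagged as an outlier.
outliers = users[iqr_outlier_mask(users["driven_km_drives"])]

print(missing_per_column)
print(outliers)
```

Whether to drop, cap, or keep flagged rows is a judgment call that depends on the business question, so it is worth recording the decision in the PACE strategy document.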
Reading: Activity Exemplar: Create your Course 3 Waze project
Reading
Completed Exemplars
To review the exemplar for the Course 3 executive summary, click the link below and select Use Template.
Assessment of Exemplar
Course 3 Waze project lab
Compare the exemplar to the Python notebook you completed. Your responses may differ from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you progress through the end-of-course projects in the certificate.
Note: The exemplar represents one possible way to complete the Python notebook. Yours may differ in certain ways, such as your specific code input or responses to questions. What’s important is that you have an overall understanding of the purpose and functionality of a Python notebook for data analysis.
Your Python notebook should:
- Include the correct code for performing EDA and creating data visualizations
- Clearly communicate your responses to questions about code input and results
Course 3 Executive Summary
Compare the exemplar to your completed executive summary. Your responses may differ from the exemplar, but that is to be expected. What did you do well? Where can you improve? Use your answers to these questions to guide you as you progress through the end-of-course projects in the certificate.
Note: The exemplar represents one possible way to complete the executive summary. Yours might differ in certain ways, such as your specific language, answers to questions or the layout you selected from the template offerings. What’s important is that you have an overall understanding of the purpose and organization of executive summaries for data projects.
Your executive summary should:
- Include key information that you want to share with teammates and/or stakeholders
- Use clear and concise language to effectively communicate your results
Lab: Exemplar: Course 3 Waze project lab
End-of-course portfolio project wrap-up
Video: End-of-course project wrap-up and tips for ongoing career success
This course prepares you for data careers by emphasizing transferable skills and portfolio building:
Key Learnings:
- Data analysis process: cleaning, organizing, and storytelling with data.
- Python for data manipulation.
- Importance of audience awareness in data communication.
- Highlighting transferable skills in job interviews.
Portfolio Projects:
- Demonstrate your work process and achievements to potential employers.
- Help you answer common interview questions about data skills and tools.
- Provide tangible examples of your ability to clean, analyze, and visualize data.
Future Steps:
- Learn about statistics and data-driven work.
- Practice A/B testing simulations.
- Expand your portfolio with more data-related projects.
Remember:
- Document your learning process and skills effectively.
- Practice communicating your data insights clearly, considering your audience.
- Your portfolio is a key tool for showcasing your capabilities in data careers.
By following these steps, you can confidently present your data skills and knowledge to potential employers and increase your chances of success in data-related job interviews.
At this point in the program, you’ve done a lot of work towards better understanding data and how it can be useful for enacting change in a business. You completed a Jupyter notebook, created visualizations to support your work, and refined your presentation to meet the needs of your particular audience.
As you continue to make progress in this program, remember that documenting your learning process and skills will help you communicate what you’ve done to potential employers and hiring managers in future interviews. You may recall from previous sections of this course that audience awareness is essential. During the interview process, knowing how to talk about your work process, transferable skills, and other achievements will lead to much greater success.
In this course, you learned the importance of following the PACE structure in a data career. You practiced using Python to manipulate data, and you demonstrated how to organize and analyze a set of data to tell a compelling story. The portfolio projects were designed to help you thrive on the job market, and the transferable skills you applied contributed to the tangible artifacts you created.
As you begin preparing for future interviews, you should be ready to answer questions like: What is your process for cleaning data? What tool do you use for creating data visualizations? How and why do data visualizations enhance the stories data tells? And what considerations are top of mind when sharing data stories with non-technical stakeholders? Of course, there may be many other questions that you are asked as you interview for a data professional role.
Each portfolio project will help you prepare responses. For example, in the portfolio project you just completed, you used the EDA process to clean, organize, and analyze the data set. Then you turned your data into a presentation full of visualizations that will help stakeholders understand insights from the data story you discovered. Don’t forget that you recorded all of your considerations, questions, process notes, and more in your PACE strategy document as well.
Coming up, you’re going to learn all about the power of statistics and data-driven work. Then you’ll have an opportunity to use statistical analysis to simulate an A/B test. By the end of these courses, you will have lots of artifacts in your portfolio.
Reading: Course 3 glossary
Reading
Course review: Go Beyond the Numbers: Translate Data into Insights
Course Summary: Data Exploration and Visualization
This course transformed your initial excitement for data exploration into the confidence of a true data storyteller, just like an archaeologist uncovering hidden stories.
Key Learnings:
- Six Practices of EDA: Learn to structure, clean, discover, join, validate, and present your data to reveal its stories.
- Data Cleaning: Handle missing data, outliers, and categorical data using techniques like label encoding.
- Visualization Design: Create impactful visualizations using Tableau and consider best practices for different audiences.
- Workplace Skills: Develop communication, ethical, accessibility, and PACE workflow skills for your data career.
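As a reminder of what label encoding looks like in practice, the sketch below converts a categorical column into integer codes with pandas category codes. This is one common way to do it, not necessarily the exact approach used in the course labs. The column name and its three values come from the TikTok data dictionary, while the four sample rows are made up.

```python
import pandas as pd

# Hypothetical sample of the author_ban_status column.
statuses = pd.Series(
    ["active", "under scrutiny", "banned", "active"],
    name="author_ban_status",
)

# Converting to a categorical dtype sorts the categories alphabetically,
# and .cat.codes replaces each value with its category's position.
as_category = statuses.astype("category")
codes = as_category.cat.codes

# Keep the mapping so the codes can be translated back later.
code_to_label = dict(enumerate(as_category.cat.categories))

print(codes.tolist())
print(code_to_label)
```

The codes here follow alphabetical order, which carries no real-world meaning. If a model or chart could misread them as a ranking, an explicit mapping or a different encoding may be a better fit.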
Future Steps:
- Build on this foundation with integral concepts in statistics, regression, and machine learning.
- Apply your data storytelling skills throughout your career as a data professional.
Instructor’s Message:
- The instructor encourages your continued journey as a data explorer and storyteller, wishing you success in revealing insights from data.
Do you remember imagining yourself as an archaeologist at the beginning of this course? You stood in front of an ancient river bed at dawn, excited at the possibilities of what you might find under the rock. What stories would you uncover? What long-lost mysteries might you reveal? After learning the content in this course, hopefully you’ve felt the same level of excitement an archaeologist feels when it comes to EDA and visualizations.
As you’ve learned, the six practices of EDA help find the stories that need to be told from data sets. As you’re discovering, structuring, and cleaning in your career, I hope that you are digging through the data with determination, gathering your major finds together, questioning your perspective, and researching more about your discoveries. Then, I hope you remember the other three practices of joining, validating, and presenting to complete your EDA work.
In this course, you’ve had the chance to explore how data professionals take care of the stories that have plot holes or puzzling scenes, that is, missing data and outliers. You also learned how to change categorical data into numerical data using the label encoding technique. Finally, you considered how to design visualizations and present your data in really impactful ways. You learned the advanced concepts of visualizing, and you started using Tableau.
Throughout these lessons, you learned some workplace skills like communicating to different audiences, the importance of ethics, the need for accessibility, and the importance of following the PACE workflow. These are the types of skills that will serve you throughout your career, from entry-level data professional to senior data professional and beyond.
In the upcoming courses, you’ll be learning some integral concepts in statistics, regression, and machine learning. The knowledge you gained in our course will be foundational to your progression through these next courses, as well as through your career as a professional in data analytics.
It has been my pleasure instructing you. The practices of EDA and data visualization are close to my heart, and I’m always excited to meet future data professionals learning these principles. Great job on completing this course. You have the makings of a solid storytelling data professional. May you always find excitement in exploring and telling stories using data.
Reading: Get started on the next course
Reading
Congratulations on completing another course in the Google Advanced Data Analytics certificate! In this part of the program, you learned more about exploratory data analysis, preparing data visualizations, and considering how to choose relevant data variables to share with different stakeholders.
The entire program has seven courses:
- Foundations of Data Science – This course introduces the fundamentals of data science, how different data professionals operate in the workplace, and how these roles contribute to an organization’s vision of its future. The data science workflow PACE (plan, analyze, construct, execute) is introduced to help you better understand how to navigate the technical and workplace expectations of this career.
- Getting Started with Python – In this course, you will get started with Python for data analytics by developing an understanding of Python syntax, logic, data types, objects, and object-oriented programming.
- Go Beyond the Numbers: Translate Data into Insights – Learn the fundamentals of data cleaning and visualizations and how to uncover meaningful stories in the data. (This is the course you just completed. Well done!)
- The Power of Statistics – Learn descriptive and inferential statistics, basic probability and probability distributions, sampling, confidence intervals, and hypothesis testing.
- Regression Analysis: Simplifying Complex Data Relationships – In this course, you will apply your knowledge to modeling variable relationships, with a focus on linear regression, analysis of variance (ANOVA), and logistic regression. From model assumptions to evaluation and interpretation, you will understand relationships in datasets based on PACE.
- The Nuts and Bolts of Machine Learning – This course covers the fundamentals of supervised machine learning, and introduces learners to unsupervised learning through K-means and other clustering models. Learners will use different classification techniques such as decision trees, random forests, and gradient boosting to approach a realistic business problem.
- Google Advanced Data Analytics Capstone – This course presents the capstone project for the Advanced Data Analytics certificate, which incorporates key concepts from each of the six preceding courses. The capstone project will yield data-driven suggestions including visualizations and models to provide insight for the business problem.
Now that you have completed this course, you are ready to move on to the next course: The Power of Statistics.
Keep up the great work!