Spreadsheets are a very important data analytics tool. In this part of the course, you will learn about how data analysts use spreadsheets in their work every day. You will also explore why structured thinking helps analysts better understand problems and come up with solutions.
Learning Objectives
- Discuss the data analyst’s use of spreadsheets with reference to roles and responsibilities
- Demonstrate the use of spreadsheets to complete basic tasks of the data analyst including entering and organizing data
- Demonstrate an understanding of the use of formulas in spreadsheets including a definition and specific examples
- Compare formulas and functions with reference to similarities and differences
- Describe the key ideas associated with structured thinking including the problem domain, scope of work, and context
- Working with spreadsheets
- Video: The amazing spreadsheet
- Video: Get to work with spreadsheets
- Reading: Spreadsheets and the data life cycle
- Practice Quiz: Hands-On Activity: Introduction to Google Sheets
- Video: Step-by-step in spreadsheets
- Reading: Learn more about spreadsheet basics
- Practice Quiz: Test your knowledge on working with spreadsheets
- Formulas in spreadsheets
- Functions in spreadsheets
- Save time with structured thinking
- Video: Before solving a problem, understand it
- Video: Scope of work and structured thinking
- Practice Quiz: Hands-On Activity: Create a scope of work
- Video: Staying objective
- Reading: The importance of context
- Reading: Learning Log: Define problems and ask questions with data
- Practice Quiz: Test your knowledge on structured thinking
- Weekly challenge 3
Working with spreadsheets
Video: The amazing spreadsheet
Spreadsheets are a versatile tool for data analysts. They can be used to answer data-driven questions, build evidence, visualize data, and support findings. Spreadsheets can also be used to perform both basic and complex calculations automatically. This helps analysts work more efficiently and understand the results of their calculations.
In this part of the program, you will revisit the spreadsheet and learn about some of the functions and formulas that you can use to perform calculations. You will also have the opportunity to work with real data from databases to reorganize a spreadsheet and perform data analysis.
Spreadsheets are a powerful and versatile tool that can be used for a variety of tasks, including data analysis. They are often the first tool that data analysts reach for when trying to answer data-driven questions.
Here are some of the benefits of using spreadsheets for data analysis:
- Spreadsheets are easy to use. Even if you don’t have a lot of experience with spreadsheets, you can learn how to use them quickly and easily.
- Spreadsheets are versatile. You can use spreadsheets to perform a wide variety of tasks, including data entry, calculations, and data visualization.
- Spreadsheets are portable. You can easily save and share spreadsheets, making them a great tool for collaboration.
- Spreadsheets are affordable. There are many free and open-source spreadsheet applications available.
Here are some of the things you can do with spreadsheets for data analysis:
- Enter data. Spreadsheets are a great way to enter data. You can type values directly into cells, and most spreadsheet applications automatically recognize common formats such as numbers and dates.
- Calculate data. Spreadsheets can be used to perform a wide variety of calculations. You can use formulas to calculate sums, averages, and other statistics.
- Visualize data. Spreadsheets can be used to visualize data. You can create charts and graphs to help you understand the data.
- Present data. Spreadsheets can be used to present data. You can export spreadsheets to PDF, copy charts into a slide presentation, or share them online.
If you are new to data analysis, spreadsheets are a great place to start. They are easy to use and versatile, making them a valuable tool for any data analyst.
Here are some additional tips for using spreadsheets for data analysis:
- Use clear and descriptive labels for your data. This will make it easier to understand and interpret your data.
- Use formulas to automate your calculations. This will save you time and help you avoid errors.
- Use charts and graphs to visualize your data. This will help you understand the data and communicate your findings to others.
- Share your spreadsheets with others. This will help you collaborate and get feedback on your work.
Hi, again. I’m glad you’re back. In this part of the program, we’ll revisit the spreadsheet. Spreadsheets are a powerful and versatile tool, which is why they’re a big part of pretty much everything we do as data analysts. There’s a good chance a spreadsheet will be the first tool you reach for when trying to answer data-driven questions. After you’ve defined what you need to do with the data, you’ll turn to spreadsheets to help build evidence that you can then visualize and use to support your findings.
Spreadsheets are often the unsung heroes of the data world. They don’t always get the appreciation they deserve, but as a data detective, you’ll definitely want them in your evidence collection kit. I know spreadsheets have saved the day for me more than once. I’ve added data for purchase orders into a sheet, set up formulas in one tab, and had the same formulas do the work for me in other tabs. This frees up time for me to work on other things during the day. I couldn’t imagine not using spreadsheets.
Math is a core part of every data analyst’s job, but not every analyst enjoys it. Luckily, spreadsheets can make calculations more enjoyable, and by that, I mean easier. Let’s see how. Spreadsheets can do both basic and complex calculations automatically. Not only does this help you work more efficiently, but it also lets you see the results and understand how you got them. Here’s a quick look at some of the functions that you’ll use when performing calculations. Many functions can be used as part of a math formula as well. Functions and formulas also have other uses, and we’ll take a look at those too. We’ll take things one step further with exercises that use real data from databases. This is your chance to reorganize a spreadsheet, do some actual data analysis, and have some fun with data.
Video: Get to work with spreadsheets
Data analysts use spreadsheets to organize and analyze data. They can use spreadsheets to create pivot tables, filter data, and perform calculations. Spreadsheets are a versatile tool that can be used for a variety of tasks.
In the video, you will learn about the following ways data analysts use spreadsheets:
- To analyze data from a construction company’s expenses.
- To create a pivot table to organize the data.
- To filter the data to focus on a specific time frame.
- To use formulas and functions to perform calculations on the data, such as finding the most expensive construction projects.
You will also have the opportunity to work in your own spreadsheets in the future.
Spreadsheets are a powerful tool for data analysis. They can be used to organize data, perform calculations, and create visualizations.
To get started with spreadsheets for data analysis, you will need to:
- Choose a spreadsheet application. There are many different spreadsheet applications available, such as Microsoft Excel, Google Sheets, and LibreOffice Calc. Choose the one that you are most comfortable with.
- Learn the basics of spreadsheet navigation. This includes how to move around the spreadsheet, select cells, and enter data.
- Learn how to use formulas and functions. Formulas are used to perform calculations on data in cells. Functions are pre-written formulas that can be used to perform common tasks.
- Learn how to create charts and graphs. Charts and graphs can be used to visualize data and make it easier to understand.
Once you have learned the basics of spreadsheets, you can start using them for data analysis. Here are some of the things you can do with spreadsheets for data analysis:
- Organize data. Spreadsheets can be used to organize data in a variety of ways. You can create tables, sort data, and filter data to find the information you need.
- Perform calculations. Spreadsheets can be used to perform calculations on data. You can use formulas to calculate sums, averages, and other statistics.
- Create visualizations. Spreadsheets can be used to create charts and graphs to visualize data. This can help you understand the data and communicate your findings to others.
True or false: To perform calculations in a spreadsheet, data analysts use formulas and functions.
True
To perform calculations in a spreadsheet, data analysts use formulas and functions, such as SUM, AVERAGE, and COUNT.
What are the first steps a data analyst takes when working with data in a spreadsheet?
Sort and filter
The first steps a data analyst takes when working with data in a spreadsheet are to sort and filter the data.
Data analysts spend a lot of time organizing data and performing calculations. Luckily, there are lots of different tools to help them do just that, including spreadsheets. In this video, we’ll take a look at some of the ways data analysts use spreadsheets to help them with their day-to-day responsibilities. Later, you’ll get to test out some of these things yourself, but for now, let’s start with a quick look at how data analysts use spreadsheets to do their jobs. This will change depending on the work you need to complete, but here’s an overview of a few of the major tasks.
Imagine you work for a construction company. Your company needs your spreadsheet skills to analyze some data about their expenses, so you access the appropriate data and add it to your spreadsheet. We won’t cover all the details of this project right now, but you will get a chance to see lots of spreadsheet features up close and personal as we move forward.
What do you do with the data now that it’s in your spreadsheet? Again, this will be different for each job, but you might start by organizing your data with the task you’ve been given. For example, you might put your data in a pivot table. We’ve talked about pivot tables before in this course. We’ll cover them in more detail later on, but for now, just think of them as well-organized and very useful tables.
Next, you might filter the data in the pivot table. Sorting and filtering data is a common part of most jobs. This lets you focus only on the data you’ll need for your analysis. In our example, maybe you only need the expenses for a certain time frame, like the last three months.
After you’ve filtered your data, you could perform some calculations to learn more about it. Maybe you need to find out which construction projects ended up costing the most money. This is where formulas and functions are really handy. We’ll talk about them in just a bit, but formulas and functions are great for doing some quick math, especially once you run out of fingers and toes to count on.
Now you’ve seen some of the ways data analysts are using spreadsheets in their day-to-day work for a lot of different tasks, including organizing their data and making calculations. Before you know it, we’ll have you working in your own spreadsheets.
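The construction-company example (keep only the expenses in a time frame, summarize them by project, then find the costliest project) can be sketched in Python. This is only an illustration: the project names, dates, and amounts are made up, and the helper `totals_by_project` is hypothetical. In a spreadsheet, a filter and a pivot table would do the same work.

```python
from collections import defaultdict
from datetime import date

# Hypothetical expense rows: (project, expense date, amount).
expenses = [
    ("Bridge repair", date(2023, 1, 15), 12000.0),
    ("Office tower", date(2023, 4, 10), 30000.0),
    ("Bridge repair", date(2023, 5, 2), 8000.0),
    ("Parking garage", date(2023, 5, 20), 15000.0),
    ("Office tower", date(2023, 6, 5), 22000.0),
]


def totals_by_project(rows, start, end):
    """Keep rows dated from start to end (inclusive) and total amounts per project."""
    totals = defaultdict(float)
    for project, when, amount in rows:
        if start <= when <= end:
            totals[project] += amount
    return dict(totals)


# Filter to a three-month window, then summarize by project (pivot-table style).
recent = totals_by_project(expenses, date(2023, 4, 1), date(2023, 6, 30))

# Find the project with the largest total in that window.
most_expensive = max(recent, key=recent.get)

print(recent)
print(most_expensive)
```

The January row falls outside the window, so it does not count toward the "Bridge repair" total.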
Reading: Spreadsheets and the data life cycle
To better understand the benefits of using spreadsheets in data analytics, let’s explore how they relate to each phase of the data life cycle: plan, capture, manage, analyze, archive, and destroy.
- Plan for the users who will work within a spreadsheet by developing organizational standards. This can mean formatting your cells, the headings you choose to highlight, the color scheme, and the way you order your data points. When you take the time to set these standards, you will improve communication, ensure consistency, and help people be more efficient with their time.
- Capture data at the source by connecting spreadsheets to other data sources, such as an online survey application or a database. This data will automatically be updated in the spreadsheet. That way, the information is always as current and accurate as possible.
- Manage different kinds of data with a spreadsheet. This can involve storing, organizing, filtering, and updating information. Spreadsheets also let you decide who can access the data, how the information is shared, and how to keep your data safe and secure.
- Analyze data in a spreadsheet to help make better decisions. Some of the most common spreadsheet analysis tools include formulas to aggregate data or create reports, and pivot tables for clear, easy-to-understand visuals.
- Archive any spreadsheet that you don’t use often but might need to reference later, using the built-in tools. This is especially useful if you want to store historical data before it gets updated.
- Destroy your spreadsheet when you are certain that you will never need it again, if you have better backup copies, or for legal or security reasons. Keep in mind, lots of businesses are required to follow certain rules or have measures in place to make sure data is destroyed properly.
Resources
Spreadsheet shortcuts can help you become more efficient with spreadsheets. If you’d like to learn more, you can explore the collection of Google Sheets shortcuts, or visit the Microsoft Excel shortcuts page if you are using Excel. Both of these resources contain a list of spreadsheet shortcuts you can save and reference as you work more with spreadsheets on your own.
Practice Quiz: Hands-On Activity: Introduction to Google Sheets
Video: Step-by-step in spreadsheets
In this video, you will learn how to use spreadsheets to organize data. Here are the steps:
- Open a new spreadsheet.
- Give the spreadsheet a descriptive title.
- Create a folder on your computer to store your spreadsheets and related files.
- Enter your data into the spreadsheet.
- Make the column widths wider so that you can see the data clearly.
- Format the data attributes (variables) in the first row of the spreadsheet.
- Add borders to the data table to make each piece of data more clearly visible.
These steps will help you organize your data and make it easier to analyze.
Step 1: Choose a spreadsheet application.
There are many different spreadsheet applications available, such as Microsoft Excel, Google Sheets, and LibreOffice Calc. Choose the one that you are most comfortable with.
Step 2: Enter your data.
Once you have chosen a spreadsheet application, you can start entering your data. Be sure to label your columns and rows clearly so that you can easily understand your data later.
Step 3: Organize your data.
Once you have entered your data, you can start organizing it. This may involve sorting your data in a particular order, grouping your data together, or creating charts and graphs to visualize your data.
Step 4: Perform calculations.
Spreadsheets can be used to perform calculations on your data. This may involve calculating sums, averages, or other statistics.
Step 5: Analyze your data.
Once you have organized and calculated your data, you can start analyzing it. This may involve looking for trends, patterns, or outliers in your data.
Step 6: Share your findings.
Once you have analyzed your data, you can share your findings with others. This may involve creating a report, presenting your findings to a group, or publishing your findings online.
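Steps 2 through 5 can be sketched on a tiny table in Python. This is a minimal sketch, not a prescribed method: the regions and values are invented, and the "well above average" threshold of 1.5 times the average is an arbitrary assumption chosen only to illustrate looking for outliers.

```python
from statistics import mean

# Step 2: enter data with clear labels (hypothetical values).
rows = [
    {"region": "North", "units_sold": 120},
    {"region": "South", "units_sold": 95},
    {"region": "East", "units_sold": 310},
    {"region": "West", "units_sold": 105},
]

# Step 3: organize the data by sorting from highest to lowest.
ordered = sorted(rows, key=lambda r: r["units_sold"], reverse=True)

# Step 4: perform calculations (a sum and an average).
values = [r["units_sold"] for r in rows]
total = sum(values)
average = mean(values)

# Step 5: analyze by flagging rows well above the average.
# The 1.5x threshold is an assumption for this sketch.
standouts = [r["region"] for r in rows if r["units_sold"] > 1.5 * average]

print(ordered[0]["region"], total, average, standouts)
```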
We’ve talked about how spreadsheets are great for organizing data and performing calculations. Now, it’s time to get our hands dirty and start building a real spreadsheet. In this video, I’m going to demonstrate some basic tasks we know data analysts use spreadsheets for, including entering and organizing data. We’ll start with a step-by-step process to show you some tools to organize your data in a spreadsheet. Consider these steps the basics. You won’t always have to use them when working with a data set, but if your data is a bit messy when you get it, these steps can help you get it ready for analysis.
Let’s start by opening a new spreadsheet. As a data analyst, you might not start with a blank spreadsheet, but it’s good to know how to do it, just in case. Start by opening Excel, Google Sheets, or whatever spreadsheet software you’re using, then select a new blank file. The first thing you’ll want to do when you open a new spreadsheet is give it a title. Here’s a pro tip: make your title short and clear, and have it state exactly what the data in the spreadsheet is about. Trust me, it’ll make searching for it a lot easier.
Creating a folder on your computer specifically for spreadsheets and related files can also make it easier to find them. This spreadsheet is already saved in our drive, so we’ll open our File menu and click Move. Then we’ll create a new folder, name it “Population Data,” and move the spreadsheet there. Our spreadsheet now has a new home. This will save you a lot of unnecessary clicks and headaches when you look for this file.
There are a few different ways data analysts get the data they work with. Depending on the job, you might use data from an open source, you might be given data to work with, or you might be asked to find your own data. You’ll experience all of these later in the program. There are a lot of open data sources online, where data is made available to the public. For example, we’ll use data from worldbank.org that’s already in the spreadsheet. The data shows the population of Latin American and Caribbean countries from 2010 to 2019. Let’s open this spreadsheet.
Time to get the data ready for analysis. We’ll start by selecting the whole sheet and making our columns wider by dragging the boundary of one of the columns. This will help us see the data clearly; then we can adjust any individual columns that need it. You can make columns wider in other ways as well, but this will work for now.
The first row of the spreadsheet is for data attributes or variables. It’s basically labeling the type of data in each column. Let’s make the attributes stand out from the rest of the rows by selecting the first row and filling it with color. We’ll also make the labels bold. If we want to add another data attribute between two of the other attributes, we can always add a new column. Just click on any cell within a column and use the Insert menu to add a new one. It will appear next to the column you originally clicked. Pretty simple. Deleting a column is just as simple. To delete, right-click in a cell in the column you want to get rid of. The steps we’re showing may be different depending on the spreadsheet program you’re using, but should be pretty similar.
Let’s add one more thing to our data table: borders. This can help you see each piece of data more clearly. To add borders, start by clicking the Select All button at the top left corner of your spreadsheet. This is like a magic button because you can click it whenever you need to make changes to every cell in your spreadsheet. Then click the Border button in the menu, and choose the type of borders you want. To keep our spreadsheets uniform, we’ll choose borders for all cells.
Just like that, we’ve gone from raw to refined. Now our spreadsheet is filled with data and it’s nice to look at too. Using these organization tools before you analyze can help you focus on the data once you start your analysis. Now that we’ve gone over some ways spreadsheets can be used to organize data, you’re ready to start working on them yourself. Later you’ll learn more about spreadsheets, including some common errors and how to fix them.
Reading: Learn more about spreadsheet basics
Overview
Below, you will find a list that covers two types of spreadsheet programs: Microsoft Excel and Google Sheets. The list includes quick-start guides, tutorials, and more. The examples in this course use Google Sheets, but you can follow along using Excel or any other spreadsheet application. The user interface might be a little different, but it should look and work similarly.
Microsoft Excel
- Office Quick Starts: Scroll down to the Downloadable guides section to download the Excel Quick Start Guide. This PDF guide begins with a labeled map of Excel that can guide you through the basic tasks you can accomplish in Excel. For tips on starting and opening Excel, this Microsoft Support page will show you how to begin a new workbook.
- Excel video training: This is a collection of step-by-step videos to use all sorts of Excel features, including adding and working within rows, columns, and cells; formatting; using formulas and functions; and adding charts and pivot tables.
- Sort data in a range or table: This page guides you through all of the steps you will need to sort data by number, text, and color. You’ll also have the option to sort by custom list so that you can customize exactly what you want to sort.
- Filter data in a range or table: This article has step-by-step instructions on how to filter an Excel spreadsheet to show only the data you want to see. You can also use built-in comparison operators, such as “greater than” and “top 10” to reveal only the most relevant data.
- Format a worksheet: The guide will help you select and format your Excel spreadsheet, then change the borders, shading, colors, and text. This can help improve your spreadsheet’s readability.
Pro tip: If you’re searching for information about using customizable options, check out Microsoft’s Guidelines for organizing and formatting data on a worksheet. This article provides clear methods for creating easy-to-read spreadsheets.
Google Sheets
- Google Sheets cheat sheet: The cheat sheet puts all the basics of Sheets on a single page for easy reference. Here, you can learn about customizing your spreadsheet and the data inside; working with rows, columns, and cells; sharing your spreadsheet with others; creating different versions and copies of a spreadsheet; and more.
- Get started with Sheets: Create and import files: This is a step-by-step guide for working with Sheets. You start by learning how to open a spreadsheet, then move on to adding data.
- Sort and filter your data: This resource can help you organize data in Sheets. Use this guide to sort part or all of a spreadsheet. You can sort by text, number, and color. Then, learn how to create filters to show only certain data while hiding the rest. Finally, the article includes information on creating, saving, and removing a filter view.
- Edit and format a spreadsheet: This will help you make easy-to-read spreadsheets. You will learn how to assign a color, customize borders around cells, and change the appearance of text. If you’d like to give your spreadsheet a theme, you can scroll to the bottom of the page and find how to apply it to parts of your spreadsheet.
Tip: Microsoft Excel and Google Sheets are very similar in terms of calculations, formulas, functions, and many other features. But there are some differences, which can make it tricky to switch from one to the other. If you are moving between Excel and Google Sheets, find a quick list of the differences between the two kinds of spreadsheet applications in Overview: Differences between Sheets and Excel.
Practice Quiz: Test your knowledge on working with spreadsheets
When giving a spreadsheet a title, what are some best practices to follow? Select all that apply.
Titles should state what the data in the spreadsheet is about
Titles should be clear
Titles should be short
Spreadsheet titles should be short, clear, and state exactly what the data in the spreadsheet is about.
Fill in the blank: Data analysts can use _____ to highlight the area around cells in order to see spreadsheet data more clearly.
borders
Data analysts use borders to highlight the area around cells in order to see spreadsheet data more clearly.
Within a spreadsheet, data analysts use which tools to save time and effort by automating commands? Select all that apply.
Formulas, Functions
Data analysts use formulas and functions to save time and effort by automating commands.
Formulas in spreadsheets
Video: Formulas for success
Formulas are sets of instructions that perform calculations in spreadsheets. They are built on operators, which are symbols that represent mathematical operations such as addition (+), subtraction (-), multiplication (*), and division (/). Formulas can also include cell references, which are the addresses of cells in the spreadsheet.
To create a formula, you start with an equal sign (=) and then type the formula. For example, to calculate the total sales for the first row of data, you would type the following formula into cell F2:
=B2+C2+D2+E2
This formula tells the spreadsheet to add the values in cells B2, C2, D2, and E2.
You can also use formulas to perform more complex calculations, such as finding the average sales or the percent change in sales between two time periods. For example, to find the average sales for the first row of data, you would type the following formula into a different empty cell, such as G2:
=(B2+C2+D2+E2)/4
This formula tells the spreadsheet to add the values in cells B2, C2, D2, and E2, and then divide the total by 4.
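To see why the parentheses matter, the two formulas can be mirrored in Python, which follows the same order of operations. This is a sketch under stated assumptions: the sales figures are invented, the dictionary keys only imitate cell addresses, and the percent-change line uses the standard definition (new minus old, divided by old), which this section does not spell out.

```python
# Hypothetical sales values for one row; keys imitate cells B2 through E2.
row = {"B2": 1200.0, "C2": 1500.0, "D2": 900.0, "E2": 1400.0}

# Mirrors =B2+C2+D2+E2
total = row["B2"] + row["C2"] + row["D2"] + row["E2"]

# Mirrors =(B2+C2+D2+E2)/4: the parentheses force the sum to happen first.
average = (row["B2"] + row["C2"] + row["D2"] + row["E2"]) / 4

# Without parentheses, only E2 is divided by 4, which is not the average.
no_parentheses = row["B2"] + row["C2"] + row["D2"] + row["E2"] / 4

# Percent change from B2 to C2, assuming the usual (new - old) / old definition.
percent_change = (row["C2"] - row["B2"]) / row["B2"]

print(total, average, no_parentheses, f"{percent_change:.0%}")
```

Formatting the last value as a percentage plays the same role as the percent button in a spreadsheet: the stored value stays 0.25 and only its display changes.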
Formulas are a powerful tool that can be used to perform a wide variety of calculations in spreadsheets. By learning how to use formulas, you can make your data analysis more efficient and accurate.
Here are some additional tips for using formulas in spreadsheets:
- Use descriptive cell references to make your formulas easier to read and understand.
- Use parentheses to group values in formulas and to control the order in which operations are performed.
- Use a formula-auditing tool, such as Evaluate Formula in Excel, to help you troubleshoot errors in your formulas.
- Copy and paste formulas to save time when entering them into multiple cells.
The video also covers the following topics:
- How to use cell references in formulas
- How to copy and paste formulas
- How to troubleshoot errors in formulas
- How to use the Percent button to change a value to a percentage
- How to use the Formula Evaluator
Overall, the video provides a good overview of how to use formulas in spreadsheets. It is a good resource for anyone who wants to learn how to perform calculations in spreadsheets using formulas.
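When a formula is copied to another row, its cell references shift with it. The Python sketch below imitates that behavior for rows only. It is a simplified illustration, not how any spreadsheet application is implemented: `copy_formula_down` is a hypothetical helper, and it ignores absolute references (such as $B$2) and function names that end in digits.

```python
import re


def copy_formula_down(formula, rows):
    """Shift the row number of every cell reference in the formula by `rows`."""
    def shift(match):
        column, row = match.group(1), int(match.group(2))
        return f"{column}{row + rows}"

    # A cell reference here is one or more capital letters followed by digits.
    return re.sub(r"([A-Z]+)(\d+)", shift, formula)


# Copying the total formula from row 2 to row 3.
print(copy_formula_down("=B2+C2+D2+E2", 1))

# The divisor 4 is a plain number, not a cell reference, so it stays the same.
print(copy_formula_down("=(B2+C2+D2+E2)/4", 2))
```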
- Ask the right questions: The first step to successful data analysis is to ask the right questions. What do you want to learn from the data? What are your goals? Once you know what you’re looking for, you can start to think about the data you need to collect and the analysis methods you’ll use.
- Clean your data: Garbage in, garbage out. This is a fundamental principle of data analysis. Before you can start analyzing your data, you need to make sure it’s clean and free of errors. This may involve removing duplicate data, correcting typos, and filling in missing values.
- Use the right tools: There are a variety of tools available for data analysis, each with its own strengths and weaknesses. The right tool for you will depend on the type of data you’re working with, the analysis methods you want to use, and your budget.
- Understand your limitations: No data set is perfect. There will always be some level of uncertainty in your results. It’s important to understand the limitations of your data so that you can interpret your results correctly.
- Communicate your findings: The final step in data analysis is to communicate your findings to others. This could involve writing a report, giving a presentation, or creating a visualization. The way you communicate your findings will depend on your audience and your goals.
Here are some additional tips for success in data analysis:
- Be curious and ask questions.
- Be creative and think outside the box.
- Be persistent and don’t give up easily.
- Be open to feedback and be willing to learn.
- Be ethical and responsible with your data.
In spreadsheets, what is the term for the symbols used in formulas to perform a specific calculation?
Operators
In spreadsheets, the symbols used in a formula to perform a specific calculation are called operators.
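Python happens to use the same four operator symbols as spreadsheet formulas, so the expressions mentioned in this lesson's video can be tried out directly. This is just an illustration; in a spreadsheet, each expression would start with an equal sign, for example =31982-17795. The variable names are made up for readability.

```python
# The same operator symbols used in spreadsheet formulas: + - * /
difference = 3 - 1
mixed = 15 + 8 / 2          # division is applied before addition
product = 846 * 513
large_difference = 31982 - 17795

print(difference, mixed, product, large_difference)
```

The second expression shows why the order of operations matters: 8 is divided by 2 first, and the result is then added to 15.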
So far we’ve covered how to
start a new spreadsheet, enter in data, and make it look refined and ready for
some serious analysis. Now we’ll learn how to perform calculations in your spreadsheet. You may need to
calculate everything from sums to averages, to finding minimum
and maximum amounts. You’ll use calculations for a lot of different
kinds of tasks. In this video, we’ll focus
on learning the basics and then do a little math with some sales data to practice. Let’s talk about formulas first. You might remember that
a formula is a set of instructions that perform
a specific calculation. Basically, formulas can
do the math for you. Now, they don’t only do math, they can do a lot more. Soon you’ll learn
different ways you can use them throughout the data
analysis processes. Formulas are built on operators
which are symbols that name the type of operation or calculation to be performed. For example, a plus sign
is a common operator. The formulas you use
as a data analyst will usually include at
least one operator. Now, let’s talk about math
expressions or equations. These can take a lot
of different forms, but you might be familiar
with them already. 3 minus 1, 15 plus 8 divided
by 2, 846 times 513. These are all examples
of expressions. Is this bringing back
memories of grade school? Well, back in math class, you most likely learned to
complete an expression by including an equal
sign and the solution. It’s slightly different
with spreadsheets. When you create a formula using an expression in a spreadsheet, you start the formula
with an equal sign. For example, if we
want to subtract, we type an equal sign
followed by the rest of the expression without any
spaces in the formula. Now let’s try an expression that’s a bit more challenging. We’ll type 31982, then a hyphen for a minus
sign, then 17795. To calculate, we press “Enter.” You’ll most likely use formulas this way
when dealing with large numbers or expressions
with multiple steps. Here are the operators you
will use to complete formulas. The plus sign for addition, the minus or hyphen
for subtraction, the asterisk for multiplication, and the forward
slash for division. The division and
multiplication symbols might be different than
what you’re used to. Small changes, but
important to keep in mind. If you already have data
in your spreadsheet, you can use cell references
in your formulas instead. A cell reference is
a single cell or range of cells in a worksheet that can
be used in a formula. Cell references
contain the letter of the column and the number of
the row where the data is. A range of cells is a
collection of two or more cells. A range can include cells
from the same row or column, or from different columns
and rows collected together. We’ll show you an example
in an upcoming video. Now let’s apply what we just
learned to some sales data. If we want to add
these figures to find the total sales for
the first row of data, you can click “cell F2”. From there, we’ll start
with an equal sign and use the cell references to input
values in your expression. We’re starting with cell
B2 because the year in A2 is not a value we want
to add to the total. Then press “Enter.” Just like that, your total sales have
been calculated for you. But what if you realize one of the values in your
data is wrong? No problem. You can change the
value in any cell used in the formula and the total
will update automatically. The great thing about using
cell references is that they also automatically update when a formula is copied
to a new cell. Talk about a time-saver. Instead of entering
the same formula again for every new set
of cell references, just copy the formula using the menu or a keyboard
shortcut like Control plus C. Then paste the formula where
you want to apply it using Control plus V. And presto! The formula updates all the new cells and
values correctly. Now let’s say you also want
it to find the average sales. For this, you create a new
formula in a different cell. To group values in a
formula, use parentheses. This lets your spreadsheet know
which values to calculate together and the order of the
operations to be performed. For example, open parentheses, then B2 plus C2 plus D2 plus E2, and close parentheses,
then divide the value of all of this
by typing slash four. You are adding the values
in the four cells together and then using the slash to
divide the total by four, and just like the last one, we can copy and
paste the formula. Here’s another formula you
can use if you want to find the percent change in sales
between June and July. Once a formula
calculates the value, you can then use the percent button to change the value
to a percentage. When you apply the formula
to the other rows, both the formula and the percent will
automatically update. That doesn’t look like
the right answer. Looks like we’ve got
an error. Don’t worry. Errors can happen at any
stage of data analysis, and that includes when
you’re using spreadsheets. A formula has to be airtight. If there’s something
wrong with one of the cell references,
it won’t work. So what’s our error? Well, we can see that the
value in cell D4 is missing. It might take some
time and research on your part to find the correct
value, but it’s worth it. You want your analysis to
be as accurate as possible. When you do add the value, the formula takes
care of the rest. That was a lot to take in. Thanks for staying with me. You’ll be able to apply what you learned about formulas
here and later in the program to make your analysis more
efficient and your job a little easier. Soon you’ll work in
your own spreadsheet. Happy spreadsheeting.
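The three calculations from this video (total, average, and percent change) can also be sketched outside a spreadsheet. The Python snippet below is only an illustration. The sales figures are invented, because the video does not list the actual values. The percent-change expression (new value minus old value, divided by old value) is the usual definition and is assumed here, since the video does not spell out the formula or say which columns hold June and July.

```python
# Invented sales figures standing in for cells B2:E2.
row = {"B2": 1200.0, "C2": 1500.0, "D2": 900.0, "E2": 1400.0}

# =B2+C2+D2+E2
total_sales = row["B2"] + row["C2"] + row["D2"] + row["E2"]

# =(B2+C2+D2+E2)/4
# The parentheses force the addition to happen before the division.
average_sales = (row["B2"] + row["C2"] + row["D2"] + row["E2"]) / 4

# Percent change, assuming the usual (new - old) / old definition and
# assuming B2 and C2 hold the two months being compared.
percent_change = (row["C2"] - row["B2"]) / row["B2"]

print(total_sales, average_sales, f"{percent_change:.0%}")
```

Without the parentheses, only E2 would be divided by 4 before the addition, which is why grouping matters in the average formula.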
Reading: Quick reference: Formulas in spreadsheets
You have been learning a lot about spreadsheets and all kinds of time-saving calculations and organizational features they offer. One of the most valuable spreadsheet features is a formula. As a quick reminder, a formula is a set of instructions that does a specific calculation using the data in a spreadsheet. Formulas make it easy for data analysts to do powerful calculations automatically, which helps them analyze data more effectively. Below is a quick-reference guide to help you get the most out of formulas.
Formulas
The basics
- When you write a formula in math, it generally ends with an equal sign (2 + 3 = ?). A spreadsheet formula always starts with one instead (=A2+A3). The equal sign tells the spreadsheet that what follows is part of a formula, not just a word or number in a cell.
- After you type the equal sign, most spreadsheet applications will display an autocomplete menu that lists valid formulas, names, and text strings. This is a great way to create and edit formulas while avoiding typing and syntax errors.
- A fun way to learn new formulas is just by typing an equal sign and a single letter of the alphabet. Choose one of the options that pops up and you will learn what that formula does.
Mathematical operators
- The mathematical operators used in spreadsheet formulas include:
- Subtraction – minus sign or hyphen ( - )
- Addition – plus sign ( + )
- Division – forward-slash ( / )
- Multiplication – asterisk ( * )
Auto-filling
The lower-right corner of each cell has a fill handle. It is a small green square in Microsoft Excel and a small blue square in Google Sheets.
- Click the fill handle for a cell and drag it down a column to auto-fill other cells in the column with the same value or formula in that cell.
- Click the fill handle for a cell and drag it across a row to auto-fill other cells in the row with the same value or formula in that cell.
- If you want to create a numbered sequence in a column or row, do the following: 1) Fill in the first two numbers of the sequence in two adjacent cells, 2) Select to highlight the cells, and 3) Drag the fill handle to the last cell to complete the sequence of numbers. For example, to insert 1 through 100 in each row of column A, enter 1 in cell A1 and 2 in cell A2. Then, select to highlight both cells, click the fill handle in cell A2, and drag it down to cell A100. This auto-fills the numbers sequentially so you don’t have to type them in each cell.
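The numbered-sequence steps above amount to taking the difference between the first two values and repeating it. The Python sketch below shows that idea under a simplifying assumption: the step is constant. Real spreadsheet applications also recognize dates, text patterns, and other series, which this hypothetical `fill_series` helper does not attempt.

```python
def fill_series(first, second, count):
    """Extend a sequence from its first two values, using a constant step."""
    step = second - first
    return [first + step * i for i in range(count)]


# Enter 1 in A1 and 2 in A2, then drag the fill handle down to A100.
column_a = fill_series(1, 2, 100)
print(column_a[:5], column_a[-1])
```

Starting from 5 and 10 instead would produce 5, 10, 15, 20, and so on, because the step is taken from the two seed cells.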
Absolute referencing
- Absolute referencing is marked by a dollar sign ($). For example, =$A$10 has absolute referencing for both the column and the row value.
- Relative references (the default, for example “=A10”) change whenever the formula is copied and pasted, because they are defined relative to the position of the cell that contains the formula. For example, if you copied “=A10” one cell to the right, it would become “=B10”. With absolute referencing, “=$A$10” copied one cell to the right would remain “=$A$10”. A mixed reference locks only one part: if you copied “=$A10” to the cell below, it would change to “=$A11” because the row value isn’t an absolute reference.
- Absolute references will not change when you copy and paste the formula in a different cell. The cell being referenced is always the same.
- To switch between absolute and relative referencing in the formula bar, highlight the reference you want to change and press the F4 key. Each press cycles to the next reference type, so you may need to press F4 more than once; for example, to change the absolute reference $A$10 in your formula to the relative reference A10, highlight $A$10 in the formula bar and press F4 until it reads A10.
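To make the copy-and-paste behavior of relative, absolute, and mixed references concrete, here is a minimal Python sketch. The helper names (`col_to_num`, `num_to_col`, `copy_reference`) are hypothetical and are not part of any spreadsheet application. The sketch handles single-cell references only and does not guard against moving past column A or row 1.

```python
import re


def col_to_num(col):
    """Convert a column label such as 'A' or 'AA' to a 1-based number."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column number back to a label."""
    col = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        col = chr(ord("A") + rem) + col
    return col


def copy_reference(ref, d_cols, d_rows):
    """Return the reference as it would read after copying the formula
    d_cols columns to the right and d_rows rows down."""
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a single-cell reference: {ref!r}")
    col_abs, col, row_abs, row = m.groups()
    if not col_abs:  # no $ before the column, so the column is relative
        col = num_to_col(col_to_num(col) + d_cols)
    if not row_abs:  # no $ before the row, so the row is relative
        row = str(int(row) + d_rows)
    return f"{col_abs}{col}{row_abs}{row}"


print(copy_reference("A10", 1, 0))    # relative, copied one cell right
print(copy_reference("$A$10", 1, 0))  # absolute, copied one cell right
print(copy_reference("$A10", 0, 1))   # mixed, copied one cell down
```

The three calls mirror the examples in the list above: the relative reference shifts, the absolute reference stays put, and the mixed reference changes only in its unlocked part.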
Data range
- When you click into your formula, the colored ranges let you see which cells are being used in your spreadsheet. There are different colors for each unique range in your formula.
- In a lot of spreadsheet applications, you can press the F2 (or Enter) key to highlight the range of data in the spreadsheet that is referenced in a formula. Click the cell with the formula, and then press the F2 (or Enter) key to highlight the data in your spreadsheet.
Combining with functions
- COUNTIF() is a formula and a function. This means the function runs based on criteria set by the formula. In this case, COUNT is the formula; it will be executed IF the conditions you create are true. For example, you could use =COUNTIF(A1:A16, "7") to count only the cells that contain the number 7. Combining formulas and functions allows you to do more work with a single command.
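As a rough illustration of what the COUNTIF example does, the Python sketch below counts matching values in a list that stands in for A1:A16. The values are invented, and the hypothetical `countif` helper supports exact matches only. The real COUNTIF also accepts comparison operators and wildcards, which are left out here.

```python
def countif(cells, criterion):
    """Count the cells whose value equals the criterion (exact match only)."""
    return sum(1 for value in cells if value == criterion)


# Invented values standing in for the range A1:A16.
column_a = [7, 3, 7, 1, 9, 7, 2, 4, 7, 5, 6, 8, 0, 7, 3, 1]
print(countif(column_a, 7))
```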
Video: Spreadsheet errors and fixes
This video covers the most common spreadsheet errors and how to fix them.
- DIV error: This error occurs when you try to divide by zero or an empty cell. To fix this error, you can use the IFERROR function to insert a custom message, such as “Not applicable”, whenever the formula would result in a DIV error.
- ERROR: This error occurs when the spreadsheet can’t interpret the formula. This can be caused by a missing comma or other typo. To fix this error, carefully check the formula for any errors.
- N/A error: This error occurs when the spreadsheet can’t find the data that is being referenced in the formula. This can happen when the data doesn’t exist or when the formula is misspelled. To fix this error, check the data and the formula for any errors.
- NAME error: This error occurs when the spreadsheet doesn’t recognize the name of a function or another object in the formula. To fix this error, check the spelling of the name and make sure that the object exists.
- NUM error: This error occurs when the spreadsheet can’t perform the calculation specified by the formula. This can happen when the data in the formula is inconsistent or wrong. To fix this error, check the data for any errors.
- VALUE error: This error can occur for a variety of reasons, such as when a text value is used in a numeric calculation or when a cell reference is incorrect. To fix this error, check the data and the formula for any errors.
- REF error: This error occurs when a formula references a cell that has been deleted. To fix this error, change the formula to reference a different cell or range of cells.
The video also provides some tips for troubleshooting spreadsheet errors:
- Carefully check the formula for any errors.
- Check the data for any errors.
- Use the IFERROR function to insert a custom message whenever a formula would result in an error.
- Use the SUM function and a range of cells instead of adding cell values by direct reference.
This video is a good resource for anyone who wants to learn how to troubleshoot spreadsheet errors.
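The IFERROR tip can be pictured as "try the calculation, and fall back to a message if it fails". The Python sketch below follows the percent-complete example from the video with invented task counts. The `iferror` helper is hypothetical and catches only arithmetic errors such as division by zero, whereas the spreadsheet function responds to any error value.

```python
def iferror(calculation, fallback):
    """Return the result of calculation(), or fallback if it raises an
    arithmetic error such as division by zero."""
    try:
        return calculation()
    except ArithmeticError:
        return fallback


# Invented values: column A (Required Tasks) and column B (Tasks Completed).
required_tasks = [4, 5, 0, 8]
tasks_completed = [2, 5, 0, 6]

# Column C: Tasks Completed divided by Required Tasks, guarded by iferror.
percent_complete = [
    iferror(lambda: done / needed, "Not applicable")
    for needed, done in zip(required_tasks, tasks_completed)
]
print(percent_complete)
```

Only the third row, where the required count is zero, falls back to the message; the other rows return the ordinary quotient.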
Spreadsheet errors can be a major headache for data analysts. They can cause incorrect results, which can lead to bad decisions. There are a number of ways to find and fix spreadsheet errors.
Here are some common spreadsheet errors:
- Formula errors: These errors occur when there is a mistake in a formula. For example, a typo in a cell reference can cause a formula to return an incorrect result.
- Data entry errors: These errors occur when incorrect data is entered into a spreadsheet. For example, a number may be entered as text, or a date may be entered in the wrong format.
- Formatting errors: These errors occur when a spreadsheet is formatted incorrectly. For example, a number may be formatted as text, or a date may be formatted as a time.
- Logical errors: These errors occur when a formula is logically incorrect. For example, a formula may return a true value when it should return a false value.
Here are some ways to find and fix spreadsheet errors:
- Use a spreadsheet auditing tool: Spreadsheet auditing tools can help you find and fix errors in your spreadsheets. These tools can identify formulas that are returning incorrect results, data that is entered incorrectly, and formatting errors.
- Use a spreadsheet validation tool: Spreadsheet validation tools can help you prevent errors from occurring in the first place. These tools can be used to validate data entry, formulas, and formatting.
- Check your work: The best way to find errors is to simply check your work carefully. This includes looking for typos, verifying data entry, and making sure that formulas are correct.
Here are some additional tips for avoiding spreadsheet errors:
- Use clear and descriptive names for your cells and formulas.
- Use consistent formatting throughout your spreadsheet.
- Use a spreadsheet template whenever possible.
- Use a spreadsheet auditing tool to regularly check for errors.
- Back up your spreadsheets regularly.
Hi and welcome back. Recently we’ve been
learning about formulas. Sometimes data
analysts encounter a problem with our formulas
and we get an error. We’ve all been there and
it can be frustrating. But there are solutions, that’s what we’re going
to explore in this video. One error you may encounter
is the DIV error. The DIV error happens when a
formula is trying to divide a value in a cell by zero
or by an empty cell. In this spreadsheet, the percentage
Complete values in column C are calculated by dividing the values in the Tasks Completed column by the values in the
Required Tasks column. Notice that column C is already formatted
as a percentage. The DIV error is in
cell C4 because we’re dividing by zero, the
value in cell A4. To avoid this problem, we can have this spreadsheet automatically enter
“Not applicable” whenever a cell in column A contains a zero that
would cause the error. To do this, we’ll use
the IFERROR function. If it encounters a DIV error caused by a cell that
contains a zero, the phrase “Not applicable”
will be inserted. We can also copy the formula
to the rest of the cells in column C so it checks for any other cells that
contain a zero. Now let’s move on to ERROR. In Google Sheets, ERROR tells us the formula can’t be interpreted as it is input. This is also known
as a parsing error. Say we want to
tally the total number of tasks in columns B and C. We use the SUM function, but the formula
=SUM(B2:B6 C2:C6) causes an error. Examining it more closely, we see that a comma
is missing between the cell ranges B2
to B6 and C2 to C6. We can fix this by inserting
a comma between the cell ranges to indicate the
end of each data item. This is called a delimiter, which you will learn
more about soon. Now, the formula can correctly calculate the total
number of tasks as 25. Another type of error is N/A. The N/A error tells
you that the data in your formula can’t be
found by the spreadsheet. Generally, this means
the data doesn’t exist. This error most often occurs when using functions
such as VLOOKUP, which searches for a
certain value in a column to return a corresponding
piece of information. Here, we see a master list
of nuts and their prices. Using VLOOKUP, the spreadsheet
finds prices in the list, then calculates the prices for each store using the
assigned markup. But we have an N/A error
in cells B49 and C49. The VLOOKUP formula is correct, so what’s going on? Well, if we look carefully
at the name of the nut, “almond” has no match
in the lookup table; the lookup table uses the
plural “almonds” instead. So we change almond to almonds, and with that typo fixed, the right prices are filled in. Speaking of typos, sometimes a typo can cause a NAME error. A NAME error can happen when a formula’s name isn’t
recognized or understood. Suppose we see a NAME error in the nut prices spreadsheet. If we look carefully, the VLOOKUP function in cell
B21 is spelled incorrectly: it has one extra O. This causes a NAME error for both the price and the resulting markup
calculation for the store. To fix this error, we can delete the
extra O in VLOOKUP. Perfect. Sometimes an error is caused by inconsistent
or wrong data. For instance, the NUM
error tells us that a formula’s calculation can’t be performed as
specified by the data. The data doesn’t make sense
for that calculation. Here’s what I mean. Suppose we’re working on a large construction
project using a spreadsheet to track how many months it takes
to reach key milestones. We can use the
DATEDIF function to calculate the number of months between start and end dates. The function requires
the start date to be in the first cell referenced and the end date to be in the second
cell referenced. In our case, cells B2
and C2 respectively. The M represents months, as we want this spreadsheet
to calculate the number of months between our
start and end dates. But we get a NUM
error in cell D6. We notice that the end date
comes before the start date, so the DATEDIF function can’t calculate the
number of months between. It’s likely the
start and end dates were interchanged by accident. We can request verification
of the data to make sure. In the meantime, let’s
reverse the order of the cells in the formula to temporarily get
around the error. Now, the result is nine months. What if the client’s
name was accidentally inserted into the start
date in the spreadsheet? You guessed it, we get an error. The VALUE error can indicate a problem with a formula
or referenced cells. It’s often not clear right
away what the problem is, so this error might take a
little more effort to fix. In this case, John Welty was
input as the start date, making the calculation
impossible for the DATEDIF function
in cell D6. We just replace the
text, John Welty, with the correct start date
of September 1st, 2016. Last is the REF error, which often comes
up when cells being referenced in a formula
have been deleted, thus making the formula unable to perform the calculation. Here’s a spreadsheet
used to calculate the number of seats available
for a company lunch. Let’s say the company decided not to run
the second floor, so we delete row 4. This results in a REF error when calculating the total seats
available in cell B5. To fix this, we can
change the formula to add the values in
cells B2 and B3. Also, in this case, we could have prevented the REF error by using
the SUM function and a range of cells
instead of adding the cell value by
direct reference. Now, if we delete row 10, the SUM function
calculates the total seats available. There you go. We’ve now fixed some of the most common
spreadsheet errors. When you see them again, you’ll know what they mean. Troubleshooting is a big
part of data analysis, so being able to find solutions is a key skill for
data analysts.
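The NUM error in the DATEDIF example comes from an end date that falls before the start date. The Python sketch below imitates that check for whole months. It is an approximation written for illustration: the hypothetical `datedif_months` helper may not match a spreadsheet's handling of month-end dates, and the end date used here is invented so that the result matches the nine months mentioned in the video.

```python
from datetime import date


def datedif_months(start, end):
    """Count complete months from start to end, refusing reversed dates."""
    if end < start:
        # A spreadsheet would show #NUM! here instead of raising an exception.
        raise ValueError("#NUM!: end date falls before start date")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


start_date = date(2016, 9, 1)
end_date = date(2017, 6, 1)  # invented end date for illustration
print(datedif_months(start_date, end_date))
```

Calling `datedif_months(end_date, start_date)` raises the error, which mirrors what happens when the two dates are entered in the wrong columns.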
Reading: More about spreadsheet errors and fixes
Overview
When you are new to data analytics—and sometimes even when you aren’t—spreadsheet struggles are real. It never feels good when you type in what you are sure is a perfect formula or function, only to get an error message. Understanding errors and how to fix them is a big part of keeping your data clean, so it’s important to know how to deal with issues as they come up, and more importantly, not to get discouraged.
Remember, even the most advanced spreadsheet users come across problems from time to time.
As a follow-up to what you learned in the previous video, here are a few best practices and helpful tips. These strategies will help you avoid spreadsheet errors to begin with, making your life in analytics a whole lot less stressful:
- Filter data to make your spreadsheet less complex and busy.
- Use and freeze headers so you know what is in each column, even when scrolling.
- When multiplying numbers, use an asterisk (*) not an X.
- Start every formula and function with an equal sign (=).
- Whenever you use an open parenthesis, make sure there is a closed parenthesis on the other end to match.
- Change the font to something easy to read.
- Set the border colors to white so that you are working in a blank sheet.
- Create a tab with just the raw data, and a separate tab with just the data you need.
Now that you have learned some basic ways to avoid errors, you can focus on what to do when that dreaded pop-up does appear. The following table is a reference you can use to look up common spreadsheet errors and examples of each. Knowing what the errors mean takes some of the fear out of getting them.
Error | Description | Example |
---|---|---|
#DIV/0! | A formula is trying to divide a value in a cell by 0 (or an empty cell with no value) | =B2/B3, when the cell B3 contains the value 0 |
#ERROR! | (Google Sheets only) Something can’t be interpreted as it has been input. This is also known as a parsing error. | =COUNT(B1:D1 C1:C10) is invalid because the cell ranges aren’t separated by a comma |
#N/A | A formula can’t find the data | The cell being referenced can’t be found |
#NAME? | The name of a formula or function used isn’t recognized | The name of a function is misspelled |
#NUM! | The spreadsheet can’t perform a formula calculation because a cell has an invalid numeric value | =DATEDIF(A4, B4, "M") is unable to calculate the number of months between two dates because the date in cell A4 falls after the date in cell B4 |
#REF! | A formula is referencing a cell that isn’t valid | A cell used in a formula was in a column that was deleted |
#VALUE! | A general error indicating a problem with a formula or with referenced cells | There could be problems with spaces or text, or with referenced cells in a formula; you may have additional work to find the source of the problem. |
If you are working with Microsoft Excel, an interactive page, How to correct a #VALUE! error, can help you narrow down the cause of this error. You can select a specific function from a drop-down list to display a link to tips to fix the error when using that function.
Pro tip: Spotting errors in spreadsheets with conditional formatting
Conditional formatting can be used to highlight cells a different color based on their contents. This feature can be extremely helpful when you want to locate all errors in a large spreadsheet. For example, using conditional formatting, you can highlight in yellow all cells that contain an error, and then work to fix them.
Conditional formatting in Microsoft Excel
To set up conditional formatting in Microsoft Excel to highlight all cells in a spreadsheet that contain errors, do the following:
- Click the gray triangle above row number 1 and to the left of Column A to select all cells in the spreadsheet.
- From the main menu, click Home, and then click Conditional Formatting to select Highlight Cells Rules > More Rules.
- For Select a Rule Type, choose Use a formula to determine which cells to format.
- For Format values where this formula is true, enter =ISERROR(A1).
- Click the Format button, select the Fill tab, select yellow (or any other color), and then click OK.
- Click OK to close the format rule window.
To remove conditional formatting, click Home and select Conditional Formatting, and then click Manage Rules. Locate the format rule in the list, click Delete Rule, and then click OK.
Conditional formatting in Google Sheets
To set up conditional formatting in Google Sheets to highlight all cells in a spreadsheet that contain errors, do the following:
- Click the empty rectangle above row number 1 and to the left of Column A to select all cells in the spreadsheet. In the Step-by-step in spreadsheets video, this was called the Select All button.
- From the main menu, click Format and select Conditional Formatting to open the Conditional format rules pane on the right.
- While in the Single Color tab, under Format rules, use the drop-down to select Custom formula is, enter =ISERROR(A1), select yellow (or any other color) for the formatting style, and then click Done.
To remove conditional formatting, click Format and select Conditional Formatting, and then click the Trash icon for the format rule.
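Conceptually, the =ISERROR(A1) rule asks the same question of every cell: is this value an error? The Python sketch below does that for a small invented grid and returns the addresses a yellow highlight would land on. It assumes errors appear as the text codes from the table above and supports at most 26 columns; the `find_error_cells` helper is hypothetical.

```python
ERROR_VALUES = {"#DIV/0!", "#ERROR!", "#N/A", "#NAME?", "#NUM!", "#REF!", "#VALUE!"}


def find_error_cells(grid):
    """Return A1-style addresses of cells whose value is an error code."""
    hits = []
    for row_number, row in enumerate(grid, start=1):
        for col_index, value in enumerate(row):
            if value in ERROR_VALUES:
                hits.append(f"{chr(ord('A') + col_index)}{row_number}")
    return hits


# Invented sheet: header row plus three rows of data.
sheet = [
    ["Required", "Completed", "Percent"],
    [4, 2, 0.5],
    [0, 0, "#DIV/0!"],
    [8, 6, "#NAME?"],
]
print(find_error_cells(sheet))
```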
Spreadsheet error resources
To learn more and read about additional examples of errors and solutions, explore these resources:
- Microsoft Formulas and Functions: This resource describes how to avoid broken formulas and how to correct errors in Microsoft Excel. This is a useful reference to have saved in case you run into a specific error and need to find solutions quickly while working in Excel.
- When Your Formula Doesn’t Work: Formula Parse Errors in Google Sheets: This resource is a guide to finding and fixing some common errors in Google Sheets. If you are working with Google Sheets, you can use this as a quick reference for solving problems you might encounter working on your own.
With some practice and investigative determination, you will become much more comfortable handling errors in spreadsheets. Each error you catch and fix will make your data clearer, cleaner, and more useful.
Practice Quiz: Test your knowledge on using formulas in spreadsheets
Which of the following are examples of operators used in formulas? Select all that apply.
Hyphen (-), Asterisk (*), Forward slash (/)
The asterisk, hyphen, and forward slash are examples of operators used in formulas.
In a spreadsheet, a formula should always start with which of the following operators?
Equal sign (=)
In a spreadsheet, a formula should always start with an equal sign.
What is the term for the set of cells that a data analyst selects to include in a formula?
Data range
The set of cells a data analyst selects to include in a formula is called the data range.
In a formula, the plus sign (+) is the operator for addition, and the hyphen (-) is the operator for subtraction.
True
In a formula, the plus sign (+) is the operator for addition, and the hyphen (-) is the operator for subtraction.
Functions in spreadsheets
Video: Functions 101
A function is a preset command that automatically performs a specific process or task using the data. Functions can be used to simplify calculations and make spreadsheets more efficient. Here are some examples of functions in spreadsheets:
- SUM: Adds all the numbers in a range of cells.
- AVERAGE: Calculates the average of all the numbers in a range of cells.
- MIN: Returns the smallest number in a range of cells.
- MAX: Returns the largest number in a range of cells.
To use a function in a spreadsheet, type an equal sign, the function name, and then, in parentheses, the range of cells that you want to apply the function to. For example, to add the sales figures in cells B2 and C2, you would type the following formula into a cell:
=SUM(B2:C2)
If each of those cells contained 50, this formula would return 100, because 50+50=100.
Functions can also be used to perform more complex calculations, such as calculating the percent change in sales between two months or finding the lowest monthly sales in a data set.
Functions are a powerful tool that can make spreadsheets more efficient and informative. By learning how to use functions, you can become a more effective data analyst.
Functions 101 in Spreadsheets
Functions are a powerful tool that can make spreadsheets more efficient and informative. They allow you to perform complex calculations and transformations on your data with just a few clicks.
What is a function?
A function is a preset formula that performs a specific task on your data. For example, the SUM function adds all the numbers in a range of cells, while the AVERAGE function calculates the average of all the numbers in a range of cells.
How to use functions
To use a function in a spreadsheet, type an equal sign, the function name, and then, in parentheses, the range of cells that you want to apply the function to. For example, to add the sales figures in cells B2 and C2, you would type the following formula into a cell:
=SUM(B2:C2)
If each of those cells contained 50, this formula would return 100, because 50+50=100.
Some common functions
Here are some of the most common functions that you will use in spreadsheets:
- SUM: Adds all the numbers in a range of cells.
- AVERAGE: Calculates the average of all the numbers in a range of cells.
- MIN: Returns the smallest number in a range of cells.
- MAX: Returns the largest number in a range of cells.
- COUNT: Counts the number of cells in a range that contain numbers.
- IF: Performs a conditional calculation based on a given condition.
- VLOOKUP: Looks up a value in a table and returns the corresponding value from another column in the table.
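Several of the functions listed above have close counterparts in most programming languages. The Python sketch below shows rough equivalents of SUM, AVERAGE, MIN, MAX, and COUNT on invented values. It is meant only as an analogy, and the COUNT line assumes that only integers and decimals count as numbers.

```python
from statistics import mean

# Invented values standing in for the range B2:E2.
sales = [50, 75, 20, 55]

total = sum(sales)      # like =SUM(B2:E2)
average = mean(sales)   # like =AVERAGE(B2:E2)
lowest = min(sales)     # like =MIN(B2:E2)
highest = max(sales)    # like =MAX(B2:E2)

# COUNT tallies only numeric cells, so the text entry is skipped.
mixed_cells = [50, "pending", 20, 55]
numeric_count = sum(1 for v in mixed_cells if isinstance(v, (int, float)))

print(total, average, lowest, highest, numeric_count)
```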
Nesting functions
You can also nest functions, which means using one function as an argument to another function. For example, the following formula rounds the average of the values in cells B2 through E2 to two decimal places:
=ROUND(AVERAGE(B2:E2), 2)
In this formula, the AVERAGE function is nested inside the ROUND function. The spreadsheet evaluates AVERAGE first and passes its result to ROUND. By contrast, a formula such as =SUM(B2:C2) - SUM(D2:E2) combines two functions with an operator but does not nest them.
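Nesting means that the result of an inner function call becomes an argument of an outer one. A short Python sketch of that idea, using invented values and the built-in `round` together with `statistics.mean`, might look like this:

```python
from statistics import mean

# Invented values standing in for the range B2:E2.
values = [10.5, 20.25, 30.125, 40.0]

# The inner call (mean) runs first; its result is the first argument
# of the outer call (round), which keeps two decimal places.
rounded_average = round(mean(values), 2)
print(rounded_average)
```

Reading a nested expression from the innermost parentheses outward is a reliable way to follow the order of evaluation.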
Tips for using functions
Here are some tips for using functions in spreadsheets:
- Use the function wizard to help you insert and edit functions. To open the function wizard, click on the Insert Functions button in the formula bar.
- Be careful when nesting functions. Make sure that you understand the order in which the functions will be evaluated.
- Use meaningful function names and cell references. This will make your formulas easier to read and maintain.
Conclusion
Functions are a powerful tool that can make spreadsheets more efficient and informative. By learning how to use functions, you can become a more effective data analyst.
Here are some additional tips for learning and using functions in spreadsheets:
- Start by learning the most common functions, such as SUM, AVERAGE, MIN, MAX, COUNT, IF, and VLOOKUP.
- Practice using functions on your own data. This is the best way to learn how they work and how to use them effectively.
- Look for tutorials and online resources that can help you learn more about functions.
- Don’t be afraid to experiment. The best way to learn is by trying different things.
With a little practice, you will be using functions like a pro in no time!
Formulas are a
great way to become more efficient when
using spreadsheets, especially when you add shortcuts like copying and
pasting, into the mix. As you progress as
a data analyst, you’ll most likely learn more shortcuts to
help your process. But now it’s time to
move on to functions. While they’re closely
related to formulas, they’re not exactly the same. By the end of this video, you’ll understand
the difference and know when to use them both. In the world of spreadsheets a function is a
preset command that automatically performs
a specific process or task using the data. You might remember
some of the shortcuts we learned that can be
used with formulas. Think of functions as the
most useful of the shortcuts. The good news is a lot of spreadsheet functions have names that tell you what they do. There are tons of
functions out there. As you continue to work
with spreadsheets, you’ll find that you
use certain ones a lot, and others, rarely or not at all. For now, let’s take a look at some of the
functions that we can apply to our sales data
from the previous video. We’ll start with total sales. Let’s use the SUM function
for this in cell F2. The first steps are pretty similar to what we did
in the last video. First, we’ll select the cell where we want the
calculation to appear. Type equals, then add the
word SUM as our function. One of the great
things about functions is they don’t always
need operators, like a plus sign for addition. In this case, after
the open parentheses, you can go ahead and select the range of cells you’re adding. A colon between the
cell references shows that you’re using a range. In this case, the range includes
cells from the same row. After the closed
parentheses, we press Enter. Just like that, our total
sales number appears. Just like the formula
we used before, functions can be
copied and pasted into other cells in
the same column. But let’s undo that
step so that you can see another way to copy
a function or formula. Spreadsheets have something
called a fill handle. It’s a little box that appears in the lower right-hand corner
when you click on a cell. If you rest your
cursor on the box, you can then drag
the fill handle to the other boxes in the
same row or column. Any formula or function
in that cell will automatically be added to
the cells you fill. Plus, the fill handle will
update the formula so the cell references match the rows or columns
of the cells you fill. This means the formula
is calculated based on the data in each
separate row or column. Filling won’t work
for every situation, but it’s still a
pretty great trick. Now let’s find the
average sale for each month using the
AVERAGE function. Different functions perform
different calculations, but they work in the same way. Keep in mind, not
every calculation you’ll come across has its
own function to help you. For example, to find the percent change in sales
between June and July, you’d use the same formula
you used in an earlier video. Let’s say you’re asked to find the lowest monthly
sales in this data set. There’s a function for that. It’s called the MIN function, which stands for minimum. Here’s how it works. Say you need to find the
lowest monthly sales for the whole set. All you have to do is
set up the function. Then after the open parenthesis, select the values
from all three rows. This might be
important information for your stake holders. Let’s add color to the
cell with that value, in your data set to
make it stand out. In this case, click on cell
D2 and then fill color icon, which looks like a paint can, then choose a color. I’ll use yellow here. You can follow the same steps for the highest sales by using the, wait for it, MAX function. Looks like we have
an error message. What could be wrong? We forgot to include an open parenthesis
after the function. No worries, it’s a quick fix. But this is a good reminder
to continually check the format of your functions and formulas as you use them. We’ll learn more
about error messages and how to work with them later. That’s better. Now we’ll add color to the cell with
the highest sales too. This is just one way
to highlight key data. You’ll find out about
some others later. You’ve now had a peek
at some ways you can add and organize
data in a spreadsheet. You’ve also seen how
powerful formulas and functions can be when
applied to real-world data. As a data analyst, this is just the beginning of your experience
with spreadsheets. You’ll soon find out how much more spreadsheets
have to offer. In the meantime, you’re free to practice some of these formulas, functions, and other
processes on your own. It can be fun to experiment, and see all that
spreadsheets can do. Soon, you will switch from spreadsheets to
structured thinking. The data analytics pieces are
starting to fit together. Exciting stuff is coming
right up. So stick around.
Which of the following are functions? Select all that apply.
MIN, AVERAGE, SUM
SUM, AVERAGE, and MIN are functions. A function is a preset command that automatically performs a specific process or task using data.
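To make the idea of a preset command concrete, here is a minimal Python sketch of what SUM, AVERAGE, MIN, and MAX compute. The monthly sales figures and the `percent_change` helper are made up for illustration; they are not the course spreadsheet's data. Percent change is written out by hand because, as the video notes, not every calculation has its own function.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustration only).
monthly_sales = {"June": 1200.0, "July": 1500.0, "August": 900.0}
values = list(monthly_sales.values())

total_sales = sum(values)      # comparable to =SUM(range)
average_sale = mean(values)    # comparable to =AVERAGE(range)
lowest_sale = min(values)      # comparable to =MIN(range)
highest_sale = max(values)     # comparable to =MAX(range)


def percent_change(old, new):
    """Hand-written formula, like typing =(new - old) / old in a cell."""
    return (new - old) / old


june_to_july = percent_change(monthly_sales["June"], monthly_sales["July"])
```

With these assumed numbers, sales rise by a quarter from June to July, so `june_to_july` is 0.25 (25%).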
Reading: Quick reference: Functions in spreadsheets
Overview
As a quick refresher, a function is a preset command that automatically performs a specific process or task using the data in a spreadsheet. Functions give data analysts the ability to do calculations, which can be anything from simple arithmetic to complex equations. Use this reading to help you keep track of some of the most useful options.
Functions
The basics
- Just like formulas, start all of your functions with an equal sign; for example, =SUM. The equal sign tells the spreadsheet that what follows is part of a function, not just a word or number in a cell.
- After you type the equal sign, most spreadsheet applications will display an autocomplete menu that lists valid functions, names, and text strings. This is a great way to create and edit functions while avoiding typing and syntax errors.
- A fun way to learn new functions is by simply typing an equal sign and a single letter of the alphabet. Choose one of the options that pops up and learn what that function does.
Difference between formulas and functions
- A formula is a set of instructions used to perform a calculation using the data in a spreadsheet.
- A function is a preset command that automatically performs a specific process or task using the data in a spreadsheet.
Popular functions
A lot of people don’t realize that keyboard shortcuts like cut, save, and find are actually functions. These functions are built into an application and are amazing time-savers. Using shortcuts lets you do more with less effort. They can make you more efficient and productive because you are not constantly reaching for the mouse and navigating menus. Lists of the most popular shortcuts are available for Chromebook, PC, and Mac.
Auto-filling
The lower-right corner of each cell has a fill handle. It is a small green square in Microsoft Excel and a small blue square in Google Sheets.
- Click the fill handle for a cell and drag it down a column to auto-fill other cells in the column with the same formula or function used in that cell.
- Click the fill handle for a cell and drag it across a row to auto-fill other cells in the row with the same formula or function used in that cell.
Relative, absolute, and mixed references
- Relative references (cells referenced without a dollar sign, like A2) will change when you copy and paste the function into a different cell. With relative references, the location of the cell that contains the function determines the cells used by the function.
- Absolute references (cells fully referenced with a dollar sign, like $A$2) will not change when you copy and paste the function into a different cell. With absolute references, the cells referenced always remain the same.
- Mixed references (cells partially referenced with a dollar sign, like $A2 or A$2) will change only in part when you copy and paste the function into a different cell: the part marked with a dollar sign stays fixed. With mixed references, the location of the cell that contains the function determines the cells used by the function, but only the row or column is relative (not both).
- In spreadsheets, you can press the F4 key to toggle between relative, absolute, and mixed references in a function. Click the cell containing the function, highlight the referenced cells in the formula bar, and then press F4 to toggle between and select relative, absolute, or mixed referencing.
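The reference rules above can be sketched in a few lines of Python. This is a simplified model of what a spreadsheet does when a formula is copied or filled, not the actual implementation of any spreadsheet application. The helper names (`shift_reference`, `col_to_num`, `num_to_col`) are hypothetical, and the sketch does no bounds checking for shifts that would move past column A or row 1.

```python
import re

# Matches a single cell reference such as A2, $A$2, $A2, or A$2.
REF = re.compile(r"^(\$?)([A-Z]+)(\$?)(\d+)$")


def col_to_num(col):
    """Convert column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column number back to letters."""
    col = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        col = chr(ord("A") + rem) + col
    return col


def shift_reference(ref, rows=0, cols=0):
    """Return ref as it would look after copying the formula
    `rows` rows down and `cols` columns to the right."""
    m = REF.match(ref)
    if m is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_lock, col, row_lock, row = m.groups()
    if not col_lock:  # no dollar sign: the column is relative
        col = num_to_col(col_to_num(col) + cols)
    if not row_lock:  # no dollar sign: the row is relative
        row = str(int(row) + rows)
    return f"{col_lock}{col}{row_lock}{row}"
```

Copying one row down and one column right, `shift_reference("A2", 1, 1)` gives `B3`, `shift_reference("$A$2", 1, 1)` stays `$A$2`, and the mixed forms `$A2` and `A$2` become `$A3` and `B$2`.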
Data ranges
- When you click a cell that contains a function, colored data ranges in the formula bar indicate which cells are being used in the spreadsheet. There are different colors for each unique range in a function.
- Colored data ranges help prevent you from getting lost in complex functions.
- In spreadsheets, you can press the F2 key to highlight the range of data used by a function. Click the cell containing the function, highlight the range of data used by the function in the formula bar, and then press F2. The spreadsheet will go to and highlight the cells specified by the range.
Data ranges evaluated for a condition
COUNTIF is an example of a function that returns a value based on a condition that the data range is evaluated for. The function counts the number of cells that meet the criteria. For example, in an expense spreadsheet, use COUNTIF to count the number of cells that contain a reimbursement for “airfare.”
For more information, refer to:
- Microsoft Support’s page for COUNTIF
- Google Help Center’s documentation for COUNTIF where you can copy a sheet with COUNTIF examples (click “Use Template” if you click the COUNTIF link provided on this page)
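As a rough illustration of counting cells that meet a condition, the Python sketch below mimics the simplest form of COUNTIF: an exact text match that ignores letter case. The `countif` helper and the expense categories are hypothetical. The real COUNTIF also accepts wildcards and comparison criteria, which this sketch leaves out.

```python
def countif(cells, criterion):
    """Count the cells whose text equals the criterion, ignoring letter case."""
    target = criterion.casefold()
    return sum(1 for cell in cells if str(cell).casefold() == target)


# Hypothetical reimbursement categories from an expense sheet.
expenses = ["Airfare", "Hotel", "airfare", "Meals", "Airfare", "Taxi"]
airfare_count = countif(expenses, "airfare")
```

In a spreadsheet, the comparable call would look like =COUNTIF(B2:B7, "airfare"), assuming the categories sit in cells B2 through B7.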
Conclusion
There are a lot more functions that can help you make the most of your data. This is just the start. You can keep learning how to use functions to help you solve complex problems efficiently and accurately throughout your entire career.
Practice Quiz: Hands-On Activity: Create a Custom Data Table
Practice Quiz: Test your knowledge on using functions in spreadsheets
Data analysts use which of the following functions to quickly perform calculations in a spreadsheet? Select all that apply.
AVERAGE, MIN, SUM
AVERAGE, MIN, and SUM are functions used to quickly perform calculations in a spreadsheet.
What is the term for a preset command in a spreadsheet?
Function
A preset command in a spreadsheet is called a function.
You are working with spreadsheet data about a cross-country relay race. Each runner’s times are located in cells H2 through H28. To find the runner with the slowest time, what is the correct function?
=MAX(H2:H28)
The function is =MAX(H2:H28). The largest numeric value corresponds to the slowest time in the race. MAX returns the largest numeric value from a range of cells. And H2:H28 is the specified range.
Save time with structured thinking
Video: Before solving a problem, understand it
- Albert Einstein once said that if he had one hour to save the planet, he would spend 59 minutes defining the problem and one minute resolving it. This shows the importance of defining the problem before trying to solve it.
- A lot of times, teams jump right into data analysis without clearly defining the problem. This can lead to them solving the wrong problem or not having the right data.
- In this video, we will learn how to develop a structured approach to defining the problem domain. This is the specific area of analysis that encompasses every activity affecting or affected by the problem.
- Before we can do anything else, we need to understand the problem domain and all of its parts and relationships. Working without that understanding is like putting together a jigsaw puzzle without knowing what the picture is supposed to be.
- Data analysts face the same challenges. They are not always given the complete picture at the start of a project. A big part of their job is to develop a structured approach and use critical thinking to find the best solution.
- This starts with understanding the problem domain. We need to train our brains to think structurally in order to successfully solve problems as data analysts.
Here are some key points from the text:
- Defining the problem is an important first step in data analysis.
- A structured approach to defining the problem domain can help us to understand the problem better and to find the best solution.
- Data analysts need to be able to think structurally in order to solve problems.
Introduction
Data analysis is the process of collecting, cleaning, and analyzing data to extract insights. It is a powerful tool that can be used to solve a variety of problems. However, before you can solve a problem with data analysis, you need to understand the problem.
Why is understanding the problem important?
There are a few reasons why understanding the problem is important in data analysis. First, it helps you to identify the right data to collect. If you don’t understand the problem, you may collect the wrong data, which can lead to inaccurate results. Second, understanding the problem helps you to choose the right analytical methods. There are many different analytical methods available, and each one is better suited for solving certain types of problems. Third, understanding the problem helps you to interpret the results of your analysis. If you don’t understand the problem, you may misinterpret the results, which can lead to incorrect conclusions.
How to understand a problem
There are a few steps you can take to understand a problem:
- Define the problem. What is the specific problem you are trying to solve? What are the symptoms of the problem? What are the consequences of the problem?
- Gather information. What data is available that can help you to understand the problem? Where can you find this data?
- Analyze the data. This involves cleaning the data, exploring the data, and identifying patterns in the data.
- Develop a hypothesis. Based on your understanding of the data, what is the likely cause of the problem?
- Test your hypothesis. This involves collecting more data and conducting further analysis.
- Interpret the results. What do the results of your analysis tell you about the problem?
Conclusion
Understanding the problem is an essential first step in data analysis. By taking the time to understand the problem, you can increase the chances of success in your data analysis project.
Here are some additional tips for understanding a problem in data analysis:
- Talk to the people who are affected by the problem. They can provide valuable insights into the problem and its causes.
- Brainstorm with a team of people. This can help you to come up with different perspectives on the problem.
- Use visualization tools. This can help you to see the data in a new way and to identify patterns that you may not have noticed otherwise.
- Be patient. It takes time to understand a complex problem. Don’t rush the process.
Albert Einstein once
said, “If I were given one hour to
save the planet, I would spend 59 minutes defining the problem and one
minute resolving it.” Now, that might seem extreme, but it does show us just
how important it is to define the problems
before trying to solve them. A lot of times, teams jump right into data
analysis before realizing a few months later
that they are either solving the wrong problem or they don’t have
the right data. In this video, we will
learn how to develop a structured approach to
defining the problem domain. This is important
because if you define the problem clearly
from the start, it’ll be easier to solve, which saves a lot of time,
money, and resources. In the data world, we call this first piece
the problem domain: the specific area of
analysis that encompasses every activity affecting or
affected by the problem. Before we can do anything else, we need to understand the problem domain
and all of its parts and relationships so that we can discover the whole story. Actually, calling it the first piece makes me think
of a jigsaw puzzle. Say you have a puzzle. Let’s think of that puzzle
as our problem domain. You have all 500 pieces
but you lost the box. So you don’t know what
image the puzzle will reveal. Will it be an animal? A waterfall? A bowl of oranges? Whatever it is, it’s going
to be tough trying to put it together without an
image you can refer to. Even the greatest puzzler
in the galaxy would need a new process and lots of time to
complete that puzzle. Data analysts face the same kinds of challenges too. You might remember that
data analysts aren’t always given the
complete picture at the start of a project. A big part of their
job is to develop a structured approach and use critical thinking to
find the best solution. That starts with understanding
the problem domain. This is where structured
thinking comes into play. To successfully solve a
problem as a data analyst, you need to train your brain
to think structurally. That’s exactly what you’ll learn coming up. See you there.
Video: Scope of work and structured thinking
Structured thinking is the process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options. It is a way to be prepared and to have a clear plan for completing a project or solving a problem.
Structured thinking helps data analysts save time and effort by preventing them from having to redo their work. It also makes their job easier by allowing them to better understand the work they are doing.
One of the starting places for structured thinking is the problem domain, which is the specific area of analysis that encompasses every activity affecting or affected by the problem. Once you know the problem domain, you can set your base and lay out all your requirements and hypotheses before you start investigating.
Another way to practice structured thinking is to use a scope of work (SOW). An SOW is an agreed-upon outline of the work you are going to perform on a project. It should include things like work details, schedules, and reports that the client can expect.
A scope of work can be a simple but powerful tool. With a solid scope of work, you will be able to address any confusion, contradictions, or questions about the data up-front and make sure these setbacks don’t stand in your way.
In the next video, you will learn about the importance of contextualizing data and avoiding bias.
Introduction
A scope of work (SOW) is an agreed-upon outline of the work that will be performed on a project. It is a valuable tool for data analysts because it can help to avoid confusion, contradictions, and questions about the data up-front.
Structured thinking is a process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options. It is a valuable tool for data analysts because it can help us to better understand the work we are doing and to avoid mistakes.
How to create a scope of work
To create a scope of work, you will need to:
- Define the problem domain. This is the specific area of analysis that you are interested in.
- Identify the deliverables. What are the specific products or services that you will be providing?
- Set the timeline. When will the work be completed?
- Define the milestones. What are the key checkpoints along the way?
- Identify the resources. What people, equipment, and materials will be needed?
- Establish the budget. How much will the project cost?
How to use structured thinking
To use structured thinking, you will need to:
- Define the problem. What is the specific problem that you are trying to solve?
- Gather information. What data is available that can help you to understand the problem?
- Analyze the data. This involves cleaning the data, exploring the data, and identifying patterns in the data.
- Develop a hypothesis. Based on your understanding of the data, what is the likely cause of the problem?
- Test your hypothesis. This involves collecting more data and conducting further analysis.
- Interpret the results. What do the results of your analysis tell you about the problem?
Conclusion
A scope of work and structured thinking are both valuable tools for data analysts. By using these tools, you can avoid setbacks and ensure that your data analysis projects are successful.
Here are some additional tips for creating a scope of work and using structured thinking:
- Be clear and concise. The scope of work should be easy to understand by everyone involved in the project.
- Be realistic. The timeline and budget should be achievable.
- Be flexible. Things change, so be prepared to adjust the scope of work as needed.
- Communicate regularly. Keep everyone involved in the project updated on your progress.
What process do data analysts use to recognize the current situation, organize information, and identify options?
Structured thinking
Data analysts use structured thinking to recognize the current situation, organize information, and identify options.
Earlier I told you that carefully defining a business problem can ultimately save time,
money, and resources. All of this is achieved
through structured thinking. Structured thinking
is the process of recognizing the current
problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options. In other words, it’s a way
of being super prepared. It’s having a clear list of what you are expected to deliver, a timeline for major
tasks and activities, and checkpoints so the team
knows you’re making progress. In this video, we’ll look at how structured thinking helps
us save time and effort, but also makes our job
as data analysts easier because it allows us to better understand the work we are doing. In the business world, it’s common for teams
to spend hours of valuable time trying to
solve an important problem, only to end up back
where they started. Not only is the initial
problem not resolved, but they’ve spent hours
not resolving it. This outcome negatively
affects you, your team, and the
organization as a whole. But it can usually be prevented. Many times the situation is a result of not fully
understanding the issue. Structured thinking will help you understand problems
at a high level so that you can identify
areas that need deeper investigation
and understanding. The starting place for structured thinking is
the problem domain, which you might
remember from earlier. Once you know the specific
area of analysis, you can set your base and lay out all your requirements and hypotheses before you
start investigating. With a solid base in place, you’ll be ready to deal with
any obstacles that come up. What kind of obstacles? Well, let’s say you’re asked to predict
the future value of an apartment building
based on a given dataset. You have hundreds
of variables and every one is crucial
to your analysis. But what if one variable
accidentally gets left out, like square footage, for example? You’d have to go back and
redo all your hard work. That’s because
missing variables can lead to inaccurate conclusions. Another way that you can
practice structured thinking and avoid mistakes is by
using a scope of work. A scope of work or
SOW is an agreed-upon outline of the work you’re going to
perform on a project. For many businesses, this includes things
like work details, schedules, and reports that
the client can expect. Now, as a data analyst, your scope of work
will be a bit more technical and include those basic items
we just mentioned, but you’ll also focus on things like data preparation,
validation, analysis of quantitative
and qualitative datasets, initial results, and maybe even some visuals to really
get the point across. Let’s bring a scope of work to life with a simple example. Say a couple has hired
a wedding planner. We’ll focus on just one task,
the wedding invitations. Here’s what might be in
scope of work: deliverables, timeline, milestones,
and reports. Let’s break down just one
of these, deliverables. The wedding planner and couple will need to
decide on the invitation, make a list of people to invite,
collect their addresses, print the invitations,
address the envelopes, stamp them, and mail them out. Now let’s check
out the timelines. You’ll notice the dates and the milestones which
keep us on track. Finally, we have the reports, which give our couple
some peace of mind by telling them when each
step is complete. A scope of work can be a
simple but powerful tool. With a solid scope of work, you’ll be able to
address any confusion, contradictions, or
questions about the data up-front and make sure these sneaky setbacks
don’t stand in your way. This is a simple example of what a scope of
work might look like. But later, you’ll be able to
practice building your own. Next up in our scope, we’ll check out setbacks from a different angle by learning the importance of contextualizing
data and avoiding bias. Looking forward to sharing
some cool insights with you.
Practice Quiz: Hands-On Activity: Create a scope of work
Reading
5 Data Analytics Projects for Beginners
If you’re getting ready to launch a new career as a data analyst, chances are you’ve encountered an age-old dilemma. Job listings ask for experience, but how do you get experience if you’re looking for your first data analyst job?
This is where your portfolio comes in. The projects you include in your portfolio demonstrate your skills and experience—even if it’s not from a previous data analytics job—to hiring managers and interviewers. Populating your portfolio with the right projects can go a long way toward building confidence that you’re the right person for the job, even without previous work experience.
In this article, we’ll discuss five types of projects you should include in your data analytics portfolio, especially if you’re just starting out. You’ll see some examples of how these projects are presented in real portfolios, and find a list of public data sets you can use to start completing projects.
Data analysis project ideas
As an aspiring data analyst, you’ll want to demonstrate a few key skills in your portfolio. These data analytics project ideas reflect the tasks often fundamental to many data analyst roles.
1. Web scraping
While you’ll find no shortage of excellent (and free) public data sets on the internet, you might want to show prospective employers that you’re able to find and scrape your own data as well. Plus, knowing how to scrape web data means you can find and use data sets that match your interests, regardless of whether or not they’ve already been compiled.
If you know some Python, you can use tools like Beautiful Soup or Scrapy to crawl the web for interesting data. If you don’t know how to code, don’t worry. You’ll also find several tools that automate the process (many offer a free trial), like Octoparse or ParseHub.
If you’re unsure where to start, here are some websites with interesting data options to inspire your project:
- Wikipedia
- Job portals
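As a hedged sketch of what scraping involves, the example below pulls link targets out of a small HTML snippet. It deliberately swaps in Python's standard-library `html.parser` for Beautiful Soup or Scrapy so that it runs without any installation. The HTML string, the `LinkCollector` class, and the job paths are invented for illustration; a real project would fetch pages over the network and should respect each site's terms of use.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical snippet standing in for a downloaded job-portal page.
sample_html = """
<ul>
  <li><a href="/jobs/data-analyst">Data Analyst</a></li>
  <li><a href="/jobs/bi-analyst">BI Analyst</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(sample_html)
job_links = collector.links
```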
2. Data cleaning
A significant part of your role as a data analyst is cleaning data to make it ready to analyze. Data cleaning (also called data scrubbing) is the process of removing incorrect and duplicate data, managing any holes in the data, and making sure the formatting of data is consistent.
As you look for a data set to practice cleaning, look for one that includes multiple files gathered from multiple sources without much curation. Some sites where you can find “dirty” data sets to work with include:
- CDC Wonder
- Data.gov
- World Bank
- Data.world
- /r/datasets
Example data cleaning project: This Medium article outlines how data analyst Raahim Khan cleaned a set of daily-updated statistics on trending YouTube videos.
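The cleaning tasks described in this section (removing incorrect and duplicate data, managing holes, and making formatting consistent) can be sketched in plain Python. The rows, column names, and the `clean` helper are all hypothetical. Dropping rows with unreadable numbers and keeping missing values as `None` are just one possible policy, not a recommended standard.

```python
# Hypothetical "dirty" rows, as they might arrive from several sources.
raw_rows = [
    {"name": " Alice ", "city": "new york", "sales": "100"},
    {"name": "Bob", "city": "Boston", "sales": ""},
    {"name": "alice", "city": "New York", "sales": "100"},
    {"name": "Cara", "city": "CHICAGO", "sales": "abc"},
]


def clean(rows):
    """Normalize formatting, keep missing sales as None,
    drop rows with non-numeric sales, and remove duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        # Consistent formatting: trim whitespace and use title case.
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        text = row["sales"].strip()
        try:
            # A hole in the data is kept, but marked explicitly as None.
            sales = float(text) if text else None
        except ValueError:
            continue  # incorrect data: the value is not a number
        key = (name, city, sales)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append({"name": name, "city": city, "sales": sales})
    return cleaned


clean_rows = clean(raw_rows)
```

With this assumed input, two rows survive: Alice in New York with sales of 100.0, and Bob in Boston with a missing sales value.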
3. Exploratory data analysis (EDA)
Data analysis is all about answering questions with data. Exploratory data analysis, or EDA for short, helps you explore what questions to ask. This could be done separately from or in conjunction with data cleaning. Either way, you’ll want to accomplish the following during these early investigations:
- Ask lots of questions about the data.
- Discover the underlying structure of the data.
- Look for trends, patterns, and anomalies in the data.
- Test hypotheses and validate assumptions about the data.
- Think about what problems you could potentially solve with the data.
Example exploratory data analysis project: This data analyst took an existing dataset on American universities in 2013 from Kaggle and used it to explore what makes students prefer one university over another.
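A first EDA pass often starts with summary statistics and a quick check for anomalies. The sketch below uses only Python's `statistics` module on a made-up list of daily website visits. The two-standard-deviation threshold is an assumption chosen for illustration, not a rule from this article.

```python
from statistics import mean, median, stdev

# Hypothetical daily visit counts (illustration only).
daily_visits = [120, 135, 128, 140, 132, 125, 410, 138, 130, 127]

# Discover the basic structure of the data.
summary = {
    "count": len(daily_visits),
    "min": min(daily_visits),
    "max": max(daily_visits),
    "mean": mean(daily_visits),
    "median": median(daily_visits),
}

# Look for anomalies: values more than two sample standard
# deviations away from the mean (an assumed threshold).
avg = mean(daily_visits)
spread = stdev(daily_visits)
anomalies = [v for v in daily_visits if abs(v - avg) > 2 * spread]
```

Here the mean (158.5) sits well above the median (131), which hints at an outlier, and the anomaly check flags the single day with 410 visits. A natural next question is what happened on that day.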
10 free public datasets for EDA
An EDA project is an excellent time to take advantage of the wealth of public datasets available online. Here are 10 fun and free datasets to get you started in your explorations.
1. National Centers for Environmental Information: Dig into the world’s largest provider of weather and climate data.
2. World Happiness Report 2021: What makes the world’s happiest countries so happy?
3. NASA: If you’re interested in space and earth science, see what you can find among the tens of thousands of public datasets made available by NASA.
4. US Census: Learn more about the people and economy of the United States with the latest census data from 2020.
5. FBI Crime Data Explorer (CDE): Explore crime data collected by more than 18,000 law enforcement agencies.
6. World Health Organization COVID-19 Dashboard: Track the latest coronavirus numbers by country or WHO region.
7. Latest Netflix Data: This Kaggle dataset (updated in April 2021) includes movie data broken down into 26 attributes.
8. Google Books Ngram: Download the raw data from the Google Books Ngram to explore phrase trends in books published from 1960 to 2015.
9. NYC Open Data: Discover New York City through its many publicly available datasets on topics ranging from the Central Park squirrel population to motor vehicle collisions.
10. Yelp Open Dataset: See what you can find while exploring this collection of Yelp user reviews, check ins, and business attributes.
4. Sentiment analysis
Sentiment analysis, typically performed on textual data, is a technique in natural language processing (NLP) for determining whether data is neutral, positive, or negative. It may also be used to detect a particular emotion based on a list of words and their corresponding emotions (known as a lexicon).
This type of analysis works well with public review sites and social media platforms, where people are likely to offer public opinions on various subjects.
To get started exploring what people feel about a certain topic, you can start with sites like:
- Amazon (product reviews)
- Rotten Tomatoes (movie reviews)
- News sites
Example sentiment analysis project: This blog post on Towards Data Science explores the use of linguistic markers in Tweets to help diagnose depression.
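The lexicon idea described above can be sketched in a few lines of Python. The tiny `LEXICON` and the `sentiment` helper are invented for illustration; real lexicons contain thousands of scored words. This simple word-count approach also ignores negation ("not great") and sarcasm.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "great": 1, "love": 1, "excellent": 1,
    "bad": -1, "terrible": -1, "boring": -1,
}


def sentiment(text):
    """Label text as positive, negative, or neutral by summing word scores."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review_label = sentiment("I love this phone, the camera is great!")
```

Text with no lexicon words, such as "The package arrived on Tuesday.", comes back as neutral.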
5. Data visualization
Humans are visual creatures. This makes data visualization a powerful tool for transforming data into a compelling story to encourage action. Great visualizations are not only fun to create, they also have the power to make your portfolio look beautiful.
Example data visualization project: Data analyst Hannah Yan Han visualizes the skill level required for 60 different sports to find out which is toughest.
Five free data visualization tools
You don’t need to pay for advanced visualization software to start creating stellar visuals either. These are just a few of the free visualization tools you can use to start telling a story with data:
1. Tableau Public: Tableau ranks among the most popular visualization tools. Use the free version to transform spreadsheets or files into interactive visualizations (here are some examples from April 2021).
2. Google Charts: This gallery of interactive charts and data visualization tools makes it easy to embed visualizations within your portfolio using HTML and JavaScript code. A robust Guides section walks you through the creation process.
3. Datawrapper: Copy and paste your data from a spreadsheet or upload a CSV file to generate charts, maps, or tables—no coding required. The free version allows you to create unlimited visualizations to export as PNG files.
4. D3 (Data-Driven Documents): With a bit of technical know-how, you can do a ton with this JavaScript library.
5. RAW Graphs: This open source web app makes it easy to turn spreadsheets or CSV files into a range of chart types that might otherwise be difficult to produce. The app even provides sample data sets for you to experiment with.
Three data analysis projects you can complete today
There’s a lot of data out there, and a lot you can do with it. Trying to figure out where to start can be overwhelming. If you need a little direction for your next project, consider one of these data analysis Guided Projects on Coursera that you can complete in under two hours. Each includes split-screen video instruction, and you don’t have to download or own any special software.
1. Exploratory Data Analysis with Python and Pandas: Apply EDA techniques to any table of data using Python.
2. Twitter Sentiment Analysis Tutorial: Clean thousands of tweets and use them to predict whether a customer is happy or not.
3. COVID19 Data Visualization Using Python: Visualize the global spread of COVID-19 using Python, Plotly, and a real data set.
Video: Staying objective
Contextualizing data is important because it allows us to understand the meaning of the data. For example, knowing the date and time that the data was collected can help us to understand why certain trends are occurring. Additionally, knowing who collected the data and how it was collected can help us to identify any potential biases in the data.
Bias in data can occur when the data is collected in a way that is not representative of the population as a whole. For example, if a survey is only given to people who are already interested in a particular topic, the results of the survey will be biased towards that topic.
To avoid bias in data, it is important to start with an accurate representation of the population and to collect the data in the most appropriate and objective way possible.
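The survey example above can be made concrete with a small, made-up population. In this hedged sketch, each person has a flag for whether they were already interested in a topic and a 1-to-5 rating of it; the numbers are invented purely to show the effect.

```python
from statistics import mean

# Hypothetical population: (already_interested, rating from 1 to 5).
population = [
    (True, 5), (True, 4), (True, 5),
    (False, 2), (False, 3), (False, 1), (False, 2),
    (False, 3), (False, 2), (False, 3),
]

# Rating across everyone: the representative picture.
population_mean = mean([rating for _, rating in population])

# Surveying only people who are already interested skews the result.
biased_sample = [rating for interested, rating in population if interested]
biased_mean = mean(biased_sample)
```

With these assumed values, the whole population averages 3, while the interested-only sample averages about 4.67, so the survey would overstate how well the topic is rated.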
Here are some tips for contextualizing data:
- Consider the who, what, where, when, how, and why of the data.
- Who collected the data?
- What is it about?
- What does the data represent in the world, and how does it relate to other data?
- When was the data collected?
- Where was the data collected?
- How was the data collected?
- Why was the data collected?
By asking yourself these questions, you can better understand the meaning of the data and identify any potential biases.
What is objectivity in data analytics?
Objectivity in data analytics is the ability to interpret data without letting personal biases or prejudices influence the results. This means that data analysts should be aware of their own biases and take steps to mitigate them.
Why is objectivity important in data analytics?
Objectivity is important in data analytics because it helps to ensure that the results are accurate and reliable. When data analysts are not objective, they may unknowingly introduce bias into the results, which can lead to incorrect conclusions.
How to stay objective in data analytics
There are a number of things that data analysts can do to stay objective in their work, including:
- Be aware of your own biases. The first step to staying objective is to be aware of your own biases. This means being aware of your own personal beliefs, values, and experiences, and how they might influence your interpretation of data.
- Consider multiple perspectives. When analyzing data, it is important to consider multiple perspectives. This means looking at the data from different angles and considering different interpretations.
- Use statistical methods to minimize bias. There are a number of statistical methods that can be used to minimize bias in data analysis. These methods can help to ensure that the results are not influenced by the data analyst’s own biases.
- Get feedback from others. It is helpful to get feedback from others on your data analysis. This can help you to identify any potential biases in your work.
Conclusion
Objectivity is an important quality for data analysts to have. By following the tips above, data analysts can help to ensure that their work is accurate and reliable.
Here are some additional tips for staying objective in data analytics:
- Use a variety of data sources. This will help to reduce the risk of bias from any single source.
- Be transparent about your methods. This will allow others to see how you analyzed the data and to identify any potential biases.
- Be open to feedback. Be willing to listen to feedback from others and to make changes to your analysis if necessary.
A data analyst considers who, what, when, where, why, and how in order to achieve what goal?
To put information into context
A data analyst asks who, what, when, where, why, and how in order to put information into context.
Welcome back. In this video, we’ll explore the importance of contextualizing data and recognizing data bias. Let’s get started.
Data doesn’t live in a vacuum; it needs context. Earlier, we learned that context is the condition in which something exists or happens. Actions can be appropriate in some contexts but inappropriate in others. For example, yelling “Move!” is rude in one context, if your friend is standing in front of the TV, but it’s entirely appropriate in another, if that friend is about to get hit by a kid on a tricycle. Do you see the difference? In the world of data, numbers don’t mean much without context. I’ll let my fellow Googler Ed tell you a little bit more about that.
As we have more and more data available to us, we can leverage that data in increasingly sophisticated ways and generate more powerful insights from it. We use data at many different levels. Sometimes our data is descriptive, answering questions like, how much did we spend on travel last month? Data becomes more valuable as we generate diagnostic and predictive insights, like understanding why travel spend increased last month. Data is most valuable, however, when we can generate prescriptive insights. For example, how can we leverage data to incentivize more efficient travel?
Figuring out what data means is just as important as collecting it. As a data analyst, a big part of your job is putting data into context. It’s also up to you to remain objective and recognize all sides of an argument before drawing conclusions. The thing about context is that it’s very personal. If two people curate the same data set and follow the same directions, there’s a chance they will end up with different results. Why? Because there is no universal set of contextual interpretations. Everyone approaches it in their own way.
Even if the data collection process is correct, the analysis can still be misinterpreted. Conclusions can be influenced by your own conscious and subconscious biases, which are based on cultural, social, and market norms. For example, if you ask a Boston resident which baseball team is the best, chances are they’re going to say the Boston Red Sox. Which brings us to a major limitation of data analytics: if the analysis is not objective, the conclusions can be misleading.
To really understand what the data is about, you have to think through who, what, where, when, how, and why. It’s good to ask yourself questions like, who collected the data? And what is it about? What does the data represent in the world, and how does it relate to other data?
When was the data collected? Data collected a while ago may have certain limitations, given the present-day situation. For example, if we collected phone numbers over the past century, at some point mobile phones would have been introduced, leading to the need for an additional phone number field.
You should also think about where the data was collected. A lot can change across cities, states, and countries. And how was it collected? A survey might not be as effective as an in-person interview, for example.
Of course, there’s the why. The why can have a particularly strong relationship with bias. Why? Because sometimes data is collected, or even made up, to serve an agenda.
The best thing you can do for the fairness and accuracy of your data is to make sure you start with an accurate representation of the population and collect the data in the most appropriate and objective way. Then you’ll have the facts you can pass on to your team.
Hopefully you now understand the importance of fair and objective data, and how important context is when it comes to understanding and interpreting it. Next up, we’ll figure out how we can bring it to life.
Reading: The importance of context
Overview
Context is the condition in which something exists or happens. Context is important in data analytics because it helps you sift through huge amounts of disorganized data and turn it into something meaningful. The fact is, data has little value if it is not paired with context.
Understanding the context behind the data can help us make it more meaningful at every stage of the data analysis process. For example, you might be able to make a few guesses about what you’re looking at in the following table, but you couldn’t be certain without more context.
2010 | 28000 |
2005 | 18000 |
2000 | 23000 |
1995 | 10000 |
On the other hand, if the first column was labeled to represent the years when a survey was conducted, and the second column showed the number of people who responded to that survey, then the table would start to make a lot more sense. Take this a step further, and you might notice that the survey is conducted every 5 years. This added context helps you understand why there are five-year gaps in the table.
Years (Collected every 5 years) | Respondents |
---|---|
2010 | 28000 |
2005 | 18000 |
2000 | 23000 |
1995 | 10000 |
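To make the effect of labeling concrete, here is a minimal sketch in Python that attaches column names to the same values and then checks the five-year pattern. The names `raw_rows`, `columns`, and `labeled_rows` are illustrative assumptions, not part of the course material.

```python
# Raw values as they appear in the unlabeled table: hard to interpret alone.
raw_rows = [(2010, 28000), (2005, 18000), (2000, 23000), (1995, 10000)]

# Adding column labels (context) turns each row into meaningful information.
columns = ("year", "respondents")
labeled_rows = [dict(zip(columns, row)) for row in raw_rows]

# With labels in place, we can verify that the survey ran every 5 years.
years = sorted(row["year"] for row in labeled_rows)
gaps = [later - earlier for earlier, later in zip(years, years[1:])]

print(labeled_rows[0])  # {'year': 2010, 'respondents': 28000}
print(gaps)             # [5, 5, 5]
```

The numbers themselves do not change; only the labels do, and that is enough to support a question like "how often was the survey run?"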
Context can turn raw data into meaningful information. It is very important for data analysts to contextualize their data. This means giving the data perspective by defining it. To do this, you need to identify:
- Who: The person or organization that created, collected, and/or funded the data collection
- What: The things in the world that data could have an impact on
- Where: The origin of the data
- When: The time when the data was created or collected
- Why: The motivation behind the creation or collection
- How: The method used to create or collect it
Understanding and including the context is important during each step of your analysis process, so it is a good idea to get comfortable with it early in your career. For example, when you collect data, you’ll also want to ask questions about the context to make sure that you understand the business and business process. During organization, the context is important for your naming conventions, how you choose to show relationships between variables, and what you choose to keep or leave out. And finally, when you present, it is important to include contextual information so that your stakeholders understand your analysis.
Reading: Learning Log: Define problems and ask questions with data
Overview
In a previous learning log, you reflected on what you learned from the SMART questions you asked during your real life data conversation. Now, you’ll complete an entry in your learning log using notes about your data conversation to explain your initial insights to potential stakeholders. By the time you complete this entry, you will have a stronger understanding of how you might use data to define problems and what information is useful for stakeholders at this stage. This will help you develop formal documents like a scope of work (SOW) as a data analyst in the future.
Summarize your findings
As a data analyst, part of your job is to communicate the data analysis process and your insights to stakeholders. This often involves defining the problem and summarizing key questions and available data early on. You might include this information in a formal document for stakeholders like a scope of work (SOW) at the beginning of a project. As a reminder, an SOW is an agreed-upon outline of the tasks to be performed during a project; it is important to ensure your stakeholders understand this key information at that stage.
Before you start your learning log entry, take a moment to review your notes and your reflection for Learning Log: Ask SMART questions about real life data. Imagine that you are going to design a data analysis project based on this data conversation.
In the learning log template linked below, you will create a summary of key information you think a stakeholder would need to know about this project. In this case, your stakeholder could be a member of the executive team, like a project manager. Here are some questions to help you get started:
- What is the problem?
- Can it be solved with data? If so, what data?
- Where is this data? Does it exist, or do you need to collect it?
- Are you using private data that someone will need to give you access to, or publicly available data?
- Who are the relevant sponsors and stakeholders for this project? Who is involved, and how?
- What are the boundaries for your project? What do you consider “in-scope?” What do you consider “out-of-scope?”
- Is there any other information you think is relevant to the project?
- Is there any information you need or questions you need answered before you can begin?
As you think about these questions, it’s likely you’ll discover that you don’t have all the information you need. This is part of the process!
When kicking off data analysis projects, expect to have a lot of conversations. Identifying what you know and what you don’t know makes it much easier to plan your next data conversation, so that you can get the answers you need.
Reflection
Now that you have started identifying which information would be useful for a potential stakeholder, write 5-7 sentences (100-140 words) summarizing the key questions, the data available, and the answers or insights you have gained so far in your learning log template.
When you’ve finished your entry in the learning log template, make sure to save the document so your response is somewhere accessible. This will help you continue applying data analysis to your everyday life. You will also be able to track your progress and growth as a data analyst.
Practice Quiz: Test your knowledge on structured thinking
What are the key elements of structured thinking? Select all that apply.
- Revealing gaps and opportunities in order to identify the options
- Recognizing the current problem or situation
- Organizing available information
Structured thinking is the process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options.
Fill in the blank: A scope of work is an agreed-upon _____ of the work you’re going to perform on a project.
outline
A scope of work is an agreed-upon outline of the work you’re going to perform on a project.
What are some strategies to ensure your data is accurate and fair? Select all that apply.
- Make sure you start with an accurate representation of the population in the sample
- Think through the “who, what, where, when, how, and why” of your data
- Collect the data in an objective way
To ensure your data is accurate and fair, make sure you start with an accurate representation of the population in the sample; collect the data in an objective way; and ask questions about the data.
Weekly challenge 3
Reading: Glossary: Terms and definitions
Quiz: *Weekly challenge 3*
Which of the following are examples of expressions? Select all that apply.
1+2+3
7 * 3
Which of the following are good practices when working with data in a spreadsheet? Select all that apply.
Create a folder on your computer specifically for spreadsheets
Add data labels or attributes at the beginning of each row
Choose a short and clear title that states exactly what the data in the spreadsheet is about
A data analyst could use spreadsheets to achieve which of the following tasks?
Predict next quarter’s sales
Which of the following statements accurately describe formulas and functions? Select all that apply.
Functions are preset commands that perform calculations.
Formulas and functions assist data analysts in calculations, both simple and complex.
Formulas are instructions that perform specific calculations.
In the function =MAX(B5:B15), what does B5:B15 represent?
Range
What is the correct spreadsheet formula for multiplying cell K3 times cell K8?
=K3*K8
Fill in the blank: By negatively influencing data collection, ____ can have a detrimental effect on analysis.
bias
In data analytics, the structured thinking process includes recognizing the current problem and organizing the available information. What are the additional aspects of this process? Select all that apply.
Identifying the relevant options
Revealing gaps and opportunities
Communicating with stakeholders