
Week 3: Databases: Where data lives

When you’re analyzing data, you’ll access much of the data from a database. It’s where data lives. In this part of the course, you’ll learn all about databases, including how to access them and extract, filter, and sort the data they contain. You’ll also check out metadata to discover the different types and how analysts use them.

Learning Objectives

  • Describe databases with references to their functions and components
  • Explain metadata as it relates to databases
  • Discuss the importance of metadata and how it relates to the work of a data analyst
  • Demonstrate an understanding of the issues and steps involved in accessing data from multiple sources
  • Explain the use of filters and sorting functionality in spreadsheets
  • Demonstrate an understanding of how to use spreadsheet functionality to import and inspect a given set of data
  • Demonstrate an understanding of how to use SQL functions to extract data from a database
Table of Contents
  1. Working with databases
  2. Managing data with metadata
  3. Accessing different data sources
  4. Sorting and filtering
  5. Working with large datasets in SQL

Working with databases


Video: All about databases

This video introduces databases and metadata. You will learn how databases store, sort, and organize data, how metadata helps you understand and interpret that data, and how to import data from databases into spreadsheets.

These topics are an important part of the prepare phase of the data analysis process. By learning about databases and metadata, you will be able to prepare your data for analysis and solve business problems more effectively.

All about databases in data analysis

Databases are an essential tool for data analysts. They provide a way to store, organize, and analyze large amounts of data. Databases can be used to store data from a variety of sources, such as surveys, customer transactions, and social media posts.

There are many different types of databases, each with its own strengths and weaknesses. Some popular types of databases include:

  • Relational databases: Relational databases are the most common type of database. They store data in tables, which are made up of rows and columns. Relational databases are good for storing and organizing structured data.
  • NoSQL databases: NoSQL databases are a newer type of database that are designed to store and organize unstructured data. Unstructured data is data that does not fit into a traditional relational database schema. NoSQL databases are often used for storing data from social media, sensor networks, and other sources of unstructured data.
  • In-memory databases: In-memory databases store data in RAM, which makes them very fast for querying. In-memory databases are often used for real-time analytics and applications.

Benefits of using databases for data analysis

There are many benefits to using databases for data analysis, including:

  • Data storage: Databases provide a way to store large amounts of data in a structured and organized manner. This makes it easy to find and access the data you need for your analysis.
  • Data integrity: Databases can help to ensure the integrity of your data. This means that the data is accurate and complete.
  • Data security: Databases can help to protect your data from unauthorized access.
  • Data analysis: Databases provide a platform for running complex data analysis queries. This allows you to extract insights from your data that would be difficult to obtain otherwise.

How to use databases for data analysis

To use databases for data analysis, you will need to:

  1. Choose the right database type: Consider the type of data you need to store and the type of analysis you want to perform when choosing a database type.
  2. Create a database schema: A database schema is a blueprint for your database. It defines the tables, columns, and relationships between the tables in your database.
  3. Load data into the database: You can load data into the database from a variety of sources, such as CSV files, Excel spreadsheets, and other databases.
  4. Write SQL queries: SQL is a programming language that is used to query and manipulate data in databases. You can use SQL to write queries to extract data from the database, perform calculations on the data, and group and summarize the data.
  5. Analyze the data: Once you have extracted the data from the database, you can use statistical software or other tools to analyze the data and extract insights.
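
As a hedged illustration of steps 2 and 4 above, here is a minimal SQL sketch; the sales table and its columns are hypothetical, not part of the course materials:

    -- Step 2: a simple schema for sales data (names are hypothetical)
    CREATE TABLE sales (
        sale_id   INT PRIMARY KEY,   -- unique identifier, never null
        product   VARCHAR(50),
        amount    DECIMAL(10, 2),
        sale_date DATE
    );

    -- Step 4: extract, calculate, group, and summarize
    SELECT product,
           SUM(amount) AS total_revenue
    FROM sales
    GROUP BY product;

Step 5 would then pick up this summarized result in a spreadsheet or statistical tool for deeper analysis.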

Conclusion

Databases are a powerful tool for data analysis. By understanding how to use databases, you can gain insights from your data that would be difficult to obtain otherwise.

Hello again. So far, you’ve seen how data can be gathered and analyzed to solve all kinds of problems. Next, we’re going to learn all about databases. As a refresher, a database is a collection of data stored in a computer system, but storage is just the beginning. You’ll discover how databases make it possible to find the exact piece of information you need for your analysis. You’ll also learn how to sort data in order to zoom in on what you need, generate insightful reports, and much more.

Then we’ll go even deeper, and I mean really, really deep. I’m talking about metadata. You’ve probably heard someone say, “Wow, that’s so meta.” Usually they’re talking about something referencing back to itself or being completely self-aware. For example, if a character in a book knows she’s in a book, that’s meta. If you make a documentary about making documentaries, that’s also meta. And here at Google, I constantly analyze how I analyze data. That’s definitely meta. I do that to give my work a quality check, to make sure my methods are fair, and to be certain that I’m paying attention to any biases that might affect the outcome. As an analyst, you should do this too. Sometimes we get a little too close to our data, so stepping back and asking ourselves if our processes make sense is key.

But let’s back up just a bit and define metadata. Metadata is data about data. Like I said: deep. Metadata is extremely important when working with databases. Think of it like a reference guide. Without the guide, all you have is a bunch of data with no context explaining what it means. Metadata tells you where the data comes from, when and how it was created, and what it’s all about.

Up next, you’ll learn how to take data from a database or another source and bring it into a spreadsheet. You’ll do this either by importing it directly or by using SQL to generate the request. And once you have data in a spreadsheet, the possibilities are endless. Everything we’re about to cover is a very important part of the prepare phase of the data analysis process. It’s how data analysts figure out which kind of data is going to be most helpful to them. If you have the right data, you’re much more likely to be able to solve your business problems successfully. So, ready to tap into the incredible power of databases? Let’s go!

Video: Database features

This video covers databases, focusing on relational databases, primary keys, and foreign keys.

A relational database is a database that contains a series of related tables that can be connected via their relationships.

A primary key is an identifier that references a column in which each value is unique. It uniquely identifies a record in a relational database table.

A foreign key is a field within a table that’s a primary key in another table. It’s how one table can be connected to another.

A table can only have one primary key but it can have multiple foreign keys.

Primary and foreign keys are important for organizing data in databases and making it easy to access and analyze.
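
As a minimal sketch of how these keys are declared in SQL, using the car manufacturer example from the video below (all column names and types are illustrative):

    -- Each table declares exactly one primary key.
    CREATE TABLE dealerships (
        branch_id INT PRIMARY KEY,
        city      VARCHAR(50)
    );

    CREATE TABLE product_details (
        vin   VARCHAR(17) PRIMARY KEY,
        model VARCHAR(50)
    );

    -- repair_parts has its own primary key, plus foreign keys that
    -- connect it to the other two tables.
    CREATE TABLE repair_parts (
        part_id   INT PRIMARY KEY,
        vin       VARCHAR(17) REFERENCES product_details (vin),
        branch_id INT REFERENCES dealerships (branch_id)
    );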

Database features in data analysis


Here are some specific database features that are useful for data analysis:

  • Indexes: Indexes allow you to quickly search for data in a database. This is especially useful for large databases.
  • Views: Views allow you to create custom views of your data. This can be helpful for simplifying complex queries or for hiding sensitive data.
  • Triggers: Triggers allow you to automatically perform actions when data is changed in a database. This can be helpful for auditing data changes or for maintaining data integrity.
  • Stored procedures: Stored procedures are pre-written SQL queries that can be reused. This can save time and improve the performance of your queries.

By understanding the features of databases, data analysts can use them to store, organize, and analyze data more effectively.

Here are some examples of how data analysts can use database features for data analysis:

  • Use indexes to quickly search for specific data points. For example, an analyst might use an index to quickly find all customers who have purchased a particular product.
  • Use views to simplify complex queries. For example, an analyst might create a view that shows the top 10 customers who have spent the most money in a given month.
  • Use triggers to audit data changes. For example, an analyst might create a trigger that logs all changes to a customer database.
  • Use stored procedures to reuse common queries. For example, an analyst might create a stored procedure that calculates the average customer lifetime value.

By using the features of databases effectively, data analysts can gain valuable insights from their data.
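
A hedged sketch of the first two examples above in SQL; the purchases table and its columns are assumptions made for illustration:

    -- Index: speed up lookups of purchases by product
    CREATE INDEX idx_purchases_product ON purchases (product_id);

    -- View: a reusable summary of each customer's total spend
    CREATE VIEW customer_spend AS
    SELECT customer_id,
           SUM(amount) AS total_spent
    FROM purchases
    GROUP BY customer_id;

Once the view exists, queries can select from customer_spend as if it were a table, which keeps complex aggregation logic in one place.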

If you create a database table and include a primary key in the table, what must you ensure? Select all that apply.

  • The primary key is unique
  • The primary key’s value isn’t null or blank

If you create a database table with a primary key, it must be unique and its value must not be null or blank.

A table in a relational database can have only one foreign key.

False

A table in a relational database is allowed to have multiple foreign keys.

Databases are essential tools for data analysts. I use them constantly. Just about all of the data I access is stored within databases. Databases store and organize data, making it much easier for data analysts to manage and access information. They help us get insights faster, make data-driven decisions, and solve problems. You’ve already heard a bit about what databases are and how they’re used by data analysts. Now let’s learn more about database features and components.

Here’s a simple database structure. It contains tables with information from a car manufacturer. The top level includes car dealerships, product details, and repair parts. Then, if you drill down to the next level by selecting one of those tables, you’ll find more specific details about each item. This is called a relational database. A relational database is a database that contains a series of related tables that can be connected via their relationships. For two tables to have a relationship, one or more of the same fields must exist inside both tables. For example, here, branch ID exists in this table and this one. If a field exists within both tables, we can use it to connect the tables together. The branch ID field is the key to connecting these tables.

There are two types of keys. A primary key is an identifier that references a column in which each value is unique. You can think of it as a unique identifier for each row in a table. For our dealership table with information about the different dealership branches, branch ID is the primary key. Similarly, for the product details table about each car, VIN is our primary key. As an analyst, you may need to create tables. If you do decide to include a primary key, it should be unique, meaning no two rows can have the same primary key. Also, it cannot be null or blank.

There are also foreign keys. A foreign key is a field within a table that’s a primary key in another table. In other words, a foreign key is how one table can be connected to another. Because our repair parts table contains information about each car part, the primary key is part ID. Each row in our repair parts table represents one unique part. All the other keys in this table, such as the VIN, are the foreign keys that allow the repair parts table to be connected to the other tables. As you can see, a table can only have one primary key, but it can have multiple foreign keys.

Understanding primary and foreign keys can be tricky, so you’ll have more opportunities to practice coming up. But as a general summary: a primary key is used to ensure data in a specific column is unique. It uniquely identifies a record in a relational database table. Only one primary key is allowed in a table, and it cannot contain null or blank values. A foreign key is a column or group of columns in a relational database table that provides a link between the data in two tables. It refers to the field in a table that’s the primary key of another table. Lastly, it’s important to note that more than one foreign key is allowed to exist in a table.

Feel free to rewatch this video to be sure you understand primary and foreign keys clearly. Coming up, you’ll begin practicing how to access and analyze data from actual databases. That will be a great opportunity to improve your understanding of primary and foreign keys, database organization, and how you might use databases in your future analytics career.

Reading: Databases in data analytics

Reading

Reading: Inspecting a dataset: A guided, hands-on tour

Reading

Practice Quiz: Test your knowledge on working with databases

Fill in the blank: A _____ is an identifier that references a database column in which each value is unique.

Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships.

A key benefit of working with normalized databases is that they help lower data redundancy. Which of the following is an example of redundancy?

Managing data with metadata


Video: Exploring metadata

Metadata is data about data. It is used to describe, organize, and manage data. There are three common types of metadata: descriptive, structural, and administrative.

Descriptive metadata describes a piece of data and can be used to identify it at a later point in time. For example, the descriptive metadata of a book in a library would include the ISBN, author, and title.

Structural metadata indicates how a piece of data is organized and whether it’s part of one or more data collections. For example, the structural metadata of a book would include the chapters and pages.

Administrative metadata indicates the technical source of a digital asset. For example, the administrative metadata of a photo would include the file type, date and time, and camera used.

Metadata is important for data analysts because it helps them interpret the contents of the data within a database. It tells an analyst what the data is all about, which makes it possible to put the data to work solving problems and making data-driven decisions.

Metadata is like the back of a book. It can tell you a lot about the book, but you have to actually read the book to know what it’s all about.

Exploring metadata in data analysis

Metadata is data about data. It is used to describe, organize, and manage data. Metadata can be very useful for data analysts, as it can help them to understand the data they are working with and to identify potential problems.

There are a number of ways to explore metadata in data analysis. One common approach is to use a database management system (DBMS). DBMSs typically store metadata in a separate table, which can be queried using SQL.
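
For example, most relational systems expose table metadata through the standard INFORMATION_SCHEMA views. A minimal sketch, assuming a hypothetical customers table exists in the current database:

    -- Describe every column of the customers table:
    -- its name, data type, and whether it allows nulls.
    SELECT column_name,
           data_type,
           is_nullable
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE table_name = 'customers';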

Another approach to exploring metadata is to use a data visualization tool. Data visualization tools can be used to create charts and graphs that show the relationships between different pieces of metadata. This can be helpful for identifying patterns and trends in the data.

Here are some specific examples of how data analysts can use metadata:

  • Identifying duplicate records: Metadata can be used to identify duplicate records in a dataset. This can be done by comparing the values of different pieces of metadata, such as the customer ID or the product code.
  • Identifying outliers: Metadata can be used to identify outliers in a dataset. Outliers are data points that are significantly different from the rest of the data. Outliers can be caused by errors in the data or by unusual events.
  • Understanding the relationships between different data points: Metadata can be used to understand the relationships between different data points. This can be done by examining the values of different pieces of metadata, such as the date and time of a transaction or the location of a customer.
  • Enriching the data: Metadata can be used to enrich the data by adding additional information about the data points. For example, the metadata for a customer record might include the customer’s address and phone number.

By exploring metadata, data analysts can gain a deeper understanding of the data they are working with and can identify potential problems. This information can then be used to improve the quality of the data and to make better data-driven decisions.
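
As one concrete example, the duplicate-record check described above might look like this in SQL: a minimal sketch, assuming a customers table keyed by a customer_id column:

    -- List customer IDs that appear more than once in the table
    SELECT customer_id,
           COUNT(*) AS times_seen
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1;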

Here are some tips for exploring metadata in data analysis:

  • Start by identifying the different types of metadata that are available. This will help you to understand what information is available about the data you are working with.
  • Use a variety of tools to explore the metadata. This will help you to get a complete understanding of the data.
  • Look for patterns and trends in the metadata. This can help you to identify potential problems and opportunities.
  • Use the metadata to enrich the data. This can help you to get a deeper understanding of the data and to make better data-driven decisions.

Now that you understand the different ways to organize data in a database, let’s talk about how you can describe that data. In this video, we’ll start exploring metadata, which is a very important aspect of database management. Metadata is an abstract concept, though, so let’s kick things off with a simple, everyday example.

Did you know that every time a photo is taken with a smartphone, data is automatically collected and stored within that photo? Take a look. Choose any photo on your computer. Here’s a cute shot of my friend’s dogs, Rudy and Matilda. On your photo, right-click on “Get Info” or “Properties.” This will give you the photo’s metadata, which may tell you the type of file it is; the date and time it was taken; the geolocation, or where it was taken; what kind of device was used to take the photo; and much more. Pretty amazing, right?

Here’s another example. Every time you send or receive an email, metadata is sent right along with that message. You can find it by clicking on “Show Original” or “View Message Details.” An email message’s metadata includes its subject, who it’s from, who it’s to, and the date and time it was sent. The metadata even knows how quickly it was delivered after the sender pressed “Send.”

Metadata is information that’s used to describe the data that’s contained in something, like a photo or an email. Keep in mind that metadata is not the data itself. Instead, it’s data about the data. In data analytics, metadata helps data analysts interpret the contents of the data within a database. That’s why metadata is so important when working with databases. It tells an analyst what the data is all about, and that makes it possible to put the data to work solving problems and making data-driven decisions.

As a data analyst, there are three common types of metadata that you’ll come across: descriptive, structural, and administrative. Descriptive metadata is metadata that describes a piece of data and can be used to identify it at a later point in time. For instance, the descriptive metadata of a book in a library would include the code you see on its spine, known as a unique International Standard Book Number, also called the ISBN. It would also include the book’s author and title.

Next is structural metadata, which is metadata that indicates how a piece of data is organized and whether it’s part of one or more than one data collection. Let’s head back to the library. An example of structural metadata would be how the pages of a book are put together to create different chapters. It’s important to note that structural metadata also keeps track of the relationship between two things. For example, it can show us that the digital document of a book manuscript was actually the original version of a now-printed book.

Finally, we have administrative metadata. Administrative metadata is metadata that indicates the technical source of a digital asset. When we looked at the metadata inside the photo, that was administrative metadata. It shows you the type of file it was, the date and time it was taken, and much more.

Here’s one final thought to help you understand metadata. If you’re on your way to the library to pick out a book, you could research the book’s title, author, length, and number of chapters. That’s all metadata, and it can tell you a lot about the book, but you have to actually read the book to know what it’s all about. Likewise, you can read about data analytics, but you have to take this course to earn the Google Data Analytics certificate. Keep moving forward to gain that new perspective.

Reading: Metadata is as important as the data itself

Reading

Video: Using metadata as an analyst

Data analysts use metadata to:

  • Put data into context
  • Create a single source of truth
  • Make data more reliable
  • Make it easier and faster to bring together multiple sources for data analysis
  • Confirm that external data is clean, accurate, relevant, and timely
  • Ensure that the right content is pulled for the particular project and used appropriately

Metadata repositories are useful for all of these reasons. They also help ensure that data analysts are pulling the right content for the particular project and using it appropriately.

Metadata is a valuable tool for data analysts, and it can be used to improve the quality, consistency, and reliability of data.
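
To make the idea of a metadata repository more concrete, here is one highly simplified sketch in SQL. Every table and column name below is illustrative, not something defined by the course:

    -- A toy metadata repository: one row per data set we track
    CREATE TABLE metadata_repository (
        dataset_name  VARCHAR(100) PRIMARY KEY,
        source        VARCHAR(100),  -- where the data came from
        collected_on  DATE,          -- when it was collected
        collected_how VARCHAR(200),  -- how it was collected
        usage_terms   VARCHAR(200)   -- whether and how we may use it
    );

    -- Confirm how and when an external data set was collected
    SELECT collected_on, collected_how, usage_terms
    FROM metadata_repository
    WHERE dataset_name = 'external_claims_data';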

Using metadata as an analyst

Metadata is data about data. It is used to describe, organize, and manage data. Analysts use metadata to understand the data they are working with and to identify potential problems.

In addition to the uses covered in the previous section (identifying duplicate records and outliers, understanding relationships between data points, and enriching the data), analysts can use metadata to:

  • Improve the quality of the data: Metadata can be used to identify and correct errors. For example, if the metadata for a customer record indicates that the customer’s address is in California, but the customer’s zip code is in Texas, then the analyst can correct the error.

Analysts can use metadata in a variety of ways to improve their analysis. For example, an analyst might use metadata to identify the most important customers for a company. This could be done by examining the metadata for customer records to see which customers have spent the most money or which customers have made the most purchases.

Metadata can also be used to identify trends and patterns in the data. For example, an analyst might use metadata to see which products are selling the best in different regions. This information could then be used to make better decisions about where to allocate marketing resources.

Overall, metadata is a valuable tool for analysts. It can be used to improve the quality, consistency, and reliability of the data, as well as to identify trends and patterns in the data.


What can a data analyst achieve more easily with a metadata repository? Select all that apply.
  • Confirm how or when data was collected
  • Verify that data from an outside source is being used appropriately
  • Bring together multiple sources of data

Using a metadata repository, a data analyst can find it easier to bring together multiple sources of data, confirm how or when data was collected, and verify that data from an outside source is being used appropriately.

Now that you know what metadata is, it’s time to explore why data analysts use it. You already know that data needs to be identified and described before it can help you solve a problem or make an effective business decision. Putting data into context is probably the most valuable thing that metadata does, but there are still many more benefits of using metadata.

Here’s one. Metadata creates a single source of truth by keeping things consistent and uniform. We data analysts love consistency. We always aim for this kind of uniformity in our data and our databases. After all, data that’s uniform can be organized, classified, stored, accessed, and used effectively. Plus, when a database is consistent, it’s so much easier to discover relationships between the data inside it and the data elsewhere.

Metadata also makes data more reliable by making sure it’s accurate, precise, relevant, and timely. This also makes it easier for data analysts to identify the root causes of any problems that might pop up. The bottom line is, when the data we work with is high quality, it makes things easier and improves our results.

One of the ways data analysts make sure their data is consistent and reliable is by using something called a metadata repository. A metadata repository is a database specifically created to store metadata. Metadata repositories can be stored in a physical location, or they can be virtual, like data that exists in the cloud. These repositories describe where metadata came from, keep it in an accessible form so it can be used quickly and easily, and keep it in a common structure for everyone who may need to use it. Metadata repositories make it easier and faster to bring together multiple sources for data analysis. They do this by describing the state and location of the metadata, the structure of the tables inside, and how data flows through the repository. They even keep track of who accesses the metadata and when.

Here’s a real-world example. As a health care analyst at Google, I use second and third party data. As you learned, second party data is data that’s collected by a group directly from its audience and then sold. Third party data comes from outside sources, which are not the original collectors of that data. They get it from websites or programs that pull the data from the various platforms where it was originally generated. It’s a bit complex, but the main thing to remember is that third party data doesn’t come from inside your own business. If my team needs to work with data that wasn’t created at Google, that means we sometimes don’t know very much about its quality and credibility, but we need to be certain that our data can be trusted and was collected responsibly. After all, if the data is unreliable, our results can be unreliable too. That’s why understanding the metadata of the external database is so important. It lets us confirm that the data is clean, accurate, relevant, and timely. This is particularly important if the data comes from another organization. One other important step when working with external data is confirming that we’re allowed to use it. We’ll often reach out to the owner to make sure we can access or purchase it.

To sum up, metadata repositories are useful for all these reasons. Plus, they help ensure that my team is pulling the right content for the particular project and using it appropriately. We can confirm this because the metadata clearly describes how and when the data was collected, how it’s organized, and much more. Soon you’ll learn even more about using metadata in data analytics, and if you’re finding metadata particularly fascinating, you’ll discover some really exciting career choices that focus on metadata. Stay tuned.

Video: Metadata management

Metadata and metadata repositories are powerful tools for data analysts. They can be used to create a single source of truth, keep data consistent and uniform, and ensure that the data is accurate, precise, relevant, and timely. Metadata is also important for data governance, which is the process of ensuring the formal management of a company’s data assets.

Metadata analysts are responsible for organizing and maintaining company data, ensuring that it’s of the highest possible quality. They create basic metadata identification and discovery information, describe the way different data sets work together, and explain the many different types of data resources. Metadata analysts also create standards that everyone follows and the models used to organize the data.

Metadata analysts play an important role in helping businesses to understand their data and make better decisions. They are passionate about making data accessible and sharing it with colleagues and other stakeholders.

If you are interested in a career in data analytics, becoming a metadata analyst may be a good option for you.

Metadata management in data analysis

Metadata management is the process of organizing, storing, and maintaining metadata. Metadata is data about data, and it can be used to describe, understand, and use data more effectively.

Metadata management is important for data analysis because it can help to improve the quality, consistency, and reliability of data. It can also help to make data more accessible and easier to use.

There are a number of different ways to manage metadata. One common approach is to use a metadata repository. A metadata repository is a database that is specifically designed to store and manage metadata. Metadata repositories can be used to store a variety of metadata, including data definitions, data quality rules, and data lineage.

Another approach to metadata management is to use a metadata management tool. A metadata management tool is a software application that can be used to automate the process of managing metadata. Metadata management tools can be used to create, update, and delete metadata, as well as to generate reports on metadata.

Here are some specific benefits of metadata management for data analysis:

  • Improved data quality: Metadata can be used to define data quality standards and to track the quality of data over time. This can help to identify and correct data errors.
  • Increased data consistency: Metadata can be used to define data standards and to ensure that data is consistently formatted and organized. This can make it easier to analyze data and to compare data from different sources.
  • Enhanced data accessibility: Metadata can be used to create indexes and search capabilities for data. This can make it easier to find and access the data that you need for analysis.
  • Improved data sharing: Metadata can be used to describe data in a way that is understandable to non-technical users. This can make it easier to share data with other stakeholders and to collaborate on data analysis projects.

Metadata management is an essential part of any data analysis process. By carefully managing metadata, you can improve the quality, consistency, and accessibility of your data. This will make it easier to get the insights you need to make better decisions.

Here are some tips for managing metadata for data analysis:

  • Develop a metadata management plan: This plan should define the goals of your metadata management efforts, as well as the processes and tools that you will use.
  • Identify the metadata that you need: This will depend on the specific needs of your data analysis projects.
  • Collect and store the metadata: This can be done in a metadata repository or in a metadata management tool.
  • Maintain the metadata: This includes updating the metadata as needed and ensuring that it is accurate and up-to-date.
  • Make the metadata accessible: This can be done by creating indexes and search capabilities, or by publishing the metadata in a documentation format.

By following these tips, you can effectively manage metadata for data analysis and get the most out of your data.

Metadata and metadata repositories are very powerful tools in the data analyst toolbox. As we discussed previously, data analysts use them to create a single source of truth, keep data consistent and uniform, and ensure that the data we work with is accurate, precise, relevant, and timely. These tools also make it easier to access and use data by standardizing our processes. In this video, we’ll explore more components of metadata and learn how metadata analysts work to keep things organized.

We know that the amount of data out there continues to grow, but lots of businesses just aren’t using their data. Sometimes they don’t know what they have, sometimes they can’t find it, or sometimes a business just doesn’t trust it. Especially in bigger companies, data can span numerous different processes and systems, and pulling together data from so many places can be a big challenge. For example, let’s say a company starts out with a traditional data storage system in its offices. But then, as the amount of data it owns continues to expand, cloud storage is needed too. Plus, this company could also be accessing and using second or third party data from a partner organization. Each of these systems has its own rules and requirements, so each organizes the data in a completely different way, adding even more complexity. It’s no wonder so many organizations struggle to find the right data at the right moment.

On the other hand, metadata is stored in a single, central location, and it gives the company standardized information about all of its data. This is done in two ways. First, metadata includes information about where each system is located and where the data sets are located within those systems. Second, the metadata describes how all of the data is connected between the various systems.

Another important aspect of metadata is something called data governance. Data governance is a process to ensure the formal management of a company’s data assets. This gives an organization better control of its data and helps a company manage issues related to data security and privacy, integrity, usability, and internal and external data flows. It’s important to note that data governance is about more than just standardizing terminology and procedures. It’s about the roles and responsibilities of the people who work with the metadata every day. These are metadata specialists, and they organize and maintain company data, ensuring that it’s of the highest possible quality. These people create basic metadata identification and discovery information, describe the way different data sets work together, and explain the many different types of data resources. Metadata specialists also create very important standards that everyone follows and the models used to organize the data.

There’s one thing they all have in common: whether they work at a tech company, a nonprofit association, or a financial institution, metadata analysts are great team players. They’re passionate about making data accessible by sharing it with colleagues and other stakeholders. If you’re looking for a role that encourages you to explore all the data that the digital world has to offer, following the path to becoming a metadata analyst may be the right choice for you. Either way, businesses of all kinds face market trends and competition, and they need to understand why one process works while another doesn’t. Data analytics allows them to answer key questions and keep improving.

Video: Megan: Fun with metadata

Megan is an agency measurement lead at Google. She helps advertising agencies understand and use measurement and analytics to improve their media plans. She has seen a lot of change in the field in the past 17 years, as data availability and modeling techniques have become more advanced and accessible.

Metadata is the key to understanding a data set. It describes what is in the rows and columns of the data, and can be helpful in understanding a single data set or in collaborating with others on an analytics project.

Megan gives an example of how she used metadata to help an advertiser build a data lake. By understanding the different data sources and what information they contained, she was able to quickly and easily identify the basic constructs that the advertiser needed to focus on.

Megan also emphasizes the importance of making measurement and analytics accessible to people who may not have experience with them. By helping people understand how measurement and analytics can help them achieve their goals, she can make them more comfortable using these tools.

Overall, Megan’s message is that metadata is an essential tool for understanding and using data effectively. It can help people make better decisions about their media plans and achieve their goals.

My name is Megan, and I am an agency measurement lead here at Google. Basically, I help to demystify measurement and analytics for advertising agencies: people who are tasked with executing media plans for advertisers, but also people who are interested in measuring the impact that media is having for their clients. I’ve been doing this for about 17 years now and have seen a lot of evolution in the space, from data availability to different modeling techniques becoming more advanced but also more accessible. It’s been a really cool journey to see how it’s evolved, how analytics has become more mainstream, and how people are getting more excited about it.

Metadata is basically the key to your larger data set. It helps describe what’s in the rows and the columns of the data that you’ll be working with. Metadata is kind of a shorthand, or a CliffsNotes version, of a much more complex set of information. It can be helpful in getting a handle on what’s in a single data set that you may have access to. It’s an important part of the discovery process of any analytics project as you work with either a client or a vendor to understand the resources that you’ll have to address a problem and what might be missing. It gives you the keys to unlock that data in a really simple and straightforward way, and it’s a great communication tool.

When I was working for an advertiser, one of the things that we were trying to do was build something called a data lake. Essentially, this is bringing together all of the sources of data that you might want to use in an analysis into one place, which can be really, really tricky. One of the benefits of metadata was figuring out where we had sources that might overlap, where we had data sources that had things in common, and what the unique pieces of information were that we were getting from each of those data sets. So as we thought about tackling this really huge and important project, we were able to use metadata to quickly and easily get to the basic constructs that we were trying to tackle.

When you’re working with people who maybe don’t have analytics as their day job, getting that “aha” moment, helping them understand how measurement and analytics are tools that can help them achieve their goals, is really important. Making something that was previously inaccessible a little bit more accessible for a team, something they feel comfortable putting into practice, is really important and a great way to come out of a partnership.

Practice Quiz: Test your knowledge on metadata

A large company has several data collections across its many departments. What kind of metadata indicates exactly how many collections a piece of data lives in?

The date and time a photo was taken is an example of which kind of metadata?

A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?

A company needs to merge third-party data with its own data. Which of the following actions will help make this process successful? Select all that apply.

Accessing different data sources


Video: Working with more data sources

This video discusses the different places where data analysts go to connect with data. There are two basic types of data: internal and external. Internal data is data that lives within a company’s own systems, while external data comes from outside the company.

Internal data can be gathered from a variety of sources within a company, including sales, marketing, customer relationship management, finance, human resources, and data archives. Internal data is often free to access because the company already owns it.

External data can be obtained from a variety of sources, including other businesses, government sources, the media, professional associations, schools, and open data initiatives. Open data initiatives make data sets available to the public for free.

Data analysts can use internal and external data to create deeper analyses and add more perspective to their work. For example, a healthcare analyst might partner with other healthcare organizations or nonprofits to use their data to create a more comprehensive analysis of the healthcare industry.

The video also discusses how to import data from different sources into a spreadsheet. This is an important skill for data analysts to have, as it allows them to combine data from different sources into a single dataset for analysis.

Overall, this video provides a good overview of the different types of data that data analysts use and how to access them.

Working with more data sources in data analysis

Working with more data sources can be a challenge, but it can also be very rewarding. By combining data from multiple sources, you can gain a deeper understanding of your subject matter and produce more accurate and insightful analyses.

Here are some tips for working with more data sources in data analysis:

  • Identify the data sources that you need. What data do you need to answer your research question? Once you know what data you need, you can start to identify the sources where you can find it.
  • Gather the data from the different sources. This may involve downloading the data from a website, extracting it from a database, or converting it from one format to another.
  • Clean and prepare the data. Once you have gathered the data from the different sources, you need to clean and prepare it for analysis. This may involve removing duplicate records, correcting errors, and converting the data into a consistent format.
  • Merge the data into a single dataset. Once the data is clean and prepared, you can merge it into a single dataset for analysis. This will allow you to analyze the data as a whole and identify trends and patterns that may not be apparent when you are looking at the data from each source individually.
  • Analyze the data. Once the data is merged into a single dataset, you can start to analyze it. This may involve using statistical methods, machine learning algorithms, or other data analysis techniques.
  • Interpret the results. Once you have analyzed the data, you need to interpret the results and draw conclusions. This may involve identifying trends and patterns, comparing different groups of data, and making predictions.

Working with more data sources can be challenging, but it is also a valuable skill for data analysts to have. By following the tips above, you can overcome the challenges and produce more accurate and insightful analyses.

Here are some additional tips for working with more data sources:

  • Use a data warehouse or data lake. A data warehouse or data lake can help you to store and manage data from multiple sources in a centralized location. This can make it easier to access and analyze the data.
  • Use a data integration tool. A data integration tool can help you to automate the process of merging data from multiple sources into a single dataset. This can save you time and effort.
  • Use a statistical programming language. A statistical programming language, such as R or Python, can be used to perform complex data analysis tasks. This can be helpful when working with large datasets or when using advanced statistical methods.
  • Use a data visualization tool. A data visualization tool can help you to communicate the results of your analysis in a clear and concise way. This can be helpful for communicating your findings to stakeholders or for publishing your results.

By following these tips, you can overcome the challenges of working with more data sources and produce more accurate and insightful analyses.

In this video, we’ll discuss the different places data analysts go to connect with data. There’s all kinds of data out there, and it’s important to know how to access it. Earlier, you learned that there are two basic types of data used by data analysts: internal and external. Internal data is data that lives within a company’s own systems. It’s typically also generated from within the company. You may also hear internal data described as primary data. External data is data that lives and is generated outside an organization. It can come from a variety of places, including other businesses, government sources, the media, professional associations, schools, and more. External data is sometimes called secondary data.

Gathering internal data can be complicated. Depending on your data analytics project, you might need data from lots of different sources and departments, including sales, marketing, customer relationship management, finance, human resources, and even the data archives. But the effort is worth it. Internal data has plenty of advantages for a business. It provides information that’s relevant to problems you’re trying to solve, and it’s free to access because the company already owns it. With internal data, analysts can work on all data projects without ever looking beyond their own walls.

But sometimes internal data doesn’t give you the full picture. In those cases, data analysts can turn to external data and apply that information to their analysis. For instance, as health care analysts, we often partner with other healthcare organizations or nonprofits and use their data to create deeper analyses and add some more industry-level perspective.

In an earlier video, you learned that openness has created a lot of data for analysts to use, largely through open data initiatives. As a reminder, openness, or open data, refers to the free access, usage, and sharing of data. For example, the United States government makes hundreds of thousands of data sets available to the public on Data.gov. These data sets contain information on weather patterns, educational progress, crime rates, transportation, and much more. There are lots of reasons for these open data initiatives. One is to make government activities more transparent, like letting the public see where money is spent. It also helps educate citizens about voting and local issues. Open data also improves public service by giving people ways to be a part of public planning or provide feedback to the government. Finally, open data leads to innovation and economic growth by helping people and companies better understand their markets.

Google actually hosts lots of public databases with information on science, transportation, economics, climate, and more. As an example, a bike sharing company could use traffic data from within our public transportation database to see where the roads are busiest, then choose those locations for their bikes in order to reduce cars on the road and give people another transportation option.

Now you’re familiar with internal and external data and how you can access both. Coming up, we’ll learn how to import all the data you collect from different sources into a spreadsheet.

Reading: From external source to a spreadsheet

Reading

Video: Importing data from spreadsheets and databases

This video discusses how to import data from different sources into a spreadsheet. One way to import data is to upload a CSV file. CSV files are plain text files that store data in a table format. To import a CSV file, you can use the File > Import menu option in your spreadsheet application.

Another way to import data is to download it from a website. Many websites offer open source data that you can download and use in your spreadsheets. To download data from a website, you can usually use the download button on the website. Once you have downloaded the data, you can import it into your spreadsheet using the File > Import menu option.

Once you have imported your data into a spreadsheet, you can sort and filter it to focus on the information that is relevant to you. Sorting allows you to arrange your data in a specific order, such as alphabetically or numerically. Filtering allows you to hide rows or columns that do not meet certain criteria.

By following the steps in this video, you can learn how to import data from different sources into a spreadsheet and sort and filter it to focus on the information that is relevant to you.

Importing data from spreadsheets and databases in data analysis

Spreadsheets and databases are two of the most common sources of data for data analysis. Spreadsheets are typically used to store small to medium-sized datasets, while databases are used to store large datasets.

There are a number of ways to import data from spreadsheets and databases into a data analysis tool. One common way is to use a data integration tool. Data integration tools allow you to connect to different data sources and extract the data into a single format. Once the data is in a single format, you can import it into your data analysis tool.

Another way to import data from spreadsheets and databases is to use the built-in import capabilities of your data analysis tool. Most data analysis tools have the ability to import data from a variety of file formats, including CSV, Excel, and SQL.

Here are some specific steps on how to import data from spreadsheets and databases into a data analysis tool:

  1. Identify the data source. What spreadsheet or database do you want to import the data from?
  2. Choose a data integration tool or the built-in import capabilities of your data analysis tool.
  3. Connect to the data source. This may involve entering login credentials or providing other authentication information.
  4. Select the data that you want to import. You may need to specify the tables, columns, and rows that you want to import.
  5. Import the data into your data analysis tool.

Once you have imported the data into your data analysis tool, you can start to analyze it using the various tools and features that are available.

Here are some additional tips for importing data from spreadsheets and databases:

  • Make sure that the data is in a consistent format. This will make it easier to import the data into your data analysis tool and to analyze it.
  • Test the import process before you import a large dataset. This will help you to identify any problems with the import process and to resolve them before you import the entire dataset.
  • Use a data dictionary to document the data that you are importing. This will help you to understand the meaning of the data and to use it correctly in your analysis.

By following these tips, you can import data from spreadsheets and databases into your data analysis tool quickly and easily.
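
If your destination is a SQL data warehouse rather than a spreadsheet, many systems can also load a CSV file directly. As one hedged example, BigQuery supports a LOAD DATA statement along these lines; the table, bucket, and file names below are hypothetical, and you should check your warehouse’s documentation for the exact options it supports:

    -- Load a CSV from cloud storage into a table, skipping the header row
    LOAD DATA INTO my_dataset.doctors_by_country
    FROM FILES (
        format = 'CSV',
        uris = ['gs://my-bucket/doctors.csv'],
        skip_leading_rows = 1
    );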

Fill in the blank: CSV files use plain text and are _____ by characters, such as a comma.

delineated

CSV files use plain text and are delineated by characters, such as a comma. A delineator indicates a boundary or separation between two things.

At this point, you’ve learned all about internal and external data, and how to prepare it for use. Now we’ll go through the process of actually importing data from different sources.

Sometimes you want to upload a spreadsheet from your files, such as a CSV file. CSV stands for comma-separated values. A CSV file saves data in a table format. Now let’s bring that file into a fresh spreadsheet. We’ll start by selecting File, then Import. Then we’ll choose to upload a file: navigate to it, open it, and insert it as a new sheet. CSV files use plain text and they’re delineated by characters, so each column or field is clearly distinct from another when importing. As you learned, CSVs are comma-separated, and usually the spreadsheet app will auto-detect those separations. But sometimes you might need to indicate that the separator is another character or a space by selecting the different options in this window. Also, if you’re planning to work with the data set, you would usually convert the values to text, numbers, or other options here. But plain text is okay for reporting purposes, so we can leave those fields alone. Finally, select Import data. Now our CSV file is ready to work with in our spreadsheet.

I spend most of my time at work analyzing spreadsheets full of healthcare information. I typically start by looking at a larger data set, then I pull a subset of it into a spreadsheet so I can work with it. Maybe I want to analyze year-over-year growth in user demand on Google Search for certain healthcare services, like telemedicine. Or maybe I want to look at data sets from external healthcare organizations or agencies for more insight into this trend. For example, with telemedicine, maybe I’ll look at a spreadsheet that lists telemedicine providers. There are so many ways spreadsheets can help you find the insights you need.

One source I use a lot is the World Health Organization’s data repository. This is a place where anybody can access open-source data. As you can see, there’s tons of data available. You can search by theme, category, indicator, and country. You can also access World Health Organization metadata if you want to learn more about the data in the repository. For our example, we’ll look at medical doctors by country and year. This information would be useful for a data analysis project looking into how many doctors are available to treat patients within a certain population compared to other populations. To get this data, we’ll start on the webpage that contains the data set we want. Then we’ll download the data as a CSV file, open a new spreadsheet, and import the file by selecting File, Import. Next, upload your file and select Import data. After reviewing the data to make sure it looks clean, we can title it and begin our work.

I know this is a lot of information to take in, but you’ll get much more comfortable with this the more you practice. Coming up, we’ll learn how to sort and filter your data to focus on the information relevant to you.

Reading: Exploring public datasets

Overview

Practice Quiz: Test your knowledge on accessing data sources

A CSV file saves data in a table format. What does CSV stand for?

A data analyst wants to bring data from a CSV file into a spreadsheet. This is an example of what process?

A CSV file makes it easier for data analysts to complete which tasks? Select all that apply.

Sorting and filtering


Video: Sorting and filtering

In this video, the speaker discusses how to sort and filter data in a spreadsheet to focus on only the data that is relevant to the problem you are trying to solve.

Sorting involves arranging data into a meaningful order to make it easier to understand, analyze, and visualize. Data can be sorted in ascending or descending order, and alphabetically or numerically. Sorting can be done across all of a spreadsheet or just in a single column or table. You can also sort by multiple variables.

Filtering means showing only the data that meets specific criteria while hiding the rest. A filter simplifies a spreadsheet by only showing us the information we need.

The speaker provides several examples of how to use sorting and filtering to analyze data in a spreadsheet. For example, you could sort the data by city and state to see a list of sales reps by the cities and states in which they work. You could also filter the data to see only the sales reps who worked with a particular product.

Sorting and filtering are very important tools in the data analyst’s toolbox. They can be used to focus on specific data sets and to extract insights from large amounts of data.

Sorting and filtering in data analysis

Sorting and filtering are two important data analysis techniques that can be used to focus on specific data sets and to extract insights from large amounts of data.

Sorting involves arranging data into a meaningful order to make it easier to understand, analyze, and visualize. Data can be sorted in ascending or descending order, and alphabetically or numerically. Sorting can be done across all of a data set or just in a single column or table. You can also sort by multiple variables.

For example, you could sort a data set of sales data by product name to see which products are selling the best. Or, you could sort a data set of customer data by customer satisfaction score to identify customers who are most likely to churn.

Filtering means showing only the data that meets specific criteria while hiding the rest. A filter simplifies a data set by only showing us the information we need.

For example, you could filter a data set of sales data to only show sales that were made in the past month. Or, you could filter a data set of customer data to only show customers who have spent more than $100 in the past year.

Sorting and filtering can be used together to create powerful data analysis tools. For example, you could sort a data set of sales data by product name and then filter the data to only show sales that were made in the past month. This would give you a list of the top-selling products in the past month.

Sorting and filtering are essential tools for any data analyst. By understanding how to use these techniques, you can quickly and easily focus on specific data sets and extract insights from large amounts of data.

Here are some tips for using sorting and filtering in data analysis:

  • Use sorting to organize your data into a meaningful order. This will make it easier to identify trends and patterns in your data.
  • Use filtering to focus on specific data sets. This will help you to extract more specific insights from your data.
  • Use sorting and filtering together to create powerful data analysis tools. For example, you could sort a data set by one variable and then filter the data by another variable to create a very specific view of your data.
  • Be careful not to overfilter your data. If you filter too aggressively, you may end up with too little data to support reliable conclusions.

By following these tips, you can use sorting and filtering to become a more effective data analyst.
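These ideas translate directly to SQL, which you’ll work with later in this section: the WHERE clause filters rows and the ORDER BY clause sorts them. Here’s a small sketch against a hypothetical sales table (the table and column names are made up for illustration):

SQL

SELECT product_name, sale_date, amount
FROM sales
WHERE sale_date >= '2021-06-01'  -- filter: keep only sales on or after this date
ORDER BY product_name ASC,       -- sort first by product name, A to Z
  amount DESC;                   -- then by amount, highest first

This mirrors the tips above: filter first to narrow the data set, then sort the remaining rows by one or more variables.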

In the past few videos, you’ve learned about both internal and external data. Now I’ll show you how to focus on only the data that’s relevant to the problem you’re trying to solve. This is useful if you’re working with a very large, complex spreadsheet, which data analysts encounter all the time. Having lots of data can make it difficult to quickly find and analyze the information you need.

No two analytics projects are the same. Often data analysts process, view, and use data very differently, even if it comes from the exact same source. Here’s an example. Check out this spreadsheet that shows a company’s sales reps and where they work. Different data analysts might want different information from the spreadsheet, and that’s where sorting and filtering come in. Sorting and filtering the data in a spreadsheet helps us customize the way data is presented. They can also organize data so analysts can zoom in on the pieces that matter. Think of it like a magnifying glass for our data.

Let’s begin with sorting. Sorting involves arranging data into a meaningful order to make it easier to understand, analyze, and visualize. Data can be sorted in ascending or descending order, and alphabetically or numerically. Sorting can be done across all of a spreadsheet or just in a single column or table. You can also sort by multiple variables. For instance, if our data set contains both city and state fields, we can sort first by city and then by state.

Anytime you’re sorting data, it’s always a good idea to freeze the header row first. To do this, we’ll highlight the row. Then from the View menu, choose Freeze, then 1 row. This locks the row in place. Now when we scroll down the sheet, the header row stays visible, so we know the category of each column. Looks good to me.

Now let’s sort the entire spreadsheet. We’ll sort by city first. To do this, select the city column, then use the drop-down arrow to sort the sheet. Select A to Z. This will sort all the columns from A to Z by row, with the selected column being the primary sort criterion. The cities are now sorted alphabetically, and they’re still grouped with the corresponding states, sales reps, and auto parts. The details across each row are automatically kept together when sorting a particular section, as you can see here.

Multiple-criteria sorting is another very useful data analysis tool. For instance, let’s say we want to see a list of sales reps by the cities and states in which they work. First, we select the entire data set, then choose Data and Sort range. In the dialog box, make sure that “Data has header row” is checked. That way the header row, with city, state, sales rep, and auto parts, won’t be part of the sort. Then in the “sort by” drop-down menu, select state and the sort order A to Z. Now add another sort column. In the “then by” drop-down, select city and the sort order A to Z. Finally, select Sort. Now we can search the data to easily find a sales rep who works in a particular state and city.

Sorting is useful when you want to look at everything in a spreadsheet in alphabetical or numerical order. But sometimes data analysts want to isolate a particular piece of information. To do this, they use a filter. Filtering means showing only the data that meets specific criteria while hiding the rest. A filter simplifies a spreadsheet by only showing us the information we need. For example, we could add a filter to see only the sales reps who worked with a particular product. To do this, we first select Data and Create a filter. Choose the column with the data we need; in this case, Auto Parts. Filter buttons will appear in the corner of each column header. To filter our spreadsheet by auto part, click the button in the Auto Parts header. In this example, let’s say we want to only see sales reps who worked with rims. Remove the check marks from the categories we don’t want to see, which is everything except for rims. Then select OK. The filter temporarily hides anything that doesn’t meet the condition. But note that, even though those rows aren’t visible, they’re still there. When it’s time to view the entire spreadsheet again, simply turn off the filter.

Sorting and filtering are very important tools in the data analyst’s toolbox. In the next video, you’ll discover even more ways to narrow in on the exact information you need for any data analysis project.

Practice Quiz: Hands-On Activity: Clean data in spreadsheets with sorting and filtering

Practice Quiz: Self-Reflection: Considering databases and spreadsheets for sorting and filtering

Reading

Practice Quiz: Test your knowledge on sorting and filtering

What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?

A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?

A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How can they sort the spreadsheet to find the most recently returned cars?

Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu.

Working with large datasets in SQL


Video: Setting up BigQuery, including sandbox and billing options

BigQuery offers two free account tiers: sandbox and free trial.

The sandbox account is available at no charge and anyone with a Google account can log in and use it. However, it has some limitations, such as a maximum of 12 projects at a time and no support for Data Manipulation Language (DML) operations.

The free trial account gives you access to more of what BigQuery has to offer with fewer overall limitations. It includes $300 in credit for use in Google Cloud during the first 90 days. However, it requires that you set up a payment option with Google Cloud.

You can upgrade to a paid account with either type of account at any time and retain all of your existing projects. If you set up a free trial account but choose not to upgrade to a paid account when your trial period ends, you can set up a free sandbox account at that time. However, projects from your trial won’t transfer to your sandbox.

In the next video, the instructor will walk through the steps required to create a sandbox account and explore the SQL workspace.

Setting up BigQuery, including sandbox and billing options

Sandbox account

To set up a BigQuery sandbox account, follow these steps:

  1. Go to the BigQuery sandbox documentation page.
  2. In the upper right corner, log in to the Google account you want to use for your BigQuery sandbox account.
  3. Click the “Go to BigQuery” button.
  4. In the drop-down menu, select your country and agree to the terms of service agreement.
  5. Click the “Create Project” button.
  6. Name your project and give it an ID.
  7. Click the “Create” button.
  8. Click the “Done” button.

Billing options

If you want to use more features of BigQuery, you can sign up for a free trial account. To do this, follow these steps:

  1. Go to the BigQuery pricing page.
  2. Click the “Start a free trial” button.
  3. Enter your contact information and payment information.
  4. Click the “Start free trial” button.

Your free trial will give you $300 in credit for use in Google Cloud during the first 90 days. You can use this credit to pay for BigQuery resources, such as storage and queries.

Upgrading to a paid account

If you need to use more BigQuery resources than your sandbox or free trial account provides, you can upgrade to a paid account at any time from the billing section of the Google Cloud console. Upgrading retains all of your existing projects.

Switching to a sandbox account

If your free trial ends and you choose not to upgrade to a paid account, you can set up a free sandbox account at that point instead. Keep in mind that projects from your trial won’t transfer to the sandbox, so you would be starting over from scratch.

Conclusion

You can now set up a BigQuery sandbox or free trial account, and upgrade to a paid account at any time.

Hi. Welcome back. Throughout this course, you’ve seen how BigQuery can be used to view and analyze data from tons of sources. Now we’re going to explore the different account tiers that BigQuery offers, so you know how to choose the right one for your needs and how you can access them.

BigQuery is offered to you at no charge. There are paid options available, but we won’t need them for the activities in this course. Instead, we’re going to talk about two account types: sandbox and free trial.

A sandbox account is available at no charge, and anyone with a Google account can log in and use it. There are a couple of limitations to this account type. For example, you get a maximum of 12 projects at a time. This means that if you want to make a 13th project, you’ll have to delete one of your original 12. It also doesn’t allow you to insert new records into a database or update the field values of existing records. These Data Manipulation Language, or DML, operations aren’t supported in the sandbox. However, you won’t need to do this in course activities. You can read more about the limitations of a sandbox account in the BigQuery documentation. This is the account type we’ll use for most of our activities. It’s simple to set up, so later in this video we’ll walk through the steps required to create an account.

Before that, though, we should talk about the other way to use BigQuery without charges: the Google Cloud free trial. The free trial gives you access to more of what BigQuery has to offer with fewer overall limitations. The free trial offers $300 in credit for use in Google Cloud during the first 90 days. You won’t get anywhere near that spending limit if you just use the BigQuery console to practice SQL queries. After you spend the $300 credit or after 90 days, your free trial will expire, and you will need to personally select to upgrade to a paid account to keep working in Google Cloud. Your method of payment will not be automatically charged after your free trial ends. The free trial does require that you set up a payment option with Google Cloud, but unless you choose to opt in for an account upgrade, it won’t charge you. However, because it does require you to enter a payment type, we understand if you don’t feel comfortable with this option. This is one reason the BigQuery sandbox account exists: so you don’t have to enter any payment information.

With either type of account, you can upgrade to a paid account at any time and retain all of your existing projects. If you set up a free trial account but choose not to upgrade to a paid account when your trial period ends, you can set up a free sandbox account at that time. However, projects from your trial won’t transfer to your sandbox. It would be like starting from scratch. Just something to keep in mind.

Now we’re going to set up your sandbox account, which you can change into a free trial or upgrade to a paid account if you choose. First, we’ll go to the BigQuery sandbox documentation page. Then go to the upper right corner and log in to whichever Google account you want to use for the BigQuery sandbox account. Then we’ll select the “Go to BigQuery” button on the documentation page. This gives us a drop-down to select a country and to read the terms of service agreement. This will bring us to the SQL workspace, which we’ll be using in upcoming activities. Choose “Create Project,” then name the project and give it an ID. Choose “Create,” and then “Done.” There we have it.

In the next video, we’ll explore what each part of the SQL workspace does and how we’ll use it in future activities. See you there.

Reading: Using BigQuery

Reading

Video: How to use BigQuery

  • The BigQuery SQL workspace is a tool for writing and running SQL queries on BigQuery data.
  • To navigate to the SQL workspace, go to the BigQuery landing page and select the “SQL workspace” option from the drop-down menu.
  • To search for a public dataset, go to the Explorer menu and select “Explore public datasets.”
  • To select a public dataset, click on the dataset you want to use.
  • To run a query, click the “Compose new query” button and write your query in the editor.
  • To upload your own data to BigQuery, go to the Explorer menu and select “Create dataset.”
  • To upload a data file to BigQuery, select the “Upload” option from the “Create table” dialog.
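To connect these steps, here is the kind of query you might run once a public dataset is open in the editor. It previews rows from the austin_311 dataset shown in the video; the table name below is the one published under bigquery-public-data, but treat it as an assumption and confirm the exact name in the Explorer menu:

SQL

SELECT *
FROM `bigquery-public-data.austin_311.311_service_requests`
LIMIT 10;

The LIMIT clause keeps the result set small while you’re just inspecting the data.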

How to use BigQuery

BigQuery is a fully managed, petabyte-scale analytics data warehouse that enables businesses to analyze all their data very quickly.

To use BigQuery, you will need to create a Google Cloud Platform (GCP) project and enable the BigQuery API. Once you have done that, you can start loading data into BigQuery.

Loading data into BigQuery

There are several ways to load data into BigQuery, including:

  • Uploading files from your local machine: You can upload CSV, JSON, Avro, Parquet, ORC, and Datastore export files to BigQuery.
  • Streaming data from streaming services: You can stream data from streaming services such as Cloud Pub/Sub and Cloud Dataflow into BigQuery.
  • Importing data from other cloud providers: You can import data from other cloud providers such as Amazon S3 and Azure Blob Storage into BigQuery.

Running queries on BigQuery

Once you have loaded data into BigQuery, you can start running queries on it. BigQuery supports standard SQL, so you can use the same SQL syntax that you use with other databases.

To run a query in BigQuery, you can use the BigQuery console, the BigQuery API, or the BigQuery CLI.

Using the BigQuery console

To run a query in the BigQuery console, follow these steps:

  1. Go to the BigQuery console.
  2. Click the “Compose new query” button.
  3. Write your query in the editor.
  4. Click the “Run” button.

Using the BigQuery API

To run a query using the BigQuery API, you can use the following steps:

  1. Install the BigQuery API client library for your programming language.
  2. Create a BigQuery client object.
  3. Create a query object.
  4. Set the query text.
  5. Execute the query.

Using the BigQuery CLI

To run a query using the BigQuery CLI, you can use the following steps:

  1. Install the Google Cloud CLI, which includes the bq command-line tool.
  2. Run a command like the following (here mydataset.mytable is a hypothetical placeholder):
bq query --use_legacy_sql=false 'SELECT * FROM mydataset.mytable LIMIT 10'

Analyzing data in BigQuery

Once you have run a query, you can analyze the data in BigQuery. BigQuery provides a variety of features for analyzing data, including:

  • Visualization: BigQuery integrates with tools such as Looker Studio and Connected Sheets, so you can build charts and graphs from query results.
  • Machine learning: BigQuery ML lets you train and deploy machine learning models on your data using SQL.
  • Geospatial analysis: BigQuery provides geography data types and functions for analyzing geospatial data.

Pricing

BigQuery is a pay-as-you-go service. You are charged for the amount of data that you store in BigQuery and the amount of data that you query.

Conclusion

BigQuery is a powerful data warehouse that can be used to analyze all your data very quickly. It is easy to use and provides a variety of features for analyzing data.

Additional resources

  • BigQuery documentation: https://cloud.google.com/bigquery/docs
  • BigQuery tutorials: https://cloud.google.com/bigquery/docs/tutorials
  • BigQuery pricing: https://cloud.google.com/bigquery/pricing

Hey there. In this video, we’re going to learn about each part of the BigQuery SQL workspace so you can use it during this course and throughout your career as a data analyst. It’s an extremely valuable and widely popular tool, so understanding how it works is super helpful. Feel free to follow along on your screen as we explore BigQuery. You may notice that my screen appears a little different than yours, since BigQuery is constantly updating its interface. Don’t worry if this happens, as minor differences won’t stop you from understanding the basics.

To begin, go to the BigQuery landing page, then log in to the account you created earlier. To navigate to the SQL workspace, select the menu on the left side of the screen and scroll down to the Big Data header. Then hover over the BigQuery label and click “SQL workspace” from the drop-down.

Now that we’re in the SQL workspace, we’re going to search for public datasets, select a dataset through the Data Explorer, run a query, and upload our own data for querying. First, we’ll search for a public dataset to use. To select a public dataset, navigate to the Explorer menu on the left side of the screen. Click the “Add Data” button in the upper right of the menu. Then in the drop-down menu, select “Explore public datasets.” This will open the marketplace and show you available public datasets. Let’s go to the marketplace search bar and search for noaa_lightning, a dataset we’ll use in an upcoming activity. Click on the “Cloud-to-Ground Lightning Strikes” dataset. This will give us a description and preview of the dataset, which captures observations about lightning activity and weather patterns in the United States. Click “View dataset.” This will bring you back to the SQL workspace and create a tab for the dataset.

We can then move back to the Editor tab we have opened, or click “Compose new query” to begin writing with SQL. On the left, notice that the bigquery-public-data drop-down list is in the Explorer menu. We can click the arrow to expand the list and pick out a new dataset. Let’s select the first dataset in the drop-down list, austin_311. When we do, it expands to show the table within the dataset. We can open the dataset for a preview. The Schema tab contains the names of each column in the dataset. The Details tab contains additional metadata, such as the creation date of the dataset. The Preview tab contains the first rows from the dataset. On this page, we can click “Query” to automatically create a new editor window with the template for a query already populated. From here, type an asterisk after SELECT, where the cursor appears, then run the query. Congratulations, you ran a SQL query in BigQuery. The query you ran returned rows from the dataset, which populate in a window beneath the editor interface. Results from any query you run will also display here.

Now let’s say you have the results of a survey that you want to upload to BigQuery and analyze using SQL. To add your own data to BigQuery, choose the ID of the project you want to add to. Select the three vertical dots icon to open options for the project, then choose “Create dataset.” Name the dataset something that will help you identify it later, such as upload_test_dataset. Then click “Create dataset.” Next, go to the Explorer menu and choose the three vertical dots next to the dataset under the Projects drop-down. Now we’ll select the icon for Create table, which opens a pop-up window. Under Source and Create table from, select “Upload” or whichever method you prefer to upload your data. Here, we can upload any data file, such as a CSV file. Let’s give our table a helpful name, such as test_table. Make sure that the schema is set to auto-detect and select “Create table.”

There’s more to come with BigQuery. Feel free to re-watch this video anytime and keep practicing. See you soon.
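As an aside, the dataset and table setup shown in the UI can also be expressed with SQL DDL statements. Here’s a minimal sketch reusing the names from the video; the two columns are a made-up example schema, since the video auto-detects the schema from the uploaded file:

SQL

CREATE SCHEMA IF NOT EXISTS upload_test_dataset;  -- in BigQuery SQL, a dataset is created as a schema

CREATE TABLE upload_test_dataset.test_table (
  respondent_id INT64,
  response STRING
);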

Video: BigQuery in action

This video teaches viewers about SQL queries, which are used to communicate with databases and extract specific data from them. The video begins by reviewing what databases and SQL are, and then explains how to write a query to view an entire data set. Next, the video shows how to filter a data set to only include specific data, such as data from a particular state. Finally, the video discusses the importance of writing organized and readable queries.

Here are the key takeaways from the video:

  • SQL queries are used to communicate with databases and extract specific data from them.
  • To write a query that views an entire data set, start with SELECT, add an asterisk (*) to include all columns, then add FROM and the name of the data set.
  • To filter a data set to only include specific data, use the WHERE clause.
  • It is important to write organized and readable queries, so that they are easy to understand for yourself and others.

Here is an example of a SQL query to view an entire data set:

SQL

SELECT * FROM solar_potential_by_postal_code;

Here is an example of a SQL query to filter a data set to only include data from Pennsylvania:

SQL

SELECT * FROM solar_potential_by_postal_code WHERE state_name = 'Pennsylvania';

BigQuery in action

BigQuery is a powerful data warehouse that can be used to analyze all your data very quickly. It is easy to use and provides a variety of features for analyzing data.

Here are some examples of how BigQuery can be used in action:

  • Analyze customer behavior: BigQuery can be used to analyze customer behavior data, such as purchase history, website visits, and social media activity. This information can be used to improve customer segmentation, targeting, and retention.
  • Identify fraud and abuse: BigQuery can be used to identify fraudulent transactions, suspicious activity, and other types of abuse. This information can be used to protect businesses from financial losses and reputational damage.
  • Optimize business operations: BigQuery can be used to optimize business operations, such as supply chain management, inventory management, and marketing campaigns. This information can be used to improve efficiency, reduce costs, and increase profits.
  • Make better business decisions: BigQuery can be used to make better business decisions by providing insights into key metrics, such as customer satisfaction, product performance, and market trends. This information can be used to develop new products and services, expand into new markets, and allocate resources more effectively.

Example tutorial

Here is a simple example of how to use BigQuery to analyze customer purchase history data:

  1. Create a BigQuery dataset to store your purchase history data. You can do this using the BigQuery console or the BigQuery API.
  2. Load your purchase history data into the dataset. You can do this by uploading a file from your local machine or by streaming data from a streaming service.
  3. Write a SQL query to analyze the data. For example, you could write a query to find the top 10 products by sales or the average order value for each customer segment.
  4. Run the query and view the results. You can view the results in the BigQuery console, export the results to a file, or use the results to create a visualization.

Here is an example of a SQL query to find the top 10 products by sales:

SQL

SELECT product_id, SUM(quantity) AS total_sales
FROM `my_dataset.purchase_history`
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 10;

This query will return a table with two columns: product_id and total_sales. The rows in the table will be sorted in descending order by total_sales, so that the top 10 products by sales are listed at the top of the table.

This is just a simple example of how to use BigQuery. BigQuery can be used to perform much more complex analysis on a variety of different types of data.

Conclusion

BigQuery is a powerful data warehouse that can be used to analyze all your data very quickly and easily. It is a great tool for businesses of all sizes to improve their decision-making and gain insights into their data.

In an existing company database, the customers table contains the following columns: CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalCode, Phone, Fax, Email, and SupportRepId.
Create a query to return all the columns in the customer table for only customers in Germany.

SELECT *
FROM customers
WHERE Country = "Germany"

What is the last name of the customer in the second row of the results returned from the query?

Schneider

Great job. Schneider is the last name of the customer in the second row returned when making the following query:

SELECT * FROM customers WHERE Country = 'Germany'

You’ve learned how sorting and filtering data in spreadsheets helps data analysts customize the information. Customizing data makes it more meaningful and easier to understand, analyze, and visualize. You also discovered that some spreadsheets can be extremely long and complex, so knowing how to zero in on the exact data you need while setting aside the rest helps you focus on your analysis. This is also true for databases. Sometimes a data set is too large to download, or it won’t fit in a spreadsheet, so a data analyst will use SQL to create a query to view the specific data that they want from within the larger set.

We’ve learned that a database is a collection of data stored in a computer system, and that SQL stands for Structured Query Language. Data analysts use query languages to communicate with the database. In an earlier video, you also learned that a relational database contains a series of tables that can be connected to form relationships. These relationships are represented by primary and foreign keys. Data analysts write queries in order to get data from these tables.

Let’s see how this works. We’ll start with our table viewer. Here we can see what public data sets are available. We’ll scroll through the data before we start using it to get a feel for what it’s all about, and to make sure it’s clean. Some table viewers let you preview a few rows before even writing a query. This is helpful if you want to take a quick look to be sure the data set will be right for your project. To show you how this works, let’s check out a sample data set. This one shows how much sunlight hits rooftops in a year. This would be very useful for a data analyst working on a solar energy project, for example. We’ll start by previewing the data set. Click on it, like this. Then we’ll select a subset of this data, where we find regions, states, yearly sunlight, and more.

Now to see the entire data set, let’s write a query. The first step is finding out the complete, correct name of the data set. To do this, select the data set, solar_potential_by_postal_code, and select Query table. The name of the data set is shown inside the two backticks. This is to help us read the query more easily. We can also remove the backticks in this case, and our query would still run. The words you see before the dot represent the database name, and the words after the dot represent the table name. Let’s select and copy the data set name now, because we’ll need it in a second.

Now we’ll click on the plus sign to compose a new query. Most queries begin with the word SELECT. Then we add a space. Because we want to see the entire data set, we’ll put an asterisk next. The asterisk says we want to include all columns. This is a great shortcut because without it, we’d have to type in every single field name. Next we’ll press return and type FROM. FROM does just what it sounds like: it indicates where the data is coming from. After that, we’ll add another space. Now, we paste in the name of the data set that we copied earlier. And finally, run the query. Now you can carefully inspect the data set before we begin working with it.

One important thing to keep in mind: SQL queries can be written in a lot of different ways but still provide the same results. For example, we could have written this query as one long line of instructions, and we’d still get the same results. The additional lines and spaces don’t impact the query’s outcome, but they keep your query organized and easier to read for yourself and others.

Now, if the project doesn’t require all of these fields, we can use SQL to view a particular piece or pieces of data. To do this, we specify a certain column name in the query. For example, maybe we only want to see data from Pennsylvania. So we’ll begin our query the same way we just learned: SELECT, space, add an asterisk. Then FROM our solar potential database. But this time we’ll add WHERE. WHERE also does exactly what it sounds like: it tells the database where to look for information. In this case, that’s the state_name column, so add a space and state_name, the name of the column. Now because we only want to see data from Pennsylvania, we add an equal sign and the word Pennsylvania with single quotes around it. In SQL, single quotes indicate the beginning and ending of a string. Finally, we run the query. Now we can review the data on solar potential for only Pennsylvania.
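Putting the walkthrough together, the finished query looks like this (this matches the full dataset name used in the quiz at the end of this section):

SQL

SELECT *
FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
WHERE state_name = 'Pennsylvania';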
Now we’ve got the data we want, and we’re ready to start putting it to work, which we’ll cover later on. But for now, let’s celebrate finishing another module. You’ve covered a lot of complex and highly technical information. As you keep practicing, though, things will start to feel a lot more natural. For now, take a moment to sit back and think about all you’ve learned. You discovered metadata and how it keeps data organized by describing what that data is all about. You’ve seen how internal and external data are accessed and how data analysts use them to find compelling insights to solve business problems. And you can sort and filter your data to really pinpoint the information you need. Finally, you just learned about queries, and you even practiced writing some.

Coming up, you’ll have a few readings and then a weekly challenge to test your knowledge. This will help you confirm that you’ve understood what we’ve worked on in these videos. And as always, if you’re ever unsure about a question, I highly encourage you to review the videos and readings to find the answer. You’re the data detective now, so use those skills. Keep up the great work, and I’ll see you after the weekly challenge.

Practice Quiz: Hands-On Activity: Introduction to BigQuery

Practice Quiz: Hands-On Activity: Create a custom table in BigQuery

Reading: In-depth guide: SQL best practices

Practice Quiz: Hands-On Activity: Applying SQL

Practice Quiz: Test your knowledge on using SQL with large datasets

In MySQL, what is acceptable syntax for the SELECT keyword? Select all that apply.

A database table is named blueFlowers. What type of case is this?

In BigQuery, what optional syntax can be removed from the following FROM clause without stopping the query from running?
FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`

In the following FROM clause, what is the table name in the SQL query?
FROM bigquery-public-data.sunroof_solar.solar_potential_by_postal_code

Weekly challenge 3


Reading: Glossary: Terms and definitions

Quiz: *Weekly challenge 3*

Relational databases illustrate relationships between tables. Which fields represent the connection between these tables? Select all that apply.

When working with data from an external source, what can metadata help data analysts do? Select all that apply.

An email is sent to a customer support address for a large company. Which one of the following is a piece of metadata about that email?

Fill in the blank: Data security and privacy are important issues that should be addressed using a _____ process.

What are some key benefits of using external data? Select all that apply.

A football team has a list of all games played in their home stadium containing the columns game_date, opponent_name, total_revenue, and game_result. How could they sort this data to determine the game that made the team the most money?

When writing a query in BigQuery and leveraging a dataset, what do the words before the dot (.) in the dataset name represent?

You are working with a database table that contains customer data. The support_rep_id column contains the identification number of the support representative assigned to a customer’s account. You are only interested in customers who are assigned to support representative 4.
You write the SQL query below.
SELECT * FROM Customer
What code would be added to return only customers assigned to the support representative with the ID of 4?