Q: AI assistant

The answer is data lake . Data lakes are a type of database that are designed to store large amounts of raw data in its original format. This makes them ideal for BI professionals who need to store and analyze all of their data, regardless of its format or source. Data lakes are typically used to store a variety of data types, including structured, semi-structured, and unstructured data. This allows BI professionals to get a complete view of their data and to identify trends and patterns that would not be possible if they were only looking at data from a single source. Data lakes are also designed to be scalable, so they can grow as the amount of data increases. This is important for BI professionals who need to be able to access and analyze large amounts of data on a regular basis. Here are some of the benefits of using a data lake for BI: Data storage: Data lakes are designed to store large amounts of data in its original format. This makes them ideal for BI professionals who need to store and analyze all of their data, regardless of its format or source. Data integration: Data lakes can be used to integrate data from a variety of sources, such as transactional databases, CRM systems, and ERP systems. This allows BI professionals to get a holistic view of the data and to identify trends and patterns that would not be possible if they were only looking at data from a single source. Data scalability: Data lakes are designed to be scalable, so they can grow as the amount of data increases. This is important for BI professionals who need to be able to access and analyze large amounts of data on a regular basis. If you are a BI professional, I recommend using a data lake to store your raw data. This will give you the flexibility and scalability you need to be successful in your projects.

Q: AI assistant

The answer is snowflake . A snowflake schema is an extension of a star schema, which contains additional dimensions. In a snowflake schema, the dimensions are normalized, which means that they are divided into smaller tables. This makes the schema more efficient for querying, but it can also make it more complex to manage. Snowflake schemas are often used for large datasets, where the star schema would become too cumbersome. They are also used for datasets where the dimensions are highly interconnected. Here is an example of a snowflake schema: Fact table: - order_id (primary key) - customer_id - product_id - quantity Dimensions: - customer (customer_id, name, address, etc.) - product (product_id, name, description, etc.) - time (order_date, order_time, etc.) Subdimensions: - customer_address (customer_id, street, city, state, zipcode) - product_category (product_id, category_name) - time_hour (order_date, order_time, hour) The customer dimension is normalized into two subdimensions: customer_address and product_category. This allows for more efficient querying, as the data for each dimension is stored in a separate table. Snowflake schemas can be complex to manage, but they can be very efficient for querying large and interconnected datasets.

Q: AI assistant

The answer is combined systems . Combined systems are a type of database that stores relevant data locally for analysis and less relevant data remotely. This allows for faster and more efficient querying of the relevant data, while also reducing the cost of storing the less relevant data. Combined systems are often used for big data analytics, where the datasets can be very large and complex. By storing the relevant data locally, combined systems can provide real-time insights into the data. Here is an example of a combined system: Local storage: The most relevant data is stored on a local server, which is close to the users who need to access it. This data can be queried very quickly, as it is stored on the same machine as the users. Remote storage: Less relevant data is stored on a remote server, which is typically located in a data center. This data can still be queried, but it will take longer, as the data has to be transferred over the network. Combined systems can be complex to manage, but they can provide significant performance benefits for big data analytics. The other options are not as accurate: Separated storage and computing systems: This option does not store any data locally. All of the data is stored remotely, which can lead to slower querying performance. Single-homed databases: This option stores all of the data locally. This can be expensive, as it requires a lot of storage space. Distributed databases: This option stores the data across multiple servers. This can be more efficient than storing all of the data on a single server, but it can also be more complex to manage.

Q: A business intelligence professional stores large amounts of raw data in its original format within a database system. Then, they can access the data whenever they need it for their BI project. What type of database are they using?

Data lake They are using a data lake. A data lake is a database system that stores large amounts of raw data in its original format until it’s needed.

Q: Fill in the blank: A _ schema is an extension of a star schema, which contains additional dimensions.

snowflake A snowflake schema is an extension of a star schema, which contains additional dimensions.

Q: What type of database stores relevant data locally for analysis and less relevant data remotely?

Separated storage and computing systems Separated storage and computing systems are databases that store relevant data locally for analysis and less relevant data remotely.

Question 1

Reading

Accepted Answer

Course-2-overview

Question 2

Reading

Accepted Answer

Design-efficient-database-systems-with-schemas Download

Question 3

Reading

Accepted Answer

In this lesson, you have been learning about the different aspects of databases and how they influence the way a business intelligence system functions. The database framework—including how platforms are organized and how data is stored and processed—affects how data is used. Therefore, understanding different technologies helps you make more informed decisions about the BI tools and processes you create. This reading provides a breakdown of databases including OLAP, OLTP, row-based, columnar, distributed, single-homed, separated storage and compute, and combined.

OLAP versus OLTP

Database technologyDescriptionUseOLAPOnline Analytical Processing (OLAP) systems are databases that have been primarily optimized for analysis.Provide user access to data from a variety of source systemsUsed by BI and other data professionals to support decision-making processesAnalyze data from multiple databasesDraw actionable insights from data delivered to reporting tablesOLTPOnline Transaction Processing (OLTP) systems are databases that have been optimized for data processing instead of analysis.Store transaction dataUsed by customer-facing employees or customer self-service applicationsRead, write, and update single rows of dataAct as source systems that data pipelines can be pulled from for analysis

Row-based versus columnar

Database technologyDescriptionUseRow-basedRow-based databases are organized by rows.Traditional, easy to write database organization typically used in OLTP systemsWrites data very quicklyStores all of a row’s values togetherEasily optimized with indexingColumnarColumnar databases are organized by columns instead of rows.Newer form of database organization, typically used to support OLAP systemsRead data more quickly and only pull the necessary data for analysisStores multiple row’s columns together

Distributed versus single-homed

Database technologyDescriptionUseDistributedDistributed databases are collections of data systems distributed across multiple physical locations.Easily expanded to address increasing or larger scale business needsAccessed from different networksEasier to secure than a single-homed database systemSingle-homedSingle-homed databases are databases where all of the data is stored in the same physical location.Data stored in a single location is easier to access and coordinate cross-teamCuts down on data redundancyCheaper to maintain than larger, more complex systems

Separated storage and compute versus combined

Database technologyDescriptionUseSeparated storage and computeSeparated storage and computing systems are databases where less relevant data is stored remotely, and relevant data is stored locally for analysis.Run analytical queries more efficiently because the system only needs to process the most relevant dataScale computation resources and storage systems separately based on your organization’s custom needsCombined storage and computeCombined systems are database systems that store and analyze data in the same place.Traditional setup that allows users to access all possible data at onceStorage and computation resources are linked, so resource management is straightforward

Question 4

AI assistant

Accepted Answer

The answer is data lake.

Data lakes are a type of database that are designed to store large amounts of raw data in its original format. This makes them ideal for BI professionals who need to store and analyze all of their data, regardless of its format or source.

Data lakes are typically used to store a variety of data types, including structured, semi-structured, and unstructured data. This allows BI professionals to get a complete view of their data and to identify trends and patterns that would not be possible if they were only looking at data from a single source.

Data lakes are also designed to be scalable, so they can grow as the amount of data increases. This is important for BI professionals who need to be able to access and analyze large amounts of data on a regular basis.

Here are some of the benefits of using a data lake for BI:

Data storage: Data lakes are designed to store large amounts of data in its original format. This makes them ideal for BI professionals who need to store and analyze all of their data, regardless of its format or source.
Data integration: Data lakes can be used to integrate data from a variety of sources, such as transactional databases, CRM systems, and ERP systems. This allows BI professionals to get a holistic view of the data and to identify trends and patterns that would not be possible if they were only looking at data from a single source.
Data scalability: Data lakes are designed to be scalable, so they can grow as the amount of data increases. This is important for BI professionals who need to be able to access and analyze large amounts of data on a regular basis.

If you are a BI professional, I recommend using a data lake to store your raw data. This will give you the flexibility and scalability you need to be successful in your projects.

Question 5

AI assistant

Accepted Answer

The following statements correctly describe primary and foreign keys:

A primary key references a column or a group of columns in which each row uniquely identifies each record in the table.
A foreign key forms connections because it is a field within a database table that is a primary key in another table.

Therefore, the correct answer is both of the above statements.

A primary key is a unique identifier for each record in a table. It cannot be null and cannot contain duplicate values. A foreign key is a column in one table that references the primary key of another table. It is used to establish a relationship between the two tables.

For example, let's say we have two tables: a customers table and an orders table. The customers table has a primary key called customer_id. The orders table has a foreign key called customer_id that references the customer_id primary key in the customers table.

This relationship allows us to see which customer placed each order. For example, if we have an order with a customer_id of 123, we can look up the customer_id of 123 in the customers table to see the name of the customer who placed the order.

Primary and foreign keys are essential for maintaining data integrity in relational databases. They help to ensure that each record in a table is unique and that the data in the two tables is consistent.

Question 6

AI assistant

Accepted Answer

The answer is snowflake.

A snowflake schema is an extension of a star schema, which contains additional dimensions. In a snowflake schema, the dimensions are normalized, which means that they are divided into smaller tables. This makes the schema more efficient for querying, but it can also make it more complex to manage.

Snowflake schemas are often used for large datasets, where the star schema would become too cumbersome. They are also used for datasets where the dimensions are highly interconnected.

Here is an example of a snowflake schema:

Fact table: - order_id (primary key) - customer_id - product_id - quantity Dimensions: - customer (customer_id, name, address, etc.) - product (product_id, name, description, etc.) - time (order_date, order_time, etc.) Subdimensions: - customer_address (customer_id, street, city, state, zipcode) - product_category (product_id, category_name) - time_hour (order_date, order_time, hour)

The customer dimension is normalized into two subdimensions: customer_address and product_category. This allows for more efficient querying, as the data for each dimension is stored in a separate table.

Snowflake schemas can be complex to manage, but they can be very efficient for querying large and interconnected datasets.

Question 7

AI assistant

Accepted Answer

The answer is combined systems.

Combined systems are a type of database that stores relevant data locally for analysis and less relevant data remotely. This allows for faster and more efficient querying of the relevant data, while also reducing the cost of storing the less relevant data.

Combined systems are often used for big data analytics, where the datasets can be very large and complex. By storing the relevant data locally, combined systems can provide real-time insights into the data.

Here is an example of a combined system:

Local storage: The most relevant data is stored on a local server, which is close to the users who need to access it. This data can be queried very quickly, as it is stored on the same machine as the users.
Remote storage: Less relevant data is stored on a remote server, which is typically located in a data center. This data can still be queried, but it will take longer, as the data has to be transferred over the network.

Combined systems can be complex to manage, but they can provide significant performance benefits for big data analytics.

The other options are not as accurate:

Separated storage and computing systems: This option does not store any data locally. All of the data is stored remotely, which can lead to slower querying performance.
Single-homed databases: This option stores all of the data locally. This can be expensive, as it requires a lot of storage space.
Distributed databases: This option stores the data across multiple servers. This can be more efficient than storing all of the data on a single server, but it can also be more complex to manage.

Question 8

A business intelligence professional stores large amounts of raw data in its original format within a database system. Then, they can access the data whenever they need it for their BI project. What type of database are they using?

Accepted Answer

Data lake

They are using a data lake. A data lake is a database system that stores large amounts of raw data in its original format until it’s needed.

Question 9

Which of the following statements correctly describe primary and foreign keys? Select all that apply.

Accepted Answer

A primary key references a column or a group of columns in which each row uniquely identifies each record in the table.
A foreign key forms connections because it is a field within a database table that is a primary key in another table.

A primary key references a column or a group of columns in which each row uniquely identifies each record in the table. A foreign key forms connections because it is a field within a database table that is a primary key in another table.

Question 10

Fill in the blank: A _ schema is an extension of a star schema, which contains additional dimensions.

Accepted Answer

snowflake

A snowflake schema is an extension of a star schema, which contains additional dimensions.

Question 11

What type of database stores relevant data locally for analysis and less relevant data remotely?

Accepted Answer

Separated storage and computing systems

Separated storage and computing systems are databases that store relevant data locally for analysis and less relevant data remotely.

Question 12

Reading

Accepted Answer

Whether you are creating a new database model or exploring a system in place already, it is important to ensure that all elements exist in the schema. The database schema enables you to validate incoming data being delivered to your destination database to prevent errors and ensure the data is immediately useful to users.

Here is a checklist of common elements a database schema should include:

The relevant data: The schema describes how the data is modeled and shaped within the database and must encompass all of the data being described.
Names and data types for each column: Include names and data types for each column in each table within the database.
Consistent formatting: Ensure consistent formatting across all data entries. Every entry is an instance of the schema, so it needs to be consistent.
Unique keys: The schema must use unique keys for each entry within the database. These keys build connections between the tables and enable users to combine relevant data from across the entire database.

Key takeaways

As you receive more data or business needs change, databases and schemas may also need to change. Database optimization is an iterative process, which means you may need to check the schema multiple times throughout the database’s useful life. Use this checklist to help you ensure that your database schema remains functional.

Question 13

Reading

Accepted Answer

So far, you’ve learned about the differences between various types of database schemas, the factors that influence the choice of database schemas, and how to design a database schema for a data warehouse using best practices.

In this reading, you’ll review a database schema created for a fictional scenario and explore the reasoning behind its design. In your role as a BI professional, you’ll need to understand why a database was built in a certain way.

Database schema

Francisco’s Electronics is launching an e-commerce store for its new home office product line. If it’s a success, company decision-makers plan to bring the rest of their products online as well. The company brought on Mia, a senior BI engineer, to help design its data warehouse. The database needed to store order data for analytics and reporting, and the sales manager needed to generate reports quickly to track the sales so that the success of the site can be determined.

Below is a diagram of the schema of the sales_warehouse database Mia designed. It contains different symbols and connectors that represent two important pieces of information: the major tables within the system and the relationships among these tables.

The sales_warehouse database schema contains five tables: Sales, Products, Users, Locations, and Orders, which are connected via keys. The tables contain five to eight columns (or attributes) that range in data type. The data types include varchar or char (or character), integer, decimal, date, text (or string), timestamp, bit, and other types depending on the database system chosen.

Review the database schema

To understand a database schema, it’s helpful to understand the purpose of using certain data types and the relationships between fields. The answers to the following questions justify why Mia designed Francisco’s Electronics’ schema this way:

What kind of database schema is this? Why was this type of database selected?

Mia designed the database with a star schema because Francisco’s Electronics is using this database for reporting and analytics. The benefits of star schema include simpler queries, simplified business reporting logic, query performance gains, and fast aggregations.

What naming conventions are used for the tables and fields? Are there any benefits of using these naming conventions?

This schema uses a snake case naming convention. In snake case, underscores replace spaces and the first letter of each word is lowercase. Using a naming convention helps maintain consistency and improves database readability. Since snake case for tables and fields is an industry standard, Mia used it in the database.

What is the purpose of using the decimal fields in data elements?

For fields related to money, there are potential errors when calculating prices, taxes, and fees. You might have values that are technically impossible, such as a value of $0.001, when the smallest value for the United States dollar is one cent, or $0.01. To keep values consistent and avoid accumulated errors, Mia used a decimal(10,2) data type, which only keeps the last two digits after the decimal point.

Note: Other numeric values, such as exchange rate and quantities, may need extra decimal places to minimize rounding differences in calculations. Also, other data types may be better suited for other fields. To track when an order is created (created_at), you can use a timestamp data type. For other fields with various text sizes, you can use varchar.

What is the purpose of each foreign and primary key in the database?

Mia designed the Sales table with a primary key ID and included foreign keys in the other tables to reference the primary keys. The foreign keys must be the same data type as their corresponding primary keys. As you’ve learned, primary keys uniquely identify precisely one record on a table, and foreign keys establish integrity references from that primary key to records in other tables.

Sales table key id & foreign keysAssociated tableorder_id Orders tableproduct_id Products tableuser_idUsers tableshipping_address_idLocations tablebilling_address_idLocations table

Key takeaways

In this reading, you explored why a database schema was designed in a certain way. In the world of business intelligence, you’ll spend a lot of time modeling business operations with data, exploring data, and designing databases. You can apply your knowledge of this database schema’s design to build your own databases in the future. This will enable you to use and store data more efficiently in your career as a BI professional.

Question 14

When designing a data warehouse, BI professionals take into account which of the following considerations? Select all that apply.

Accepted Answer

The business needs
The model that the data warehouse will follow
The shape and volume of the data

When designing a data warehouse, BI professionals take into account the business needs, the shape and volume of data, and what model the data warehouse will follow.

Question 15

Fill in the blank: Logical data modeling involves representing different _ in a physical data model.

Accepted Answer

Tables

Logical data modeling involves representing different tables in a physical data model.

Question 16

A BI professional considers the relevant data for a project, the names and data types of table columns, formatting of data entries, and unique keys for database entries and objects. What will these activities enable them to accomplish?

Accepted Answer

Select appropriate elements for their database schema

These activities will enable the BI professional to select appropriate elements for their database schema.

Question 17

Reading

Accepted Answer

As you advance in your business intelligence career, you will encounter many different tools. One of the great things about the skills you have been learning in these courses is that they’re transferable between different solutions. No matter which tools you end up using, the overall logic and processes will be similar! This reading provides an overview of many of these business intelligence solutions.

ToolUsesAzure Analysis Service (AAS)Connect to a variety of data sourcesBuild in data security protocolsGrant access and assign roles cross-teamAutomate basic processesCloudSQLConnect to existing MySQL, PostgreSQL or SQL Server databasesAutomate basic processesIntegrate with existing apps and Google Cloud services, including BigQueryObserve database processes and make changesLooker StudioVisualize data with customizable charts and tablesConnect to a variety of data sourcesShare insights internally with stakeholders and onlineCollaborate cross-team to generate reportsUse report templates to speed up your reportingMicrosoft PowerBIConnect to multiple data sources and develop detailed modelsCreate personalized reportsUse AI to get fast answers using conversational languagesCollaborate cross-team to generate and share insights on Microsoft applicationsPentahoDevelop pipelines with a codeless interfaceConnect to live data sources for updated reportsEstablish connections to an expanded libraryAccess an integrated data science toolkitSSAS SQL ServerAccess and analyze data across multiple online databasesIntegrate with existing Microsoft services including BI and data warehousing tools and SSRS SQL ServerUse built-in reporting toolsTableauConnect and visualize data quicklyAnalyze data without technical programming languagesConnect to a variety of data sources including spreadsheets, databases, and cloud sourcesCombine multiple views of the data in intuitive dashboardsBuild in live connections with updating data sources

Question 18

Reading

Accepted Answer

In a previous reading, you were given a list of common business intelligence tools and some of their uses. Many of them have built-in pipeline functionality, but there are a few ETL-specific tools you may encounter. Creating pipeline systems—including ETL pipelines that move and transform data between different data sources to the target database—is a large part of a BI professional's job, so having an idea of what tools are out there can be really useful. This reading provides an overview.

ToolUsesApache NifiConnect a variety of data sources
Access a web-based user interface
Configure and change pipeline systems as needed
Modify data movement through the system at any timeGoogle DataFlowSynchronize or replicate data across a variety of data sources
Identify pipeline issues with smart diagnostic features
Use SQL to develop pipelines from the BigQuery UI
Schedule resources to reduce batch processing costs
Use pipeline templates to kickstart the pipeline creation process and share systems across your organizationIBM InfoSphere Information ServerIntegrate data across multiple systems
Govern and explore available data
Improve business alignment and processes
Analyze and monitor data from multiple data sourcesMicrosoft SQL SISConnect data from a variety of sources integration
Use built-in transformation tools
Access graphical tools to create solutions without coding
Generate custom packages to address specific business needsOracle Data IntegratorConnect data from a variety of sources
Track changes and monitor system performance with built-in features
Access system monitoring and drill-down capabilities
Reduce monitoring costs with access to built-in Oracle servicesPentaho Data IntegratorConnect data from a variety of sources
Create codeless pipelines with drag-and-drop interface
Access dataflow templates for easy use
Analyze data with integrated toolsTalendConnect data from a variety of sources
Design, implement, and reuse pipeline from a cloud serverAccess and search for data using integrated Talend services
Clean and prepare data with built-in tools

Question 19

What is the term for the predetermined locations where pipeline data is sent in order to be acted on?

Accepted Answer

Target tables

Target tables are the predetermined locations where pipeline data is sent in order to be acted on.

Question 20

A BI professional uses a pipeline to access source systems, then reads and collects the necessary data from within them. Which ETL stage does this scenario describe?

Accepted Answer

Extract

This describes the extract stage. During extraction, the pipeline accesses source systems, then reads and collects the necessary data from within them.

Question 21

Many BI tools are built upon similar principles and often have similar utilities. Therefore, a BI professional’s general understanding of one tool can be applied to others. What is this an example of?

A transferable skill

Applying knowledge of one tool to another is an example of a transferable skill.

Accepted Answer

A transferable skill

Applying knowledge of one tool to another is an example of a transferable skill.

Question 22

Reading

Accepted Answer

Activity_-Create-a-Google-Cloud-account Download

Question 23

Reading

Accepted Answer

Guide-to-Dataflow Download

Question 24

Reading

Accepted Answer

Optional-Activity_-Create-a-streaming-pipeline-in-Dataflow Download

Question 25

Reading

Accepted Answer

In this course, you will primarily be using BigQuery and SQL when interacting with databases in Google DataFlow. However, DataFlow does have the option for you to work with Python, which is a widely used general-purpose programming language. Python can be a great tool for business intelligence professionals, so this reading provides resources and information for adding Python to your toolbox!

Elements of Python

There are a few key elements about Python that are important to understand:

Python is open source and freely available to the public.
It is an interpreted programming language, which means it uses another program to read and execute coded instructions.
Data is stored in data frames, similar to R.
In BI, Python can be used to connect to a database system to work with files.
It is primarily object-oriented.
Formulas, functions, and multiple libraries are readily available.
A community of developers exists for online code support.
Python uses simple syntax for straightforward coding.
It integrates with cloud platforms including Google Cloud, Amazon Web Services, and Azure.

Resources

If you’re interested in learning Python, there are many resources available to help. Here are just a few:

General tips for learning programming languages

As you have been discovering, there are often transferable skills you can apply to a lot of different tools—and that includes programming languages! Here are a few tips:

Define a practice project and use the language to help you complete it. This makes the learning process more practical and engaging.
Keep in mind previous concepts and coding principles. After you have learned one language, learning another tends to be much easier.
Take good notes or make cheat sheets in whatever format (handwritten or typed) that works best for you.
Create an online filing system for information that you can easily access while you work in various programming environments.

Question 26

Reading

Accepted Answer

Previously, you started exploring Google Dataflow, a Google Cloud Platform (GCP) tool that reads data from the source, transforms it, and writes it in the destination location. In this lesson, you will begin working with another GCP data-processing tool: BigQuery. As you may recall from the Google Data Analytics Certificate, BigQuery is a data warehouse used to query and filter large datasets, aggregate results, and perform complex operations.

As a business intelligence (BI) professional, you will need to gather and organize data from stakeholders across multiple teams. BigQuery allows you to merge data from multiple sources into a target table. The target table can then be turned into a dashboard, which makes the data easier for stakeholders to understand and analyze. In this reading, you will review a scenario in which a BI professional uses BigQuery to merge data from multiple stakeholders in order to answer important business questions.

The problem

Consider a scenario in which a BI professional, Aviva, is working for a fictitious coffee shop chain. Each year, the cafes offer a variety of seasonal menu items. Company leaders are interested in identifying the most popular and profitable items on their seasonal menus so that they can make more confident decisions about pricing; strategic promotion; and retaining, expanding, or discontinuing menu items.

The solution

Data extraction

In order to obtain the information the stakeholders are interested in, Aviva begins extracting the data. The data extraction process includes locating and identifying relevant data, then preparing it to be transformed and loaded. To identify the necessary data, Aviva implements the following strategies:

Meet with key stakeholders

Aviva leads a workshop with stakeholders to identify their objectives. During this workshop, she asks stakeholders questions to learn about their needs:

What information needs to be obtained from the data (for instance, performance of different menu items at different restaurant locations)?
What specific metrics should be measured (sales metrics, marketing metrics, product performance metrics)?
What sources of data should be used (sales numbers, customer feedback, point of sales)?
Who needs access to this data (management, market analysts)?
How will key stakeholders use this data (for example, to determine which items to include on upcoming menus, make pricing decisions)?

Observe teams in action

Aviva also spends time observing the stakeholders at work and asking them questions about what they’re doing and why. This helps her connect the goals of the project with the organization’s larger initiatives. During these observations, she asks questions about why certain information and activities are important for the organization.

Organize data in BigQuery

Once Aviva has completed the data extraction process, she transforms the data she’s gathered from different stakeholders and loads it into BigQuery. Then she uses BigQuery to design a target table to organize the data. The target table helps Aviva unify the data. She then uses the target table to develop a final dashboard for stakeholders to review.

The results

When stakeholders review the dashboard, they are able to identify several key findings about the popularity and profitability of items on their seasonal menus. For example, the data indicates that many peppermint-based products on their menus have decreased in popularity over the past few years, while cinnamon-based products have increased in popularity. This finding leads stakeholders to decide to retire three of their peppermint-based drinks and bakery items. They also decide to add a selection of new cinnamon-based offerings and launch a campaign to promote these items.

Key findings

Organizing data from multiple sources in a tool like BigQuery allows BI professionals to find answers to business questions. Consolidating the data in a target table also makes it easier to develop a dashboard for stakeholders to review. When stakeholders can access and understand the data, they can make more informed decisions about how to improve services or products and take advantage of new opportunities.

Question 27

Reading

Accepted Answer

Activity_-Set-up-a-sandbox-and-query-a-public-dataset-in-BigQuery-C_-Download

Question 28

Reading

Accepted Answer

As you have been learning, target tables are predetermined locations where pipeline data is sent in order to be acted on in a database system. Essentially, a source table is where data comes from, and a target table is where it’s going. This reading provides more information about the data-extraction process and how target tables fit into the greater logic of business intelligence processes.

Data extraction

Data extraction is the process of taking data from a source system, such as a database or a SaaS, so that it can be delivered to a destination system for analysis. You might recognize this as the first step in an ETL (extract, transform, and load) pipeline. There are three primary ways that pipelines can extract data from a source in order to deliver it to a target table:

Update notification: The source system issues a notification when a record has been updated, which triggers the extraction.
Incremental extraction: The BI system checks for any data that has changed at the source and ingests these updates.
Full extraction: The BI system extracts a whole table into the target database system.

Once data is extracted, it must be loaded into target tables for use. In order to drive intelligent business decisions, users need access to data that is current, clean, and usable. This is why it is important for BI professionals to design target tables that can hold all of the information required to answer business questions.

The importance of target tables

As a BI professional, you will want to take advantage of target tables as a way to unify your data and make it accessible to users. In order to draw insights from a variety of different sources, having a place that contains all of the data from those sources is essential.

Question 29

Reading

Accepted Answer

Activity_-Create-a-target-table-in-BigQuery-1

bq-results-20231011-194538-1697053556642-bq-results-20231011-194538-1697053556642 Download

Question 30

Reading

Accepted Answer

In this activity, you used BigQuery to create a target table to store data you pulled from a dataset of street tree information from San Francisco, California. In your BI role, you’ll need to use programs such as BigQuery and Dataflow to move and analyze data with SQL. Now, you’ve practiced a key part of the Extraction stage of the BI pipeline: pulling data from a source and placing it into its own table.

The exemplar you’re about to review will help you evaluate whether you completed the activity correctly. Because this activity involves copying, pasting, and executing a complete SQL query, you will just need to check that your result matches this exemplar.

If you find that the result you received is different from the exemplar provided, double check the formatting of the query you copied. Review the explanation of the SQL query in this activity to learn more about how the SQL query works and how to write your own in your projects.

Access the exemplar

To explore the query result exemplar, download the following attachment:

In this activity, you ran the following SQL query to create a target table:

SELECT address, COUNT(address) AS number_of_trees FROM `bigquery-public-data.san_francisco_trees.street_trees` WHERE address != "null" GROUP BY address ORDER BY number_of_trees DESC LIMIT 10;

The SELECT clause selects the address of each tree. By using the COUNT function, you count the number of trees at each address and return a single row of data per address, instead of per tree. This data is saved as a new column.
The FROM clause is straightforward as it specifies the street_trees table within the San Francisco Street Trees dataset.
The WHERE clause is necessary to ensure that your target table only includes rows that have a value in the address column.
The GROUP BY clause specifies that you’re grouping data by the address, and the ORDER BY clause sorts the data in descending order by number_of_trees column.
The LIMIT clause limits the query to return only the top ten rows of data. When working with large datasets, including a limit will decrease the processing time required to return the data.

If you need a refresher on SQL code, review some resources from the Google Data Analytics Certificate: Review Google Data Analytics Certificate content about SQL and Review Google Data Analytics Certificate content about SQL best practices.

The result of this query is a target table with two columns. It features the address column, as well as the total number of trees planted at the address you calculated in the SELECT clause. If properly executed, the first value in the address column is 100x Cargo Way. Next to it, the number_of_trees is 135. If you didn’t receive this result, please review the code and run it again.

Furthermore, the target table shows the 10 addresses with the most trees planted by the Department of Public Works in the city of San Francisco:

100x Cargo Way
700 Junipero Serra Blvd
1000 San Jose Ave
1200 Sunset Blvd
1600 Sunset Blvd
2301 Sunset Blvd
1501 Sunset Blvd
2401 Sunset Blvd
100 STAIRWAY5
2601 Sunset blvd

And the number of trees for each address is as follows:

100x Cargo Way: 135
700 Junipero Serra Blvd: 125
1000 San Jose Ave: 113
1200 Sunset Blvd: 110
1600 Sunset Blvd: 102
2301 Sunset Blvd: 94
1501 Sunset Blvd: 93
2401 Sunset Blvd: 92
100 STAIRWAY5: 87
2601 Sunset Blvd: 84

Key takeaways

Target tables are the destination for data during the Extraction stage of a pipeline. You’ll use them in your role as a BI professional to store data after pulling it from their sources. Once they’re in a target table, you can transform them with BigQuery or Dataflow and load them into reporting tables. You’ll learn about the Transform and Load stages of data pipelines later in this course.

Question 31

Reading

Accepted Answer

Case-study_-Wayfair-Working-with-stakeholders-to-create-a-pipeline

Question 32

Reading

Accepted Answer

Attribute: In a dimensional model, a characteristic or quality used to describe a dimension

Columnar database: A database organized by columns instead of rows

Combined systems: Database systems that store and analyze data in the same place

Compiled programming language: A programming language that compiles coded instructions that are executed directly by the target machine

Data lake: A database system that stores large amounts of raw data in its original format until it’s needed

Data mart: A subject-oriented database that can be a subset of a larger data warehouse

Data warehouse: A specific type of database that consolidates data from multiple source systems for data consistency, accuracy, and efficient access

Database migration: Moving data from one source platform to another target database

Dimension (data modeling): A piece of information that provides more detail and context regarding a fact

Dimension table: The table where the attributes of the dimensions of a fact are stored

Design pattern: A solution that uses relevant measures and facts to create a model in support of business needs

Dimensional model: A type of relational model that has been optimized to quickly retrieve data from a data warehouse

Distributed database: A collection of data systems distributed across multiple physical locations

Fact: In a dimensional model, a measurement or metric

Fact table: A table that contains measurements or metrics related to a particular event

Foreign key: A field within a database table that is a primary key in another table (Refer to primary key)

Functional programming language: A programming language modeled around functions

Google DataFlow: A serverless data-processing service that reads data from the source, transforms it, and writes it in the destination location

Interpreted programming language: A programming language that uses an interpreter, typically another program, to read and execute coded instructions

Logical data modeling: Representing different tables in the physical data model

Object-oriented programming language: A programming language modeled around data objects

OLAP (Online Analytical Processing) system: A tool that has been optimized for analysis in addition to processing and can analyze data from multiple databases

OLTP (Online Transaction Processing) database: A type of database that has been optimized for data processing instead of analysis

Primary key: An identifier in a database that references a column or a group of columns in which each row uniquely identifies each record in the table (Refer to foreign key)

Python: A general purpose programming language

Response time: The time it takes for a database to complete a user request

Row-based database: A database that is organized by rows

Separated storage and computing systems: Databases where data is stored remotely, and relevant data is stored locally for analysis

Single-homed database: Database where all of the data is stored in the same physical location

Snowflake schema: An extension of a star schema with additional dimensions and, often, subdimensions

Star schema: A schema consisting of one fact table that references any number of dimension tables

Target table: The predetermined location where pipeline data is sent in order to be acted on

Question 33

AI assistant

Accepted Answer

The answer is design pattern.

Business intelligence professionals often apply a design pattern to create an effective data model. A design pattern is a reusable solution to a common problem in software design. In the context of data modeling, design patterns provide a way to create data models that are scalable, extensible, and maintainable.

There are a number of different data modeling design patterns, but one of the most common is the star schema. A star schema is a data model that is optimized for analytical processing. It consists of a central fact table and a number of dimension tables. The fact table contains the measures that will be analyzed, and the dimension tables contain the attributes that will be used to filter and group the data.

Another common data modeling design pattern is the snowflake schema. A snowflake schema is a variation of the star schema that is used to model more complex data relationships. In a snowflake schema, the dimension tables are normalized to reduce data redundancy.

Design patterns can be used to create data models for a variety of different purposes, including:

Data warehousing
Business intelligence
Machine learning
Data science

By applying a design pattern, business intelligence professionals can create data models that are efficient, scalable, and easy to maintain.

Question 34

AI assistant

Accepted Answer

The following statements accurately describe primary keys:

Only one primary key can exist in a table.
A primary key uniquely identifies each record in a table.
Primary keys can appear as foreign keys in other tables.

Primary keys are essential for maintaining data integrity in a database. By ensuring that each record has a unique identifier, primary keys help to prevent duplicate data and ensure that data relationships are correctly enforced.

Primary keys can be either a single column or a composite key, which is made up of multiple columns. Composite keys are often used to uniquely identify records in tables that have a one-to-many relationship with other tables.

For example, an Orders table might have a composite primary key that consists of the OrderID and CustomerID columns. This would ensure that each order is uniquely identified, even if two different customers place orders for the same product.

Primary keys can also be used to create foreign keys in other tables. A foreign key is a column in one table that references the primary key of another table. Foreign keys are used to establish relationships between tables and to ensure that data is consistent across the database.

For example, an OrderItems table might have a foreign key that references the OrderID primary key of the Orders table. This would ensure that each order item is associated with a valid order.

Primary keys are an important part of database design and help to ensure the integrity and consistency of data.

Question 35

AI assistant

Accepted Answer

Dimensions in a dimensional model represent the supporting details of the events or facts being measured. They can be anything that provides context for the facts, such as:

Date: The date and time of the event
Product: The product or service that was purchased
Customer: The customer who made the purchase
Location: The location where the purchase was made
Channel: The channel through which the purchase was made (e.g., online, in-store, etc.)

Dimensions are used to filter, group, and analyze the facts in a dimensional model. For example, you could use the Date dimension to analyze sales by day, month, or year. Or, you could use the Product dimension to analyze sales by product category or individual product.

Dimensions can also be used to create hierarchies. For example, the Date dimension could have a hierarchy that includes year, month, and day. This would allow you to drill down from a high-level overview of sales to a more detailed analysis of sales by day.

Dimensions are an essential part of dimensional modeling. They provide the context that is needed to understand and analyze the facts in a data warehouse.

Here are some examples of dimensions in a dimensional model:

E-commerce: Product, customer, location, order date, channel
Financial services: Account, customer, transaction type, transaction date, location
Telecommunications: Customer, phone number, call type, call date, location

Dimensions can be combined to create more complex analyses. For example, you could combine the Product and Customer dimensions to analyze sales by product category and customer segment. Or, you could combine the Date and Location dimensions to analyze sales trends over time and across different regions.

Dimensions are a powerful tool for data analysis. By understanding the different types of dimensions and how to use them, you can gain valuable insights into your business data.

Question 36

AI assistant

Accepted Answer

The answer is dimension table.

In a dimensional model, a foreign key is used to connect a dimension table to the appropriate fact table.

A fact table contains the quantitative data that you want to measure, such as sales, revenue, or units sold. Dimension tables contain the qualitative data that provides context for the facts, such as product, customer, date, and location.

Foreign keys are used to link the dimension tables to the fact table. This allows you to filter, group, and analyze the facts by the different dimensions.

For example, the following diagram shows a simple dimensional model for sales data:

Fact table: Sales Dimension tables: Product Customer Date Location Foreign keys: Sales.ProductKey REFERENCES Product.ProductID Sales.CustomerKey REFERENCES Customer.CustomerID Sales.DateKey REFERENCES Date.DateID Sales.LocationKey REFERENCES Location.LocationID

In this model, the Sales fact table contains the sales data, such as the product sold, the customer who purchased it, the date of the sale, and the location of the sale. The dimension tables contain the supporting details for these facts, such as the product name, customer name, date, and location name.

The foreign keys are used to link the dimension tables to the fact table. For example, the Sales.ProductKey column references the Product.ProductID column. This means that each row in the Sales table has a corresponding row in the Product table.

This allows you to filter, group, and analyze the sales data by product. For example, you could use the ProductKey foreign key to filter the Sales table to only include sales of a specific product. Or, you could use the ProductKey foreign key to group the Sales table by product category and calculate the total sales for each product category.

Foreign keys are an essential part of dimensional modeling. They allow you to link the dimension tables to the fact table and create a comprehensive data model that can be used to analyze your business data from multiple perspectives.

Question 37

AI assistant

Accepted Answer

There is one fact table in a star schema. The fact table is at the center of the star schema and is surrounded by dimension tables. The dimension tables contain the descriptive attributes of the data, such as product, customer, and date. The fact table contains the quantitative measures, such as sales, revenue, and units sold.

The star schema is a simple yet powerful data modeling technique that is widely used in data warehouses and business intelligence systems. It is easy to understand and implement, and it provides a good foundation for data analysis.

Here is an example of a star schema for sales data:

Fact table: Sales Dimension tables: Product Customer Date Location Foreign keys: Sales.ProductKey REFERENCES Product.ProductID Sales.CustomerKey REFERENCES Customer.CustomerID Sales.DateKey REFERENCES Date.DateID Sales.LocationKey REFERENCES Location.LocationID

In this example, the Sales fact table contains the sales data, such as the product sold, the customer who purchased it, the date of the sale, and the location of the sale. The dimension tables contain the supporting details for these facts, such as the product name, customer name, date, and location name.

The foreign keys are used to link the dimension tables to the fact table. This allows you to filter, group, and analyze the sales data by the different dimensions. For example, you could use the ProductKey foreign key to filter the Sales table to only include sales of a specific product. Or, you could use the ProductKey foreign key to group the Sales table by product category and calculate the total sales for each product category.

The star schema is a valuable tool for data analysis. It allows you to create a comprehensive data model that can be used to analyze your business data from multiple perspectives.

Question 38

AI assistant

Accepted Answer

The process that the business intelligence team is performing is called database migration. Database migration is the process of moving data from one source database to another target database. This can be done for a variety of reasons, such as to improve performance, security, or scalability.

Database migration can be a complex process, and it is important to carefully plan and execute the migration to avoid any data loss or corruption. There are a number of different database migration tools and services available, and the best approach to use will depend on the specific needs of the organization.

Here are some of the benefits of database migration:

Improved performance: Newer database technologies may offer better performance than older technologies. Migrating to a newer database platform can improve the speed and responsiveness of your applications.
Increased security: Newer database technologies may also offer better security features than older technologies. Migrating to a newer database platform can help to protect your data from unauthorized access and attack.
Improved scalability: Newer database technologies may be able to scale more easily to meet the needs of growing businesses. Migrating to a newer database platform can help to ensure that your database can handle increased traffic and data volumes.
Reduced costs: Migrating to a cloud-based database platform can help to reduce the costs of maintaining and operating your database infrastructure.

If you are considering migrating your database, it is important to work with a qualified database administrator to ensure that the migration is successful.

Question 39

AI assistant

Accepted Answer

Details about an instance in a row-based database are recorded and organized by column. This means that all of the data for a specific column is stored together, regardless of the row in which it appears.

For example, consider a database table with the following columns:

ColumnTypeDescriptionProductIDIntegerThe unique identifier of the productProductNameStringThe name of the productPriceDecimalThe price of the product

In a row-based database, all of the ProductIDs would be stored together, followed by all of the ProductNames, followed by all of the Prices. This makes it easy to retrieve all of the data for a specific column, such as all of the ProductNames or all of the Prices.

Row-based databases are the most common type of database, and they are used for a wide variety of applications. They are efficient for storing and retrieving large amounts of data, and they are easy to use and manage.

Here is an example of how data is organized in a row-based database:

ProductID | ProductName | Price ------- | ----------- | -------- 1 | iPhone 13 | 999 2 | MacBook Air | 1299 3 | iPad Pro | 799

Each row in the table represents an instance of a product. The data for each column is stored together, regardless of the row in which it appears. For example, all of the ProductIDs are stored together, followed by all of the ProductNames, followed by all of the Prices.

This makes it easy to retrieve all of the data for a specific column, such as all of the ProductNames or all of the Prices. For example, to retrieve all of the ProductNames, you would simply query the ProductName column.

Row-based databases are an efficient and easy-to-use way to store and retrieve large amounts of data. They are the most common type of database, and they are used for a wide variety of applications.

Question 40

AI assistant

Accepted Answer

The type of database system that the business intelligence team is working with is a distributed database system. A distributed database system is a type of database system that stores data across multiple physical servers. This can be done to improve performance, scalability, and reliability.

In a distributed database system, the data is partitioned across the different servers. The partitioning can be done based on a variety of factors, such as the type of data, the frequency of access, or the geographic location of the users.

Distributed database systems are often used by large organizations that have large volumes of data. They can also be used by organizations that need to provide access to data to users in different geographic locations.

Here are some of the benefits of using a distributed database system:

Improved performance: Distributed database systems can improve performance by distributing the workload across multiple servers. This can be especially beneficial for applications that need to process large amounts of data.
Increased scalability: Distributed database systems can be scaled easily to meet the needs of growing businesses. Simply adding more servers to the cluster can increase the capacity of the database system.
Improved reliability: Distributed database systems are more reliable than traditional centralized database systems. If one server fails, the other servers can continue to operate, ensuring that the database is still available.

However, distributed database systems can also be more complex and expensive to manage than traditional centralized database systems.

Here is an example of how a distributed database system might be used:

A large retail company has a distributed database system that stores data about its products, customers, and sales. The data is partitioned across multiple servers based on the geographic location of the customers. This allows the company to provide fast and reliable access to data to its customers, regardless of where they are located.

Distributed database systems are a powerful tool that can be used to improve the performance, scalability, and reliability of database systems. However, it is important to carefully consider the needs of your organization before implementing a distributed database system.

Question 41

AI assistant

Accepted Answer

The answer is data. A database schema must describe data because this is necessary when users want to understand how the data is shaped and the relationships within the database.

A database schema is a blueprint of a database. It describes the logical and physical structure of the database, including the tables, columns, data types, and relationships between tables.

The database schema is important for a number of reasons. First, it helps to ensure the integrity of the data. By defining the data types and relationships between tables, the database schema helps to prevent invalid data from being entered into the database.

Second, the database schema makes it easier for users to understand and use the database. By providing a detailed description of the data, the database schema helps users to understand how the data is organized and how to access it.

Finally, the database schema is essential for database administration tasks. For example, the database schema is used to create and maintain database objects, such as tables, indexes, and views.

Here are some of the benefits of having a well-defined database schema:

Improved data quality: The database schema helps to ensure the integrity of the data by defining the data types and relationships between tables. This helps to prevent invalid data from being entered into the database.
Increased data accessibility: The database schema makes it easier for users to understand and use the database by providing a detailed description of the data. This helps users to understand how the data is organized and how to access it.
Reduced database administration costs: The database schema is essential for database administration tasks, such as creating and maintaining database objects. This can help to reduce the time and cost of database administration.

Overall, a well-defined database schema is essential for ensuring the integrity, accessibility, and manageability of data.

Question 42

AI assistant

Accepted Answer

Typical target destinations for the data loading stage of ETL include:

Data lake
Data warehouse
Website application
Analytics platform

Data lakes are a type of data storage that can store large amounts of unstructured and semi-structured data. Data warehouses are a type of data storage that is optimized for analytical queries. Website applications can be used to display data to users. Analytics platforms can be used to analyze data and generate insights.

Here is a brief description of each target destination:

Data lake: A data lake is a centralized repository that stores all of an organization's data in its native format. This includes structured, semi-structured, and unstructured data. Data lakes are typically used for big data analytics and machine learning.
Data warehouse: A data warehouse is a system that stores data in a structured format that is optimized for analytical queries. Data warehouses are typically used for business intelligence and reporting purposes.
Website application: A website application is a web-based application that can be used to display data to users. Website applications are often used to create dashboards and reports that can be accessed by users from any location.
Analytics platform: An analytics platform is a software application that can be used to analyze data and generate insights. Analytics platforms typically provide a variety of features, such as data visualization, data mining, and machine learning.

The specific target destination that is chosen for the ETL loading stage will depend on the specific needs of the organization. For example, if the organization needs to store large amounts of unstructured data, then a data lake may be the best choice. If the organization needs to perform complex analytical queries, then a data warehouse may be the best choice. If the organization needs to display data to users, then a website application may be the best choice. And if the organization needs to analyze data and generate insights, then an analytics platform may be the best choice.

It is also possible to load data to multiple target destinations. For example, an organization may load data to a data lake for long-term storage and to a data warehouse for analytical queries.

Question 43

AI assistant

Accepted Answer

The answer is object-oriented.

Python is an object-oriented programming language, which means that it's modeled around chunks of code that capture certain information, or objects. Objects can represent anything from a person to a product to a mathematical function.

Object-oriented programming has a number of benefits, including:

Modularity: Objects can be reused and combined in different ways to create new programs. This makes it easier to develop and maintain complex programs.
Data encapsulation: Objects can hide their internal data from other objects, which helps to protect the data and prevent errors.
Polymorphism: Objects of different types can respond to the same message in different ways. This makes it possible to write code that is more flexible and reusable.

Python is a popular choice for object-oriented programming because it is easy to learn and use. It also has a number of features that make it well-suited for object-oriented development, such as its support for multiple inheritance and its dynamic typing system.

Here is an example of a simple object-oriented program in Python:

Python

class Person: def __init__(self, name, age): self.name = name self.age = age def greet(self): print(f"Hello, my name is {self.name} and I am {self.age} years old.") # Create a new Person object bob = Person("Bob", 40) # Call the greet() method on the bob object bob.greet()

This program defines a class called Person with two attributes, name and age. It also defines a method called greet(), which prints a greeting to the console.

To create a new Person object, we use the Person() constructor. We can then call the greet() method on the object to print a greeting to the console.

Object-oriented programming is a powerful programming paradigm that can be used to develop complex and efficient programs. Python is a popular choice for object-oriented programming because it is easy to learn and use, and it has a number of features that make it well-suited for object-oriented development.

Question 44

Which of the following statements correctly describe Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) tools? Select all that apply.

Accepted Answer

Online Transaction Processing (OLTP) tools are optimized for data processing instead of analysis.
Online Analytical Processing (OLAP) tools are optimized for analysis in addition to processing and can analyze data from multiple databases.

Question 45

Fill in the blank: In order to create an effective data model, business intelligence professionals will often apply a _, which uses relevant measures and facts to create a model that supports business needs.

Accepted Answer

design pattern

Question 46

Which of the following statements accurately describe primary keys? Select all that apply.

Accepted Answer

Only one primary key can exist in a table.
A primary key uniquely identifies each record in a table.
Primary keys can appear as foreign keys in other tables.

Question 47

In a dimensional model, what might dimensions represent? Select all that apply.

Accepted Answer

Attributes, Supporting details

Question 48

Fill in the blank: In a dimensional model, a foreign key is used to connect a _ table to the appropriate fact table.

Accepted Answer

dimension

Question 49

How many fact tables exist in a star schema?

Accepted Answer

One

Question 50

A business intelligence team wants to improve the state of their database schemas. While working toward this goal, they move data from one source platform to another target database. What process does this situation describe?

Accepted Answer

Database migration

Question 51

In row-based databases, each row in a table is an instance or an entry in the database. How are details about that instance recorded and organized?

Accepted Answer

By column

Question 52

A business intelligence team is working with a database system in which relevant data is stored locally and less relevant data is stored remotely. What type of database system are they using?

Accepted Answer

Separated storage and computing system

Question 53

Fill in the blank: A database schema must describe _ because this is necessary when users want to understand how the data is shaped and the relationships within the database.

Accepted Answer

relevant data

Question 54

In the ETL loading stage, what are typical target destinations to which the data might be delivered? Select all that apply.

Accepted Answer

Data warehouse, Analytics platform

Question 55

Fill in the blank: Python is a programming language that is _, which means it’s modeled around chunks of code that capture certain information.

Accepted Answer

object-oriented

Sales table key id & foreign keys	Associated table
order_id	Orders table
product_id	Products table
user_id	Users table
shipping_address_id	Locations table
billing_address_id	Locations table

Week 1: Data models and pipelines

Get started with data modeling, schemas, and databases

Video: Introduction to Couse 2

Video: Ed: Overcome imposter syndrome

Reading: Course 2 overview

Video: Welcome to module 1

Video: Data modeling, design patterns, and schemas

Video: Get the facts with dimensional models

Video: Dimensional models with star and snowflake schemas

Reading: Design efficient database systems with schemas

Video: Different data types, different databases

Reading: Database comparison checklist

Practice Quiz: Test your knowledge: Data modeling, schemas, and databases

Choose the right database

Video: The shape of the data

Video: Design useful database schemas

Reading: Four key elements of database schemas

Reading: Review a database schema

Practice Quiz: Test your knowledge: Choose the right database

How data moves

Video: Data pipelines and the ETL process

Video: Maximize data through the ETL process

Video: Choose the right tool for the job

Reading: Business intelligence tools and their applications

Reading: ETL-specific tools and their applications

Practice Quiz: Test your knowledge: How data moves

Data-processing with Dataflow

Video: Introduction to Dataflow

Practice Quiz: [Optional] Activity: Create a Google Cloud account

Reading: Guide to Dataflow

Practice Quiz: [Optional] Activity: Create a streaming pipeline in Dataflow

Video: Coding with Python

Reading: Python applications and resources

Organize data in BigQuery

Video: Gather information from stakeholders

Reading: Merge data from multiple sources with BigQuery

Practice Quiz: Activity: Set up a sandbox and query a public dataset in BigQuery

Reading: Unify data with target tables

Practice Quiz: Activity: Create a target table in BigQuery

Reading: Activity Exemplar: Create a target table in BigQuery

Reading: Case study: Wayfair – Working with stakeholders to create a pipeline

Review: Data models and pipelines

Video: Wrap-up

Reading: Glossary terms from module 1

Quiz: Module 1 challenge

Share this:

Like this: