
You’ll start this course by exploring data modeling, common schemas, and database elements. You’ll consider how business needs determine the kinds of database systems that BI professionals implement. Then, you’ll discover pipelines and ETL processes, which are tools that move data and ensure that it’s accessible and useful.

Learning Objectives

  • Identify and define key database models and schemas.
  • Assess which database design pattern and schema is appropriate for different data.
  • Discuss data model alternatives that are optimal, performant, and aligned with reporting requirements, taking current data size and growth into account.
  • Define ETL and explain what it means.
  • Identify key information from stakeholders necessary to create a data pipeline.
  • Describe different types of pipelines.
  • Describe the key stages of a data pipeline.
  • Understand what a data pipeline is, its objectives, and how it works.
Table Of Contents
  1. Get started with data modeling, schemas, and databases
  2. Choose the right database
  3. How data moves
  4. Data-processing with Dataflow
  5. Organize data in BigQuery
  6. Review: Data models and pipelines

Get started with data modeling, schemas, and databases


Video: Introduction to Course 2

This course will teach you how to build tools to provide stakeholders with ongoing insights by automating the process of pulling data from different sources, monitoring it, and providing data-driven insights. You will learn about design patterns and database schemas, data pipelines and ETL processes, database optimization, quality testing, and applying your skills to a realistic business scenario. By the end of this course, you will be able to use data modeling and pipelines to make other people’s jobs easier by automating, simplifying, and enhancing their processes.

As a BI professional, you aren’t just answering
your team’s questions, you’re empowering them with the data to answer
their own questions. By pinpointing the
answers they require, you can build tools that
enable them to access and use the data they
need when they need it. Hey, there. Welcome
to this course. If you’ve already completed
the previous one, you might remember me, but if you’re just
joining us, I’m Ed. I’m a product manager
here at Google. I’m really excited
to help you get started with data
models and extract, transform, and load,
or ETL pipelines. As you’ve been learning, BI professionals
are responsible for analyzing data to generate meaningful insights
and solve problems, answer questions, find patterns, and inform business decisions. A large part of this
is building tools to provide stakeholders
with ongoing insights. This course is going to focus
on those tools and how to automate them in order to pull data from
different sources, monitor it, and provide
data-driven insights. First, you’ll learn about design patterns and
database schemas, including common structures
that BI professionals use. You’ll also be introduced to data pipelines and
ETL processes. You’ve learned that
ETL stands for extract, transform, and load. This refers to the process of gathering data from
source systems, converting it into
a useful format, and bringing it into
a data warehouse or other unified
destination system. This will be an
important part of your job as a BI professional. You’ll also develop strategies for gathering information
from stakeholders in order to help you develop more useful tools and
processes for your team. After that, you’ll focus on database
optimization to reduce response time or
the time it takes for a database to
complete a user request. This will include exploring
different types of databases and the five factors
of database performance: workload, throughput, resources, optimization,
and contention. Finally, you’ll learn
about the importance of quality testing
your ETL processes, validating your database schema, and verifying business rules. Once you’ve finished
this course, you’ll apply your skills to a realistic business scenario, which is a great
way to demonstrate your BI knowledge to
potential employers. As you’ve been learning,
a large part of BI is making other people’s jobs
easier by automating, simplifying, and enhancing
their processes. For example, in one of
my projects I helped the central finance team aggregate years’ worth
of global sales. This allowed my team to identify the underlying
drivers that affected trends in prices and
quantities sold. They were then able
to clearly report these findings to
key stakeholders. I love solving
problems and making my team's lives a
little easier, which is one of the reasons why I’m
so excited to teach you more about data modeling and
pipelines in this course. All right, let’s get started.

Video: Ed: Overcome imposter syndrome

Imposter syndrome is a belief that you are not where you need to be, in terms of your skill, your perspectives, background, or your experience. It is common among product managers, who often work with people who are very skilled in their areas of expertise.

One way to overcome imposter syndrome is to focus on your unique perspective. Everyone has a unique combination of skills, experience, and interests that they can bring to the table.

Another way to overcome imposter syndrome is to be vulnerable and transparent about how you are feeling. Talk to people you trust about your challenges and ask for help when you need it.

It is also important to focus on your strengths and give yourself credit for the things you do well. You are more likely to be successful by leaning into your strengths than by trying to hide your weaknesses.

Hi. I’m Ed. I’m a Product Manager at Google. As a product manager, I define the vision
for a product and make sure that it
aligns with what users actually need that
product to do for them. For me, imposter
syndrome is a belief that you are not
where you need to be, in terms of your skill, your perspectives, background,
or your experience. I’ve definitely experienced
imposter syndrome. I work with a lot of people who are very skilled in
their areas of expertise. Not everyone can have every single skill across the board. You end up thinking, oh, maybe I should be
able to program like her, or maybe I should be as good
a data scientist as him, and maybe I should have
this level of perspective as it seems like everyone
around me does as well. That’s not necessarily true. I really think it’s
important to focus on the unique perspective
that you provide, because everyone’s perspective and expertise and interest, and the way in which they’re
going to apply all of those, they’re all going to differ. There are unique
combinations of things that you provide that other
people cannot provide, simply by virtue of the
fact that you are you. I found that the most useful
technique in overcoming imposter syndrome
is being vulnerable and transparent that
you feel that way. Find people that you trust, find people that
you can speak with and tell them how
you’re feeling. Tell them why you’re
feeling that way. Feeling a certain way
doesn’t have to be an indication of who you are
or what you’re capable of. It simply is. Being able to be
vulnerable and say, hey, I don’t understand this, or I would like a little
bit extra information, that can be helpful, not only for you, but also
for people around you. We tend to focus
on the negatives. We tend to focus on
the challenges or the constructive aspects that we might see that we
need to improve on, while not giving
ourselves enough credit for the things that we do well, the things that are strengths that we should lean more into. You’re going to be
more successful by understanding and
really leaning into your strengths
than simply trying to hide or run away
from your failures.

Reading: Course 2 overview

Video: Welcome to module 1

This course will teach you about data modeling, schemas, and pipeline processes, which are essential tools for BI professionals. You will learn about the foundations of data modeling, common schemas, and key database elements. You will also learn how business needs determine the kinds of database systems that BI professionals implement. Finally, you will learn about pipelines and ETL processes, which are the tools that move data throughout the system and make sure it’s accessible and useful. By the end of this course, you will have added many more important tools to your BI toolbox.

Welcome to the first section of this
course. You’re going to learn about how BI professionals use data models to
help them build database systems, how schemas help professionals understand
and organize those systems and how pipeline processes move data from
one part of the system to another. We’ll start by exploring data modeling
foundations as well as common schemas and key database elements. We’ll also consider
how business needs determine the kinds of database systems that
a BI professional might implement. We’ll then shift to pipelines and ETL
processes, which are the tools that move data throughout the system and
make sure it’s accessible and useful. By the time you’re done, you’ll have added
many more important tools to your BI toolbox. Let’s get started.

Video: Data modeling, design patterns, and schemas

This video introduces data modeling, design patterns, and schemas.

  • Data modeling is a way of organizing data elements and how they relate to one another.
  • Design patterns are solutions that use relevant measures and facts to create a model to support business needs.
  • Schemas are a way of describing how data is organized.

BI professionals use data modeling to create destination database models. These models organize the systems, tools, and storage accordingly, including designing how the data is organized and stored.

Design patterns and schemas are used in BI to create systems that are consistent and efficient. BI professionals use these tools to create data models that meet the specific needs of their businesses.

Data Modeling, Design Patterns, and Schemas

Data modeling is the process of organizing data elements and how they relate to one another. It is a way to create a conceptual model of data that can be used to design and implement databases.

Design patterns are reusable solutions to common data modeling problems. They provide a template for creating data models that are efficient and effective.

Schemas are a way of describing how data is organized in a database. They define the structure of the database, including the tables, columns, and relationships between them.

Data Modeling

Data modeling can be used to model data for a variety of purposes, including:

  • Designing databases
  • Developing data warehouses
  • Creating data marts
  • Designing data integration solutions
  • Documenting data requirements

Data modeling is a complex process, but it is essential for creating efficient and effective databases.

Design Patterns

There are many different design patterns that can be used for data modeling. Some of the most common design patterns include:

  • Entity-relationship (ER) diagrams: ER diagrams are used to model the entities and relationships between entities in a database.
  • Dimensional modeling: Dimensional modeling is used to model data for analytical purposes. It is often used to create data warehouses and data marts.
  • Normalized models: Normalized models are designed to minimize data redundancy and improve data integrity.
  • Star schemas: Star schemas are a type of dimensional model that is often used for data warehousing.
  • Snowflake schemas: Snowflake schemas are a more complex type of dimensional model that is often used for data warehousing.

Schemas

Schemas are used to describe the structure of a database. They define the tables, columns, and relationships between them. Schemas can be created using a variety of different tools, such as SQL or graphical database design tools.

Schemas are important for a number of reasons. They help to ensure that data is organized in a consistent way. They also make it easier to query and analyze data.
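
To make this concrete, here is a minimal sketch of a schema defined with SQL. The table names, column names, and data types are illustrative assumptions, not definitions taken from the course.

  CREATE TABLE customer (
    customer_id   INT PRIMARY KEY,       -- unique key for each customer
    customer_name VARCHAR(100) NOT NULL, -- column name plus its data type
    phone_number  VARCHAR(20)
  );

  CREATE TABLE sales (
    sale_id     INT PRIMARY KEY,         -- unique key for each sale
    customer_id INT NOT NULL,            -- links each sale back to a customer
    sale_date   DATE NOT NULL,
    amount      DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
  );

The schema itself holds no data; it only declares the tables, the columns with their data types, and the relationship between the two tables.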

Using Design Patterns and Schemas in BI

Design patterns and schemas are used in BI to create systems that are consistent and efficient. BI professionals use these tools to create data models that meet the specific needs of their businesses.

For example, a BI professional might use a dimensional modeling design pattern to create a data warehouse for a retail company. The data warehouse would be used to store data about products, customers, and sales. The BI professional would also create a schema for the data warehouse, which would define the tables, columns, and relationships between them.

The BI professional could then use the data warehouse and schema to query and analyze data to answer questions such as:

  • What are the most popular products?
  • Which customers are spending the most money?
  • What are the trends in sales?

Conclusion

Data modeling, design patterns, and schemas are essential tools for BI professionals. By understanding these concepts, BI professionals can create efficient and effective data models that meet the specific needs of their businesses.

Fill in the blank: In order to create an effective model, a design pattern uses _____ that are important to the business. Select all that apply.

measures

In order to create an effective model, a design pattern uses measures and facts that are important to the business.

facts

In order to create an effective model, a design pattern uses measures and facts that are important to the business.

In this video, we’re going
to explore data modeling, design patterns, and schemas. If you’ve been working
with databases or if you’re coming from the Google
Data Analytics certificate, you may be familiar with data modeling as a way to
think about organizing data. Maybe you’re even already using schemas to understand how
databases are designed. As you’ve learned, a database is a collection of data stored
in a computer system. In order to make
databases useful, the data has to be organized. This includes both
source systems from which data is ingested and moved and the destination database where it
will be acted upon. These source systems
could include data lakes, which are database systems
that store large amounts of raw data in its original
format until it’s needed. Another type of source system is an Online Transaction
Processing or OLTP database. An OLTP database is
one that has been optimized for data processing
instead of analysis. One type of destination
system is a data mart, which is a subject
oriented database that can be a subset of a
larger data warehouse. Another possibility is using an Online Analytical
Processing or OLAP database. This is a tool that has been optimized for analysis
in addition to processing and can analyze
data from multiple databases. You will learn more about
these things later. But for now, just understand
that a big part of a BI professional’s responsibility is to create the
destination database model. Then they will organize
the systems, tools and storage accordingly, including designing how the
data is organized and stored. These systems all play a part in the tools you’ll be
building later on. They’re important foundations
for key BI processes. When it comes to organization, you likely know that
there are two types of data: unstructured
and structured. Unstructured data
is not organized in any easily
identifiable manner. Structured data has been
organized in a certain format, such as rows and columns. If you’d like to revisit
different data types, take a moment to review this information from the
Data Analytics certificate. Now, it can be tricky to
understand structure. This is where data
modeling comes in. As you learned previously, a data model is a
tool for organizing data elements and how they
relate to one another. These are conceptual
models that help keep data consistent
across the system. This means they
give us an idea of how the data is
organized in theory. Think back to Furnese’s perfect train of
business intelligence. A data model is like a
map of that train system. It helps you navigate
the database by giving you directions
through the system. Data modeling is a process
of creating these tools. In order to create
the data model, BI professionals will often use what is referred to
as a design pattern. A design pattern is a solution
that uses relevant measures and facts to create a model
to support business needs. Think of it like a reusable
problem-solving template, which may be applied to
many different scenarios. You may be more familiar
with the output of the design pattern,
a database schema. As a refresher, a
schema is a way of describing how something
such as data is organized. You may have encountered schemas before while working
with databases. For example, some
common schemas you might be familiar with
include relational models, star schemas, snowflake
schemas, and NoSQL schemas. These different
schemas enable us to describe the model being
used to organize the data. If the design pattern
is the template for the data model, then the schema is the
summary of that model. Because BI professionals play such an important role in
creating these systems, understanding data modeling is an essential part of the job. Coming up, you’re going to learn more about how design
patterns and schemas are used in BI and get
a chance to practice data modeling
yourself. Bye for now.

Video: Get the facts with dimensional models

This video introduces dimensional modeling, a type of relational modeling technique that is used in business intelligence. Dimensional models are optimized for quickly retrieving data from a data warehouse.

Dimensional models are made up of two types of tables: fact tables and dimension tables. Fact tables contain measurements or metrics related to a particular event. Dimension tables contain attributes of the dimensions of a fact. These tables are joined together using foreign keys to give meaning and context to the facts.

Dimensional modeling is a powerful tool for BI professionals because it allows them to quickly and easily analyze data. By understanding how dimensional modeling works, BI professionals can design database schemas that are efficient and effective.

In the next video, we will look at different types of schemas that can be used with dimensional modeling.

Get the facts with dimensional models in business intelligence

Dimensional modeling is a data modeling technique that is used in business intelligence to create data warehouses and data marts. Dimensional models are optimized for quickly retrieving data for analysis.

Dimensional models are made up of two types of tables: fact tables and dimension tables.

  • Fact tables: Fact tables contain measurements or metrics related to a particular event. For example, a fact table for sales might contain columns for date, product, customer, and quantity sold.
  • Dimension tables: Dimension tables contain attributes of the dimensions of a fact. For example, a dimension table for customers might contain columns for customer name, address, and phone number.

Fact tables and dimension tables are joined together using foreign keys to give meaning and context to the facts.
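
As a rough sketch in SQL, a sales fact table might be declared as shown below. The dimension tables (dim_date, dim_product, dim_customer) and all names here are assumptions for illustration, not tables defined by the course.

  CREATE TABLE fact_sales (
    sales_id      INT PRIMARY KEY,                                     -- one row per sales event
    date_key      INT NOT NULL REFERENCES dim_date (date_key),         -- foreign keys joining the
    product_key   INT NOT NULL REFERENCES dim_product (product_key),   -- fact to its dimension
    customer_key  INT NOT NULL REFERENCES dim_customer (customer_key), -- tables
    quantity_sold INT,                                                 -- the measurements (facts)
    sales_amount  DECIMAL(10, 2)
  );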

Dimensional modeling is a powerful tool for BI professionals because it allows them to quickly and easily answer questions about their data. For example, a BI professional could use a dimensional model to answer questions such as:

  • What are the top-selling products by month?
  • Which customers are spending the most money?
  • What are the trends in sales over time?
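
As a hedged sketch, the first of these questions could be answered with a query like the one below, assuming the hypothetical fact_sales, dim_date, and dim_product tables from the example above (and that dim_date carries year and month columns):

  -- Top-selling products by month
  SELECT
    d.year,
    d.month,
    p.product_name,
    SUM(f.quantity_sold) AS total_units
  FROM fact_sales AS f
  JOIN dim_date AS d ON f.date_key = d.date_key
  JOIN dim_product AS p ON f.product_key = p.product_key
  GROUP BY d.year, d.month, p.product_name
  ORDER BY d.year, d.month, total_units DESC;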

Dimensional models are used by businesses of all sizes to make better decisions about their operations. For example, a retail company might use a dimensional model to identify which products are selling well and which customers are most profitable. A manufacturing company might use a dimensional model to track production costs and identify areas for improvement.

Here are some of the benefits of using dimensional models in business intelligence:

  • Faster data retrieval: Dimensional models are optimized for quickly retrieving data for analysis. This is because the dimension tables are typically denormalized, which means they contain some redundant data. This redundancy reduces the number of joins needed, so facts can be combined with their dimensions very quickly.
  • Easier data analysis: Dimensional models are designed to make it easy to analyze data. The fact table contains the measurements or metrics that you want to analyze, and the dimension tables provide the context for those measurements. This makes it easy to see how different factors are related to each other.
  • Flexibility: Dimensional models are flexible enough to be used to analyze a wide variety of data. This makes them a good choice for businesses that need to be able to analyze different types of data over time.

If you are a BI professional, it is important to understand how dimensional models work. By understanding dimensional modeling, you can design database schemas that are efficient and effective for data analysis.

If you’ve been working with databases and SQL, you’re probably already familiar with
relational databases. In this video, you’re going to return to the concept
of relational databases and learn about a specific kind of relational
modeling technique that is used in business intelligence:
dimensional modeling. As a refresher, a relational database contains
a series of tables that can be connected to form relationships. These relationships are established
using primary and foreign keys. Check out this car dealership
database. Branch ID is the primary key in the car
dealerships table, but it is the foreign key in
the product details table. This connects these two tables directly. VIN is the primary key in
the product details table and the foreign key in the repair parts table. Notice how these connections actually
create relationships between all of these tables. Even the car dealerships and repair parts tables are connected
by the product details table. If you took the Google Data
Analytics Certificate, you learned that a primary key is an
identifier in the database that references a column in which each
value is unique. For BI, we’re going to expand this idea. A primary key is an identifier in
a database that references a column or group of columns whose values
uniquely identify each record in the table. In this database we
have primary keys in each table. Branch ID, VIN, and part ID. A foreign key is a field within
a database table that’s a primary key in another table. The primary keys from each table also
appear as foreign keys in other tables, which builds those connections. Basically, a primary key can be used to impose
constraints on the database that ensure data in a specific column is unique
by specifically identifying a record in a relational database table. Only
one primary key can exist in a table, but a table may have many foreign keys. Okay now let’s move on
to dimensional models. A dimensional model is a type of
relational model that has been optimized to quickly retrieve data
from a data warehouse. Dimensional models can be broken
down into facts for measurement and dimensions that add attributes for
context. In a dimensional model, a fact is a measurement or metric. For example, a monthly sales number could
be a fact, and a dimension is a piece of information that provides more detail and
context regarding that fact. It’s the who, what,
where, when, why and how. So if our monthly sales number is
the fact, then the dimensions could be information about each sale, including
the customer, the store location and what products were sold. Next,
let’s consider attributes. If you earned your Google
Data Analytics certificate, you learned about attributes in tables. An attribute is a characteristic or
quality of data used to label the table
columns. In dimensional models, attributes work kind of the same way. An
attribute is a characteristic or quality that can be used to describe a dimension. So a dimension provides
information about a fact and an attribute provides
information about a dimension. Think about a passport. One dimension on your passport
is your hair and eye color. If you have brown hair and eyes, brown is the attribute
that describes that dimension. Let’s use another simple example to
clarify this: in our car dealership example, if we explore the customer
dimension we might have attributes such as name, address and
phone number listed for each customer. Now that we’ve established the facts,
dimensions, and attributes, it’s time for
the dimensional model to use these things to create two types of tables:
fact tables and dimension tables. A fact table contains measurements or
metrics related to a particular event. This is the primary table
that contains the facts and their relationship with the dimensions. Basically each row in the fact
table represents one event. The entire table could aggregate
several events such as sales in a day. A dimension table is where attributes
of the dimensions of a fact are stored. These tables are joined to the appropriate
fact table using the foreign key. This gives meaning and
context to the facts. That’s how tables are connected
in the dimensional model. Understanding how dimensional modeling
builds connections will help you understand database
design as a BI professional. This will also clarify database schemas
which are the output of design patterns. Coming up, we’re going to check out different kinds
of schemas that result from this type of modeling to understand how these
concepts work in practice.

Video: Dimensional models with star and snowflake schemas

A schema is a way of describing how data is organized in a database. It is the logical definition of the data elements, physical characteristics, and inter-relationships that exist within the model.

There are several common schemas that are used in business intelligence, including star, snowflake, and denormalized schemas.

  • A star schema consists of one fact table that references any number of dimension tables. It is shaped like a star, with the fact table at the center and the dimension tables connected to it. Star schemas are designed for high-scale information delivery and make output more efficient because of the limited number of tables and clear direct relationships.
  • A snowflake schema is an extension of a star schema with additional dimensions and, often, subdimensions. These dimensions and subdimensions break down the schema into even more specific tables, creating a snowflake pattern. Snowflake schemas can be more complicated than star schemas, but they can be useful for more complex analytical tasks.
  • A denormalized schema is a schema that trades some redundancy control and data integrity safeguards for query performance. It is often used in data warehousing applications where read speed matters more than minimizing duplicated data.

By understanding the different types of schemas, BI professionals can choose the best schema for their specific needs.

Dimensional models with star and snowflake schemas in Business Intelligence

Dimensional models are a type of data model that is commonly used in business intelligence (BI). They are designed to make it easy to analyze data by organizing it into facts and dimensions. Facts are the measures of interest, such as sales revenue or customer satisfaction. Dimensions are the attributes that describe the facts, such as product category, customer location, or date.

Star and snowflake schemas are two common types of dimensional models.

Star schema

A star schema consists of one fact table that is connected to any number of dimension tables. The fact table contains the measures of interest, and the dimension tables contain the attributes that describe the measures.

The star schema is a simple and efficient design that is well-suited for high-volume analytical queries. It is also easy to understand and maintain.

Snowflake schema

A snowflake schema is an extension of the star schema. It adds additional dimensions and, often, subdimensions. Subdimensions are child tables that break down a dimension into more specific categories.

The snowflake schema can be more complex than the star schema, but it can be useful for more complex analytical tasks. For example, a snowflake schema could be used to analyze sales data by product category, customer location, and date.

Which schema to use?

The best schema for a particular BI project will depend on the specific needs of the business. If the project requires high-volume analytical queries, then a star schema is a good choice. If the project requires more complex analytical tasks, then a snowflake schema may be a better choice.

Here is an example of a star schema for a sales data warehouse:

  • Fact table: Sales
    • Measures: Sales revenue, quantity sold
    • Dimensions: Product, customer, date

Here is an example of a snowflake schema for a sales data warehouse:

  • Fact table: Sales
    • Measures: Sales revenue, quantity sold
    • Dimensions: Product
      • Subdimensions: Product category, product subcategory
    • Dimensions: Customer
      • Subdimensions: Customer location, customer segment
    • Dimensions: Date
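
As a sketch of the difference in SQL, snowflaking the product dimension might mean splitting its category attributes into a separate subdimension table. The table and column names here are hypothetical:

  CREATE TABLE dim_product_category (
    category_key     INT PRIMARY KEY,
    category_name    VARCHAR(100),
    subcategory_name VARCHAR(100)
  );

  CREATE TABLE dim_product (
    product_key  INT PRIMARY KEY,
    product_name VARCHAR(100),
    category_key INT REFERENCES dim_product_category (category_key)  -- extra join step
  );

In a star schema, category_name and subcategory_name would instead sit directly on dim_product as ordinary columns.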

Dimensional models with star and snowflake schemas are powerful tools for BI professionals. They can be used to analyze a wide variety of data and to generate insights that can help businesses make better decisions.

What type of schema consists of one fact table that references any number of dimension tables?

Star

A star schema consists of one fact table that references any number of dimension tables.

In a previous video, we explored how BI
professionals used dimensional models. They make it possible to organize data
using connected facts, dimensions and attributes, to create a design pattern. A schema is the final
output of that pattern. As you’ve learned, a schema is a way of
describing how something, such as data, is organized. In a database, it’s the logical definition of the data
elements, physical characteristics, and inter-relationships that
exist within the model. Think of the schema like a blueprint:
it doesn’t hold data itself, but describes the shape of the data and how it
might relate to other tables or models. Any entry in the database is
an instance of that schema and will contain all of the properties
described in the schema. There are several common schemas that you
may encounter in business intelligence, including star, snowflake and
denormalized, or NoSQL schemas. Star and snowflake schemas are some of
the most common iterations of an actual dimensional model in practice. A star schema is a schema consisting
of one fact table that references any number of dimension tables. As its name suggests,
this schema is shaped like a star. Notice how each of the dimension tables is
connected to the fact table at the center. Star schemas are designed to monitor
data instead of analyzing it. In this way, they enable
analysts to rapidly process data. Therefore they’re ideal for
high scale information delivery, and they make output more efficient because
of the limited number of tables and clear direct relationships. Next we have snowflake schemas, which tend
to be more complicated than star schemas, but the principle is the same. A snowflake schema is an extension of
a star schema with additional dimensions and, often, subdimensions. These dimensions and subdimensions break
down the schema into even more specific tables, creating a snowflake pattern. Like snowflakes in nature,
a snowflake schema and the relationships within
it can be complex. Here’s an example, notice how the fact
table is still at the center, but now there are subdimension tables
connected to the dimension tables, which gives us a more complicated web. Now you have a basic idea of the common
schemas you might encounter in BI. Understanding schemas can help you
recognize the different ways databases are constructed and how BI professionals
influence database functionality. Later on, you’re going to have more
opportunities to explore these different schemas and even construct some yourself.

Reading: Design efficient database systems with schemas

Video: Different data types, different databases

This video discusses several types of databases, including OLTP, OLAP, row-based, columnar, distributed, single-homed, separated storage and compute, and combined databases.

  • OLTP (online transaction processing) databases are optimized for data processing instead of analysis. They are used for transactional applications, such as order processing and customer relationship management (CRM).
  • OLAP (online analytical processing) databases are optimized for analysis. They are used for data warehousing and business intelligence applications.
  • Row-based databases store data in rows and columns. They are good for transactional applications, but not as good for analytical queries.
  • Columnar databases store data in columns. They are good for analytical queries, but not as good for transactional applications.
  • Distributed databases store data across multiple physical locations. They are good for large data sets and scalability.
  • Single-homed databases store all data in the same physical location. They are less common for large data sets.
  • Combined systems store and analyze data in the same place. They are a traditional setup, but can become unwieldy as more data is added.
  • Separated storage and computing systems store less relevant data remotely and relevant data locally for analysis. They are more efficient for analytical queries and can scale storage and computations independently.

BI professionals need to understand the different types of databases and how they are used in order to design effective data models and queries.
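
The difference between OLTP and OLAP workloads shows up directly in the statements each system runs. Here is a hedged sketch using hypothetical table names (inventory and fact_sales):

  -- OLTP-style statement: touch a single row to record one transaction
  UPDATE inventory
  SET copies_in_stock = copies_in_stock - 1
  WHERE book_id = 1234;

  -- OLAP-style query: scan many rows to summarize data for analysis
  SELECT store_id, SUM(sales_amount) AS total_sales
  FROM fact_sales
  GROUP BY store_id;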

In this video, the speaker also discusses the importance of database migrations. Database migrations involve moving data from one source platform to another target database. This can be a complex process, but it is often necessary as businesses grow and technology changes.

BI professionals often play a key role in facilitating database migrations. They need to understand the source and target databases, as well as the data that needs to be migrated. They also need to develop a plan for the migration and ensure that the data is properly transformed and loaded into the target database.
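
The schema changes made during a migration are often expressed as a series of small, reviewable statements. A minimal sketch, assuming hypothetical customer and sales tables and PostgreSQL-style syntax:

  ALTER TABLE customer ADD COLUMN loyalty_tier VARCHAR(20);    -- add a new column
  ALTER TABLE sales ALTER COLUMN amount TYPE DECIMAL(12, 2);   -- change a data type
  ALTER TABLE customer DROP COLUMN fax_number;                 -- remove an element

In practice, each change would be tested against the target database before and after the data is loaded.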

Different data types, different databases in Business Intelligence

The type of data that you are storing and analyzing will determine the type of database that is best suited for your needs. For example, if you are storing and analyzing structured data, such as customer names and order details, then you will need a relational database. If you are storing and analyzing unstructured data, such as text and images, then you will need a NoSQL database.

Here is a brief overview of the different types of data types and the types of databases that are best suited for each type of data:

  • Structured data: Structured data is data that is organized in a predefined format. It is typically stored in rows and columns in a relational database. Examples of structured data include customer names, order details, and product inventory.
  • Unstructured data: Unstructured data is data that does not have a predefined format. It is typically stored in documents, images, and videos. Examples of unstructured data include customer reviews, social media posts, and medical images.
  • Relational databases: Relational databases are designed to store and manage structured data. They use tables to store data, and relationships to connect the tables together. Relational databases are the most common type of database used in business intelligence.
  • NoSQL databases: NoSQL databases are designed to store and manage unstructured data. They do not use tables and relationships, but instead use a variety of other data structures, such as key-value pairs, documents, and graphs. NoSQL databases are becoming increasingly popular for business intelligence applications that involve large amounts of unstructured data.

Here are some examples of how different data types are used in business intelligence:

  • Using a relational database to store customer data: A relational database could be used to store customer name, address, phone number, and order history. This data could then be analyzed to identify trends in customer behavior, such as which products are most popular or which customers are most likely to churn.
  • Using a NoSQL database to store social media data: A NoSQL database could be used to store social media posts from customers. This data could then be analyzed to identify customer sentiment, brand awareness, and customer churn.
  • Using a relational database to store product data: A relational database could be used to store product name, description, price, and inventory levels. This data could then be analyzed to identify best-selling products, low-stock products, and products that are likely to be discontinued.

When choosing a database for your business intelligence application, it is important to consider the type of data that you will be storing and analyzing. If you are storing and analyzing structured data, then a relational database is the best choice. If you are storing and analyzing unstructured data, then a NoSQL database is the best choice.

It is also important to consider the size and complexity of your data set. If you have a large and complex data set, then you will need a database that can scale to meet your needs. Relational databases can scale well, but NoSQL databases are even better at scaling.

Finally, you need to consider the budget and resources that you have available. Relational databases are typically more affordable and easier to manage than NoSQL databases.

By considering these factors, you can choose the best database for your business intelligence application.

Which database framework features a collection of data systems that exist across many physical locations?

Distributed

A distributed database framework features a collection of data systems that exist across many physical locations.

As we continue our discussion of
data modeling and schemas, it’s important to understand that there
are different facets of databases that a business intelligence
professional might need to consider for their organization. This is because the database framework,
including how platforms are organized and how data is stored and
processed, affects how data is used. Let’s start with an example. Think about
a grocery stores database systems. They manage daily business processes and
analyze and draw insights from data. For example, in addition to enabling users
to manage sales, a grocer’s database must help decision makers understand
what items customers are buying and which promotions are the most effective. In this video, we’re going to check out a
few examples of database frameworks and learn how they’re
different from one another. In particular, databases vary based
on how the data is processed, organized and stored. For this reason it’s important to know
what type of database your company is using. You will design different data models
depending on how data is stored and accessed on that platform. In addition,
another key responsibility for BI professionals is to
facilitate database migrations, which are often necessary when
technology changes and businesses grow. A database migration involves moving data
from one source platform to another target database. During a migration, users transition
the current database schemas to a new desired state. This could involve adding tables or
columns, splitting fields, removing elements, changing data types or
other improvements. The database migration process often
requires numerous phases and iterations, as well as lots of testing. These are huge projects for BI teams and
you don’t necessarily just want to take the original schema and
use it in the new one. So in this video we’ll discuss several
types of databases including OLTP, OLAP, Row-based, columnar,
distributed, single-homed, separated storage and
compute and combined databases. The first two database technologies
we’re going to explore, OLTP and OLAP systems, are based on
how data is processed. As you’ve learned, an online transaction
processing or OLTP database is one that has been optimized for
data processing instead of analysis. OLTP databases manage
database modifications and are operated with traditional
database management system software. These systems are designed to
effectively store transactions and help ensure consistency. An example of an OLTP database
would be an online bookstore. If two people add the same book to their
cart, but there’s only one copy, then the person who completes the checkout
process first will get the book. And the OLTP system ensures that there
aren’t more copies sold than are in stock. OLTP databases are optimized to read,
write and update single rows of data to ensure
that business processes go smoothly. But they aren’t necessarily designed
to read many rows together. Next, as mentioned previously, OLAP
stands for online analytical processing. This is a tool that has been optimized for
analysis in addition to processing and can analyze data from multiple databases. OLAP systems pull data from multiple
sources at one time to analyze data and provide key business insights. Going back to our online bookstore,
an OLAP system could pull data about customer purchases
from multiple data warehouses in order to create personalized home pages
for customers based on their preferences. OLAP database systems enable organizations
to address their analytical needs from a variety of data sources. Depending on the data maturity of the
organization, one of your first tasks as a BI professional could be
to set up an OLAP system. Many companies have OLTP systems
in place to run the business, but they’ll rely on you to create a system
that can prioritize analyzing data. This is a key first step
to drawing insights. Now moving along to row-based and
columnar databases, as the name suggests,
row-based databases are organized by rows. Each row in a table is an instance or
an entry in the database and details about that instance
are recorded and organized by column. This means that if you wanted the average
profit of all sales over the last five years from the bookstore database, you would have to pull each row from
those years even if you don’t need all of the information contained in those rows. Columnar databases on the other hand,
are databases organized by columns. They’re used in data warehouses
because they are very useful for analytical queries. Columnar databases process data quickly, only retrieving information
from specific columns. In our average profit of all sales
example, with a columnar database, you could choose to specifically pull
the sales column instead of years’ worth of rows. The next databases are focused on storage. Single-homed databases are databases
where all the data is stored in the same physical location. This is less common for organizations
dealing with large data sets, and it will continue to
become rarer as more and more organizations move their data
storage to online and cloud providers. Now, distributed databases are collection
of data systems distributed across multiple physical locations. Think about them like telephone books:
it’s not actually possible to keep all the telephone numbers in the world
in one book, it would be enormous. So instead, the phone numbers
are broken up by location and across multiple books in order
to make them more manageable. Finally, we have more ways of storing and
processing data. Combined systems
are database systems that store and analyze data in the same place. This is a more traditional setup because
it enables users to access all of the data that needs to stay
in the system long-term. But it can become unwieldy as more data
is added. As the name implies, separated storage and computing systems are databases where
less relevant data is stored remotely and the relevant data is
stored locally for analysis. This helps the system run analytical
queries more efficiently because you only interact with relevant data. It also makes it possible to scale
storage and computations independently. For example, if you have a lot of data but
only a few people are querying it, you don’t need as much computing power,
which can save resources. There are a lot of aspects of
databases that could affect a BI professional’s work. Understanding if a system is OLTP or
OLAP, row-based or columnar, distributed or
single-homed, separated storage and computing or combined, or even some
combination of these is essential. Coming up we’ll go even more in
depth about organizing data.

Reading: Database comparison checklist


Practice Quiz: Test your knowledge: Data modeling, schemas, and databases

A business intelligence professional stores large amounts of raw data in its original format within a database system. Then, they can access the data whenever they need it for their BI project. What type of database are they using?

Which of the following statements correctly describe primary and foreign keys? Select all that apply.

Fill in the blank: A _ schema is an extension of a star schema, which contains additional dimensions.

What type of database stores relevant data locally for analysis and less relevant data remotely?

Choose the right database


Video: The shape of the data

A data warehouse is a specific type of database that consolidates data from multiple source systems for data consistency, accuracy, and efficient access. It is used to support data-driven decision making.

BI professionals help design data warehouses by considering the following factors:

  • Business needs: the questions the organization wants to answer or the problems they want to solve.
  • Shape and volume of data: the rows and columns of tables within the warehouse and how they are laid out, as well as the current and future volume of data.
  • Model: the tools and constraints of the system, such as the database itself and any analysis tools that will be incorporated into the system.

The data warehouse model for a bookstore would likely include a fact table for sales data and dimension tables for store, customer, product, promotion, time, stock, and currency. This would create a star schema, which is a common data warehouse model that is well-suited for answering specific questions and generating dashboards.

The logic behind data warehouse design is to organize the data in a way that is efficient for data analysis and that meets the specific needs of the business.
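
To illustrate, one query against a bookstore star schema like the one described above might compare net sales across promotions. The table and column names (sales, promotion, total_net_amount, promotion_key) are assumptions based on the description, not the course's exact schema:

  SELECT
    pr.promotion_name,
    SUM(s.total_net_amount) AS net_sales
  FROM sales AS s
  JOIN promotion AS pr ON s.promotion_key = pr.promotion_key
  GROUP BY pr.promotion_name
  ORDER BY net_sales DESC;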

The shape of the data in business intelligence refers to the way the data is organized and structured. This includes the types of data, the relationships between the data, and the format of the data.

The shape of the data is important because it affects how easy it is to analyze and interpret the data. For example, if the data is well-organized and structured, it will be easier to use data mining and machine learning techniques to extract insights from the data.

There are a number of different ways to structure data for business intelligence. One common approach is to use a data warehouse. A data warehouse is a central repository for data from multiple source systems. The data in a data warehouse is typically organized in a star schema or snowflake schema.

A star schema is a data warehouse model that consists of a fact table and one or more dimension tables. The fact table contains the quantitative data, such as sales figures or customer churn rates. The dimension tables contain the qualitative data, such as customer demographics or product categories.

A snowflake schema is a more complex data warehouse model that consists of a fact table and multiple layers of dimension tables. The dimension tables are related to each other in a hierarchical fashion.

In addition to data warehouses, there are a number of other ways to structure data for business intelligence. For example, data can be stored in relational databases, NoSQL databases, or data lakes.

The best way to structure data for business intelligence will depend on the specific needs of the organization. However, there are a few general principles that can be followed:

  • Organize the data in a logical way. The data should be structured in a way that makes it easy to understand and analyze.
  • Use consistent naming conventions. This will make it easier to work with the data and to create reports and dashboards.
  • Document the data. The data should be well-documented so that users understand what the data means and how it should be used.

Here are some tips for working with the shape of data in business intelligence:

  • Identify the key dimensions of the data. What are the different categories or groups that the data can be divided into?
  • Identify the key metrics of the data. What are the quantitative measurements that are most important to the business?
  • Understand the relationships between the dimensions and metrics. How are the different dimensions and metrics related to each other?
  • Choose the right data structure. The data structure should be appropriate for the type of data and the intended use of the data.
  • Clean and prepare the data. The data should be cleaned and prepared before it is analyzed. This includes tasks such as correcting errors, removing outliers, and transforming the data into a consistent format.

By following these tips, you can work effectively with the shape of data in business intelligence to extract insights from the data and improve decision-making.
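
As a small example of the cleaning and preparation step, the sketch below trims and standardizes text, casts types, and filters out obviously bad rows. The raw_orders table and its columns are hypothetical:

  SELECT
    order_id,
    TRIM(UPPER(product_category)) AS product_category,   -- consistent casing and spacing
    CAST(order_date AS DATE) AS order_date,               -- consistent date type
    CAST(order_total AS DECIMAL(10, 2)) AS order_total    -- consistent numeric type
  FROM raw_orders
  WHERE order_total >= 0;                                  -- drop likely data errors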

Fill in the blank: The shape of data refers to the rows and columns of tables within a data warehouse, as well as the _____ of data it contains.

volume

The shape of data refers to the rows and columns of tables within a data warehouse, as well as the volume of data it contains.

You’ve been investigating
data modeling and database schemas as well as how different
types of databases are used in BI. Now we’re going to explore how these
concepts can be used to design data warehouses. But before we get into
data warehouse design, let’s get a refresher on what
a data warehouse actually is. As you probably remember
from earlier in this course, a database is a collection of
data stored in a computer system. Well, a data warehouse is
a specific type of database that consolidates data from
multiple source systems for data consistency, accuracy and
efficient access. Data warehouses are used to support
data driven decision making. Often these systems are managed by
data warehousing specialists, but BI professionals may help design them. When
it comes to designing a data warehouse, there are a few important things
that a BI professional will consider: business needs, the shape and
volume of the data, and what model the data warehouse will follow. Business needs are the questions
the organization wants to answer or the problems they want
to solve. These needs help determine how it will use, store, and
organize its data. For example, a hospital storing patient
records to monitor health changes has different data requirements than a
financial firm analyzing market trends to determine investment strategies. Next let’s explore the shape and
volume of data from the source system. Typically the shape of data
refers to the rows and columns of tables within the warehouse and
how they are laid out. The volume of data currently and in the
future also changes how the warehouse is designed. The model the warehouse will
follow includes all of the tools and constraints of the system,
such as the database itself and any analysis tools that will be
incorporated into the system. Let’s return to our bookstore example
to develop its data warehouse. We first need to work with stakeholders
to determine their business needs. You’ll have an opportunity to learn
more about gathering information from stakeholders later. But for now let’s say they tell
us that they’re interested in measuring store profitability and website traffic in order to evaluate
the effectiveness of annual promotions. Now we can look at the shape of the data. Consider the business processes or
events that are being captured by tables in the system. Because
this is a retail store, the primary business process is sales. We could have a sales table that includes
information such as quantity ordered, total base amount, total tax amount,
total discounts, and total net amount. These are the facts. As a refresher, a fact is a measurement or
metric used in the business process. These facts could be related to a series
of dimension tables that provide more context. For instance, store,
customer, product, promotion, time, stock, or
currency could all be dimensions. The information in these tables gives more
context to our fact tables which record the business processes and events. Notice how this data model
is starting to shape up. There are several dimension tables all
connected to a fact table at the center and this means we just
created a star schema. With this model,
you can answer the specific question about the effectiveness of annual promotions and also generate a dashboard with
other KPIs and drill-down reports. In this case, we started with
the business’s specific needs, looked at the data dimensions we had, and organized them into tables
that formed relationships. Those relationships helped us determine
that a star schema will be the most useful way to organize this data warehouse. Understanding the logic behind
data warehouse design will help you develop effective BI processes and
systems. Coming up, you’re going to work more
with database schemas and learn about how data is pulled into
the warehouse from other sources.

Video: Design useful database schemas

A database schema is a way of describing how data is organized. It doesn’t actually contain the data itself, but describes how the data is shaped and the relationships within the database.

A database schema should include the following four elements:

  • Relevant data: The schema should include all of the data being described. Otherwise, it won’t be a very useful guide for users trying to understand how the data is laid out.
  • Names and data types for each column: The schema should include the column names and the datatype to indicate what data belongs there.
  • Consistent formatting: The schema should include consistent formatting across all of the data entries in the database. This means using the same data types for the same columns, and formatting the data in a consistent way.
  • Unique keys for each entry: The schema should include unique keys for each entry within the database. This helps to ensure that the data is accurate and consistent.

A database schema is an important part of any BI project. It helps to ensure that the data is organized in a way that is efficient and easy to analyze.
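
A short sketch of how those four elements can appear in a single SQL table definition; the table and columns are illustrative only:

  CREATE TABLE customer_orders (
    order_id      INT PRIMARY KEY,        -- a unique key for every entry
    customer_name VARCHAR(100) NOT NULL,  -- a name and data type for each column
    order_date    DATE NOT NULL,          -- one consistent date format
    order_total   DECIMAL(10, 2)          -- one consistent numeric format
  );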

How to design useful database schemas in business intelligence

A database schema is a blueprint for how data is organized in a database. It defines the tables, columns, and relationships between them. A well-designed schema is essential for efficient and effective business intelligence (BI).

Here are some tips for designing useful database schemas in BI:

  1. Understand the business needs. What questions do you want to be able to answer with the data? What reports do you need to generate? Once you understand the business needs, you can start to identify the entities and relationships that need to be represented in the schema.
  2. Choose the right data model. There are a number of different data models that can be used for BI, such as star schemas, snowflake schemas, and fact tables. The best data model for your needs will depend on the specific questions you want to answer and the type of data you have.
  3. Normalize the data. Normalization is the process of organizing data into tables in a way that reduces redundancy and improves data integrity. There are a number of different normalization levels, but it is generally recommended to normalize data to at least third normal form (3NF) for BI.
  4. Implement naming conventions. Consistent naming conventions will make the schema easier to understand and maintain. For example, you may want to use all caps for table names and lowercase with underscores for column names.
  5. Document the schema. It is important to document the schema so that users understand how the data is organized and how to use it. The documentation should include information about the tables, columns, relationships, and data types.

Here are some additional tips for designing useful database schemas in BI:

  • Use descriptive column names.
  • Avoid using reserved words in column names.
  • Use surrogate keys for primary keys.
  • Use foreign keys to define relationships between tables.
  • Create indexes on frequently queried columns.
  • Consider using partitioning to improve performance for large datasets.
  • Test the schema thoroughly before deploying it to production.

By following these tips, you can design database schemas that are useful for BI. This will help you to extract valuable insights from your data and make better decisions.

Here are some examples of useful database schemas for BI:

  • Star schema: A star schema is a common data model for BI. It consists of a fact table and one or more dimension tables. The fact table contains the quantitative data, such as sales figures or customer churn rates. The dimension tables contain the qualitative data, such as customer demographics or product categories.
  • Snowflake schema: A snowflake schema is a more complex data model than a star schema. It consists of a fact table and multiple layers of dimension tables. The dimension tables are related to each other in a hierarchical fashion.
  • Fact table: Both of these schemas are organized around a fact table, which contains the quantitative data for a particular business process. For example, a sales fact table might contain data about sales orders, such as product ID, customer ID, and order amount.

The best data model for your needs will depend on the specific questions you want to answer and the type of data you have. However, the tips above will help you to design a database schema that is useful for BI.
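
As an illustration, here is a minimal sketch of what a star schema might look like when implemented, using Python's built-in sqlite3 module as a stand-in for a real data warehouse. All table and column names are hypothetical; the surrogate keys, foreign keys, and index reflect the tips listed above.

import sqlite3

# A minimal star schema sketch for a bookstore-style sales warehouse.
# Surrogate keys serve as primary keys, and the fact table uses foreign
# keys to point to each dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_name TEXT,
    customer_region TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    product_category TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT,
    month TEXT,
    year INTEGER
);
CREATE TABLE fact_sales (
    sale_key INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    sale_amount REAL                    -- quantitative measure
);
-- Index a frequently queried foreign key column
CREATE INDEX idx_fact_sales_date ON fact_sales(date_key);
""")
conn.close()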

Earlier, we learned about what considerations go into
designing data warehouses. Based on the business
needs and the shape of the data in
our previous example, we created the dimensional
model with a star schema. That process is sometimes
called logical data modeling. This involves representing
different tables in the physical data model. Decisions have to be made about how a system will
implement that model. In this video, we’re
going to learn more about what a schema needs to have
for it to be functional. Later, you will use your
database schema to validate incoming data to prevent system errors and ensure
that the data is useful. For all of these reasons, it’s important to
consider the schema early on in any BI project. There are four elements a
database schema should include: the relevant data, names and data types for each column in each table, consistent formatting across data entries, and unique keys for every database entry and object. As we've already learned, a database schema is a way of describing how
data is organized. It doesn’t actually
contain the data itself, but describes how the data is shaped and the relationships
within the database. It needs to include all of
the data being described. Or else it won’t be a
very useful guide for users trying to understand
how the data is laid out. Let’s return to our
bookstore database example. We know that our data contains a lot of information
about the promotions, customers, products,
dates, and sales. If our schema doesn’t
represent that, then we’re missing
key information. For instance, it’s often necessary for a BI
professional to add new information to an
existing schema if the current schema can’t answer a specific
business question. If the business wants to know which customer service employee responded to the most requests, we would need to add
that information to the data warehouse and update
the schema accordingly. The schema also needs
to include names and data types for each column in each table within
the database. Imagine if you didn’t organize
your kitchen drawers, it would be really difficult
to find anything if all of your utensils were
just thrown together. Instead, you probably have a specific place where you keep your spoons, forks and knives. Columns are like your
kitchen drawer organizers. They enable you to
know what items go where in order to keep
things functioning. Your schema needs to
include the column names and the data type to indicate
what data belongs there. In addition to making
sure the schema includes all of
the relevant data, names and data types
for each column, it’s also important to
have consistent formatting across all of the data
entries in the database. Every data entry is an instance of the schema. For example, imagine we have two transactional systems that we're combining into one database. One tracks the promotions sent to users, and the other tracks sales to customers. In the source systems, the marketing system that tracks promotions could have a user ID column, while the sales system has customer ID instead. To be consistent in our warehouse schema, we'll want to use just one of these columns. In the schema for this database, we might have a column in one of our tables for product prices. If this data is stored as string-type data instead of numerical data, it can't be used in calculations such as adding sales together in a query. Additionally, if any
of the data entries have columns that are
empty or missing values, this might cause issues. Finally, it’s important
that there are unique keys for each entry
within the database. We covered primary and foreign
keys in previous videos. These are what build connections between
tables and enable us to combine relevant data from across the entire database. In summary, in order for a
database schema to be useful, it should contain the relevant
data from the database, the names and data types for
each column in each table, consistent formatting across all of the entries within the database, and unique
keys connecting the tables. These four elements
will ensure that your schema continues
to be useful. Developing your schema
is an ongoing process. As your data or
business needs change, you can continue to adapt the database schema to
address these needs. More to come on that soon.

Reading: Four key elements of database schemas

Reading

Reading: Review a database schema

Reading

Practice Quiz: Test your knowledge: Choose the right database

When designing a data warehouse, BI professionals take into account which of the following considerations? Select all that apply.

Fill in the blank: Logical data modeling involves representing different _ in a physical data model.

A BI professional considers the relevant data for a project, the names and data types of table columns, formatting of data entries, and unique keys for database entries and objects. What will these activities enable them to accomplish?

How data moves


Video: Data pipelines and the ETL process

Data pipelines are a series of processes that transport data from different sources to their final destination for storage and analysis. They automate the flow of data from sources to targets while transforming the data to make it useful as soon as it reaches its destination.

Data pipelines are used to:

  • Save time and resources
  • Make data more accessible and useful
  • Define what, where, and how data is combined
  • Automate the processes involved in extracting, transforming, combining, validating, and loading data for further analysis and visualization
  • Eliminate errors and combat system latency

Data pipelines can pull data from multiple sources, consolidate it, and then migrate it over to its proper destination. These sources can include relational databases, a website application with transactional data, or an external data source.

Data pipelines are often used in conjunction with ETL, which stands for extract, transform, and load. ETL is a type of data pipeline that enables data to be gathered from source systems, converted into a useful format, and brought into a data warehouse or other unified destination system.

Example:

An online streaming service wants to create a data pipeline to understand its viewers' demographics to inform marketing campaigns. The stakeholders are interested in monthly reports.

The data pipeline would be set up to automatically pull in the data from the source systems at monthly intervals. Once the data is ingested, the pipeline would perform some transformations to clean and standardize it. The transformed data would then be loaded into target tables that have already been set up in the database.

Once the data pipeline is built, it can be scheduled to automatically perform tasks on a regular basis. This means that BI team members can focus on drawing business insights from the data rather than having to repeat the process of extracting, transforming, and loading the data over and over again.

Conclusion:

Data pipelines are a valuable tool for BI professionals. They can save time and resources, make data more accessible and useful, and help to eliminate errors and combat system latency.

What is a data pipeline?

A data pipeline is a series of processes that transport data from different sources to their final destination for storage and analysis. Data pipelines automate the flow of data from sources to targets while transforming the data to make it useful as soon as it reaches its destination.

What is the ETL process?

The ETL process stands for extract, transform, and load. It is a type of data pipeline that enables data to be gathered from source systems, converted into a useful format, and brought into a data warehouse or other unified destination system.

How do data pipelines and the ETL process work together?

Data pipelines and the ETL process work together to ensure that data is properly extracted, transformed, and loaded into a destination system where it can be used for analysis and reporting.

The following steps outline a typical data pipeline process:

  1. Extract: The data is extracted from the source systems.
  2. Transform: The data is transformed into a format that is compatible with the destination system and meets the needs of the business analysts and other users. This may involve cleaning and standardizing the data, aggregating or summarizing the data, and converting the data to different data types.
  3. Load: The data is loaded into the destination system. This may be a data warehouse, data lake, or data mart.
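
Here is a minimal sketch of these three steps in Python, assuming a hypothetical CSV export as the source system and a SQLite database standing in for the destination. A production pipeline would typically use dedicated ETL or pipeline tooling, but the flow is the same.

import sqlite3
import pandas as pd

# 1. Extract: read raw data from the source system (hypothetical file name).
raw_sales = pd.read_csv("sales_export.csv")

# 2. Transform: clean, standardize, and summarize the data.
raw_sales["sale_date"] = pd.to_datetime(raw_sales["sale_date"])
clean = raw_sales.dropna(subset=["sale_amount"]).copy()
clean["sale_month"] = clean["sale_date"].dt.to_period("M").astype(str)
monthly_sales = clean.groupby("sale_month", as_index=False)["sale_amount"].sum()

# 3. Load: write the transformed data to a target table in the destination.
conn = sqlite3.connect("warehouse.db")  # hypothetical destination database
monthly_sales.to_sql("monthly_sales", conn, if_exists="replace", index=False)
conn.close()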

The following are some of the benefits of using data pipelines and the ETL process:

  • Improved data quality: Data pipelines and the ETL process can help to improve the quality of data by cleaning and standardizing it. This can lead to more accurate and reliable insights.
  • Reduced time and effort: Data pipelines and the ETL process can automate the tasks of extracting, transforming, and loading data. This can free up time and resources for business analysts and other users to focus on more important tasks.
  • Increased scalability: Data pipelines and the ETL process can help to scale data operations. This is important for businesses that are growing rapidly or that need to process large volumes of data.

How to build a data pipeline

There are a number of different ways to build a data pipeline. The best approach will vary depending on the specific needs of the business. However, there are some general steps that can be followed:

  1. Identify the data sources and destination system: The first step is to identify the data sources and the destination system. The data sources may include relational databases, CRM systems, ERP systems, and other applications. The destination system is the system where the data will be stored and analyzed.
  2. Design the data pipeline: Once the data sources and destination system have been identified, the next step is to design the data pipeline. This involves determining how the data will be extracted from the source systems, transformed, and loaded into the destination system.
  3. Implement the data pipeline: Once the data pipeline has been designed, it can be implemented using a variety of tools and technologies. There are a number of commercial and open source data pipeline tools available.
  4. Test and monitor the data pipeline: Once the data pipeline has been implemented, it is important to test and monitor it to ensure that it is working properly. This includes testing the data extraction, transformation, and loading processes, as well as monitoring the performance of the pipeline.

Conclusion

Data pipelines and the ETL process are essential tools for business intelligence. They can help to improve the quality, scalability, and efficiency of data operations. This can lead to more accurate and reliable insights, as well as increased agility and competitiveness.

What are some of the key processes performed with data pipelines? Select all that apply.
  • Define what, where, and how data is combined
  • Help eliminate errors and latency
  • Automate the extraction, transformation, combination, validation, and loading of data

Data pipelines are used to define what, where, and how data is combined. They automate the processes involved in extracting, transforming, combining, validating, and loading of data. They also help eliminate errors and latency.

So far, we’ve been
learning a lot about how data is
organized and stored within data warehouses and how schemas describe those systems. Part of your job as
a BI professional is to build and maintain
a data warehouse, taking into consideration
all of these systems that exist and are collecting
and creating data points. To help smooth this process, we use data pipelines. As a refresher, a data pipeline is a series of processes
that transports data from different sources to their final destination
for storage and analysis. This automates the flow of
data from sources to targets while transforming
the data to make it useful as soon as it
reaches its destination. In other words,
data pipelines are used to get data from point A to point B automatically, save time and resources, and make data more accessible and useful. Basically, data pipelines define what, where, and how data is combined. They automate the processes involved in extracting,
transforming, combining, validating, and loading data for further
analysis and visualization. Effective data
pipelines also help eliminate errors and
combat system latency. Having to manually move data over and over whenever
someone asks for it or to update a report repeatedly would be
very time-consuming. For example, if a
weather station is getting daily information
about weather conditions, it will be difficult
to manage it manually because of
the sheer volume. They need a system that takes in the data and gets it where it needs to go so it can be
transformed into insights. One of the most
useful things about a data pipeline is that it can pull data from multiple sources, consolidate it, and then migrate it over to its
proper destination. These sources can include
relational databases, a website application with transactional data or an
external data source. Usually, the pipeline has a push mechanism that
enables it to ingest data from multiple
sources in near real time or at regular intervals. Once the data has been
pulled into the pipeline, it can be loaded to
its destination. This could be a data warehouse, data lake or data mart, which we’ll learn
more about coming up. Or it can be pulled
directly into a BI or analytics application
for immediate analysis. Often while data is being
moved from point A to point B, the pipeline is also
transforming the data. Transformations include
sorting, validation, and verification, making
the data easier to analyze. This process is
called the ETL system. ETL stands for extract,
transform, and load. This is a type of
data pipeline that enables data to be gathered
from source systems, converted into a useful format, and brought into
a data warehouse or other unified
destination system. ETL is becoming more and more standard for
data pipelines. We’re going to learn
more about it later on. Let’s say a business
analyst has data in one place and needs to
move it to another; that's where a data
pipeline comes in. But a lot of the time, the structure of the
source system isn't ideal for analysis, which is why a BI professional
wants to transform that data before it gets
to the destination system and why having set
database schemas already designed and ready to receive data is so important. Let’s now explore these steps
in a little more detail. We can think of a data pipeline functioning in three stages, ingesting the raw data, processing and consolidating
it into categories, and dumping the data into reporting tables that
users can access. These reporting tables are
referred to as target tables. Target tables are the
predetermined locations where pipeline data is sent
in order to be acted on. Processing and transforming data while it’s being moved is important because it ensures the data is ready to be
used when it arrives. But let’s explore this
process in action. Say we’re working with an
online streaming service to create a data pipeline. First, we’ll want to consider the end goal
of our pipeline. In this example, our
stakeholders want to understand their viewers demographics to
inform marketing campaigns. This includes information about their viewers ages
and interests, as well as where
they are located. Once we’ve determined what
the stakeholders goal is, we can start thinking
about what data we need the pipeline to ingest. In this case, we’re going to want demographic data
about the customers. Our stakeholders are
interested in monthly reports. We can set up our
pipeline to automatically pull in the data we want
at monthly intervals. Once the data is ingested, we also want our pipeline to perform some transformations, so that it’s clean
and consistent once it gets delivered
to our target tables. Note that these tables
would have already been set up within our
database to receive the data. Now, we have our customer
demographic data and their monthly
streaming habits in one table ready for
us to work with. The great thing
about data pipelines is that once they’re built, they can be scheduled
to automatically perform tasks on
a regular basis. This means BI team members can focus on drawing
business insights from the data rather than having to repeat this process
over and over again. As a BI professional, a big part of your job will involve creating these systems, ensuring that they’re
running correctly, and updating them whenever
business needs change. That's a valuable benefit that your team will really appreciate.

Video: Maximize data through the ETL process

ETL is a type of data pipeline that enables data to be gathered from source systems, converted into a useful format, and brought into a data warehouse or other unified destination system. ETL processes work in three stages: extract, transform, and load.

Extract: The pipeline accesses source systems and reads and collects the necessary data.

Transform: The data is validated, cleaned, and prepared for analysis. The data types are also mapped from the sources to the target systems.

Load: The data is delivered to its target destination, such as a data warehouse, data lake, or analytics platform.
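
For example, the transform stage often includes mapping source data types to the types the target system expects. The sketch below uses pandas with hypothetical column names; a real pipeline would apply the same idea with whatever transformation tool it is built on.

import pandas as pd

# Hypothetical source extract where everything arrives as strings.
source_data = pd.DataFrame({
    "customer_id": ["1001", "1002"],
    "order_amount": ["19.99", "5.50"],
    "order_date": ["2023-06-01", "2023-06-02"],
})

# Map the source columns to the data types the target schema expects.
target_types = {"customer_id": "int64", "order_amount": "float64"}
transformed = source_data.astype(target_types)
transformed["order_date"] = pd.to_datetime(transformed["order_date"])

print(transformed.dtypes)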

ETL processes are a common type of data pipeline that BI professionals often build and interact with.

Example:

A business wants to understand its monthly sales data. The sales data is stored in a transactional database, which is not optimized for analytical queries. The business can use an ETL process to extract the sales data from the transactional database, transform it into a format that is optimized for analysis, and load it into a data warehouse. The business can then use the data warehouse to analyze its monthly sales data and gain insights.

Benefits of ETL:

  • Improved data quality
  • Reduced time and effort
  • Increased scalability
  • Improved accuracy and reliability of insights
  • Increased agility and competitiveness

Tutorial on Maximizing Data through the ETL Process in Business Intelligence

The ETL process is a critical component of any business intelligence (BI) system. It enables organizations to ingest, transform, and load data from a variety of sources into a centralized data warehouse or data lake. This makes the data more accessible and useful for analysis and reporting.

However, simply implementing the ETL process is not enough. Organizations need to carefully plan and execute their ETL processes in order to maximize the value of their data. Here are some tips:

  1. Identify your business goals. What do you want to achieve with your BI system? Once you know your goals, you can tailor your ETL process to ensure that it is aligned with them. For example, if you are interested in analyzing historical sales data, you will need to make sure that your ETL process extracts and loads all of the relevant sales data from your source systems.
  2. Understand your data sources. What data sources do you need to ingest? What is the format of the data in each source system? Once you have a good understanding of your data sources, you can design your ETL process accordingly.
  3. Design your data warehouse or data lake. Where will you store your transformed data? What schema will you use? It is important to design your data warehouse or data lake in a way that is optimized for your BI needs.
  4. Choose the right ETL tools and technologies. There are a variety of ETL tools and technologies available. Choose the ones that are best suited for your specific needs. Consider factors such as cost, scalability, and ease of use.
  5. Implement a data governance framework. Data governance is the process of managing and protecting data throughout its lifecycle. This includes establishing policies and procedures for data access, quality, and security. A data governance framework is essential for ensuring that your data is reliable and trustworthy.

Best practices for maximizing data through the ETL process

Here are some best practices for maximizing data through the ETL process:

  • Use a data catalog. A data catalog is a repository of information about your data assets. It can help you to identify and understand your data sources, as well as the data that they contain.
  • Clean and normalize your data. Data cleaning and normalization are essential for improving the quality of your data. Data cleaning involves removing errors and inconsistencies from the data. Data normalization involves converting the data into a consistent format.
  • Transform your data for analysis. Your ETL process should transform your data into a format that is optimized for analysis. This may involve aggregating, summarizing, or joining data from different sources.
  • Load your data into a data warehouse or data lake. Once your data has been cleaned, transformed, and loaded into a data warehouse or data lake, it can be used for analysis and reporting.
  • Monitor your ETL process. It is important to monitor your ETL process to ensure that it is running smoothly and efficiently. You should also monitor the quality of your data to ensure that it is accurate and reliable.

By following these tips and best practices, organizations can maximize the value of their data through the ETL process. This can lead to improved decision-making, increased efficiency, and competitive advantage.
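
As a small illustration of the "transform your data for analysis" tip above, the following pandas sketch joins two hypothetical source tables on a shared key and aggregates the result, the kind of transformation an ETL process might perform before loading data for reporting.

import pandas as pd

# Two hypothetical source systems: promotions sent and sales recorded.
promotions = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "promotion_sent": ["spring_sale", "spring_sale", "loyalty_offer"],
})
sales = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "sale_amount": [20.0, 35.5, 12.0, 50.0],
})

# Join the two sources on the shared key, then aggregate per promotion.
combined = promotions.merge(sales, on="customer_id", how="left")
sales_by_promotion = (
    combined.groupby("promotion_sent", as_index=False)["sale_amount"].sum()
)
print(sales_by_promotion)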

In which ETL stage would a business intelligence professional map data types from the sources to the target system in order to ensure the data fits the destination?

Transform

In the transform stage, a business intelligence professional maps data types from the sources to the target system in order to ensure the data fits the destination.

We’ve been learning a lot about data pipelines and
how they work. Now, we’re going to discuss a specific kind
of pipeline: ETL. I mentioned previously that ETL enables data to be gathered
from source systems, converted into a useful format, and brought into a
data warehouse or other unified
destination system. Like other pipelines,
ETL processes work in stages and these stages are
extract, transform, and load. Let’s start with extraction. In this stage, the
pipeline accesses the source systems and then reads and collects the necessary
data from within them. Many organizations store their data in
transactional databases, such as OLTP systems, which are great for
logging records. Or maybe the business
uses flat files, for instance, HTML or log files. Either way, ETL makes the
data useful for analysis by extracting it from
its source and moving it into a
temporary staging table. Next we have transformation. The specific
transformation activities depend on the structure and format of the destination and the requirements
of the business case, but as you’ve learned, these transformations
generally include validating, cleaning, and preparing
the data for analysis. This stage is also when the ETL pipeline maps
the data types from the sources to the target systems so the data fits the destination's conventions. Finally, we have
the loading stage. This is when data is delivered
to its target destination. That could be a data
warehouse, a data lake, or an analytics platform that works with direct data feeds. Note that once the data
has been delivered, it can exist within
multiple locations in multiple formats. For example, there could be a snapshot table that
covers a week of data and a larger archive that has some of
the same records. This helps ensure
the historical data is maintained within the system while
giving stakeholders focused, timely data, and if the business
is interested in understanding and comparing
average monthly sales, the data would be moved to an OLAP system that has been optimized for
analysis queries. ETL processes are
a common type of data pipeline that
BI professionals often build and interact with. Coming up, you’re going to learn more about these systems
and how they’re created.

Video: Choose the right tool for the job

This video is about how BI professionals choose the right tool. Here are the key takeaways:

  • Consider the KPIs, how your stakeholders want to view the data, and how the data needs to be moved.
  • KPIs are quantifiable values that are closely linked to the business strategy.
  • Stakeholders might ask for graphs, static reports, or dashboards.
  • Some BI tools include Looker Studio, Microsoft Power BI, and Tableau.
  • Some back-end tools include Azure Analysis Services, Cloud SQL, Pentaho, and SQL Server's SSAS and SSRS.
  • Not all BI tools can read data lakes.
  • Consider how to transfer the data, how it should be updated, and how the pipeline combines with other tools in the data transformation process.
  • You might end up using a combination of tools to create the ideal system.
  • BI tools have common features, so the skills you learn can be used no matter which tools you end up working with.

Choose the right tool for the job in Business Intelligence

Business Intelligence (BI) tools are essential for businesses of all sizes to make better decisions. However, with so many different BI tools on the market, it can be difficult to know which one is right for your business.

In this tutorial, we will discuss the key factors to consider when choosing a BI tool and provide recommendations for some of the most popular BI tools on the market.

Key factors to consider when choosing a BI tool

The following are some of the key factors to consider when choosing a BI tool:

  • Features: What features are important to your business? Some common BI features include data visualization, reporting, dashboards, and analytics.
  • Ease of use: How easy is the tool to use? Consider the skill level of your users when choosing a BI tool.
  • Cost: How much does the tool cost? BI tools can range in price from free to thousands of dollars per month.
  • Scalability: How scalable is the tool? Can it handle the volume and complexity of your data?
  • Integration: Does the tool integrate with your existing systems?

Popular BI tools

Here are some of the most popular BI tools on the market:

  • Tableau: Tableau is a popular BI tool that is known for its ease of use and powerful data visualization capabilities. Tableau is a good choice for businesses of all sizes, but it can be expensive for larger businesses.
  • Microsoft Power BI: Power BI is a powerful BI tool that integrates tightly with the Microsoft 365 ecosystem. Power BI is a good choice for businesses that already use Microsoft products, as it is easy to connect with other Microsoft tools.
  • Looker Studio: Looker Studio is a free BI tool that is easy to use and offers a variety of features, including data visualization, reporting, and dashboards. Looker Studio is a good choice for businesses of all sizes, but it may not be as powerful as some other BI tools.
  • Qlik Sense: Qlik Sense is a powerful BI tool that is known for its ability to handle large and complex data sets. Qlik Sense is a good choice for larger businesses that need a powerful BI tool that can scale with their needs.
  • ThoughtSpot: ThoughtSpot is a powerful BI tool that is known for its ability to perform complex analytics in real time. ThoughtSpot is a good choice for businesses that need a BI tool that can help them make quick decisions based on real-time data.

Choosing the right BI tool for your business

The best way to choose the right BI tool for your business is to consider the key factors discussed above. Think about the features that are important to you, the skill level of your users, your budget, and your scalability needs.

Once you have considered these factors, you can start to narrow down your choices. You may want to read reviews of different BI tools or try out a few different tools before making a decision.

Conclusion

Choosing the right BI tool is an important decision for any business. By considering the key factors discussed above, you can choose a BI tool that will help you make better decisions and improve your business performance.

In previous videos, we’ve been
exploring pipeline processes that ingest data from different sources, transform
it to match the destination formatting, and push it to a final destination where
users can start drawing business insights. BI professionals play a key role in
building and maintaining these processes, and they use a variety of tools
to help them get the job done. In this video, we’ll learn how BI
professionals choose the right tool. As a BI professional, your organization
will likely have preferred vendors, which means you’ll be given
a set of available BI solutions. One of the great things about BI is
that different tools have very similar principles behind them and
similar utility. This is another example
of a transferable skill. In other words, your general understanding
can be applied to other solutions, no matter which ones your
organization prefers. For instance, the first database management system
I learned was Microsoft Access. This experience helped me gain a basic
understanding of how to build connections between tables, and that made learning
new tools more straightforward. Later in my career,
when I started working with MySQL, I was already able to recognize
the underlying principles. Now it’s possible that you’ll
choose the tools you’ll be using. If that’s the case,
you’ll want to consider the KPIs, how your stakeholders want to view the
data, and how the data needs to be moved. As you’ve learned,
a KPI is a quantifiable value closely linked to the business strategy, which
is used to track progress toward a goal. KPIs let us know whether or
not we’re succeeding, so that we can adjust our processes
to better reach objectives. For example, some financial
KPIs are gross profit margin, net profit margin, and return on assets. Or some HR KPIs are rate of promotion and
employee satisfaction. Understanding your organization’s
KPIs means you can select tools based on those needs. Next, depending on how your
stakeholders want to view the data, there are different tools you can choose. Stakeholders might ask for graphs,
static reports, or dashboards. There are a variety of tools, including
Looker Studio, Microsoft Power BI, and Tableau. Some others are Azure Analysis Services, Cloud SQL, Pentaho, and SQL Server's SSAS and SSRS,
which all have reporting tools built in. That’s a lot of options. You’ll get more insights about
these different tools later on. After you’ve thought about how your
stakeholders want to view the data, you’ll want to consider
your back-end tools. This is when you think about
how the data needs to be moved. For example,
not all BI tools can read data lakes. So, if your organization uses
data lakes to store data, then you need to make sure you
choose a tool that can do that. Some other important considerations when
choosing your back-end tools include how to transfer the data,
how it should be updated, and how the pipeline combines with other
tools in the data transformation process. Each of these points helps you
determine must haves for your toolset, which leads to the best options. Also, it’s important to know that you
might end up using a combination of tools to create the ideal system. As you’ve been learning, BI tools have
common features, so the skills you learn in these courses can be used no matter
which tools you end up working with. Going back to my example, I was able to understand the logic behind
transforming and combining tables, whether I was using Microsoft Access or
MySQL. This foundation has transferred across
the different BI tools I’ve encountered throughout my career. Coming up, you’ll learn more about the solutions
that you might work with in the future. You’ll also start getting
hands on with some data soon.

Reading: Business intelligence tools and their applications

Reading

Reading: ETL-specific tools and their applications

Reading

Practice Quiz: Test your knowledge: How data moves

What is the term for the predetermined locations where pipeline data is sent in order to be acted on?

A BI professional uses a pipeline to access source systems, then reads and collects the necessary data from within them. Which ETL stage does this scenario describe?

Many BI tools are built upon similar principles and often have similar utilities. Therefore, a BI professional’s general understanding of one tool can be applied to others. What is this an example of?

Data-processing with Dataflow


Video: Introduction to Dataflow

Google Dataflow is a serverless data processing service that can be used to create data pipelines. Dataflow pipelines can be created using Python, SQL, or pre-built templates. Dataflow also includes security features to help keep data safe.

Key points:

  • Dataflow pipelines are made up of steps that read data from a source, transform it, and write it to a destination.
  • Dataflow can be used to perform a variety of data processing tasks, such as batch processing, stream processing, and machine learning.
  • Dataflow pipelines can be created using Python, SQL, or pre-built templates.
  • Dataflow includes security features to help keep data safe.

How to use Dataflow:

  1. Log in to Google Dataflow.
  2. Go to the jobs page.
  3. Create a job from template or from SQL.
  4. Build your pipeline.
  5. Run your pipeline.

Additional tips:

  • Use snapshots to save the current state of your pipeline.
  • Use the pipeline section to view a list of your pipelines.
  • Use the notebook section to create and share Jupyter Notebooks.
  • Use the SQL workspace to write and execute SQL queries.

Introduction to Dataflow in Business Intelligence

Dataflow is a critical component of business intelligence (BI). It allows businesses to collect, transform, and load data from a variety of sources into a single, unified view. This unified view of data can then be used to create dashboards, reports, and other BI insights.

Benefits of using Dataflow in BI

There are several benefits to using Dataflow in BI, including:

  • Scalability: Dataflow can scale to handle large volumes of data, making it ideal for enterprise BI applications.
  • Flexibility: Dataflow can be used to transform data in a variety of ways, making it suitable for a wide range of BI use cases.
  • Reliability: Dataflow is a reliable and secure service, making it ideal for mission-critical BI applications.

How to use Dataflow in BI

To use Dataflow in BI, you will need to:

  1. Identify your data sources. Dataflow can read data from a variety of sources, including relational databases, cloud storage, and streaming data sources.
  2. Define your data transformations. Dataflow can be used to perform a variety of data transformations, such as cleaning, filtering, and aggregating data.
  3. Create a Dataflow pipeline. A Dataflow pipeline is a sequence of steps that read data from a source, transform it, and write it to a destination.
  4. Run your Dataflow pipeline. Once you have created a Dataflow pipeline, you can run it to transform your data.
  5. Load your transformed data into your BI system. Once your data has been transformed, you can load it into your BI system to create dashboards, reports, and other BI insights.

Here is an example of how Dataflow can be used in BI:

A retail company wants to use Dataflow to create a BI dashboard that shows sales data by product category and region. The company has sales data stored in a relational database. The company also wants to include data from its e-commerce platform, which is stored in cloud storage.

To create the BI dashboard, the company would first need to create a Dataflow pipeline. The pipeline would read data from the relational database and the cloud storage. The pipeline would then transform the data to match the format required by the BI dashboard. Finally, the pipeline would write the transformed data to a destination, such as a data warehouse or BigQuery.

Once the data has been loaded into the destination, the company can use a BI tool to create a dashboard that shows sales data by product category and region.
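
A pipeline like the one described above could be expressed with Apache Beam, the open-source SDK that Dataflow runs. The sketch below is only illustrative: the bucket path, project, dataset, table, and column layout are hypothetical, and the pipeline options would need a Dataflow runner, project, and region to run on the service.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_row(line):
    """Turn one CSV line into a dictionary matching the target table's columns."""
    category, region, amount = line.split(",")
    return {"product_category": category, "region": region, "sales": float(amount)}

# PipelineOptions would also carry the runner, project, and region
# when the pipeline is submitted to the Dataflow service.
options = PipelineOptions()

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read source file" >> beam.io.ReadFromText(
            "gs://example-bucket/ecommerce_sales.csv", skip_header_lines=1)
        | "Transform rows" >> beam.Map(parse_row)
        | "Load to BigQuery" >> beam.io.WriteToBigQuery(
            "example-project:sales_dataset.sales_by_region",
            schema="product_category:STRING,region:STRING,sales:FLOAT",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
    )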

Conclusion

Dataflow is a powerful tool that can be used to create a scalable, flexible, and reliable BI data pipeline. By using Dataflow, businesses can collect, transform, and load data from a variety of sources into a single, unified view. This unified view of data can then be used to create dashboards, reports, and other BI insights.

Recently, you were introduced to data pipelines. You learned that many of the procedures and
understandings involved in one pipeline tool can be transferred
to other solutions. So in this course we’re going
to be using Google Dataflow. But even if you end up working with
a different pipeline tool, the skills and steps involved here will be very useful. And using Google Dataflow now will
be a great opportunity to practice everything you’ve learned so far. We’ll start by introducing you to data
flow and going over its basic utilities. Later on you’ll use this tool to
complete some basic BI tasks and set up your own pipeline. Google Data Flow is a serverless
data-processing service that reads data from the source, transforms it, and
writes it in the destination location. Dataflow creates pipelines with open
source libraries which you can interact with using different languages
including Python and SQL. Dataflow includes a selection of pre-built
templates that you can customize or you can use SQL statements
to build your own pipelines. The tool also includes security
features to help keep your data safe. Okay, let’s open Dataflow and
explore it together now. First, we’ll log in and go to the console. Once the console is open,
let’s find the jobs page. If this is your first time using Dataflow,
it will say no jobs to display. The jobs page is where we’ll find
current jobs in our project space. There are options to create jobs from
template or create jobs from SQL. Snapshots save the current state
of a streaming pipeline so that you can start a new version
without losing the current one. This is great for testing your pipelines,
updating them seamlessly for users and backing up and recovering old versions. The pipeline section contains a list
of the pipelines you’ve created. Again, if this is your first time using
Dataflow, it will display the processes you need to enable before you
can start building pipelines. Now is a great time to do that. Just click fix all to enable the API
features and set your location. The Notebook section
enables you to create and save shareable Jupyter Notebooks
with live code. This is useful for first time ETL
tool users to check out examples and visualize the transformations. Finally, we have the SQL workspace. If you’ve worked with BigQuery before, such as in the Google Data Analytics
Certificate, this will be familiar. This is where you write and execute SQL
queries while working within Dataflow and there you go. Now you can log into Google Dataflow and
start exploring it on your own. We’ll have many more opportunities
to work with this tool soon.

Practice Quiz: [Optional] Activity: Create a Google Cloud account

Reading: Guide to Dataflow

Practice Quiz: [Optional] Activity: Create a streaming pipeline in Dataflow

Video: Coding with Python

Python is a popular programming language that is well-suited for business intelligence (BI). It is a general-purpose language that can be used to connect to databases, develop pipelines, and process big data.

Python is primarily object-oriented and interpreted. This means that it is modeled around data objects and that it is executed by an interpreter rather than being compiled.

One of the most valuable things about Python for BI is its ability to create and save data objects. This allows BI professionals to interact with data in a flexible and efficient way.

Python can also be used to create notebooks, which are interactive programming environments for creating data reports. This can be a great way to build dynamic reports for stakeholders.

Google Dataflow is a serverless data processing service that can be used to create data pipelines. Python can be used to write Dataflow pipelines, which allows BI professionals to take advantage of the scalability, flexibility, and reliability of Dataflow.

Key takeaways:

  • Python is a popular programming language that is well-suited for BI.
  • Python is primarily object-oriented and interpreted.
  • Python can be used to create and save data objects, which is valuable for BI.
  • Python can be used to create notebooks, which are interactive programming environments for creating data reports.
  • Python can be used to write Dataflow pipelines, which allows BI professionals to take advantage of the scalability, flexibility, and reliability of Dataflow.

Coding with Python in Business Intelligence

Python is a powerful programming language that can be used for a variety of tasks in business intelligence (BI). It can be used to connect to databases, manipulate data, and create visualizations. Python is also a popular language for developing machine learning models, which can be used to automate BI tasks and generate insights.

Here are some of the ways that Python can be used in BI:

  • Connecting to databases: Python can be used to connect to a variety of databases, including relational databases, cloud databases, and big data databases. This allows BI professionals to access and analyze data from a variety of sources.
  • Manipulating data: Python has a number of libraries that can be used to manipulate data, such as NumPy and Pandas. These libraries allow BI professionals to clean, filter, and aggregate data in a variety of ways.
  • Creating visualizations: Python can be used to create a variety of visualizations, such as charts, graphs, and maps. This allows BI professionals to communicate insights to stakeholders in a visually appealing and informative way.
  • Developing machine learning models: Python is a popular language for developing machine learning models. These models can be used to automate BI tasks, such as forecasting and anomaly detection. Python also has a number of libraries that make it easy to deploy machine learning models to production.

Here is a simple example of how to use Python to connect to a database and query data:

import pymysql

# Connect to the database
conn = pymysql.connect(host='localhost', user='root', password='', db='mydb')

# Create a cursor
cur = conn.cursor()

# Execute a query
cur.execute('SELECT * FROM customers')

# Fetch the results
rows = cur.fetchall()

# Close the cursor and connection
cur.close()
conn.close()

# Print the results
for row in rows:
    print(row)

This code will connect to a MySQL database called mydb and execute a query to select all rows from the customers table. The results of the query will then be printed to the console.

Here is a more complex example of how to use Python to manipulate data and create a visualization:

import pandas as pd
import matplotlib.pyplot as plt

# Read the data from a CSV file
df = pd.read_csv('sales_data.csv')

# Calculate the total sales for each product category
product_sales = df.groupby('product_category')['sales'].sum()

# Create a bar chart of the product sales
plt.bar(product_sales.index, product_sales.values)

# Set the chart title and labels
plt.title('Product Sales')
plt.xlabel('Product Category')
plt.ylabel('Sales')

# Show the chart
plt.show()

This code will read the sales data from a CSV file and calculate the total sales for each product category. The results will then be used to create a bar chart of the product sales.

These are just a few examples of how Python can be used in BI. Python is a powerful tool that can be used to automate tasks, generate insights, and communicate findings to stakeholders.

Here are some tips for using Python in BI:

  • Use libraries: There are a number of Python libraries that can be used for BI tasks. These libraries can save you time and effort by providing pre-written code for common tasks.
  • Start small: Don’t try to do too much too soon. Start with simple tasks and gradually work your way up to more complex tasks.
  • Get help: There is a large community of Python users who are willing to help. If you get stuck, don’t be afraid to ask for help online or in forums.

Learning to use Python in BI can be a rewarding experience. By learning Python, you can automate your work, generate insights, and communicate your findings more effectively.

If you’re coming into
these courses from the Google Data
Analytics Certificate, or if you’ve been working
with relational databases, you’re probably familiar with
the query language, SQL. Query languages are specific computer
programming languages used to communicate
with a database. As a BI professional, you may be expected to use other kinds of programming
languages too. That’s why in this video, we’ll explore one of the most popular
programming languages out there, Python. A programming language
is a system of words and symbols used to write instructions
that computers follow. There are lots of different
programming languages, but Python was specifically
developed to enable users to write commands in fewer lines
than most other languages. Python is also open source, which means it’s freely
available and may be modified and shared by
the people who use it. There’s a large community
of Python users who develop tools and libraries
to make Python better, which means there are
a lot of resources available for BI
professionals to tap into. Python is a general purpose
programming language that can be applied to
a variety of contexts. In business intelligence,
it’s used to connect to a database system to
read and modify files. It can also be combined with other software tools to develop pipelines and it can even process big data and
perform calculations. There are a few key things
you should understand about Python as you begin
your programming journey. First, it is primarily
object-oriented and interpreted. Let’s first understand what it means to be object-oriented. Object-oriented
programming languages are modeled around data objects. These objects are chunks of code that capture
certain information. Basically, everything in
the system is an object, and once data has been
captured within the code, it’s labeled and defined
by the system so that it can be used again later without having to
re-enter the data. Because Python has been adopted pretty broadly by
the data community, a lot of libraries have been developed to pre-define
data structures and common operations that you can apply to the
objects in your system. This is extremely useful
when you need to repeat analysis or even use the same transformations
for multiple projects. Not having to re-enter the
code from scratch saves time. Note that object-oriented
programming languages differ from functional
programming languages, which are modeled
around functions. While Python is primarily
object-oriented, it can also be used as a functional
programming language to create and apply functions. Part of the reason Python is so popular is that it’s flexible. But for BI, the really valuable thing
about Python is its ability to create and save data objects that can then be
interacted with via code. Now, let’s consider
the fact that Python is an
interpreted language. Interpreted languages are
programming languages that use an interpreter, typically another program, to read and execute
coded instructions. This is different from a
compiled programming language, which compiles
coded instructions that are executed directly
by the target machine. One of the biggest
differences between these two types of
programming languages is that the compiled code executed by the machine is almost
impossible for humans to read. Since Python is an interpreted language, it's very useful for BI
professionals because it enables them to use language
in an interactive way. For example, Python can be
used to make notebooks. A notebook is an interactive, editable programming environment for creating data reports. This can be a great way to build dynamic reports
for stakeholders. Python is a great tool to
have in your BI toolbox. There’s even an option to use Python commands in
Google Dataflow. Pretty soon, you’ll get to check it out for
yourself when you start writing Python in
your Dataflow workspace.

Reading: Python applications and resources

Reading

Organize data in BigQuery


Video: Gather information from stakeholders

Before building BI processes for stakeholders, BI professionals need to gather information about the current processes in place, the stakeholders’ goals, metrics, and final target tables. They also need to identify the stakeholders’ assumptions and biases about the project. BI professionals can do this by creating a presentation and leading a workshop session with the different teams, observing the stakeholders at work, and asking them questions.

Tutorial on how to gather information from stakeholders in Business Intelligence

Gathering information from stakeholders is an essential step in any Business Intelligence (BI) project. By understanding the needs of your stakeholders, you can ensure that your BI solutions are aligned with their goals and that they will have the data they need to make informed decisions.

Here are some tips on how to gather information from stakeholders in BI:

1. Identify your stakeholders.

The first step is to identify all of the stakeholders who will be impacted by your BI project. This could include executives, managers, analysts, and other employees who rely on data to make decisions.

2. Understand their needs and goals.

Once you have identified your stakeholders, you need to understand their individual needs and goals. What kind of data do they need? What questions are they trying to answer? What are their biggest pain points?

3. Use a variety of data gathering methods.

There are a variety of different ways to gather information from stakeholders. You can conduct interviews, surveys, workshops, or simply have informal conversations.

4. Be specific.

When gathering information from stakeholders, be as specific as possible. Don’t just ask them what data they need. Ask them specific questions about the types of reports they would like to see, the metrics they would like to track, and the insights they would like to gain from the data.

5. Be clear and concise.

When communicating with stakeholders, be clear and concise. Avoid using jargon or technical terms that they may not understand.

6. Be responsive.

Be responsive to stakeholder feedback. If they have any questions or concerns, be sure to address them promptly.

Here are some specific examples of questions you can ask stakeholders when gathering information:

  • What are your biggest pain points when it comes to data?
  • What kind of data do you need to make better decisions?
  • What questions are you trying to answer with data?
  • What metrics are most important to you?
  • What reports would you like to see?
  • How would you like to be able to access and interact with data?
  • What are your goals for this BI project?

By gathering information from stakeholders and understanding their needs, you can ensure that your BI solutions are successful.

Here are some additional tips for gathering information from stakeholders in BI:

  • Use a data dictionary. A data dictionary is a document that describes the data that is used in a BI system. It can be a helpful tool for communicating with stakeholders and ensuring that everyone is on the same page.
  • Create a data governance plan. A data governance plan defines the policies and procedures for managing data in a BI system. It can help to ensure that the data is accurate, reliable, and secure.
  • Provide training to stakeholders. Once you have implemented your BI solutions, it is important to provide training to stakeholders on how to use them. This will help them to get the most out of the data and to make better decisions.

By following these tips, you can effectively gather information from stakeholders in BI and ensure that your solutions meet their needs.

You’ve already
learned quite a bit about the different
stakeholders that a BI professional might work with in an organization and how
to communicate with them. You’ve also learned that gathering information
from stakeholders at the beginning of a project is an essential step
of the process. Now that you understand
more about pipelines, let’s consider what information
you need to gather from stakeholders before building
BI processes for them, that way you’ll know
exactly what they need and can help make their work
as efficient as possible. Part of your job as
a BI professional is understanding the
current processes in place and how you can integrate BI tools into those
existing workstreams. Oftentimes in BI, you aren’t just trying to answer individual questions every day, you’re trying to find
out what questions your team is asking so that you can build them
a tool that enables them to get that
information themselves. It’s rare for people
to know exactly what they need and
communicate that to you. Instead, they will
usually come to you with a list of
problems or symptoms, and it’s your responsibility to figure out how to help them. Stakeholders who are
less familiar with data simply don’t know what BI
processes are possible. This is why cross business
alignment is so important. You want to create a
user-centered design where all of the requirements for
the entire team are met, that way your solutions address
everyone’s needs at once, streamlining their
processes as a group. It can be challenging
to figure out what all of your different
stakeholders require. One option is to
create a presentation and lead a workshop session
with the different teams. This can be a great
way to support cross business alignment and
determine everyone’s needs. It’s also very helpful to
spend some time observing your stakeholders
at work and asking them questions about what
they’re doing and why. In addition, it’s important to establish the metrics
and what data the target table should contain early on with cross
team stakeholders. This should be done before
you start building the tools. As you’ve learned, a metric is a single quantifiable data point that is used to
evaluate performance. In BI, the metrics businesses are usually interested in are KPIs that help them assess how successful they are at
achieving certain goals. Understanding those goals
and how they can be measured is an
important first step in building a BI tool. You also know that
target tables are the final destination
where data is acted on. Understanding the end goals helps you design
the best process. It’s important to
remember that building BI processes is a collaborative
and iterative process. You will continue gathering information from your
stakeholders and using what you’ve
learned until you create a system that
works for your team, and even then you might
change it as new needs arise. Often, your stakeholders will have identified
their questions, but they may not have identified their assumptions or biases
about the project yet. This is where a BI professional
can offer insights. Collaborating closely
with stakeholders ensures that you are
keeping their needs in mind as you design the BI tools that will
streamline their processes. Understanding their
goals, metrics, and final target tables, and communicating
across multiple teams will ensure that you make
systems that work for everyone.
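
To make target tables more concrete before the readings and activities below, here is a minimal sketch of how a BI professional might merge data from two sources into a single target table in BigQuery using the Python client library. This is an illustrative assumption, not part of the course materials: the project, dataset, table, and column names (for example, my-project.reporting.daily_revenue) are hypothetical placeholders, and the upcoming activities walk through the actual steps.

```python
# Minimal sketch (assumed example, not from the course): build a reporting
# target table in BigQuery by combining two hypothetical source tables.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default Google Cloud credentials and project

# CREATE OR REPLACE TABLE writes the query result to a target table that
# stakeholders can query directly for their agreed-upon metrics.
sql = """
CREATE OR REPLACE TABLE `my-project.reporting.daily_revenue` AS
SELECT
  o.order_date,
  s.region,
  SUM(o.order_total) AS total_revenue,  -- metric agreed on with stakeholders
  COUNT(o.order_id)  AS order_count
FROM `my-project.sales_data.orders` AS o
JOIN `my-project.sales_data.stores` AS s
  ON o.store_id = s.store_id
GROUP BY o.order_date, s.region
"""

query_job = client.query(sql)  # start the query job
query_job.result()             # wait for it to finish
print("Target table refreshed.")
```

In a real project, the metrics in this query (here, total_revenue and order_count) and the columns of the target table would come directly out of the stakeholder conversations described above, before any code is written.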

Reading: Merge data from multiple sources with BigQuery

Practice Quiz: Activity: Set up a sandbox and query a public dataset in BigQuery

Reading: Unify data with target tables

Practice Quiz: Activity: Create a target table in BigQuery

Reading: Activity Exemplar: Create a target table in BigQuery

Reading: Case study: Wayfair – Working with stakeholders to create a pipeline

Review: Data models and pipelines

Video: Wrap-up

This section of the course has covered the following topics:

  • A BI professional’s role in the organization and storage of data
  • Data models and schemas
  • Design patterns based on the organization’s needs
  • Database design
  • Data pipelines, ETL processes, and building BI tools to automate data movement (a minimal code sketch follows this recap)
  • Strategies for gathering information from stakeholders to ensure that the tools solve business problems

The next section of the course will cover how to maintain BI tools and optimize database systems.
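
As a quick illustration of the extract, transform, and load stages mentioned in the recap above, here is a minimal sketch in Python. It assumes a hypothetical CSV export as the source system and a hypothetical, already-existing BigQuery table as the target destination; none of the file, project, or table names come from the course.

```python
# Minimal ETL sketch (assumed example, not from the course): extract rows from
# a CSV export, transform them in Python, and load them into a target table.
import csv

from google.cloud import bigquery


def extract(path):
    """Extract: read raw rows from a source CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def transform(rows):
    """Transform: clean types and keep only the fields the target table needs."""
    return [
        {
            "order_id": row["order_id"],
            "order_date": row["order_date"],
            "order_total": float(row["order_total"]),
        }
        for row in rows
        if row.get("order_total")  # drop rows with no order total
    ]


def load(rows, table_id):
    """Load: append the transformed rows to an existing BigQuery target table."""
    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"Load failed: {errors}")


if __name__ == "__main__":
    # File and table names are placeholders for illustration only.
    raw = extract("daily_orders_export.csv")
    clean = transform(raw)
    load(clean, "my-project.reporting.orders")
```

Each stage maps to one function: extract pulls raw rows from the source system, transform reshapes them into the format the target table expects, and load delivers them to the destination, mirroring the pipeline stages covered in this module.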

Hey, great work so far. You’re almost done with
the first section of this course. You’ve
learned a lot. So far, we’ve discussed a BI professional’s role in the organization and
storage of data. You also investigated
data models and schemas, how BI professionals develop design patterns based on
the organization’s needs, and how databases are designed. You’ve been introduced to data
pipelines, ETL processes, and building BI tools
that help automate moving data from storage systems
to target destinations. You’ve even started
using tools to begin building your
own pipelines. Finally, you learned strategies for gathering information from stakeholders to ensure
that the tools you create for them actually
solve the business problems. But creating systems that
manage and move data is just one part of a
BI professional’s job. You also have to make
sure that those systems continue working for
your stakeholders. Coming up, you’re going
to discover how to maintain your BI tools and
optimize database systems. I hope you're excited to learn more because there's a lot more I want to share with you. But first, you have
another challenge ahead. Feel free to spend some time
with the glossary and review any of the section’s content before moving on to
your next assessment. Then I’ll be here when you’re
ready to take the next step.

Reading: Glossary terms from module 1

Quiz: Module 1 challenge

Which of the following statements correctly describe Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) tools? Select all that apply.

Fill in the blank: In order to create an effective data model, business intelligence professionals will often apply a _, which uses relevant measures and facts to create a model that supports business needs.

Which of the following statements accurately describe primary keys? Select all that apply.

In a dimensional model, what might dimensions represent? Select all that apply.

Fill in the blank: In a dimensional model, a foreign key is used to connect a _ table to the appropriate fact table.

How many fact tables exist in a star schema?

A business intelligence team wants to improve the state of their database schemas. While working toward this goal, they move data from one source platform to another target database. What process does this situation describe?

In row-based databases, each row in a table is an instance or an entry in the database. How are details about that instance recorded and organized?

A business intelligence team is working with a database system in which relevant data is stored locally and less relevant data is stored remotely. What type of database system are they using?

Fill in the blank: A database schema must describe _ because this is necessary when users want to understand how the data is shaped and the relationships within the database.

In the ETL loading stage, what are typical target destinations to which the data might be delivered? Select all that apply.

Fill in the blank: Python is a programming language that is _, which means it’s modeled around chunks of code that capture certain information.