
You’ll learn more about database systems, including data marts, data lakes, data warehouses, and ETL processes. You’ll also investigate the five factors of database performance: workload, throughput, resources, optimization, and contention. Finally, you’ll consider how to design efficient queries that get the most from a system.

Learning Objectives

  • Discover strategies for creating an ETL process that meets organizational and stakeholder needs, and for maintaining that process efficiently.
  • Understand what the different data storage and extraction processes and tools may include (Extract/Load: Stitch/Segment/Fivetran; Transform: DBT/Airflow/Looker).
  • Explain how to optimize when building new tables.
  • Identify and describe where new tables can fit in the pipeline.
  • Recognize the different aspects of databases, including OLAP and OLTP, columnar and relational, distributed and single-homed databases.
  • Understand the importance of database performance and optimization.
  • Describe the five factors of database performance: workload, throughput, resources, optimization, and contention.
  • Perform pipeline debugging using queries.

Database performance


Video: Welcome to module 2

This video series will explore how to improve the performance of data pipelines by understanding and optimizing database queries. This will allow you to deliver the most up-to-date information to your stakeholders more efficiently.

The series will cover the following topics:

  • How to increase throughput and minimize resource contention
  • Database systems, including data marts, data lakes, data warehouses, and ELT processes
  • The five factors of database performance: workload, throughput, resources, optimization, and contention
  • Tips for improving database intake and storage
  • How to design efficient queries

By the end of the series, you will be able to:

  • Optimize database queries to improve the performance of data pipelines
  • Choose the right database system for your needs
  • Understand the factors that affect database performance
  • Improve database intake and storage
  • Design efficient queries that get the most out of your systems

In order to efficiently deliver the most up-to-date information to your stakeholders, you must first understand and optimize query performance within your pipelines. And that’s what we’re going to explore in the next few videos. We’ll discover how to increase throughput and minimize the competition for resources within the system to enable the largest possible workload to be processed. We’ll get into database systems, including data marts, data lakes, data warehouses, and ELT processes. That’s not a typo: ELT is different from ETL. It stands for extract, load, and transform. You’ll also witness how these systems contribute to the overall efficiency of your data systems.

In addition, you’ll investigate the five factors of database performance: workload, throughput, resources, optimization, and contention. And you’ll gain some tips for making sure your database intake and storage are the best they can be. Finally, we’ll begin thinking about how to design efficient queries that really get the most out of your systems. Let’s do it.

Video: Data marts, data lakes, and the ETL process

In addition to data warehouses, there are other data storage and processing patterns that BI professionals may encounter, such as data marts, data lakes, and ELT processes.

Data marts: Data marts are subject-oriented databases that can be a subset of a larger data warehouse. They are useful for accessing the relevant data that needs to be pulled for a particular project.

Data lakes: Data lakes are database systems that store large amounts of raw data in its original format until it’s needed. This makes the data easily accessible, because it doesn’t require a lot of processing.

ELT processes: ELT stands for Extract, Load, and Transform. It is a type of data pipeline that enables data to be gathered from different sources, usually data lakes, then loaded into a unified destination system and transformed into a useful format.
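To make the ELT pattern concrete, here is a minimal sketch in BigQuery-style SQL. All dataset, table, and column names are hypothetical; the extract and load steps would typically be handled by an ingestion tool, so only the in-warehouse transform is shown:

    -- Raw events have already been extracted and loaded, unchanged, into
    -- raw.ticket_events. The transform happens inside the destination system.
    CREATE OR REPLACE TABLE analytics.daily_ticket_sales AS
    SELECT
      DATE(purchase_ts) AS purchase_date,
      theater_id,
      COUNT(*) AS tickets_sold,
      SUM(price) AS revenue
    FROM raw.ticket_events
    WHERE price IS NOT NULL  -- cleaning happens after loading, not before
    GROUP BY purchase_date, theater_id;

Because the raw data is already in the destination, analysts can transform only the slices they need, when they need them.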

These new technologies and processes offer a number of advantages over traditional data warehouses, such as:

  • Increased flexibility and scalability
  • Reduced storage costs
  • Faster data processing
  • Ability to handle a wider variety of data types

BI professionals who are curious and lifelong learners will be well-positioned to take advantage of these new technologies and processes to deliver better insights to their stakeholders.

Data marts, data lakes, and the ETL process

Data marts

A data mart is a subset of a data warehouse that is focused on a specific business area or department. For example, a company might have a data mart for its sales team, its marketing team, and its finance team.

Data marts are typically smaller and more focused than data warehouses, which makes them faster and easier to query. They are also often more affordable to build and maintain.
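As an illustration, a data mart can be as simple as a dedicated dataset built from warehouse tables. Here is a minimal sketch in BigQuery-style SQL, with hypothetical names:

    -- Build a sales-team mart as a pre-joined, pre-aggregated slice of the warehouse.
    CREATE OR REPLACE TABLE sales_mart.monthly_revenue AS
    SELECT
      DATE_TRUNC(o.order_date, MONTH) AS order_month,
      r.region_name,
      SUM(o.order_total) AS revenue
    FROM warehouse.orders AS o
    JOIN warehouse.regions AS r
      ON o.region_id = r.region_id
    GROUP BY order_month, region_name;

Queries against this small, subject-oriented table are faster and simpler than equivalent queries against the full warehouse.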

Data lakes

A data lake is a central repository that stores all of a company’s data, regardless of its format or structure. Data lakes can store structured data, such as relational database tables, as well as unstructured data, such as images, videos, and text files.

Data lakes are often used to store raw data that has not yet been processed or analyzed. This data can then be analyzed using a variety of tools and techniques, such as machine learning and artificial intelligence.

The ETL process

The ETL process is a three-step process for extracting data from source systems, transforming it into a consistent format, and loading it into a target system.

  1. Extract: The first step is to extract the data from the source systems. This may involve connecting to the source systems and querying the data, or it may involve exporting the data from the source systems into files.
  2. Transform: The second step is to transform the data into a consistent format. This may involve cleaning the data, removing errors, and converting the data to a common data model.
  3. Load: The third step is to load the data into the target system. This may involve connecting to the target system and inserting the data into tables, or it may involve importing the data into the target system from files.

The ETL process is important for ensuring that data is accurate, consistent, and accessible for analysis.
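As a hedged illustration of the transform step, here is a sketch in BigQuery-style SQL with hypothetical names; in practice this logic often lives in a dedicated ETL tool before the load rather than in the target database:

    -- Standardize types and drop invalid rows before loading into the target.
    INSERT INTO target.customers (customer_id, email, signup_date)
    SELECT
      CAST(customer_id AS INT64),
      LOWER(TRIM(email)),                  -- normalize to a common format
      PARSE_DATE('%Y-%m-%d', signup_date)  -- convert string to a real date
    FROM staging.customers_raw
    WHERE email IS NOT NULL;               -- basic validation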

Conclusion

Data marts, data lakes, and the ETL process are all important components of modern data warehouses. Data marts provide fast and easy access to data for specific business areas or departments. Data lakes provide a central repository for all of a company’s data, regardless of its format or structure. And the ETL process ensures that data is accurate, consistent, and accessible for analysis.

When to use data marts

Data marts are a good choice for organizations that:

  • Need to provide fast and easy access to data for specific business areas or departments.
  • Have limited resources to build and maintain a data warehouse.
  • Need to comply with data privacy or security regulations.

When to use data lakes

Data lakes are a good choice for organizations that:

  • Need to store a large volume of data, including unstructured data.
  • Need to perform complex analytics on their data.
  • Need to be able to scale their data storage and processing capabilities quickly and easily.

When to use the ETL process

The ETL process should be used by any organization that needs to extract data from multiple source systems, transform it into a consistent format, and load it into a target system. This includes organizations that are using data marts, data lakes, or traditional data warehouses.

Choosing the right technology

The best technology for your organization will depend on your specific needs and requirements. If you are not sure which technology is right for you, it is a good idea to consult with a data expert.

Fill in the blank: A data lake is a database system that stores large amounts of _____ in its original format until it’s needed.

raw data

A data lake is a database system that stores large amounts of raw data in its original format until it’s needed. While the raw data has been tagged to be identifiable, it is not organized.

What is the term for a pipeline that extracts, loads, then transforms the data?

ELT

ELT is a pipeline that extracts, loads, then transforms the data. It enables data to be gathered from data lakes, loaded into a unified destination system, and transformed into a useful format.

One of the amazing things about BI is that the tools and processes are constantly evolving, which means BI professionals always have new opportunities to build and improve current systems. So, let’s learn about some other interesting data storage and processing patterns you might encounter as a BI professional.

Throughout these courses, we’ve learned about database systems that make use of data warehouses for their storage needs. As a refresher, a data warehouse is a specific type of database that consolidates data from multiple source systems for data consistency, accuracy, and efficient access. Basically, a data warehouse is a huge collection of data from all the company’s systems. Data warehouses were really common when companies used a single machine to store and compute their relational databases. However, with the rise of cloud technologies and the explosion of data volume, new patterns for data storage and computation emerged.

One of these tools is a data mart. As you may recall, a data mart is a subject-oriented database that can be a subset of a larger data warehouse. In BI, subject-oriented describes something that is associated with specific areas or departments of a business, such as finance, sales, or marketing. As you’re learning, BI projects commonly focus on answering various questions for different teams, so a data mart is a convenient way to access the relevant data that needs to be pulled for a particular project.

Now, let’s check out data lakes. A data lake is a database system that stores large amounts of raw data in its original format until it’s needed. This makes the data easily accessible, because it doesn’t require a lot of processing. Like a data warehouse, a data lake combines many different sources, but data warehouses are hierarchical, with files and folders to organize the data, whereas data lakes are flat. And while the data has been tagged so it is identifiable, it’s not organized; it’s fluid, which is why it’s called a data lake.

Data lakes don’t require the data to be transformed before storage, so they are useful if your BI system is ingesting a lot of different data types. But of course, the data eventually needs to get organized and transformed. One way to integrate data lakes into a data system is through ELT. Previously, we learned about the ETL process, where data is extracted from the source into the pipeline, transformed while it is being transported, and then loaded into its destination. ELT takes the same steps but reorganizes them so that the pipeline extracts, loads, and then transforms the data. Basically, ELT is a type of data pipeline that enables data to be gathered from different sources, usually data lakes, then loaded into a unified destination system and transformed into a useful format. ELT enables BI professionals to ingest many different kinds of data into a storage system as soon as that data is available, and they only have to transform the data they need. ELT also reduces storage costs and enables businesses to scale storage and computation resources independently.

As technology advances, the processes and tools available also advance, and that’s great. Some of the most successful BI professionals do well because they are curious, lifelong learners.

Reading: ETL versus ELT

Reading

Video: The five factors of database performance

Database performance is a measure of the workload that can be processed by a database, as well as the associated costs. The factors that influence database performance are:

  • Workload: The combination of transactions, queries, analysis, and system commands being processed by the database system at any given time.
  • Throughput: The overall capability of the database’s hardware and software to process requests.
  • Resources: The hardware and software tools available for use in a database system, such as disk space and memory.
  • Optimization: Maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance.
  • Contention: When two or more components attempt to use a single resource in a conflicting way.

Understanding these factors can help you to improve the performance of your database system.
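To make contention concrete, here is a sketch of two sessions competing for the same row, assuming a row-locking relational database such as PostgreSQL (table and column names are hypothetical):

    -- Session A
    BEGIN;
    UPDATE seats SET reserved = TRUE
    WHERE screening_id = 42 AND seat_no = 'F7';
    -- Session A now holds a lock on this row until it commits.

    -- Session B, running at the same time
    BEGIN;
    UPDATE seats SET reserved = TRUE
    WHERE screening_id = 42 AND seat_no = 'F7';
    -- Session B blocks here, waiting for Session A to release the row lock.

    COMMIT;  -- Session A commits; only now can Session B proceed.

While Session B waits, throughput drops; the more sessions contending for the same rows, the worse the slowdown gets.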

Tutorial on “The five factors of database performance” in Business Intelligence

Database performance is a critical factor in any business intelligence (BI) system. When a database is performing well, users can get the information they need quickly and easily. This can lead to better decision-making and improved business outcomes.

There are five factors that influence database performance:

  1. Workload: The workload of a database is the combination of transactions, queries, analysis, and system commands that are being processed at any given time. The workload can fluctuate depending on the time of day, week, or month. For example, the workload may be higher at the end of the month when reports are being run.
  2. Throughput: Throughput is the rate at which a database can process requests. It is measured in transactions per second (TPS). Throughput is affected by the hardware and software of the database system, as well as the workload.
  3. Resources: Resources are the hardware and software tools that are available to the database system. This includes the CPU, memory, disk space, and network bandwidth. Resources can affect throughput and performance.
  4. Optimization: Optimization is the process of tuning the database system to improve performance. This can include things like creating indexes, partitioning tables, and using caching.
  5. Contention: Contention occurs when two or more components of the database system are trying to access the same resource at the same time. This can lead to performance degradation.

How to improve database performance

There are a number of things that you can do to improve database performance. Here are a few tips:

  • Understand the workload: The first step to improving database performance is to understand the workload. This includes identifying the most common types of queries and transactions that are being processed. Once you understand the workload, you can start to optimize the database system for those specific tasks.
  • Tune the database: Tuning the database is another important way to improve performance. This involves adjusting the configuration of the database system, for example by creating indexes or partitioning tables (a sketch follows this list).
  • Monitor the database: It is important to monitor the database system on a regular basis to identify any potential performance problems. This can be done using a variety of tools and techniques.
  • Upgrade the hardware: If the database system is overloaded, you may need to upgrade the hardware. This can include adding more CPU, memory, or disk space.
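For example, here is what that tuning might look like in a relational database such as PostgreSQL. This is a minimal sketch with hypothetical table and column names, not a prescription:

    -- An index lets the database locate matching rows without scanning the table.
    CREATE INDEX idx_orders_order_date ON orders (order_date);

    -- Declarative range partitioning splits a large table into smaller pieces.
    CREATE TABLE events (
      event_id BIGINT,
      event_ts TIMESTAMP NOT NULL
    ) PARTITION BY RANGE (event_ts);

    CREATE TABLE events_2024 PARTITION OF events
      FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');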

Conclusion

By understanding the five factors that influence database performance and taking steps to improve performance, you can ensure that your BI system is able to meet the needs of your users.

A database is performing slowly because multiple components are attempting to use the same piece of data at the same time. Which of the factors of database performance should be addressed?

Contention

The factor of contention should be addressed. Contention occurs when two or more components attempt to use a single resource in a conflicting way.

We’ve been investigating database optimization and why it’s important to make sure that users are able to get what they need from the system as efficiently as possible. Successful optimization can be measured by the database performance. Database performance is a measure of the workload that can be processed by a database, as well as the associated costs. In this video, we’re going to consider the factors that influence database performance: workload, throughput, resources, optimization, and contention.

First, we’ll start with workload. In BI, workload refers to the combination of transactions, queries, analysis, and system commands being processed by the database system at any given time. It’s common for a database’s workload to fluctuate drastically from day to day, depending on what jobs are being processed and how many users are interacting with the database. The good news is that you can often predict these fluctuations. For instance, there might be a higher workload at the end of the month when reports are being processed, or the workload might be really light right before a holiday.

Next, we have throughput. Throughput is the overall capability of the database’s hardware and software to process requests. Throughput is made up of the input and output speed, the central processing unit speed, how well the machine can run parallel processes, the database management system, and the operating system and system software. Basically, throughput describes a workload size that the system can handle.

Let’s get into resources. In BI, resources are the hardware and software tools available for use in a database system. This includes the disk space and memory. Resources are a big part of a database system’s ability to process requests and handle data. They can also fluctuate, especially if the hardware or other dedicated resources are shared with additional databases, software applications, or services. Also, cloud-based systems are particularly prone to fluctuation. It’s useful to remember that external factors can affect performance.

Now we come to optimization. Optimization involves maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance. This is one of the most important factors that BI professionals return to again and again. Coming up soon, we’re going to talk about it in more detail.

Finally, the last factor of database performance is contention. Contention occurs when two or more components attempt to use a single resource in a conflicting way. This can really slow things down. For instance, if there are multiple processes trying to update the same piece of data, those processes are in contention. As contention increases, the throughput of the database decreases. Limiting contention as much as possible will help ensure the database is performing at its best.

There you have the five factors of database performance: workload, throughput, resources, optimization, and contention. Coming up, we’re going to check out an example of these factors in action so you can understand more about how each contributes to database performance.

Reading: A guide to the five factors of database performance

Reading

Video: Optimize database performance

Database optimization is the process of maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance.

BI professionals optimize databases by:

  • Examining resource use to identify inefficient queries, indexes, partitions, data fragmentation, and memory/CPU constraints.
  • Rewriting inefficient queries, creating new indexes, partitioning data appropriately, and defragmenting data.
  • Ensuring that the database has the capacity to handle the organization’s demands.

By addressing these issues, BI professionals can improve database performance and make it easier for users to access the data they need.

Additional notes:

  • Query plans can be used to identify steps in a query that are causing performance problems (see the sketch after these notes).
  • Data partitioning is a common practice in cloud-based systems working with big data.
  • Fragmented data can occur when data is broken up into many pieces that are not stored together.
  • It is important to monitor database performance to ensure that the database is able to meet the needs of the organization.
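As a quick illustration of inspecting a query plan, here is a sketch assuming PostgreSQL (BigQuery exposes similar information in its query execution details); the table name is hypothetical:

    -- EXPLAIN ANALYZE runs the query and reports each plan step and its cost.
    EXPLAIN ANALYZE
    SELECT *
    FROM tickets
    WHERE screening_id = 42;

If the plan shows a sequential scan over a large table for a selective filter like this one, that is a hint that an index on screening_id could reduce the work the database does.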


Here are some tips for optimizing database performance in BI:

  • Use efficient queries: When writing queries, try to use the most efficient methods possible. This includes using indexes, avoiding unnecessary subqueries, and using the appropriate data types (see the sketch after this list).
  • Partition data: Partitioning data can improve performance by dividing the data into smaller, more manageable chunks. This can be especially helpful for large datasets.
  • Use caching: Caching can improve performance by storing frequently accessed data in memory. This can reduce the number of times the database has to be accessed.
  • Use a database management system (DBMS) that is designed for BI: Some DBMSs are specifically designed for BI workloads. These DBMSs typically have features that can improve performance, such as columnar storage and in-memory analytics.
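To illustrate the efficient-queries tip, here are two versions of the same hypothetical report. Many optimizers handle both forms well, but the second avoids reading columns the report never uses, which matters especially in columnar stores:

    -- Less efficient: SELECT * reads every column in the table.
    SELECT *
    FROM sales
    WHERE store_id IN (SELECT store_id FROM stores WHERE region = 'West');

    -- Sharper: project only the needed columns and make the join explicit.
    SELECT s.sales_date, s.store_id, s.amount
    FROM sales AS s
    JOIN stores AS st
      ON s.store_id = st.store_id
    WHERE st.region = 'West';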

By following these tips, you can improve database performance and make it easier for BI users to access the information they need.

Here is an example of how to optimize database performance in BI:

Suppose you have a BI system that is used to generate sales reports. The reports are based on a large dataset of sales transactions. The queries that are used to generate the reports are slow, and users are complaining that it takes too long to get the reports they need.

To optimize database performance, you can start by analyzing the workload. Identify the most common types of queries that are being used to generate the reports. Once you have identified the most common queries, you can start to optimize them.

For example, you may need to create indexes on the tables that are used in the most common queries. You may also need to partition the data so that the queries can run more efficiently.

In addition to optimizing the queries, you can also improve performance by tuning the database. For example, you may need to adjust the memory allocation for the database or increase the number of worker threads.

Finally, you should monitor the database performance on a regular basis to identify any potential problems. You can use a variety of tools to monitor the database, such as the database’s own performance monitoring tools or third-party monitoring tools.

By following these steps, you can optimize database performance and improve the performance of your BI system.

What is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability?

Data partitioning

Data partitioning is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability. Ensuring data is partitioned appropriately is a key part of database performance optimization.
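In BigQuery, for example, a table can be partitioned (and clustered) when it is created. A minimal sketch with hypothetical dataset, table, and column names:

    -- Each day's rows land in their own partition, so a date-filtered query
    -- scans only the partitions it needs instead of the whole table.
    CREATE TABLE mydataset.ticket_sales_partitioned
    PARTITION BY DATE(purchase_ts)
    CLUSTER BY theater_id AS
    SELECT *
    FROM mydataset.ticket_sales_raw;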

Recently, we’ve been learning a lot about database performance. As a refresher, this is a measure of the workload that can be processed by the database, as well as associated costs. We also explored optimization, which is one of the most important factors of database performance. You’ll recall that optimization involves maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance. In this video, we’re going to focus on optimization and how BI professionals optimize databases by examining resource use and identifying better data sources and structures. Again, the goal is to enable the system to process the largest possible workload at the most reasonable cost. This requires a speedy response time, which is how long it takes for a database to respond to a user request.

Here’s an example. Imagine you’re a BI professional receiving emails from people on your team who say that it’s taking longer than usual for them to pull the data they need from the database. At first, this seems like a pretty minor inconvenience, but a slow database can be disruptive and cost your team a lot of time. If they have to stop and wait whenever they need to pull data or perform a calculation, it really affects their work. There are a few reasons that users might be encountering this issue. Maybe the queries aren’t fully optimized, or the database isn’t properly indexed or partitioned. Perhaps the data is fragmented, or there isn’t enough memory or CPU. Let’s examine each of these.

First, if the queries users are writing to interact with the database are inefficient, it can actually slow down your database resources. To avoid this, the first step is to simply revisit the queries to ensure they’re as efficient as possible. The next step is to consider the query plan. In a relational database system that uses SQL, a query plan is a description of the steps the database system takes in order to execute a query. As you’ve learned, a query tells a system what to do, but not necessarily how to do it. The query plan is the how. If queries are running slowly, checking the query plan to find out if there are steps causing more draw than necessary can be helpful. This is another iterative process. After checking the query plan, you might rewrite the query or create new tables, and then check the query plan again.

Now let’s consider indexing. An index is an organizational tag used to quickly locate data within a database system. If the tables within a database haven’t been fully indexed, it can take the database longer to locate resources. In cloud-based systems working with big data, you might have data partitions instead of indexes. Data partitioning is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability. The distribution of data within the system is extremely important. Ensuring that data has been partitioned appropriately and consistently is part of optimization, too.

The next issue is fragmented data. Fragmented data occurs when data is broken up into many pieces that are not stored together, often as a result of using the data frequently or creating, deleting, or modifying files. For example, if you are accessing the same data often and versions of it are being saved in your cache, those versions are actually causing fragmentation in your system.

Finally, if your database is having trouble keeping up with your organization’s demands, it might mean there isn’t enough memory available to process everyone’s requests. Making sure your database has the capacity to handle everything you ask of it is critical.

Consider our example again. You received some emails from the team stating that it was taking longer than usual to access data from the database. After learning about the slowdown from your team, you were able to assess the situation and make some fixes. Addressing the issues allowed you to ensure the database was working as efficiently as possible for your team. Problem solved. But database optimization is an ongoing process, and you’ll need to continue to monitor performance to keep everything running smoothly.

Reading: Indexes, partitions, and other ways to optimize

Reading

Practice Quiz: Activity: Partition data and create indexes in BigQuery

Code

Reading: Activity Exemplar: Partition data and create indexes in BigQuery

Reading: Case study: Deloitte – Optimizing outdated database systems

Reading

Video: The five factors in action

  • Workload: This is the combination of transactions, queries, data warehousing analysis, and system commands being processed by the database system at any given time. In this case, most of the workload is processing user requests, such as generating scheduled reports or fulfilling queries. If the database can’t handle the workload, it might cause the system to crash, disrupting users’ ability to access and use the data.
  • Throughput: This is the overall capability of the database’s hardware and software to process requests. Because the movie theater system is mostly focused on analysis of data from OLTP databases, the company works with an OLAP database that primarily uses cloud storage. The database storage processes and the computers within the system that access the cloud data need to be capable of handling the theaters’ workload, especially when the database system is being used a lot.
  • Resources: The hardware and software that compose the system’s throughput are the resources. For example, the movie theaters might use a cache controller disk to help the database manage the storage and retrieval of data from the memory systems.
  • Optimization: Ideally, users should be able to access transaction data that has been ingested from multiple other database systems. If retrieval slows down, it can take longer to get the data and provide insights to stakeholders. This is why keeping the database optimized even after it has been set up is important.
  • Contention: The movie theater company has a team with many different analysts accessing and using this data. That’s in addition to the automated transformations being applied to the data and the reports being generated. All these requests can end up competing with each other and cause contention, which can be problematic if the system processes multiple requests at the same time, essentially making the same updates over and over. To limit this, the database processes queries in the order the requests are made.

It is important to consider all five of these factors when designing and managing a database system, as they can all have a significant impact on performance.

The Five Factors in Action in Business Intelligence

The five factors of business intelligence (BI) performance are workload, throughput, resources, optimization, and contention. These factors are essential considerations for any BI professional, as they can all have a significant impact on performance.

Workload

Workload is the combination of transactions, queries, data warehousing analysis, and system commands being processed by the BI system at any given time. In general, the higher the workload, the more demanding it will be on the system’s resources.

Throughput

Throughput is the overall capability of the BI system’s hardware and software to process requests. It is measured in terms of the number of queries or transactions that can be processed per second.

Resources

The resources available to the BI system include the hardware and software components that make up the system, such as the CPU, memory, storage, and network. The availability of resources can have a significant impact on throughput and performance.

Optimization

Optimization refers to the process of improving the performance of the BI system by tuning the hardware and software components, as well as the database and application code. Optimization can help to improve throughput, reduce response times, and improve overall performance.

Contention

Contention occurs when multiple users or processes are competing for the same resources. In the context of BI, contention can occur when multiple users are running complex queries, or when the system is performing resource-intensive tasks such as data loading or indexing.

How the Five Factors Work Together

The five factors of BI performance are all interrelated. For example, if the workload is high, it may be necessary to increase the resources available to the system in order to maintain throughput. Similarly, if the system is experiencing contention, it may be necessary to optimize the system or workload to improve performance.

Example

Consider a BI system that is used to generate reports on sales data. The workload for this system is likely to be highest during the peak sales season. During this time, the system may need to process a large number of queries from different users. If the system does not have enough resources to handle the workload, it may experience performance problems, such as slow response times or timeouts.

To improve performance, the BI administrator could increase the resources available to the system, such as by adding more CPU or memory. The administrator could also optimize the system by tuning the database and application code. Additionally, the administrator could work with users to reduce the workload during peak times.

Conclusion

The five factors of BI performance are essential considerations for any BI professional. By understanding these factors and how they work together, BI professionals can improve the performance of their systems and deliver better insights to their users.

Here are some additional tips for improving BI performance:

  • Use a data warehouse or data lake to store and manage your data. This will help to improve performance by separating the data from the operational systems.
  • Optimize your database queries. This can be done by using the appropriate indexes and by writing efficient SQL code.
  • Use a caching layer to store frequently accessed data in memory. This can help to improve performance by reducing the number of database queries that need to be executed.
  • Use a load balancer to distribute the workload across multiple BI servers. This can help to improve performance and scalability.
  • Monitor your BI system performance and make adjustments as needed. This can be done using a variety of tools and techniques, such as system monitoring tools and performance testing.

Earlier, we learned about the five factors of database performance: workload, throughput, resources, optimization, and contention. But how do they actually operate within a working system? Let’s explore a database and witness how the five factors affect its performance.

Before we get into how the five factors influence this database, let’s understand how it’s been designed. In this example, we’ll be checking out a movie theater chain’s system. There are a few things that we’ll need to consider during optimization. First, let’s think about what this database is being used for. In this case, the movie theater chain uses data related to ticket purchases, revenue, and audience preferences in order to make decisions about what movies to play and potential promotions. Second, we’ll consider where the data is coming from. In this example, it’s being pushed from multiple sources into an OLAP system where analysis takes place. Also, the database uses data from individual theaters’ OLTP systems in order to explore trends in ticket sales for different movie times and genres.

The OLTP systems that manage transaction data use a snowflake database model. At the center, there is a fact table capturing the most important information about the tickets, such as whether or not a specific seat has been reserved, the reservation type, the screening ID, the employee ID of whoever entered the reservation, and the seat number. In order to capture details about these facts, the model also includes several dimension tables connected to the fact table, with information on employee, movie, screening, auditorium, seat, and reservation. This database is fairly straightforward; it enables each movie theater to record data in these different tables and prevents them from accidentally booking the same seat twice. However, these individual OLTP systems aren’t designed for analysis, which is why the data needs to be pulled into the destination OLAP system. There, it can be accessed and explored by users in order to gain insights and make business decisions.

Okay, now that we know a little more about our database, let’s find out how the five factors of database performance influence it. First, as you know, workload is a combination of transactions, queries, data warehousing analysis, and system commands being processed by the database system at any given time. In this case, most of the workload is processing user requests, such as generating scheduled reports or fulfilling queries. If the database can’t handle the workload, it might cause the system to crash, disrupting users’ ability to access and use the data. Maybe a report requires a lot of resources to generate, or there might be a growing number of analysts accessing this data. But we know that it’s often possible to predict peak workload times, so we can make adjustments to ensure the system can handle these requests.

Now, let’s explore throughput. Again, this is the overall capability of the database’s hardware and software to process requests. Because our movie theater system is mostly focused on analysis of data from OLTP databases, we’re working with an OLAP database that primarily uses cloud storage. The database storage processes and the computers within the system that are accessing the cloud data need to be capable of handling the theaters’ workload, especially when the database system is being used a lot.

The hardware and software that compose the system’s throughput are the resources. For example, the movie theaters might use a cache controller disk to help the database manage the storage and retrieval of data from the memory systems.

Next, we have optimization, which you’ve already learned a lot about. Ideally, users should be able to access transaction data that has been ingested from multiple other database systems. If retrieval slows down, it can take longer to get the data and provide insights to stakeholders. This is why keeping the database optimized, even after it has been set up, is important.

The last factor of database performance is contention. The movie theater company has a team with many different analysts accessing and using this data. That’s in addition to the automated transformations being applied to the data and the reports being generated. All these requests can end up competing with each other and cause contention. And this can potentially be problematic if the system processes multiple requests at the same time, essentially making the same updates over and over. To limit this, the database processes queries in the order the requests are made.

And now, you’ve gotten a chance to explore how the five factors of database performance might affect a real database system. No matter how simple or complex, these are essential considerations for any BI professional.
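As a supplement to the movie theater example, here is a minimal sketch of the snowflake model described above, in BigQuery-style SQL. Column sets are simplified and hypothetical, and relationships are noted in comments because the warehouse does not enforce them:

    CREATE TABLE theater.dim_movie (
      movie_id INT64,
      title STRING,
      genre STRING
    );

    CREATE TABLE theater.dim_screening (
      screening_id INT64,
      movie_id INT64,       -- snowflaked: a dimension referencing another dimension
      screen_time TIMESTAMP
    );

    CREATE TABLE theater.fact_ticket (
      screening_id INT64,        -- references dim_screening
      employee_id INT64,         -- references dim_employee (not shown)
      seat_no STRING,
      reservation_type STRING,
      is_reserved BOOL
    );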

Reading: Determine the most efficient query

Upgraded Plugin: Design: Optimize for database speed

Practice Quiz: Test your knowledge: Database performance

Fill in the blank: A data mart is a _ database that can be a subset of a larger data warehouse. This means it is a convenient way to access the data pertaining to specific areas or departments of a business.

A business intelligence team manager wants to support their team’s ability to perform at a high level. They investigate the overall capability of their company’s database hardware and software tools to enable the team to process stakeholder requests. In this situation, which of the factors of database performance do they consider?

What term is used to describe data that is broken up into many pieces that are not stored together?

Review: Dynamic database design


Video: Wrap-up

You have been learning about database design and the role of BI professionals in creating and maintaining useful database systems. You have also learned about the five factors of database performance, database optimization strategies, and the importance of monitoring database performance.

As a BI professional, developing processes that enable your team to pull insights themselves is a key part of the job. However, systems and processes change over time, so it is important to continue to monitor database performance.

In the next lesson, you will learn more about optimizing systems and the tools you will create as a BI professional. You will also learn about optimizing ETL processes.

You’ve been learning about database design and the role BI professionals play in creating and maintaining useful database systems. So far, you’ve focused on the five factors of database performance: workload, throughput, resources, optimization, and contention. You also learned some strategies specifically for database optimization, and what issues to check for if your team members start noticing a slowdown. You even explored how the five factors can affect actual databases, database optimization, and the importance of keeping databases up to speed.

As a BI professional, developing processes that enable your team to pull insights themselves is a key part of the job. But systems and processes change over time; they stop working or need to be updated. That’s one of the reasons why continuing to monitor database performance is so important. The database system should have lasting high performance levels.

Coming up, you’re going to discover more about optimizing systems and the tools you’ll create as a BI professional. But first, you have another weekly challenge. As always, feel free to check back over any of the material, and review the glossary to prepare yourself for success. Once you’ve completed your assessment, I’ll meet you back here for more about optimizing ETL processes. Great job.

Reading: Glossary terms from module 2

Reading

Quiz: Module 2 challenge

Which of the following statements accurately describe data marts and data lakes? Select all that apply.

What are some key benefits of ELT data pipelines in business intelligence?

What is a measure of the workload that can be processed by a database, as well as the associated costs?

Which of the following statements accurately describes workload with regards to database performance?

When evaluating a database system’s resources, what does a business intelligence professional consider? Select all that apply.

Optimization involves decreasing _, which is how long it takes for a database to respond to a user request.

A business intelligence professional is investigating the steps their database system takes in order to execute a query. They discover that creating a new table will enhance performance. What does this scenario describe?

Which of the following statements accurately describe indexes versus data partitions? Select all that apply.

Fill in the blank: Fragmented data occurs when data is broken up into many pieces that are not _____, often as a result of using the data frequently.

When two or more data analysts attempt to use a single data resource in a conflicting way, what is the result?