You’ll learn more about database systems, including data marts, data lakes, data warehouses, and ETL processes. You’ll also investigate the five factors of database performance: workload, throughput, resources, optimization, and contention. Finally, you’ll consider how to design efficient queries that get the most from a system.

  • Discover strategies for creating an ETL process that meets organizational and stakeholder needs, and for maintaining that process efficiently.
  • Understand what different data storage and extraction processes and tools may include (Extract/Load: Stitch, Segment, Fivetran; Transform: dbt, Airflow, Looker).
  • Explain how to optimize when building new tables.
  • Identify and describe where new tables can fit in the pipeline.
  • Recognize the different aspects of databases, including OLAP and OLTP, columnar and relational, distributed and single-homed databases.
  • Understand the importance of database performance and optimization.
  • Describe the five factors of database performance: workload, throughput, resources, optimization, and contention.
  • Perform pipeline debugging using queries.

This video series will explore how to improve the performance of data pipelines by understanding and optimizing database queries. This will allow you to deliver the most up-to-date information to your stakeholders more efficiently.

The series will cover the following topics:

  • How to increase throughput and minimize resource contention
  • Database systems, including data marts, data lakes, data warehouses, and ELT processes
  • The five factors of database performance: workload, throughput, resources, optimization, and contention
  • Tips for improving database intake and storage
  • How to design efficient queries

By the end of the series, you will be able to:

  • Optimize database queries to improve the performance of data pipelines
  • Choose the right database system for your needs
  • Understand the factors that affect database performance
  • Improve database intake and storage
  • Design efficient queries that get the most out of your systems

In addition to data warehouses, there are other data storage and processing patterns that BI professionals may encounter, such as data marts, data lakes, and ELT processes.

Data marts: Data marts are subject-oriented databases that can be a subset of a larger data warehouse. They are useful for accessing the relevant data that needs to be pulled for a particular project.

Data lakes: Data lakes are database systems that store large amounts of raw data in its original format until it’s needed. This makes the data easily accessible, because it doesn’t require a lot of processing.

ELT processes: ELT stands for Extract, Load, and Transform. It is a type of data pipeline that enables data to be gathered from different sources, usually data lakes, then loaded into a unified destination system and transformed into a useful format.

These new technologies and processes offer a number of advantages over traditional data warehouses, such as:

  • Increased flexibility and scalability
  • Reduced storage costs
  • Faster data processing
  • Ability to handle a wider variety of data types

BI professionals who are curious and lifelong learners will be well-positioned to take advantage of these new technologies and processes to deliver better insights to their stakeholders.

A data mart is a subset of a data warehouse that is focused on a specific business area or department. For example, a company might have a data mart for its sales team, its marketing team, and its finance team.

Data marts are typically smaller and more focused than data warehouses, which makes them faster and easier to query. They are also often more affordable to build and maintain.
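
As a rough illustration of a data mart as a subset of a warehouse, the sketch below builds a small view for a sales team. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse; the table name warehouse_facts, the view name sales_mart, and the sample rows are all hypothetical.

```python
import sqlite3


def build_sales_mart(conn):
    """Create a tiny 'warehouse' table and expose a sales-only subset as a view."""
    # Hypothetical warehouse table holding facts for every department.
    conn.execute("CREATE TABLE warehouse_facts (dept TEXT, metric TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
        [
            ("sales", "revenue", 1200.0),
            ("marketing", "ad_spend", 300.0),
            ("finance", "payroll", 900.0),
            ("sales", "units_sold", 45.0),
        ],
    )
    # The "data mart" here is just a view limited to one business area.
    conn.execute(
        "CREATE VIEW sales_mart AS "
        "SELECT metric, value FROM warehouse_facts WHERE dept = 'sales'"
    )
    return conn.execute("SELECT metric, value FROM sales_mart ORDER BY metric").fetchall()


if __name__ == "__main__":
    print(build_sales_mart(sqlite3.connect(":memory:")))
```

In a production system a data mart is more likely to be a separate set of tables or a separate database than a single view; the view simply keeps the example short.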

A data lake is a central repository that stores all of a company’s data, regardless of its format or structure. Data lakes can store structured data, such as relational database tables, as well as unstructured data, such as images, videos, and text files.

Data lakes are often used to store raw data that has not yet been processed or analyzed. This data can then be analyzed using a variety of tools and techniques, such as machine learning and artificial intelligence.

The ETL process is a three-step process for extracting data from source systems, transforming it into a consistent format, and loading it into a target system.

  1. Extract: The first step is to extract the data from the source systems. This may involve connecting to the source systems and querying the data, or it may involve exporting the data from the source systems into files.
  2. Transform: The second step is to transform the data into a consistent format. This may involve cleaning the data, removing errors, and converting the data to a common data model.
  3. Load: The third step is to load the data into the target system. This may involve connecting to the target system and inserting the data into tables, or it may involve importing the data into the target system from files.

The ETL process is important for ensuring that data is accurate, consistent, and accessible for analysis.
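
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV text stands in for a file exported from a source system, an in-memory SQLite database stands in for the target system, and the names SOURCE_CSV, extract, transform, load, and run_etl are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; note the messy region values
# and the unparseable amount on the last row.
SOURCE_CSV = """order_id,region,amount
1, north ,19.99
2,SOUTH,5.00
3,north,not_a_number
"""


def extract(raw_text):
    """Extract: read rows from the exported file contents."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize region names and drop rows with invalid amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        cleaned.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return cleaned


def load(conn, records):
    """Load: insert the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, raw_text=SOURCE_CSV):
    """Run extract, transform, and load in order, then return the loaded rows."""
    load(conn, transform(extract(raw_text)))
    return conn.execute(
        "SELECT order_id, region, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

Because the transformation runs before loading, only cleaned rows ever reach the target table.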


Data marts, data lakes, and the ETL process are all important parts of a modern data architecture. Data marts provide fast and easy access to data for specific business areas or departments. Data lakes provide a central repository for all of a company’s data, regardless of its format or structure. And the ETL process ensures that data is accurate, consistent, and accessible for analysis.

When to use data marts

Data marts are a good choice for organizations that:

  • Need to provide fast and easy access to data for specific business areas or departments.
  • Have limited resources to build and maintain a data warehouse.
  • Need to comply with data privacy or security regulations.

When to use data lakes

Data lakes are a good choice for organizations that:

  • Need to store a large volume of data, including unstructured data.
  • Need to perform complex analytics on their data.
  • Need to be able to scale their data storage and processing capabilities quickly and easily.

When to use the ETL process

The ETL process should be used by any organization that needs to extract data from multiple source systems, transform it into a consistent format, and load it into a target system. This includes organizations that are using data marts, data lakes, or traditional data warehouses.

Choosing the right technology

The best technology for your organization will depend on your specific needs and requirements. If you are not sure which technology is right for you, it is a good idea to consult with a data expert.

Fill in the blank: A data lake is a database system that stores large amounts of _____ in its original format until it’s needed.

raw data

A data lake is a database system that stores large amounts of raw data in its original format until it’s needed. While the raw data has been tagged to be identifiable, it is not organized.

What is the term for a pipeline that extracts, loads, then transforms the data?


ELT is a pipeline that extracts, loads, then transforms the data. It enables data to be gathered from data lakes, loaded into a unified destination system, and transformed into a useful format.
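
For contrast with ETL, the sketch below loads raw values into the destination first and only then transforms them with SQL inside the destination. It is an illustration only: SQLite stands in for the destination system, and the names RAW_ORDERS, raw_orders, orders, and run_elt are hypothetical.

```python
import sqlite3

# Hypothetical raw records, already extracted from a source such as a data lake.
# Everything is still text and still messy.
RAW_ORDERS = [("1", " north ", "19.99"), ("2", "SOUTH", "5.00")]


def run_elt(conn, raw_rows=RAW_ORDERS):
    """Load raw rows unchanged, then transform them inside the destination."""
    # Load: store the raw values as-is in a staging table.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: use the destination's own SQL engine to clean and cast the data.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(region))       AS region,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT order_id, region, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt(sqlite3.connect(":memory:")))
```

The raw staging table remains available after the transformation, so the data can be transformed again later in a different way without extracting it a second time.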

Database performance is a measure of the workload that can be processed by a database, as well as the associated costs. The factors that influence database performance are:

  • Workload: The combination of transactions, queries, analysis, and system commands being processed by the database system at any given time.
  • Throughput: The overall capability of the database’s hardware and software to process requests.
  • Resources: The hardware and software tools available for use in a database system, such as disk space and memory.
  • Optimization: Maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance.
  • Contention: When two or more components attempt to use a single resource in a conflicting way.

Understanding these factors can help you to improve the performance of your database system.

Tutorial on “The five factors of database performance” in Business Intelligence

Database performance is a critical factor in any business intelligence (BI) system. When a database is performing well, users can get the information they need quickly and easily. This can lead to better decision-making and improved business outcomes.

There are five factors that influence database performance:

  1. Workload: The workload of a database is the combination of transactions, queries, analysis, and system commands that are being processed at any given time. The workload can fluctuate depending on the time of day, week, or month. For example, the workload may be higher at the end of the month when reports are being run.
  2. Throughput: Throughput is the rate at which a database can process requests. It is often measured in transactions per second (TPS) or queries per second. Throughput is affected by the hardware and software of the database system, as well as the workload.
  3. Resources: Resources are the hardware and software tools that are available to the database system. This includes the CPU, memory, disk space, and network bandwidth. Resources can affect throughput and performance.
  4. Optimization: Optimization is the process of tuning the database system to improve performance. This can include things like creating indexes, partitioning tables, and using caching.
  5. Contention: Contention occurs when two or more components of the database system are trying to access the same resource at the same time. This can lead to performance degradation.
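
To make throughput concrete, the sketch below times repeated runs of one query and reports queries per second. It is a rough measurement under assumptions: an in-memory SQLite database stands in for the real system, the helper names make_demo_db and measure_throughput are hypothetical, and the number it prints will vary from machine to machine.

```python
import sqlite3
import time


def make_demo_db(rows=1000):
    """Build a small in-memory table to query (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (amount) VALUES (?)",
        [(float(i),) for i in range(rows)],
    )
    return conn


def measure_throughput(conn, query, runs=200):
    """Return the observed queries per second for `query` over `runs` executions."""
    start = time.perf_counter()
    for _ in range(runs):
        conn.execute(query).fetchall()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return runs / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    demo = make_demo_db()
    qps = measure_throughput(demo, "SELECT SUM(amount) FROM sales")
    print(f"about {qps:.0f} queries per second on this machine")
```

A single-query loop like this ignores concurrency, so it says nothing about contention; it only gives a baseline for comparing the same query before and after a change.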

How to improve database performance

There are a number of things that you can do to improve database performance. Here are a few tips:

  • Understand the workload: The first step to improving database performance is to understand the workload. This includes identifying the most common types of queries and transactions that are being processed. Once you understand the workload, you can start to optimize the database system for those specific tasks.
  • Tune the database: Tuning means adjusting the configuration and physical design of the database system to suit the workload. For example, you may need to create indexes or partition tables.
  • Monitor the database: It is important to monitor the database system on a regular basis to identify any potential performance problems. This can be done using a variety of tools and techniques.
  • Upgrade the hardware: If the database system is overloaded, you may need to upgrade the hardware. This can include adding more CPU, memory, or disk space.


By understanding the five factors that influence database performance and taking steps to improve performance, you can ensure that your BI system is able to meet the needs of your users.

A database is performing slowly because multiple components are attempting to use the same piece of data at the same time. Which of the factors of database performance should be addressed?


The factor of contention should be addressed. Contention occurs when two or more components attempt to use a single resource in a conflicting way.
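
Contention can be reproduced on a small scale. In the sketch below, two connections try to write to the same SQLite database file at the same time; the second is refused while the first holds the write lock. SQLite is used only because it ships with Python, and the function name, file name, and table are hypothetical. Other database systems handle the same conflict differently, for example by making the second writer wait.

```python
import os
import sqlite3
import tempfile


def demonstrate_contention():
    """Return (second_writer_was_blocked, final_row_count)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # timeout=0 makes a blocked connection fail immediately instead of waiting.
        # isolation_level=None lets us issue BEGIN and COMMIT ourselves.
        writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer_a.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")

        # Writer A takes the write lock and holds it.
        writer_a.execute("BEGIN IMMEDIATE")
        writer_a.execute("INSERT INTO sales (amount) VALUES (10.0)")

        # Writer B wants the same resource at the same time.
        try:
            writer_b.execute("BEGIN IMMEDIATE")
            blocked = False
        except sqlite3.OperationalError:
            blocked = True  # "database is locked": the two writers are in contention

        # Once A releases the lock, B can proceed.
        writer_a.execute("COMMIT")
        writer_b.execute("BEGIN IMMEDIATE")
        writer_b.execute("INSERT INTO sales (amount) VALUES (20.0)")
        writer_b.execute("COMMIT")

        count = writer_b.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
        writer_a.close()
        writer_b.close()
        return blocked, count


if __name__ == "__main__":
    print(demonstrate_contention())
```

The second write succeeds only after the first transaction commits, which shows why long-running writes can slow down everything else that needs the same resource.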

Database optimization is the process of maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance.

BI professionals optimize databases by:

  • Examining resource use to identify inefficient queries, indexes, partitions, data fragmentation, and memory/CPU constraints.
  • Rewriting inefficient queries, creating new indexes, partitioning data appropriately, and defragmenting data.
  • Ensuring that the database has the capacity to handle the organization’s demands.

By addressing these issues, BI professionals can improve database performance and make it easier for users to access the data they need.

Additional notes:

  • Query plans can be used to identify steps in a query that are causing performance problems.
  • Data partitioning is a common practice in cloud-based systems working with big data.
  • Fragmented data can occur when data is broken up into many pieces that are not stored together.
  • It is important to monitor database performance to ensure that the database is able to meet the needs of the organization.
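
The note about query plans can be illustrated with SQLite's EXPLAIN QUERY PLAN statement, which reports how a query will be executed. The sketch below compares the plan for the same query before and after creating an index. The table, the index name idx_sales_region, and the helper functions are hypothetical, and the exact wording of the plan text depends on the SQLite version, so no specific output is shown.

```python
import sqlite3


def plan_details(conn, query, params=()):
    """Return the human-readable steps of SQLite's plan for `query`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    # Each plan row is (id, parent, notused, detail); the detail text is last.
    return " | ".join(row[3] for row in rows)


def compare_plans():
    """Return the plan text before and after adding an index on region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("north" if i % 2 else "south", float(i)) for i in range(1000)],
    )
    query = "SELECT amount FROM sales WHERE region = ?"

    before = plan_details(conn, query, ("north",))  # expected: a full table scan
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    after = plan_details(conn, query, ("north",))   # expected: a search using the index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

Most database systems offer a similar command, though the syntax and output format differ from one system to another.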

Here are some additional tips for optimizing database performance in BI:

  • Use efficient queries: When writing queries, try to use the most efficient methods possible. This includes using indexes, avoiding unnecessary subqueries, and using the appropriate data types.
  • Partition data: Partitioning data can improve performance by dividing the data into smaller, more manageable chunks. This can be especially helpful for large datasets.
  • Use caching: Caching can improve performance by storing frequently accessed data in memory. This can reduce the number of times the database has to be accessed.
  • Use a database management system (DBMS) that is designed for BI: Some DBMSs are specifically designed for BI workloads. These DBMSs typically have features that can improve performance, such as columnar storage and in-memory analytics.

By following these tips, you can improve database performance and make it easier for BI users to access the information they need.
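
The caching tip can be sketched with functools.lru_cache from the Python standard library. This is a simplified illustration, assuming a small in-memory SQLite table; the class CachedReport, its db_hits counter, and the helper make_sales_db are hypothetical names introduced for the example.

```python
import sqlite3
from functools import lru_cache


def make_sales_db():
    """Build a small in-memory table of sales (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 5.0), ("south", 7.0)],
    )
    return conn


class CachedReport:
    """Serve regional totals, keeping recent results in memory."""

    def __init__(self, conn):
        self.conn = conn
        self.db_hits = 0  # how many times the database was actually queried
        self._cached = lru_cache(maxsize=128)(self._query_total)

    def _query_total(self, region):
        self.db_hits += 1
        row = self.conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
            (region,),
        ).fetchone()
        return row[0]

    def total_for(self, region):
        """Return the total for a region, from the cache when possible."""
        return self._cached(region)

    def invalidate(self):
        """Clear cached results, for example after new data is loaded."""
        self._cached.cache_clear()


if __name__ == "__main__":
    report = CachedReport(make_sales_db())
    print(report.total_for("north"), report.total_for("north"), report.db_hits)
```

A cache trades freshness for speed: results stay in memory until invalidate is called, so a pipeline that loads new data should clear the cache afterward.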

Here is an example of how to optimize database performance in BI:

Suppose you have a BI system that is used to generate sales reports. The reports are based on a large dataset of sales transactions. The queries that are used to generate the reports are slow, and users are complaining that it takes too long to get the reports they need.

To optimize database performance, you can start by analyzing the workload. Identify the most common types of queries that are being used to generate the reports. Once you have identified the most common queries, you can start to optimize them.

For example, you may need to create indexes on the tables that are used in the most common queries. You may also need to partition the data so that the queries can run more efficiently.

In addition to optimizing the queries, you can also improve performance by tuning the database. For example, you may need to adjust the memory allocation for the database or increase the number of worker threads.

Finally, you should monitor the database performance on a regular basis to identify any potential problems. You can use a variety of tools to monitor the database, such as the database’s own performance monitoring tools or third-party monitoring tools.

By following these steps, you can optimize the database and make your BI system more responsive for its users.

What is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability?

Data partitioning

Data partitioning is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability. Ensuring data is partitioned appropriately is a key part of database performance optimization.
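
The idea behind partitioning can be shown without a database at all. In the sketch below, sales rows are divided into partitions by month, so a query for one month reads only that partition instead of every row. The sample data and the function names partition_by_month and monthly_total are hypothetical; real database systems implement partitioning internally and offer their own syntax for it.

```python
from collections import defaultdict

# Hypothetical sales rows: (ISO date, amount).
SALES = [
    ("2024-01-03", 10.0),
    ("2024-01-19", 5.0),
    ("2024-02-02", 7.5),
]


def partition_by_month(rows):
    """Divide rows into logical parts keyed by 'YYYY-MM'."""
    partitions = defaultdict(list)
    for sale_date, amount in rows:
        partitions[sale_date[:7]].append((sale_date, amount))
    return dict(partitions)


def monthly_total(partitions, month):
    """Sum one month's sales by reading only that month's partition."""
    return sum(amount for _, amount in partitions.get(month, []))


if __name__ == "__main__":
    parts = partition_by_month(SALES)
    print(sorted(parts))
    print(monthly_total(parts, "2024-01"))
```

Choosing the partition key matters: partitioning by month helps queries that filter by date, but it does nothing for queries that filter by region.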

The following example shows how the five factors apply to a movie theater company’s database system:

  • Workload: This is the combination of transactions, queries, data warehousing analysis, and system commands being processed by the database system at any given time. In this case, most of the workload is processing user requests such as generating scheduled reports or fulfilling queries. If the database can’t handle the workload, the system might crash, disrupting users’ ability to access and use the data.
  • Throughput: This is the overall capability of the database’s hardware and software to process requests. Because the movie theater system is mostly focused on analysis of data from OLTP databases, it uses an OLAP database that primarily relies on cloud storage. The database storage processes and the computers within the system that access the cloud data need to be capable of handling the theater’s workload, especially when the database system is under heavy use.
  • Resources: The hardware and software that determine the system’s throughput are its resources. For example, the movie theaters might use a cache controller disk to help the database manage the storage and retrieval of data from the memory systems.
  • Optimization: Ideally, users should be able to access transaction data that has been ingested from multiple other database systems. If retrieval slows down, it can take longer to get the data and provide insights to stakeholders. This is why it is important to keep the database optimized even after it has been set up.
  • Contention: The movie theater company has a team of analysts accessing and using this data, in addition to the automated transformations being applied to the data and the reports being generated. All these requests can end up competing with each other and cause contention. This can be a problem if the system processes multiple requests at the same time, essentially making the same updates over and over. To limit this, the database processes queries in the order the requests are made.

It is important to consider all five of these factors when designing and managing a database system, as they can all have a significant impact on performance.

The Five Factors in Action in Business Intelligence

The five factors of database performance in a business intelligence (BI) system are workload, throughput, resources, optimization, and contention. These factors are essential considerations for any BI professional, as they can all have a significant impact on performance.


Workload is the combination of transactions, queries, data warehousing analysis, and system commands being processed by the BI system at any given time. In general, the higher the workload, the more demanding it will be on the system’s resources.


Throughput is the overall capability of the BI system’s hardware and software to process requests. It is measured in terms of the number of queries or transactions that can be processed per second.


The resources available to the BI system include the hardware and software components that make up the system, such as the CPU, memory, storage, and network. The availability of resources can have a significant impact on throughput and performance.


Optimization refers to the process of improving the performance of the BI system by tuning the hardware and software components, as well as the database and application code. Optimization can help to improve throughput, reduce response times, and improve overall performance.


Contention occurs when multiple users or processes are competing for the same resources. In the context of BI, contention can occur when multiple users are running complex queries, or when the system is performing resource-intensive tasks such as data loading or indexing.

How the Five Factors Work Together

The five factors of BI performance are all interrelated. For example, if the workload is high, it may be necessary to increase the resources available to the system in order to maintain throughput. Similarly, if the system is experiencing contention, it may be necessary to optimize the system or workload to improve performance.


Consider a BI system that is used to generate reports on sales data. The workload for this system is likely to be highest during the peak sales season. During this time, the system may need to process a large number of queries from different users. If the system does not have enough resources to handle the workload, it may experience performance problems, such as slow response times or timeouts.

To improve performance, the BI administrator could increase the resources available to the system, such as by adding more CPU or memory. The administrator could also optimize the system by tuning the database and application code. Additionally, the administrator could work with users to reduce the workload during peak times.


The five factors of BI performance are essential considerations for any BI professional. By understanding these factors and how they work together, BI professionals can improve the performance of their systems and deliver better insights to their users.

Here are some additional tips for improving BI performance:

  • Use a data warehouse or data lake to store and manage your data. This will help to improve performance by separating the data from the operational systems.
  • Optimize your database queries. This can be done by using the appropriate indexes and by writing efficient SQL code.
  • Use a caching layer to store frequently accessed data in memory. This can help to improve performance by reducing the number of database queries that need to be executed.
  • Use a load balancer to distribute the workload across multiple BI servers. This can help to improve performance and scalability.
  • Monitor your BI system performance and make adjustments as needed. This can be done using a variety of tools and techniques, such as system monitoring tools and performance testing.

Fill in the blank: A data mart is a _____ database that can be a subset of a larger data warehouse. This means it is a convenient way to access the data pertaining to specific areas or departments of a business.

A business intelligence team manager wants to support their team’s ability to perform at a high level. They investigate the overall capability of their company’s database hardware and software tools to enable the team to process stakeholder requests. In this situation, which of the factors of database performance do they consider?

What term is used to describe data that is broken up into many pieces that are not stored together?

You have been learning about database design and the role of BI professionals in creating and maintaining useful database systems. You have also learned about the five factors of database performance, database optimization strategies, and the importance of monitoring database performance.

As a BI professional, developing processes that enable your team to pull insights themselves is a key part of the job. However, systems and processes change over time, so it is important to continue to monitor database performance.

In the next lesson, you will learn more about optimizing systems and the tools you will create as a BI professional. You will also learn about optimizing ETL processes.

