
You’ll learn about optimization techniques including ETL quality testing, data schema validation, business rule verification, and general performance testing. You’ll also explore data integrity and learn how built-in quality checks defend against potential problems. Finally, you’ll focus on verifying business rules and general performance testing to make sure pipelines meet the intended business need.

Learning Objectives

  • Discover strategies to create an ETL process that meets organizational and stakeholder needs, and learn how to maintain an ETL process efficiently.
  • Introduce tools used in ETL.
  • Understand the primary goals of ETL quality testing.
  • Understand the primary goals of data schema validation.
  • Develop ETL quality testing and data schema validation best practices.
  • Identify and implement appropriate test scenarios and checkpoints for QA on data pipelines.
  • Explain different methods for QA of data in the pipeline.
  • Create performance testing scenarios and measure performance throughout the pipeline.
  • Verify business rules.
  • Perform general performance testing.

Optimizing pipelines and ETL processes


Video: Welcome to module 3

In this section of the course, BI professionals will learn about ETL quality testing, data schema validation, verifying business rules, and general performance testing.

  • ETL quality testing ensures that data is extracted, transformed, and loaded to its destination without any errors or issues.
  • Data schema validation keeps source data aligned with the target database schema. A schema mismatch can cause system failures.
  • Verifying business rules ensures that the pipeline is fulfilling the business need it was intended to.
  • General performance testing ensures that the pipeline is performing efficiently.

These optimization processes are important for keeping the ETL running smoothly and ensuring that data is accurate and reliable.

You’ve learned a lot about
how BI professionals ensure that their organizations’ database systems and tools continue to be as
useful as possible. This includes evaluating whether fixes or updates are needed, and performing optimization
when necessary. Previously, we focused specifically on optimizing
database systems. Now it’s time to explore optimizing pipelines
and ETL processes. In this section of the course, you’ll learn about
ETL quality testing, data schema
validation, verifying business rules and general
performance testing. Through ETL quality testing, BI Professionals aim to confirm
that data is extracted, transformed, and loaded to its destination without
any errors or issues. This is especially
important because sometimes your pipeline might
start producing bad or misleading results. This can happen when
the original sources are changed without
your knowledge. Also, we’ll soon cover
data schema validation, which is used to
keep source data aligned with the target
database schema. A schema mismatch can
cause system failures. This is critical to keeping
the ETL running smoothly. We’ll also investigate
data integrity and how built-in quality checks defend against
potential problems. Finally, we’ll focus on
verifying business rules and general performance
testing to make sure the pipeline is fulfilling the business need
it was intended to. There’s a lot to come. So
let’s begin exploring the optimization processes
for pipelines and ETL.

Video: The importance of quality testing

ETL quality testing is the process of checking data for defects in order to prevent system failures. It involves seven validation elements:

  • Completeness: Confirming that the data contains all the desired components or measures.
  • Consistency: Confirming that data is compatible and in agreement across all systems.
  • Conformity: Confirming that the data fits the required destination format.
  • Accuracy: Confirming that the data conforms to the actual entity that’s being measured or described.
  • Redundancy: Ensuring that there isn’t any redundancy in the data.
  • Integrity: Checking for any missing relationships in the data values.
  • Timeliness: Confirming that data is current.

These elements are important for ensuring that data is accurate and reliable. BI professionals can use a variety of methods to test data quality, such as data mapping, data profiling, and data validation.
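
To make these elements more concrete, here is a minimal sketch of how a few of them, completeness, redundancy, and timeliness, might be checked in code. It assumes the incoming batch is a pandas DataFrame; the column names and the seven-day freshness threshold are illustrative assumptions, not part of the course materials.

```python
import pandas as pd

# Assumed column names for a monthly sales extract (illustrative only).
REQUIRED_COLUMNS = {"order_id", "product_id", "sale_date", "amount"}

def quality_check(batch: pd.DataFrame, freshness_days: int = 7) -> list[str]:
    """Return a list of human-readable quality issues found in the batch."""
    issues = []

    # Completeness: every desired component or measure is present and populated.
    missing_cols = REQUIRED_COLUMNS - set(batch.columns)
    if missing_cols:
        issues.append(f"missing columns: {sorted(missing_cols)}")
    elif batch[list(REQUIRED_COLUMNS)].isna().any().any():
        issues.append("null values found in required columns")

    # Redundancy: the same record should not be moved or stored twice.
    if "order_id" in batch.columns and batch["order_id"].duplicated().any():
        issues.append("duplicate order_id values detected")

    # Timeliness: the newest record should be recent enough to be useful.
    if "sale_date" in batch.columns:
        newest = pd.to_datetime(batch["sale_date"]).max()
        if (pd.Timestamp.now() - newest).days > freshness_days:
            issues.append(f"data is stale; newest record is from {newest.date()}")

    return issues
```

A pipeline could run a check like this on each batch during the transformation step and hold back any batch that returns a non-empty list of issues.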

Tutorial on the Importance of Quality Testing in Business Intelligence

Introduction

Business intelligence (BI) is the process of collecting, analyzing, and interpreting data to help businesses make better decisions. Quality testing is the process of checking data for errors or inconsistencies before it is used in BI applications.

Why is quality testing important in BI?

Quality testing is important in BI because it helps to ensure that the data used to make decisions is accurate and reliable. If the data is inaccurate or unreliable, the decisions made using that data could be flawed.

This could lead to a number of problems, such as:

  • Lost revenue
  • Increased costs
  • Decreased customer satisfaction
  • Damage to the company’s reputation

What are the benefits of quality testing in BI?

There are a number of benefits to quality testing in BI, including:

  • Improved data accuracy and reliability
  • Reduced risk of errors in decision-making
  • Increased confidence in BI results
  • Improved compliance with regulations
  • Enhanced ability to identify and mitigate risks
  • Improved business performance

How to perform quality testing in BI

There are a number of different ways to perform quality testing in BI. Some common methods include:

  • Data profiling: Data profiling is the process of analyzing data to identify its characteristics, such as data types, value ranges, and patterns. This information can be used to identify potential errors or inconsistencies in the data.
  • Data validation: Data validation is the process of checking data to ensure that it meets certain criteria, such as being within a specific value range or conforming to a specific format.
  • Data reconciliation: Data reconciliation is the process of comparing data from different sources to identify any discrepancies (see the sketch after this list).
  • Data cleansing: Data cleansing is the process of correcting or removing errors and inconsistencies from data.
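
As referenced in the data reconciliation item above, here is a hedged sketch of a simple reconciliation step. It assumes the source extract and the loaded target table can both be pulled into pandas DataFrames; the column names are illustrative.

```python
import pandas as pd

def reconcile(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Compare simple aggregates on both sides and report any discrepancies."""
    report = {
        "row_count_match": len(source) == len(target),
        "amount_total_match": source["amount"].sum() == target["amount"].sum(),
    }
    # Identify records that exist on one side but not the other.
    only_in_source = set(source["order_id"]) - set(target["order_id"])
    only_in_target = set(target["order_id"]) - set(source["order_id"])
    report["unmatched_ids"] = sorted(only_in_source | only_in_target)
    return report
```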

Conclusion

Quality testing is an important part of any BI process. By ensuring that the data used in BI applications is accurate and reliable, businesses can make better decisions and improve their performance.

Here are some additional tips for performing quality testing in BI:

  • Identify the risks: Before you start testing, identify the areas where the data is most likely to be inaccurate or incomplete. This will help you to focus your testing efforts on the most important areas.
  • Use a variety of testing methods: There is no one-size-fits-all approach to quality testing in BI. Use a variety of testing methods, such as data profiling, data validation, data reconciliation, and data cleansing, to ensure that the data is accurate and reliable.
  • Automate your testing: If possible, automate your quality testing process. This will save you time and help you to ensure that your data is always tested before it is used in BI applications.
  • Monitor your test results: Once you have tested your data, monitor the test results to identify any trends or patterns. This information can be used to improve your quality testing process and to identify areas where the data needs to be improved.

By following these tips, you can ensure that the data used in your BI applications is accurate and reliable, and that you are making the best possible decisions for your business.

When quality testing, why does a business intelligence professional confirm data conformity?

To ensure the data fits the required destination format

When quality testing, a business intelligence professional confirms data conformity in order to ensure the data fits the required destination format.

You’re already familiar
with ETL pipelines, where data is extracted
from a source, transformed while
it’s being moved, and then loaded into
a destination table where it can be acted on. Part of the
transformation step in the ETL process is
quality testing. In BI, quality testing
is a process of checking data for defects in order
to prevent system failures. The goal is to ensure the pipeline continues
to work properly. Quality testing can
be time-consuming, but it’s extremely important for an organization’s workflow. Quality testing involves seven validation
elements: completeness, consistency, conformity, accuracy, redundancy, integrity,
and timeliness. That’s a lot of elements
to keep in mind, but we’re going to
break down each in this video, starting
with completeness. Also, you may recall some of these concepts from the Google Data Analytics certificate. If you’d like, take
a few minutes to review that content
before moving ahead. Let’s start with checking
for completeness. This involves confirming
that the data contains all the desired
components or measures. For example, imagine you’re working with sales data and you have an ETL pipeline
that delivers monthly data to target tables. These target tables are used to generate reports
for stakeholders. If the data being moved through the pipeline is missing
a week of data or information about one of
the best-selling products or another key metric, then the calculations
used to create reports won’t have
complete accurate data. Next, we have consistency. You might have learned that
in a data analytics context, consistency deals with the
degree to which data is repeatable from different
points of entry or collection. In BI, it’s a bit different. Here, consistency involves
confirming that data is compatible and in
agreement across all systems. Imagine two systems, one is an HR database
with employee data, and the other is
a payroll system. If the HR database lists an employee who either isn’t
in the payroll system, or is listed differently there, that inconsistency
could create problems. Next is conformity. This element is
all about whether the data fits the required
destination format. Consider sales data
in an ETL pipeline. If the data being extracted
includes dates of sale that don’t match the dates that the destination table
is designed to hold, that’s going to create errors. Now, accuracy has to do
with the data conforming to the actual entity that’s
being measured or described. Another way of
thinking about this is if the data
represents real values. With that in mind, any mistyped
entries or errors from the source are problematic because they will be
reflected in the destination. Source systems
requiring a lot of manual data entry are more likely to have issues
with accuracy. If a purchase of a hamburger was misentered as selling for a
million dollars, that’s something you
need to take care of before the data is loaded. If you’re using a relational
storage database, ensuring that there isn’t
any redundancy in the data is another important
element of quality testing. In a BI context,
redundancy is moving, transforming, or storing more
than the necessary data. This occurs when the
same piece of data is stored in two or more places. Moving data through
a pipeline requires processing power,
time, and resources. It’s important not to move
any more data than you need. For instance, if
client company names are listed in multiple places, but are only required
to appear in one place in the
destination table, we wouldn’t want to waste resources on loading
that redundant data. Now we come to integrity. Integrity concerns the
accuracy, completeness, consistency, and
trustworthiness of data throughout its life cycle. In quality testing,
this often means checking for any missing relationships in
the data values. As an example, say a company’s sales
database is relational. BI professionals would depend
on those relationships to manipulate data within the database and
to query the data. Maybe they have product IDs and descriptions in a database. But if there’s a description and no corresponding
record with the ID, there’s now an issue
with data integrity. It’s essential to
make sure this is addressed before
moving on to analysis. Data mapping is one way to
make sure that the data from the source matches the data
in the target database. You’ll learn more
about this later. But basically, data
mapping is a process of matching fields from one
data source to another. Last but not least, you want to make sure
your data is timely. Timeliness involves confirming
that data is current. This check is done
specifically to make sure data has been updated with the most recent information that can provide
relevant insights. For example, if a data warehouse is supposed to
contain daily data, but doesn’t update properly then the pipeline can’t ingest
the latest information. BI professionals are mostly
interested in exploring current data in order to allow stakeholders to gain
the freshest insights. This definitely
won’t be possible if the data being moved
is already outdated. A lot goes into ETL
quality testing, and it can be a tricky process, but remembering these
seven key elements is a wonderful first step toward creating
high-quality pipelines. Coming up, you’re going to learn even more about these checks and other performance tests for
ETL processes. Bye for now.

Reading: Seven elements of quality testing

Reading

Upgraded Plugin: Validate: Data quality and integrity

Reading: Monitor data quality with SQL

Reading

Video: Mana: Quality data is useful data

Mana is a Senior Technical Data Program Manager at Google. She helps create tools that enable business partners to make better decisions with data. She believes that good data is essential for making good decisions.

Quality testing is the process of making sure that data is accurate, relevant, representative, and timely. It is important to test data quality at every stage of the data pipeline, from extraction to transformation to loading.

Mana wishes she had been more confident in her skills when she was starting out. She believes that even if you don’t have all the traditional skills for a job, you can still be successful if you are curious, open, and humble.

She encourages people to stay curious and to learn from others. She believes that it is important to constantly evolve and become better than you were yesterday.

My name is Mana and I am a Senior Technical Data
Program Manager at Google. What that means is that I work with business
partners and I help create tools that
help enable them to make better
decisions with data. I’m a big data nerd, so I love playing around with
data and getting to build cool stuff that makes
people’s jobs a lot easier. There’s a common saying that in the absence of data
you have dirt. I like to take that
a step further. I like to say in the absence
of good data, you have dirt. Quality testing is all about how do you make sure
that you have good data. Good data can mean a number
of different things. Oftentimes, it means
accurate data. How do you ensure that the
numbers you are producing are correct and they’re
representative of the truth? It can also mean relevant data. It can also mean
representative data. It can also mean quick
data at your fingertips. The process of quality
control is making sure that the tools that
you’re building with respect to data are accurate, helpful, relevant, and timely. There are many times
across the life-cycle of building a BI product in which quality testing
comes into play. If you think of the very
early stages where maybe someone is trying to
extract data from logs, you’re going to want to
make sure that the data that you’re extracting
is accurate. It’s the same data that’s
coming in through the logs, the same data that’s
being spit out from the data mart maybe
that I’m creating. It’s the same idea through your ETL processes where ETL means extract,
transform and load. As you’re grabbing the data,
you’re transforming it, you’re massaging it, and you’re creating relevancy
with that data. You want to make sure
that the data you got in is the same data
you’re spitting out. I remember when I was
a young professional, I was always imagining that one day I would land
at a company that was data nirvana and all of
their data would be so clean and so perfect and
it was just magical, I could just query it with
no worries in the world. The truth is that nirvana
does not exist anywhere. Data always has issues. There are always
bugs that come up. Even the data that you were
looking at today might be different than the data
that you look at tomorrow. Embedding quality testing, not only as a one-off, but as a regular process in your pipelines or
whatever you’re building is of
incredible importance because bugs are just
bound to happen. There are a couple
of things that I wish I knew when I
was starting out. I wish that I had
fully been able to recognize how many
skills I had in this department that came from
other parts of my life and weren’t the traditional
means of BI skills, and with that, having
more confidence in myself because I came in with really fantastic
storytelling skills that I was really
able to hone in on. Even though I wasn’t
a software engineer, I have a lot of experience
coding and I knew a lot of best practices that I
was able to put in place. I would say that if
there are areas that you feel like you’re maybe
not the strongest in, that if you stay curious and you stay open and you stay humble, and you find folks that are really great at those
and you ask them, hey, how do you do that? Know that you can learn and you can grow in
those capacities, and you can become
continuously well-rounded. It’s very normal
for us as humans to have our strengths
and our growth areas. But being successful is
not having them innately. It’s having the
ability to constantly evolve, better yourself, and
not become the expert, but just become better
than you were yesterday.

Practice Quiz: Test your knowledge: Optimize pipelines and ETL processes

What is the business intelligence process that involves checking data for defects in order to prevent system failures?

Fill in the blank: Completeness is a quality testing step that involves confirming that the data contains all desired __ or components.

A business intelligence professional is considering the integrity of their data throughout its life cycle. Which of the following goals do they aim to achieve? Select all that apply.

Data schema validation


Video: Conformity from source to destination

Schema validation is a process of ensuring that the source system data schema matches the target database data schema. Schema validation properties should ensure three things: the keys are still valid after transformation, the table relationships have been preserved, and the conventions are consistent across the database.

Data dictionaries and data lineages are documentation tools that support data schema validation. A data dictionary is a collection of information that describes the content, format, and structure of data objects within a database, as well as their relationships. Data lineage describes the origin of data, where it has moved throughout the system, and how it has transformed over time.

Using schema validation, data dictionaries, and data lineages helps BI professionals promote consistency as data is moved from the source to destination.
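
As a rough illustration of those three properties, valid keys, preserved relationships, and consistent conventions, the sketch below checks an incoming batch against an assumed target schema before loading. The table and column names, the pandas dtypes, and the use of pandas itself are assumptions for illustration rather than a prescribed implementation.

```python
import pandas as pd

# An assumed description of the destination table's schema.
TARGET_SCHEMA = {
    "order_id": "int64",            # primary key in the destination table
    "customer_id": "int64",         # foreign key to the customers table
    "order_date": "datetime64[ns]",
}

def validate_schema(incoming: pd.DataFrame, customers: pd.DataFrame) -> list[str]:
    errors = []

    # Conventions: column names must match the destination schema exactly.
    if set(incoming.columns) != set(TARGET_SCHEMA):
        errors.append(f"column name mismatch: {sorted(set(incoming.columns) ^ set(TARGET_SCHEMA))}")

    # Conventions: data types must match what the destination expects.
    for col, expected in TARGET_SCHEMA.items():
        if col in incoming.columns and str(incoming[col].dtype) != expected:
            errors.append(f"{col}: expected {expected}, got {incoming[col].dtype}")

    # Keys: the primary key must still be valid (unique) after transformation.
    if "order_id" in incoming.columns and incoming["order_id"].duplicated().any():
        errors.append("order_id is not unique")

    # Relationships: every foreign key must still resolve in the target system.
    if "customer_id" in incoming.columns:
        unknown = set(incoming["customer_id"]) - set(customers["customer_id"])
        if unknown:
            errors.append(f"customer_id values with no matching customer: {sorted(unknown)}")

    return errors  # an empty list means the batch is safe to load
```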

Conformity from source to destination in Business Intelligence

In Business Intelligence (BI), conformity refers to ensuring that data is consistent as it is moved from source to destination. This is important for a number of reasons, including:

  • Ensuring that data is accurate and reliable
  • Enabling users to easily understand and compare data from different sources
  • Facilitating the creation of accurate and consistent reports and dashboards

There are a number of tools and techniques that can be used to achieve conformity from source to destination in BI. These include:

  • Schema validation: Schema validation is the process of ensuring that the structure of the data in the source system matches the structure of the data in the destination system. This can be done using a variety of tools, such as data profiling tools and data modeling tools.
  • Data mapping: Data mapping is the process of defining how data elements in the source system are related to data elements in the destination system. This can be done manually or using data mapping tools.
  • Data cleansing: Data cleansing is the process of identifying and correcting errors in data. This can be done manually or using data cleansing tools.
  • Data transformation: Data transformation is the process of converting data from one format to another. This can be done using a variety of tools, such as ETL (extract, transform, load) tools and data wrangling tools.
  • Data quality checks: Data quality checks are used to identify and measure the quality of data. This can be done using a variety of tools, such as data profiling tools and data quality monitoring tools.

By using these tools and techniques, BI professionals can ensure that data is consistent as it is moved from source to destination. This helps to ensure that data is accurate, reliable, and easy to use.
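
For example, the data mapping step described above might look something like the sketch below, which renames source fields to the destination’s conventions (such as “employee ID” in the source versus employee_id in the target) and applies the destination’s types. The field names and casts are assumptions for illustration.

```python
import pandas as pd

FIELD_MAP = {            # source column  ->  destination column
    "employee ID": "employee_id",
    "hireDate": "hire_date",
    "Dept": "department",
}

def map_to_destination(source: pd.DataFrame) -> pd.DataFrame:
    mapped = source.rename(columns=FIELD_MAP)
    # Apply the destination's conventions for types and formats.
    mapped["employee_id"] = mapped["employee_id"].astype("int64")
    mapped["hire_date"] = pd.to_datetime(mapped["hire_date"])
    return mapped[list(FIELD_MAP.values())]  # keep only mapped columns, in order
```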

Here are some additional tips for achieving conformity from source to destination in BI:

  • Standardize data formats: As much as possible, use standard data formats for both source and destination systems. This will make it easier to map data between systems and reduce the need for data transformation.
  • Use common naming conventions: Use common naming conventions for data elements in both source and destination systems. This will make it easier for users to understand and compare data from different sources.
  • Document data transformations: Document any data transformations that are performed. This will help to ensure that data is transformed consistently and that users understand how data has been transformed.
  • Test data transformations: Test data transformations to ensure that they are working correctly. This will help to prevent errors in transformed data.
  • Monitor data quality: Monitor data quality to identify and correct errors in data. This will help to ensure that data is accurate and reliable.

By following these tips, BI professionals can further ensure that data stays consistent, accurate, reliable, and easy to use as it moves from source to destination.

Fill in the blank: A _____ is a collection of information that describes the content, format, and structure of data objects within a database, as well as their relationships.

data dictionary

A data dictionary is a collection of information that describes the content, format, and structure of data objects within a database, as well as their relationships.

You’re learning a lot
about the importance of quality testing and ETL, and you now know that a key part of the
process is checking for conformity or whether the data fits the required
destination format. To ensure conformity from source to destination, BI professionals have three
very effective tools: schema validation, data
dictionaries and data lineages. In this video, we’ll
examine how they can help you establish
consistent data governance. First, schema validation is
a process to ensure that the source system data schema matches the target
database data schema. As you’re learning, if
the schemas don’t align, this can cause system failures that are very difficult to fix. Building schema validation
into your workflow is important to
prevent these issues. Database tools offer various schema validation
options that can be used to check incoming data against the destination
schema requirements. For example, you
could dictate that a certain column contains
only numerical data. Then if you try to enter something in that column
that doesn’t conform, the system will flag the error. Or in a relational database, you could specify
that an ID number must be a unique field. That means the same ID can’t be added if it matches
an existing entry. This prevents
redundancies in the data. With these properties
in action, if the data doesn’t conform
and throws an error, you’ll be alerted, or if
it meets the requirements, you’ll know it’s valid
and safe to load. Schema validation properties
should ensure three things. The keys are still valid
after transformation. The table relationships
have been preserved, and the conventions are
consistent across the database. Let’s start with the keys. As you’ve been learning,
relational databases use primary and foreign keys to build relationships
among tables. These keys should
continue to function after you’ve moved data from
one system into another. For example, if
your source system uses customer_id as a key, then that needs to be valid in the target schema as well. This is related to the next property of schema validation, making sure the table
relationships have been preserved. When taking in data
from a source system, it’s important that
these keys remain valid in the target system so
the relationships can still be used to
connect tables or that they are transformed to
match the target schema. For example, if the
customer_id key doesn’t apply to
our target system then all of the tables
that used it as a primary or foreign
key are disconnected. If the relationships
between tables are broken while
data is being moved, then the data becomes hard
to access and use, which defeats the whole reason we moved it to our target system. Finally, you want to ensure that the conventions are consistent with the target
database’s schema. Sometimes data from
outside sources uses different conventions for
naming columns and tables. For example, you could have
a source system that uses employee ID as one word
to identify that field, but the target database
might use employee_id. You’ll need to ensure these
are consistent so you don’t get errors when trying to
pull data for analysis. In addition to the
properties themselves, there are some other
documentation tools that support data
schema validation: data dictionaries
and data lineages. A data dictionary
is a collection of information that
describes the content, format and structure of data
objects within a database, as well as their relationships. You might also
hear this referred to as a metadata repository. You may know that metadata
is data about data. This is a very important
concept in BI, so if you’d like to review
some of the lessons about metadata from the Google
Data Analytics certificate, go ahead and do that now. In the case of
data dictionaries, these represent metadata because they’re basically using
one type of data, metadata, to define the use and origin of another
piece of data. There are several reasons
you might want to create a data dictionary
for your team. For one thing, it helps avoid inconsistencies
throughout a project. In addition, it enables you to define any conventions that other team members
need to know in order to create more
alignment across teams. Best of all, it makes the
data easier to work with. Now, let’s explore
data lineages. Data lineage describes the process of identifying the
origin of data, where it has moved
throughout the system, and how it has
transformed over time. This is useful because
if you do get an error, you can track the lineage
of that piece of data, and understand what happened along the way to
cause the problem. Then, you can put standards in place to avoid the same
issue in the future. Using schema validation,
data dictionaries, and data lineages really helps
BI professionals promote consistency as data is moved from the source
to destination. This means all users can be confident in the BI
solutions being created. We’ll keep exploring
these concepts soon.

Reading: Sample data dictionary and data lineage

Reading

Video: Check your schema

In this case study, an educational non-profit is ingesting data from school databases in order to evaluate learning goals, national education statistics, and student surveys.

The non-profit uses a data dictionary and lineage to establish the necessary standards for data consistency. The data dictionary records four specific properties for each column: the name of the column, its definition, the datatype, and possible values. The data lineage includes information about the data’s origin, where it is moved throughout the system, and how it has transformed over time.

The schema validation process flags an error for a piece of data that is not integer type. The data lineage is used to trace the journey of this piece of data and find out where in the process a quality check should be added.

In this case, the data was not type cast correctly when it was input into the school’s original database. The transformation process in the pipeline does not include type casting.

The non-profit can improve their systems and prevent errors by incorporating type casting into the pipeline before data is read into the destination table.
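
A hedged sketch of that improvement is shown below: the transformation step casts each column to the type documented in the data dictionary and sets aside records that cannot be cast. The dictionary structure, column names, and value range are assumptions for illustration, not the non-profit’s actual implementation.

```python
import pandas as pd

# The four properties recorded for each column in the data dictionary:
# name (the key), definition, data type, and possible values.
DATA_DICTIONARY = {
    "age": {
        "definition": "The student's age in years",
        "type": "int64",
        "possible_values": range(3, 20),   # illustrative range
    },
}

def cast_with_dictionary(batch: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Cast dictionary-listed columns to their documented types.

    Returns (clean rows ready to load, rejected rows for follow-up).
    Only numeric casting is handled in this sketch.
    """
    bad_rows = pd.Series(False, index=batch.index)
    clean = batch.copy()
    for col, entry in DATA_DICTIONARY.items():
        # Values that cannot be cast (for example "twelve" or blanks) become NaN.
        casted = pd.to_numeric(clean[col], errors="coerce")
        out_of_range = ~casted.isin(entry["possible_values"])
        bad_rows |= casted.isna() | out_of_range
        clean[col] = casted
    rejected = batch[bad_rows]
    clean = clean[~bad_rows].copy()
    for col, entry in DATA_DICTIONARY.items():
        clean[col] = clean[col].astype(entry["type"])
    return clean, rejected
```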

Checking your schema in Business Intelligence (BI)

A schema is a set of rules that defines the structure and relationships of data in a database. In BI, it is important to check your schema regularly to ensure that it is accurate and up-to-date. This helps to prevent errors and ensure that data is consistent and reliable.

There are a number of ways to check your schema in BI. These include:

  • Using a data profiling tool: A data profiling tool can be used to analyze the structure and contents of your data. This can help you to identify any errors or inconsistencies in your schema.
  • Manually reviewing your schema: You can also manually review your schema to check for errors. This is a good way to ensure that your schema is complete and accurate.
  • Using a schema validation tool: A schema validation tool can be used to check your schema against a set of rules. This can help you to identify any violations of your schema rules.
  • Using a data quality tool: A data quality tool can be used to check the quality of your data. This can help you to identify any errors or inconsistencies in your data that may be caused by problems with your schema.

By using these methods, you can ensure that your schema is accurate and up-to-date. This helps to prevent errors and ensure that data is consistent and reliable.

Here are some additional tips for checking your schema in BI:

  • Document your schema: Documenting your schema helps to ensure that everyone who uses your data understands its structure and relationships. This can help to prevent errors and make it easier to maintain your schema.
  • Use standard naming conventions: Using standard naming conventions for your data elements helps to make your schema more readable and understandable. This can help to prevent errors and make it easier to maintain your schema.
  • Test your schema: Testing your schema helps to ensure that it is working correctly. This can be done by creating test data and running it through your BI system.
  • Monitor your schema: Monitoring your schema helps to ensure that it remains accurate and up-to-date. This can be done by tracking changes to your data and updating your schema accordingly.

By following these tips, you can help to ensure that your schema is accurate, up-to-date, and easy to use.

One of the best ways to learn
is through a case study. When you witness how something happened at an
actual organization, it really brings ideas
and concepts to life. In this video, we’re
going to check out schema governance in action
at an educational non-profit. In this scenario,
decision-makers at the non-profit are interested in measuring educational
outcomes in their community. In order to do this, they
are ingesting data from school databases in order
to evaluate learning goals, national education statistics,
and student surveys. Because they’re
pulling data from multiple sources into
their own database system, it’s important that they
maintain consistency across all the data to prevent errors and avoid losing
important information. Luckily, this
organization already has a data dictionary and lineage in place to establish the
necessary standards. Let’s check out an example of a column from the student
information table. This table has five columns. Student ID, school system, school, age, and
grade point average. Each column in this table
has been recorded in the data dictionary to specify what information
it contains, so we can go to the data
dictionary entry for the school system column to double-check the
standards for this table. As a refresher, a
data dictionary is a collection of information
that describes the content, format, and structure of data objects within a database
and their relationships. This dictionary records
four specific properties, the name of the column,
its definition, the datatype, and
possible values. The dictionary entry
for age lets us know the data objects in this column contain information
about a student’s age. It also tells us that this
is integer type data. We can use these
properties to compare incoming data to the
destination table. If any data objects
aren’t integer type data, then the schema validation
will flag the error before the incorrect data is ingested into the destination. What happens when a data object fails to schema
validation process? We can actually use
the data lineage to trace the journey
of this piece of data and find out where in the process we might want
to add a quality check. Again, a data lineage includes information
about the data’s origin, where it is moved
throughout the system, and how it has
transformed over time. During the schema
validation process, this piece of data threw
an error because it isn’t currently cast
as integer type. When we check the data lineage, we can track this object’s
movement through our system. This data started in an
individual school’s database before being read into the
school system’s database. The individual school’s
data was ingested by our pipeline along with data
from other school systems, and then organized
and transformed during the movement process. Apparently, when this data was input in the school’s
original database, it wasn’t type cast correctly. We can confirm that by checking its datatype throughout
the lineage. Lineage also includes
all the transformations that this data has
undergone so far. Also, we might at this point notice that type casting
isn’t built into our transformation
process during quality checks.
That’s great news. Now we know that that’s a process we should
incorporate into the pipeline before data is read into the destination table. In this case, age data objects
should be integer type, and that’s how schema
governance and validation can help improve systems
and prevent errors. There are other tests
that should be applied to a pipeline to make sure
it’s functioning correctly, which we’ll learn
more about soon. But now you have a better understanding of how to validate the schema and keep improving
your pipeline processes.

Reading: Schema-validation checklist

Reading

Practice Quiz: Activity: Evaluate a schema using a validation checklist

Reading: Activity Exemplar: Evaluate a schema using a validation checklist

Reading

Practice Quiz: Test your knowledge: Data schema validation

A team of business intelligence professionals builds schema validation into their workflows. In this situation, what goal do they want to achieve?

Why is it important to ensure primary and foreign keys continue to function after data has been moved from one database system to another?

Fill in the blank: A _ describes the process of identifying the origin of data, where it has moved throughout the system, and how it has transformed over time.

Business rules and performance testing


Video: Verify business rules

Business rules are a critical aspect of ensuring that databases meet the needs of a business. By verifying that data complies with business rules, BI professionals can ensure that databases are performing as intended. Business rules are different for every organization and can change over time, so it is important to keep a record of what rules exist and why. Verifying business rules is similar to schema validation, but it involves comparing incoming data to the business rules of the organization. By verifying business rules, BI professionals can help ensure that databases are providing accurate and relevant information to stakeholders.

Verifying Business Rules in Business Intelligence

Business rules are statements that define or constrain some aspect of a business. In the context of business intelligence (BI), business rules are used to ensure that data is accurate, consistent, and compliant with the organization’s policies and procedures.

There are a number of ways to verify business rules in BI. One common approach is to use a business rule repository. A business rule repository is a central storehouse for all of the business rules that are used by an organization. The repository can be used to store the rules themselves, as well as metadata about the rules, such as the rule owner, the date the rule was created, and the rule’s status.

Another approach to verifying business rules is to use data quality tools. Data quality tools can be used to identify and correct data errors. They can also be used to ensure that data is consistent with business rules.

For example, a data quality tool could be used to identify duplicate records in a customer database. The tool could then be used to merge the duplicate records into a single record. This would ensure that the customer database is accurate and consistent.

In addition to using data quality tools, BI professionals can also verify business rules by manually reviewing data. This can be a time-consuming process, but it can be necessary to ensure that data is accurate and compliant with business rules.

Benefits of verifying business rules

There are a number of benefits to verifying business rules in BI. These benefits include:

  • Improved data quality: By verifying business rules, BI professionals can ensure that data is accurate, consistent, and compliant with the organization’s policies and procedures.
  • Reduced risk: By identifying and correcting data errors, BI professionals can reduce the risk of making bad decisions based on inaccurate data.
  • Increased efficiency: By ensuring that data is consistent, BI professionals can make it easier for users to find and use data.
  • Improved compliance: By verifying that data is compliant with business rules, BI professionals can help organizations meet their compliance obligations.

How to verify business rules

The following are some steps that BI professionals can take to verify business rules:

  1. Identify the business rules that need to be verified. This can be done by reviewing the organization’s policies and procedures, as well as by interviewing stakeholders.
  2. Determine how the business rules will be verified. This could involve using a business rule repository, data quality tools, or manual review.
  3. Gather the data that needs to be verified. This could involve extracting data from operational systems, or it could involve collecting data from users.
  4. Verify the data against the business rules. This could involve using automated tools or manual review.
  5. Document the results of the verification process. This could involve creating a report that summarizes the findings of the verification process.

Conclusion

Verifying business rules is an important part of ensuring that data is accurate, consistent, and compliant with the organization’s policies and procedures. By taking the steps outlined above, BI professionals can help ensure that their organizations are getting the most out of their data.
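
As a concrete illustration, the hedged sketch below compares an incoming checkout record against the library business rules described in this module. The function, field names, and data structures are assumptions for illustration, not a prescribed implementation.

```python
MAX_BOOKS_PER_PATRON = 5  # rule: no more than five books checked out at a time

def verify_checkout(request: dict, books_on_loan: dict, checked_out_isbns: set) -> list[str]:
    """Compare an incoming checkout record to the library's business rules."""
    violations = []

    # Rule: patrons cannot check out more than five books at a time.
    if books_on_loan.get(request.get("patron_id"), 0) >= MAX_BOOKS_PER_PATRON:
        violations.append("patron already has the maximum number of books")

    # Rule: the same book cannot be checked out by two people at the same time.
    if request.get("isbn") in checked_out_isbns:
        violations.append("book is already checked out")

    # Rule: specific information must be entered for the record to be accepted.
    for field in ("patron_id", "isbn", "checkout_date"):
        if not request.get(field):
            violations.append(f"missing required field: {field}")

    return violations  # load the record only if this list is empty
```

Only data that passes these checks would be loaded into the destination database, mirroring how schema validation filters out non-conforming data.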

Fill in the blank: A business rule is a statement that creates a _____ on specific parts of a database.

restriction

A business rule is a statement that creates a restriction on specific parts of a database. It helps prevent errors within the system.

So far, we’ve learned a lot about
database performance, quality testing and schema validation and
how these checks ensure the database and pipeline system continue
to work as expected. Now we’re going to explore another
important check, making sure that the systems and processes you have
created actually meet business needs. This is essential for ensuring that those systems continue
to be relevant to your stakeholders. To do this, BI professionals
verify business rules. In BI, a business rule is a statement that
creates a restriction on specific parts of a database. For example, a shipping database might
impose a business rule that states shipping dates can’t
come before order dates. This prevents order dates and
shipping dates from being mixed up and causing errors within this system. Business rules are created according
to the way a particular organization uses its data. In a previous video,
we discovered how important it can be to observe how a business uses data
before building a database system. Understanding the actual needs
guides design. And this is true for business rules too. The business rules you create will
affect a lot of the database’s design: what data is collected and
stored, how relationships are defined, what kind of information the database
provides, and the security of the data. This helps ensure the database
is performing as intended. Business rules are different in
every organization because the way organizations interact with
their data is always different. Plus, business rules
are always changing, which is why keeping a record of what
rules exist and why is critical. Here’s another example:
consider a library database. The primary need of the users,
who are librarians in this case, is to check out books and maintain
information about patrons. Because of this, there are a few business rules this
library might impose on the database to regulate the system. One rule could be that library patrons
cannot check out more than five books at a time. The database won’t let a user check out
a sixth book. Or the database could have a rule that the same book cannot be checked
out by two people at the same time. If someone tries, the librarians would
be alerted that there’s a redundancy. Another business rule could be that
specific information must be entered into the system for a new book to
be added to the library inventory. Basically, verification involves ensuring
that data imported into the target database complies with
business rules. On top of that, these rules are important pieces
of knowledge that help a BI professional understand how a business and its processes function. This helps the BI professional become a subject matter expert and trusted advisor. As you’re probably noticing, this process
is very similar to schema validation. In schema validation, you take the target database’s schema and
compare incoming data to it. Data that fails this check is not ingested
into the destination database. Similarly, you will compare incoming data
to the business rules before loading it into the database. In our library example,
if a patron puts in a request for a book but they already have more than
five books from the library, then this incoming data doesn’t comply with the
preset business rule, and it prevents them from checking out the book. And those are
the basics of verifying business
ensure that databases do their jobs as intended. And because business rules are
so integral to the way databases function, verifying that they’re working
correctly is very important. Coming up, you’ll get a chance to
explore business rules in more detail.

Reading: Business rules

Reading

Reading: Database performance testing in an ETL context

Reading

Upgraded Plugin: Evaluate: Performance test your data pipeline

Reading

Reading: Defend against known issues

Reading

Video: Burak: Evolving technology

Burak is a BI Engineer at Google. He moved to the US five years ago and had a hard time getting his first job. He realized that he needed to learn new skills to meet the demands of the US market, so he started teaching himself what a BI Engineer does and what technical skills he needed.

He found a lot of free and paid resources online and learned the skills he needed to get his first job. He emphasizes that BI technology has evolved a lot in the past 10 years, and that spreadsheets are no longer the go-to tool for data analysis. Instead, cloud technologies, more sophisticated languages, and more powerful visualization tools are now being used.

Burak believes that it is important to keep up with the latest technology trends in order to be successful in BI engineering. He recommends following online resources, web pages, and newsletters to stay informed. However, he also emphasizes that the essentials of BI engineering remain the same. Once you learn a SQL language, the skills will be transferable to other organizations, even if the specific SQL dialect is different.

Burak’s number one piece of advice is to be dedicated. BI engineering requires a lot of technical skills, so it is important to be willing to put in the time and effort to learn them. He also recommends identifying which area of BI engineering you want to focus on, as there are several different areas that you can work in. The technical skills you need to learn will vary depending on which area you choose.

My name is Burak. I’m a BI Engineer at Google. What a BI Engineer does
is actually collecting, storing, and analyzing data with a lot of
infrastructure involved. I moved to the States five years
ago as a fresh immigrant. It was very difficult to
get the first job for me. Even though I have a degree, I still needed a lot
of training to get ready for the
needs of the domestic market. I started teaching myself:
what does a BI engineer do? What are the essential technical skills that
I need to learn? I did a lot of online research. I found a lot of free stuff, I found a couple of paid subscriptions and
I started learning all these technical skills that I need to learn to
get my first job. BI technology evolved a lot
over the last 10 years. I can say spreadsheets
were very popular 10 years ago, but nobody is a specialist in them anymore in terms
of data analysis. I mean, they’re still being
used for quick analysis, but now everything has
evolved into cloud technologies, more
sophisticated languages, and more powerful visualization
tools to help everyone. Personally, I had to adapt to changing environments and new languages and
new infrastructures. In order to keep up
with the technology, you actually need to do a
little bit research and understand how the industry
is actually evolving. There are lots of
online resources or web pages or
newsletters that you can follow about the
latest industrial trends. But the essentials,
they never change. The foundation of the BI
engineering is the same. If you learn a SQL language, whatever organization you work for is going to use a
different SQL language, a
different dialect. You still need to catch up, but the amount of time that
you need to learn is going to be drastically less than when
you first start, because there are going to
be transferable skills. My number one piece of advice
would be dedication. If you really love
working with data and if you would like to
work in this field, dedication is one of the key requirements
because there is a lot of technical
skills that you need to learn and it needs time. Therefore, I
recommend identifying which area of BI engineering you actually
want to focus on, because there are several
areas you can work in. You can build infrastructures, or you can design systems, or you can analyze data, or you can do some
visualization tools. Depending on which area you
would like to focus on, the technical skills you
need to learn will vary.

Reading: Case study: FeatureBase, Part 2: Alternative solutions to pipeline systems

Practice Quiz: Test your knowledge: Business rules and performance testing

A business intelligence professional considers what data is collected and stored in a database, how relationships are defined, the type of information the database provides, and the security of the data. What does this scenario describe?

Considering the impact of business rules

This scenario describes establishing business rules. A business rule is a statement that creates a restriction on specific parts of a database. It helps determine if a database is performing as intended.

At which point in the data-transfer process should incoming data be compared to business rules?

Before loading it into the database

During the data-transfer process, incoming data should be compared to business rules before loading it into the database.

Review: Optimize ETL processes


Video: Wrap-up

As a BI professional, it is important to ensure that the database systems and pipeline tools you build for your organization continue to work as intended and handle potential errors before they become problems. This involves:

  • Quality testing in an ETL system to check incoming data for completeness, consistency, conformity, accuracy, redundancy, integrity, and timeliness.
  • Schema governance and schema validation to prevent incoming data from causing errors in the system by making sure it conforms to the schema of the destination database.
  • Verifying business rules to ensure that the incoming data meets the business needs of the organization using it.
  • Database optimization to ensure that the systems that users interact with are meeting the business’s needs.
  • Optimizing pipelines and ETL systems to ensure that the systems that move data from place to place are as efficient as possible.

The video encourages learners to keep going and to review the material before the next assessment. It also reminds them that they will have the chance to put everything they have been learning into practice by developing BI tools and processes themselves.

As a BI professional, your job doesn’t end
once you’ve built the database systems and pipeline tools for your organization. It’s also important that you ensure
they continue to work as intended and handle potential errors
before they become problems. In order to address those ongoing needs, you’ve been learning a lot. First, you explored the importance
of quality testing in an ETL system. This involved checking incoming data for
completeness, consistency, conformity, accuracy, redundancy,
integrity, and timeliness. You also investigated
schema governance and how schema validation can prevent incoming
data from causing errors in the system by making sure it conforms to the schema
of the destination database. After that, you discovered why verifying
business rules is an important step in optimization because it ensures that
the data coming in meets the business needs of the organization using it. Maintaining the storage systems that
users interact with is an important part of ensuring that your system is
meeting the business’s needs. This is why database optimization is so
important, but it’s just as important to ensure that
the systems that move data from place to place are as efficient as possible. And that’s where optimizing pipelines and
ETL systems comes in. Coming up, you have another assessment. I know you can do this, and
just as a reminder, you can review any of the material as you
get ready, as well as the latest glossary. So feel free to revisit any videos or readings to get a refresher
before the assessment. After that, you’ll have the chance to
put everything you’ve been learning into practice by developing BI tools and
processes yourself. You’re making excellent
progress toward a career in BI.

Reading: Glossary terms from module 3

Reading

Quiz: Module 3 challenge

Fill in the blank: Quality testing is the process of checking data for _ in order to prevent system failures.

A business intelligence professional is confirming that their data is compatible and in agreement across all systems. Which quality testing validation element does this involve?

A data warehouse is supposed to contain weekly data, but it does not update properly. As a result, the pipeline fails to ingest the latest information. What aspect of the data is being affected in this situation?

Business intelligence professionals use schema validation, data dictionaries, and data lineages while establishing consistent data governance. Which aspect of data validation does this involve?

What process involves confirming the validity of database keys, preserving table relationships, and ensuring that conventions are consistent?

Fill in the blank: A data _ describes the process of identifying the origin of data, where it has moved throughout the system, and how it has transformed over time.

A business intelligence professional establishes what data will be collected, stored, and provided in a database. They also confirm how relationships are defined and the security of the data. What process does this scenario describe?