Module 2: Data Science Topics

In the first lesson of this module, you gain insight into the impact of big data on various aspects of society, from business operations to sports, and develop an understanding of the key attributes and challenges associated with big data. You will learn big data fundamentals, how data scientists use the cloud to handle big data, and how the data mining process works. Lesson two delves into machine learning and deep learning, and the relationship of artificial intelligence to data science.

Learning Objectives

  • Define Big Data and its distinguishing characteristics, such as velocity, volume, veracity, and value.
  • Describe how Hadoop and other big data tools, combined with distributed computing power, are triggering digital transformation.
  • List some of the skills required to be a data scientist and analyze big data.
  • Describe the five essential cloud computing characteristics.
  • Explain what data mining is.
  • Summarize the importance of establishing goals, data selection, preprocessing, transformation, and storage of data in preparation for data mining.
  • Explain the difference between deep learning and machine learning.
  • Describe regression and how it might be used to predict market behavior and analyze trends.
  • Describe generative AI.

Big Data and Data Mining


Reading: Lesson Overview: Big Data and Data Mining

Video: How Big Data is Driving Digital Transformation

The video discusses the concept of Digital Transformation and its impact on businesses and organizations. Here’s a summary:

What is Digital Transformation?

Digital Transformation is the integration of digital technology into all areas of an organization, resulting in fundamental changes to how it operates and delivers value to customers. It’s an organizational and cultural change driven by Data Science and Big Data.

Examples of Digital Transformation

  • Netflix transformed from a postal DVD lending system to a video streaming provider
  • The Houston Rockets NBA team used data analysis to improve their game strategy
  • Lufthansa analyzed customer data to improve its service

How Big Data Triggers Digital Transformation

The video uses the Houston Rockets to illustrate how Big Data can trigger a digital transformation. The team installed a video tracking system that mined raw data from games. Analysis of that data revealed that two-point dunks and three-point shots offered the best scoring opportunities, while long-range two-point shots did not. The team responded by attempting more three-point shots, and in the 2017-18 season it made more three-point shots than any other team in NBA history.
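
The reasoning behind this finding can be sketched as an expected-value calculation: the average points per attempt is the shot's point value multiplied by the chance of making it. The Python sketch below is only an illustration. The make rates are hypothetical numbers picked to show the arithmetic, not figures from the video or from NBA statistics, and the names `SHOT_TYPES`, `expected_points`, and `rank_shots` are invented for this example.

```python
# Hypothetical make rates, chosen only to illustrate the arithmetic.
# They are NOT figures from the video or from real NBA statistics.
SHOT_TYPES = {
    "dunk (2 pts)": (2, 0.90),
    "long-range two (2 pts)": (2, 0.40),
    "three-pointer (3 pts)": (3, 0.36),
}


def expected_points(points: int, make_rate: float) -> float:
    """Average points scored per attempt for one shot type."""
    return points * make_rate


def rank_shots(shot_types):
    """Return (name, expected points) pairs, highest expected value first."""
    scored = [
        (name, expected_points(points, rate))
        for name, (points, rate) in shot_types.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_shots(SHOT_TYPES):
        print(f"{name}: {value:.2f} expected points per attempt")
```

Under these assumed rates, a three-pointer that goes in less often than a long-range two can still be worth more per attempt, because each make counts for an extra point. That is the kind of pattern the tracking data is described as revealing.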

Key Points about Digital Transformation

  • It’s not just about duplicating existing processes in digital form, but about using data analysis to improve processes and operations
  • It requires fundamental changes to an organization’s approach to data, employees, and customers
  • It affects every aspect of the organization and requires support from top-level decision makers, including the CEO, CIO, and CDO
  • It’s a whole-organization process that requires support from everyone to succeed
  • It requires a new mindset to deal with the issues that arise during the transformation process

Conclusion

Digital Transformation is the way to succeed in today’s business landscape. It’s a cultural and organizational change that requires support from top-level decision makers and a new mindset to deal with the challenges that arise during the transformation process.

Digital Transformation affects business operations, updating existing processes and operations and creating new ones to harness the benefits of new technologies. This digital change integrates digital technology into all areas of an organization resulting in fundamental changes to how it operates and delivers value to customers. It is an organizational and cultural change driven by Data Science, and especially Big Data.

The availability of vast amounts of data, and the competitive advantage that analyzing it brings, has triggered digital transformations throughout many industries. Netflix moved from being a postal DVD lending system to one of the world’s foremost video streaming providers, the Houston Rockets NBA team used data gathered by overhead cameras to analyze the most productive plays, and Lufthansa analyzed customer data to improve its service. Organizations all around us are changing to their very core.

Let’s take a look at an example, to see how Big Data can trigger a digital transformation, not just in one organization, but in an entire industry. In 2018, the Houston Rockets, a National Basketball Association, or NBA team, raised their game using Big Data. The Rockets were one of four NBA teams to install a video tracking system which mined raw data from games. They analyzed video tracking data to investigate which plays provided the best opportunities for high scores, and discovered something surprising.

Data analysis revealed that the shots that provide the best opportunities for high scores are two-point dunks from inside the two-point zone, and three-point shots from outside the three-point line, not long-range two-point shots from inside it. This discovery entirely changed the way the team approached each game, increasing the number of three-point shots attempted. In the 2017-18 season, the Rockets made more three-point shots than any other team in NBA history, and this was a major reason they won more games than any of their rivals. In basketball, Big Data changed the way teams try to win, transforming the approach to the game.

Digital transformation is not simply duplicating existing processes in digital form; the in-depth analysis of how the business operates helps organizations discover how to improve their processes and operations, and harness the benefits of integrating data science into their workflows. Most organizations realize that digital transformation will require fundamental changes to their approach towards data, employees, and customers, and it will affect their organizational culture.

Digital transformation impacts every aspect of the organization, so it is handled by decision makers at the very top levels to ensure success. The support of the Chief Executive Officer is crucial to the digital transformation process, as is the support of the Chief Information Officer, and the emerging role of Chief Data Officer. But they also require support from the executives who control budgets, personnel decisions, and day-to-day priorities. This is a whole organization process. Everyone must support it for it to succeed. There is no doubt dealing with all the issues that arise in this effort requires a new mindset, but Digital Transformation is the way to succeed now and in the future.

Video: Introduction to Cloud

The video introduces the concept of cloud computing, its benefits, and its essential characteristics. Here’s a summary:

What is Cloud Computing?

Cloud computing is the delivery of on-demand computing resources such as networks, servers, storage, applications, services, and data centers over the Internet on a pay-for-use basis. It allows users to access applications and data over the Internet rather than on their local computer.

Benefits of Cloud Computing

  1. Cost-effective: Users can access online versions of applications and pay a monthly subscription, eliminating the need to purchase and install software locally.
  2. Latest version: Users can access the latest version of applications without having to purchase a full retail copy.
  3. Collaborative work: Cloud-based applications enable users to work collaboratively with colleagues in real-time.
  4. Storage space: Cloud-based applications save local storage space as the application is hosted online.

Essential Characteristics of Cloud Computing

  1. On-demand self-service: Users can access cloud resources without human interaction with the service provider.
  2. Broad network access: Cloud computing resources can be accessed via the network through standard mechanisms and platforms.
  3. Resource pooling: Cloud providers pool computing resources to serve multiple consumers, making cloud cost-efficient.
  4. Rapid elasticity: Users can access more resources when needed and scale back when not needed.
  5. Measured service: Users only pay for what they use or reserve as they go.
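
Rapid elasticity and measured service can be illustrated together with a small calculation. The Python sketch below is a simplified illustration, not a model of any real provider's pricing. The capacity and price constants, the sample demand figures, and the function names are all hypothetical. It compares paying only for the servers each hour's load needs against keeping enough servers for the peak running all the time.

```python
import math

# All numbers are hypothetical and exist only to illustrate the idea.
REQUESTS_PER_SERVER = 100      # assumed hourly capacity of one server
PRICE_PER_SERVER_HOUR = 0.10   # assumed price in dollars


def servers_needed(requests: int) -> int:
    """Rapid elasticity: provision just enough servers for the current load."""
    return math.ceil(requests / REQUESTS_PER_SERVER)


def elastic_cost(hourly_requests) -> float:
    """Measured service: bill only the server-hours actually used."""
    server_hours = sum(servers_needed(r) for r in hourly_requests)
    return server_hours * PRICE_PER_SERVER_HOUR


def fixed_cost(hourly_requests) -> float:
    """Baseline: keep peak capacity running for every hour."""
    peak = max(servers_needed(r) for r in hourly_requests)
    return peak * len(hourly_requests) * PRICE_PER_SERVER_HOUR


if __name__ == "__main__":
    demand = [50, 120, 900, 300, 0]  # made-up requests per hour
    print(f"elastic: ${elastic_cost(demand):.2f}")
    print(f"fixed:   ${fixed_cost(demand):.2f}")
```

With uneven demand like the made-up series above, scaling up for the busy hour and back down afterwards costs less than holding peak capacity throughout, and an hour with no load costs nothing.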

Cloud Deployment Models

  1. Public cloud: Cloud services are delivered over the open Internet on hardware that is owned by the cloud provider and shared with other companies.
  2. Private cloud: Cloud infrastructure is provisioned for exclusive use by a single organization. It can run on-premises or be owned, managed, and operated by a service provider.
  3. Hybrid cloud: A mix of public and private clouds working together seamlessly.

Cloud Service Models

  1. Infrastructure as a Service (IaaS): Users can access infrastructure and physical computing resources without managing or operating them.
  2. Platform as a Service (PaaS): Users can access the platform that comprises hardware and software tools needed to develop and deploy applications.
  3. Software as a Service (SaaS): Software and applications are centrally hosted and licensed on a subscription basis, also referred to as “on-demand software.”

In summary, cloud computing is a cost-effective and flexible way to access computing resources over the Internet, with various deployment and service models to suit different needs.

Welcome to Introduction to Cloud Computing and Cloud Deployment and Service Models. After watching this video, you will be able to: describe cloud computing concepts, define cloud deployment models and cloud service models, and identify the characteristics of cloud computing.

Cloud computing, also referred to as the cloud, is the delivery of on-demand computing resources such as networks, servers, storage, applications, services, and data centers over the Internet on a pay-for-use basis. The term “cloud computing” can be used to describe applications and data that users access over the Internet rather than on their local computer. Examples of cloud computing include users using online web apps, employees using secure online business applications to conduct their work, and users storing personal files on cloud-based storage platforms such as Google Drive, OneDrive, and Dropbox.

One of the main user benefits of cloud computing is that instead of users needing to purchase their own applications and install them locally on their computer, they can use online versions of those applications and pay a monthly subscription. Not only is this typically more cost-effective initially, but users can also access the latest version of the application without having to purchase a full retail copy of the newer version. A side advantage of this is that the user also saves lots of local storage space as the application is hosted online. And, the beauty of most cloud-based applications is that they also enable users to work collaboratively with their colleagues, working on the same files in real time and being able to see each other’s edits and updates.

Cloud computing is composed of five essential characteristics, three deployment models, and three service models. Let’s start with understanding the five essential characteristics of the cloud.

On-demand self-service means that you get access to cloud resources such as the processing power, storage, and network you need, using a simple interface, without requiring human interaction with each service provider.

Broad network access means that cloud computing resources can be accessed via the network through standard mechanisms and platforms such as mobile phones, tablets, laptops, and workstations.

Resource pooling is what gives cloud providers economies of scale, which they pass on to their customers, making cloud cost-efficient. Using a multitenant model, computing resources are pooled to serve multiple consumers, and cloud resources are dynamically assigned and reassigned according to demand, without customers needing to know the physical location of these resources.

Rapid elasticity implies that you can access more resources when you need them, and scale back when you don’t, because resources are elastically provisioned and released.

And measured service means that you only pay for what you use or reserve as you go. If you’re not using resources, you’re not paying. Resource usage is monitored, measured, and reported transparently based on consumer utilization.

As you have seen, cloud computing is really about using technology “as a service,” leveraging remote systems on-demand over the open Internet, scaling up and scaling back, and paying for what you use. And it has changed the way the world consumes compute services, by making them more cost-efficient while also making organizations more agile in response to changes in their markets.

Cloud deployment models indicate where the infrastructure resides, who owns and manages it, and how cloud resources and services are made available to users. There are three types of cloud deployment models: public, private, and hybrid. Public cloud is when you leverage cloud services over the open internet on hardware owned by the cloud provider, but its usage is shared by other companies. Private cloud means that the cloud infrastructure is provisioned for exclusive use by a single organization. It could run on-premises or it could be owned, managed, and operated by a service provider. And when you use a mix of both public and private clouds, working together seamlessly, that is classified as the hybrid cloud model.

Now, let’s look at the three cloud service models that are based on the three layers in a computing stack: infrastructure, platform, and application. These cloud computing models are aptly referred to as Infrastructure as a Service (or IaaS), Platform as a Service (or PaaS), and Software as a Service (or SaaS). In an IaaS model, you can access the infrastructure and physical computing resources such as servers, networking, storage, and data center space without the need to manage or operate them. In a PaaS model, you can access the platform that comprises the hardware and software tools that are usually needed to develop and deploy applications to users over the Internet. And an SaaS is a software licensing and delivery model in which software and applications are centrally hosted and licensed on a subscription basis. It is sometimes referred to as “on-demand software.”

In this video, you learned that:

  • Cloud computing is the delivery of on-demand computing resources over the Internet on a pay-for-use basis.
  • Cloud computing is composed of five essential characteristics, three deployment models, and three service models.
  • The five essential characteristics of cloud computing are on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
  • There are three types of cloud deployment models: public, private, and hybrid.
  • The three cloud service models are based on the three layers in a computing stack (infrastructure, platform, and application), and they are referred to as Infrastructure as a Service (or IaaS), Platform as a Service (or PaaS), and Software as a Service (or SaaS).

Deep Learning and Machine Learning