
Week 1: Big Data and Artificial Intelligence

In this module, you will be introduced to Big Data and examine how machine learning is used throughout various business segments. You will learn how data is analyzed and extracted, and how digital technologies have been used to expand and transform businesses. You will also get a detailed look at data management tools, how they are best implemented, and the value of data warehouses. By the end of this module, you will have gained insight into how machine learning can be used as a general-purpose technology, and some best techniques and practices for data mining.

Learning Objectives

  • Examine Big Data and how it is best used in identifying issues within a business
  • Assess how different skillsets are needed to manage, understand, and act on Big Data
  • Review the three types of Machine Learning and identify their applications

Video: AI for Business Introduction

Course Summary: AI for Business

This course, titled “AI for Business,” is designed to provide a comprehensive understanding of artificial intelligence (AI) from a managerial perspective. The course is led by Kartik Hosanagar, a Professor of Technology and Digital Business at the Wharton School, who specializes in the digital economy, including topics such as Internet commerce, digital media, digital marketing, and data-driven decision-making. Hosanagar is also the author of “A Human’s Guide to Machine Intelligence,” which explores the implications of AI in business decision-making.

Key Course Objectives:

  1. Understanding AI and its Business Applications: The course begins by defining AI as the capability of computers to perform tasks that typically require human intelligence, such as understanding language, reasoning, and learning. It introduces machine learning as a subset of AI, focused on enabling computers to learn without explicit programming.
  2. AI as the Next Phase of Digital Transformation: The curriculum positions AI as the next frontier in digital transformation, following the Internet, cloud computing, and mobile computing. It highlights how companies that embraced these technologies early on were better positioned for long-term success.
  3. AI as a General Purpose Technology: The course discusses the concept of general purpose technologies, which have broad applications across various sectors and can drive economic growth. It presents research indicating that AI, particularly machine learning, exhibits characteristics of a general purpose technology, with a significant number of research jobs spread across multiple industries.
  4. Implications for Business Managers: The course emphasizes that AI’s widespread use and ongoing research imply that managers must be patient and adaptable. It suggests that AI will affect a variety of industries, and businesses must prepare for its transformative impact by understanding the technology, adapting business models, and fostering a culture of innovation.

Course Goals:

  • To equip managers with a strategic framework for leveraging AI to achieve a return on investment.
  • To explore various business use cases for AI and how it can be applied across different sectors.
  • To discuss the organizational changes required to integrate AI effectively, including technological infrastructure, business processes, and cultural shifts.

Conclusion:

“AI for Business” aims to prepare managers and business leaders for the AI-driven future, providing them with the insights and tools necessary to navigate the challenges and opportunities that AI presents. The course underscores the importance of understanding AI’s potential, being patient with its development, and being proactive in adapting business strategies to capitalize on the transformative power of AI.

Welcome. This is a course
on AI for Business. In this course, we will talk about AI from a
managerial standpoint. We’ll look at several
business use cases for artificial intelligence
and we will look at a strategic framework
for managers to use in order to get the
return from AI investments. I’m Kartik Hosanagar, I’m a Professor of Technology and Digital Business
at the Wharton School. My research focuses broadly
on the digital economy. I look at Internet
commerce, digital media, digital marketing, and
data-driven decision-making. I was previously a
co-founder of Yodle, which was a marketing
platform for small businesses, and I've worked with a number of startups and large companies
over the years, looking specifically
at the applications of AI and Data
Science in business. I'm also the author of the book A Human's Guide to
Machine Intelligence, which looks at the
implications of using artificial intelligence to make business decisions, both within the
enterprise and outside. Artificial intelligence
or AI is all around us. AI is about getting
computers to do the kinds of things that
require human intelligence. For example,
understanding language, reasoning, navigating
the physical world, learning, and so on. Machine learning is a subfield
of AI that is focused on getting computers to learn without explicitly
programming them. AI is increasingly being seen as the next phase of
digital transformation. Over the years, a number of different digital
technologies have helped enable business
transformation, which is the idea
of organizing or transforming a company’s
activities and processes in order to make use of the new opportunities created
by digital technologies. Now in the late ’90s, the technology that
helped usher in that transformation
was the Internet. A number of companies opened online divisions to help make
use of that opportunity. But unfortunately, a number
of these firms also shut down these online divisions
after the dot-com bust. The few companies that persisted actually benefited
significantly in the long run and the
companies that did not ended up paying
a significant price. In the mid 2000s, Cloud computing helped
usher in a similar change. Here again, a number
of companies started investing in Cloud
computing, but again, a few companies
backed off when they realized that their early
forays into Cloud computing faced a number of challenges
related to security with moving their data to the Cloud or with regulatory compliance. But these companies
that backed off, again, paid the price and
the companies that persisted were the
ones that were well-positioned in the long run and helped create
a certain amount of business agility that has
helped them in the long run. In the late 2000s, mobile computing helped
create a similar change, and the companies
that invested in mobile computing
early helped really create mobile-first and
mobile-only products and helped transform their
businesses for a mobile world. Today, it's
increasingly appearing like AI will be equally
transformative. In fact, there is early
evidence that AI can be seen as what is often referred to as a general
purpose technology. Now a general purpose technology is a technology that has the potential for widespread use
across a range of sectors, and these general
purpose technologies can stimulate innovation and
drive economic growth. At an organizational level, they can also inform
product strategy and overall design of
the organization itself. Now three factors
are seen as being
indicative of
whether a technology is a general purpose technology. The first is that
the technology has widespread use across
multiple industries. The second is that there are a large number of research jobs
related to that technology, and the third is that these research
jobs themselves are spread across a number of
other industries as well. In a recent study
by Goldfarb et al., the researchers looked at whether artificial
intelligence shows promise as a general purpose technology. They looked at a number of recent technologies that have
got attention in the press. For example, machine learning, geographical information
systems, CRISPR, quantum computing,
fracking, robotics, nanotechnology, Internet of
Things, and Cloud computing. They also looked at millions
of job postings and classified these
job postings based on which technology
they were related to. They evaluated whether
machine learning, which is a subfield of artificial intelligence,
looks different. As you can see on the slide, the researchers found
that a number of job postings were related
to machine learning, as were a few other technologies like robotics and
Cloud computing. In fact, as many
as 14.6 percent or almost 15 percent
of the jobs for machine learning, were
research-related jobs. Now research-related jobs are a particularly important
indicator of a general purpose technology
because they help demonstrate that
the technology is capable of ongoing improvement. This ongoing improvement creates significant future potential, some of which is not currently recognized and certainly
machine learning, which is the most
important sub-field of AI, seems to demonstrate
that capability. The researchers also looked at whether machine learning jobs in particular were spread across a number of different
industries. They did find that
machine learning jobs can be seen in a number
of different industries, primarily in education services, professional services, manufacturing,
financial services, and so on. In contrast, some of the
other technologies like quantum computing
were seen primarily in one or two industries
like professional services. In short, it does seem like machine learning jobs
are available in multiple industries and
multiple industries see value from
those skills today. Next, the researchers looked
at whether research jobs, in particular, were also
widespread across industries. Here again, the researchers found that a number
of industries, including manufacturing,
professional services, information
technology, finance, and education, all had a need for research jobs related
to machine learning. Not every other technology
showed this widespread distribution. In short, a number of these
statistics do suggest that machine learning in
particular, and AI in general, is likely to be a general
purpose technology. There are many
implications of this. The first is that companies and managers
need to realize that machine learning
and AI will have significant impact on a
wide variety of industries. Just because you're not
in a technology industry does not mean that
you’re shielded from the transformative impacts
of machine learning. Secondly, the fact that a lot of these jobs
are research jobs, which implies the
technology is evolving, also implies that managers need to be patient
with the technology. The transformative impact of the technology might
come with a lag. Therefore, to effectively make use of these opportunities, managers will need to understand the technology
and its applications, they will need to
make many changes to their business models, to their tech infrastructure, to their organizational
processes, and to their culture as well. All of that requires
significant changes. The purpose of this course
is to help you get there.

Video: Course Introduction

Course Overview

This course covers AI Fundamentals from a business perspective, focusing on Big Data, Artificial Intelligence, and Machine Learning.

Module 1: Introduction to Big Data

  • Defining Big Data and its characteristics
  • Working with Big Data
  • Business questions that Big Data can help answer

Module 2: Introduction to Artificial Intelligence

  • Defining Artificial Intelligence and Machine Learning
  • Relationship between AI and ML
  • Types of Machine Learning methods

Module 3: Machine Learning in Practice

  • Machine Learning visualizations
  • Recent developments in ML, such as AutoML
  • Simple interfaces for non-engineers and non-data scientists to leverage AI

Module 4: The Role of Data in Building AI Systems

  • Importance of large training data sets for AI systems
  • Challenges for small companies and enterprises without data
  • Strategies for building AI systems without data

The course aims to provide a comprehensive understanding of AI Fundamentals, from Big Data to Machine Learning, and how they can be applied in a business context.

This course will discuss AI Fundamentals
from a business perspective. We’ll begin with
an introduction to Big Data. Specifically, what exactly is Big Data? How does one work with it? And what types of questions, business
questions can Big Data help you answer? We'll then move to an introduction
to artificial intelligence. We’ll talk about what is
artificial intelligence? What is machine learning? How are they related and what are the different types
of machine learning methods? Next, my colleague Professor Sonny Tambe will talk about machine
learning in practice. He’ll discuss machine learning
visualizations as well as recent developments such as AutoML that allow
non-engineers and non-data scientists to leverage AI to answer business
questions through very simple interfaces. And finally, I will talk about the role
of data in building AI systems. Specifically, modern AI is built
on large training data sets, which implies that for companies
to have flourishing AI practices, they really need to have
access to a lot of data. But how do small companies start
with an AI practice without data? Or how do companies in general roll out
AI in their enterprise without data? We'll talk about building AI systems
without data in the final module of this course.

Video: Big Data Overview

What is Big Data?

Big data refers to large volumes of data that exceed the capacity of conventional methods and computer systems. It’s not just about volume, but also about variety (structured and unstructured data), velocity (data streaming in at high speed), and veracity (truthfulness of data).

The 3 V’s of Big Data

  1. Volume: Large amounts of data that can’t be stored or analyzed on personal computers.
  2. Variety: Structured and unstructured data, including text, audio, and video data.
  3. Velocity: Data streaming in at high speed, requiring real-time analysis and decision-making.

The 4th V: Veracity

Veracity refers to the truthfulness of data, which is critical in big data due to the variety of sources and potential inconsistencies.

Why is Big Data Important?

Big data is important because it allows managers to ask new questions and answer old questions better. It enables companies to make data-driven decisions, improve operations, and create new business models.

Applications of Big Data

Big data has applications in various industries, including:

  1. Marketing: Analyzing social media data to craft targeted marketing campaigns.
  2. Finance: Detecting credit card fraud in real-time using big data tools.
  3. Healthcare: Analyzing wearable device data to improve consumer well-being.
  4. Transportation: Using sensor data to optimize traffic patterns and route planning.

Conclusion

Big data is a transformative force that is changing the way businesses operate. It enables companies to make better decisions, improve operations, and create new business models. In the next module, we’ll explore machine learning and its applications in various industries.

In this module, we’ll
talk about big data. In particular, we
will start with an overview of what
exactly is big data, we’ll talk a little bit
about what kinds of skills are needed to
excel with big data, we’ll talk about big data
tools and infrastructure, and we will conclude by
talking about data mining and setting up the stage
for machine learning, which we will cover in module 2. To begin with, let’s explore
what exactly is big data. Now, data is certainly a concept that’s been around for
a really long time and there has been
an emphasis on data for several decades now. We hear phrases like, “Data is the new oil. Data is just like crude. It’s valuable, but if unrefined it cannot
really be used.” Futurist John Naisbitt says that we have for the first time an economy which is based on a key resource that is information that is
not only renewable, but it’s also self-generating. Running out of this
resource is not the problem but drowning
in it is the real problem. Now, we’ve heard phrases
like this for awhile. Data has been very important to businesses for multiple decades, but the focus or emphasis on
big data is relatively new. Now, big data, as
the term suggests, is about large volume of data. In fact, the National Institute of Standards and Technology says that big data is data that exceeds the capacity
or capability of conventional methods
and computer systems. Now volume is certainly a
key aspect of big data. But it’s not just about volume. When we talk about big data, we’re talking about data
with different structure, we’re talking about
data that is being created at a different speed, we’re talking about
different kinds of tools to analyze the data
and most importantly, from a managerial standpoint, we’re talking about
different kinds of business questions that
we can answer with that. Now, one way to
think about big data is through the three
V’s of big data; volume, variety, and velocity. Volume of data simply
implies that we're now talking about terabytes
or petabytes of data. In short, the kinds of
data that won’t fit in our laptops and
personal computers. The kind of data
that we cannot open in Excel and just
start analyzing. That’s what the volume
of data is all about. In terms of variety, we refer to the fact that
we’re no longer talking about structured numerical
data that you can analyze in Excel spreadsheets. Rather, we're now talking about unstructured data,
meaning text data, audio data, video data, where there’s
intelligence hidden in that data that
we want to extract. In terms of velocity, we’re referring to the idea that data are
constantly coming in. It’s streaming every second
and milliseconds and we need to be able to
perhaps even analyze the data and make
decisions on the fly. That’s what data
velocity is all about. Sometimes when we
talk about big data, there’s a fourth V, which is veracity or truthfulness
of data that comes in. Data veracity refers to the point that data
are coming in from multiple sources and are
not curated as in the past, and so you might
have data coming in from social media platforms, meaning user-generated
content and this content might not
exactly be high-quality data, so we need to account for that. We might also have
inconsistency of data or incomplete data, and
so data veracity is also becoming a
fourth V, which is very critical and an
integral part of big data. Now of course, a natural
question to ask is why is this emphasis on
big data so new? Really, it comes
down to two things. The first is computing capacity. Computing capacity has been
growing exponentially. Our ability to store data and process data has been
growing exponentially and that has made big
data tools available today that simply weren’t
available 10 years back. The second is data generation itself is being transformed. In the past, data
was generated in a centralized way
and it was limited. In contrast, today data is being generated in a
decentralized way. There's a lot of user-generated
content that our customers, for example, are generating. There’s data generated
from mobile devices, again, from each
individual user. There's data being generated from
thousands of sensors that a company might be using in its manufacturing facility
or retail stores. All of these factors are
resulting in an explosion of data; it really is all about the
transformation in how data is generated. But most importantly,
big data also changes the things
a manager can do. In particular, big data allows managers to ask new
questions that they simply couldn't ask before, and it also helps them answer the same
old questions better. In terms of the ability
to ask new questions, consider the problem of a
marketing manager who's trying to design a marketing
campaign for a new product. The manager has to decide what product features
to emphasize. If it’s a phone, the manager has to decide whether we
should be talking about the battery life of
the phone or should we instead be talking about the
sleek design of the phone, or should we instead talk about the user interface and
how user-friendly it is, or should we talk about the
brand as such and be talking about our social and
philanthropic initiatives in our marketing campaigns? These are questions that
are hard to answer. In the past, they were
answered partly by gut, partly by small-scale
user surveys. But now a marketing manager can look at data on social media
platforms and they can look at data on Twitter and Facebook and other platforms and look at what aspects of our products our customers are really
appreciating and enjoying. What is it in the data on
social media platforms that suggests what
differentiates our brand from other brands? They can use these data to precisely craft
marketing messages. This might not have
been feasible in the past but it’s
feasible through big data that is available on social media platforms that
we can analyze at scale. I also mentioned that
big data allows us to answer the same
old questions better. For example, consider credit
card fraud detection. Credit card fraud is rampant in the financial
services industry and costs these companies
billions of dollars. In the past it was hard to
detect and most commonly, it was detected well after
the fact, for example, a customer might see their credit card statement and conclude that a
certain transaction is fraudulent and might call the customer service center
and flag that transaction, and then it gets corrected, but it’s done after the
fact and often it’s hard to really recover
the lost money. In contrast today
with big data tools, companies can
analyze transactions on the flight right after a customer swipes a credit
card on a terminal. Big data tools can analyze that transaction and determine whether it’s fraudulent or not. This helps not only
detect fraud faster, it also helps do it at scale which simply
was not feasible before and this
is creating a lot of value for financial
services companies. The value of big
data is not limited to just financial
services companies. We see applications in a number of industries
like healthcare, education, transportation,
and many more. For example, if you
look at healthcare, there’s a big trend in
wearable devices these days, a lot of consumers are wearing
devices like Fitbit and others and these devices are able to capture
data about heart rate, sleep patterns, exercise, and many more aspects of
our daily lifestyle. This kind of data
ultimately helps consumers take better actions to improve their well-being. Similarly, consider
transportation. There are sensors on
roads that can capture data on traffic
patterns, road closures, accidents and now that
data is being made available to us in real
time on our mobile devices. This helps us plan a route
better, helps in scheduling, and ultimately is the basis of applications like
Google Maps and many other mapping systems many of us use on a daily basis. These are but a few examples
of applications of big data. In fact, later in module 3, we will look at a number of other applications of big data in a variety of industries. We will also look at how machine learning
is being used in these industries to extract intelligence out of the
data in these settings.

Video: Big Data Analysis

Traditional Data Analysis vs. Big Data Analysis:

  • Traditional data analysis is hypothesis-driven, starting with a specific question and testing a hypothesis.
  • Big data analysis is more exploratory, starting with broad business questions and using data to find patterns, relationships, and correlations.
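
To make the contrast concrete, here is a minimal Python sketch of the two styles; the customer table and column names are invented for illustration. The hypothesis-driven path states one question up front and tests it, while the exploratory path scans the data for patterns that might suggest hypotheses.

```python
import pandas as pd
from scipy import stats

# Hypothetical customer data; in practice this would come from your warehouse.
df = pd.DataFrame({
    "spend": [120, 80, 200, 150, 90, 300, 60, 210],
    "visits": [4, 2, 8, 6, 3, 10, 2, 7],
    "on_promo": [1, 0, 1, 1, 0, 1, 0, 1],
})

# Traditional, hypothesis-driven analysis: state a hypothesis up front
# ("customers on promotion spend more") and test it.
promo = df.loc[df.on_promo == 1, "spend"]
no_promo = df.loc[df.on_promo == 0, "spend"]
t_stat, p_value = stats.ttest_ind(promo, no_promo)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Exploratory big data analysis: no hypothesis yet; scan for patterns
# (here, pairwise correlations) that might *suggest* hypotheses to test later.
print(df.corr())
```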

Skills for Big Data Analysis:

  • Managing the data: organizing data for analysis, which may involve buying tools or hiring data experts like data architects or chief data officers.
  • Understanding the data: using data science tools to extract intelligence from data, including statistics, machine learning, and data mining.
  • Acting on the data: applying insights to make managerial decisions, requiring data skills and domain expertise.

Data Skills for Managers:

  • Interpreting and understanding data analysis
  • Challenging data analysis when necessary
  • Understanding the limitations of data analysis

Domain Expertise for Managers:

  • Asking the right questions to turn data insights into action
  • Having relevant domain expertise to marry data insights with managerial action

Tools for Big Data Analysis:

  • Data management tools: collecting and organizing data within the company
  • Data analysis tools: extracting meaningful, managerially actionable intelligence from data

The lecture emphasizes the need for new skills and tools in big data analysis, including data management, data science, and domain expertise. It also highlights the importance of managers having both data skills and domain expertise to effectively act on data insights.

In this lecture, we’ll talk about what
makes big data analysis different from traditional data analysis. And in turn what does that mean in
terms of the kinds of skills and tools companies need
within their organization. Now, traditional data analysis
is often very structured. It might start with a managerial
question which might result in a hypothesis that is posed by
a statistician or a data scientist. The goal then is to analyze the data
in order to test that hypothesis. The data analysis either confirms our hypothesis or
suggests that it is incorrect. In short,
it is very much hypothesis driven. In contrast, big data
analytics is far more exploratory. It starts by looking at data, not
necessarily with a specific hypothesis, but with a broad set
of business questions. We might conduct more
exploratory analysis and find certain patterns or relationships or
correlations in our data. That might suggest certain insights,
business insights. And sometimes that might in fact
even lead to certain hypotheses. And we might then conduct more
formal hypothesis testing or traditional analysis on that. In short, big data analysis is about being
more iterative, being more exploratory. And essentially it’s a process where data
often leads the way as opposed to our hypothesis. Big data analysis also needs
a new set of skills or capabilities within the organization. I tend to think of these skills in
terms of three main types of skills. Managing the data, understanding the data,
and acting on the data. Managing the data is all
about organizing data so that it can be analyzed subsequently. Sometimes this involves
buying third-party tools, and often it's these tool developers that
focus on how best to manage data. And they provide us nice solutions
that we can buy off the shelf. But sometimes within a company, we also need data experts who can
help manage the data internally. These might be in the form
of data architects or chief data officers, who might
set data governance policies and who might also figure out the architecture
of how our data is going to be organized, either on premises or in the cloud. Understanding the data is all about using
tools to extract intelligence from data. This is broadly the domain
of data science. It includes statisticians who often
conduct traditional data analysis. It also includes machine learning and
data mining experts, who might apply more modern
techniques from computer science in order to analyze the data. It also includes the ability
to visualize data. Because often one of the key
abilities in data science is not just being able to analyze the data, but also being able
to construct stories and visualize it in meaningful ways. So that the insights can be easily
consumed by all stakeholders. The third set of skills
is acting on the data. This is where managers come in. It requires managers to take
insights from the data analysis and apply them to make
managerial decisions. It requires two kinds of skills,
the first is data skills. Managers need to be able to interpret and understand what data
scientists are telling them. They need to be able to challenge the data
analysis when appropriate because data insights can also be misleading. Sometimes data analysis can
find spurious correlations. And acting on them without really
challenging the data analysis can be problematic. So it really does require managers to have
a basic understanding of data science: to understand the
limitations of data analysis, and to appreciate when data
analysis is correct and when it needs to be modified or
challenged appropriately. But the second related skill that
managers need is domain expertise. Data is often telling us what
patterns we saw in the past. But it requires managers who
have relevant domain expertise to ask the right questions and figure
out how to go from data insights to managerial action. So the most successful
managers in a data world are managers who
are simultaneously data-savvy and also have strong vertical
domain expertise. They can marry the two together
to drive managerial action. Lastly, big data analysis not only
requires new sets of skills within the organization. It also requires new sets of
tools within the organization. In terms of tools we can think
about two kinds of tools. The first is data management tools,
which is essentially about tools that help us collect and
organize all the data within the company. And the second is data analysis tools. These are tools that help
us analyze the data and extract meaningful managerially
actionable intelligence from that data. In the next lecture, we will dive
into the data management tools.

Video: Data Management Tools

Data Warehousing

  • A Data Warehouse is a specialized database management system that stores historic data from multiple sources in an enterprise.
  • Its purpose is to provide a single point of access for all data in the company, serving analytics needs.
  • Examples of Data Warehouses include Microsoft's Azure SQL Data Warehouse, Google BigQuery, Snowflake, and Amazon Redshift.

How Data Warehouses Work

  • Operational data is pulled from various sources (e.g., CRM, ERP, billing systems) using ETL (Extract, Transform, Load) tools.
  • Data is transformed and loaded into the Data Warehouse, providing a unified view of all data in the company.
  • Reporting and data visualization tools, such as Tableau, can be built on top of the Data Warehouse.
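
As a rough sketch of that ETL flow, here is a minimal Python example using sqlite3 and pandas. The source systems, tables, and columns are hypothetical stand-ins for a CRM and a billing system, not any particular vendor's schema; real pipelines would use dedicated ETL tools.

```python
import sqlite3
import pandas as pd

# Hypothetical source systems, stood up in memory so the sketch is runnable;
# in practice these would be a live CRM and a billing system.
crm = sqlite3.connect(":memory:")
billing = sqlite3.connect(":memory:")
pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bo"],
              "city": ["Austin", "Boston"]}).to_sql("customers", crm, index=False)
pd.DataFrame({"customer_id": [1, 1, 2], "product_line": ["A", "B", "A"],
              "amount": [120.0, 80.0, 200.0],
              "invoice_date": ["2024-01-05", "2024-02-11", "2024-01-20"]}
             ).to_sql("invoices", billing, index=False)

# Extract: pull the operational data out of each source system.
customers = pd.read_sql("SELECT * FROM customers", crm)
invoices = pd.read_sql("SELECT * FROM invoices", billing)

# Transform: join the sources into one unified view and clean up types.
unified = customers.merge(invoices, on="customer_id")
unified["invoice_date"] = pd.to_datetime(unified["invoice_date"])

# Load: write the unified view into the warehouse (a local file here; in
# practice this would be Snowflake, BigQuery, Redshift, and so on).
warehouse = sqlite3.connect("warehouse.db")
unified.to_sql("customer_invoices", warehouse, if_exists="replace", index=False)
```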

Value of a Data Warehouse

  • Provides a single point of access for all data in the company.
  • Serves as a repository for historical data, separating operations from analytics.
  • Ensures data quality and provides a comprehensive view of data for analytics queries.
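
For example, the kind of bird's-eye analytics query a warehouse exists to serve, such as revenue by product line broken out by month and city over years of history, might look like the sketch below. It assumes the hypothetical customer_invoices table loaded in the ETL sketch above.

```python
import sqlite3
import pandas as pd

warehouse = sqlite3.connect("warehouse.db")

# An analytics query touches years of history across the whole company,
# unlike an operational lookup that fetches one customer's current balance.
revenue = pd.read_sql("""
    SELECT product_line,
           strftime('%Y-%m', invoice_date) AS month,
           city,
           SUM(amount) AS revenue
    FROM customer_invoices
    WHERE invoice_date >= date('now', '-10 years')
    GROUP BY product_line, month, city
""", warehouse)
print(revenue.head())
```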

Big Data Tools

  • Hadoop and Spark are big data tools that serve two main purposes: storage and processing.
  • They store massive amounts of data in a distributed fashion across multiple computers or nodes.
  • They process data in a distributed and parallelized manner, increasing speed.

Hadoop and Spark

  • Hadoop is an open-source tool offered by the Apache Foundation, with Cloudera being a popular distribution.
  • Spark is a more recent and dominant replacement for Hadoop, solving some of its limitations.
  • Databricks is a company built around Spark.
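
To give a feel for the distributed processing these tools provide, here is a minimal PySpark sketch (PySpark is Spark's Python API; the file and column names are hypothetical). Spark partitions the data across nodes and parallelizes the aggregation automatically.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or connect to) a Spark session; on a real cluster this would point
# at many machines, and the work below would be spread across them.
spark = SparkSession.builder.appName("transactions").getOrCreate()

# Read a large, hypothetical transaction file; Spark partitions it across nodes.
tx = spark.read.csv("transactions.csv", header=True, inferSchema=True)

# A distributed aggregation: total spend per customer, computed in parallel
# on each partition and then combined.
totals = tx.groupBy("customer_id").agg(F.sum("amount").alias("total_spend"))
totals.show(5)
```

The same code runs unchanged on a laptop or on a large cluster; that portability is a large part of Spark's appeal.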

Before your company can use your data and begin
with AI initiatives, it is important to have the
data infrastructure in place. In this lecture, I’m
going to talk about data management tools
that are necessary for companies to have in
place before they can embark on large-scale
AI initiatives. First, we’ll talk a little
bit about Data Warehousing. To begin, many of you are likely familiar with the
concept of a database. A database is quite simply a
structured collection of data. Even an
Excel spreadsheet can be thought of as
a type of database. Now in practice, we usually need better tools
to manage data. Database management
systems, or DBMSs, are systems that allow users to better access and
manage the database. Excel again provides some
simple functionality, but more advanced databases
from Microsoft and Oracle and many other companies really help companies better
manage their data. Sometimes we refer to database management systems
quite simply as a database. A Data Warehouse is a particular database
management system. It's specialized in two ways. First, it's specialized
in terms of the type of data a Data
Warehouse stores. Usually that is historic data from many sources
in the enterprise. A Data Warehouse is also specialized in terms of
the purpose it serves, and that is Analytics. A usual database might
serve operations. For example, when a customer of a bank logs into
the website and wants to look up their
current account information, that customer is interacting with
an operational database that is able to pull
data very fast and respond to customer queries
like their current balance. In contrast, analytics
needs access to all of the data that a company might
have or most of it. The purpose there is
usually not speed, but it’s the ability
to have a more comprehensive and more of a bird’s eye view of all
the data in the company. A Data Warehouse
serves that purpose. It’s not necessarily
the fastest database, but it’s specialized for the
function of Analytics and thus provides a more
complete picture of the data in an organization. Examples of Data
Warehouses include Microsoft’s Azure SQL Warehouse, google BigQuery, Snowflake,
and Amazon Redshift. Now let’s talk a little bit about how Data Warehouses work. Usually in most companies, operational data is sitting
in many different places. For example, customer data might be sitting
in a CRM system. Some other enterprise
information, including information
about partners and supply chain may be
sitting in an ERP system. Customer billing information
might be sitting in another separate database. Now if we want a unified view of all the data in the company, we first need to pull all of that data into
a Data Warehouse. ETL tools are useful for that. ETL stands for extract,
transform, and load. These tools pull the data out of the different
individual databases. For example, they’ll pull the customer data out
of the CRM system, the customer’s billing data out of the billing
system and so on. All of that data is pulled out, it’s transformed as needed and then loaded into
the Data Warehouse. Popular ETL tools
include tools built by companies like
Informatica and Stitch, which is now part
of a company called Talend, and many others. The Data Warehouse
now has all of the data from all these
different sources. Once we have this
data in one place, you can now build
Reporting and data visualization tools
on top of that. For example, business
intelligence tools like Tableau sit on top of
the Data Warehouse. When an analyst enters a query, these systems can then go into the Data Warehouse and pull
the necessary information. Next, let’s talk about the
value of a Data Warehouse. The main purpose or value of
a Data Warehouse is that it serves as a single point of access for all data
in the company. It’s where a history
of all the data is stored and as I
mentioned earlier, a Data Warehouse helps separate operations
from Analytics. Usually the operations data is made to be fast so that
when a customer logs in, then you can pull the
data fast and respond with information such as the
customer’s balance. On the other hand,
certain Analytics queries might require a more comprehensive access to historical data and an
assurance of data quality. For example, an
analyst might want to know how much revenue each
product line has brought in over the last 10 years, with
that data broken out by month
and by city and state. That query requires
access to a lot of historical data over
the last ten years and the Data Warehouse
provides that assurance of Data Quality and
that single point of access to all of that data. Now, that’s a little bit
about Data Warehouses. As part of Data Infrastructure. We should also talk about big data tools such
as Hadoop and Spark. Now Big Data tools
like Hadoop serve two main purposes,
storage and processing. Now Storage of big data usually has some
unique challenges. If we want to store a
little bit of data, a few files, we can typically store that in our computers. But what if there’s
massive amounts of data, data for millions or
hundreds of millions of customers over the
last 10, 20 years. That kind of data cannot be
stored in a single computer. One of the things that a Big Data tool like
Hadoop does is store the data in a
distributed fashion across multiple computers
or multiple nodes. Next, these systems also take care of
processing that data. Usually that processing again involves distributed processing of that data across
multiple nodes or across multiple machines, and parallelizing
the computations or data processing
as much as possible, which helps increase speed. Hadoop is an open source tool that is offered by the
Apache Software Foundation, which is a non-profit foundation that provides open
source software. The most popular distribution of Hadoop is by a company
called Cloudera, although there are
several others. Spark is a more recent version, or in fact, I would say, a more dominant
replacement for Hadoop, which serves similar
purposes but solves some of the problems that Hadoop
had faced in the past. Databricks is the
most dominant company that is built around Spark. We’ll next talk a
little more about Data Warehouses and
also Big Data tools like Hadoop and Spark in our discussion with an
executive from Snowflake.

Video: Data Management Infrastructure

Main Topic: Data Infrastructure for AI-Driven Business Transformation

Guest: Chris Child, Director of Product Management at Snowflake

Key Points:

  1. Companies need two types of databases: transactional databases for day-to-day operations and analytic databases for processing large sets of data over a long period of time.
  2. The evolution of data infrastructure has moved from custom-built systems to Hadoop and now to cloud data warehousing.
  3. A data infrastructure includes a set of ingest tools, transformations, and query and visualization engines.
  4. Companies should think ahead of time about what problems they want to solve with data and what data they need to answer those questions.
  5. Having a data infrastructure in place is crucial before using machine learning or predictive technologies.

Actionable Advice:

  • Think carefully about the types of questions you wish you could ask, but can’t because you don’t have all of the data.
  • Identify the types of questions you are answering today, but it’s taking a long time.
  • Think about what data you need in order to answer those questions.
  • Focus on collecting the most important pieces of data that are critical to your business.

Welcome back. In
this session we’re going to talk about the
data infrastructure that companies need to have in
place before they can embark on large-scale AI driven
business transformation. To help us understand
what kinds of data infrastructure companies
need to have in place, we have with us Chris Child, who is a Director of Product
Management at Snowflake. Chris, welcome and please tell us a little bit about
your background in the space. Thanks Kartik,
excited to be here. As you mentioned, I work
at Snowflake Computing, which is a Cloud data
warehouse company. I’ve spent my career
working in data, both as an investor and
now as an operator. Helping build systems
that help companies make better decisions and really
run their businesses better. Chris, I think a good
place for us to begin is to talk about what
exactly is a database? How exactly are companies thinking about all the
different kinds of databases? And what has the evolution been over the last several years? Sure. There's
really two types of databases that most
companies end up needing. The first is what we call
a transactional database. This is a system
that keeps track of the important information that’s running your business
on a day-to-day basis. If we take a bank, for example, a bank would have a transactional
database that keeps track of the balances
for all the customers, and you’d use that every time someone starts a transaction
to figure out if there’s enough money
in their account to debit their account or credit their account and keep the running
balance going on. These are very useful, and
they need to be very fast, and they tend to
be very expensive. On the other hand, you
end up with what’s called an analytic database
or an analytic system to process
much larger sets of data over a long
period of time. Continuing with
the bank example, I might want to keep a
history of every transaction and every balance that each
of my customers has ever had. It would be very
expensive to keep this in my transactional database. So I move that data to
a separate database, to an analytic
database where I can keep these massive
long histories. Then I can ask questions like, “I'd like a list of all of my customers whose balance grew by at least 10 percent in
four of the last five years.” My transactional database
won’t be able to answer that, but my analytical database will. To transition to
analytics databases clearly would involve
investments in infrastructure. What kind of infrastructure
are we talking about? When you originally set up a data warehouse or an
analytics database, you needed to buy
special hardware. You needed to buy very
expensive special software from a variety of
different vendors. Again, we're talking
about 20 or 30 years ago, which is when this methodology of storing your data
came into play. Then the amount of data
that people were collecting grew, coming from lots
of different sources, whether that be from mobile
apps, from websites, from marketing campaigns,
or even from data you're collecting physically in your store about
what's happening, or from your entire
supply chain. There were a lot of
different sources of data that started coming in. Those types of specialized
analytics databases that ran on special hardware became very expensive to operate and started
not really being able to keep up with
the performance needs of that massive
amount of data. That's when we went
through the first big evolution of this. From these custom-built,
specialized analytics data warehouses, the massive amounts of
data started getting stored in a new
system called Hadoop. Hadoop was based on systems Google developed
to process the massive amounts of web data that they collect and track. It was also designed to run on a giant network of very
inexpensive hardware. Instead of these specialized, very expensive
servers, you could run on hundreds of
very cheap servers. The result was this was a much more
cost-effective way to manage and process these
massive amounts of data. Now, isn't that really what creating a data lake is all about? Also, aren't we in a process
of seeing many companies transition to newer
Big Data tools or technologies like
Spark and others? Absolutely. The data lake
is what people use to refer to basically massive sets of hard drives that they’re
storing all of this data in. It’s a place you can pour huge amounts of
data like a lake, and then you can use tools
like Hadoop or now Spark, which is a much
more modern version of the Hadoop
computation engine, to pull data out of that lake, run some calculations
and transformations on it and then put it back so
that you can find it later. Traditionally, what we
saw is once people would sort of finish all
those calculations, they wanted to be able to
query that data very quickly. They would end up
putting that into those data warehouses that
they were using originally. Now they started to refer
to those as data marts, which was where a small set
of your customers or of your internal users could go
get a subset of the data. But in order to get a
new data set loaded, you had to go back and write
Hadoop or Spark jobs and get that data transformed
and loaded into those data marts or
data warehouses. For those of you who are not
familiar, Hadoop and Spark, these are techniques for storing and processing
large amounts of data. Essentially they involve
distributed storage, distributed processing
of the data, and creating a lot of
parallelization, which helps the data processing
happen faster. Now, coming back, Chris, we've also now seen a transition towards
Cloud data warehousing. Can you explain what exactly
a Cloud data warehouse is? How does it fit within this whole conversation
of companies moving the data to data
lakes and data marts? Absolutely. A lot of people found that with these
data lakes and data marts, it was still hard to keep
track of all of your data. It was in different places, there were massive amounts of it, and it was in inconsistent formats. Accessing it often involved having your engineering
team actually write code that could run on these large parallelized systems. About 10 years ago, a lot of research
started happening into what are now called
cloud data warehouses. These are systems from Amazon or Google
or from Snowflake, which are a re-imagining of the traditional
data warehouse. They’re designed to run on massively parallel sets of inexpensive hardware,
like Hadoop. Generally, they’re run on
hardware that you rent from cloud providers like Amazon
or Google or Microsoft, instead of having to manage
those servers yourself. But from the outside, they look and operate and have the performance of a
traditional data warehouse. What that means is they use a language to speak
to them called SQL, which is what the data
warehouses and databases use. This means you can natively use Tableau or Looker
or other analytics and BI tools right
on top of them. Because they use that
standard language, they also integrate well
with large sets of tools. As we were talking a
little bit about before, what you really want
from your data platform overall is somewhere to
store all this data. You need a set of ingest tools. How do you get the data
into your data platform? Being SQL based,
you can use any of a wide variety of tools that are built specifically for that. You then need a set
of transformations that take the raw
data that's coming in and turn it into
something useful. As I’m sure you’ve talked
about in this class, one of those techniques
is machine learning that you can use to
take this raw data and score it and make predictions and figure out
what's going to happen. But there are also simple things: I might be getting data about the set of actions that users are taking on a daily basis. Really, I want to look at
that on a monthly basis. One transformation would be rolling that up to
a monthly basis. The final piece you need is a query and
Visualization Engine, as we mentioned,
Tableau or Looker, other tools like that. A way to actually run queries and for your analyst
team to build dashboards and basically ask questions of the data once it’s
been transformed. One of the big challenges
that people had with Hadoop or even with the
Spark based ecosystem, is that those tools often need to be custom-built
for that ecosystem. Whereas, if you use a
Cloud data warehouse, you get high-performance, you get the
scalability of Hadoop, but you also get access to the standard ecosystem of tools. Chris, when we started
our conversation, I mentioned that before
companies can start using machine learning or other
predictive technologies, they need to have a data
infrastructure in place. Now, putting this data
infrastructure in place obviously costs some money and
cannot be taken lightly. What questions should a manager ask before they embark
on such an exercise? Absolutely. One of the mistakes that I’ve seen people
make repeatedly is to think that having this Data Infrastructure
in and of itself is an important
thing to do. They'll set this up and
they'll load a bunch of data and they'll buy
a bunch of tools, and then they won’t actually
get any value out of it. Because what they didn’t
do was think ahead of time about what they
were trying to solve, what problems they had that they wanted to solve with data. What I would suggest
as anyone who’s going to undertake this journey, think first carefully about the types of questions that
you wish you could ask, but you can’t because you
don't have all of the data. Also, the types of questions that
you are answering today, but where it's taking a long time. An example of that
is anything where you ask someone on
your team to go spend two weeks collecting data and running
analysis in Excel. Those are decent candidates for the types of
problems that you could solve in minutes if you have the correct data
infrastructure in place. Finally, think about
what data you need in order to answer
those questions. It’s generally not
that useful to go collect every single piece of data you can possibly think of. Instead, what are the pieces of data that are important
to your business and are going to help you answer those critical business
questions so that you can run your business better
and more efficiently? Really, at the end of the
day, that's the whole goal. Chris, that has been a very
helpful, in-depth overview of the data infrastructure companies
need to think through. Thank you so much
for joining us. Thank you, Kartik. I appreciate it.
Thanks for having me.
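
One transformation Chris mentions, rolling daily user actions up to a monthly view, is simple to picture in code. Here is a minimal pandas sketch with invented column names:

```python
import pandas as pd

# Hypothetical raw event data: one row per user per day of activity.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 1, 2, 2],
    "date": pd.to_datetime(["2024-01-03", "2024-01-19", "2024-01-20",
                            "2024-02-02", "2024-02-11", "2024-02-25"]),
    "actions": [5, 3, 7, 2, 4, 6],
})

# The transformation: roll daily actions up to a monthly total per user.
monthly = (events
           .groupby(["user_id", events["date"].dt.to_period("M")])["actions"]
           .sum()
           .reset_index(name="monthly_actions"))
print(monthly)
```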

Video: Data Analysis: Extracting Intelligence from Big Data

Data Mining vs. Traditional Statistical Regression

  • Data mining is a broad term that refers to tools for discovering patterns in large data sets
  • Traditional statistical regression starts with a hypothesis, whereas data mining is a data-driven exploration that may not start with a hypothesis
  • Data mining techniques include clustering and association rule mining
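
To make that contrast concrete, here is a hedged sketch of the traditional, hypothesis-driven side: a logistic regression testing whether hypothesized factors (prior default, number of cards, employment) predict default, as in the lecture below. The data and coefficients are synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for historical credit card records.
prior_default = rng.integers(0, 2, n)
num_cards = rng.integers(1, 8, n)
employed = rng.integers(0, 2, n)
log_odds = -1.5 + 0.9 * prior_default + 0.2 * num_cards - 0.8 * employed
default = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

# Hypothesis-driven: the analyst chose these three variables up front, and
# the regression estimates whether each one matters and by how much.
X = sm.add_constant(np.column_stack([prior_default, num_cards, employed]))
model = sm.Logit(default, X).fit(disp=False)
print(model.params)   # estimated coefficients (cf. the 0.93 in the lecture)
print(model.pvalues)  # whether each hypothesized factor matters
```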

Clustering

  • Clustering is a data mining technique that groups similar data points together
  • Clustering can be used to determine customer segments in a data-driven manner
  • Example: clustering can be used to identify purchasing patterns of suburban soccer moms and compare them to other customer segments
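
A minimal scikit-learn sketch of this data-driven segmentation follows; the features and the choice of three clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features: annual spend, visits per month, basket size.
rng = np.random.default_rng(1)
customers = np.column_stack([
    rng.normal(2000, 600, 300),  # annual spend
    rng.normal(6, 2, 300),       # visits per month
    rng.normal(35, 10, 300),     # average basket size
])

# Scale the features, then let the algorithm find three segments in the data,
# rather than defining the segments from gut feel.
X = StandardScaler().fit_transform(customers)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(np.bincount(kmeans.labels_))   # customers per segment
print(kmeans.cluster_centers_)       # each segment's typical (scaled) profile
```

In practice, an analyst would try several values of n_clusters and inspect each segment's profile before attaching a business interpretation to it.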

Association Rule Mining

  • Association rule mining is a data mining technique that finds common co-occurrences in data
  • Example: analyzing shopping cart data to find patterns, such as people who buy bread and butter also tend to buy milk
  • Another example: Don Swanson’s analysis of Raynaud’s disease, which found that EPA (eicosapentaenoic acid) is associated with reducing blood viscosity and strengthening the musculoskeletal system, leading to a hypothesis that EPA can help treat Raynaud’s disease
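
The bread-butter-milk example can be sketched directly: count how often itemsets co-occur across baskets and compute the rule's support and confidence. A pure-Python sketch with made-up baskets (production tools automate this search over all itemsets rather than checking one rule):

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk", "eggs"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

antecedent = {"bread", "butter"}
consequent = {"milk"}

# Count baskets containing the antecedent, and those containing both.
both = sum(1 for b in baskets if antecedent | consequent <= b)
ante = sum(1 for b in baskets if antecedent <= b)

# Support: how often the full pattern appears overall. Confidence: given
# bread and butter in the cart, how often is milk there too?
support = both / len(baskets)
confidence = both / ante
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```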

Predictive Analytics

  • Predictive analytics involves using data to make predictions about the future and take action based on those predictions
  • Examples of predictive analytics include:
    • Predicting demand for a product and making production decisions based on that
    • Predicting whether a transaction is fraudulent or not
    • Recommending products to customers based on their browsing and purchasing history
    • Detecting fraudulent transactions in real-time
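
And a hedged sketch of the fraud-detection case: train a classifier on labeled historical transactions, then score a new swipe on the fly. The features, thresholds, and data are all synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 2000

# Synthetic historical transactions: amount, hour of day, miles from home.
X = np.column_stack([
    rng.exponential(80, n),   # amount in dollars
    rng.integers(0, 24, n),   # hour of day
    rng.exponential(10, n),   # distance from home, in miles
])
# Synthetic labels: fraud here skews toward large, far-from-home charges.
y = ((X[:, 0] > 150) & (X[:, 2] > 15) & (rng.random(n) < 0.8)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score a new swipe in real time: a $950 charge at 2 a.m., 400 miles from home.
new_tx = np.array([[950.0, 2, 400.0]])
print(f"fraud probability: {model.predict_proba(new_tx)[0, 1]:.2f}")
```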

In this lecture, we’re going
to talk about data analysis. In particular, we’ll begin
by discussing data mining. Now, data mining is a
broad term that refers to tools for discovering
patterns in large data sets. To understand what
exactly is data mining, it's useful to contrast data mining with something
many of you understand, which is simple
statistical regressions. Now, when we are
conducting regressions, we might start off
with a hypothesis. For example, we’re trying to understand what are the
factors that predict whether a customer might default and not pay
their credit card dues. We might come up with the
hypothesis that the risk of default depends on a
number of factors, such as whether they’ve
defaulted in the past. That is, we have a hypothesis
that people who have defaulted previously are
likely to default again. We might also have a hypothesis that people who have
a large number of credit cards are
likely to default because they are
perhaps struggling to manage their finances. Lastly, we might have
a hypothesis that people who are employed might
be less likely to default. Now the goal of the
regression might be to test these hypotheses. We might run a regression
based on past data, where we test what is
the risk of a person defaulting and whether it
depends on these factors, meaning prior default, number of credit cards, and whether they’re
employed or not. The regression tells us whether these factors
matter or not, and it also tells us
how much it matters. On the slide, you
see that whether a person defaulted
in the past or not has an impact on whether they default
again in the future, and the regression coefficient 0.93 tells us how
important it is. Now notice that all these
important variables, like number of credit cards, whether they’re employed or not, whether they’ve defaulted
in the past or not, these came from a
hypothesis the analyst had. That is at the heart of
traditional data analysis, that is at the heart of
regression testing as well. Data mining, in contrast, is more about
data-driven exploration. It may not start
with a hypothesis, as I previously mentioned. There are a number of
different techniques that are part of data mining. In fact, data mining is a catch-all term for a
number of these techniques. I will not go over all of the techniques that are
part of data mining because really there are a very large set of
such techniques. But I’ll go over a
couple useful examples. The first one is clustering. Clustering is a data
mining technique that is used to group our data. Essentially, clustering
will break up our data into a bunch of
smaller groups or clusters, such that data points within a cluster are similar
to each other, and data points in
different clusters are different from each other. A classic application of
clustering might be in determining customer
segments in our data. The old way of doing
customer segmentation might be from your gut. A marketing manager might say
based on their experience, that we have three kinds
of customer segments, and they might describe
these customer segments in terms of customer
demographics. For example, they might say one of those
customer segments is soccer moms in families of four or five people who live in suburban areas, and that might be how they articulate what one of their
customer segments is like. In contrast, when
we use clustering, we're trying to figure
out the customer segments in a data-driven manner
without this hypothesis, and clustering might
either validate the gut of the manager
and might indeed indicate and show that the
purchasing patterns of suburban soccer moms are different from the purchase
patterns of other customers. Or it might suggest that the
differences are not that important and maybe there’s a different way we should be thinking about
customer segments. Another data mining tool is
association rule mining. Association rule mining is a data mining technique that finds common co-occurrences
in the data. For example, we might analyze shopping cart data or purchase patterns of
customers at a grocery store, and we might look at
common patterns in there. Association rule
mining software might find a pattern such as people who tend to buy
bread and butter in a transaction also tend to buy milk in that
same transaction. If we find this pattern, we might take
action based on it. For example, a traditional
grocery store meaning a brick and mortar grocery store might decide to stock bread, butter, and milk close by. Or an online grocery store might decide that if a customer has already added bread and butter to their shopping cart, then it’s going to
make a recommendation to the customer
to also add milk. There are many applications of association rule
mining techniques in business data in order to
find patterns in those data. Another example might be
applications in health care. One example that comes to
my mind is an analysis of Raynaud’s disease that was
done by a computer scientist or information scientist by
the name of Don Swanson. Don Swanson was interested in
studying Raynaud’s disease, which is a syndrome that affects the
musculoskeletal system. He was in particular
interested in identifying novel treatments
for Raynaud’s disease. Because at that
time there were not very many known treatments for Raynaud’s disease
or Raynaud’s syndrome. In order to answer that, Don Swanson looked
at a number of research papers on
Raynaud’s disease and found what kinds of concepts are associated with
Raynaud’s disease. In other words, what are the common co-occurrences with
the term Raynaud’s disease? He found that blood
viscosity is a term that often co-occurs with a
discussion of Raynaud’s disease. He also found that
musculoskeletal issues are often discussed in articles that talk about
Raynaud’s disease. For example, he found that articles that talk about
Raynaud’s phenomenon or Raynaud’s syndrome talk about an increase of blood viscosity
during Raynaud’s syndrome. Next, he asked what kinds
of other concepts are commonly co-occurring
with ideas such as blood viscosity and
musculoskeletal weakness. He found one concept which is EPA or eicosapentaenoic acid, which was commonly discussed along with blood
viscosity, along with musculoskeletal weakness, and along with a number of ideas that are associated
with Raynaud’s disease. For example, he found
phrases such as EPA or eicosapentaenoic acid
helps reduce blood viscosity. In contrast, Raynaud’s disease
increases blood viscosity. EPA is also associated with strengthening the
musculoskeletal system. In contrast,
Raynaud’s disease is associated with weakening of
the musculoskeletal system. Based on this, Don
Swanson came up with a hypothesis that EPA, which is found
commonly in fish oil, can help treat
Raynaud’s disease. Indeed, later clinical
trials showed that fish oil is an effective
treatment for Raynaud’s disease. Now I should clarify here that Don Swanson did not use
association rule mining software. Instead, he used the same
idea and did it manually. But in his later research, he talked about
how his scientific process could perhaps be
automated using tools that are finding common
co-occurrences in data. This is what is at the heart of association rule
mining software. Now data mining techniques, such as clustering and
association rule mining, ultimately are about
finding patterns in data. The next step beyond just
finding patterns is to perhaps make predictions about the future and take
action from it. For example, can we
predict demand for our product in the
future and figure out production decisions
based on that? Can we predict whether a transaction that just
happened is fraudulent or not? That is where the domain of predictive analytics comes in. Let’s look at a
couple examples of what we can do with
predictive analytics. Let’s look at a large retail
company, such as Amazon. A customer might
visit the website. They might look at or browse a few products. They might eventually pay for these products, and the items
are shipped by Amazon. Now the goal of
the retailer is to convince the customer
to buy the product. Often, retailers like Amazon will show recommendations
to the consumer. For example, recommendations such as people who
bought this also bought this or people who viewed this product also
viewed that product. At the heart of these
recommendations is an attempt to figure out what kinds of products this customer might be interested in, ultimately hoping to convince the customer to buy a product. That's an example of a predictive analytics
application that is trying to predict what kinds of product a customer
might be interested in. Another example might be that when a customer is
ready to buy the product, they might enter their
credit card information and hit Purchase now or Buy now. At this point, algorithms at the retailer’s website
have to figure out whether this is a
legitimate transaction or not. In particular, whether
the credit card is a legitimate credit card that is owned by this
customer who’s placing the order or is it likely
that this is stolen? Here, predictive analytics techniques look at past data and try to predict whether this transaction is fraudulent or not.
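Here is a minimal sketch of that fraud-scoring idea in Python. The features (transaction amount, foreign merchant, time since last purchase) and the synthetic data are assumptions for illustration, not a description of any retailer's actual system.

```python
# Train a classifier on labeled past transactions, then score a new one.
# Everything here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.exponential(80, n),   # transaction amount in dollars
    rng.integers(0, 2, n),    # foreign merchant (1) or not (0)
    rng.exponential(24, n),   # hours since the customer's last purchase
])
# Assume fraud is more likely for large purchases at foreign merchants
p = 1 / (1 + np.exp(-(0.005 * X[:, 0] + 1.5 * X[:, 1] - 4)))
y = (rng.random(n) < p).astype(int)   # 1 = fraudulent, 0 = legitimate

model = LogisticRegression(max_iter=1000).fit(X, y)

new_txn = [[950.0, 1, 2.0]]   # a large purchase at a foreign merchant
print(f"estimated fraud probability: {model.predict_proba(new_txn)[0, 1]:.2f}")
```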
Ultimately, this is just one example of predictive analytics in retail. Indeed, there are many
applications of these approaches. In the next module, we will look at these predictive
analytics techniques. In particular, we’ll look
at machine learning as a tool to make predictions that are
managerially actionable.

Video: Introduction to Artificial Intelligence

Artificial Intelligence (AI)

  • AI refers to the development of computer systems that can perform tasks that normally require human intelligence
  • There are different types of AI, including weak AI (narrow intelligence) and strong AI (artificial general intelligence)
  • The goal of the field is to build strong AI that can perform any intellectual task that a human can

History of AI

  • The field of AI traces its origins to Alan Turing, who proposed the Turing test to determine whether a machine can think
  • The first workshop on AI was organized by John McCarthy in 1956
  • The term “Artificial Intelligence” was coined by John McCarthy and has since become a widely recognized term in the field

Machine Learning

  • Machine learning is an alternative approach to building AI that involves giving computers the ability to learn from data
  • This approach is in contrast to traditional expert systems, which rely on explicit programming of knowledge from experts

Limitations of Expert Systems

  • Expert systems have limitations, such as Polanyi’s Paradox, which suggests that there is a lot of tacit knowledge that is not explicitly known
  • This limitation has led to the emergence of machine learning as an alternative approach to building AI

Examples of AI

  • IBM’s Deep Blue, which beat the world chess champion in 1997
  • IBM’s Watson, which beat human champions on Jeopardy! in 2011
  • Google’s AlphaGo, which beat the world Go champion in 2016

Machine Learning

  • Machine learning is a way to build AI that involves giving computers the ability to learn from data
  • This approach has been successful in building AI systems that can perform complex tasks, such as playing Go and diagnosing diseases.

In this module,
we’ll talk about artificial intelligence, we’ll begin with a brief overview of AI. We’ll then dive into a sub-field
of AI known as machine learning. We will start with the high level view
of what exactly is machine learning. And then, we’ll dive into some
specific machine learning methods. With that, let’s start by talking about
what exactly is artificial intelligence. Artificial intelligence or AI is a term
that refers to the development of computer systems that are able to perform tasks
that normally require human intelligence. Such as understanding language, reasoning,
speech recognition, decision making or navigating the visual world,
manipulating physical objects and such. When we talk about artificial
intelligence, there are many kinds of AI, for example,
one can think about weak AI and strong AI. Weak AI, also known as artificial
narrow intelligence is the kind of AI that is very
good at a very specific task. For example, you might have a
chess-playing AI that can probably beat the world’s best chess grandmaster,
but it is only good at that one task. The same AI probably
cannot converse with us, it probably cannot recognize images and
so on. Similarly, you might have AI that is
good at product recommendations, but is not good at chess or
recognizing images. In short, these are AI that
are good at one narrow task, most of the AI around
us tend to be weak AI. But the goal of the field is eventually
to build what is known as strong AI or artificial general intelligence. This is a computer program that could
do all intelligent things that a human can do. And so this kind of AI would
be truly intelligent and would be close to a human being
at a wide range of tasks. And finally, you have the notion
of artificial super intelligence. This is an AI system that is a strong AI: it’s as good as humans at a lot of tasks
but it has the ability to leverage its computational resources to store more
data, to analyze the data faster and make decisions faster and therefore
can perhaps beat humans at many tasks. And that is the idea of
super intelligence, or AI that is better than humans at most tasks. The history of AI is very recent; the
field owes its origins to a paper written by mathematician Alan Turing,
who asked the question, can machines think? Turing had the contention that machines could be constructed which could simulate the human mind very closely. In fact, he proposed a test which
is known as an imitation game, or also popularly known as the Turing
test for machine intelligence. In the test, a human judge interacts
with two computer terminals, one of the computer terminals
is controlled by a computer and the other terminal is
controlled by a human being. The judge interacts and has a conversation with each of
these through the computer terminal. If the judge cannot distinguish between
the human being and the computer system, then that computer system is said
to have passed the Turing test. Now, when Alan Turing proposed
the Turing test and posed the question, can machines think,
it created a lot of interest in the field. And it led to one of the first workshops
in the field which was a summer workshop on artificial intelligence that
was organized by mathematician, John McCarthy and was attended by
several other luminaries of the field. At this workshop, the scientists laid
the foundations for a field that became known as AI and in fact also coined
the term AI or artificial intelligence. Computer scientist Pedro Domingos believes that calling this field AI made it very ambitious, but it also helped
inspire many people to enter the field and that has been responsible for
a lot of progress that the field has made. Now, a lot of the early attention
in AI often was focused on whether AI could beat human beings at games. For example, in 1997, IBM created
a chess-playing computer called Deep Blue, which ended up beating the world
number one chess player at the time, Garry Kasparov, three and
a half points to two and a half points. This system had no machine learning in it, meaning it was not capable of learning
on its own without being programmed. Its edge relative to human players
came from its brute computing power, its ability to analyze more than
200 million positions per second and figure out the best possible move. In 2011, IBM created IBM Watson
which beat Ken Jennings and Brad Rutter who were two of the best
all-time players of Jeopardy. IBM’s Watson had machine learning in it; it was capable of understanding language, meaning it could understand the question being asked, retrieve information from a large database, and then answer the question that was posed to it. More recently, Google created software
known as AlphaGo to play the game of Go. Go is a strategy game like chess but
is much more complex than chess, which implies that brute computing power alone
is not sufficient to beat a human being. You require something more than brute computing power; you require the ability to learn, which makes Go a better yardstick for intelligence. Google used some of the latest machine
learning techniques in creating AlphaGo. And AlphaGo had great success
in playing human beings and in fact beat the World Go champion,
Lee Sedol. There are many ways to build
artificial intelligence, now, the old way of building AI is an approach
known as knowledge engineering or also now referred to as expert systems. This is the idea of programming
knowledge or capturing and transferring knowledge
to the computer system. For example, if we wanted to build
software to diagnose diseases, we might interview doctors and codify
the rules they use to diagnose diseases. For example,
a doctor might tell us that if a person or a patient has had fever for
over a week and they have body aches and chills, then they might start to
consider antibiotic treatment. Now, that’s a rule that they might give us, and we might program many such rules to diagnose diseases.
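To see how literal this programming of knowledge is, here is a toy sketch of one such hand-coded rule in Python; the symptoms and the one-week threshold are illustrative assumptions taken from the example above, not medical guidance.

```python
# A hand-coded expert-system rule: no learning, just knowledge that a
# doctor stated explicitly, translated into a program.
def consider_antibiotics(fever_days: int, body_aches: bool, chills: bool) -> bool:
    """Return True if the doctor's stated rule says to consider antibiotics."""
    return fever_days > 7 and body_aches and chills

print(consider_antibiotics(fever_days=9, body_aches=True, chills=True))  # True
print(consider_antibiotics(fever_days=3, body_aches=True, chills=True))  # False
```

A real expert system would chain together hundreds or thousands of such rules, and as we will see, tacit knowledge limits how far this approach can go.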
Similarly, if we wanted to build software to drive cars, we might interview thousands of drivers and ask them, what are the rules they use to drive cars? And they might give us rules such as
when the car in front of us slows down, we might apply the brake and
slow down ourselves. If the car in front of us is going very slowly, we might change lanes, and so on. Now, ultimately, we can create reasonably intelligent
systems using these kinds of techniques. And in fact, we have found over time
that expert systems do reasonably well. But over time, we have also observed
that expert systems are often not able to beat human beings at complex
tasks that require intelligence. For example, a system used to diagnose
diseases can do reasonably well, but it cannot often beat doctors in
terms of diagnosing diseases as well. This is because of a limitation that’s
referred to as Polanyi’s Paradox. Polanyi was a philosopher who came
up with the idea of tacit knowledge, which is the idea that we have a lot
of knowledge that we are not aware of. For example, when you ask a person, what are the rules
they use to drive a vehicle, they might be able to give us a number
of the rules that they can think of. And those rules are useful,
but at the same time, they’re not sufficient because there’s
a lot of knowledge we all have that we implicitly apply when we’re driving. But we’re simply not aware of
some of these principles that we apply while driving. And so as a result, asking people to
give us all the knowledge they have gets us a good amount of information,
but because of tacit knowledge, it doesn’t give us all the information. This is why an expert system to
diagnose diseases often cannot beat real world experts. This is why a driverless car created
using knowledge engineering or through an expert system approach ultimately
cannot drive as well as human beings. This has led to the emergence of
an alternative approach which is known as machine learning. Which is the idea that instead of
explicitly programming computers with knowledge from experts, we can instead
give them the ability to learn from data. And hopefully, they can observe
the action taken by experts and mimic that action over time. And that is what we will
turn to in the next lecture.

Video: Machine Learning Overview

The lecture discusses machine learning, a subfield of artificial intelligence that focuses on how computers can learn from data without being explicitly programmed. Machine learning is used for prediction tasks, such as predicting whether a transaction is fraudulent or not, determining whether an email is spam or not, and recognizing speech.

There are three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning: In supervised learning, the algorithm is trained on labeled data, where the input and output are clearly defined. The goal is to learn from the labeled data and make predictions on new, unseen data. Examples of supervised learning include image classification, speech recognition, and sentiment analysis.

Unsupervised Learning: In unsupervised learning, the algorithm is trained on unlabeled data, and the goal is to find patterns or structure in the data. Examples of unsupervised learning include clustering, anomaly detection, and topic modeling.

Reinforcement Learning: In reinforcement learning, the algorithm learns by taking actions and observing the consequences of those actions. The goal is to learn a policy that maximizes a reward signal. Examples of reinforcement learning include game playing, robotics, and autonomous vehicles.

The lecture also discusses the importance of high-quality data in machine learning, and how reinforcement learning can be used to balance exploration and exploitation in decision-making.

Some key points from the lecture include:

  • Machine learning is a subfield of artificial intelligence that focuses on how computers can learn from data.
  • There are three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.
  • Supervised learning involves training on labeled data to make predictions on new data.
  • Unsupervised learning involves finding patterns or structure in unlabeled data.
  • Reinforcement learning involves learning from actions and consequences to maximize a reward signal.
  • High-quality data is essential for machine learning.
  • Reinforcement learning can be used to balance exploration and exploitation in decision-making.

Hi again. In this
lecture we’ll talk about machine learning and the different types
of machine learning. Machine learning,
as I mentioned, is a sub-field of
artificial intelligence. It’s mostly focused on how
do we get computers to learn from data without
explicitly programming them? These techniques are often used for prediction tasks. For example, we
might have data on past credit card transactions, and we might be interested
in predicting whether a new transaction is
fraudulent or not, so we might look at
past data in order to make this decision. Or we might be interested
in determining whether an email is spam or not
based on past data. We might be looking at a task of analyzing images for
a driverless car and figuring out whether
the object in front of a car is another
vehicle or a person, or a tree, or something else. We might be interested
in recognizing speech and understanding speech
like with Alexa or Siri. In short, there
are many kinds of prediction tasks that use machine learning and these have applications in a
variety of industries, ranging from
healthcare, to finance, to manufacturing, to human
resources, and so on. Now, it’s important
to understand that machine learning is not one single technique; there is really a large set of techniques, all of which come under the umbrella
of machine learning. In fact, there are many
types of machine learning. For example, one way to think of machine learning is in terms
of supervised techniques, unsupervised techniques, and reinforcement
learning techniques. Supervised learning
is the idea of building a predictive
model based on past data, and these data have clearly labeled input
and output data. For example, we
might have data on emails in the past and nice and clear labels on which
of those past emails are spam emails and
which ones are not. We might then want
to learn from it. This is a classification task
which is using past data, which has nice
labels of inputs and outputs to learn how
to label future data. Unsupervised techniques,
in contrast, have a lot of input data but you don’t have clear
labels on output, and so these techniques are finding patterns
in the input data. For example, you might
have anomaly detection, which is the idea of finding certain data points that look like anomalies or
in other words, they look different from
all other data in there. Similarly, we talked about
clustering, previously, which is the idea of grouping a set of data points
into different groups, such that data points within a group are as similar to each other as possible, and data points in different groups are as different from each other as possible. This is based on data, but we don’t have
clearly labeled output that is guiding us on how best to actually break up the data into
different clusters. Lastly, we have
reinforcement learning, which is the idea of
having a machine learning system acquire new data by taking actions and looking at the data to learn and
improve its future actions. We will look at each of these techniques in greater detail. Let’s start with
supervised learning. As I mentioned,
supervised learning is the idea of
learning from data, where you have cleanly labeled
output and labeled input. The inputs can be referred to as features or as covariates, and the outputs are often
called targets of the model. This is what we’re
trying to predict. For example, as I mentioned, we have email data and
the output that we’re trying to predict is whether
an email is spam or not. The inputs, or the features, or the covariates, are the
actual text in the email. With supervised learning,
the idea is that we have cleanly labeled past data
which have a correct answer, meaning that certain data
have been labeled as spam and certain
other data have been labeled as not being
spam and now we need to learn how to classify
future emails. Similarly, you might
have a desire to predict sales next week based
on historical data. We might use data on the season, the month of the
year, the weather, and other such patterns
to predict future sales. Our training data is actually past data which has all these patterns, month, season, weather, and also the actual sales that were realized in the past. Now we’re trying to make predictions about the future based on that.
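As a small sketch of this idea, the Python snippet below fits a regression on invented historical data with month, temperature, and promotion features and then forecasts a future week; all the numbers are assumptions for illustration.

```python
# A minimal sketch of the sales-forecasting idea: fit a regression on
# historical features and predict a future period. Numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Features: month (1-12), average temperature, promotion running (0/1)
X = np.column_stack([rng.integers(1, 13, 200),
                     rng.normal(15, 8, 200),
                     rng.integers(0, 2, 200)])
# Past sales with an assumed seasonal and promotional lift, plus noise
y = 1000 + 30 * X[:, 0] + 5 * X[:, 1] + 400 * X[:, 2] + rng.normal(0, 50, 200)

model = LinearRegression().fit(X, y)
next_week = [[12, 4.0, 1]]   # December, cold weather, promotion on
print(f"forecast: {model.predict(next_week)[0]:.0f} units")
```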
Let’s look at another example of supervised learning. In a recent research study, my colleagues and I were
interested in analyzing social media posts posted by a number of
companies on Facebook. We gathered data on
over 100,000 posts submitted by large
brands on Facebook. We wanted to identify what posts are associated
with the highest engagement. That is, are emotional posts associated with greater engagement, or humorous posts,
or posts that show deals and promotions to
consumers or other posts. Now, it is very
expensive to tag 100,000 posts and label each post
as being humorous or not, emotional or not, as offering a price
discount or not, and so on. We wanted to automate
this process. We use a supervised machine learning
technique to do that. To do that, we first need data, a training dataset that has clearly labeled
inputs and outputs. The inputs are available to us. These are the words that
companies use in their posts. The output is essentially
a label that says whether the post is emotional
or humorous or not. To do that, we took
a sample of 5,000 posts and had human beings
label each of these posts. Every one of these
5,000 posts was labeled by a human being
as being humorous or emotional or as offering
a price discount or being a post that shares a remarkable
fact, and so on. These labels were then used as a training dataset for a supervised machine learning algorithm that learned what words are predictive of whether a post is emotional or humorous or not. Then that algorithm was used
to make predictions for the remaining nearly 100,000 posts that hadn’t been labeled
by a human being. This is essentially the idea of supervised machine learning, which is you need a training
dataset and you learn from that and you apply
that to future data. What we found in our study was that our machine
learning algorithm did well and often had accuracy
of over 90, 95 percent, and sometimes even
greater than 99 percent, in essentially being
able to predict whether a post is humorous or whether a
post is emotional or not. In any business application, if you have a good, high-quality training dataset, one can apply these
techniques in order to make predictions
about the future. The key is collecting
high-quality data, and that is the most
important activity in supervised machine learning. There are a number of very good, high-quality, off-the-shelf algorithms that can be applied to make predictions if you’ve got a high-quality training dataset for machine learning.
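A minimal sketch of that workflow in Python might look like the following; the posts, the labels, and the choice of a bag-of-words model with logistic regression are illustrative assumptions, not the actual method or data from the study.

```python
# Train on a small hand-labeled sample of posts, then label the rest
# automatically, mirroring the label-5,000-predict-100,000 workflow.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

labeled_posts = [
    "huge discount this weekend only",      # deal post
    "win a free phone with this coupon",    # deal post
    "we love our customers, thank you",     # not a deal
    "wishing everyone a wonderful week",    # not a deal
]
labels = [1, 1, 0, 0]   # 1 = offers a deal or promotion (assumed labels)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(labeled_posts)
model = LogisticRegression().fit(X_train, labels)

# Apply the trained model to posts no human has labeled
unlabeled = ["weekend discount on all phones", "thank you for a wonderful year"]
print(model.predict(vectorizer.transform(unlabeled)))  # likely [1 0]
```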
The next set of machine learning techniques are unsupervised learning techniques. Unsupervised learning
techniques also take in data, but they don’t have
clearly labeled output. For example,
clustering algorithms that we discussed previously. They tend to cluster our
data into different groups, but they are not told in advance what the ideal clustering looks like, meaning there’s no
labeled output for them. Similarly, another example
is anomaly detection. Anomaly detection algorithms
look at a bunch of data and identify data points that look dissimilar to most
of the other data. Here again, there’s a lot of input data, but there’s no clearly labeled output.
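Here is a tiny sketch of the anomaly detection idea in Python: flag points that sit far from the rest of the data using z-scores. The data and the cutoff are invented for the example; real systems use more sophisticated methods.

```python
# Flag data points that are many standard deviations from the mean.
import numpy as np

rng = np.random.default_rng(0)
normal_data = rng.normal(100, 10, 500)              # ordinary observations
data = np.concatenate([normal_data, [190.0, 5.0]])  # two planted anomalies

z = (data - data.mean()) / data.std()   # how unusual is each point?
print(data[np.abs(z) > 4])              # prints roughly [190.  5.]
```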
Another example is Latent Dirichlet Allocation, or LDA, which is a commonly used
technique for topic modeling, meaning identifying what topics a certain document might cover. Typically, with LDA, you have an input dataset which consists of a large
set of documents. The idea behind LDA is that each document likely covers
a small set of topics, and each topic itself tends to use the same set
of words quite frequently. For example, we might
take a large dataset of news stories published in all of the major newspapers and
online news media outlets, and feed that as an input
to an LDA algorithm. An LDA is trying to identify the topics that these
documents cover, but it’s not given
clearly labeled outputs, meaning that the algorithm
is not told that here’s a document on
politics and here’s a document on sports and so on. LDA, as I said, assumes that each document covers very few topics, and each topic has a few words that it uses frequently. When it takes a training dataset, or an input dataset rather, LDA might identify that a certain topic tends to use certain words
quite frequently. For example, it might
say that here’s a topic that tends to
use the word Obama, the word Trump, the word speech, and a few other such
words quite frequently. But it does not tend
to use words like pizza or baseball as frequently. This clearly we can infer
is the topic of politics, and that’s something
that the algorithm identifies on its own. Now, given any document, LDA then looks at the
kinds of words that are used in this document and identifies which
topics it covers. Given a document, LDA might
say that a topic covers sports or a topic covers
politics and so on. Once LDA has been trained
using a large dataset, it can now be applied
to any new document, and it can
automatically classify these documents and identify
the topics in there. In this example,
you see a passage that LDA might analyze and it looks at certain words that
are used in this document. With each of these
words it identifies certain topics that these
words are related to. For example, arts, or education, or children. Then it identifies a set of topics that this
document covers.
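Below is a minimal sketch of topic modeling in Python using scikit-learn's LDA implementation on a four-document toy corpus; the documents and the choice of two topics are assumptions for illustration.

```python
# Discover topics in unlabeled documents with LDA; with such a tiny
# corpus the split is only suggestive.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the senator gave a speech on the election",
        "the team won the baseball game last night",
        "voters heard the speech before the election",
        "the pitcher dominated the baseball game"]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Show the most heavily weighted words per discovered topic
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-3:]]
    print(f"Topic {i}: {top}")
```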
Now, in addition to unsupervised learning, we also have the idea of reinforcement learning. Reinforcement
learning usually does not take in large
training datasets. Rather, the algorithm learns by testing or trying various
actions or strategies, and observing what happens and using those observations
to learn something. This is a very
powerful method and has been used in a number of
robotics-based applications. It is also at the heart of a
software created by Google, which was called AlphaZero, which was an advanced version of Google’s Go playing
software, AlphaGo. AlphaGo had used
training dataset, which is based on past Go games. AlphaZero had no
training dataset. Instead, it learned the
game of Go by playing Go against itself and once it played millions of
games against itself, that was in fact the
training dataset that it used to develop the best
strategies for this game. Of course, in many settings, experimentation
isn’t always free, and so you have to balance
the cost of experimentation against exploiting the
knowledge that we already have. Let’s explore that through a reinforcement
learning algorithm known as multi-armed bandit. To illustrate how
bandit algorithms work, let’s consider setting
where you have two different ad
copies that we have designed and that we would like to try with our customers. We do not know which ad
copy is more effective in engaging customers and attracting them to
click on the ad. We would like to ideally figure out which ad is the
better ad to use. Now, one way to figure this out is to do what is
known as A/B testing. That is, we might show ad A to half the users and
ad B to half the users. We might do this for some period of time,
let’s say a day. Then we observe which ad has
the higher click-through rate, and we might use that ad from then on. Now, in this graph that
you see on this slide, we have two ads, ad A and ad B. Ad A has a
click-through rate of five percent and ad B has a click-through
rate of 10 percent. But we do not know
this in advance. What we might end up
doing is show ad A to some users and show
ad B to some users. If we’ve shown these ads in a randomized version to a large number of
users, over time, we learn that ad A has five
percent click-through rate, ad B has 10 percent
click-through rate. Then we can use ad B
from that point onwards. But there is a cost
of this learning, because some people were shown ad A and some people
were shown ad B. During this learning step, the average
click-through rate that our ads experienced was
seven and a half percent, which is lower
than we would have obtained if we had chosen
the better-performing ad. Now, a bandit algorithm can do better, and it can improve performance. The way it does this is that it starts off initially like any A/B testing algorithm, meaning it shows ad A and
ad B equal number of times. But it starts to observe what is happening
and is learning. For example, it
starts to observe that ad B is doing
better than ad A. As it learns this, it starts to show ad B
more frequently than ad A. It still will show ad A a few times, so it still allows itself to learn and correct course in case ad A actually performs better. But over time it starts to
weigh ad B more and more, and as a result, if you observe at the
end of a day or in this example at the
end of 1,000 sessions, the bandit-algorithm-based allocation strategy ended up having a click-through rate that was much higher than the seven and a half percent we obtained through A/B testing. It was not quite equal to the
10 percent that ad B has, but it’s close enough
because what it’s able to do is it’s able to
experiment and learn, and exploit that knowledge to
also improve the outcomes. In short, a reinforcement
learning algorithm is essentially an algorithm
that takes actions, observe what happens, and then improves its
performance over time.
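To make the ad example concrete, here is a small Python simulation using an epsilon-greedy allocation rule as a stand-in for the bandit strategy described; the lecture does not specify the exact algorithm, and the 5% and 10% click-through rates are taken from the example.

```python
# Simulate the two-ad example (true CTRs 5% and 10%) with a simple
# epsilon-greedy bandit: mostly exploit the best ad so far, but keep
# exploring a small fraction of the time.
import random

random.seed(0)
ctr = [0.05, 0.10]       # true click-through rates of ad A and ad B
clicks = [0, 0]
shows = [0, 0]
epsilon = 0.1            # fraction of traffic reserved for exploration

for _ in range(1000):    # 1,000 sessions, as in the example
    if random.random() < epsilon or 0 in shows:
        ad = random.randrange(2)                                 # explore
    else:
        ad = max(range(2), key=lambda a: clicks[a] / shows[a])   # exploit
    shows[ad] += 1
    clicks[ad] += random.random() < ctr[ad]

overall_ctr = sum(clicks) / sum(shows)
print(f"overall CTR: {overall_ctr:.3f}")  # typically between 7.5% and 10%
```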

Video: Reinforcement Learning

The text discusses reinforcement learning, specifically multi-armed bandit algorithms, as a powerful tool for making decisions in situations where continuous data is available and can be learned from to improve decisions. Examples of such situations include:

  • Personalizing a news media website to users
  • Determining which news articles to feature on the homepage
  • Personalizing a product page on an e-commerce website (e.g., Nanophone) to a consumer, including deciding which images to show, which product features to emphasize, and which discounts to offer

The key challenge in these situations is balancing exploration (gathering more information about the decision environment) and exploitation (making the best decision based on current information). Multi-armed bandit algorithms can help balance this trade-off.

The text also explains two algorithms for multi-armed bandit problems:

  • Epsilon-first: experiment early and then exploit what has been learned
  • Thompson sampling: initially allocate traffic equally to all choices, then adjust based on the results

Reinforcement learning has applications in gaming and online personalization, but is not as widely used as supervised machine learning.

Let’s next look at
reinforcement learning and in particular multi-armed bandit algorithms more closely. These algorithms are a powerful tool when
you have continuous data coming in and we can learn from the data
to improve decisions. For example, consider a media website, like a news media website that would like
to personalize the website to its users. Or determine for example which
of thousands of different news articles to profile at
the top of its homepage. Or consider an e-commerce retailer. Let’s call it Nanophone, which is a retailer of, let’s say, mobile phones. When a consumer logs into the website, Nanophone needs to decide how to personalize
the product page to the consumer. They might have to decide which of 10
different images of the phone to show to the consumer. They might have to decide which product
features to emphasize to the consumer. For example, should they focus on
the battery life for the phone or should they focus on the sleek design or
some other product attribute. They might want to decide on which of
several different discounts to offer to this consumer: 0% discount,
5% discount, 10% discount. They might have multiple
calls to actions and they might have to choose
which call to action to use. So the action space, or the set of choices available to the marketer in this context, is really very large. And the goal is to decide which actions
to choose in order to maximize revenues. At the heart of this problem is
the question of how much do we explore and how much do we exploit? And what we mean by that is exploration
is all about gathering more information about the decision environment. For example, asking the question, what might happen if I choose not to
emphasize the battery life as much and instead choose to focus on
the sleek design of the phone? In contrast, exploitation is about making the best
decision given the current information. Maybe based on the current information,
we believe the marketing message that most attracts the consumer is a
message that emphasizes the battery power. So should we go with that or
should we try something new? Now, we routinely use
the ideas of exploration and exploitation in our everyday lives. For example,
suppose you’re going to a restaurant, do you go to a completely new restaurant
which is the equivalent of exploration? Or do you go to your favorite restaurant
which you’ve been to many times and you know is tried and tested? That is exploitation. And there are times at which
you might choose to go to your favorite restaurant
that is choose to exploit. And there are times at which you might
say, let’s try something new and learn about the restaurant,
even if it risks the possibility that we might not enjoy the food or
the experience. And that’s exploration. The question at the heart of these kinds
of decision problems is how do we balance exploration versus exploitation? How do we decide when to try
a completely new marketing message or a completely new say, web page to
the consumer versus when do we use something that has worked
reasonably well in the past. And this trade off is what is
really handled by algorithms like multi-armed bandit algorithms,
which as I mentioned earlier is sort of the classical
reinforcement learning approach. Now, the multi-armed bandit problem is a problem in which a fixed or finite set of resources must be
allocated among multiple choices. So, for example, imagine a gambler in a casino faced with a row of slot machines, and the gambler must
decide which slot machines to pull. And the gambler only has a finite
amount of time in the casino and can therefore get only 100 or
200 pulls in different slot machines. So they must decide at any given time
whether to try a completely new slot machine and see if it’s associated with a much higher
probability of getting high rewards. Or should the gambler stick with
a slot machine that is already producing reasonable returns. Now, there are many algorithms that can
be used to balance this exploration and exploitation and
indeed there are many algorithms for multi armed bandit problems. For example, a strategy epsilon
first is essentially a heuristic in which we tend to experiment early,
that is explore a lot early. And then once we have learned a little
bit, then we start exploiting. So, in the context of personalizing the website for Nanophone, what we might do is, during the first few
weeks we might choose to explore and try many different marketing messages,
many different images and so on. And once we’ve learned what we wanted to
learn, then we choose to exploit, that is, allocate 100% of the traffic to
the best performing variant that we discovered in the first
few weeks of exploration. Another algorithm available
is Thompson sampling. What Thompson sampling might do for
the problem that nanophone faces is that it might initially allocate the traffic,
the web traffic coming into the website equally to all the different choices
that the company is considering. Meaning the choices like,
should we emphasize messages and visuals that show the battery? Should we use visuals and
messages that talk about the sleek design? Should we instead use visuals and messages that talk about the app store of the phone, and so on? And Thompson sampling will initially
allocate traffic to each of these choices with equal probability. But as more and more data comes in,
Thompson sampling will choose the alternatives that
are producing higher or better results. It will choose them with
higher probability. So, if for example the message and
visuals that emphasize the app store are the ones slowly but steadily producing better results, then the probability with which that choice is chosen will keep going up. Qualitatively speaking, that’s what the algorithm does. Obviously, the details of how the algorithm works are probably not of the most interest, given that our focus is to talk about the business applications of AI. But hopefully you get a sense of the intuition behind these approaches.
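Here is a small sketch of Thompson sampling in Python for the Nanophone example; the three message options and their true click rates are invented, and the Beta-distribution bookkeeping shown is the standard textbook version rather than anything specific to the lecture.

```python
# Thompson sampling over three marketing messages: keep a Beta belief
# about each option's unknown click rate, sample from each belief, and
# show the option whose sampled rate is highest.
import random

random.seed(1)
true_rates = [0.04, 0.06, 0.09]   # e.g. battery, design, app store (invented)
successes = [1, 1, 1]             # Beta(1, 1) uniform priors
failures = [1, 1, 1]

for _ in range(5000):
    # Sample a plausible rate for each option, pick the highest sample
    samples = [random.betavariate(successes[i], failures[i]) for i in range(3)]
    choice = samples.index(max(samples))
    if random.random() < true_rates[choice]:
        successes[choice] += 1
    else:
        failures[choice] += 1

shown = [successes[i] + failures[i] - 2 for i in range(3)]
print("times each option was shown:", shown)  # option 2 should dominate
```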
In summary, while we assume that machine learning is based on having access to large data sets, reinforcement learning offers an alternative that relies less on training data and more on dynamic experimentation
to learn which strategies are doing better and
to use those more and more. Reinforcement learning has found
many applications in gaming and in online personalization. That said, today it is not as widely used
as other machine learning approaches, such as supervised machine learning. Given how pervasive supervised
machine learning is, especially in business settings, we will now deep dive into the world
of supervised machine learning methods.

Video: A Detailed View of Machine Learning

What is Machine Learning?

  • Machine learning is a type of AI that involves taking input variables and predicting an outcome variable.
  • Supervised machine learning is the most common type of machine learning, where the goal is to learn from labeled data to make predictions.

Example of Machine Learning

  • Predicting whether a user will purchase a product based on their website behavior, demographics, and device information.

Key Concepts

  • Input variables (X): The data used to make predictions, also known as features, predictors, or covariates.
  • Outcome variable (Y): The variable being predicted, also known as the output or target variable.
  • Function f: The relationship between the input variables and the outcome variable, which machine learning aims to approximate.

Factors that Drive Prediction Accuracy

  • Quantity of data: Having more observations (rows) helps increase accuracy.
  • Quality of data: Having more features (columns) and relevant information helps increase accuracy.
  • Relevance of information: Having relevant data is more important than having a large quantity of data.
  • Complexity of the model: More complex models can capture more complex relationships, but may also lead to overfitting.
  • Feature engineering: The ability to create new features or transform existing data to improve predictions.

Next Steps

  • The next lecture will cover specific machine learning algorithms to provide a better understanding of how they work.

Hello. In this lecture, we’re going to go into the
details of machine learning. In particular, I’ll try and provide a high-level
view of what is machine learning and what drives the accuracy of machine
learning models. For the purposes of
this discussion, I’ll tend to focus only on supervised
learning techniques. There’s a good reason for this. If we look at the
practice of AI, I would say that
almost 90 percent or maybe even higher
than 90 percent of practical business uses of AI tends to be machine learning. If you look within
machine learning, almost 90 percent of
machine learning in practice tends to be
supervised machine learning. For this discussion of machine learning at a very high level, I’ll tend to focus
our attention on supervised machine
learning algorithms. As I’d mentioned
earlier, at its core, supervised machine learning
is all about taking a set of input variables and predicting some
outcome variable. Now, we do this all the
time in our real life. For example, if you observe
dark clouds and strong winds, we might predict that
maybe it’s going to rain. Or we might look at what clothes somebody’s wearing or
better yet how they’re interacting with us
and we might make some inferences or
predictions about whether we’re likely to be
good friends with them. Or at the workplace, we might look at a person’s
educational background. We might look at
their job experience, at their skills, and predict whether they’ll be
successful at the job. These are all typical
prediction problems that we are solving on
a day-to-day basis. Now there are many
business applications of these kinds of predictions. Now, for example, in
a business setting, we are trying to predict whether somebody will buy a product. We’re trying to predict whether somebody will click our ad. All of these predictions
can be made using supervised learning if we
have good training data. Let’s consider an example. Suppose we have data about
users coming to our website. We might know how many pages they have viewed on our
website in the past, we might know their zip code
based on their IP address. We might know what device they’re accessing our page from, and we might know the operating
system of that device, and ultimately we’re
trying to predict whether this person will purchase
a product or not. Now that’s a typical prediction
problem one might have. Now, for the data we
have about our users, meaning the input data that
we use to make predictions, one refers to these
input data as the predictors of our model
or the features in our model, or sometimes just the variables or the
covariates of the model. There are many different names, but ultimately we can think of all these variables as
the inputs to our model. Now, given these inputs, we’re also trying to
predict something, that’s the output of our model
or the outcome variable. We might describe the input
variables using the letter X, and we might describe the outcome variable
using the letter Y. Now the prediction
problem that we’re facing is we’re
trying to figure out, given the input X, we’re trying to predict Y. That is, we are trying to
figure out some function f that takes X as an
input and predicts Y. The entire task of supervised machine learning
comes down to coming up with a highly accurate approximation
of this function f so that we can predict
Y as accurately as possible given the inputs X. Now, the notion of accuracy is built into what
I just described. For any prediction problem, we think of accuracy as essentially a measure
of how often are the predictions true or how close are the
predictions to reality. For example, if we’re trying to predict whether a person
purchases or not, and we make 100 predictions of which 93 are correct, we might say that the prediction accuracy of our model is 93 percent.
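Putting the pieces together, here is a minimal Python sketch of learning f from labeled (X, Y) data and measuring accuracy on held-out data; the features and the synthetic relationship are assumptions for illustration.

```python
# Learn a function f: X -> Y from labeled data and measure accuracy
# on a held-out test set. Features and data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# X: pages viewed, past purchases, session minutes (assumed features)
X = rng.random((2000, 3))
# Y: purchase if engagement is high enough, flipped 10% of the time as noise
y = ((X[:, 0] + X[:, 1] > 1.0) ^ (rng.random(2000) < 0.1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```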
We want this prediction accuracy to be as high as possible, and so a natural question to ask is: what drives the prediction
accuracy of a model? One factor that drives
the accuracy of a machine learning model
is the quantity of data, meaning the number of
distinct observations we have as an input
into the model. For example, if we have data on only 100 customers who’ve been to our website in the past, it would be very hard for us to make accurate predictions about future customers
visiting our website based on the observations
of just these 100 users. On the other hand, if we had
data for a million users, then we have a lot more
data and so clearly having more observations helps increase the accuracy of our model. Another driver of
prediction accuracy is how much do we know
about each observation? Now in the example
I just described, for each consumer, we only knew the number of pages
they viewed in the past. We knew the operating
system they use, and we also happen to
know their location. But this amount of
data might not be sufficient for us to actually
make the predictions. On the other hand, if we had much more data about each user. For example, we knew
that user’s interests, we also knew that
person’s income level, we also happen to know whether
they have made purchases from our website in the past and many more such observations, then the prediction accuracy of our model might go
up significantly. In other words, the two
factors that seem to drive the accuracy of the
model are the number of rows, which we can think about
the number of data points, and the number of columns, which we can think of as
the number of features or the number of X variables
available within each row. Together, they tend to drive the accuracy of machine
learning models in very significant ways. But they’re by no means
the only two factors. There’s a number
of other factors. For example, how relevant is the information
you have available? If we were trying to predict whether it’s going
to rain today, knowing how many people are
carrying umbrellas today is more useful than just knowing the color of the
clothes worn by people. So clearly having more
relevant data matters. Similarly, the complexity
of the model matters. If we restrict ourselves
to use very simple models, they might not be
able to capture very complex relationships that are out there in
the environment. Some of the more modern
machine learning methods like deep learning, which I will describe
in a later lecture, allow us to have more flexible and more
complex relationships between the input variables and
the outcome variables, and this helps increase the prediction
accuracy of models. Another factor is what is referred to as
feature engineering. This is essentially
the ability of an analyst to use their
domain knowledge to create new features or new input variables that are
predictive of the outcome. This comes down to having
deep domain knowledge and identifying what new data might we add to our
dataset or how might we transform our dataset so that we can better
make predictions. In short, there are many
different factors that drive the success of
machine learning models. Certainly one has to think
about many of these factors. It always starts with having high-quality and
high-volume data with lots of rows
and lots of columns. In the next lecture, we will talk about some specific machine
learning algorithms to give you a better
intuition of what exactly these machine
learning algorithms do.

Module 1 Quiz

What evidence is there that AI might be the next general-purpose technology?
Experts predict artificial general intelligence is only 50 years away.
It will be a highly impactful technology in the future, though it is not affecting us currently.
It is very difficult to store big data without a solid understanding of artificial intelligence.
There is widespread demand for AI and AI research skills across industries.

Which statement about big data is least accurate?
Useful to decision-makers in a wide number of industries
Can be used to answer new types of questions
Thought of as mostly a bigger version of previous forms of data analysis
Difficult to implement, requiring new organizational skills

One of the challenges of working with big data is that:
You must have a hypothesis in mind before analyzing the data.
Data generation has increased in recent years but computing capacity has remained largely stagnant.
Working with it involves a broad skillset and a wide range of tools.
Machine learning techniques are not applicable to big data.

What is the primary value of using data warehouses?
They provide a single point of access for data/analytics functions without affecting operations.
They allow for unstructured data storage.
They allow you to integrate with Amazon/Google.
They have bigger servers which allow for bigger data.

The main value of using MapReduce is:
Its ability to answer questions about future customers.
It integrates well with Microsoft Excel.
It allows for parallel computing which speeds up your query times.
It reduces any computation to a grid search algorithm.

Which of the following statements is/are true (select all that apply):
Predictive analytics is only useful for customer retention.
Fraud detection and recommendation systems are examples of predictive analytics.
Association rule mining looks for common co-occurrences in your data.
Clustering is the process of distributing your server load to multiple computers

Data mining is:
Another name for regression
A type of data warehouse
The process of discovering new data to include in your systems
A term encompassing tools for discovering patterns in large datasets

Which of the following statements is false?
Big data is a valuable complement to predictive analytics.
Clustering can be used for data-driven customer segmentation.
Data mining and big data are technical tools mostly useful for software developers
Amazon Redshift, Snowflake, and Google BigQuery are common data warehouse tools.

Recommender systems, such as those used by Amazon, are best characterized as being examples of:
MapReduce
Data warehouses
Database management systems
Predictive analytics

Artificial Narrow Intelligence, also known as weak AI:
Involves transferring knowledge from experts to a knowledge base
Can rapidly improve itself and do all things a human can do at a significant increase in speed and competency
Can do all intelligent things a human can do just as quickly and easily, although it cannot rapidly improve itself
Is AI that is very good at one specific task