
Week 1: Limitations and Ethical Issues of Generative AI

This module delves into the various limitations, concerns, and ethical issues associated with generative AI. You will gain insights into the limitations related to training data and the lack of accuracy, explainability, and interpretability. You will learn about various ethical issues and concerns around the use of generative AI, including data privacy, copyright infringement, and hallucination. You will explore the potential risks and misuses of generative AI, focusing on deepfakes. Finally, you will identify the legal issues and implications around generative AI.

Learning Objectives

  • Explain the limitations of generative models.
  • Describe the ethical issues and concerns associated with generative AI.
  • Describe the potential risks and misuses of generative AI.

Welcome


Video: Course Introduction

Key Focus: Exploring the impact and ethical concerns surrounding generative AI.

Concerns Covered:

  • Data Issues: Potential copyright infringement due to how models are trained, security risks with user data.
  • Accuracy & Bias: Limitations in model accuracy, explainability, and the potential for bias in the output.
  • Misuse: The dangers of deepfakes and other malicious applications.

Responsible AI Practices: Emphasizes transparency, accountability, and safety measures for both organizations and individual users of generative AI tools.

Course Structure:

  • Target Audience: Anyone interested in generative AI, regardless of background.
  • Topics: Limitations, ethics, responsible use, socioeconomic impact.
  • Format: Videos, readings, hands-on labs, quizzes, project, discussion forums, expert insights.

Call to Action: Encourages learners to understand the limitations and complexities of generative AI for ethical use.

Welcome to the impact, considerations, and ethical issues associated
with generative AI video. Foundation models are creating an irreversible impact on how we build and
share information, and this is influencing our economic and
social development. Do these models train on copyrighted data
without permission? Who is accountable for any
violation of copyright laws? What kind of security
measures are in place to protect user data? To enjoy the capabilities
that generative AI offers, it’s critical to
put in controls for increased transparency and
security, among others. Organizations must use generative AI models
responsibly and honestly. Users must make
safe choices when working with generative AI
tools and applications. This course invites all beginners, whether professionals, enthusiasts, practitioners, or students. If you have a genuine interest in the rapidly developing field of generative AI, this course is for you: a course for everyone, regardless of your
background or experience. By the end of this course, you will be able to describe the limitations of generative AI and the related concerns, identify the ethical
issues, concerns, and misuses associated
with generative AI, explain the considerations for the responsible use
of generative AI, and discuss the economic and social impact
of generative AI. As this is a focused course
comprising three modules, you’re expected to spend one to two hours to
complete each module. In Module 1 of the course, you’ll explore the
various limitations and concerns associated
with generative AI, such as bias, training data, and the lack of accuracy, explainability, and
interpretability. You will identify
ethical considerations such as data privacy, copyright infringement,
and hallucination, and discover the risks
associated with deep fakes. In Module 2, you’ll understand
how organizations can use generative AI responsibly by implementing measures
for transparency, accountability, privacy, and safety guardrails. You’ll also take a holistic
look at the impact of generative AI on our economic growth and
social well-being. Module 3 requests
your participation in a final project and presents a graded quiz to test your understanding
of course concepts. You can also visit the
course glossary and receive guidance on the next steps in your learning journey. The course is curated
with a mix of concept videos and
supporting readings. Watch all the videos to capture the full potential of
the learning material. You’ll work on hands-on labs to understand the limitations of generative AI models and participate in a final
project in Module 3. There are practice
quizzes at the end of each lesson to help you
reinforce your learning. At the end of the course, you’ll also attempt a graded quiz. The course also offers
discussion forums to connect with the course staff and interact with your peers. Most interestingly, through
the expert viewpoint videos, you’ll hear experienced
practitioners talk about the on-the-ground
realities related to ethical considerations and
limitations of generative AI. As Albert Einstein said, “Once we accept our limits, we go beyond them.” Let’s get started.

Limitations, Concerns, and Issues of Generative AI


Video: Limitations of Generative AI

Major Limitations of Generative AI

  • Training Data Dependency:
    • Limited or outdated data leads to limited or inaccurate outputs.
    • Obtaining quality data, especially for specific domains, can be difficult and costly.
  • Contextual Understanding:
    • Struggles to analyze information beyond the scope of its training data, hindering its ability to handle novel situations.
  • Can’t Fully Replace Human Creativity:
    • Lacks genuine originality and ability to think critically. Can’t evaluate abstract concepts (like humor) the way humans can.
  • Lack of Explainability:
    • Often functions like a “black box,” making it hard to understand how the model reaches conclusions. This raises questions about reliability.

Key Takeaway: While generative AI is powerful, these limitations highlight the continued importance of human judgment and oversight when using these tools.

Welcome to Limitations
of Generative AI. After watching this video, you’ll be able to describe generative AI’s
limitations regarding training data and identify other significant limitations
of generative AI. Generative AI’s capability
to generate new, well-presented, and convincing content has led to its widespread use in
commercial and social sectors and diverse industries. However, before it
can be considered a fully functional
business utility, it’s essential to address the prominent limitations
of generative AI. These limitations can be hindrances for businesses developing AI and constraints
for organizations and individuals
using generative AI. Let’s explore a few major
limitations of generative AI. The fundamental limitation of generative AI is related
to training data. Generative AI models are not
repositories of information. Instead, they strive to
generate and replicate the knowledge based on the data they have
been trained on. The limitations in training data influence the output
produced by the models. For example, if the
training data set for an image generating model
is limited in scope, the generated images will also be limited in
terms of scope. For instance, if
you ask the model to generate a picture of a cat, it might produce a generic
looking cat without the distinctive features of a particular breed
or individual cat. Many generative AI models are trained on data
with cutoff dates, resulting in outdated or
incorrect information. This leads to the
inability of the models to provide answers about current
information and events. For example, GPT-3.5 is trained on data
until January 2022. Accordingly, when prompted to generate information based on a fact or an event that
occurred after January 2022, the model either does not provide a response or generates an inaccurate or fictitious response. In such cases, AI models are expected to clearly specify the cutoff date on which their responses are based. In addition, many generative AI models are not connected to the Internet and
cannot update or verify the information
they generate. With training data
as the main source of the generative AI
models knowledge, there is a requirement for
high quality training data. The training data is expected
to be rich in information, updated, accurate,
and free from biases. The extensive data sets
required for training may not always be readily
available or may be very costly. For example, for
specific domains like rare medical
conditions or diseases, obtaining sufficient data
may be difficult and costly due to various reasons
such as scarcity of cases, considerations for
patient privacy, and requirements for
specialized medical expertise. Generative AI models require substantial cost, computational resources, and training time. This can be a concern
for businesses developing and
training AI models. To mitigate the difficulties
regarding data reliance, researchers are working on
data augmentation techniques aimed at generating additional training data from
limited samples.
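The transcript mentions these augmentation techniques only at a high level. As a rough, hedged illustration of the general idea (assuming NumPy and a small image dataset; this is not any specific research method), one labeled image can be turned into several training variants:

import numpy as np

def augment(image):
    # Produce simple variants of one training image (H x W x C uint8 array).
    return [
        np.fliplr(image),                                                # horizontal mirror
        np.rot90(image),                                                 # 90-degree rotation
        np.clip(image.astype(np.int16) + 25, 0, 255).astype(np.uint8),  # brightness shift
    ]

# One 64 x 64 RGB sample becomes four training samples.
sample = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
dataset = [sample] + augment(sample)
print(len(dataset))  # 4
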
Generative AI excels at analyzing data to identify patterns and trends, using them to
generate new content. However, it has limitations in understanding the context
when presented with new data, information, or solutions beyond the scope of its
training parameters. For example, let’s consider a generative AI model trained to assist in
legal document review. It’s been trained on
extensive legal documents, and excels at identifying common legal clauses and
providing relevant suggestions. However, if you present a
novel, complex legal case, the model may struggle to
offer insightful analysis as it lacks the context and
training for such unique and unprecedented
legal scenarios. Generative AI cannot completely replace human creativity
or critical thinking. It can be creative only within the boundaries of
its training data. It does not possess
the ability to invent an entirely new idea. It cannot think outside the box. For example, while it can produce information on an idea or debate, it cannot evaluate which side has more depth or credibility. It cannot recognize abstract concepts such as humor or sarcasm; all of these require a human touch. From the user’s perspective,
a significant limitation of generative AI is its lack of explainability and
interpretability. Generative AI models are often considered to be
complex and opaque. It may be challenging
for users to comprehend how models
generate content, make predictions, or arrive
at specific decisions. This lack of transparency and predictability raises
concerns about the accountability
and reliability of AI-generated outputs. In this video, you learned
about the limitations of generative AI that can be constraints for
its wide adoption. The fundamental limitation of generative AI is related
to training data. The extensive datasets
required for training may not always be available or may
be outdated or inaccurate. The limitations in training data influence the output
produced by the models. Generative AI’s
understanding and creativity are limited
to its training data. Generative AI cannot completely replace human creativity
or critical thinking. Generative AI lacks explainability
and interpretability. This lack of transparency
and predictability raises concerns about the
accountability and reliability
of generative AI.

Video: Issues and Concerns About Generative AI

Key Issues with Generative AI

  • Inaccuracies and Biases:
    • Training data may contain errors or outdated information, leading to inaccurate output.
    • Poorly sampled training data can perpetuate harmful stereotypes and biases.
  • Data Privacy and Security:
    • Sensitive user or organizational data fed into generative AI models can be leaked.
    • This can include personally identifiable information (PII) or confidential data.
  • Copyright Infringement and Ambiguity:
    • AI-generated content might reproduce copyrighted material without permission.
    • Questions remain about who owns the copyright of AI-generated work – the user, the AI creator, or no one?

Reasons for These Issues

  • Limitations in Training Data: Errors or biases within the training data directly affect the AI’s output.
  • Sensitive Data Usage: Many models use massive amounts of data, some of which could contain private information.
  • Lack of Clear Regulations: The legal landscape surrounding AI-generated content and ownership is currently ambiguous.

What Can Be Done

  • Careful Data Selection: Ensure training data is diverse, representative, and free of bias where possible.
  • Privacy Protection: Implement measures to protect sensitive data used in AI models.
  • Ethical Development: AI creators need to establish ethical guidelines for themselves.
  • User Awareness: Users should be cautious about submitting sensitive data and be critical of the AI’s output.

Welcome to issues and
concerns about generative AI. After watching this video, you’ll be able to explain the common issues
and ethical concerns around generative AI and identify the reasons for these issues and concerns around the
use of generative AI. Research conducted by advisory
organizations points to the growing impact
of generative AI across diverse sectors
and industries. Gartner predicts that
generative AI will account for 10% of all
data produced by 2025. This large scale adoption of generative AI is the result of the diverse and groundbreaking capabilities of generative AI, including natural
language understanding, content creation, image
synthesis, and problem solving. However, the accelerated
capabilities and resulting adoption of generative AI give rise to some critical issues
and ethical concerns. The most prevalent concerns include inaccuracies and biases, data privacy and security, copyright infringement,
and copyright ambiguity. Let’s discuss these concerns. Let’s say we have a
generative model trained on a large data set of
articles until 2021. When prompted to generate
an article about the discovery of a new planet in our solar system in 2023, the model generated a
fictional, scenario-based, and factually inaccurate
article based on its understanding of astronomy and scientific discoveries. The limitation here is that generative AI lacks
the ability to verify the accuracy or truthfulness of the
information it generates. If the training data
contains inaccuracies, the content generated by the model will
reflect those errors. Also, most generative AI models are pretrained on
vast amounts of data, but are not dynamic
in terms of keeping up with the latest facts
or new information. Apart from inaccuracies, training data can also
be subject to biases. Biases may occur when the
training data is poorly sampled and does not accurately
reflect the real world. The common pattern of biases
that can be visible in a generative AI model is negative or outdated
stereotypes and discrimination. For example, an image generative AI tool may generate images of a middle-aged white man when prompted to generate
an image of a CEO. Stereotypes related to gender, race, religion, or
other demographics, political bias, cultural
and language biases, and historical beliefs or
practices have also been noted. To avoid or reduce these biases, it’s important to ensure that the training data is
diverse and representative. Developers and users must
work to address this through careful data selection and fine-tuning and evaluation
and improvement processes. Next, let’s focus
on data privacy and security aspects related
to generative AI. Any data or queries
you prompt into open source AI models can
be used as training data. This data may include
sensitive information, including personally
identifiable information, or PII, or confidential
information of an organization. As a result, the generative
AI model may reveal a user’s personal or an
organization’s confidential data both by mistake and by design as part of its output
and make it public. The data privacy and
security concerns can be extended to copyright
and legal exposure. The generative AI models can
reveal the original data and content of creators or organizations as part
of their output, which can be used
and explored by other users and organizations. Accordingly, a model
can infringe upon the copyrights and
intellectual property rights, or IPRs, of users
or other companies. In addition, generative
AI models may generate content that includes copyright
elements such as logos, trademarks, or
copyrighted images. Using such content for
commercial purposes without permission can
result in legal action. Another related concern
is copyright ambiguity. That is, the ownership
of AI-generated content. This determines who
has the authorship of the content and creative works generated through AI models. There are a few primary
questions to consider. Firstly, who would have ownership
of the created content? For example, consider an
AI-generated image based on or similar to the
famous painting of Mona Lisa by Leonardo da Vinci. Who holds the ownership and title for such a generated
creative work? The artist of the
original image, the organization
that owns and trains the AI model that
generated this image, or the user who prompted the model to
generate such an image? Moreover, is it even fair and ethical for generative AI to create art or content that closely resembles others’ work? Another question
to consider is, should the content created by AI be eligible for
copyright protection? You could say that
it’s not because it’s not the output of
human creativity. However, one could also argue
that it is because it is the output of algorithms and programming together
with human input. Determining the ownership
and licensing of AI-generated content is
currently an open question. There are only a few legally mandated regulations regarding the development or
use of AI-generated content. Accordingly, AI
developers should take the lead and develop their own ethical
generative AI policies to protect themselves
and their customers. For example, they
must ensure that the training data is
representative and unbiased. Confidentiality of users’
personal information and their data must be
respected during the training, development, and deployment
of generative AI models. As a user, it’s
challenging to ensure that you’re using
generative AI ethically. However, you can keep a few considerations in mind while using a generative AI tool. For example, don’t input information that is confidential, sensitive, or constitutes an intellectual property violation, and analyze the output for factual errors and biased or
inappropriate statements. In this video, you
learned about some of the common ethical concerns and issues around generative AI. These concerns include
inaccuracies and biases, data privacy and security, copyright infringement,
and copyright ambiguity. A few reasons for
these issues include the limitations or
inaccuracies in training data, the use of sensitive,
confidential, or personally
identifiable information for training the model, and the lack of legally
mandated regulations regarding the development or
use of AI-generated content.

Video: Hallucinations of Text and Image Generating LLMs

What are “hallucinations” in AI-generated text and images?

  • Hallucinations are when AI models produce content (text or visuals) that is factually incorrect, nonsensical, or simply doesn’t exist in the real world.
  • Examples: A news article with false information, a poem with meaningless word combinations, an image of a winged cat.

Why do AI models hallucinate?

  • Imperfect Training Data: AI models learn from massive datasets, which may contain errors or inconsistencies. The models then reproduce these issues.
  • Overfitting: Models become too focused on the specific training data and struggle to generalize to new information, leading to inaccurate output.
  • Optimization for Plausibility: The focus is on creating statistically likely output, not necessarily factually correct output.

Negative Impacts of Hallucinations

  • Misinformation: Hallucinations can spread false or misleading information, which is harmful.
  • Harm to Individuals: Fake content can be used for bullying, harassment, or propaganda.
  • Erosion of Trust: Hallucinations make it harder to rely on AI output, limiting its usefulness.

Techniques to Reduce Hallucinations

  • Better Datasets: Training on curated, fact-checked datasets helps.
  • Improved Training Methods: Researchers are developing methods to make AI models more reliable.
  • Fact-checking Tools: AI-powered fact-checking can be used to identify inaccuracies in generated content.
  • Prompt Engineering: Carefully designing input prompts for AI models can guide them towards more accurate output.
  • Ensemble Techniques: Combining the output of multiple models helps reduce individual model errors.

Key Takeaway: Hallucinations are a significant challenge in generative AI. Researchers are working on solutions to make these models more reliable and trustworthy.

Welcome to Hallucinations of Text and Image Generating LLMs. After watching this video, you’ll be able to
explain the phenomenon of hallucination in text and image generation by LLMs and its negative implications
for individuals and society. You’ll also learn about
the various techniques employed to reduce the
risk of hallucination. In text and image generation, hallucinations occur when
generative AI models produce content
that lacks a basis in reality. Sometimes, large
language models or LLMs generate text that is factually inaccurate or lacks coherence. For example, a
language model might generate a news
article that contains fabricated information or a poem that uses words in a
meaningless manner. It might even create images containing objects or
scenes that do not exist. For example, an image
generation model might generate a picture of a cat with wings or a landscape with
floating mountains. Now, let’s understand why text or image generating
LLMs hallucinate. LLMs are trained on massive datasets of
texts and images, which are likely to contain
errors and inconsistencies. Since LLMs are trained to
maximize the likelihood, there’s a greater chance
that they generate text or images similar to the
data they were trained on, even if that data is
inaccurate or unrealistic. Finally, LLMs are complex systems with
various parameters, and it’s difficult to
ensure that they will always generate accurate
and realistic content. Even small changes in these parameters can
lead to hallucinations because of overfitting or uncertainty in the
correctness of the content. Let’s understand this in detail. Overfitting occurs
when a model becomes excessively proficient at learning from the training data, but struggles to apply this
knowledge to new data, potentially causing
the model to generate novel content that was not a part of the training dataset. Generative AI models are
frequently trained with the objective of optimizing
the content they generate. This can result in the model
producing content that may be statistically plausible
but factually incorrect. Hallucinations in text and image generation can also
have negative implications. For example, hallucinations
can spread misinformation, which can be used to create harmful and offensive content. This can make it
difficult to trust the output of
generative AI models, limiting their usefulness
in real-world applications. Hallucinations can
harm the image of an individual or cause
chaos in society, as they can lead to the creation of fake news articles or images that can be used to bully or harass
someone or spread propaganda. There’s no single solution
to the problem of hallucination in text
and image generation. However, by combining
different approaches, it is possible to significantly
reduce the risk of hallucination and
improve the reliability of generative AI models. Let’s discuss these
techniques individually. Training LLMs on
curated datasets. The Allen Institute
for AI has developed a carefully curated
dataset known as common sense open-ended
plausible answers, or COPA, which serves the purpose of training
LLMs to engage in common reasoning and preventing the generation of factually
inaccurate statements. Developing new training methods. Researchers at Google AI have pioneered a novel
training approach named contrastive learning that
can be employed to instruct LLMs in generating output that is more accurate
and coherent. Using post-processing
techniques, the Facebook AI Research
or FAIR team has introduced a post
processing method known as factcheck.net, which serves the
purpose of detecting inaccuracies and
generated texts. Crafting prompts through
prompt engineering. Prompt engineering is
the process of crafting effective prompts that are used as inputs for
generative AI models. Precision in constructing
these prompts can minimize the likelihood of the model generating hallucinating output, regulating the
diversity of output. Temperature sampling is a
method employed to regulate the diversity of output produced
by generative AI models. When the temperature is raised, the model is more likely to produce a wider range of output, but this also increases the risk of generating
hallucinatory information. Conversely, lowering
the temperature reduces the likelihood
of hallucination, but limits the
diversity of output.
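To make the temperature idea concrete, here is a minimal, hedged sketch assuming NumPy; real LLM APIs usually expose temperature as a single parameter on the generation call. Lower temperatures concentrate probability on the most likely token, while higher temperatures spread it out:

import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    # Scale raw logits by the temperature, convert to probabilities, and sample one token index.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.1]
print(sample_with_temperature(logits, temperature=0.2))  # almost always index 0
print(sample_with_temperature(logits, temperature=2.0))  # more diverse, riskier choices
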
Combining the output of generative AI models. Ensemble generation is a
technique where output from multiple generative AI models is combined to produce
the final result. This can help reduce the
risks of hallucination, as the final output
is less likely to contain hallucinations
from all the models. In this video, you
explored the phenomenon of hallucinations in text and
image generation by LLMs. Hallucinations occur when
generative AI models produce content
that lacks a basis, which can lead to
the generation of factually incorrect or
incoherent text and images. These hallucinations have
significant implications, including the spread
of misinformation and potential harm to
individuals and society. To mitigate these
negative implications, various techniques are employed, including training
on curated datasets, novel training methods, post-processing techniques,
prompt engineering, and regulating the
diversity of output. The goal is to improve
the reliability of generative AI models and reduce the risk
of hallucination, enhancing their usefulness
in real-world applications.

Video: Hallucinations of Code-Generating LLMs

What are “Hallucinations” in Code Generation?

  • Hallucinations are instances where a code-generating LLM (Large Language Model) produces code that is wrong, nonsensical, or doesn’t match the instructions.

Why Hallucinations Happen:

  1. Ambiguous Instructions: Natural language can be vague, confusing the LLM.
  2. Long-term Context: LLMs struggle to track all the information needed for complex, lengthy code generation tasks.
  3. Complex Language Rules: LLMs can misinterpret the finer details of programming language syntax.

Impact on Developers

  • Incorrect and Unsafe Code: Hallucinations can lead to bugs and security holes in software.
  • Legal and Ethical Concerns: Who’s responsible if hallucinated code causes harm (e.g., in a medical device)?
  • Bias Amplification: Hallucinations can reflect biases in the LLM’s training data, leading to unfair software.

How to Minimize Hallucinations

  • Clear Documentation: Be transparent about model limitations.
  • Careful Prompting: Guide the LLM with precise instructions.
  • Reduce Training Bias: Ensure the data the LLM learns from is fair and representative.
  • Robust Error Handling: LLMs should be able to detect and flag potential errors.
  • Collaboration: LLM developers and software engineers need to work together to address the problem.

Welcome to Hallucinations
of Code Generating LLMs. After watching this video, you’ll be
able to describe hallucinations of code generating LLMs and
explain why they occur. You’ll also be able to discuss their
implications for developers and users and what measures can be taken
to prevent their occurrence. Code-generating large language models, or LLMs, have become increasingly popular with developers because of several
benefits, such as speed, automation, debugging, and so on. However, one must also consider the
potential challenges and risks associated with them. In this video, you’ll specifically
explore one such critical risk, the phenomenon of hallucinations
of code generating LLMs. Hallucinations in code generating LLMs
refer to the generation of code that is incorrect, nonsensical, or
irrelevant to the given prompt. Why do they occur? Let’s try to understand. First, because of ambiguity in
natural language, instructions for creating code are sometimes vague. LLMs become confused
by ambiguous language, resulting in incorrect code production. Here’s an example of generating
simple Python code through ChatGPT, a tool based on GPT. When you enter a text prompt, generate a Python function to check if a number is even, ChatGPT generates the Python code for it. When you test this code
in the Python editor, you can see here that the code
is executed correctly. Let’s see what happens if you
enter the number 12.2. Here, the LLM should ideally generate an exception. However, it classifies 12.2 as odd. This function is incorrect because the LLM has hallucinated that only an integer would be provided as input. To generate the correct code, you must provide a clear prompt, specify the programming language, and provide other relevant requirements and constraints.
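The code itself is not reproduced in the text. As a hedged sketch of what the hallucinated function and a corrected version might look like (the names is_even and is_even_checked are assumed for illustration):

def is_even(number):
    # Hallucinated assumption: the input is always an integer.
    return number % 2 == 0

print(is_even(12))    # True
print(is_even(12.2))  # False -- 12.2 is silently classified as odd

def is_even_checked(number):
    # Corrected version: reject non-integer input instead of guessing.
    if not isinstance(number, int):
        raise TypeError("is_even_checked expects an integer")
    return number % 2 == 0
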
Second, LLMs can deal with short-term context but could have trouble handling
long-term dependency. As a result, the model risks losing track of important information during prolonged code generation activities, leading to hallucinations. For instance, suppose the task is to write a Python program to calculate the sum of numbers in a list, and the input prompt is, write a program to calculate the sum of numbers in a list. The generated output will look like this.
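The on-screen output is not reproduced in the text; as a plausible stand-in (an illustration only, not the actual generated code), it might resemble:

def sum_of_numbers(numbers):
    # Plausible generated solution: accumulate the values one by one.
    total = 0
    for value in numbers:
        total += value
    return total

print(sum_of_numbers([1, 2, 3, 4]))  # 10
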
This code is correct for the given task. However, if the function is called with a very long list of numbers, the LLM might be unable
to track the total sum. This could result in the function
returning an incorrect result, leading to code hallucinations. Third, programming languages have complex
syntax and semantics, and LLMs may misread certain language constructs,
resulting in hallucinatory code. For example, suppose the task is to generate a function that sorts a list of numbers in ascending order. The generated output will look like this.
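The on-screen output is again not reproduced; a hedged sketch of hallucinated code consistent with the description that follows (a single pass of adjacent comparisons instead of a full sort; the function name is assumed):

def sort_ascending(numbers):
    # Hallucinated "sort": one pass of adjacent comparisons, with no repeated passes.
    for i in range(len(numbers) - 1):
        if numbers[i] > numbers[i + 1]:
            numbers[i], numbers[i + 1] = numbers[i + 1], numbers[i]
    return numbers

print(sort_ascending([3, 1, 2]))        # [1, 2, 3] -- happens to look correct
print(sort_ascending([5, 4, 3, 2, 1]))  # [4, 3, 2, 1, 5] -- not actually sorted
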
This function looks correct at first but does not work correctly. This is because the LLM does not
understand the subtle nuances of the sorting algorithm. The function does not compare each
number to every other number in the list. Instead, it only compares each
number to those that came after it. This means that the function will not
work correctly if the list is sorted in descending order. What is the impact on developers when code
generating LLMs start to hallucinate? Let’s look at the implications. To begin with, hallucinations can result
in the generation of incorrect and unsafe code. This poses a risk as it may introduce
bugs or vulnerabilities in the software. For example, suppose the task is to write a Python program to open and write to a file safely. The hallucinatory, incorrect, and unsafe output is shown in this example: the generated code suggests that opening and writing to a file is a secure operation, which is a hallucination. This code may lead to incorrect and unsafe file operations, as it doesn’t follow best practices for handling errors or issues, resulting in data loss or security vulnerabilities.
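The example output is described but not shown. A hedged sketch of what the unsafe, hallucinated code and a safer alternative might look like (the file name report.txt is a placeholder):

# Hallucinated output: assumes the write always succeeds and never cleans up on error.
f = open("report.txt", "w")
f.write("important data")
f.close()

# Safer version: the context manager closes the file even on error, and failures are reported.
try:
    with open("report.txt", "w", encoding="utf-8") as f:
        f.write("important data")
except OSError as err:
    print(f"Could not write file: {err}")
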
Further, if hallucinated code leads to negative outcomes, legal and ethical questions could arise about
the responsibility of developers and AI model creators. Imagine a scenario where an AI model
hallucinates code for a medical device. The outcome is that the device
misinterprets patient data, leading to incorrect treatments. The possible legal and
ethical questions are, who is liable for the medical errors caused
by the hallucinated code? Is it the developers who
deployed the AI model or the creators of the AI model itself? Another possible consequence of
hallucinations influenced by biased training data is that the code
generated could perpetuate existing biases leading to discriminatory
practices in software development. Imagine this, an AI model hallucinates
code for a resume screening tool. The outcome: the tool disproportionately rejects resumes from female candidates. The consequence: the generated code
perpetuates gender bias in the training data, leading to discriminatory practices
in software development by unfairly disadvantaging female job applicants. How can we address these risks? Let’s see. Developers can provide users with clear
documentation about the limitations of their models,
including hallucination risks, to enable them to make informed decisions. Efforts should be made to use prompting
techniques that help the LLMs generate code relevant to the given task. Further, minimizing biases
in training data and ensuring the generated code adheres
to ethical principles is essential. LLMs should have robust error
handling mechanisms to detect and flag potential hallucinations
in the output. Finally, a close collaboration between
LLM developers and software developers can help identify and rectify
hallucinations in real world applications. In this video, you learned that code generating LLMs
can sometimes produce hallucinations, which are factually incorrect responses
divergent from the intended output. You learned that the reasons for
these hallucinations could be natural language ambiguity, long-term dependency,
and complex semantics. Finally, you learned how to minimize the
occurrence of these hallucinations through documentation, unbiased training data,
strong error handling systems, and a close partnership between LLM developers and software developers.

Video: AI Portraits and Deepfakes

AI Portraits:

  • Creation: AI models analyze vast datasets of images to learn artistic techniques and facial features, enabling them to generate portraits from scratch, enhance existing images, or apply artistic styles.
  • Techniques: Common methods include GANs (which pit a generator against a discriminator to refine results) and Style Transfer (applying the style of one image to another).
  • Benefits: AI portraits can preserve cultural heritage, reimagine historical figures, and offer new artistic avenues.

Deepfakes:

  • Definition: AI-manipulated videos, images, or audio designed to appear real.
  • Dangers: Deepfakes can be used for malicious purposes like spreading misinformation, harassment, fraud, and identity theft.

Ethical Concerns:

  • Misuse and Deception: The realistic nature of AI-generated content raises concerns about potential misuse for malicious intent.
  • Creative Control: Artists may face limitations in fully controlling the creative output of AI models.
  • Bias: AI models trained on biased datasets can perpetuate societal biases in generated portraits.

Welcome to AI portraits and deepfakes. After watching this video, you’ll be
able to describe AI portraits and explain the techniques of
creating AI portraits. You’ll also learn about deepfakes and
their potential misuses. Generative AI has led to remarkable
progress in various fields, including art. One such exciting advancement is how
generative AI models leverage deep learning algorithms to generate
realistic portraits of individuals. The model can create, enhance or
manipulate portraits in various ways, like creating entirely new portraits
of individuals who do not exist, artificially aging or
de-aging a person’s face in photographs, and applying the artistic style of
famous painters to create unique and artistic effects. AI portraits are generated by machine
learning algorithms using a diverse data set of artistically curated images. The algorithm analyzes these images
to learn and replicate human artistic techniques by identifying key elements
like facial features, colors, textures and brushstrokes. With this knowledge, the generative
AI model can produce unique and realistic portraits, allowing users to customize the mood,
style and composition of the image generated
to match their preferences. There are various techniques that can be
employed to create realistic AI portraits. Let’s learn about these
techniques individually. GANs, or Generative Adversarial Networks
consist of two components: a generator that creates fake images and
a discriminator that attempts to differentiate between real and
artificially generated images. Throughout its training, the discriminator enhances its
ability to detect counterfeit photos. While the generator refines its skill
in crafting authentic portraits. The generator and the discriminator
thus produce AI portraits that appear indistinguishable from traditional
hand-drawn or photographed portraits.
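As a rough sketch of the generator-versus-discriminator setup described above (assuming PyTorch, with flattened images and tiny networks purely for illustration, not a production portrait model):

import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28  # noise size and flattened image size

generator = nn.Sequential(            # maps random noise to a fake image
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(        # scores how "real" an image looks
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_images = torch.rand(32, image_dim)  # stand-in for a batch of real portraits
real_labels, fake_labels = torch.ones(32, 1), torch.zeros(32, 1)

for step in range(3):  # a few illustrative steps, not a full training run
    # Train the discriminator to separate real portraits from generated ones.
    fake_images = generator(torch.randn(32, latent_dim)).detach()
    d_loss = (loss_fn(discriminator(real_images), real_labels)
              + loss_fn(discriminator(fake_images), fake_labels))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator to produce images the discriminator accepts as real.
    g_loss = loss_fn(discriminator(generator(torch.randn(32, latent_dim))), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
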
Style transfer is another frequently used technique for crafting AI portraits. The method involves transferring the
artistic style of one image to another, resulting in a unique and
visually captivating portrait. Artists can employ this technique
to experiment with various styles, from impressionism to cubism, and apply them to produce portraits
with a fusion of classic art and modern technology. Data-driven strategies are also
helpful in creating AI portraits. Through the examination of extensive data
sets of human faces, generative models acquire knowledge about a wide range of
facial features, expressions, and styles. These data sets play a vital role in
instructing the model to create portraits that capture the diverse spectrum
of human appearances and emotions, surpassing the constraints of
traditional artistic methods. AI portraits offer exciting possibilities
like conserving our cultural heritage by commemorating individuals and
their narratives, and reconnecting with our history to pay
tribute to the numerous aspects of human heritage. But it also raises ethical concerns,
including misuse and deception. Generative AI can be misused
to create deepfakes, which can have serious consequences. Deepfakes are videos, images or audio
recordings that have been manipulated using Generative AI models to make
it appear as if they are real. Deepfakes can be very convincing,
and they can be used for a variety of bad purposes like
creating fake news and propaganda, spreading disinformation, harassing or
blackmailing people and committing fraud. Deepfakes can be used to create forged
identity documents or photos for fraudulent purposes, such as opening
bank accounts, obtaining loans, or gaining access to secure locations. This can lead to financial fraud and
other security breaches. Cybercriminals can use deepfake videos to
impersonate trusted individuals, such as CEOs of organizations, and create convincing messages to
trick employees into revealing sensitive information, transferring funds,
or downloading malicious files. Another challenge with AI-assisted
portraits is the lack of creative control. As the results produced by
the model are based on a data set, it is difficult to create something that
significantly deviates from the pattern established by machine
learning algorithms. Therefore, in some cases, the output does not align with
the creative vision of the artists. Moreover, while generative AI models
can produce visually stunning images, capturing the true essence of the subject
is still an ongoing endeavor. As the models are trained via datasets, ensuring the AI-generated portraits
represent diverse individuals and avoid perpetuating biases in gender,
race and other factors is a crucial
area of improvement. In this video, you learned that AI portraits
are created using generative AI models, which are trained on vast
datasets of curated images. These models can generate realistic
portraits of individuals from scratch or by enhancing or
manipulating existing photographs. AI portraits have the potential
to preserve cultural heritage and create new artistic possibilities. However, deepfakes will raise ethical
concerns, such as potential misuse and deception. It’s essential to develop
ethical guidelines and new technologies to detect deepfakes.

Video: Enhancing LLM Accuracy with RAG

Summary:

Marina Danilevsky, a Senior Research Scientist at IBM Research, introduces Retrieval-Augmented Generation (RAG), a framework designed to enhance the accuracy and relevance of large language models (LLMs).

Key Points:

  1. Limitations of LLMs:
  • LLMs generate text based on training data, which can be outdated or inaccurate.
  • They often provide confident but incorrect answers, lacking up-to-date sources.
  2. Personal Anecdote:
  • Danilevsky shares how she inaccurately answered a question about which planet has the most moons, using outdated information.
  • This highlights common issues with LLMs: lack of sourcing and outdated knowledge.
  3. RAG Framework:
  • Combines retrieval and generation to improve responses.
  • Steps:
    1. User Prompt: The user asks a question to the LLM.
    2. Retrieval: The LLM queries a content store (internet, documents, policies) for relevant information.
    3. Generation: Combines the retrieved content with the user’s question to generate a response.
  4. Benefits of RAG:
  • Up-to-Date Information: Keeps responses current by updating the data store.
  • Source-Based Answers: Reduces hallucinations by sourcing evidence from primary data.
  • Increased Accuracy: Helps LLMs recognize when they don’t have sufficient information, reducing misinformation.
  5. Challenges:
  • Ensuring high-quality retrieval remains critical for accurate answers.
  • A weak retriever might prevent answerable questions from being correctly addressed.
  6. Future Work:
  • Improving both the retriever and generative components to provide accurate and comprehensive responses.

Danilevsky concludes by emphasizing the importance of RAG in making LLMs more reliable and accurate.

Large language models,
they are everywhere. They get some things amazingly right and
other things very interestingly wrong. My name is Marina Danilevsky, I am a Senior Research Scientist here at
IBM Research, and I want to tell you about a framework to help large language models
be more accurate and more up to date. Retrieval-augmented generation, or RAG. Let’s just talk about
the generation part for a minute. So forget the retrieval augmented. So the generation, this refers to
large language models, or LLMs, that generate text in response to
a user query referred to as a prompt. These models can have some
undesirable behavior. I want to tell you an anecdote
to illustrate this. So, my kids,
they recently asked me this question, in our solar system,
what planet has the most moons? And my response was, that’s really great
that you’re asking me this question. I loved space when I was your age. Of course, that was like 30 years ago. But I know this, I read an article, and the article said that it was Jupiter and
88 moons. So that’s the answer. Now, actually, there’s a couple
of things wrong with my answer. First of all, I have no source
to support what I’m saying. So even though I confidently said I
read an article, I know the answer. I’m not sourcing it. I’m giving the answer
off the top of my head. And also, I actually haven’t
kept up with this for a while. And my answer is out of date. So we have two problems here. One is no source, and the second
problem is that I am out of date. And these, in fact, are two behaviors
that are often observed as problematic when interacting with large language
models, they are LLM challenges. Now, what would have happened if
I’d taken a beat and first gone and looked up the answer on
a reputable source like NASA? Well, then I would have been able to say,
okay, so the answer is Saturn with 146 moons. And in fact, this keeps changing because
scientists keep on discovering more and more moons. So I have now grounded my answer
into something more believable. I have not hallucinated or
made up an answer. And by the way, I didn’t leak personal
information about how long ago it’s been since I was obsessed with space. All right, so what does this have
to do with large language models? Well, how would a large language
model have answered this question? So, let’s say that I have a user
asking this question about moons. A large language model would
confidently say, okay, I have been trained, and
from what I know in my parameters during my training, the answer is Jupiter. The answer is wrong, but we don’t know. The large language model was very
confident in what it answered. Now, what happens when you add this
retrieval-augmented part here? What does that mean? That means that now, instead of
just relying on what the LLM knows, we are adding a content store. This could be open like the Internet. This could be closed like
some collection of documents, collection of policies, whatever. The point,
though, now, is that the LLM first goes and talks to the content store and says, hey, can you retrieve for me information that
is relevant to what the user’s query was? And now with this
retrieval-augmented answer, it’s not Jupiter anymore; we know that it is Saturn. What does this look like? Well, first, the user prompts the LLM with their question. They say, this is what my question was. And originally, if we’re just
talking to a generative model, the generative model says,
okay, I know the response. Here it is.
Here’s my response. But now, in the RAG framework,
the generative model actually has an instruction that says, no,
first go and retrieve relevant content. Combine that with the user’s question, and only then generate the answer, so
the prompt now has three parts. The instruction to pay attention to the
retrieved content together with the user’s question, now give a response. And in fact, now you can give evidence for
why your response was what it was.
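A minimal, hedged sketch of that three-part prompt assembly (retrieve and generate are placeholders for your own retriever and LLM call; the toy stand-ins below only make the sketch runnable end to end):

def answer_with_rag(question, retrieve, generate, top_k=3):
    # 1. Retrieval: ask the content store for passages relevant to the user's question.
    passages = retrieve(question, top_k=top_k)
    if not passages:
        return "I don't know."  # prefer abstaining over making something up
    # 2. Augmentation: combine an instruction, the retrieved content, and the question.
    prompt = (
        "Answer the question using only the retrieved content below. "
        "If the content is insufficient, say 'I don't know.'\n\n"
        "Retrieved content:\n" + "\n".join(f"- {p}" for p in passages)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    # 3. Generation: the LLM answers, grounded in the retrieved evidence.
    return generate(prompt)

# Toy stand-ins so the sketch runs end to end.
docs = ["NASA (2023): Saturn has 146 confirmed moons, the most in the solar system."]
retrieve = lambda q, top_k=3: [d for d in docs if "moons" in q.lower()]
generate = lambda prompt: "Saturn, with 146 moons (per the retrieved NASA passage)."
print(answer_with_rag("Which planet has the most moons?", retrieve, generate))
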
So now hopefully you can see how RAG helps with the two LLM challenges that I mentioned before. So, first of all,
I’ll start with the out of date part. Now, instead of having to retrain your
model, if new information comes up, like, hey, we found some more moons. Now it’s a Jupiter again,
maybe it’ll be Saturn again in the future. All you have to do is you augment
your data store with new information, updated information. So now the next time that a user comes and
asks the question, we’re ready, we just go ahead and
retrieve the most up-to-date information. The second problem: source. Well, the large language model
is now being instructed to pay attention to primary source data
before giving its response, and in fact, now being able to give evidence. This makes it less likely to hallucinate
or to leak data because it is less likely to rely only on information that
it learned during training. It also allows us to get the model to have
a behavior that can be very positive, which is knowing when to say,
I don’t know. If the user’s question cannot be reliably
answered based on your data store, the model should say, I don’t know. Instead of making up something that is
believable and may mislead the user. This can have a negative effect as well
though, because if the retriever is not sufficiently good to give
the large language model the best, most highest quality
grounding information, then maybe the user’s query that is
answerable doesn’t get an answer. So this is actually why lots of folks,
including many of us here at IBM, are working on the problem from both sides. We are both working to improve
the retriever to give the large language model, the best quality data on which
to ground its response, and also the generative part so that the LLM
can give the richest, best response finally to the user when
it generates the answer. Thank you for learning more about RAG Thank you. [MUSIC]

Video: Legal Issues and Implications of Generative AI

The video discusses the legal challenges and implications of generative AI, particularly focusing on issues like identity fraud, misinformation, copyright infringement, data privacy violations, cyber warfare, and discriminatory practices. Key points include:

  1. Legal Concerns:
  • Generative AI can infringe on data privacy and copyright.
  • Algorithmic bias may violate affirmative action laws.
  • Ownership of AI-generated content is ambiguous.
  • Deep fakes can manipulate identities and spread false information, contributing significantly to identity fraud.
  • AI can be used to create cyber weapons.
  2. Regulatory Responses:
  • European Union: The AI Act (pending) and GDPR offer some protections.
  • Canada: The Artificial Intelligence and Data Act and a voluntary code of conduct aim to regulate AI use.
  • United States: Copyright law can address AI-generated content if it closely resembles existing works, but human authorship is required for copyright claims. Deep fakes are not illegal but can be challenged under cyberstalking laws.
  • United Kingdom: The Online Harms Bill seeks to criminalize sharing deep fake pornography. Existing laws on defamation, data protection, privacy, and harassment are used against AI-related crimes.
  • India: The Ministry of Electronics and Information Technology oversees AI regulation. Existing laws under the IT Act and the Digital Personal Data Protection Act can be used against deep fakes and unauthorized data distribution.
  3. Challenges in Regulation:
  • Rapid advancements in AI make it hard to regulate effectively.
  • Balancing innovation and regulation is crucial to avoid stifling technological benefits like professional upskilling, service efficiency, and creativity.
  4. Potential Solutions:
  • AI can potentially solve its problems, such as using AI to detect deep fakes or blockchain to verify content authenticity.
  • Collaboration between AI firms, social media companies, tech giants, government bodies, and citizen groups is essential for creating effective AI legislation.

The video concludes by emphasizing the need for balanced and progressive AI regulations that protect against misuse while fostering innovation.

Welcome to the legal issues and implications of
generative AI video. After watching this video, you’ll be able to identify the
legalities associated with generative AI, list laws that can help fight against AI-powered fraud, and discuss the direction AI regulation can take. Generative AI capabilities,
even on their best day, are raising some serious
legal questions. For instance, are foundation models training on content that violates data privacy and copyright laws? Is algorithmic bias going against affirmative
action legislation? Who owns the copyright
to AI-generated text, videos, music, and images? More disturbingly,
the emergence of deep fakes has shown us
that AI can be used to manipulate a person’s
voice and appearance to generate false information
and commit identity fraud. Did you know that
research has shown that AI-powered fraud is the top cause of all
identity fraud committed? There’s always the
added danger that generative AI can be used
to develop cyber weapons. According to Forbes, if ransomware has been detected by
a cybersecurity tool, ChatGPT can generate a different algorithm
to avoid detection. Therefore, a lot of what
foundation models can accomplish must be legally regulated to guard
against identity fraud, misinformation,
copyright infringement, data privacy violations, cyber warfare, and
discriminatory practices. This brings us to a
critical question. What are governments
worldwide doing to regulate the use of foundation
models and generative AI? The first AI legislation
in the world is the European Union’s AI Act, which has yet to be
enacted into law. However, the EU’s general
data protection regulation gives individuals more
control over their data, which can be used to challenge any unauthorized legal
use of private data. Canada’s Artificial
Intelligence and Data Act helps regulate
companies using AI. Canada’s voluntary
code of conduct on the responsible
development and management of advanced
generative AI systems identifies measures
companies can apply. What does the law
have to say when an AI generated output resembles an existing
piece of work? US case law states that
copyright owners may be able to prove that such outputs infringe their copyrights. If the AI program, both one had access to their
works and two, created substantially
similar outputs. However, when a US citizen
tried to copyright a visual artwork
that was authored autonomously by an AI program, the Copyright office
denied his application, saying that human authorship is an essential part of a
valid copyright claim. Deep fakes are unethical,
but not illegal. One can use the provisions
in the cyberstalking law, but it may not be easy to trace the origins of the deepfake. Plus, the victim has to prove that the creator of the deepfake intended to cause harm. That’s why the Pentagon
is working with research institutions under its Defense Advanced
Research Projects Agency, or DARPA, to develop
tech that can detect deep fake videos. In
England and Wales, there’s no legislation
that governs AI directly. However, the Online
Harms Bill seeks to make sharing deepfake pornography
a criminal offense. In the UK, citizens are encouraged to use existing
laws for defamation, data protection, privacy and harassment to counter
AI-induced crime. In India, the Ministry
of Electronics and Information
Technology regulates AI. While there is no legislation
against deep fakes, a victim of a
malicious deep fake can file a complaint
using the provisions in existing laws under
Section 66 of the IT Act. Capturing, publishing,
or transmitting a person’s image in mass media without their consent
is punishable. The Digital Personal Data
Protection Act also protects the unauthorized distribution of sensitive personal data that
can identify an individual. If you’re wondering
why governments are not doing more to regulate AI, then there are two
possible answers. First, the generative
AI space is so dynamic that it’s
difficult to qualify and quantify with
certainty and speed how the technology is being
used to commit crime. And then establish who is
accountable for the crime. Second, societies must
find a balance between generative AI innovation and
generative AI regulation. Too much regulation can increase business costs and stop citizens
from enjoying benefits, such as professional upskilling, service efficiency,
customized products, and ease of creation. Experts foresee a
situation where generative AI might
solve the problems it creates. Two examples to consider. First, the Pentagon’s
Defense Advanced Research Projects Agency trains machines on deepfakes
so they can detect and differentiate between
real and fake ones. Second, blockchain technology, popularly used in the
finance, banking, and healthcare industries,
can be integrated with AI to identify genuine
versus fake content. Organizations like AI firms,
social media companies, and technology giants should dialogue closely with
government bodies and citizen groups to
create thoughtful and progressive AI legislation
for their country. In this video, you explored the legal issues
and implications of generative AI. Foundation models can be used to commit
identity fraud, spread misinformation,
infringe copyright, violate data privacy,
launch cyber warfare, and worsen discriminatory
practices. Very few laws directly govern
the use of generative AI. As this is a dynamic field, governments and
industry must balance innovation with regulation
of generative AI.

Reading: Lesson Summary: Limitations, Concerns, and Issues of Generative AI

Reading

Practice Quiz – Limitations, Concerns, and Issues of Generative AI

What is the fundamental limitation of generative AI models?
1 point
Constraints in training data
Limited understanding of context
Lack of computational resources
Inability to recognize humor or sarcasm

What is a common pattern of biases that may be visible in generative AI models?

What is a potential consequence of hallucinations in code generating LLMs influenced by biased training data?

Graded Quiz – Limitations and Ethical Issues of Generative AI

How does overfitting contribute to hallucination in text and image-generating LLMs?
1 point
Overfitting leads to a limited understanding of training data.
Overfitting prevents the generation of novel content.
Overfitting increases the complexity of LLMs.
Overfitting ensures inaccurate generation by LLMs.

What is the potential legal and ethical question that might arise if a gen AI model hallucinates code for a medical device, leading to incorrect treatments?
1 point
Questions about the syntax and semantics of programming language used in medical devices
Questions about the speed and automation benefits of LLMs
Questions about the effectiveness of error-handling mechanisms in LLMs
Questions about the responsibility of developers and creators of the AI model

Why does the lack of creative control become a challenge in AI-assisted portraits?
1 point
Difficulty in creating diverse datasets for gen AI models
Insufficient power of computing displayed by generative AI models
Results are based on the pattern established by machine learning algorithms
Lack of skilled artists using the gen AI technology

Which among the following raises concerns about accountability and reliability and is identified as a significant limitation of generative AI models from the users’ perspective?
1 point
High cost, computational resources, and training time
Difficulty in obtaining sufficient data for training
Lack of transparency and predictability
Inability to generate content based on current information and events

Ethical Issues about Generative AI

As we delve into the world of generative AI, we’ve explored the various ethical issues and concerns associated with this cutting-edge technology. 
Considering the ethical landscape, what are your thoughts on issues such as data privacy and the potential for unauthorized use or access to generated content? Are you concerned about your or your organization’s data privacy when using generative AI tools? 
We encourage you to engage in a constructive discussion with your peers on these topics.

