This module explores what large language models (LLMs) are, the use cases where they can be applied, and how you can use prompt tuning to enhance LLM performance. It also covers Google tools to help you develop your own Gen AI apps.
Learning Objectives
- Define Large Language Models (LLMs)
- Describe LLM Use Cases
- Explain Prompt Tuning
- Describe Google’s Gen AI Development tools
Video: Introduction to Large Language Models
The video introduces large language models (LLMs), a subset of deep learning. LLMs are pre-trained models that can be fine-tuned for specific purposes. They are trained on large datasets and can be used for various tasks such as text classification, question answering, and text generation.
The benefits of using LLMs include:
- Single model can be used for different tasks
- Minimal field training data required
- Performance continuously growing with more data and parameters
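The "single model, many tasks" benefit shows up directly in how prompts are written: the task is named in the instruction rather than baked into the model. A minimal sketch (the template wording and task names are illustrative, not from the course):

```python
# One instruction-following model can serve many tasks: the task lives
# in the prompt, not in the model weights. These templates are
# illustrative examples, not an official API.

def build_prompt(task: str, text: str) -> str:
    templates = {
        "translate": "Translate the following text to French:\n{t}",
        "summarize": "Summarize the following text in one sentence:\n{t}",
        "classify": (
            "Classify the sentiment of this text as "
            "positive, negative, or neutral:\n{t}"
        ),
    }
    return templates[task].format(t=text)

# The same input text, three different tasks, one prompt each.
for task in ("translate", "summarize", "classify"):
    print(build_prompt(task, "The service was quick and friendly."))
    print("---")
```

In a real application, each prompt would be sent to the same hosted model; only the instruction changes.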
The video compares pre-training and fine-tuning to training a dog: the model is first trained for general purposes and then fine-tuned for specific tasks.
It then covers prompt design, the process of creating a prompt that is clear, concise, and informative, highlights its importance for LLMs, and explains the difference between prompt design and prompt engineering.
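The prompt design versus prompt engineering distinction can be made concrete with two prompts for the same translation task. The wording below is my own illustration, not an official example:

```python
# Prompt design: a clear, concise prompt tailored to the task.
design_prompt = (
    "Translate the following English text to French:\n"
    "Where is the library?"
)

# Prompt engineering: the same task, with examples of the desired
# output added (a few-shot prompt) to improve performance.
engineered_prompt = (
    "Translate English to French.\n"
    "English: Good morning. -> French: Bonjour.\n"
    "English: Thank you. -> French: Merci.\n"
    "English: Where is the library? -> French:"
)

print(design_prompt)
print(engineered_prompt)
```

Both prompts are clear and concise; the engineered one additionally shows the model the exact output format it should follow.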
It describes the three types of LLMs: generic language models, instruction-tuned models, and dialog-tuned models. It also explains chain-of-thought reasoning, the observation that models are better at getting the right answer when they first output text that explains the reason for the answer.
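Chain-of-thought prompting can be shown with the tennis-ball question that also appears in the course video; the exact prompt wording here is my own sketch:

```python
# Two prompt styles for the tennis-ball question from the video.
question = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
)

# Standard prompt: the worked example states only the final answer.
standard = question + "A: The answer is 11.\n"

# Chain-of-thought prompt: the worked example explains its reasoning
# first, which makes the model more likely to reason before answering.
cot = question + (
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

# Sanity-check the arithmetic in the worked example.
assert 5 + 2 * 3 == 11
```

The only difference between the two prompts is the reasoning text in the example answer; that is the whole technique.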
It also covers tuning, the process of adapting a model to a new domain or a set of custom use cases, and the different methods of tuning, including fine-tuning and parameter-efficient tuning methods.
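Parameter-efficient tuning can be pictured with a toy linear layer: the base weights stay frozen, and a small low-rank add-on is the only part that changes, in the spirit of LoRA-style adapters. This is my illustration, not a method the course names; pure Python, no libraries:

```python
# Toy parameter-efficient tuning: freeze the base weight matrix W and
# learn only a small low-rank add-on A @ B (LoRA-style illustration).

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matadd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Frozen base weights (4x4 identity here), never updated during tuning.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Small tunable adapter: rank-1 factors A (4x1) and B (1x4),
# only 8 numbers instead of the 16 base weights.
A = [[0.1], [0.0], [0.0], [0.0]]
B = [[0.0, 0.2, 0.0, 0.0]]

x = [[1.0, 2.0, 3.0, 4.0]]                       # one input row
base_out = matmul(x, W)                          # base model output
tuned_out = matmul(x, matadd(W, matmul(A, B)))   # adapter swapped in

print(base_out)   # base behavior is unchanged
print(tuned_out)  # adapter shifts one output coordinate
```

Because W is never modified, different adapters (A, B pairs) can be swapped in and out at inference time over the same base model, which is the point the video makes about add-on layers.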
Finally, it surveys the Google Cloud services that help developers get more out of their LLMs, including Generative AI Studio, Vertex AI, and the PaLM API.
Reading: Introduction to Large Language Models
Quiz: Introduction to Large Language Models
What are large language models (LLMs)?
- An LLM is a type of artificial intelligence (AI) that can generate human-quality text. LLMs are trained on massive datasets of text and code, and they can be used for many tasks, such as writing, translating, and coding.
- An LLM is an advanced natural language processing framework that uses linguistic algorithms to generate sophisticated conversational agents.
- An LLM is an artificial neural network architecture optimized for training large-scale reinforcement learning agents capable of mastering complex tasks in robotics.
- An LLM is a state-of-the-art computer vision system that excels in recognizing and analyzing intricate patterns and features in images and videos.
Correct! An LLM is a type of artificial intelligence (AI) that can generate human-quality text. LLMs are trained on massive datasets of text and code, and they can be used for a variety of tasks, such as writing, translating, and coding.
phi-3-mini-4k-instruct
Large Language Models (LLMs) are a class of sophisticated artificial intelligence systems designed to understand, generate, and manipulate human language at a level comparable to or exceeding that of humans in specific contexts. These models are built upon the foundations of deep learning, particularly utilizing a type of neural network architecture known as the Transformer, which was introduced in the paper “Attention Is All You Need” by Vaswani et al. in 2017.
LLMs are trained on extensive, diverse datasets comprising vast amounts of text from books, articles, websites, and other sources. This training enables them to learn the intricacies of language, including grammar, syntax, semantics, and even style and tone. The most well-known examples of LLMs include the GPT (Generative Pre-trained Transformer) series by OpenAI and BERT (Bidirectional Encoder Representations from Transformers) by Google.
The capabilities of LLMs span a wide range of applications, such as:
- Text Generation: LLMs can generate coherent and contextually relevant text, making them useful for applications like content creation, storytelling, and drafting emails.
- Translation: They can translate text between languages with a high degree of accuracy, aiding in breaking down language barriers.
- Question Answering: LLMs can understand and respond to questions posed in natural language, making them valuable for chatbots and virtual assistants.
- Summarization: They can condense long documents into concise summaries, which is helpful for quick information retrieval.
- Code Generation and Completion: LLMs can assist programmers by generating code snippets or completing partially written code, based on the provided context.
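As a concrete instance of the question-answering use case above, a generative-QA prompt simply packs a context passage and the question into one instruction, so no domain-specific model is needed. The template below is an illustrative sketch, not any product's API:

```python
# Generative QA: the model answers from a provided context passage.
# The template wording is an illustrative example.

def qa_prompt(context: str, question: str) -> str:
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

p = qa_prompt(
    "PaLM is a 540 billion parameter language model released in 2022.",
    "How many parameters does PaLM have?",
)
print(p)
```

Grounding the model in a supplied context like this also reduces the risk, noted below, of the model answering from incomplete or ambiguous input.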
It’s important to note that while LLMs are powerful, they are not without limitations. They can sometimes generate incorrect or nonsensical information, especially when the input lacks context or contains ambiguous language. Additionally, ethical considerations arise concerning the potential misuse of these models, such as generating misleading information or deepfakes.
LLMs are not related to reinforcement learning agents, computer vision systems, or other AI frameworks not focused on natural language processing. Instead, they represent a significant advancement in the field of natural language understanding and generation, with the potential to revolutionize how we interact with technology and process information.
gemma-1.1-7b-it
An LLM is a type of artificial intelligence (AI) that can generate human-quality text.
LLMs are trained on massive datasets of text and code, and they can be used for many tasks, such as:
- Writing
- Translating
- Coding
Gemini
The closest description of a large language model (LLM) is:
- An LLM is a type of artificial intelligence (AI) that can generate human-quality text. LLMs are trained on massive datasets of text and code, and they can be used for many tasks, such as writing, translating, and coding.
Here’s why this is the most accurate explanation:
- Large Amounts of Text Data: LLMs are indeed distinguished by their training on massive datasets of text and code. This exposure to vast amounts of language allows them to learn complex patterns and relationships within human language.
- Human-Quality Text Generation: A key capability of LLMs is their ability to generate human-quality text. This ranges from creative writing in different styles to code production.
- Versatility in Applications: LLMs are finding use in a variety of tasks beyond just text generation. They can be used for tasks like translation, writing different kinds of creative content, and even writing different kinds of computer code.
While LLMs are powerful tools, the other descriptions are not their primary focus:
- Conversational Agents: While LLMs can be used to create chatbots or conversational agents, this is not their exclusive function. They have a broader range of capabilities.
- Reinforcement Learning: This type of AI focuses on training agents through trial and error, and LLMs are not specifically designed for reinforcement learning.
- Computer Vision: This field deals with image and video analysis, and LLMs are primarily focused on processing and generating text.
What is a benefit of using large language models (LLMs)?
- They can generate inaccurate or misleading content if their training data is incomplete or biased.
- They can generate human-quality text for tasks such as content creation, writing assistance, and automatic summarization.
- They can be trained using only a tiny dataset of text and code.
- They can only provide output in the English language.
Correct! LLMs can generate human-quality text for tasks such as content creation, writing assistance, and automatic summarization, as well as being used for a wide variety of other tasks.
gpt-4-0613
The benefit of using large language models (LLMs) is that they can generate human-quality text for tasks such as content creation, writing assistance, and automatic summarization.
dbrx-next
They can generate human-quality text for tasks such as content creation, writing assistance, and automatic summarization.
Gemini
The benefit of large language models (LLMs) you listed is:
- They can generate human-quality text for tasks such as content creation, writing assistance, and automatic summarization.
Here’s why this is a key advantage of LLMs:
- Text Generation: LLMs are exceptional at producing human-quality text content in various styles and formats. This can be helpful for tasks like writing different kinds of creative content, blog posts, marketing copy, or even different kinds of computer code.
- Efficiency and Scalability: LLMs can automate content creation processes, saving time and resources. They can also scale to generate large amounts of text content.
While LLMs have limitations, the other options are not necessarily benefits:
- Inaccurate or Misleading Content: LLMs can generate inaccurate content if their training data is biased or incomplete. It’s crucial to be mindful of this and use LLMs responsibly.
- Large Datasets: LLMs are trained on massive amounts of text data. While some advancements are being made in training with smaller datasets, large datasets are still the norm.
- Multilingual Capabilities: While some LLMs are focused on English, many are multilingual and can handle different languages.
What are some of the applications of LLMs?
- LLMs can be used for many tasks, including writing, translating, and coding.
- LLMs can be used for many tasks, including personalized advice and therapy.
- LLMs can be used for many tasks, including original creative expression and ethical decision making.
- LLMs can be used for many tasks, such as making real-time decisions in emergency situations and generating content based on physical perceptions.
Correct! LLMs can be used for many tasks, including writing, translating, and coding.
Gemini
The most promising applications of LLMs are:
- Writing, translating, and coding: As you mentioned, LLMs excel at these tasks due to their ability to process and generate human-quality text. They can be used for content creation, writing different kinds of creative content, blog posts, marketing copy, or even different kinds of computer code. Additionally, they can translate languages and assist with coding tasks.
Here’s why the other options are not primary applications (yet):
- Personalized Advice and Therapy: While LLMs might be able to provide some informational support, therapy and advice require complex social and emotional understanding that is still under development for AI.
- Creative Expression and Ethical Decision Making: LLMs can be used for creative text generation, but true creative expression with deep understanding and originality is still a challenge. Similarly, ethical decision making necessitates real-world context and nuance that LLMs are not currently equipped for.
- Real-time Decision Making and Physical Perception: LLMs primarily deal with text data and lack the capabilities for real-time decision making or physical perception, which involve sensors and interaction with the real world.
What are some of the challenges of using LLMs? Select three options.
- They can be expensive to train.
- They can be biased.
- They can be used to generate harmful content.
Correct! Some of the challenges of using LLMs are that they can be expensive to train, they can be biased, and they can be used to generate harmful content.
Transcript: Introduction to Large Language Models

How's it going? I'm Megha. Today, I'm going to be talking about large language models. Don't know what those are? Me either. Just kidding, I actually know what I'm talking about. I'm a customer engineer here at Google Cloud, and today I'm going to teach you everything you need to know about LLMs. That's short for large language models. In this course, you're going to learn to define large language models, describe LLM use cases, explain prompt tuning, and describe Google's generative AI development tools. Let's get into it.

Large language models, or LLMs, are a subset of deep learning. To find out more about deep learning, check out our Introduction to Generative AI course video. LLMs and generative AI intersect, and they're both a part of deep learning. Another area of AI you may be hearing a lot about is generative AI, a type of artificial intelligence that can produce new content, including text, images, audio, and synthetic data.

All right, back to LLMs. So what are large language models? Large language models refer to large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes. What do pre-trained and fine-tuned mean? Great questions, let's dive in. Imagine training a dog. Often, you train your dog basic commands such as sit, come, down, and stay. These commands are normally sufficient for everyday life and help your dog become a good canine citizen. Good boy. But if you need special service dogs, such as a police dog, a guide dog, or a hunting dog, you add special trainings, right? A similar idea applies to large language models. These models are trained for general purposes to solve common language problems such as text classification, question answering, document summarization, and text generation across industries. The models can then be tailored to solve specific problems in different fields, such as retail, finance, and entertainment, using a relatively small field dataset.

So now that you've got that down, let's further break down the concept into three major features of large language models. We'll start with the word "large." Large indicates two meanings. First is the enormous size of the training dataset, sometimes at the petabyte scale. Second, it refers to the parameter count. Parameters are basically the memories and the knowledge the machine learned from the model training. (They are sometimes loosely called hyperparameters, though strictly speaking hyperparameters are configuration values set before training, while parameters are learned during it.) Parameters define the skill of a model in solving a problem such as predicting text. So that's why we use the word large.

What about general purpose? General purpose is when the models are sufficient to solve common problems. Two reasons led to this idea. First is the commonality of human language, regardless of the specific tasks. And second is the resource restriction: only certain organizations have the capability to train such large language models with huge datasets and a tremendous number of parameters. So how about letting them create fundamental language models for others to use? This leaves us with our last terms, pre-trained and fine-tuned, which mean to pre-train a large language model for a general purpose with a large dataset, and then fine-tune it for specific aims with a much smaller dataset.

So now that we've nailed down the definition of what large language models, LLMs, are, we can move on to describing LLM use cases. The benefits of using large language models are straightforward. First, a single model can be used for different tasks. This is a dream come true. These large language models, trained with petabytes of data and generating billions of parameters, are smart enough to solve different tasks, including language translation, sentence completion, text classification, question answering, and more. Second, large language models require minimal field training data when you tailor them to solve your specific problem. Large language models obtain decent performance even with little domain training data. In other words, they can be used for few-shot or even zero-shot scenarios. In machine learning, few-shot refers to training a model with minimal data, and zero-shot implies that a model can recognize things that have not explicitly been taught in the training before. Third, the performance of large language models is continuously growing when you add more data and parameters.

Let's take PaLM as an example. In April 2022, Google released PaLM, short for Pathways Language Model, a 540-billion-parameter model that achieves state-of-the-art performance across multiple language tasks. PaLM is a dense decoder-only transformer model. It leverages the new Pathways system, which enabled Google to efficiently train a single model across multiple TPU v4 Pods. Pathways is a new AI architecture that will handle many tasks at once, learn new tasks quickly, and reflect a better understanding of the world. The system enables PaLM to orchestrate distributed computation for accelerators. But I'm getting ahead of myself. I previously mentioned that PaLM is a transformer model. Let me explain what that means. A transformer model consists of an encoder and a decoder. The encoder encodes the input sequence and passes it to the decoder, which learns how to decode the representations for a relevant task.

We've come a long way from traditional programming to neural networks to generative models. In traditional programming, we used to have to hard-code the rules for distinguishing a cat: type, animal; legs, four; ears, two; fur, yes; likes yarn and catnip. In the wave of neural networks, we could give the network pictures of cats and dogs and ask, "Is this a cat?", and it would predict a cat. What's really cool is that in the generative wave, we as users can generate our own content, whether it be text, images, audio, video, or more. For example, models like PaLM, or Pathways Language Model, and LaMDA, Language Model for Dialog Applications, ingest very, very large data from multiple sources across the internet and build foundation language models we can use simply by asking a question, whether typing it into a prompt or verbally talking into the prompt itself. So when you ask it, "What's a cat?", it can give you everything it has learned about a cat.

Let's compare LLM development using pre-trained models with traditional ML development. First, with LLM development, you don't need to be an expert. You don't need training examples, and there is no need to train a model. All you need to do is think about prompt design, which is the process of creating a prompt that is clear, concise, and informative. It is an important part of natural language processing, or NLP for short. In traditional machine learning, you need expertise, training examples, compute time, and hardware. That's a lot more requirements than LLM development.

Let's take a look at an example of a text generation use case to really drive the point home. Question answering, or QA, is a subfield of natural language processing that deals with the task of automatically answering questions posed in natural language. QA systems are typically trained on a large amount of text and code, and they're able to answer a wide range of questions, including factual, definitional, and opinion-based questions. The key here is that you needed domain knowledge to develop these question-answering models. Let's make this clear with a real-world example. Domain knowledge is required to develop a question-answering model for customer IT support, healthcare, or supply chain. But using generative QA, the model generates free text directly based on the context; there's no need for domain knowledge.

Let me show you a few more examples of how cool this is. Let's look at three questions given to Gemini, a large language model chatbot developed by Google AI. Question one: "This year's sales are $100,000. Expenses are $60,000. How much is net profit?" Gemini first shares how net profit is calculated, then performs the calculation. Then Gemini provides the definition of net profit. Here's another question: "Inventory on hand is 6,000 units. A new order requires 8,000 units. How many units do I need to fill to complete the order?" Again, Gemini answers the question by performing the calculation. And our last example: "We have 1,000 sensors in 10 geographic regions. How many sensors do we have on average in each region?" Gemini answers the question with an example of how to solve the problem and some additional context.

So how is it that in each of our questions, a desired response was obtained? This is due to prompt design. Fancy. Prompt design and prompt engineering are two closely related concepts in natural language processing. Both involve the process of creating a prompt that is clear, concise, and informative, but there are some key differences between the two. Prompt design is the process of creating a prompt that is tailored to the specific task the system is being asked to perform. For example, if the system is being asked to translate a text from English to French, the prompt should be written in English and should specify that the translation should be in French. Prompt engineering is the process of creating a prompt that is designed to improve performance. This may involve using domain-specific knowledge, providing examples of desired output, or using keywords that are known to be effective for the specific system. In general, prompt design is a more general concept, while prompt engineering is a more specialized one. Prompt design is essential, while prompt engineering is only necessary for systems that require a high degree of accuracy or performance.

There are three kinds of large language models: generic language models, instruction-tuned models, and dialog-tuned models. Each needs prompting in a different way. Let's start with generic language models. Generic language models predict the next word based on the language in the training data. In the example "The cat sat on," the next word should be "the," and you can see that "the" is most likely the next word. Think of this model type as autocomplete in search. Next, we have instruction-tuned models. This type of model is trained to predict a response to the instructions given in the input. For example: "Summarize a text of X." "Generate a poem in the style of X." "Give me a list of keywords based on semantic similarity for X." "Classify text into neutral, negative, or positive." And finally, we have dialog-tuned models. This model is trained to have a dialog by predicting the next response. Dialog-tuned models are a special case of instruction-tuned models where requests are typically framed as questions to a chatbot. Dialog tuning is expected to be in the context of a longer back-and-forth conversation, and typically works better with natural, question-like phrasings.

Chain-of-thought reasoning is the observation that models are better at getting the right answer when they first output text that explains the reason for the answer. Let's look at the question: "Roger has five tennis balls. He buys two more cans of tennis balls. Each can has three tennis balls. How many tennis balls does he have now?" This question is posed initially with no response, and the model is less likely to get the correct answer directly. However, by the time the second question is asked, the output is more likely to end with the correct answer.

But there is a catch. There's always a catch. A model that can do everything has practical limitations, but task-specific tuning can make LLMs more reliable. Vertex AI provides task-specific foundation models. Let's get into how you can tune with some real-world examples. Let's say you have a use case where you need to gather how your customers are feeling about your product or service. You can use a sentiment analysis task model. If you need to perform occupancy analytics, there is a task-specific model for your use case. Tuning a model enables you to customize the model response based on examples of the task that you want the model to perform. It is essentially the process of adapting a model to a new domain, or to a set of custom use cases, by training the model on new data. For example, we may collect training data and tune the model specifically for the legal or medical domain.

You can also further tune the model by fine-tuning, where you bring your own dataset and retrain the model by tuning every weight in the LLM. This requires a big training job and hosting your own fine-tuned model. Here's an example of a medical foundation model trained on healthcare data. The tasks include question answering, image analysis, finding similar patients, et cetera. Fine-tuning is expensive and not realistic in many cases. So are there more efficient methods of tuning? Yes. Parameter-efficient tuning methods, or PETM, are methods for tuning a large language model on your own custom data without duplicating the model. The base model itself is not altered. Instead, a small number of add-on layers are tuned, which can be swapped in and out at inference time.

I'm going to tell you about three other ways Google Cloud can help you get more out of your LLMs. The first is Generative AI Studio. Generative AI Studio lets you quickly explore and customize generative AI models that you can leverage in your applications on Google Cloud. Generative AI Studio helps developers create and deploy generative AI models by providing a variety of tools and resources that make it easy to get started. For example, there is a library of pre-trained models, a tool for fine-tuning models, a tool for deploying models to production, and a community forum for developers to share ideas and collaborate.

Next, we have Vertex AI, which is particularly helpful for those of you who don't have much coding experience. You can build generative AI search and conversations for customers and employees with Vertex AI Search and Conversation, formerly Gen App Builder. Build with little or no coding and no prior machine learning experience. Vertex AI can help you create your own chatbots, digital assistants, custom search engines, knowledge bases, training applications, and more.

And lastly, we have the PaLM API. The PaLM API lets you test and experiment with Google's large language models and gen AI tools. To make prototyping quick and more accessible, developers can integrate the PaLM API with MakerSuite and use it to access the API through a graphical user interface. The suite includes a number of different tools, such as a model-training tool, a model-deployment tool, and a model-monitoring tool. And what do these tools do? I'm so glad you asked. The model-training tool helps developers train machine learning models on their data using different algorithms. The model-deployment tool helps developers deploy machine learning models to production with a number of different deployment options. The model-monitoring tool helps developers monitor the performance of their machine learning models in production, using a dashboard and a number of different metrics.

Gemini is a multimodal AI model. Unlike traditional language models, it's not limited to understanding text alone. It can analyze images, understand the nuances of audio, and even interpret programming code. This allows Gemini to perform complex tasks that were previously impossible for AI. Due to its advanced architecture, Gemini is incredibly adaptable and scalable, making it suitable for diverse applications. Model Garden is continuously updated to include new models.

See, I told you way back in the beginning of this video that I knew what I was talking about when it came to large language models, and now you do too. Thank you for watching our course, and make sure to check out our other videos if you want to learn more about how you can use AI.
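The three arithmetic questions posed to Gemini in the transcript can be double-checked with a few lines of Python:

```python
# Verify the three worked examples from the video.
net_profit = 100_000 - 60_000   # sales minus expenses
units_needed = 8_000 - 6_000    # order size minus inventory on hand
avg_sensors = 1_000 / 10        # total sensors divided by regions

print(net_profit, units_needed, avg_sensors)
```

This is also a reminder from the quiz section: LLM output for quantitative questions is worth verifying, since models can produce incorrect answers.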