Welcome to Week 3! This week, you will learn important concepts for AWS storage services—such as buckets and objects for Amazon Simple Storage Service (Amazon S3), and how Amazon Elastic Block Store (Amazon EBS) is used on AWS. You will also explore databases on AWS, and the use cases for each AWS storage service.
Learning Objectives
- Create a DynamoDB table
- Describe the function of Amazon DynamoDB on AWS
- Explore databases on AWS
- Create an Amazon S3 bucket
- Explain when to use each AWS storage service
- Explain important S3 concepts, such as S3 buckets and objects
- Describe the function of Amazon EBS on AWS
- Differentiate between file, block, and object storage
- Storage on AWS
- Video: Introduction to Week 3
- Video: Storage Types on AWS
- Reading: Reading 3.1: Storage Types on AWS
- Video: Amazon EC2 Instance Storage and Amazon Elastic Block Store
- Reading: Reading 3.2: Amazon EC2 Instance Storage and Amazon Elastic Block Store
- Video: Object Storage with Amazon S3
- Reading 3.3: Object Storage with Amazon S3
- Video: Choose the Right Storage Service
- Reading 3.4: Choose the Right Storage Service
- Video: Demo Creating an Amazon S3 Bucket
- Databases on AWS
- Video: Explore Databases on AWS
- Reading 3.5: Explore Databases on AWS
- Video: Amazon Relational Database Service
- Reading 3.6: Amazon Relational Database Service
- Video: Purpose Built Databases on AWS
- Discussion Prompt: Discussion – Consider this Scenario
- Video: Introduction to Amazon DynamoDB
- Reading: Reading 3.8: Introduction to Amazon DynamoDB
- Reading: Reading 3.9: Choose the Right AWS Database Service
- Week 3 Exercise & Assessment
Storage on AWS
Video: Introduction to Week 3
What’s next in your AWS learning journey?
You’ve already covered compute (EC2) and networking (VPC). Now, it’s time to focus on:
- Storage: Where to put your application’s data.
- Databases: How to manage that data in an organized and efficient way.
Specific Topics
- Storage on AWS: Explore the different options AWS offers and when to use each.
- Amazon S3: Create a bucket to store your employee directory’s images.
- Databases on AWS: Get introduced to various services, but focus on two:
- Amazon RDS (Relational Database Service): For traditional database needs
- Amazon DynamoDB: A NoSQL database for flexibility and scalability
Important Reminders
- Take the Quizzes! They’ll test your understanding of the concepts.
- Don’t Skip the Readings: You’ll find valuable information to support the lessons.
- Hello again. I’m so happy to see that you’ve made it to this next batch of lessons in our course. So far, you’ve learned about compute and networking on AWS. Time to move on to two other major categories, storage and databases. The employee directory application we are building out currently has a solid network in place using Amazon VPC and is being hosted on Amazon EC2. The only issue is the app doesn’t really work yet because we haven’t set up anywhere for it to store the employee information or the employee photos. We are going to go ahead and fix that.

We will start off by learning about the different storage offerings AWS has and compare and contrast them. Then we will create an Amazon S3 bucket in our account that the Employee Directory App will use to store employee images. Then we will explore the different database services AWS has, and there are a lot. Not to worry though, we are focusing on two: Amazon Relational Database Service or RDS and Amazon DynamoDB.

As always, take the quizzes and please take a good look at the readings that we have placed between the video lessons. There’s a lot of extra information there that could be useful to you as you continue your journey with AWS. Great work so far. Keep it up and we’ll see you in a bit.
Video: Storage Types on AWS
Types of Storage for Your App
- Operating System/App Files: Need fast access and frequent updates.
- Static Assets (Employee Photos): Accessed often, but rarely changed.
- Structured Employee Data: Will go in a database (covered later).
Block Storage
- Data split into fixed-sized chunks.
- Good for:
- Frequently updated files – allows modification of small portions.
- Databases and operating systems require fast, granular access.
Object Storage
- Treats each file as a single unit.
- Good for:
- Infrequent changes – you modify the whole object at once.
- Storing static content like images or videos.
Why This Matters
Choosing the right storage type (block vs. object) is crucial for how your application will perform and interact with its data.
Next Steps
The course will dive into specific AWS storage services:
- Look at the provided notes to refresh your understanding of storage types.
- This will help you match the AWS services to the appropriate use cases.
- The next thing we need to configure for our employee directory app is the storage. Our application requires several types of storage for its data. For one, we need to store the operating system, software, and system files of our app. We also need to store static assets, like photos for the employee headshots, and then we have more structured data, such as the name, title, and location of each employee, as well. All of that needs a home. The structured data usually requires a database, which we’ll talk about later this week, so for now we’ll focus on storing the application files as well as the static content.

There are two main types of storage that we can use to store this data, block and object. Here’s the difference. Let’s say that I have a one gigabyte file with text in it. If I’m storing this in block storage, what happens is that this file is split into fixed size chunks of data and then stored. Object storage, on the other hand, treats each file like a single unit of data. This might seem like a small difference, but it can change how you access and work with your data. Let’s say I want to change one character out of that one gigabyte file. If my file is stored in block storage, changing that one character is simple, mainly because we can change the block, or the piece of the file that the character is in, and leave the rest of the file alone. In object storage, if I want to change that one character, I instead have to update the entire file.

Let’s take these two types of storage and access patterns and try to apply them to the data we want to store. For example, our static data, like the employee photos, will most likely be accessed often, but modified rarely. Therefore, storing in object storage is fine. For more frequently updated data or data that has high transaction rates, like our application or system files, block storage will perform better.

In this section of the course, we’ll discuss both block and object AWS storage services and how they’ll interact with our employee directory application. Before we do that, take a look at the notes to get a refresher of the different types of storage. That way you can easily match the storage type to the AWS storage service that we talk about.
Reading: Reading 3.1: Storage Types on AWS
Video: Amazon EC2 Instance Storage and Amazon Elastic Block Store
Types of Block Storage for EC2
- Instance Store
- Built-in directly to the physical server the instance runs on.
- Pros: Very fast access speeds.
- Cons: Temporary – data is lost if the instance stops or terminates. Not ideal for data you need to keep long-term.
- Amazon Elastic Block Store (EBS)
- Network-attached volumes, configured separately from EC2 instances.
- Pros: Persistent storage – data remains even if your instance goes down. You can also detach and move EBS volumes between instances.
- Cons: Slightly slower than Instance Store due to network connection.
EBS Volume Types
- SSD-backed: Generally faster, better for frequent access workloads.
- HDD-backed: Slower, but more cost-effective for less frequently used data.
Important Note: Backups are Essential
- Even EBS, being persistent, needs backups to protect against data loss.
- Use EBS Snapshots: Incremental backups for easily restoring EBS volumes in case of issues.
- When you launch an EC2 instance, you’re going to need some kind of block storage to go with it. This block storage can be used as a boot volume for your operating system or a separate data volume. For example, think about your laptop. With a laptop you store your data in drives, and those drives are either built-in internally to your laptop or connected externally. EC2 instances have the same options as far as block storage goes. The internal storage is called Instance Store and the external connected storage is called Amazon Elastic Block Store or Amazon EBS.

Let’s talk about Instance Store first. Instance Store is a form of directly attached storage, which means the underlying physical server has at least one storage unit directly attached to it. This direct attachment is also the main advantage of using this form of storage. Because it’s so close to the physical server, it can be very fast and respond very quickly. But while it can be very fast, there is also one big downside. With Instance Store being directly attached to an EC2 instance, its lifecycle is tied to that of the instance. That means if you stop or terminate an instance, all data in the Instance Store is gone. It can no longer be used or accessed.

Naturally there are many use cases where you want the ability to keep data, even if you shut an EC2 instance down. This is where EBS volumes come in. These volumes, as the name implies, are drives of a user-configured size that are separate from an EC2 instance. The drives are simply network attached storage for your instances. You can think of it as similar to how you might attach an external drive to your laptop. You can attach multiple EBS volumes to one EC2 instance, and then you can configure how to use that storage on the OS of the EC2 instance. When I connect that EBS volume to my instance, my instance now has a direct communication line to the data in that volume. Nobody else can directly talk to that volume, so that it maintains secure communication. You need an EC2 instance to access data on an EBS volume. If I decided I want to use that EBS volume with a different instance, that’s no problem. We can stop the instance, detach the volume, and then attach it to another instance in the same AZ. Much like you can unplug your drive from a laptop, and plug it into another one. Or depending on the instance type and EBS volume we’re using, we may be able to attach it to multiple instances at the same time, which is called EBS Multi-Attach. And perhaps the most important similarity is that an EBS volume is separate from your instance, just like an external drive is separate from your laptop. That means if an accident happens and the instance goes down, you still have your data on your EBS volume. This is what we refer to as persistent storage. You can stop or terminate your instance, and your EBS volume can still exist with your data on it.

EBS is often the right storage type for workloads that require persistence of data. However, the question typically comes down to which EBS volume type do I use? That’s right, there are many different types of volumes, but they’re divided into two main volume types: SSD-backed volumes and HDD-backed volumes. In the readings, you’ll learn more about these two options.

The last thing we’ll need to talk about here is backing up data. Things fail, errors happen, so you need to back up your data, even in AWS. The way you back up EBS volumes is by taking what we call snapshots. EBS snapshots are incremental backups that are stored redundantly. The idea here is that if something goes wrong, you can create new volumes from your snapshots and restore your data to a safe state.
Reading: Reading 3.2: Amazon EC2 Instance Storage and Amazon Elastic Block Store
Reading
Amazon EC2 Instance Store
Amazon EC2 Instance Store provides temporary block-level storage for your instance. This storage is located on disks that are physically attached to the host computer. This ties the lifecycle of your data to the lifecycle of your EC2 instance. If you delete your instance, the instance store is deleted as well. Due to this, instance store is considered ephemeral storage. Read more about it in the AWS documentation.
Instance store is ideal if you are hosting applications that replicate data to other EC2 instances, such as Hadoop clusters. For these cluster-based workloads, having the speed of locally attached volumes and the resiliency of replicated data helps you achieve data distribution at high performance. It’s also ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content.
Amazon Elastic Block Store (Amazon EBS)
As the name implies, Amazon EBS is a block-level storage device that you can attach to an Amazon EC2 instance. These storage devices are called Amazon EBS volumes. EBS volumes are essentially drives of a user-configured size attached to an EC2 instance, similar to how you might attach an external drive to your laptop.
EBS volumes act similarly to external drives in more than one way.
- Most Amazon EBS volumes can only be connected to one computer at a time. Most EBS volumes have a one-to-one relationship with EC2 instances, so they cannot be shared by or attached to multiple instances at one time. Note: Recently, AWS announced the Amazon EBS multi-attach feature that enables volumes to be attached to multiple EC2 instances at one time. This feature is not available for all instance types, and all instances must be in the same Availability Zone. Read more about this scenario in the EBS documentation.
- You can detach an EBS volume from one EC2 instance and attach it to another EC2 instance in the same Availability Zone, to access the data on it.
- The external drive is separate from the computer. That means, if an accident happens and the computer goes down, you still have your data on your external drive. The same is true for EBS volumes.
- You’re limited to the size of the external drive, since it has a fixed limit to how scalable it can be. For example, you may have a 2 TB external drive and that means you can only have 2 TB of content on there. This relates to EBS as well, since volumes also have a max limitation of how much content you can store on the volume.
Scale Amazon EBS Volumes
You can scale Amazon EBS volumes in two ways.
- Increase the volume size, as long as it doesn’t increase above the maximum size limit. For EBS volumes, the maximum amount of storage you can have is 16 TB. That means if you provision a 5 TB EBS volume, you can choose to increase the size of your volume until you get to 16 TB.
- Attach multiple volumes to a single Amazon EC2 instance. EC2 has a one-to-many relationship with EBS volumes. You can add these additional volumes during or after EC2 instance creation to provide more storage capacity for your hosts.
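As a rough illustration of those two scaling paths, here is a minimal sketch using boto3 (the AWS SDK for Python). The volume ID, instance ID, Availability Zone, sizes, and volume type are placeholder assumptions for the example, not values from the course.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# 1) Increase the size of an existing volume (size can grow, never shrink).
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=200)  # new size in GiB

# 2) Attach an additional volume to the same instance for more capacity.
new_vol = ec2.create_volume(AvailabilityZone="us-west-2a", Size=100, VolumeType="gp3")
ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol["VolumeId"]])
ec2.attach_volume(
    VolumeId=new_vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)
```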
Amazon EBS Use Cases
Amazon EBS is useful when you need to retrieve data quickly and have data persist long-term. Volumes are commonly used in the following scenarios.
- Operating systems: Boot/root volumes to store an operating system. The root device for an instance launched from an Amazon Machine Image (AMI) is typically an Amazon EBS volume. These are commonly referred to as EBS-backed AMIs.
- Databases: A storage layer for databases running on Amazon EC2 that rely on transactional reads and writes.
- Enterprise applications: Amazon EBS provides reliable block storage to run business-critical applications.
- Throughput-intensive applications: Applications that perform long, continuous reads and writes.
Amazon EBS Volume Types
There are two main categories of Amazon EBS volumes: solid-state drives (SSDs) and hard-disk drives (HDDs). SSDs provide strong performance for random input/output (I/O), while HDDs provide strong performance for sequential I/O. AWS offers two types of each. The following chart can help you decide which EBS volume is the right option for your workload.

| | EBS Provisioned IOPS SSD | EBS General Purpose SSD | Throughput Optimized HDD | Cold HDD |
|---|---|---|---|---|
| Description | Highest performance SSD designed for latency-sensitive transactional workloads | General purpose SSD that balances price and performance for a wide variety of transactional workloads | Low-cost HDD designed for frequently accessed, throughput intensive workloads | Lowest cost HDD designed for less frequently accessed workloads |
| Use Cases | I/O-intensive NoSQL and relational databases | Boot volumes, low-latency interactive apps, development, and test | Big data, data warehouses, log processing | Colder data requiring fewer scans per day |
| Volume Size | 4 GB-16 TB | 1 GB-16 TB | 500 GB-16 TB | 500 GB-16 TB |
| Max IOPS/Volume | 64,000 | 16,000 | 500 | 250 |
| Max Throughput/Volume | 1,000 MB/s | 250 MB/s | 500 MB/s | 250 MB/s |
Benefits of Using Amazon EBS
Here are the benefits of using Amazon EBS (in case you need a quick cheat sheet).
- High availability: When you create an EBS volume, it is automatically replicated within its Availability Zone to prevent data loss from single points of failure.
- Data persistence: The storage persists even when your instance doesn’t.
- Data encryption: All EBS volumes support encryption.
- Flexibility: EBS volumes support on-the-fly changes. You can modify volume type, volume size, and input/output operations per second (IOPS) capacity without stopping your instance.
- Backups: Amazon EBS provides you the ability to create backups of any EBS volume.
EBS Snapshots
Errors happen. One of those errors is not backing up data, and then, inevitably losing that data. To prevent this from happening to you, you should back up your data—even in AWS. Since your EBS volumes consist of the data from your Amazon EC2 instance, you’ll want to take backups of these volumes, called snapshots.
EBS snapshots are incremental backups that only save the blocks on the volume that have changed after your most recent snapshot. For example, if you have 10 GB of data on a volume, and only 2 GB of data have been modified since your last snapshot, only the 2 GB that have been changed are written to Amazon Simple Storage Service (Amazon S3).
When you take a snapshot of any of your EBS volumes, these backups are stored redundantly in multiple Availability Zones using Amazon S3. This aspect of storing the backup in Amazon S3 will be handled by AWS, so you won’t need to interact with Amazon S3 to work with your EBS snapshots. You simply manage them in the EBS console (which is part of the EC2 console).
EBS snapshots can be used to create multiple new volumes, whether they’re in the same Availability Zone or a different one. When you create a new volume from a snapshot, it’s an exact copy of the original volume at the time the snapshot was taken.
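If you manage snapshots programmatically rather than through the EBS console, the backup-and-restore workflow described above could look roughly like this boto3 sketch; the volume ID, Region, and Availability Zones are placeholder values.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Take an incremental snapshot of a data volume.
snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Backup of employee directory data volume",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# Restore by creating a new volume from the snapshot, in whichever AZ you need.
restored = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="us-west-2b",
    VolumeType="gp3",
)
print("Restored volume:", restored["VolumeId"])
```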
Video: Object Storage with Amazon S3
Why not use EBS for employee photos?
- Accessibility: EBS volumes are typically attached to a single EC2 instance, limiting access as your application scales.
- Capacity: EBS volumes have size limits, whereas you may need to store many large employee photos.
Amazon S3: A Better Solution
- Standalone Storage: S3 isn’t tied to specific compute instances (EC2). You access it via URLs, making it broadly accessible (“storage for the internet”).
- Scalability: S3 allows you to store virtually unlimited objects with individual sizes up to 5 terabytes.
Key S3 Concepts
- Buckets: The fundamental containers in S3 where you place your objects (e.g., photos).
- Folders (optional): Help organize objects within a bucket.
- Object Storage: S3 uses a flat structure with unique identifiers to retrieve objects.
- Distributed Design: S3 stores your data redundantly across multiple facilities for high availability and durability.
S3 Access Control
- Private by Default: Data in S3 is initially only accessible to the creating AWS account.
- Making Objects Public: While possible, it involves several explicit steps to prevent accidental data exposure.
- Granular Control: Use the following for more fine-grained access:
- IAM Policies: Attached to users, groups, and roles to control their S3 actions.
- S3 Bucket Policies: JSON format policies attached to buckets. These specify permitted or denied actions on the bucket and its objects.
- So we’ve figured out block storage for our application. Now, we need to figure out where to store our employee photos. A natural question is, why can’t we just store these photos in Amazon EBS? Well, there’s a few reasons. Number one, most EBS volumes are only connected to one EC2 instance at a time. Multi-attach is not supported by all volume and instance types. Eventually, as my app scales, I’ll need to figure out how to access those photos from all of my instances, and that’s an issue. The second consideration is that an EBS volume has size limitations. That means that eventually, there will be a limit to how many HD 4K photos I can store of my employees in one drive. Ideally, I’d store these photos in a more scalable solution. So EBS probably isn’t the right choice.

Fortunately, AWS has a service called Amazon Simple Storage Service or Amazon S3 that was designed to be a standalone storage solution that isn’t tied to compute, meaning you don’t mount this type of storage onto your EC2 instances. Instead, you can access your data through URLs from anywhere on the web, which gives this service its nickname, storage for the internet. S3 also allows you to store as many objects as you’d like with an individual object size limit of five terabytes. This makes it ideal for our employee photos.

Now, let’s talk about how we store things in S3. The underlying storage type for S3 is object storage. That means that all of the same characteristics of object storage are also characteristics of S3. So S3 uses a flat structure. It uses unique identifiers to look up objects when requested, you get the idea. S3 is also considered distributed storage, meaning that we store your data across multiple different facilities within one AWS Region. This is what makes S3 designed for 99.99% availability and gives it 11 nines of durability.

Alright, let’s learn about some S3 concepts. The first concept is a bucket. In S3, you store your objects in a bucket. You can’t upload any object, not even a single photo, to S3 without creating a bucket first. You then place your objects inside of these buckets. And if you want to organize and arrange those objects, you can also have folders inside of the buckets.

Let’s create a bucket in the console. When you log in, you’ll type S3 in the Service search bar. Once you click on it, you’ll see the S3 dashboard showing you all the available buckets for every region. I’ll then select Create bucket. What I want to point out here is that buckets are region specific, so we can choose where we want to place our bucket. In this case, we want to place our bucket close to our infrastructure for our application, which is in the Oregon region, so we’ll choose Oregon. Next, we have the name of our bucket. Even though our bucket is specific to one region, our bucket name has to be globally unique across all AWS accounts and must be DNS compliant. Once you create your bucket, AWS will construct a URL using this name, so it has to be something that is reachable over HTTP or HTTPS, meaning there can be no special characters, no spaces, et cetera. So for this bucket’s name, let’s choose employee-photo-bucket-sr-001, which is DNS compliant. Now we can leave the rest as defaults. Scroll down and click create.

To work with this bucket, I’ll need to find it in the list and click on its name. Here we can start uploading our objects. To do this, I’ll click Upload and then Add files. Now I can choose any file I want and then I’ll click Upload. So as you can see, the object upload was successful. If I click on the name of my object, I’ll be able to see quite a bit of detail. I can see the owner, region and size, but most importantly, we can see the URL of my object. The first part of this URL is simply my bucket URL that AWS created using the bucket name. Then AWS appended the name of my object, also referred to as the object key, to the bucket URL. Now, what happens if I click on this URL? Hmm, access denied. That’s weird, right? Well, not really.

That access denied message leads us to a bigger question that most people have when they start out on AWS, and that’s who can access my data. Earlier I mentioned that you can retrieve your data from anywhere on the web, and people often think that means that anyone can retrieve that data. By default, it’s actually the opposite. Everything in S3 is private by default. This means that all S3 resources such as buckets, folders and objects can only be viewed by the user or AWS account that created that resource. That’s why I got an access denied message, because I was acting as an anonymous user on the internet trying to access an S3 object that’s private. Now, that’s not to say no object or bucket can be open to the world. They absolutely can be if you explicitly choose that option, and it’s actually kind of a process to make something public. The reason it’s difficult to make your objects public is to prevent accidental exposure of your data.

Let’s try it. Okay, so if we want to make the object we created public, we need to do a few things. Normally, from the object detail page, we would be able to click on the object actions dropdown and then select make public using ACL, but it is currently not available for us to select. Going to the bucket details page, I can select the permissions tab and see that there is a default setting that blocks all public access. From there I can verify that the Block public access bucket setting is set to block all public access. I’ll click Edit, then uncheck the top box that blocks all public access and then save the changes. I’ll type Confirm to make the change and then click Confirm to finalize it. From there, I’ll go back to the bucket permissions tab, scroll down to the Object Ownership section, and click Edit for this pane. Instead of the default setting to disable access control lists or ACLs, I’ll select ACLs enabled, then acknowledge and then save the changes. Now I can go back to the object details page and select Make public using ACL from the object actions drop down. This will allow me to make the object public. To view the object, all I need to do is go back to the object details, find the URL, and click on it to view the photo. That’s how you make an object public.

That being said, most of the time you don’t want your permissions to be all or nothing, to where either nobody can see it or everybody can see it. Typically, you want to be more granular about the way you provide access to resources. As far as access control, you can use IAM policies attached to users, groups, and roles to access your S3 content, and you can also use a feature called S3 bucket policies. S3 bucket policies are similar to IAM policies in that they’re both defined using the same policy language in a JSON format. The difference is IAM policies are attached to users, groups and roles, whereas S3 bucket policies are only attached to buckets. S3 bucket policies specify what actions you’re allowed or denied on the bucket. For example, you might want to attach an S3 bucket policy to a bucket that allows another AWS account to put objects in that bucket. Or you might want to create a bucket policy that allows read-only permissions to anonymous viewers. S3 bucket policies can be placed on buckets and cannot be used for folders or objects. However, the policy that is placed on the bucket can apply to every object in that bucket.

Alright, to recap, S3 uses containers called buckets to store your objects, and you have several options to control access to those objects through the use of IAM policies and bucket policies.
Reading 3.3: Object Storage with Amazon S3
Reading
WHAT IS AMAZON S3?
Unlike Amazon EBS, Amazon S3 is a standalone storage solution that isn’t tied to compute. It enables you to retrieve your data from anywhere on the web. If you’ve ever used an online storage service to back up the data from your local machine, then you most likely have used a service similar to Amazon S3. The big difference between those online storage services and Amazon S3 is the storage type.
Amazon S3 is an object storage service. Object storage stores data in a flat structure, using unique identifiers to look up objects when requested. An object is simply a file combined with metadata and that you can store as many of these objects as you’d like. All of these characteristics of object storage are also characteristics of Amazon S3.
UNDERSTAND AMAZON S3 CONCEPTS
In Amazon S3, you have to store your objects in containers called buckets. You can’t upload any object, not even a single photo, to S3 without creating a bucket first. When you create a bucket, you choose, at the very minimum, two things: the bucket name and the AWS Region you want the bucket to reside in.
The first part is choosing the Region you want the bucket to reside in. Typically, this will be a Region that you’ve used for other resources, such as your compute. When you choose a Region for your bucket, all objects you put inside that bucket are redundantly stored across multiple devices, across multiple Availability Zones. This level of redundancy is designed to provide Amazon S3 customers with 99.999999999% durability and 99.99% availability for objects over a given year.
The second part is choosing a bucket name which must be unique across all AWS accounts. AWS stops you from choosing a bucket name that has already been chosen by someone else in another AWS account. Once you choose a name, that name is yours and cannot be claimed by anyone else unless you delete that bucket, which then releases the name for others to use.
AWS uses this name as part of the object identifier. In S3, each object is identified using a URL, which looks like this: http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.html
After the http://, you see the bucket name. In this example, the bucket is named doc. Then, the identifier uses the service name, s3, and specifies the service provider, amazonaws. After that, you have an implied folder inside the bucket called 2006-03-01 and the object inside the folder that is named AmazonS3.html. The object name is often referred to as the key name.
Note, you can have folders inside of buckets to help you organize objects. However, remember that there’s no actual file hierarchy that supports this on the back end. It is instead a flat structure where all files and folders live at the same level. Using buckets and folders implies a hierarchy, which makes it easy to understand for the human eye.
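To tie these concepts together, here is a minimal boto3 sketch of creating a bucket and uploading an object. It reuses the employee-photo-bucket-sr-001 name from the lesson and assumes the us-west-2 (Oregon) Region and a local employee.jpg file.

```python
import boto3

s3 = boto3.client("s3", region_name="us-west-2")
bucket = "employee-photo-bucket-sr-001"  # must be globally unique and DNS compliant

# Buckets outside us-east-1 need an explicit LocationConstraint.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)

# The key "employees/employee.jpg" implies a folder, but S3 stores it flat.
s3.upload_file("employee.jpg", bucket, "employees/employee.jpg")

# The object is then addressable by URL (private by default, so a plain GET is denied).
print(f"https://{bucket}.s3.us-west-2.amazonaws.com/employees/employee.jpg")
```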
S3 USE CASES
Amazon S3 is one of the most widely used storage services, with far more use cases than could fit on one screen. The following list summarizes some of the most common ways you can use Amazon S3.
- Backup and storage: S3 is a natural place to back up files because it is highly redundant. As mentioned in the last unit, AWS stores your EBS snapshots in S3 to take advantage of its high availability.
- Media hosting: Because you can store unlimited objects, and each individual object can be up to 5 TBs, S3 is an ideal location to host video, photo, or music uploads.
- Software delivery: You can use S3 to host your software applications that customers can download.
- Data lakes: S3 is an optimal foundation for a data lake because of its virtually unlimited scalability. You can increase storage from gigabytes to petabytes of content, paying only for what you use.
- Static websites: You can configure your bucket to host a static website of HTML, CSS, and client-side scripts.
- Static content: Because of the limitless scaling, the support for large files, and the fact that you access any object over the web at any time, S3 is the perfect place to store static content.
CHOOSE THE RIGHT CONNECTIVITY OPTION FOR YOUR RESOURCES
Everything in Amazon S3 is private by default. This means that all S3 resources, such as buckets, folders, and objects can only be viewed by the user or AWS account that created that resource. Amazon S3 resources are all private and protected to begin with.
If you decide that you want everyone on the internet to see your photos, you can choose to make your buckets, folders, and objects public. Keep in mind that a public resource means that everyone on the internet can see it. Most of the time, you don’t want your permissions to be all or nothing. Typically, you want to be more granular about the way you provide access to your resources.
To be more specific about who can do what with your S3 resources, Amazon S3 provides two main access management features: IAM policies and S3 bucket policies.
UNDERSTAND IAM POLICIES
Previously, you learned about creating and using IAM policies, and now you get to apply this to Amazon S3. When IAM policies are attached to IAM users, groups, and roles, the policies define which actions they can perform. IAM policies are not tied to any one AWS service and can be used to define access to nearly any AWS action. You should use IAM policies for private buckets when:
- You have many buckets with different permission requirements. Instead of defining many different S3 bucket policies, you can use IAM policies instead.
- You want all policies to be in a centralized location. Using IAM policies allows you to manage all policy information in one location.
UNDERSTAND S3 BUCKET POLICIES
S3 bucket policies are similar to IAM policies, in that they are both defined using the same policy language in a JSON format. The difference is IAM policies are attached to users, groups, and roles, whereas S3 bucket policies are only attached to buckets. S3 bucket policies specify what actions are allowed or denied on the bucket.
For example, if you have a bucket called employeebucket, you can attach an S3 bucket policy to it that allows another AWS account to put objects in that bucket.
Or if you wanted to allow anonymous viewers to read the objects in employeebucket, then you can apply a policy to that bucket that allows anyone to read objects in the bucket, using "Effect": "Allow" on the "Action": ["s3:GetObject"].
Here’s an example of what that S3 bucket policy might look like.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::employeebucket/*"]
    }
  ]
}
S3 Bucket policies can only be placed on buckets, and cannot be used for folders or objects. However, the policy that is placed on the bucket applies to every object in that bucket. You should use S3 bucket policies when:
- You need a simple way to do cross-account access to S3, without using IAM roles.
- Your IAM policies bump up against the defined size limit. S3 bucket policies have a larger size limit.
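For illustration, the public-read policy shown above could be applied to a bucket programmatically with boto3, as in the sketch below. The bucket name employeebucket is the example from this reading, and note that the bucket's Block Public Access settings would have to permit public policies for this to take effect.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "employeebucket"

public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }
    ],
}

# Attach the policy to the bucket.
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(public_read_policy))
```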
ENCRYPT S3
Amazon S3 reinforces encryption in transit (as it travels to and from Amazon S3) and at rest. To protect data at rest, you can use:
- Server-side encryption: This allows Amazon S3 to encrypt your object before saving it on disks in its data centers and then decrypt it when you download the objects.
- Client-side encryption: Encrypt your data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and all related tools.
To encrypt in transit, you can use client-side encryption or Secure Sockets Layer (SSL).
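As a minimal sketch of requesting server-side encryption at upload time, assuming boto3 and the example bucket and key names used earlier in this week:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to encrypt the object at rest with S3-managed keys (SSE-S3).
with open("employee.jpg", "rb") as photo:
    s3.put_object(
        Bucket="employee-photo-bucket-sr-001",
        Key="employees/employee.jpg",
        Body=photo,
        ServerSideEncryption="AES256",
    )
```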
USE VERSIONING TO PRESERVE OBJECTS
As you know, Amazon S3 identifies objects in part by using the object name. For example, when you upload an employee photo to S3, you may name the object employee.jpg and store it in a folder called employees. If you don’t use Amazon S3 versioning, anytime you upload an object called employee.jpg to the employees folder, it overwrites the original file. This can be an issue for several reasons.
- employee.jpg is a common name for an employee photo object. You or someone else who has access to that bucket might not have intended to overwrite it, and now that you have, you no longer have access to the original file.
- You may want to preserve different versions of employee.jpg. Without versioning, if you wanted to create a new version of employee.jpg, you would need to upload the object and choose a different name for it. Having several objects all with slight differences in naming variations may cause confusion and clutter in your bucket.
So, what do you do? You use S3 versioning! Versioning enables you to keep multiple versions of a single object in the same bucket. This allows you to preserve old versions of an object without having to use different naming constructs, in case you need to recover from accidental deletions, accidental overwrites, or even application failures. Let’s see how this works.
If you enable versioning for a bucket, Amazon S3 automatically generates a unique version ID for the object being stored. In one bucket, for example, you can have two objects with the same key, but different version IDs, such as employeephoto.gif (version 111111) and employeephoto.gif (version 121212). Versioning-enabled buckets let you recover objects from accidental deletion or overwrite.
- Deleting an object does not remove the object permanently. Instead, Amazon S3 puts a marker on the object that shows you tried to delete it. If you want to restore the object, you can remove this marker and it reinstates the object.
- If you overwrite an object, it results in a new object version in the bucket. You still have access to previous versions of the object.
UNDERSTAND VERSIONING STATES
Buckets can be in one of three states.
- Unversioned (the default): No new or existing objects in the bucket have a version.
- Versioning-enabled: This enables versioning for all objects in the bucket.
- Versioning-suspended: This suspends versioning for new objects. All new objects in the bucket will not have a version. However, all existing objects keep their object versions.
The versioning state applies to all of the objects in that bucket. Keep in mind that storage costs are incurred for all objects in your bucket and all versions of those objects. To reduce your S3 bill, you may want to delete previous versions of your objects that are no longer in use.
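As a sketch, moving a bucket into the versioning-enabled state and then listing the versions of one key might look like this with boto3; the bucket and key names are the examples used earlier.

```python
import boto3

s3 = boto3.client("s3")
bucket = "employee-photo-bucket-sr-001"

# Move the bucket into the versioning-enabled state.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Every overwrite of employees/employee.jpg now gets its own version ID.
versions = s3.list_object_versions(Bucket=bucket, Prefix="employees/employee.jpg")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])
```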
WHAT ARE AMAZON S3 STORAGE CLASSES?
When you upload an object to Amazon S3 and you don’t specify the storage class, you’re uploading it to the default storage class—often referred to as standard storage. When you learned about Amazon S3 in previous units, you were learning about the standard storage class without even knowing it! S3 storage classes let you change your storage tier as your data characteristics change. For example, if you are now accessing your old photos infrequently, you may want to change the storage class those photos are stored in to save on costs. Amazon S3 offers the following storage classes.
- Amazon S3 Standard: This is considered general purpose storage for cloud applications, dynamic websites, content distribution, mobile and gaming applications, and big data analytics.
- Amazon S3 Intelligent-Tiering: This tier is useful if your data has unknown or changing access patterns. S3 Intelligent-Tiering stores objects in two tiers, a frequent access tier and an infrequent access tier. Amazon S3 monitors access patterns of your data, and automatically moves your data to the most cost-effective storage tier based on frequency of access.
- Amazon S3 Standard-Infrequent Access (S3 Standard-IA): S3 Standard-IA is for data that is accessed less frequently, but requires rapid access when needed. S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a low per-GB storage price and per-GB retrieval fee. This storage tier is ideal if you want to store long-term backups, disaster recovery files, and so on.
- Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA): Unlike other S3 storage classes which store data in a minimum of three Availability Zones (AZs), S3 One Zone-IA stores data in a single AZ and costs 20% less than S3 Standard-IA. S3 One Zone-IA is ideal for customers who want a lower-cost option for infrequently accessed data but do not require the availability and resilience of S3 Standard or S3 Standard-IA. It’s a good choice for storing secondary backup copies of on-premises data or easily re-creatable data.
- Amazon S3 Glacier Instant Retrieval: Amazon S3 Glacier Instant Retrieval is an archive storage class that delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds.
- Amazon S3 Glacier Flexible Retrieval: S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost (than S3 Glacier Instant Retrieval), for archive data that is accessed 1-2 times per year and is retrieved asynchronously.
- Amazon S3 Glacier Deep Archive: S3 Glacier Deep Archive is Amazon S3’s lowest-cost storage class and supports long-term retention and digital preservation for data that may be accessed once or twice in a year. It is designed for customers—particularly those in highly regulated industries, such as the Financial Services, Healthcare, and Public Sectors—that retain data sets for 7 to 10 years or longer to meet regulatory compliance requirements.
- Amazon S3 Outposts: Amazon S3 on Outposts delivers object storage to your on-premises AWS Outposts environment.
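The storage class is chosen per object. As a hedged boto3 sketch (the bucket and key names are illustrative), you can write an object directly into an infrequent-access class, or re-copy an existing object to move it to another class:

```python
import boto3

s3 = boto3.client("s3")
bucket = "employee-photo-bucket-sr-001"

# Upload straight into an infrequent-access class.
with open("2020-photos.zip", "rb") as archive:
    s3.put_object(
        Bucket=bucket,
        Key="archive/2020-photos.zip",
        Body=archive,
        StorageClass="STANDARD_IA",
    )

# Change the class of an existing object by copying it over itself.
s3.copy_object(
    Bucket=bucket,
    Key="employees/employee.jpg",
    CopySource={"Bucket": bucket, "Key": "employees/employee.jpg"},
    StorageClass="GLACIER_IR",  # S3 Glacier Instant Retrieval
    MetadataDirective="COPY",
)
```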
AUTOMATE TIER TRANSITIONS WITH OBJECT LIFECYCLE MANAGEMENT
If you keep manually changing your objects, such as your employee photos, from storage tier to storage tier, you may want to look into automating this process using a lifecycle policy. When you define a lifecycle policy configuration for an object or group of objects, you can choose to automate two actions: transition and expiration actions.
- Transition actions are used to define when you should transition your objects to another storage class.
- Expiration actions define when objects expire and should be permanently deleted.
For example, you might choose to transition objects to S3 Standard-IA storage class 30 days after you created them, or archive objects to the S3 Glacier storage class one year after creating them.
The following use cases are good candidates for lifecycle management.
- Periodic logs: If you upload periodic logs to a bucket, your application might need them for a week or a month. After that, you might want to delete them.
- Data that changes in access frequency: Some documents are frequently accessed for a limited period of time. After that, they are infrequently accessed. At some point, you might not need real-time access to them, but your organization or regulations might require you to archive them for a specific period. After that, you can delete them.
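A lifecycle configuration like the 30-day / one-year example above could be expressed roughly as follows with boto3; the rule ID, key prefix, and day counts are assumptions for illustration, not values from the course.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="employee-photo-bucket-sr-001",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-photos",
                "Status": "Enabled",
                "Filter": {"Prefix": "employees/"},
                # Transition actions: move objects to cheaper tiers as they cool off.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                # Expiration action: permanently delete after three years.
                "Expiration": {"Days": 1095},
            }
        ]
    },
)
```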
Video: Choose the Right Storage Service
Scenario Types and Best Solutions
- Video Transcoding (Lambda Function):
- Need: Store large files long-term.
- Solution: Amazon S3 (objects, not tied to single compute instance)
- E-commerce Database (EC2 Instance):
- Need: Fast, durable storage for frequently accessed data.
- Solution: Amazon EBS (attached, reliable volumes)
- Temporary Calculations (Web App):
- Need: Speed and cost are top priorities, data loss is manageable.
- Solution: EC2 Instance Store (included with instance, but ephemeral)
- Shared WordPress Uploads (Multiple Instances):
- Need: Shared file system accessible by all instances.
- Solution: Amazon EFS (network file system, not object storage)
Reasons to Rule Out Other Options
- EBS: Can be costly for large files, and less ideal for Lambda use cases, since it’s compute-attached.
- Instance Store: Ephemeral (non-permanent), making it unsuitable for critical or long-term data.
- S3: While great for objects, it’s not a traditional file system that can be easily mounted by multiple instances.
(bright upbeat music) – Thank you, thank you, and welcome back to everyone’s favorite game show, which AWS storage service do I use for my use case? Today we have one contestant during the final round of our show, and that’s you.

– In order to win the grand prize, you must answer the next three questions correctly. There will also be a bonus question, so there’s an opportunity to get extra points. All right, once we read the question, you have five seconds to answer to get points. Let’s get started.

– All right, this is the first question. Let’s say you’re a developer and you plan to build out an application to transcode large media files like videos. You’ll be using an AWS Lambda function to perform the transcoding, but you need a place to store both the original media files and the transcoded media files. Due to regulations, you need to store these files for at least a year. Which of the storage services that we’ve talked about in this course should you use? You have five seconds to answer. (timer ticking) And the answer is Amazon S3. Why is S3 the best solution here, Morgan?

– Well, first of all, the question says that they’re using a Lambda function. Because of that, I’m already ruling EBS out, as EBS volumes can only be attached to EC2 instances. Even if they were using EC2, video files are typically large in size, so you may have to use multiple EBS volumes to store that data, which might not be cost effective in the long run. So EBS is out. Instance storage is out for the same reason. We’re not using EC2 here, but also because we want this data to persist for a year, and instance storage is considered ephemeral.

– All right, S3 it is. Let’s put some points on the board for those who got it right. Morgan, tell us the next question.

– The next question is, you’re an architect for an e-commerce company that wants to run their MySQL database on an EC2 instance. This database needs a storage layer to store their order and customer information. The database will frequently be accessed and updated, so the storage layer needs to respond quickly. It’s important that the storage is fast and durable. Which AWS storage service should you use? You have five seconds. (timer ticking) And the answer is Amazon EBS. Add 30 points to your score if you got it.

– It seems like we’re looking for storage attached to the compute, so why not EC2 instance store?

– Right, that’s also an option but it’s not ideal. Since it’s an e-commerce company, their order and customer data is what drives the business, which means the persistence and durability of that data is really important. Using EC2 instance store would definitely give us the speed we’re looking for, but it wouldn’t give us the durability needed to store this data long term. So EBS is the right option.

– That makes sense. All right, moving on. Two more questions. The next one is, you have a web application that needs to write to disk in order to perform certain calculations. The application will store temporary data during the calculation. The most important aspects of this architecture are speed and cost. With five seconds on the clock, which storage solution would you choose? (timer ticking) And the answer is EC2 instance store.

– Seph, would you mind telling us how we chose instance store and not EBS?

– Sure. Once again, we’re looking for storage attached to compute in this case. The first thing I want to point out is that this is temporary data we’re talking about. We’re not looking at a huge amount of data, and we also don’t necessarily care about the durability of that data. If the instance fails mid calculation and you want to plan for failure, you can just restart the calculation from scratch. So durability doesn’t matter, but cost does. By not using EBS and instead using instance store, you may save yourself some costs. That is because instance store is included in the overall EC2 instance price. So instance store is the best option for this use case.

– Okay, 30 more points on the board for those of you who got it. Now the final bonus question for an extra 10 points is next. This is a tricky one, and you might have to think outside of the storage options that we’ve talked about so far. The question is, let’s say you’re creating a WordPress site on multiple instances. By default, WordPress stores user uploads on the local file system. Since you want to use multiple instances, you’ll need to move the WordPress installation and all of the user customizations into a shared storage platform. Which storage option would you use? Five seconds to go. (timer ticking) And the answer is Amazon Elastic File System or Amazon EFS. This service was covered in an earlier reading, so if you got points for this, great job. For those of you who didn’t, no worries, but I would recommend that you go back and review the reading related to file storage on AWS.

– Let’s go ahead and talk about the options. Typically, when we talk about shared storage systems that multiple instances can access, we think Amazon S3. Why wouldn’t we use that in this case?

– Well, S3 isn’t a file system. It’s actually a flat structure for storing objects instead of a hierarchy. And you can’t mount it onto multiple instances. Because S3 has a different underlying type of storage, it’s not right for this use case. So by moving the entire WordPress installation directory onto an EFS file system and mounting it onto each of your EC2 instances when they boot, your WordPress site and all of its data is automatically stored on a distributed file system that isn’t dependent on any one EC2 instance.

– Nice. Well, you answered all four questions and you win the grand prize of the satisfaction of getting them all right. Congratulations and that’s it for today’s show. (bright upbeat music)
Reading 3.4: Choose the Right Storage Service
Reading
Here’s a recap of all the storage services mentioned so far. By the end of this reading, you should be able to better answer the question “Which storage service should I use?” for some of the more common scenarios.
Amazon EC2 Instance Store
Instance store is ephemeral block storage. This is preconfigured storage that exists on the same physical server that hosts the EC2 instance and cannot be detached from Amazon EC2. You can think of it as a built-in drive for your EC2 instance. Instance store is generally well-suited for temporary storage of information that is constantly changing, such as buffers, caches, and scratch data. It is not meant for data that is persistent or long-lasting. If you need persistent long-term block storage that can be detached from Amazon EC2 and provide you more management flexibility, such as increasing volume size or creating snapshots, then you should use Amazon EBS.
Amazon EBS
Amazon EBS is meant for data that changes frequently and needs to persist through instance stops, terminations, or hardware failures. Amazon EBS has two different types of volumes: SSD-backed volumes and HDD-backed volumes.
SSD-backed volumes have the following characteristics.
- Performance depends on IOPS (input/output operations per second).
- Ideal for transactional workloads such as databases and boot volumes.
HDD-backed volumes have the following characteristics:
- Performance depends on MB/s.
- Ideal for throughput-intensive workloads, such as big data, data warehouses, log processing, and sequential data I/O.
Here are a few important features of Amazon EBS that you need to know when comparing it to other services.
- It is block storage.
- You pay for what you provision (you have to provision storage in advance).
- EBS volumes are replicated across multiple servers in a single Availability Zone.
- Most EBS volumes can only be attached to a single EC2 instance at a time.
Amazon S3
If your data doesn’t change that often, Amazon S3 might be a more cost-effective and scalable storage solution. S3 is ideal for storing static web content and media, backups and archiving, data for analytics, and can even be used to host entire static websites with custom domain names.
Here are a few important features of Amazon S3 to know about when comparing it to other services.
- It is object storage.
- You pay for what you use (you don’t have to provision storage in advance).
- Amazon S3 replicates your objects across multiple Availability Zones in a Region.
- Amazon S3 is not storage attached to compute.
Amazon Elastic File System (Amazon EFS) and Amazon FSx
In this module, you’ve already learned about Amazon S3 and Amazon EBS. You learned that S3 uses a flat namespace and isn’t meant to serve as a standalone file system. You also learned that most EBS volumes can only be attached to one EC2 instance at a time. So, if you need file storage on AWS, which service should you use?
For file storage that can be mounted onto multiple EC2 instances, you can use Amazon Elastic File System (Amazon EFS) or Amazon FSx. Use the following table for more information about each of these services.
| Service | Characteristic | More Information |
|---|---|---|
| Amazon Elastic File System (EFS) | Fully managed NFS file system. | EFS FAQs |
| Amazon FSx for Windows File Server | Fully managed file server built on Windows Server that supports the SMB protocol. | FSx for Windows File Server FAQs |
| Amazon FSx for Lustre | Fully managed Lustre file system that integrates with S3. | FSx for Lustre FAQs |
Here are a few important features of Amazon EFS and FSx to know about when comparing them to other services.
- It is file storage.
- You pay for what you use (you don’t have to provision storage in advance).
- Amazon EFS and Amazon FSx can be mounted onto multiple EC2 instances.
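As a rough boto3 sketch of the EFS side of the WordPress scenario from the video, where the subnet and security group IDs are placeholders and the mount command assumes the amazon-efs-utils mount helper is installed on each instance:

```python
import boto3

efs = boto3.client("efs", region_name="us-west-2")

# Create a shared file system for the WordPress uploads.
fs = efs.create_file_system(
    CreationToken="wordpress-shared-uploads",  # idempotency token
    PerformanceMode="generalPurpose",
    Encrypted=True,
)

# Expose it in a subnet so instances in that AZ can reach it over NFS (TCP 2049).
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)

# On each EC2 instance, the file system would then be mounted, for example:
#   sudo mount -t efs <file-system-id>:/ /var/www/html/wp-content/uploads
```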
Video: Demo Creating an Amazon S3 Bucket
Creating an S3 Bucket
- Go to the S3 Console: Search for “S3” in the AWS Management Console and click on the service.
- Create Bucket: Give the bucket a unique name, keeping it in the same region as your other infrastructure. Leave the defaults and click “Create Bucket”.
- Test Upload: Upload an object (like an image) to ensure the bucket is working properly.
Modifying Bucket Permissions
- Bucket Policy: Navigate to the bucket’s “Permissions” tab and click “Edit” next to bucket policy.
- Paste Provided Policy: Replace the placeholders (insert account number, insert bucket name) with your specific details. Save the changes.
- IAM Role Access: This policy grants access to a specific IAM role, allowing the application to interact with the bucket.
Updating the EC2 Instance
- Clone Existing Instance: Go to the EC2 Instances view, select the stopped instance from previous exercises. Under “Actions” -> “Image and Templates”, choose “Launch More Like This”.
- Update Instance Name: Append something like “-s3” to the name to distinguish the new instance.
- Enable Public IP: Ensure the instance will be accessible by setting “Auto-assign Public IP” to “Enable”.
- User Data: In the “Advanced Details” section, insert your S3 bucket name into the user data field. This tells the application which bucket to use.
- Launch Updated Instance: Launch the instance and wait for status checks to pass.
Testing and Cleanup
- Verify Application: Copy the instance’s public IP, paste it into a browser, and confirm your application loads (note: database setup is still needed for full functionality).
- Stop Instance and Delete S3 Object: Prevent accidental charges after the exercise.
- [Instructor] Hey, everyone. Welcome to our exercise walkthrough on creating an S3 bucket and then modifying the EC2 instance holding the application to utilize this S3 bucket. As you can see, I’m already in the AWS Management Console and I’m logged in as the admin user that was created before. So the first thing that I need to do is to create the S3 bucket that will be utilized by the application. To do that, I’m going to go up to this search bar here and type in S3 and then click on the S3 service to be taken to the S3 console. Now that I’m in the S3 console, I’m going to go ahead and click Create Bucket. And for my bucket name, I am going to use employee-photo-bucket-sr, as my initials, and then dash and just three random digits. So I’m going to go with 963. I do want to make sure that my bucket is in the same region as the rest of my infrastructure, so I’m going to keep this as the Oregon region or US West 2, and then I’m going to keep all of the other defaults as they are. From there, I’ll go ahead and click Create Bucket. And as you can see, my bucket has been successfully created.

Now that the bucket has been successfully created, I want to test uploading an object just to make sure that it all works. So I’ll click on the name of the bucket, which will take me to the bucket details page. And to upload a file, I can click this upload that’s in the center of the page here, but more often I’m going to use the one in the upper right. So I’ll click that upload button and then I’ll click Add Files, and I’ll go ahead and upload this employee two photo. After I add that file, I can click Upload. And as you can see, the upload of that file was successful. So I’ll go ahead and click Close there.

And in a previous demo, you might have seen a way to make this object publicly accessible, but for this bucket and for the exercises moving on, we don’t want this bucket to be just completely open to the world; we want this bucket and these objects to specifically be accessed by the application. And so in order to do that, we need to adjust the permissions for this bucket, specifically the bucket policy. So since I’m already in the bucket details page, I’m going to go ahead and click on this Permissions tab. And I want to adjust the bucket policy for this bucket. So I’m going to scroll down to bucket policy and click the edit button, which will take me to a spot where I can create a bucket policy. So to create this bucket policy, I’m going to take the policy that is in the exercise instructions and paste it here. But before I move forward, before I save this policy, I need to edit a few things. The first thing that I need to edit is the account number so that I am utilizing the correct account. And that will be done here where it says, insert account number, and I will paste my account number there. And then from there, I will also scroll down and change this area where it says, insert bucket name, and put my bucket name there. And I need to make sure to do that in both locations and make sure that I’m also removing the caret brackets when I do so. So now that I’ve done that, I can go ahead and save these changes and my bucket policy will be created. And now my account with this specific role will have access to this bucket and the objects within this bucket.

So now that I have tested uploading an object and created the bucket, as well as providing access to the bucket by that role, I need to modify the application to utilize this bucket. So to do that, I’m going to go over to EC2 and click on Instances. And as you can see, my stopped instance from the previous exercise is there, and there’s a cool little shortcut that can be used in order to clone this instance so that I’m launching basically the exact same thing and making sure that I maintain those settings. So to do that, what I’ll do is I’ll select this stopped instance and then I’ll go over to Actions, and down to Image and Templates. And then I can click launch more like this. What that’s going to do is open up my instance launching page, but already have certain things filled out so that my instance is going to be a clone of the stopped instance that I’ve already launched.

So what I want to do is make sure that I know that this is my updated application instance. So to do that, I’m going to append -s3 to the end of the instance name. So it’ll be employee-directory-app-s3. And my image and instance type are going to remain the same. So what I can do is just make sure that I’m using the same key pair that I’m using with my other instances. And then I want to scroll down and make sure that this is going to be accessible. So I want to change my auto-assign public IP to enable, and that’s just going to make sure that I have a public IP address to access for this instance. From there, I’m going to continue to scroll down, as all of the other settings are still correct, and then I’m going to expand advanced details. With the expanded advanced details, I can see that the role is already associated to this instance. And I’m going to scroll all the way down to the user data, and what I need to do is
put my bucket name in here so that now my application
knows what bucket to utilize. And so with my bucket name there, I can now launch my instance. And that will just take
a little bit of time, so I’ll go over to my instances and I’ll wait for that to be launched, just occasionally
refreshing it to make sure that everything launches correctly. And I want to wait until the status check is
showing two of two checks passed. So now that I’ve given it some time, I’m going to go ahead
and click refresh again. And as we can see, there are two of two checks passed. So I just wanna make sure that this application is up and running. I will select this instance
and copy its public IP address, and then in a new tab, I will go ahead and paste that IP address. And as we can see, the application is up and running. We still can’t interact
with this application yet because the database
hasn’t been associated. So that is just to make sure that the application is up and running and we’ll be able to interact
with it in just a bit. So now that that’s been done, just want to do a couple of
the closeout tasks for this. And so just make sure that you, if you’re following along or
if you’ve already done this, that you go ahead and stop this instance, as well as delete the object that was uploaded to the S3 bucket. And that’ll just make sure that you don’t accidentally
accrue any charges outside of running this exercise. All right, that’s it for this one. And I will see you in the next exercise.
Databases on AWS
Video: Explore Databases on AWS
Relational Databases: The Backbone
- Relational databases are a common choice for storing structured data, like employee information.
- They’re widely used across many industries.
Database Management Options on AWS
- Databases on EC2:
- You install and manage the database software on an EC2 instance (like migrating an existing database).
- Benefits: More control, good for legacy systems.
- Drawbacks: You handle installation, patching, upgrades, etc.
- Amazon RDS (Managed Service):
- AWS handles the heavy lifting: setup, patching, upgrades, backups.
- Benefits: Much less operational overhead for you.
- Focus: You optimize the database itself (structure, queries, security).
Why RDS for the Employee Directory App
- Lets the team focus on building the app’s features, not managing complex database infrastructure.
Upcoming Lessons
- If you’re new to databases, the next readings will provide background on relational databases and their history.
- The employee directory application that we’ve been building out lets you keep track of employee data, like their name, location,
job title, and badges. The app supports adding new employees, viewing existing employees, as well as editing and deleting employees. All of this data will
be stored in a database, which we haven’t created yet. According to the architecture diagram, we have chosen Amazon Relational Database, or Amazon RDS, to store this data. So let’s talk about
databases for a moment. Relational databases are widely
used across all industries and it’s likely your
company has many databases supporting a variety of
applications and solutions. Relational database management systems, or RDBMS, let you create, manage, and use a relational database. You can install and operate
database applications on Amazon EC2 instances,
and this is a good option for migrating existing databases to AWS. By running databases on EC2, you are already simplifying things from an operational perspective when it comes to on-premises, and it’s a common use case for EC2. When migrating a database
from on-premises to EC2, you are no longer responsible for the physical infrastructure
or OS installation, but you are still responsible for the installation
of the database engine, setting up across multiple AZs with data replication in place, as well as taking on any
database server management tasks like installing security patches and updating database
software when necessary. So EC2 makes it easier, but there is a way to lift even more of the operational burden of running relational databases on AWS. What if, instead of
managing a database on EC2, you could use one of the managed AWS database offerings like Amazon RDS? The big difference between
these two options is instead of taking care of the instances, the patching, the upgrades, and
the install of the database, AWS takes care of all of that undifferentiated heavy lifting for you. The task that you are then responsible for is the creation, maintenance, and optimization of the database itself. So you are still in charge
of creating the right schema, indexing the data,
creating stored procedures, enabling encryption, managing
access control, and more. But all the rest of the
undifferentiated heavy lifting that goes into operating
a relational database AWS takes care of. To start off this section
of lessons on databases, we will first cover RDS. The upcoming reading after the video will dive into the history of enterprise relational databases and explain what relational databases are and how they were used. If you aren’t familiar with databases, the readings coming up will give you some useful background information.
Reading 3.5: Explore Databases on AWS
Video: Amazon Relational Database Service
What is Amazon RDS?
- A managed service that simplifies setting up, running, and scaling relational databases in the cloud.
- You don’t have to worry about the underlying infrastructure or time-consuming database administration.
Creating a Database with RDS
- Easy Create: Provides a quick setup using standard best practices for backups and high availability.
- Database Engines: Choose from MySQL, PostgreSQL, MariaDB, SQL Server, or the AWS-optimized Amazon Aurora.
- Aurora Benefits: Designed for high performance, scalability, and compatibility with MySQL/PostgreSQL.
- Instance Selection: Similar to selecting an EC2 instance, pick a size and type based on your workload (a free tier option exists).
High Availability with RDS
- Multi-AZ Deployment: Launch a secondary database instance in a different Availability Zone (AZ) for redundancy.
- Automated Failover: RDS manages data replication and failover between the primary and secondary instances. Your application connects to a single endpoint that seamlessly redirects if needed.
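For reference, here is roughly what an Easy-create-style setup with a Multi-AZ standby looks like through boto3. The identifier, credentials, and sizes are placeholders, and the engine and instance class simply mirror the choices discussed above; treat it as a sketch rather than the exact configuration used in the demo.

```python
import boto3

rds = boto3.client("rds", region_name="us-west-2")

# Placeholders: identifier, username, and password are examples only.
rds.create_db_instance(
    DBInstanceIdentifier="employee-directory-db",
    Engine="mysql",
    DBInstanceClass="db.t3.micro",   # a small, low-cost class; choose based on your workload
    AllocatedStorage=20,             # GiB
    MasterUsername="admin",
    MasterUserPassword="REPLACE_WITH_A_STRONG_PASSWORD",
    MultiAZ=True,                    # launch a standby instance in a second Availability Zone
)

# Creation takes a few minutes; wait until the instance is available.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="employee-directory-db")
```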
Why Use RDS
- Simplified Database Management: Reduce the operational burden compared to managing your own database setup.
- Focus on Your Application: Spend less time on database administration and more on building your product.
- Amazon RDS is a service
that makes it easier for you to set up, operate, and
scale a relational database. Instead of telling you about RDS, I am going to show you. As you can see, I’m already in the Amazon RDS dashboard. We are going to create
the relational database our employee directory application can use to store employee data. First, we will click Create database, and then we are going to
select the Easy create option, which gives us the ability to accept the standard best practices for backups and high availability. You could select Standard create if you wanted more granular control to pick and choose the different features of your database setup. Next, you choose the database engine. You can see what is currently
supported at the time of this filming for
database engines on RDS. The common engines out there are MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, and then there’s this one, Amazon Aurora. Amazon Aurora is an AWS-specific database that was built to take advantage of the scalability and
durability of the AWS Cloud. Aurora is designed to
be drop-in compatible with MySQL or PostgreSQL. It can be up to five times faster than the standard MySQL databases and three times faster than
standard PostgreSQL databases. So if you have some use cases
that require large amounts of data to be stored with
high availability, durability, and low latency for data retrieval time, consider using Amazon
Aurora over a standard MySQL or PostgreSQL RDS instance. In our case, we really only need a simple database without any high performance
or large storage requirements. So I’m going to select a
standard MySQL instance. Next up, we will choose the
database instance size and type. This database instance is similar to how we choose an EC2
instance size and type. Since this is just a demo, I’m going to select a free
tier eligible instance. Now we’ll give this database a name and assign the database
and admin user and password that would be used to
connect to the database. Then we will accept the rest
of the Easy create defaults and we are done with
this instance creation. You can see that the instance is in the process of booting up, and that will take a
few minutes to complete. So in the meantime, let’s talk about high
availability and RDS. When you create an RDS DB instance, it gets placed inside of
a subnet inside of a VPC, very similar to an EC2 instance. As you learned already
in a previous lesson, subnets are bound to one AZ, and as a best practice
for production workloads, we recommend that you always
replicate your solutions across at least two AZs
for high availability. With RDS, one DB instance belongs to
one subnet inside of one AZ, so that isn’t meeting the
criteria for best practices. Now, before you get worried about managing this all on your own, just know that you can
easily configure RDS to launch a secondary DB
instance in another subnet and another AZ using
RDS Multi-AZ deployment. RDS will manage all of
the data replication between the two instances
so that they stay in sync. The other cool thing about
RDS Multi-AZ deployments is that RDS also manages the
failover for the instances. One instance is the primary and the other is the secondary database. Your app connects to one endpoint. If the primary instance goes down, the secondary instance gets promoted. The endpoint doesn’t change,
so no code change is needed. All of the failover
happens behind the scenes and is handled by RDS. All you do need to do is to make sure that your app can
reconnect to the database if it experiences a momentary outage by updating any cache DNS lookups and reconnecting to the endpoint which now connects to
the secondary instance. Pretty cool, if you ask me. All right, and we’re back in the console and we can see that our
instance is up and running. At this point, you can connect to the database instance and load your database
schema onto it ready to go and much, much simpler than trying to install and manage
this all on your own. Using services like RDS make operating databases
significantly more accessible and lowers the operational overhead that comes along with
relational database management.
Reading 3.6: Amazon Relational Database Service
Video: Purpose Built Databases on AWS
The Problem with One-Size-Fits-All:
- Relational databases (like RDS) are powerful, but they can be overkill for simple use cases, adding unnecessary complexity and cost.
AWS’s Purpose-Built Approach:
- AWS offers a wide range of databases optimized for specific needs. This allows you to choose the ideal fit for your application, avoiding wasted resources and complexity.
Example: Employee Directory
- A simple key-value lookup is better served by DynamoDB (NoSQL, usage-based pricing) than RDS for this specific use case.
Other Use Cases, Other Solutions:
- Content Management: Amazon DocumentDB
- Social Networks/Graphs: Amazon Neptune
- Immutable Ledgers: Amazon QLDB
Key Takeaway:
AWS’s diverse database offerings let you focus on your application instead of managing complex database infrastructure. The goal is to pick the right tool for the job!
- Before we move on to
learning about Amazon DynamoDB, I want to touch on an
idea that’s important when you’re making architecture decisions for your AWS solutions, choosing the right database to fit your business requirements rather than forcing your data to fit a certain database choice. There is no one size fits all
database for all purposes. You should pick a database that
fits your specific use case, and with AWS, you have
multiple choices for databases. We covered Amazon RDS
and relational databases, and that was the default option for businesses for a long
time, but relational databases aren’t the best choice
for all business needs. AWS creates services to support
purpose-built databases, meaning that there are
many database services that AWS offers, and they each were built with a certain use case in mind, and therefore are optimized
for those use cases. Let’s think about the
Employee Directory app. We had originally decided
that we would use RDS for the database, but now after
thinking about it some more, RDS might not be the
best fit for our needs. All we are really doing
is storing one record in a table for each employee. There are no complex relationships
that need to be managed, and it’s essentially just a lookup table. Relational databases offer all sorts of features that are great for complex schemas and relationships, but those features add overhead that is unnecessarily complex for simple things like a lookup table. On top of that, the RDS option
we chose charges per hour of instance run time, so we will get charged for the
running instances regardless of whether we’re using it or not. Our employee directory application will have much higher
usage during the week and no usage on the weekends. Is there an AWS database offering that better fits our needs? Introducing Amazon DynamoDB. Amazon DynamoDB is a NoSQL
database that is great for storing key value
pairs or document data. This service works
great at a massive scale and provides millisecond latency. It also charges based on
the usage of the table and the amount of data that
you are reading from the table, not by the hour or by the second. This is a better option for our simple employee lookup table. Now, besides the employee directory app, there are other use cases that require databases of varying types. What if you are writing
an application that needs a full content management system? Neither RDS nor DynamoDB
would be the best solution. Luckily, AWS has quite a number
of other database offerings. For this use case, you might
look into Amazon DocumentDB. It’s great for content
management, catalogs, or user profiles. Let’s think of another use case. What if you had a social network
that you wanted to track? Keeping track of those
kind of social webs, figuring out who is connected to who can be difficult to manage in a traditional relational database. So you could use Amazon Neptune,
a graph database engineered for social networking and
recommendation engines, but it’s also good for use
cases like fraud detection, or perhaps you have a supply
chain that you have to track with assurances that nothing is lost, or
you have a banking system or financial records that
require 100% immutability. What you really need
is an immutable ledger, so perhaps Amazon QLDB, or Quantum Ledger
Database, is a better fit for this use case. It’s an immutable system of record where any entry
can never be removed, and therefore is great for industries that need to be audited for regulatory and compliance reasons. It can take a lot of experience and expertise
to operate databases at scale, and that’s why it’s so
beneficial to utilize one of the AWS database offerings. You don’t need to be an expert on running all of these
different types of databases. Instead, you can just use the
database service that is best for your use case and
focus on your application and providing value to your end users. You don’t need to build up a ton of in-house expertise to operate a highly scalable
immutable ledger database. You can just use Amazon QLDB instead. The key thing to understand
is AWS wants to make sure that you are using the
best tool for the job. Coming up next, we will
explore Amazon DynamoDB and get a look at more of the details.
Discussion Prompt: Discussion – Consider this Scenario
Reading
Consider this scenario: You are a cloud engineer who works at a company that uses Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon Elastic Block Store (Amazon EBS) volumes. The company is currently using unencrypted EBS volumes. You are tasked with migrating the data on these unencrypted EBS volumes to encrypted EBS volumes. What steps can you take to migrate the data to encrypted EBS volumes? (Hint: See Amazon EBS encryption in the AWS Documentation for more information).
Step-by-Step Migration
- Create Snapshots:
- Take a snapshot of each unencrypted EBS volume. This creates a point-in-time backup.
- Snapshots are stored in Amazon S3 for redundancy.
- Create Encrypted Snapshots:
- When creating copies of your existing snapshots, choose the option to encrypt them using a supported KMS (Key Management Service) key.
- Create Encrypted Volumes from Snapshots:
- From your encrypted snapshots, create new EBS volumes. These new volumes will inherit the encryption from the snapshot.
- Detach and Attach Volumes:
- Important: Downtime may be required unless you have specific hot-swap capabilities.
- Detach the original unencrypted EBS volumes from their EC2 instances.
- Attach the newly created encrypted volumes to the same instances.
- Mount and Validate:
- If your OS doesn’t automatically mount the new volume, do so manually.
- Verify data integrity. Test to ensure applications can access the data on the encrypted volumes.
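The snapshot-copy-restore flow above can be scripted end to end. A minimal sketch with boto3, assuming a single volume, a placeholder KMS key alias, and that the instance can tolerate the detach/attach downtime:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Placeholders: volume, instance, KMS key, AZ, and device name are examples only.
SOURCE_VOLUME_ID = "vol-0123456789abcdef0"
INSTANCE_ID = "i-0123456789abcdef0"
KMS_KEY_ID = "alias/my-ebs-key"
AZ = "us-west-2a"
DEVICE = "/dev/xvdf"

# 1. Snapshot the unencrypted volume (point-in-time backup).
snap = ec2.create_snapshot(VolumeId=SOURCE_VOLUME_ID, Description="Pre-encryption backup")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot with encryption enabled, using the chosen KMS key.
encrypted = ec2.copy_snapshot(
    SourceRegion="us-west-2",
    SourceSnapshotId=snap["SnapshotId"],
    Encrypted=True,
    KmsKeyId=KMS_KEY_ID,
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[encrypted["SnapshotId"]])

# 3. Create an encrypted volume from the encrypted snapshot.
new_vol = ec2.create_volume(SnapshotId=encrypted["SnapshotId"], AvailabilityZone=AZ)
ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol["VolumeId"]])

# 4. Swap the volumes (plan for downtime), then mount and validate on the instance.
ec2.detach_volume(VolumeId=SOURCE_VOLUME_ID, InstanceId=INSTANCE_ID)
ec2.get_waiter("volume_available").wait(VolumeIds=[SOURCE_VOLUME_ID])
ec2.attach_volume(VolumeId=new_vol["VolumeId"], InstanceId=INSTANCE_ID, Device=DEVICE)
```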
Security Considerations
- KMS: Choose an appropriate Key Management Service key for your encryption. This determines who can control and access the encryption keys.
- IAM Roles: Grant necessary permissions to your EC2 instance(s) to interact with both encrypted volumes and your chosen KMS key.
- Downtime: Plan for potential downtime during the volume swap if you cannot hot-swap. Consider creating a test environment first to gauge downtime impact.
- Snapshots: Remember to delete the unencrypted snapshots once migration is completely verified.
Additional Tips
- Automation: For large numbers of volumes, tools like AWS Systems Manager or scripting can automate the process.
- OS-Level Encryption: If you need more granular file-level encryption, consider using OS-level encryption tools in addition to EBS encryption.
- Volume Types: Ensure the selected encryption type is compatible with the EBS volume type (GP2, io1, etc.).
Important Note: Data on the original unencrypted EBS volume remains accessible until it’s overwritten. For maximum security, it’s best to securely wipe the unencrypted volumes after successful migration.
Video: Introduction to Amazon DynamoDB
What is DynamoDB?
- Serverless NoSQL Database: Amazon handles scaling and infrastructure, you focus on the data.
- Flexible Schema: Items in a table don’t need identical attributes, which is good for data that varies from item to item.
- High Performance: Built for speed (millisecond response) and massive scale.
Why Choose DynamoDB?
- Scalability and Speed: Handles huge workloads with reliably fast performance, unlike some traditional databases that struggle under pressure.
- Less Rigid Data: Great if you don’t have a strictly defined data structure or it changes frequently.
- Simple for Some Use Cases: While it’s not suited to complex queries across multiple tables, it excels at focused lookups within a single table.
Example from the Video
The employee lookup table was easily switched from a relational database (RDS) to DynamoDB due to:
- Simple Data Structure: Employee ID as unique key.
- Focused Use Case: Fast lookups, not complex analysis required.
Important Notes:
- DynamoDB is Purpose-Built: It’s not a one-size-fits-all solution for every database need.
- Read the Additional Material: Learn more about how DynamoDB works in depth.
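As a rough preview of what the demo below does through the console, the same Employees table can be created and exercised with boto3. The table name and "id" partition key match what the course application expects; the billing mode, Region, and sample item are assumptions for illustration.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-west-2")

# Create the table the app looks for: named "Employees", keyed on "id".
table = dynamodb.create_table(
    TableName="Employees",
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",   # pay for usage rather than provisioned capacity
)
table.wait_until_exists()

# Items do not need identical attributes; this record is purely illustrative.
table.put_item(Item={"id": "emp-001", "name": "Jane Doe", "location": "Seattle", "title": "Engineer"})

# A focused, key-based lookup: exactly the kind of query DynamoDB is built for.
print(table.get_item(Key={"id": "emp-001"})["Item"])
```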
- Let’s talk some more
about Amazon DynamoDB. At the most basic level,
DynamoDB is a database. It’s a serverless database, meaning that you don’t need to manage the underlying instances or
infrastructure powering it. With DynamoDB, you don’t create a database with tables that relate to each other like a relational database. Instead, you create standalone tables. A DynamoDB table is just a place where you can store and query your data. Data is organized into items
and items have attributes. If you have one item in your table or 2 million items in your table, DynamoDB manages the
underlying storage for you and you don’t need to worry about scaling the system
up or down for storage. DynamoDB stores data
redundantly across AZs and mirrors the data across
multiple drives for you under the hood, this lessens the burden of operating a highly available database. DynamoDB, beyond being massively scalable, is also highly performant. DynamoDB has a millisecond response time and when you have applications with potentially millions
of users, having scalability and reliable lightning fast
response times is important. Now, DynamoDB isn’t a
normal database in the sense that DynamoDB doesn’t
require a rigid schema or manage complex
relationships or constraints. Relational databases like
the MySQL database we created in an earlier lesson require that you have a well-defined schema in place that might consist of one or many tables that may or may not relate to each other. Relational databases work
great for a lot of use cases and have been the standard
type of database historically, however, these types
of rigid SQL databases can have performance and scaling
issues when under stress. The rigid schema also makes it so that you cannot have
variation in the types of data that you store in a single table, so it might not be the
best fit for a data set that is a little bit less rigid and is being accessed at a high rate. This is where NoSQL databases
like DynamoDB are handy. NoSQL databases have flexible schemas. With DynamoDB, you can
add or remove attributes from items in the table at any time. Not every item in the table has
to have the same attributes. This is great for data sets
that do have some variation from item to item. The types of queries you
can run on NoSQL databases tend to be simpler and focus
on a collection of items from one table, not queries
that span multiple tables. This along with other factors, including the way the
underlying system is designed, allow DynamoDB to be very
quick in response time and highly scalable. So things to remember, DynamoDB is NoSQL. It is purpose-built, meaning
it has specific use cases and isn’t the best fit for
every workload out there. Taking a look at our architecture, we modified it to use DynamoDB. We are going to need to
create an employee table for the app to write and read from. We will create this DynamoDB
table using the console. And you know what? Let’s get Seph out here to help us out. – Hello. – We changed our minds about using RDS and decided to change it over to DynamoDB. It’s only one table and it’s
essentially a lookup table for our employees. How hard do you think this
change is gonna be to make? – Well, our app is actually designed to use either RDS or DynamoDB
as the backend database so this won’t take long at all. Here, I’ll show you how. I’m in the console and will navigate to the DynamoDB service. From here, all you need to
do is create a new table. Tables in DynamoDB require you to designate certain attributes as keys that will make an item unique. We will select employee ID
as the unique identifier for the items in this table. Then we will just accept the
defaults and create the table. Our app was coded to look for a table specifically called employees so this actually should all work now that the table is created. To test it out let’s
head over to the website hosted on EC2 and try
to add a new employee. See, that was nice and easy. Now let’s go back into
DynamoDB and refresh the page and scan all the items in the table and boom, there it is. It’s really that simple for
a lookup table like this one. – All right, well, that was easy. Nice. Now, not every use case is this simple. In the reading after this video there will be more information
about how DynamoDB works so make sure to check that out.
Reading: Reading 3.8: Introduction to Amazon DynamoDB
Reading
What Is Amazon DynamoDB?
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database so that you don’t have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
With DynamoDB, you can create database tables that can store and retrieve any amount of data and serve any level of request traffic. You can scale up or scale down your tables’ throughput capacity without downtime or performance degradation. You can use the AWS Management Console to monitor resource utilization and performance metrics.
DynamoDB automatically spreads the data and traffic for your tables over a sufficient number of servers to handle your throughput and storage requirements, while maintaining consistent and fast performance. All of your data is stored on solid-state disks (SSDs) and is automatically replicated across multiple Availability Zones in an AWS Region, providing built-in high availability and data durability.
Core Components of Amazon DynamoDB
In DynamoDB, tables, items, and attributes are the core components that you work with. A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table and secondary indexes to provide more querying flexibility.
The following are the basic DynamoDB components:
Tables – Similar to other database systems, DynamoDB stores data in tables. A table is a collection of data. For example, see the example table called People that you could use to store personal contact information about friends, family, or anyone else of interest. You could also have a Cars table to store information about vehicles that people drive.
Items – Each table contains zero or more items. An item is a group of attributes that is uniquely identifiable among all of the other items. In a People table, each item represents a person. For a Cars table, each item represents one vehicle. Items in DynamoDB are similar in many ways to rows, records, or tuples in other database systems. In DynamoDB, there is no limit to the number of items you can store in a table.
Attributes – Each item is composed of one or more attributes. An attribute is a fundamental data element, something that does not need to be broken down any further. For example, an item in a People table contains attributes called PersonID, LastName, FirstName, and so on. For a Department table, an item might have attributes such as DepartmentID, Name, Manager, and so on. Attributes in DynamoDB are similar in many ways to fields or columns in other database systems.
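To make the table, item, and attribute terminology concrete, here is a small sketch of writing one item to the People table described above with boto3. It assumes a table keyed on PersonID already exists; the attribute values are invented for illustration.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
people = dynamodb.Table("People")   # assumes a People table keyed on PersonID already exists

# One item: a group of attributes uniquely identified by its primary key (PersonID).
people.put_item(
    Item={
        "PersonID": 101,        # primary key attribute
        "LastName": "Rivera",
        "FirstName": "Ana",
        "Phone": "555-0100",    # other items may omit this attribute or add different ones
    }
)
```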
Security with Amazon DynamoDB
DynamoDB also offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data. For more information, see DynamoDB Encryption at Rest.
Reading: Reading 3.9: Choose the Right AWS Database Service
Reading
AWS Database Services
AWS has a variety of different database options for different use cases. Use the table below to get a quick look at the AWS database portfolio.
| Database Type | Use Cases | AWS Service |
| --- | --- | --- |
| Relational | Traditional applications, ERP, CRM, e-commerce | Amazon RDS, Amazon Aurora, Amazon Redshift |
| Key-value | High-traffic web apps, e-commerce systems, gaming applications | Amazon DynamoDB |
| In-memory | Caching, session management, gaming leaderboards, geospatial applications | Amazon ElastiCache for Memcached, Amazon ElastiCache for Redis |
| Document | Content management, catalogs, user profiles | Amazon DocumentDB (with MongoDB compatibility) |
| Wide column | High-scale industrial apps for equipment maintenance, fleet management, and route optimization | Amazon Keyspaces (for Apache Cassandra) |
| Graph | Fraud detection, social networking, recommendation engines | Amazon Neptune |
| Time series | IoT applications, DevOps, industrial telemetry | Amazon Timestream |
| Ledger | Systems of record, supply chain, registrations, banking transactions | Amazon QLDB |
Breaking Up Applications and Databases
As the industry changes, applications and databases change too. Today, with larger applications, you no longer see just one database supporting it. Instead, these applications are being broken into smaller services, each with their own purpose-built database supporting it.
This shift removes the idea of a one-size-fits-all database and replaces it with a complementary database strategy. You can give each database the appropriate functionality, performance, and scale that the workload requires.
Week 3 Exercise & Assessment
Video: Introduction to Exercise 3
Objective: Set up the storage and database parts of an application to make it functional.
Steps:
- Amazon S3 (Storage)
- Create an S3 bucket.
- Set up a bucket policy to give your IAM role permissions to interact with the bucket.
- Practice uploading an object manually to get used to S3.
- Amazon DynamoDB (Database)
- Create a DynamoDB table to hold employee data.
- Application Testing
- Verify that your application can successfully read from and write data to both the S3 bucket and the DynamoDB table.
Troubleshooting Tip: If you encounter problems, carefully review previous steps to ensure everything was done correctly.
- Up until this point, you’ve practiced how to
launch an EC2 instance, but you couldn’t use the application yet because the storage and database
pieces weren’t set up yet. In this next lab you’ll add the storage and database components and test that the application is working. To do that, you will
create an Amazon S3 bucket and create a bucket policy that will allow the IAM role to work with
the objects in the bucket. Then you will upload an object to the bucket manually
to get familiar with S3. After that, you will create
an Amazon DynamoDB table, which will be used to
store employee information. Once the bucket and table
are created, you will test that the application
can read and write data. That is it for this lab. As usual, if you get stuck, try going back a few
steps in the instructions and make sure that you
didn’t miss anything.
Lab 3: Configuring a Web Application
Video: Demo Creating an Amazon DynamoDB Table
Database Setup for the Employee Directory Application
- Launch a New EC2 Instance:
- Clone an existing instance configuration for convenience (“employee-directory-app-dynamodb”).
- Ensure a public IP is assigned to the instance for accessibility.
- Verify the application is running on the new instance before proceeding.
- Create a DynamoDB Table:
- Name the table “Employees” for compatibility with the app.
- Set the partition key as “id” (again, aligns with app’s structure).
- Leave default settings and create the table.
- Test the Application with the Database:
- Add a new employee record to the directory, including a photo.
- Verify that the photo is uploaded correctly to the S3 bucket.
- Check that the new record appears in the DynamoDB table with all the correct details.
- Wrap Up:
- Stop the EC2 instance to avoid unnecessary charges while the database remains active.
Key Points
- The instructor emphasizes the importance of testing each step to ensure everything is connected properly.
- DynamoDB is used as the database solution in this example.
- The application is designed to interact with specific table and key names, highlighting the importance of coordination between the app and database setup.
- [Instructor] Welcome to our exercise on setting up the database
for our application. So we’ve already created
and modified the application and have created and tested
the uploading to the S3 bucket. So to get the application
fully functional, or at a fully functional stage, the next thing to do is
to launch a database. But before I launch the database, I want to launch an
instance that’s ready to use that database so that we can
test it as soon as it’s done. So to launch that instance,
I’m going to go over to EC2. And go over to the instances. And just like with the last time, I’m going to use the shortcut
where I launch a clone of an instance that I
have already launched. So to do that, I’m going to select my
employee-directory-app-s3, because that’s the most updated
version of this application. And then I’m going to go over to actions, image and templates. And then, I’m going to
launch more like this. Now that I’m in that launch
page, what I want to do, just so that I know that I
am on the correct instance and that I’m using the correct instance. Instead of appending -s3 to this, I’m going to append -dynamodb so that I know that this
is the application instance that is testing with the database and not just connecting to the bucket. So now that I’ve adjusted
that, I can scroll down. I see that most of my
settings are still there. I want to make sure that
I’m still using the same key in case I do need to access the instance. And I also want to just verify, even though I know that this works, I just want to verify that
I am still using the role. One other thing that I want to
make sure to do is to ensure that the instance
launches with a public IP. And I scrolled just right over this. And so scrolling back, I
go to the network settings and auto-assigned public IP, and just make sure that
I click Enable on that. And that way the instance
will have a public IP, and I can access it and test it once the instance and the
database are all fully launched. So now that I’ve done that, I can see that my user data is
exactly where I left it after adding the bucket. And now I can click Launch instance. As always, give that some time to launch. And I can just go over to the instances and occasionally refresh
this in order to make sure that my instance has launched. So I’m going to give this a few minutes while waiting for the instance
to launch and making sure that where it currently
says, “Initializing,” it will say, “2/2 checks passed.” Now that it’s been a few minutes, I’ll go ahead and refresh the page again. And I can see that two of
two checks have passed. So I’ll go ahead and select
my dynamodb labeled instance and copy its public IP address, just because I want to make sure that the application is up
and running before I proceed. If this weren’t running,
as we can see it is, if it wasn’t, then I would go back and see where I made a mistake
in launching the instance. And once the application
was successfully running, then I would, I would move forward with creating the database. And so since I can see that the base employee
directory application is up and running, I can go ahead and proceed with creating the DynamoDB table. In order to do that, I’m going
to go up to the search bar and type in dynamo and click
on the DynamoDB service. And once I’m here, I’m
just going to go ahead and click Create table. Since I don’t have any
tables currently running, it’s just the easiest way to get to this table creation screen. For the table name, I’m
going to put Employees because the application is set up to work with a database named Employees. So this will make it very easy
for it to just utilize this. And then for my partition
key, I’m going to put id. And this is because again,
the application is set up to utilize this for
organization within the table. And then, I’m going to
keep this type as a string. After I’ve done that, all of the default settings can remain, and I’ll just go ahead
and click Create table. And that takes just a little bit of time, and the table will be
created in just a bit. So I’ll just give it a
couple of seconds here. So that took almost no time at all, and the table has
successfully been created. So now that the table is created, instead of adding items
directly to this table, what I want to do is test
the application again. So I will make sure I copy my
instance’s public IP address. So I’ll go over to my instances and select the instance that was launched for this and copy its IP address. And then, in a new tab I
will paste that address. And I can see that the
Employee Directory app is still running, and it is
currently an empty directory. But I can go ahead and add
an employee to this directory just to make sure that
everything is connected. So what I’ll do is I’ll click Add. And I’ll go ahead and put my name here, and then my location, and my job title. I’ll also just select a
couple of these options here, just so that we can see what it looks like as everything is added to this table and added to this directory. So since I am a Seattle resident, I am also very much a Seattle fan, and I’m very definitely a coffee snob. I will also not just put this
information into the table, but I will add a file. I’ll add my employee
photo and open that up. And then once that has all been added, I can go ahead and click Save. And as we can see, I now have an entry in the Employee Directory. So it looks, it looks good here. And we can see that I
am now in the directory. But I want to show that
this isn’t just something that was added to this location. So, what I will do is show
that these items were added in the table and in the S3 bucket. So starting with S3, I’ll go over to S3. And then, I will open up the
bucket that I had created. And as we can see, my
employee pic has been added, and it’s the employee
picture that I uploaded. Even though it has its
own designated name here, this is the object that was uploaded through that application. And then I can also go over to DynamoDB and can view my table, and view my employee table, and explore the table items. And I can also see that my employee items that were added through
the directory are here. And so it shows which badges
I associated with myself, my name, my job title, my location. It shows the name of the object that was uploaded through the directory, and it shows the ID
specifically for this table, and the partition key that we established. So now that all of that has been done, what I want to do is just go ahead and keep the table running. But I want to go back to
EC2 and stop the instance so that I’m not accruing
any additional charges as I prepare to move on to the next stage of this application
infrastructure development. So I’ll go ahead and stop that instance, and that’s where I’ll go ahead and close this walkthrough out. And I’ll see you in the next one.
Quiz: Week 3 Quiz
What is a typical use case for Amazon Simple Storage Service (Amazon S3)?
Object storage for media hosting
Amazon S3 is an object storage service that is designed for large objects, such as media files. Because users can store unlimited objects, and the size of each individual object can be up to 5 TB, Amazon S3 is a good location to host video, photo, or music uploads. For more information, see the Object Storage with Amazon S3 video.
Gemini
The best answer is Object storage for media hosting. Here’s why:
- Amazon S3: Key Features
- Object storage: Designed for storing files as individual objects, not within a traditional file system hierarchy.
- High scalability, durability, and availability: Ideal for large amounts of data that need to be reliably accessible.
- Frequently used for: website content, images, videos, backups, data lakes.
Why the other options are less suitable:
- Object storage for a boot drive: You need block storage (like Amazon EBS) for operating systems to boot from.
- Block storage for an Amazon Elastic Compute Cloud (Amazon EC2) instance: Amazon EBS is designed for this. S3 is not directly mountable like a drive.
- File storage for multiple Amazon Elastic Compute Cloud (Amazon EC2) instances: Amazon EFS (Elastic File System) provides a shared file system that multiple EC2 instances can access concurrently.
Let me know if you’d like to explore the differences between object, block, and file storage in more detail!
ChatGPT
A typical use case for Amazon Simple Storage Service (Amazon S3) is:
- Object storage for media hosting
Amazon S3 is commonly used for storing and serving media files such as images, videos, audio files, and other multimedia content. It provides scalable and durable storage with high availability, making it suitable for hosting large volumes of media files accessed by web applications, mobile apps, and other digital platforms.
A company needs a storage layer for a high-transaction relational database on an Amazon Elastic Compute Cloud (Amazon EC2) instance. Which service should the company use?
Amazon Elastic Block Store (Amazon EBS)
Amazon EBS works well for a high-transaction database storage layer. For more information, see the Amazon EC2 Instance Storage and Amazon Elastic Block Store video.
Gemini
The correct answer is Amazon Elastic Block Store (Amazon EBS). Here’s why:
- High-transaction relational database needs:
- Fast, low-latency access to data, which block storage provides.
- Data persistence – data needs to be preserved even if the EC2 instance is rebooted.
- Why EBS is the best fit:
- Block storage: Appears as a directly-attached drive to the EC2 instance, offering the performance databases need.
- Variety of volume types: Can choose from provisioned IOPS SSDs for the most demanding workloads to general-purpose SSDs.
- Persistent: Data is saved independently of the EC2 instance’s lifecycle.
- Why the others aren’t ideal:
- Amazon EC2 Instance Store: Temporary storage lost if the instance stops, not suitable for databases.
- Amazon Simple Storage Service (Amazon S3): Object storage, generally slower access than block storage, better for backups or static data.
- Amazon Elastic File System (Amazon EFS): Network-based file storage, good for shared access across instances, but higher latency isn’t ideal for most databases.
Let me know if you want a deeper explanation of the different AWS storage options!
ChatGPT
The appropriate service for storing the data for a high-transaction relational database on an Amazon EC2 instance would be:
- Amazon Elastic Block Store (Amazon EBS)
Amazon EBS provides block-level storage volumes that can be attached to Amazon EC2 instances. It is specifically designed for use with EC2 instances and is suitable for databases that require low-latency and consistent performance. EBS volumes offer features such as snapshot backups, encryption, and high availability, making them well-suited for hosting databases in the AWS cloud.
True or False: Amazon Elastic Block Store (Amazon EBS) volumes are considered ephemeral storage.
False
Amazon EBS provides persistent storage. If the Amazon Elastic Compute Cloud (Amazon EC2) instance is stopped or terminated, data that is attached to the EC2 instance will remain on an associated EBS volume permanently. For more information, see the Amazon EC2 Instance Storage and Amazon Elastic Block Store reading.
Gemini
False. Amazon Elastic Block Store (Amazon EBS) volumes are considered persistent storage. Here’s the difference:
- Ephemeral Storage (e.g., Instance Store):
- Temporary storage tied to the lifecycle of the EC2 instance.
- Data is lost when the instance stops or terminates.
- Persistent Storage (e.g., Amazon EBS):
- Exists independently of the EC2 instance.
- Data survives even if the instance is stopped, rebooted, or terminated.
Why this matters: EBS is suitable for databases, applications requiring long-term data storage, and any data that needs to persist beyond the life of a single EC2 instance.
ChatGPT
False
A solutions architect is working for a healthcare facility, and they are tasked with storing 7 years of patient information that is rarely accessed. The facility’s IT manager asks the solutions architect to consider one of the Amazon Simple Storage Service (Amazon S3) storage tiers to store the patient information. Which storage tier should the solutions architect suggest?
Amazon S3 Glacier Deep Archive
Amazon S3 Glacier Deep Archive is the lowest-cost storage class in Amazon S3. This storage class supports long-term retention and digital preservation for data that might be accessed once or twice in a year. It is designed for customers—particularly those in highly regulated industries, such as financial services, healthcare, and the public sector—that retain data sets for 7 to 10 years (or longer) to meet regulatory compliance requirements. For more information, see the Object storage with Amazon S3 reading.
Gemini
The best answer is Amazon S3 Glacier Deep Archive. Here’s why:
- Key Considerations:
- Infrequent Access: Patient data is rarely needed after the initial storage period.
- Long-term Retention: Healthcare regulations require retaining records for 7 years or more.
- Cost-Effectiveness: Prioritizing low cost for long-term archival.
- Why S3 Glacier Deep Archive is the best fit:
- Lowest cost among S3 tiers, specifically designed for long-term, rarely accessed data.
- Meets regulatory requirements with secure and durable storage.
- Retrieval Flexibility: Although retrieval takes longer (hours), it’s suitable for infrequent historical record access.
- Why other options are less ideal:
- S3 Standard: More costly for infrequently used data. Intended for frequently accessed data.
- S3 Standard-IA: Still more expensive than Glacier Deep Archive; better suited to infrequently accessed data that needs rapid retrieval when it is requested.
- S3 Intelligent-Tiering: Automatic tiering adds overhead costs, beneficial when access patterns are unknown.
Let me know if you want a more detailed comparison of the S3 storage tiers!
ChatGPT
The solutions architect should suggest using Amazon S3 Glacier Deep Archive for storing the patient information, as it is suitable for data that is rarely accessed and needs to be retained for long periods of time.
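If the records land in a standard S3 storage class first, a lifecycle rule can move them to S3 Glacier Deep Archive automatically. A minimal sketch, assuming a placeholder bucket name and a 90-day transition window chosen only for illustration:

```python
import boto3

s3 = boto3.client("s3")

# Placeholders: bucket name and the 90-day transition window are examples only.
s3.put_bucket_lifecycle_configuration(
    Bucket="patient-records-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-patient-records",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # apply the rule to every object in the bucket
                "Transitions": [
                    {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```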
True or False: Object storage is the best storage solution for applications that need to frequently update specific small sections of a file.
False
Users can update only the entire file in object storage. To update specific sections of a file, we recommend using block storage. For more information, see Storage Types on AWS.
Gemini
False. Here’s why object storage isn’t ideal for frequent, small file updates:
- Object Immutability: In object storage, files are treated as whole objects. You can’t modify a portion of a file; you’d need to overwrite the entire object.
- Implications:
- Inefficient for frequently updating small sections of large files.
- Leads to increased network traffic and storage costs as new versions are written.
Better Alternatives:
- Block Storage: Allows modification of specific blocks within a file, making it more efficient for workloads with frequent small updates (like databases).
- File Storage: Offers traditional file system structure where files can be directly edited in place.
Caveat: Some object storage systems might offer versioning features to mitigate the limitations, but it may not be as performant as block or file storage for this use case.
ChatGPT
False
True or False: A Multi-AZ deployment is beneficial when users want to increase the availability of their database.
True
Placing a workload across multiple Availability Zones increases the availability of resources. For example, say that an environmental hazard in an Availability Zone causes an Amazon Aurora database to stop working. In this case, a read-replica of the Aurora database instance that is in an unaffected Availability Zone will automatically be promoted to a primary database instance. For more information, see Amazon Relational Database Service.
Gemini
True! Here’s why a Multi-AZ (Availability Zone) deployment increases database availability:
- Availability Zones (AZs): Physically separate data centers within an AWS region, with independent power, cooling, and networking infrastructure.
- Multi-AZ Redundancy: By replicating your database across multiple AZs, even if one AZ experiences an outage, your database remains accessible from the other AZs.
- High Availability: This setup minimizes downtime and ensures your application can continue serving users even during disruptions.
- Fault Tolerance: Your database becomes more resilient to localized failures.
Let me know if you’d like to explore other benefits of Multi-AZ deployments or how to set one up!
ChatGPT
True
Which task of running and operating the database are users responsible for when they use Amazon Relational Database Service (Amazon RDS)?
Optimizing the database
With Amazon RDS, users are no longer responsible for the underlying environment that the database runs on. Instead, users can focus on optimizing the database because Amazon RDS has components that AWS manages. For more information, see Explore Databases on AWS.
Gemini
The correct answer is Optimizing the database. Here’s why:
- Amazon RDS as a Managed Service: Amazon RDS significantly reduces the operational burden for users by automating many tasks like:
- Provisioning and managing infrastructure: AWS handles the servers, storage, networking, etc.
- Installing the database software: RDS sets up and configures the chosen database engine.
- Applying OS patches: AWS manages security updates for the underlying operating system.
- User Responsibility: Users still need to focus on database-related optimizations such as:
- Schema design: Efficient table structures, indexes, etc.
- Query optimization: Ensuring queries perform well.
- Data management: Archiving old data, managing backups within RDS’s tools.
Let me know if you’d like a breakdown of other tasks that are handled by AWS vs. the user in RDS!
ChatGPT
Users are responsible for optimizing the database.
Which of the following are common use cases for file storage? (Choose TWO.)
User home directories
User home directories are an example of file storage that uses a hierarchical system to store and organize data. For more information, see Reading: Storage types on AWS.
Large content repositories
Large content repositories are an example of file storage. They use a hierarchical system to store and organize data. For more information, see Reading: Storage types on AWS.
Gemini
Here are the two correct choices, along with why they are common use cases for file storage:
- User home directories:
- File storage systems provide a centralized location to store user documents, settings, and personal files, offering easy access from multiple devices.
- Large content repositories:
- File storage easily scales to accommodate massive amounts of unstructured data like images, videos, audio files, and documents.
Why the other options aren’t the best fit:
- Backup files that are stored in Amazon Simple Storage Service (Amazon S3): S3 is object storage, not traditional file storage.
- Relational or non-relational databases: Databases typically use block storage for better performance and data consistency.
- Big data analytics: While file storage can be a component of big data systems, the data is often processed in object storage (like S3) or specialized data lake solutions.
Let me know if you want a deeper explanation of the different storage types!
ChatGPT
The common use cases for file storage are:
- User home directories
- Large content repositories
True or False: The IT department in a company can attach Amazon Elastic Block Store (Amazon EBS) volumes to Amazon Simple Storage Service (Amazon S3) to store data in a bucket.
False
The IT department cannot attach EBS volumes to Amazon S3. Instead, Amazon EBS can only be attached to Amazon Elastic Compute Cloud (Amazon EC2) instances. For more information, see Choose the right storage service.
Gemini
False. Here’s the breakdown:
- Amazon EBS (Elastic Block Store): Provides block-level storage that you attach directly to EC2 instances. It’s ideal for applications that need high-performance, low-latency storage like databases.
- Amazon S3 (Simple Storage Service): Provides object-based storage designed for storing files as objects. It’s great for storing backups, static content (images, videos), and large data files.
Key Difference: You cannot directly attach EBS volumes to S3. They are fundamentally different storage types with different use cases.
How to transfer data between them: If you need to move data between EBS and S3, you’d use tools or scripts to copy the data from the EBS volume to an S3 bucket.
ChatGPT
False
Which of the following instance families does Amazon Relational Database Service (Amazon RDS) support? (Choose TWO.)
General purpose
Amazon RDS supports general-purpose instances. For more information, see Reading: Amazon Relational Database Service.
Memory optimized
Amazon RDS supports memory optimized instances for memory-intensive applications. For more information, see Reading: Amazon Relational Database Service.
Gemini
The correct answers are:
- General purpose
- Memory optimized
Amazon RDS offers a variety of instance families to meet different database workload needs. Let’s break down why the others aren’t the best fit:
- Storage optimized: While important for databases, RDS doesn’t explicitly have an instance class with this name. Storage is managed separately with different EBS volume types.
- Compute optimized: Some database workloads are compute-heavy, but RDS instances are often a balance of compute, memory, and I/O.
- Accelerated computing: Useful in specific use cases (like graphics-intensive workloads), but less common as a primary choice for most RDS databases.
Typical RDS Instance Families:
- General Purpose (M classes): A good starting point, providing a balance of compute, memory, and networking resources.
- Memory Optimized (R classes): Designed for workloads requiring large amounts of in-memory caching for faster query performance.
Let me know if you want a deeper dive into choosing RDS instance families!
ChatGPT
Amazon Relational Database Service (Amazon RDS) supports the following instance families:
- General purpose
- Memory optimized
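For illustration, the hedged boto3 sketch below creates one RDS instance on a general purpose class and notes the memory optimized alternative in a comment. The identifier, credentials, engine, and sizes are placeholder assumptions, not recommendations.

```python
import boto3

rds = boto3.client("rds")

# Minimal sketch (placeholder identifier, credentials, and sizes).
rds.create_db_instance(
    DBInstanceIdentifier="employee-directory-db",
    Engine="mysql",
    # General purpose class (balanced compute/memory/network):
    DBInstanceClass="db.m6g.large",
    # For a memory-intensive workload, a memory optimized class could be
    # used instead, e.g. DBInstanceClass="db.r6g.large".
    MasterUsername="admin",
    MasterUserPassword="change-me-please",  # use Secrets Manager in practice
    AllocatedStorage=20,                     # GiB of EBS storage for the instance
)
```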
A solutions architect is working for a small business. The business is looking for a storage service that temporarily stores frequently changing and non-persistent data. This type of data can be deleted during instance stops or terminations. Which service should the solutions architect recommend for this use case?
Amazon Elastic Compute Cloud (Amazon EC2) Instance Store
Amazon EC2 Instance Store provides ephemeral block storage that is well suited for the temporary storage of information, such as buffers, caches, and scratch data. For more information, see Reading: Choose the right storage service.
Gemini
The correct answer is Amazon Elastic Compute Cloud (Amazon EC2) Instance Store. Here’s why:
- Temporary & Non-Persistent: Instance store volumes are directly attached to EC2 instances and their data is lost when the instance stops/terminates. This aligns perfectly with the described need.
- Frequent Changes: Instance store typically offers high-performance local storage, making it suitable for workloads with frequently changing data.
Why other options are less ideal:
- Amazon Elastic Block Store (Amazon EBS): Persistent storage. Data survives instance restarts, not suitable for temporary data.
- Amazon Simple Storage Service (Amazon S3): Object storage, generally slower access than EC2 Instance Store. Better for static files or data that needs to persist.
- Amazon Elastic File System (Amazon EFS): Network file system, ideal for shared data between multiple EC2 instances. It has higher latency and is not optimal for temporary data tied to one instance.
Let me know if you’d like more details on the differences between these storage types!
ChatGPT
For frequently changing and non-persistent data that can be deleted during instance stops or terminations, the solutions architect should recommend Amazon Elastic Compute Cloud (Amazon EC2) Instance Store.
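Instance store comes with the instance type rather than being attached separately. The sketch below (placeholder AMI ID; i3.large assumed as an instance type that includes local NVMe instance store) shows how such an instance might be launched with boto3.

```python
import boto3

ec2 = boto3.client("ec2")

# Minimal sketch (assumptions: placeholder AMI ID; i3.large includes a local
# NVMe instance store volume by virtue of its instance type).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="i3.large",          # instance family with instance store
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched:", instance_id)

# Anything written to the instance store volume (once formatted and mounted on
# the instance) is lost when the instance stops or terminates, which matches
# the temporary, non-persistent behavior described above.
```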
Which database is a non-relational database that stores data in key-value pairs, and is a good fit for hosting simple lookup tables?
Amazon DynamoDB
DynamoDB is a database that uses the key-value data model for storing simple data. For more information, see Purpose Built Databases on AWS.
Gemini
The correct answer is Amazon DynamoDB. Here’s why:
- Non-Relational, Key-Value: DynamoDB is designed for fast, predictable performance with simple lookup operations based on a unique key.
- Simple Lookup Tables: This use case directly aligns with the strengths of a key-value database. You can quickly retrieve values associated with specific keys without complex queries.
Why the others aren’t as ideal:
- Amazon DocumentDB: A document-oriented database, better suited for storing more complex JSON-like data structures.
- Amazon Neptune: A graph database designed for handling highly connected data and complex relationship queries.
- Amazon Relational Database Service (Amazon RDS): Provides managed services for traditional relational databases (MySQL, PostgreSQL, etc.), which are not as efficient for simple key-value lookups.
Let me know if you want to learn more about different types of non-relational databases!
ChatGPT
The database that stores data in key-value pairs and is a good fit for hosting simple lookup tables is Amazon DynamoDB.
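As a concrete example of a simple lookup table, the boto3 sketch below creates a key-value DynamoDB table and retrieves one item by its key. The table name, key name, and item are placeholder assumptions.

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Minimal sketch (placeholder table and attribute names).
table = dynamodb.create_table(
    TableName="FeatureFlags",
    KeySchema=[{"AttributeName": "FlagName", "KeyType": "HASH"}],  # partition key
    AttributeDefinitions=[{"AttributeName": "FlagName", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

table.put_item(Item={"FlagName": "dark-mode", "Enabled": True})

# Simple key-value lookup: fetch one item by its key, no joins or complex queries.
item = table.get_item(Key={"FlagName": "dark-mode"}).get("Item")
print(item)
```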
Which core component of Amazon DynamoDB corresponds to a column in a relational database table?
Attribute
In DynamoDB, an attribute is a fundamental data element. It is something that does not need to be broken down any further. For more information, see Reading: Introduction to Amazon DynamoDB.
Gemini
The correct answer is Attribute. Here’s how the concepts map to each other:
DynamoDB
- Table: The top-level container for data items.
- Item: A collection of attributes, similar to a row in a relational database.
- Attribute: A named data element within an item. It holds a single value and is the equivalent of a column in a relational table.
Relational Database
- Database: A collection of tables.
- Table: A structured dataset with rows and columns.
- Column: Defines a data field that all rows in the table share.
Let me know if you’d like a diagram to visualize this relationship; it can be helpful!
ChatGPT
In Amazon DynamoDB, the core component that corresponds to a column in a relational database table is called an Attribute.
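To make the mapping concrete, the hedged sketch below writes one item to an assumed, pre-existing "Employees" table: the item plays the role of a row, and each named attribute plays the role of a column value. All names and values are placeholders.

```python
import boto3

# Minimal sketch (assumes an "Employees" table with partition key "EmployeeId"
# already exists; names and values are placeholders).
table = boto3.resource("dynamodb").Table("Employees")

# One item ~ one row; each key in the dictionary is an attribute ~ a column.
table.put_item(
    Item={
        "EmployeeId": "e-1001",   # partition key attribute
        "Name": "Jane Doe",       # attribute ("column") holding a string
        "Department": "Finance",  # another attribute
        "YearsOfService": 4,      # attributes can vary from item to item
    }
)
```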
Which AWS database service is best suited for use cases such as social networking or recommendation engines?
Amazon Neptune
Amazon Neptune is a fast, reliable, fully managed graph database service that is designed for fraud detection, social networking, recommendation engines, and more. For more information, see Reading: Choose the right AWS database service.
Gemini
The best answer is Amazon Neptune. Here’s why:
- Graph Database: Neptune is specifically designed to store and query highly connected graph-like data.
- Social Networking & Recommendations: These scenarios rely on understanding relationships between people, interests, products, etc. Graph databases excel at traversing these complex connections efficiently.
Why the others are less ideal:
- Amazon DynamoDB: Key-value database, better for simple lookups, not ideal for complex relationship analysis.
- Amazon Aurora: Optimized for relational databases (MySQL/PostgreSQL compatible), works well with structured data but less efficient at handling graph-like relationships.
- Amazon Redshift: Data warehouse focused on analytics across large datasets, not primarily for real-time graph queries.
Let me know if you’d like a deeper dive into why graph databases are great for social networking and recommendation systems!
ChatGPT
For use cases such as social networking or recommendation engines, the best-suited AWS database service is Amazon Neptune.
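To show why graph traversal suits these use cases, here is a minimal sketch of a "people you may know" style query against Neptune using the gremlinpython client. The endpoint, vertex label ("person"), and edge label ("follows") are placeholder assumptions about the graph model, not a prescribed schema.

```python
# Minimal sketch of a friends-of-friends recommendation traversal.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import P

conn = DriverRemoteConnection(
    "wss://your-neptune-endpoint:8182/gremlin", "g"  # placeholder endpoint
)
g = traversal().withRemote(conn)

# People followed by the people Alice follows, excluding those she already follows.
suggestions = (
    g.V().has("person", "name", "alice")
     .out("follows").aggregate("direct")
     .out("follows")
     .where(P.without("direct"))
     .dedup()
     .values("name")
     .toList()
)
print(suggestions)
conn.close()
```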