AWS Storage Series: S3

Amazon Web Services has a considerable number of storage services that cover different use cases. Initially, I was thinking of consolidating all of that information into a single blog post, but that wouldn’t do justice to the services. Hence, this series on AWS Storage Services. Throughout the series, we will focus on each of the following services that AWS offers:

  1. Amazon Simple Storage Service (S3)
  2. Amazon Glacier
  3. Amazon Elastic File System (EFS)
  4. Amazon Elastic Block Store (EBS)
  5. Amazon Storage Gateway
  6. Amazon Snowball

So, let’s start with Amazon Simple Storage Service (S3). According to Amazon’s website, S3 is storage for the internet. S3 is basically an object store with unlimited capacity: AWS monitors the available capacity in its data centers and keeps adding storage without any downtime. Users can store any number of files in S3; the only limitation is that an individual object cannot be larger than 5 TB. S3 is a flat file system and it supports all file types. The one thing you cannot do with S3 is install an Operating System or run a Database off of it; for those use cases, you can use AWS EBS storage.

Files are stored in S3 buckets. Buckets are nothing but directories in the cloud, and with key prefixes you can recreate the same hierarchical folder structure that you use today on your laptop. Bucket names are globally unique across all AWS Regions, so if you want to create a bucket called “testbucket”, chances are that the name is already taken by some other user.
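
To make this concrete, here is a minimal sketch of creating a bucket and uploading a file with boto3, the AWS SDK for Python. The bucket name and file are hypothetical, and your AWS credentials are assumed to be configured:

```python
import boto3

s3 = boto3.client("s3")

# Bucket names are globally unique, so this hypothetical name may already
# be taken. Outside us-east-1 you would also pass a
# CreateBucketConfiguration with a LocationConstraint for your region.
s3.create_bucket(Bucket="my-unique-testbucket-2017")

# The "documents/" prefix in the key simulates a folder hierarchy
# on top of S3's flat namespace.
s3.upload_file("report.pdf", "my-unique-testbucket-2017", "documents/report.pdf")
```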

S3 offers the following storage classes to the users:

  1. S3 – Standard: This is the standard offering, which gives you 99.99% (4 x 9s) availability and 99.999999999% (11 x 9s) durability. Every object stored in S3 – Standard is replicated across multiple devices in multiple facilities, and given the 11 x 9s durability, your data can survive the concurrent failure of two entire AWS facilities.
  2. S3 – Infrequently Accessed (IA): Use S3 – IA when your data needs the same 11 x 9s durability but is read less often; availability is slightly lower at 99.9%, and in exchange the per-GB storage price is cheaper than S3 – Standard. The one caveat is that you are charged a retrieval fee every time you read the data. Hence the name, Infrequently Accessed.
  3. S3 – Reduced Redundancy Storage (RRS): S3 RRS is the third alternative for storing data in S3. It offers 99.99% availability and 99.99% durability, which means it cannot survive two concurrent facility failures. AWS advises storing only easily reproducible data in S3 RRS, e.g., thumbnails.
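
The storage class is simply a parameter on the upload call. A minimal boto3 sketch, with hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-unique-testbucket-2017"  # hypothetical bucket from earlier

# Three objects, three storage classes, one bucket.
s3.put_object(Bucket=bucket, Key="critical/data.bin",
              Body=b"...", StorageClass="STANDARD")
s3.put_object(Bucket=bucket, Key="archive/old-logs.gz",
              Body=b"...", StorageClass="STANDARD_IA")
s3.put_object(Bucket=bucket, Key="thumbnails/cat.jpg",
              Body=b"...", StorageClass="REDUCED_REDUNDANCY")
```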

Two objects in the same bucket can thus belong to different S3 storage classes, as the sketch above shows. S3 also offers the following two consistency models (illustrated in the sketch after the list):

  1. Read-after-write consistency for PUTs of new objects: when you upload a new file to S3, you can immediately read that file back without any wait.
  2. Eventual consistency for overwrite PUTs and DELETEs: when you update or delete an existing file, the change takes time to propagate, and for a short window a read may still return the old content.
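
The difference is easy to picture in code. A hedged boto3 sketch, with hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-unique-testbucket-2017"  # hypothetical

# New object: read-after-write consistency guarantees this GET
# sees the object immediately after the PUT.
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"hello")
assert s3.get_object(Bucket=bucket, Key="notes.txt")["Body"].read() == b"hello"

# Overwrite of an existing key: only eventually consistent, so a GET
# issued right after this PUT may still return b"hello" for a short while.
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"hello again")
```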

S3 also has additional features like Lifecycle Management, Versioning, Static Website Hosting, Cross Region Replication, and Encryption.

Lifecycle Management: Lifecycle Management helps ensure that you don’t keep every object in S3 – Standard and pay full price for data that isn’t accessed regularly. You can define lifecycle rules that, for example, move objects that haven’t been updated in the last 30 days to S3 – IA and, 30 days after that, on to Amazon Glacier for archival. If you want to archive data straight away, you can transition directly from S3 – Standard to Glacier after as little as 1 day. Keep in mind that both tiers carry data retrieval charges, and Glacier retrievals also take additional time (3 to 5 hours), so you don’t want to move regularly accessed data into these services. You can also use lifecycle rules to permanently delete files from S3.
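
Here is a minimal sketch of such a rule with boto3. The bucket name and the “logs/” prefix are hypothetical, and note that transition Days count from object creation, so 60 here means 30 days after the move to S3 – IA:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-unique-testbucket-2017",  # hypothetical
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},  # applies only to this prefix
                "Status": "Enabled",
                "Transitions": [
                    # Days count from object creation, not from the last move.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 60, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},  # permanently delete after a year
            }
        ]
    },
)
```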

Versioning: Versioning is another cool feature of S3. With versioning you can store every version of a single file in S3 and switch back and forth between them. One thing to note is that you pay for all of the versions: if your original file was 10 MB and you upload a modified version that is 15 MB, you pay for 25 MB of storage. Versioning is enabled at the bucket level, and once enabled it cannot be disabled, only suspended. If you want to stop versioning entirely, you have to create a new bucket without versioning, copy the objects over, and then delete the old bucket. When you delete a file in a versioned bucket, S3 only places a delete marker on it; the file itself is not removed from the bucket, and you can restore it by simply deleting the delete marker. Versioning can therefore also be considered a great backup tool.
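
A hedged boto3 sketch of enabling versioning and then “undeleting” a file by removing its delete marker; the bucket and key names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-unique-testbucket-2017"  # hypothetical

# Enable versioning; later you can only set Status to "Suspended",
# never remove versioning from the bucket entirely.
s3.put_bucket_versioning(
    Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
)

# Deleting a versioned object only adds a delete marker...
s3.delete_object(Bucket=bucket, Key="documents/report.pdf")

# ...so removing the marker itself restores the object.
versions = s3.list_object_versions(Bucket=bucket, Prefix="documents/report.pdf")
marker = versions["DeleteMarkers"][0]["VersionId"]
s3.delete_object(Bucket=bucket, Key="documents/report.pdf", VersionId=marker)
```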

Static Website Hosting: Another cool feature S3 offers is the ability to host static websites from your buckets. You cannot run dynamic content or server-side scripts, but if all you want is a static website with HTML pages, then knock yourself out; AWS takes care of scaling if your website becomes popular and receives a lot of traffic. Once you enable static website hosting, your site is served from an endpoint that looks something like http://bucketname.s3-website-regionname.amazonaws.com (the website endpoint serves plain HTTP, and the exact format varies slightly by region). And if you don’t want to use that URL, you can easily use the AWS Route 53 DNS service to map it to your own custom domain name.
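
Enabling it programmatically is a single call. A minimal boto3 sketch; the bucket name is hypothetical, and the objects must separately be made publicly readable, e.g. via a bucket policy:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="my-unique-testbucket-2017",  # hypothetical
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # served for "/" requests
        "ErrorDocument": {"Key": "error.html"},     # served on 4xx errors
    },
)
```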

Cross Region Replication: Cross Region Replication (CRR) is another important feature that S3 provides. If you want every file uploaded to your bucket backed up to another bucket on the other side of the world, CRR does exactly that. It can be a good DR strategy: if your source region fails, all your data is still safe in a bucket in another AWS Region. CRR requires the source and destination buckets to be in different regions, and versioning must be enabled on both. Once the relationship is established, all new files uploaded to the source bucket are replicated to the destination bucket; files that already existed in the source bucket have to be copied over manually. When you delete a file from the source bucket, the deletion is replicated, but since versioning is enabled on both buckets, only a delete marker is placed on each copy of the data. Deleting individual versions or delete markers is not replicated across buckets.
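
A hedged sketch of a replication configuration with boto3; the bucket names, account ID, and IAM role ARN are all hypothetical, and both buckets must already exist in different regions with versioning enabled:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="my-unique-testbucket-2017",  # hypothetical source bucket
    ReplicationConfiguration={
        # Hypothetical role; it must allow S3 to read the source and
        # write to the destination on your behalf.
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
        "Rules": [
            {
                "ID": "replicate-all-new-objects",
                "Prefix": "",  # empty prefix matches every new object
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::my-dr-bucket-tokyo"},
            }
        ],
    },
)
```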

Encryption: Data security is a key concern in today’s world. You can restrict access to the objects in your buckets using bucket policies and Access Control Lists (ACLs); bucket policies operate at the bucket level, whereas ACLs operate at the object level. In addition to restricting access, S3 also offers the capability to encrypt data both at rest and in transit. Data in transit is protected by using SSL/TLS when you upload to S3. For encrypting data at rest, AWS offers the following options (a short sketch follows the list):

  • Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3): Each object is encrypted with a unique key using strong multi-factor encryption. In addition to that, each key is itself encrypted with a master key that is regularly rotated.
  • Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS): Similar to SSE-S3, but it also provides you with an audit trail of when your key was used and by whom. Additionally, you have the option to create and manage encryption keys yourself or use a default key that is unique to you, the service you’re using, and the region you’re working in.
  • Server-Side Encryption with Customer-Provided Keys (SSE-C): You provide and manage the keys and S3 encrypts the data using the key that you provided.
  • Client-Side Encryption: You encrypt the data before you upload it to S3. You are responsible for the key management, encryption, and decryption.
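
For the server-side options, encryption is requested per object at upload time. A minimal boto3 sketch of SSE-S3 and SSE-KMS, with hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-unique-testbucket-2017"  # hypothetical

# SSE-S3: S3 manages the keys (AES-256).
s3.put_object(Bucket=bucket, Key="sse-s3.txt", Body=b"secret",
              ServerSideEncryption="AES256")

# SSE-KMS: omitting SSEKMSKeyId uses the default S3 key in your account;
# pass a specific key ID to use a key you created and manage yourself.
s3.put_object(Bucket=bucket, Key="sse-kms.txt", Body=b"secret",
              ServerSideEncryption="aws:kms")
```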

And last but certainly not least, let’s talk about S3 pricing. You are charged for the following:

  1. Amount of storage that you use.
  2. The number of requests.
  3. Data Transfer: The amount of data transferred into and out of your S3 buckets.
  4. Storage Management: S3 Inventory, S3 Analytics, and S3 Object Tagging
  5. Transfer Acceleration: Using AWS CloudFront Edge Locations to speed up the upload and download of data into and out of your S3 buckets.

Even though this was quite a big post, I haven’t been able to cover all of the S3 features. Please reach out to me via the comments or Twitter if you would like me to talk about any S3 topics that I might have missed. And watch this space for the rest of the AWS Storage Services blog posts.

Here are some interesting links that you can refer to if you want to learn more:

  1. https://aws.amazon.com/s3/ 
  2. https://aws.amazon.com/s3/pricing/
  3. http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html
  4. https://aws.amazon.com/documentation/s3/
  5. https://www.youtube.com/watch?time_continue=3&v=rKpKHulqYOQ