AWS S3 총정리

S3 (Simple Storage Service)

인터넷 스토리지 서비스. 용량에 관계 없이 파일을 저장할 수 있고 웹(HTTP 프로토콜)에서 파일에 접근할 수 있다.

1. 사용 이유

  • S3는 저장 용량이 무한대이고 파일 저장에 최적화되어 있다. 용량을 추가하거나 성능을 높이는 작업이 필요없다.
  • 비용은 EC2와 EBS로 구축하는 것보다 훨씬 저렴
  • S3 자체가 수천 대 이상의 매우 성능이 좋은 웹 서버로 구성되어 있어서 EC2와 EBS로 구축했을 때 처럼 Auto Scaling이나 Load Balancing에 신경쓰지 않아도 된다.
  • 동적 웹페이지와 정적 웹페이지가 섞여있을 때 동적 웹페이지만 EC2에서 서비스하고 정적 웹페이지는 S3를 이용하면 성능도 높이고 비용도 절감.
  • 웹하드 서비스와 비슷하지만, 별도의 클라이언트 설치나 ActiveX를 통하지 않고 HTTP 프로토콜로 파일 업로드/다운로드 처리
  • S3 자체로 정적 웹서비스 가능

2. 버킷(Bucket)

  • 생성하면 default로 private.
  • 한 계정 당 최대 100개의 버킷 사용 가능.
  • 버킷 소유권은 이전할 수 없다.
  • 버킷의 이름은 region에 상관없이 globally unique 해야 한다.
  • 버킷 주소는 https://s3-리전이름.amazonaws.com/버킷이름
  • S3 데이터 모델은 flat structure라서 버킷에 hierarchie나 folder는 없다.
  • 하지만 keyname prefix (Folder1/Object1)를 사용해서 논리적인 hierarchies를 암시할 수 있다.
  • 버킷 안에 다른 버킷을 둘 수 없다.
  • Access Control
    • Bucket Policies
    • Access Control Lists
  • Path-Style URL에서 버킷 이름은 Region specific endpoint를 사용하지 않는 이상 도메인명에 포함되지 않는다.
  • Virtual Hosted Style URL에서 버킷이름은 URL의 도메인명의 일부가 된다.
  • Virtual hosting은 HTTP Host Header를 사용해서 REST API 콜의 버킷을 address하는 데 사용될 수 있다.

3. 객체(Object)

  • Object level storage(not a Block level storage)
  • 객체 하나의 크기는 1Byte ~ 5TB
  • 저장 가능한 객체 갯수 무제한
  • 객체마다 각각의 접근 권한 설정 가능
  • default로 private 이다.
  • 객체 metadata는 객체가 업로드 된 후에는 수정될 수 없고, 복사해서 수정해야 한다.
  • 객체는 Range HTTP header를 이용해서 부분적으로 검색할 수 있다.
  • 객체는 Pre-signed url를 사용해서 다운로드 할 수 있다.
  • 객체의 metadata는 response header에 반환된다.
  • Updating any metadata for an object requires all the metadata fields to be specified again

4. 암호화

  1. In Transit (S3로 데이터 업로드할 때)
    • SSL/TLS
  2. At Rest
    • 서버 사이드 암호화
      • None과 AES-256 중 선택 가능
      • S3 Managed Keys : SSE-S3
      • AWS Key Management Service, Managed Keys : SSE-KMS
      • Customer Provided Keys : SSE-C
    • 클라이언트 사이드 암호화
  3. 복호화는 데이터를 가져올 때 이루어진다.

5. S3 Tiers/Classes

  • 파일을 올리고 나서도 설정할 수 있다.
  • S3 Standard
    • 99.99% availability (아마존 게런티 99.9%)
    • 99.999999999% durablity.
    • 다수의 장치와 다수의 시설에 저장
    • 2개의 시설을 동시에 잃어도 지속되게끔 설계
  • S3 IA (Infrequently Accessed)
    • 자주 접근되지 않지만, 필요할 때 빠르게 접근할 필요가 있는 데이터에 적합
    • S3보다 저렴하다. 하지만 retrieval fee가 과금된다.
  • S3 One Zone IA
    • 이전의 RRS를 대체하는 새로운 클래스
    • RRS(Reduced Redundancy Storage)는 데이터 사본의 수를 줄여 비용을 낮춤. 원본에서 다시 생성할 수 있는 데이터 저장에 적함. (내구성 99.99%)
    • 자주 접근되지 않는 데이터를 위한 저비용 옵션
    • S3 IA와 같지만 다수의 AZ이 아니라 하나의 AZ에 저장
  • Glacier
    • 매우 저렴하지만 Archival only.
    • 종류는 Expedidited / Standard / Bulk
    • Standard의 retrieval time은 3~5시간

6. Data Consistency Model

  • Read after Write consistency for PUTS of new Objects
    • 객체를 새로 추가하면 바로 읽을 수 있다.
    • S3는 PUT 요청에 대하여 다수의 시설에 데이터를 저장한 후에 SUCCESS를 반환한다.
    • A process writes a new object to S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
  • Eventual Consistency for overwrite PUTS and DELETES (can take some time to propagate)
    • PUTS를 덮어쓰거나(업데이트) 객체를 삭제하면 propagate하는데에 시간이 걸린다.

7. Storage Gateway (AWS 서비스 중 하나)

  • On-premise IT 환경과 AWS의 Storage 인프라를 연결시켜주는 서비스
  • VM image로 다운로드 하여 데이터센터의 host에 설치할 수 있다.
  • Storage Gateway는 VMware ESXi, Microsoft Hyper-V도 지원한다.
  • 종류
    • File Gateway(NFS) - for flat files, stored directly on S3
    • Volumes Gateway(iSCSI)
    • Stored Volumes - Entire Dataset is stored on site and is asynchronously backed up to S3
    • Cached Volumes - Entire Dataset is stored on S3 and the most frequently accessed data is cached on site
    • Tape Gateway(VTL) - Used for backup and uses popular backup applications like NetBackup, Backup Exec, Veeam etc
  • File GateWay(NFS) 다이어그램
  • Volumes Gateway(iSCSI)
    • Stored Volumes 다이어그램
    • Cached Volumes 다이어그램
  • Tape Gateway(VTL) 다이어그램

8. 기타

  • 운영체제 설치할 수 없다.
  • 사용자 설정 metadata는 반드시 “x-amz-meta”라는 prefix로 시작해야 사용자가 정한 key value pair가 설정된다.
  • S3 does not process user-defined metadata
  • S3 자체적으로 Version Control 기능을 내장하고 있다. 파일을 이전 내용으로 되돌리거나 삭제한 파일을 복원할 수 있다.
    • Versioning이 enabled 되면 disabled 될 수 없다. 오직 suspended 되는 것이다.
  • 다른 리전으로 복사하려면 소스버킷의 versioning을 활성화 해야한다.
  • 버킷에 저장된 객체의 LifeCycle을 관리할 수 있다.
  • Multi Part 업로드
    • 1 ~ 10000 parts를 지원하고, 각 파트는 5MB~5GB, 마지막 파트는 5MB 이하로도 가능하다.
    • 최대 업로드 사이즈는 5TB
  • S3 Transfer Acceleration
    • S3에 바로 업로드하지 않고, 생성되는 URL을 사용해서 CloudFront의 Edge location에 바로 올리고 S3로 옮기는 것
  • S3로 정적인 웹사이트 만들기
    • Endpoint 주소 : http://버킷이름.s3-website-리젼이름.amazonaws.com
    • S3 website endpoints는 https를 지원하지 않는다.
  • Snowball
    • Snowball
    • Snowball Edge
    • Snowmobile
  • Notification은 버킷 레벨에서 사용된다.

9. 비용

  • Charged for
    • Storage - GB/month
    • Requests - per Request. Request Type(GET, PUT)에 따라 다르다.
    • Storage Management Pricing
    • Data Transfer Pricing
    • Transfer in - free
    • Transfer out - GB/month (같은 region이나 CloudFront로 이전 제외)
    • Transfer Acceleration
  • S3의 비용은 Region에 따라 다르다.

10. 더 자세한 학습 (완료할 때 마다 줄 긋기)

*어려운 문제들

  1. What are characteristics of Amazon S3? Choose 2 answers
    a. Objects are directly accessible via a URL
    b. S3 should be used to host a relational database
    c. S3 allows you to store objects or virtually unlimited size
    d. S3 allows you to store virtually unlimited amounts of data e. S3 offers Provisioned IOPS

  2. You are building an automated transcription service in which Amazon EC2 worker instances process an uploaded audio file and generate a text file. You must store both of these files in the same durable storage until the text file is retrieved. You do not know what the storage capacity requirements are. Which storage option is both cost-efficient and scalable?
    a. Multiple Amazon EBS volume with snapshots
    b. A single Amazon Glacier vault
    c. A single Amazon S3 bucket
    d. Multiple instance stores

  3. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    a. Increase your network bandwidth to provide faster throughput to S3
    b. Upload the files in parallel to S3 using mulipart upload
    c. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    d. Use AWS Import/Export to transfer the video files

  4. A company is deploying a two-tier, highly available web application to AWS. Which service provides durable storage for static content while utilizing lower Overall CPU resources for the web tier?
    a. Amazon EBS volume
    b. Amazon S3
    c. Amazon EC2 instance store
    d. Amazon RDS instance

  5. When you put objects in Amazon S3, what is the indication that an object was successfully stored?
    a. Each S3 account has a special bucket named_s3_logs. Success codes are written to this bucket with a timestamp and checksum.
    b. A success code is inserted into the S3 object metadata.
    c. A HTTP 200 result code and MD5 checksum, taken together, indicate that the operation was successful.
    d. Amazon S3 is engineered for 99.999999999% durability. Therefore there is no need to confirm that data was inserted.

  6. You have private video content in S3 that you want to serve to subscribed users on the Internet. User IDs, credentials, and subscriptions are stored in an Amazon RDS database. Which configuration will allow you to securely serve private content to your users?
    a. Generate pre-signed URLs for each user as they request access to protected S3 content
    b. Create an IAM user for each subscribed user and assign the GetObject permission to each IAM user
    c. Create an S3 bucket policy that limits access to your private content to only your subscribed users’ credentials
    d. Create a CloudFront Origin Identity user for your subscribed users and assign the GetObject permission to this user

  7. You run an ad-supported photo sharing website using S3 to serve photos to visitors of your site. At some point you find out that other sites have been linking to the photos on your site, causing loss to your business. What is an effective method to mitigate this?
    a. Remove public read access and use signed URLs with expiry dates.
    b. Use CloudFront distributions for static content.
    c. Block the IPs of the offending websites in Security Groups.
    d. Store photos on an EBS volume of the web server.

  8. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    a. Use multi-part upload.
    b. Add a random prefix to the key names.
    c. Amazon S3 will automatically manage performance at this scale.
    d. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names

  9. What is the maximum number of S3 buckets available per AWS Account?
    a. 100 Per region
    b. There is no Limit
    c. 100 Per Account (Refer documentation)
    d. 500 Per Account
    e. 100 Per IAM User

  10. Your customer needs to create an application to allow contractors to upload videos to Amazon Simple Storage Service (S3) so they can be transcoded into a different format. She creates AWS Identity and Access Management (IAM) users for her application developers, and in just one week, they have the application hosted on a fleet of Amazon Elastic Compute Cloud (EC2) instances. The attached IAM role is assigned to the instances. As expected, a contractor who authenticates to the application is given a pre-signed URL that points to the location for video upload. However, contractors are reporting that they cannot upload their videos. Which of the following are valid reasons for this behavior? Choose 2 answers { “Version”: “2012-10-17”, “Statement”: [ { “Effect”: “Allow”, “Action”: “s3:*”, “Resource”: “*” } ] }
    a. The IAM role does not explicitly grant permission to upload the object. (오답체크: The role has all permissions for all activities on S3)
    b. The contractorsˈ accounts have not been granted “write” access to the S3 bucket. (오답체크: using pre-signed urls the contractors account don’t need to have access but only the creator of the pre-signed urls)
    c. The application is not using valid security credentials to generate the pre-signed URL.
    d. The developers do not have access to upload objects to the S3 bucket. (오답체크: developers are not uploading the objects but its using pre-signed urls)
    e. The S3 bucket still has the associated default permissions. (오답체크: does not matter as long as the user has permission to upload)
    f. The pre-signed URL has expired.

  11. S3 has what consistency model for PUTS of new objects?
    a. Read After Write Consistency
    b. Write After Read Consistency
    c. Eventual Consistency
    d. Usual Consistency

  12. What is AWS Storage Gateway?
    a. It's an on-premise virtual appliance that can be used to cache S3 locaaly at a customers site
    b. It allows large scale import/exports in to the AWS cloud without the use of an internet connection
    c. It allows a direct MPLS connection in to AWS
    d. None of the above.

  13. S3 has eventual consistency for which HTTP Methods?
    a. PUTS of new Objects and DELETES
    b. overwrite PUTS and DELETES
    c. PUTS of new objects and UPDATES
    d. UPDATES and DELETES

  14. You need to use an Object based storage solution to store your critical, non replaceable data in a cost effective way. This data will be frequently updated and will need some form of version control enabled on it. Which S3 storage solution should you use?
    a. S3
    b. S3-IA
    c. S3-RRS
    d. Glacier

  15. You work for a health insurance company who collects large amounts of documents regarding patients health records. This data will be used usually only once when assessing a customer and will then need to be securely stored for a period of 7 years. In some rare cases you may need to retrieve this data within 24 hours of a claim being lodged. Which storage solution would best suit this scenario? You need to keep your costs as low as possible.
    a. S3
    b. S3-IA
    c. S3-RRS
    d. Glacier

  16. You run a meme creation website that frequently generates meme images. The original images are stored in S3 and the meta data about the memes are stored in DynamoDB. You need to store the memes themselves in a low cost storage solution. If an object is lost, you have created a Lambda function that will automatically recreate this meme using the original file in S3 and the metadata in Dynamodb. Which storage solution should you consider to store this non-critical, easily reproducible data on in the most cost effective solution as possible?
    a. S3
    b. S3-IA
    c. S3-RRS
    d. Glacier

  17. You run a popular photo sharing website that is based off S3. You generate revenue from your website via paid for adverts, however you have discovered that other websites are linking directly to the images on your site, and not to the HTML pages that serve the content. This means that people are not seeing your adverts and every time a request is made to S3 to serve an image it is costing your business money. How could you resolve this issue?
    a. Use CloudFront to serve the static content
    b. Remove the ability for images to be served publicly to the site and then used signed URL's with expiry dates
    c. Use security groups to blacklist the IP addresses of the sites that do this
    d. Use EBS rather than S3 to store the content

  18. A user has an S3 object in the US Standard region with the content “color=red”. The user updates the object with the content as “color=”white”. If the user tries to read the value 1 minute after it was uploaded, what will S3 return?
    a. It will return “color=white”
    b. It will return “color=red”
    c. It will return and error saying that the object was not found
    d. It may return either "color=red" or "color=white" i.e. any of the value

  19. A user is enabling a static website hosting on an S3 bucket. Which of the below mentioned parameters cannot be configured by the user?
    a. Error document
    b. Conditional error on object name
    c. Index document
    d. Conditional redirection on object name

  20. Company ABCD is running their corporate website on Amazon S3 accessed from http//www.companyabcd.com. Their marketing team has published new web fonts to a separate S3 bucket accessed by the S3 endpoint: https://s3-us-west1.amazonaws.com/abcdfonts. While testing the new web fonts, Company ABCD recognized the web fonts are being blocked by the browser. What should Company ABCD do to prevent the web fonts from being blocked by the browser?
    a. Enable versioning on the abcdfonts bucket for each web font
    b. Create a policy on the abcdfonts bucket to enable access to everyone c. Add the Content-MD5 header to the request for webfonts in the abcdfonts bucket from the website
    d. Configure the abcdfonts bucket to allow cross-origin requests by creating a CORS configuration

*Reference