Secure data storage with Amazon S3.
This subject has been discussed many times, and yet it still raises doubts. You have probably heard about various cases of “data leakage from the cloud”. Of course, these sensational headlines were intended to attract readers.
In the end, the cause was usually that someone, lacking adequate knowledge or competence, made the data public in an uncontrolled and not necessarily conscious way.
For clarification, it is worth mentioning that by default every S3 bucket and every object stored in it is private.
As you can see, however, this did not prevent a few companies from embarrassing themselves by making sensitive data public.
What can be done then?
Data access control on S3
Before moving on to how to protect your data from being published, let's look at what access controls Amazon S3 offers.
In this section of the article we will learn how these mechanisms work and how to configure them.
What is Amazon S3?
A word of introduction: Amazon S3 is an object storage service that allows users to store data of any type in any amount. The data is stored in so-called buckets.
The service currently offers a wealth of features such as versioning, static website hosting and others. You can learn more about this from my ebook.
In short, if you want to store data in S3, create a bucket and start loading data into it.
Who has access to the data on S3?
As I mentioned earlier, by default both the bucket and the data in it are private. By private I mean that only users of the AWS account in which the bucket and data were created have access to them. Of course, this is true only if the user's own permissions do not restrict their access to S3.
So let's look at the two main permissions control mechanisms available on the Amazon S3 side.
ACCESS CONTROL LIST
Access Control List, abbreviated ACL, is the first option. An ACL can be defined at the level of the bucket itself.
As you can see in the image above, an ACL allows you, for example, to grant access to another AWS account (Access for another AWS Account), or to define permissions for Everyone, that is, for the whole world (Public Access).
This is one of the places where you can “accidentally” allow access to data, in this case by allowing the entire contents of the bucket to be listed (List objects).
This will happen when we allow the “List objects” operation for Everyone.
The danger here is that anyone who has a list of what is in the bucket can try to request each of the objects in turn. If one of them is publicly accessible, we have a potential data leak.
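The check described above can be sketched in a few lines of Python. This is a minimal sketch, not a complete audit: it assumes the ACL is available as a dictionary in the shape returned by the S3 GetBucketAcl operation (a "Grants" list with "Grantee" and "Permission" entries), and the function names are made up for illustration.

```python
# URI that S3 uses for the "Everyone" (public) grantee group.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def public_permissions(acl):
    """Return the set of permissions the ACL grants to Everyone."""
    return {
        grant["Permission"]
        for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") == ALL_USERS_URI
    }


def allows_public_listing(bucket_acl):
    """True if Everyone may list the bucket.

    On a bucket ACL, READ corresponds to "List objects" in the console;
    FULL_CONTROL includes READ.
    """
    return bool(public_permissions(bucket_acl) & {"READ", "FULL_CONTROL"})


# Hypothetical ACL with "List objects" enabled for Everyone.
risky_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": ALL_USERS_URI},
            "Permission": "READ",
        },
    ]
}

# Hypothetical default ACL: only the owner has access.
private_acl = {"Grants": risky_acl["Grants"][:1]}

print(allows_public_listing(risky_acl))
print(allows_public_listing(private_acl))
```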
ACL can also be defined at the level of objects (files) themselves, which we will place in the S3 bucket.
As you can see in the picture above, the ACL parameters for an object are very similar to those for a bucket. It is worth noting that this is what the ACL looks like by default for an uploaded file.
By default, the file is private, and an attempt to open its URL will end with an access denied message.
Let's take a closer look at the Everyone parameter again. In the case of an object, the threat of making it public comes from enabling Read object for Everyone.
As soon as you try to select “Read object”, you will be warned that the object will be publicly available.
Now, after saving the settings and opening the URL of this object again, we will get its contents.
The biggest threat associated with S3 ACLs
I don't know if you've noticed, but in the ACL at the bucket level I didn't change any configuration for Everyone.
I changed these settings only for the object (the index.html file), to which I gave public permissions.
What does that mean?
It means that each object can have its own Access Control List settings. What is more, the console gives you no simple way to see which objects have their own ACL settings.
Look at the list below.
This is a view from the AWS console of the contents of my S3 bucket. Unfortunately, it does not show which of these files are available to the public and which are not. In my case, only index.html has “Read object” set for Everyone, but you cannot tell that from this view.
That is why this is not easy to catch from the console view, and you have to watch out for it.
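Since the console view does not show this, one option is to check every object's ACL programmatically. The sketch below assumes a client object that offers the same get_paginator("list_objects_v2") and get_object_acl(Bucket=..., Key=...) calls as the boto3 S3 client; the function name find_public_objects is hypothetical, and making one ACL request per object can be slow on large buckets.

```python
# URI that S3 uses for the "Everyone" (public) grantee group.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def is_publicly_readable(object_acl):
    """True if the object ACL grants READ (or FULL_CONTROL) to Everyone."""
    return any(
        grant.get("Grantee", {}).get("URI") == ALL_USERS_URI
        and grant.get("Permission") in {"READ", "FULL_CONTROL"}
        for grant in object_acl.get("Grants", [])
    )


def find_public_objects(s3_client, bucket):
    """Return the keys in `bucket` whose ACL makes them readable by Everyone.

    `s3_client` is assumed to behave like a boto3 S3 client.
    """
    public_keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            acl = s3_client.get_object_acl(Bucket=bucket, Key=obj["Key"])
            if is_publicly_readable(acl):
                public_keys.append(obj["Key"])
    return public_keys
```

With boto3 installed and credentials configured (both assumptions), a call might look like find_public_objects(boto3.client("s3"), "my-bucket"), where "my-bucket" is a placeholder name.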
How to deal with this, we will discuss in the second part of the article.
BUCKET POLICY
The second mechanism for controlling access to data on S3 is the so-called Bucket Policy.
Unlike an ACL, a Bucket Policy is defined at the S3 bucket level, and all objects placed in the bucket inherit its permissions. This makes it a much clearer way of controlling data permissions.
Bucket Policy is useful in different situations. The first one may be the need for all the data we put in S3 to be public (consciously 😎).
For example, we need to serve some static content to users, and S3 works great for this (static website hosting).
In such a situation, to ensure that all objects placed in the bucket are public, you need to create an appropriate bucket policy.
Here is an example of such a policy (only the Resource element is shown):
"Resource": "arn:aws:s3:::my-public-bucket/*"
It is worth noting that after applying such a policy, a notice will immediately appear that the bucket and its contents are publicly available.
So, in this case, it is much easier to see that certain data is publicly available.
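For orientation, a complete public-read policy can be generated as follows. This is a sketch under assumptions: the article shows only the Resource element, so the remaining elements (the Sid label, "Principal": "*" and the s3:GetObject action) are filled in the way public-read bucket policies are commonly written, and the function name is hypothetical.

```python
import json


def build_public_read_policy(bucket_name):
    """Build a bucket policy that lets anyone read every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # free-form label, an assumption
                "Effect": "Allow",
                "Principal": "*",  # Everyone, including anonymous users
                "Action": "s3:GetObject",
                # "/*" targets the objects in the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy = build_public_read_policy("my-public-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON is what would be pasted into the Bucket Policy editor. Note that the policy grants only read access to objects; it does not allow listing the bucket.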
Of course, Bucket Policy is not solely for making data publicly available. With its help, you can granularly determine who has access to the data in the bucket and under what rules.
You can give access to the bucket to other services or AWS accounts. One such case may be, for example, the permission to write network traffic logs, the so-called VPC Flow Logs.
Below is an example of such a policy (only the Resource elements of its two statements are shown):
"Resource": "arn:aws:s3:::my-flow-logs-bucket/flow-logs/AWSLogs/111111111111/*",
"Resource": "arn:aws:s3:::my-flow-logs-bucket"
This policy is much more extensive.
It precisely defines who (Principal) is given privileges, in this case the service responsible for delivering the logs,
to which resources (Resource), that is, a specific S3 bucket, and which operations (Action) are allowed on this resource, here s3:PutObject and s3:GetBucketAcl.
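The two statements described above can be assembled like this. Treat it as a sketch: the article shows only the Resource elements and names the two actions, so the service principal delivery.logs.amazonaws.com, the Sid labels and the bucket-owner-full-control condition are assumptions based on the policy AWS documents for publishing flow logs to S3. The function name is hypothetical.

```python
import json

# Log delivery service principal (an assumption, not shown in the article).
LOG_DELIVERY_SERVICE = "delivery.logs.amazonaws.com"


def build_flow_logs_policy(bucket_name, prefix, account_id):
    """Build a bucket policy that lets the log delivery service write flow logs."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSLogDeliveryWrite",
                "Effect": "Allow",
                "Principal": {"Service": LOG_DELIVERY_SERVICE},
                "Action": "s3:PutObject",
                # Writes are limited to this account's log path.
                "Resource": f"{bucket_arn}/{prefix}/AWSLogs/{account_id}/*",
                # Require that the bucket owner gets full control of each log file.
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
            {
                "Sid": "AWSLogDeliveryAclCheck",
                "Effect": "Allow",
                "Principal": {"Service": LOG_DELIVERY_SERVICE},
                "Action": "s3:GetBucketAcl",
                "Resource": bucket_arn,
            },
        ],
    }


flow_logs_policy = build_flow_logs_policy(
    "my-flow-logs-bucket", "flow-logs", "111111111111"
)
print(json.dumps(flow_logs_policy, indent=2))
```

Unlike the public policy, the Principal here is a single named service, so nobody else gains access through these statements.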
SUMMARY PART 1
Amazon S3 offers two basic mechanisms for controlling data permissions:
Access Control List - defined independently for buckets and individual objects. Harder to manage and less transparent.
Bucket Policy - defined at the bucket level, where objects inherit these policies. More transparent and more manageable.
Identity and Access Management (IAM) is also used to manage permissions to the service's resources. This time, however, we focus on S3 and the options within this service.
In the second part of the article we will look at how to limit the possibility of making data public (Public Access).