AWS Data Lake with Terraform - Parts 4-5 of 6
Up to this point in this series we have discussed data ingestion, data collection, and data analysis. I bet you have been wondering how we can protect this infrastructure and its data.
In this combined post, covering parts 4 and 5, we will learn how to secure the project using some powerful AWS services:
- IAM
- KMS
- CloudWatch
Let’s first start by understanding the benefits of IAM security and how it can help to secure our project.
What is IAM and what are some of its benefits?
IAM stands for Identity and Access Management. IAM is an important AWS service that enables you to control access to, and use of, your AWS resources and services in one place.
IAM works with several kinds of identities, among them:
- Account root user
- Users
- Groups
- Roles
In this section we will focus only on roles and policies. Let’s start by answering a few questions.
What is an IAM role and what are some of the benefits?
An IAM role is an identity with permissions to make AWS service requests. As simple as that.
Some of the benefits are:
- A role is not tied to a single application or service; instead, it is assumed by the resources that need it.
- AWS recommends roles over IAM users wherever possible.
- Assuming a role provides temporary credentials, so access is granted for a session rather than permanently.
- An IAM user can assume a role to access resources in a different AWS account.
- Roles let AWS services perform actions on your behalf, for example a Kinesis Data Firehose delivery stream putting logs directly into an S3 bucket.
# Bucket that receives the delivered logs.
resource "aws_s3_bucket" "data_logs" {
  bucket = var.bucket_name
}

# Role that the Firehose service is allowed to assume (trust policy).
resource "aws_iam_role" "f_role" {
  name = "f_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "firehose.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}
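The same trust policy can also be written without an inline JSON string. The following is a minimal sketch, assuming you prefer the AWS provider's aws_iam_policy_document data source; the names firehose_assume and f_role_alt are hypothetical and only there to avoid clashing with the role above.

```hcl
# Hypothetical alternative: build the trust policy with a data source
# instead of a heredoc, so Terraform validates the structure for us.
data "aws_iam_policy_document" "firehose_assume" {
  statement {
    # effect defaults to "Allow" when omitted
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
  }
}

# Hypothetical role name; equivalent in intent to f_role above.
resource "aws_iam_role" "f_role_alt" {
  name               = "f_role_alt"
  assume_role_policy = data.aws_iam_policy_document.firehose_assume.json
}
```

Either form should produce an equivalent role; the data source mainly trades JSON quoting for HCL blocks.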
What about role limits?
- By default you can create up to 1,000 roles per AWS account; this quota is adjustable on request
What is a policy?
- A policy is a JavaScript Object Notation (JSON) document that lists permissions, presented as statements
- A policy can contain one or more statements
- Policies can be attached to users, roles, or groups
- An IAM entity can have more than one policy
- A managed policy is identified by its own Amazon Resource Name (ARN)
Policy structure components:
Actions: define which operations are allowed or denied on an AWS service.
Resources: define which resources the actions apply to.
Effect: defines whether the user or role is allowed or denied the actions on those resources. Everything is denied by default, so access has to be explicitly allowed.
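# Inline policy attached to the Firehose role.
# The bucket ARN covers bucket-level actions; the "/*" suffix covers
# object-level actions such as s3:PutObject.
resource "aws_iam_role_policy" "f_delivery_policy" {
  role = aws_iam_role.f_role.id

  policy = <<EOT
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "${aws_s3_bucket.data_logs.arn}"
    },
    {
      "Action": [
        "s3:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "${aws_s3_bucket.data_logs.arn}",
        "${aws_s3_bucket.data_logs.arn}/*"
      ]
    }
  ]
}
EOT
}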
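To see where the role actually gets used, here is a minimal sketch of a Firehose delivery stream that assumes f_role and writes into the data_logs bucket. The resource name f_stream and the stream name are hypothetical, and the stream defined in earlier parts of this series may be configured differently.

```hcl
# Hypothetical delivery stream: Firehose assumes f_role and, through the
# permissions attached to that role, delivers records to the bucket.
resource "aws_kinesis_firehose_delivery_stream" "f_stream" {
  name        = "f-stream" # assumed name, pick your own
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = aws_iam_role.f_role.arn
    bucket_arn = aws_s3_bucket.data_logs.arn
  }
}
```

Nothing here stores credentials: the stream only points at the role, and IAM decides at request time what the role may do.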
As we can see, roles and policies are versatile IAM features that can be used in many different scenarios.
The next challenge is how to protect the data itself. Do not worry, AWS has your back.
Let me introduce you to AWS Key Management Service (KMS).
What is KMS and what are some of its benefits?
KMS is a fully managed service for creating and controlling the cryptographic keys used to encrypt your data. It is most commonly used for encryption at rest; encryption in transit on AWS is typically handled by TLS.
How does KMS work?
KMS lets you create keys to encrypt your data and keeps them in fully managed, highly available storage. You can encrypt data within your applications and share keys across accounts. Pricing is low and usage based: AWS managed keys are stored in your account at no charge, while customer managed keys carry a small monthly fee plus a per-request charge.
What are some of the benefits of KMS?
- Secure and compliant
- Centralized key management
- Fully managed
- Integrated with other AWS services
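As an illustration of that integration, the following sketch creates a customer managed key and makes it the default encryption key for the data_logs bucket defined earlier. The key, alias, and resource names are hypothetical, and the example assumes AWS provider v4 or later, where bucket encryption is configured through its own resource.

```hcl
# Hypothetical customer managed key for the log bucket.
resource "aws_kms_key" "data_logs" {
  description             = "Encrypts objects in the data_logs bucket"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

# Friendly alias so the key is easy to find in the console (assumed name).
resource "aws_kms_alias" "data_logs" {
  name          = "alias/data-logs"
  target_key_id = aws_kms_key.data_logs.key_id
}

# Encrypt every new object in the bucket at rest with the key above.
resource "aws_s3_bucket_server_side_encryption_configuration" "data_logs" {
  bucket = aws_s3_bucket.data_logs.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.data_logs.arn
    }
  }
}
```

Any role that writes to the bucket, such as the Firehose role, would also need permission to use the key (for example kms:GenerateDataKey), which is not shown here.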
Finally, how to monitor your infrastructure using CloudWatch
As before, let’s begin by answering a few questions.
What is CloudWatch and what are some of its benefits?
Amazon CloudWatch is a monitoring service that allows you to collect metrics from your AWS resources and applications.
CloudWatch does not stop at monitoring: you can also create alarms that continuously watch performance, health checks, and billing. This allows you to act proactively when you approach a budget or cross a threshold set by your department or administrative team.
CloudWatch monitoring model:
- Basic monitoring: 5-minute intervals, free
- Detailed monitoring: 1-minute intervals, at additional charge
- Metrics are retained for up to 15 months, even after the resources that produced them are deleted
Benefits:
- Monitor AWS resources
- Store logs
- Create alarms
- Powerful dashboards
- Automate resource changes based on events
resource "aws_cloudwatch_dashboard" "thresholds_control" {
  dashboard_name = "admin-dashboard"

  dashboard_body = <<EOF
{
  "widgets": [
    {
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          [
            "AWS/EC2",
            "CPUUtilization",
            "InstanceId",
            "i-012345"
          ]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "EC2 Instance CPU"
      }
    },
    {
      "type": "text",
      "x": 0,
      "y": 7,
      "width": 3,
      "height": 3,
      "properties": {
        "markdown": "We are monitoring"
      }
    }
  ]
}
EOF
}
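Alarms, mentioned among the benefits above, can be declared in much the same way. This is a minimal sketch that watches the same EC2 CPU metric as the dashboard; the alarm name, the 80 percent threshold, and the two evaluation periods are assumptions chosen for illustration, not recommendations.

```hcl
# Hypothetical alarm: fires when average CPU stays above 80% for two
# consecutive 5-minute periods on the instance shown in the dashboard.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "ec2-cpu-high"
  alarm_description   = "Average CPU above 80% for 10 minutes"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80

  dimensions = {
    InstanceId = "i-012345"
  }
}
```

To be notified when the alarm fires, you would typically add alarm_actions pointing at an SNS topic ARN; that topic is outside the scope of this sketch.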
On the other hand, Terraform offers you a state file where you can read the configuration of your resources. If you prefer visualizations, Terraform also builds a dependency graph for you behind the scenes, which the terraform graph command can output in DOT format.
If you are not familiar with this feature, let me share it with you.
What is a dependency graph?
A dependency graph is a directed graph representing dependencies of several objects towards each other. It is possible to derive an evaluation order or the absence of an evaluation order that respects the given dependencies from the dependency graph.
wiki-Dependency_graph
Terraform dependency-graph-sample
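To make the idea concrete, here is a small sketch of how edges end up in Terraform's graph. All resource names and values below are hypothetical. A reference from one resource to another creates an implicit dependency, while depends_on declares an explicit one.

```hcl
# Node with no dependencies on other resources.
resource "aws_s3_bucket" "graph_demo" {
  bucket = "graph-demo-bucket-example" # assumed name, must be globally unique
}

# Implicit edge: referencing aws_s3_bucket.graph_demo.id tells Terraform
# to create the bucket before configuring versioning on it.
resource "aws_s3_bucket_versioning" "graph_demo" {
  bucket = aws_s3_bucket.graph_demo.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Explicit edge: nothing here references the bucket, so depends_on is
# used to force the ordering anyway.
resource "aws_cloudwatch_log_group" "graph_demo" {
  name       = "/demo/graph"
  depends_on = [aws_s3_bucket.graph_demo]
}
```

With this configuration, both the versioning resource and the log group should appear in the graph with an edge pointing at the bucket, so Terraform creates the bucket first and destroys it last.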