More and more companies are hosting their IT services in the cloud. One of the main concerns when considering migrating workloads to the cloud is security.
While AWS is responsible for protecting the infrastructure that runs its cloud services, including networks and facilities, the customer remains responsible for protecting its own data, networks, operating systems, applications, identities, and so on.
AWS currently offers several services that help manage the security of workloads, including:
- AWS Identity and Access Management (IAM)
- Amazon GuardDuty
- Amazon Inspector
- AWS Config
- Amazon CloudWatch
- AWS CloudTrail
- AWS Shield
- Amazon Macie
But how do you manage the findings produced by all these services?
AWS Security Hub is the service responsible for managing security posture in the cloud: it runs checks against security best practices, aggregates alerts and allows automatic remediation of the findings produced by these services.
When you access the Security Hub console, you immediately notice that the number of findings is very large, and can reach thousands in a single day.
This is where the need arises to filter these findings, prioritize them and notify the teams in charge of resolving the vulnerabilities.
We have observed that, until a finding is resolved, Security Hub reports it repeatedly, sometimes several times in the same day. This makes incidents harder to identify, categorize and alert on, and it can overwhelm the team in charge of resolution, leading to inefficient management of security risks.
This is why we have designed a solution that alerts only on the findings that interest us, filtering by criticality and by originating service (GuardDuty, Inspector, and the CIS and AWS benchmarks), while also suppressing duplicate events for a customizable period of time.
For example, you can choose to be notified only about critical findings that have been unresolved for more than 5 days, or about critical and high findings that have not been resolved in 15 days.
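The two example policies above can be sketched as a small mapping from severity to a day threshold. This is only an illustration: the names `reminder_due`, `ONLY_CRITICAL` and `CRITICAL_AND_HIGH` are hypothetical and are not taken from the solution's code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policies: severity label -> days a finding may stay
# unresolved before a notification is due. Labels that are missing
# from the mapping are never notified.
ONLY_CRITICAL = {"CRITICAL": 5}
CRITICAL_AND_HIGH = {"CRITICAL": 15, "HIGH": 15}


def reminder_due(severity, first_seen, now, policy):
    """Return True when an unresolved finding exceeds its severity threshold."""
    days = policy.get(severity.upper())
    if days is None:
        return False  # this severity is filtered out by the policy
    return now - first_seen > timedelta(days=days)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
six_days_ago = now - timedelta(days=6)

print(reminder_due("CRITICAL", six_days_ago, now, ONLY_CRITICAL))
print(reminder_due("CRITICAL", six_days_ago, now, CRITICAL_AND_HIGH))
print(reminder_due("HIGH", six_days_ago, now, ONLY_CRITICAL))
```

Keeping the thresholds in a plain mapping is one way to make the period customizable without touching the comparison logic.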
How does it work?
- An Event Rule monitors the Security Hub Findings. Findings are filtered by the originating service. Currently this solution supports findings originating from Security Hub (CIS and Foundational benchmarks), GuardDuty and Inspector.
- When the Event Rule detects an event, it triggers an execution of the Step Functions state machine workflow.
- If the finding is new or has been active for more than 15 days, an email with the details of the finding is sent to the team in charge of reviewing security issues. The original event is in JSON format, so it is first converted to HTML to make the important parts of the finding easy to spot at a glance when the email arrives.
- Additionally, a Lambda function runs daily and checks whether each database record is still active in Security Hub. If it is not, the record is deleted so that, should the finding occur again later, it is processed correctly as a new one.
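As an illustration of the first step above, an EventBridge event pattern for Security Hub findings could be built like this. The `source` and `detail-type` values are the ones Security Hub uses for imported findings; the product names and severity labels chosen here, and the helper name `build_event_pattern`, are assumptions for the sketch and may differ from the solution's actual rule.

```python
import json

# Assumed filter values for this sketch (not taken from the repository).
PRODUCTS = ["Security Hub", "GuardDuty", "Inspector"]
SEVERITIES = ["CRITICAL", "HIGH"]


def build_event_pattern(products, severities):
    """Build an EventBridge pattern matching active findings by origin and severity."""
    return {
        "source": ["aws.securityhub"],
        "detail-type": ["Security Hub Findings - Imported"],
        "detail": {
            "findings": {
                "ProductName": list(products),
                "Severity": {"Label": list(severities)},
                "RecordState": ["ACTIVE"],
            }
        },
    }


pattern = build_event_pattern(PRODUCTS, SEVERITIES)
# The JSON string is what would be supplied as the rule's event pattern.
print(json.dumps(pattern, indent=2))
```

Filtering at the rule level means findings outside the chosen services and severities never start a workflow execution.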
What are the components of this solution?
- EventBridge Event Rule –> Two Event Rules. One monitors the findings produced in Security Hub and the other runs daily to check which findings have been resolved and remove their records.
- Step Function –> Serverless workflow for processing the findings recorded in Security Hub.
- Lambda Function –> Four Lambda functions in charge of managing the actions required during the workflow hosted by Step Functions.
- DynamoDB Table –> Table that stores the records with the information of all the active findings.
- CloudWatch Log Group –> Log Groups containing the logs of Lambda executions.
- IAM Role –> Six IAM Roles that grant Lambda, DynamoDB and Step Functions the permissions needed to process, analyze and report findings.
- SES Identity –> Service in charge of sending emails with the findings.
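The emails sent through SES carry the finding rendered as HTML rather than raw JSON. A minimal sketch of such a rendering is shown here; the selection of fields and the function name `finding_to_html` are assumptions, and the real Lambda may format the message differently.

```python
from html import escape

# Top-level finding fields to show in the email (an assumed selection).
FIELDS = ("Title", "Description", "ProductName", "AwsAccountId", "Region")


def finding_to_html(finding):
    """Render selected fields of a Security Hub finding as an HTML table."""
    rows = [(name, finding.get(name, "")) for name in FIELDS]
    # Severity is nested in the finding: {"Severity": {"Label": "HIGH"}}
    rows.append(("Severity", finding.get("Severity", {}).get("Label", "")))
    cells = "".join(
        f"<tr><th align='left'>{escape(key)}</th><td>{escape(str(value))}</td></tr>"
        for key, value in rows
    )
    return f"<table border='1'>{cells}</table>"


example = {
    "Title": "S3 bucket allows <public> access",
    "ProductName": "Security Hub",
    "Severity": {"Label": "HIGH"},
}
body = finding_to_html(example)
print(body)
```

Escaping every value keeps text from the finding from being interpreted as markup in the email client.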
Step Functions workflow
- The first Lambda function checks whether the record exists in the DynamoDB table. If it does not, the finding is new: the function adds it to the table and passes the event to the next Lambda, which formats the event as HTML and emails it to the support team via SES.
- If the item already exists in the table, the finding is a duplicate that is still active, so another Lambda function runs to check whether it has been active for more than 15 days. If so, it runs the Lambda that converts the event to HTML and notifies the support team. If the finding has been active for less than 15 days, no action is taken.
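The branching described in the two steps above can be sketched as follows, with an in-memory dictionary standing in for the DynamoDB table. The function name `route_finding` and the constant `REMINDER_AFTER` are hypothetical, and the sketch does not reset the stored timestamp after a reminder; whether the real workflow does so is not stated here.

```python
from datetime import datetime, timedelta, timezone

REMINDER_AFTER = timedelta(days=15)  # period taken from the workflow description


def route_finding(finding_id, table, now):
    """Return 'notify' or 'skip'; new findings are recorded in `table`."""
    first_seen = table.get(finding_id)
    if first_seen is None:
        # New finding: store it and notify the support team.
        table[finding_id] = now
        return "notify"
    if now - first_seen > REMINDER_AFTER:
        # Duplicate that has stayed active past the threshold: notify again.
        return "notify"
    # Duplicate that is still within the quiet period: no action.
    return "skip"


table = {}  # stand-in for the DynamoDB table: finding id -> first seen
start = datetime(2024, 1, 1, tzinfo=timezone.utc)

print(route_finding("finding-1", table, start))
print(route_finding("finding-1", table, start + timedelta(days=3)))
print(route_finding("finding-1", table, start + timedelta(days=16)))
```

Deleting an entry from `table` when the finding is resolved, as the daily Lambda does, makes a later recurrence take the first branch again.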
Below is a graphic showing the workflow used to analyze security events:
1. Clone the repository
2. Initialize the working directory containing the Terraform files:
$ terraform init
3. Create an execution plan, which allows you to preview the resources to be deployed:
$ terraform plan
4. Execute the actions proposed in the Terraform plan:
$ terraform apply
Security is one of the main pillars on which any IT service is based. This solution facilitates the early detection and resolution of recorded incidents by streamlining the process of notifying the support team, so that the teams managing these incidents can focus on the most important thing: solving the problems.
Image: Unsplash | @fakurian
Cloud Architect at Keepler. "I am a Cloud Architect specialized in DevOps and Security. I love designing solutions, fixing problems, learning every day and facing challenges that make me go out of my comfort zone. In my free time I'm a very family oriented person, a lover of rock and almost any kind of sport."