How Aqua Security exports query data from Amazon Aurora to deliver value to their customers at scale
This is a guest post co-written with Asaf Brezler from Aqua Security.
Aqua Security is the pioneer in securing containerized cloud native applications from development to production. Aqua’s full lifecycle solution prevents attacks by enforcing pre-deployment hygiene and mitigates attacks in real time in production, reducing mean time to repair and overall business risk. The Aqua Platform, a Cloud Native Application Protection Platform (CNAPP), integrates security from code to cloud, combining the power of agent and agentless technology into a single solution. With enterprise scale that doesn’t slow development pipelines, Aqua secures the future in the cloud. Founded in 2015, Aqua protects over 500 of the world’s largest enterprises.
As a customer-centric organization, Aqua is committed to delivering innovative solutions that address the evolving security needs of its clients. Like many organizations, Aqua faced the challenge of efficiently exporting and analyzing large volumes of data to meet their business requirements. Specifically, Aqua needed to export and query data at scale to share with their customers for continuous monitoring and security analysis. In this post, we explore how Aqua addressed this challenge by using the aws_s3.query_export_to_s3 function with their Amazon Aurora PostgreSQL-Compatible Edition database and AWS Step Functions to streamline their query output export process, enabling scalable and cost-effective data analysis.
Aqua business requirements
Aqua’s business revolves around continuous analysis, monitoring, and threat detection for its customers. When it comes to Aqua’s enterprise customers’ data, a few things are true across the board:
- When deployed at the enterprise level, the amount of data regarding their environments, security findings, and platform can become massive
- To ensure data security and control, Aqua manages and retains sensitive data, such as security findings and vulnerabilities, within its internal centralized system. This approach eliminates the need to depend solely on external platforms for accessing and handling highly sensitive customer data
- The data that Aqua gathers from its customers’ environments and the data it produces in terms of risk and vulnerability assessments is highly sensitive and should be handled with caution
The following are several screenshots of the Aqua console presenting a customer’s vulnerabilities and inventory.
The ability to export data in a way that is compatible with the three preceding points has a significant, direct effect on the customer’s ability to use Aqua at scale and embed it within their enterprise processes. If customers can’t successfully route their volumes of Aqua data to their local data storage or SIEM, Aqua findings live in isolation from the rest of their enterprise data, leading the platform to be treated as a black box.
Therefore, the export mechanisms within the Aqua platform should be able to do the following:
- Perform at scale, allowing for export of big data with minimal to no impact on other Aqua flows
- Export the data to a location that is compatible with other key customer flows (for example, reporting over time and vulnerability management)
- Account for the major data types and fields that Aqua users want and need to consume over time, such as vulnerabilities and inventory resources data of container images, virtual machines (VMs), code repositories, and functions
- Contain enough data so users can slice and dice flexibly, along with allowing filtering on the exported data in advance
- Include various export destinations, formats (such as JSON and CSV), and frequencies for the query result-set output data
To fulfill these critical objectives, Aqua needed a robust and scalable solution to export and share large volumes of security query data efficiently with their customers.
Aqua considered different options, such as AWS Database Migration Service (AWS DMS) and the Aurora cluster export feature. However, neither of them supports exporting query result-set output data in JSON and CSV formats.
Solution overview
The following diagram illustrates the main building blocks of the solution.
The pipeline consists of the following phases:
- The scheduler AWS Lambda function, and in fact the whole Step Functions workflow, is invoked on a schedule by an Amazon EventBridge (formerly CloudWatch Events) rule, initiating a new exporter job
- The query exporter Lambda function gathers the information needed to query and format the required data, then exports it from the Aurora database using the aws_s3.query_export_to_s3 function
- The query data is extracted from Aurora and uploaded to an Amazon Simple Storage Service (Amazon S3) bucket
- The poller Lambda function polls the S3 bucket to retrieve the job’s status
- After the job is complete and the data is fully uploaded to the S3 bucket, the transferer Lambda function is invoked to transfer the output data from the S3 bucket to the customer’s predefined sink (a designated storage location or data repository specified by the customer, such as another S3 bucket, a data lake, or a cloud storage service, where the exported data will be delivered for further analysis or processing)
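The poller step described above follows a generic poll-until-complete pattern. The following Python sketch illustrates that pattern; the function name, status strings, and timeout values are illustrative, not Aqua's actual code:

```python
import time


def poll_until_complete(check_status, timeout_s=300, interval_s=5):
    """Call check_status() repeatedly until the job finishes or we time out.

    check_status is any callable returning one of 'IN_PROGRESS',
    'SUCCEEDED', or 'FAILED' -- in the real pipeline it would inspect
    the S3 bucket (or a job-status table) to determine the export state.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = check_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(interval_s)
    return "TIMED_OUT"
```

In a Step Functions workflow this loop is typically unrolled into a Wait state plus a Choice state rather than a long-running Lambda, which keeps Lambda execution time (and cost) low.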
Step Functions and aws_s3.query_export_to_s3
Aqua turned to Step Functions and the aws_s3.query_export_to_s3 function to address its query data export challenges, among other AWS services and key components:
- Aurora PostgreSQL-Compatible – Aqua’s security data is stored in an Aurora PostgreSQL database
- aws_s3.query_export_to_s3 function – This function, provided by the aws_s3 PostgreSQL extension, enabled Aqua to export the query output data, split into files of approximately 6 GB each, directly to an S3 bucket, a highly scalable and cost-effective data storage service. By using this function, Aqua eliminated the need for intermediate data storage or manual data transfers. The function runs on an Aurora read replica, or alternatively on a clone, to avoid additional load on the primary writer instance
- Step Functions state machine – Aqua designed a Step Functions state machine to orchestrate the entire query data export process. This state machine defines the sequence of all various steps and scenarios and integrates with other AWS services
- Lambda functions – The state machine uses Lambda functions to run specific tasks, such as running SQL queries, processing data, and invoking other AWS services
- S3 buckets – The exported query data is stored in S3 buckets, which offer highly scalable and cost-effective object storage for further data analysis and processing
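As a rough illustration of the export call itself, the following Python sketch composes the SQL statement a Lambda function might send to Aurora PostgreSQL. The bucket, key, and query are hypothetical, and real code would execute the statement through a PostgreSQL driver such as psycopg2 rather than just building the string:

```python
def build_export_statement(query, bucket, key, region="us-east-1"):
    """Compose the SQL that asks Aurora PostgreSQL to export a query's
    result set directly to Amazon S3 via the aws_s3 extension.

    The inner query is dollar-quoted ($q$...$q$) so embedded single
    quotes don't break the statement. The options string here requests
    CSV output with a header row.
    """
    return (
        "SELECT * FROM aws_s3.query_export_to_s3("
        f"$q${query}$q$, "
        f"aws_commons.create_s3_uri('{bucket}', '{key}', '{region}'), "
        "options := 'format csv, header true')"
    )
```

For example, `build_export_statement("SELECT id, cve FROM vulnerabilities", "aqua-exports", "out/vulns.csv")` yields a single statement that Aurora runs server-side, streaming the result set to the S3 object without it ever passing through the Lambda function.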
Aqua used Step Functions to orchestrate a serverless solution for exporting security query data from Aurora to Amazon S3. Step Functions enabled sequencing Lambda functions and multiple AWS services into a business-critical application, providing built-in error handling, automatic retries, fault tolerance, and detailed execution logs for efficient troubleshooting.
To achieve an efficient and scalable data export, Aqua implemented parallelization techniques using the Step Functions map state, splitting the query output into smaller chunks and exporting them concurrently to multiple S3 objects. Error handling and monitoring mechanisms within the Step Functions state machine provided prompt detection and resolution of any issues during the export process, maintaining the solution’s integrity and reliability.
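A minimal sketch of the chunking idea, assuming the exported rows can be partitioned on a numeric id column (the key-naming scheme is illustrative). Each chunk would feed one iteration of the Step Functions Map state, exporting to its own S3 object:

```python
def chunk_ranges(min_id, max_id, chunk_size):
    """Split an inclusive id range into (start, end) chunks.

    Each chunk dict becomes the input of one Map-state iteration, so
    the per-chunk exports run concurrently against separate S3 keys.
    """
    chunks = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        chunks.append({
            "start": start,
            "end": end,
            "s3_key": f"export/part-{len(chunks):05d}.csv",
        })
        start = end + 1
    return chunks
```

Each iteration then appends a `WHERE id BETWEEN start AND end` predicate to the export query, so no two iterations touch the same rows.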
While the aws_s3.query_export_to_s3 function facilitated data transfer from Aurora to Amazon S3, it lacked the ability to track the status of the query export job, which is essential for a robust and reliable export system. To address this limitation, Aqua developed a solution using PL/pgSQL to wrap the query export process with additional functionality. This mechanism enabled monitoring the job status, capturing errors, and gathering metadata related to the export.
With this solution, Aqua managed to effectively track whether a query export job was successful, encountered an error, or was still in progress. In case of a successful export, the mechanism would trigger the transfer of data from Amazon S3 to the customer’s designated sink. If an error occurred, it would log the error and, if configured, retry the export after a specified time or follow a defined error handling procedure. For long-running or large exports, the mechanism could periodically poll the job status, providing visibility into the progress and allowing for appropriate action if needed. Additionally, the mechanism collected and stored valuable metadata about the export process, such as start and end times, data volume, and relevant parameters or configurations.
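The shape of such a status-and-metadata wrapper can be sketched in Python as follows; the field names are illustrative, and Aqua's actual mechanism is implemented in PL/pgSQL inside the database rather than in application code:

```python
import time


class ExportJobTracker:
    """Minimal sketch of per-job status and metadata tracking.

    Records the lifecycle of one export: start/end timestamps, final
    status, row count, and any captured error message.
    """

    def __init__(self, job_id):
        self.meta = {
            "job_id": job_id,
            "status": "IN_PROGRESS",
            "started_at": time.time(),
            "ended_at": None,
            "rows_uploaded": 0,
            "error": None,
        }

    def succeed(self, rows_uploaded):
        # aws_s3.query_export_to_s3 reports rows/files/bytes uploaded,
        # which a wrapper can capture here.
        self.meta.update(status="SUCCEEDED", rows_uploaded=rows_uploaded,
                         ended_at=time.time())

    def fail(self, error):
        self.meta.update(status="FAILED", error=str(error),
                         ended_at=time.time())
```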
Additionally, Aqua had to account for the function’s output format: NDJSON (newline-delimited JSON), which stores or transmits a sequence of JSON objects, each on a separate line. Because downstream systems often expect the more common JSON format, which represents the entire dataset as a single hierarchical structure, Aqua added a parser step to convert the function’s NDJSON output into standard JSON.
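The parser step amounts to reading one JSON object per line and emitting a single JSON array; a minimal sketch:

```python
import json


def ndjson_to_json(ndjson_text):
    """Convert newline-delimited JSON (one object per line) into a
    single JSON array string, the shape downstream consumers expect."""
    records = [
        json.loads(line)
        for line in ndjson_text.splitlines()
        if line.strip()  # skip blank lines, including a trailing newline
    ]
    return json.dumps(records)
```

For very large exports this all-in-memory conversion would be replaced by a streaming variant, but the transformation itself is the same.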
By integrating Step Functions, Lambda, Aurora, and the aws_s3.query_export_to_s3 function, Aqua achieved a highly scalable, automated, and cost-effective solution for exporting and storing security query data at scale.
Conclusion
The solution implemented by Aqua using Step Functions and the aws_s3.query_export_to_s3 function offers numerous benefits:
- Scalability – The ability to export and store query data at scale, meeting the ever-growing demands of Aqua’s customers
- Cost-effectiveness – By using the cost-effective storage of Amazon S3 and AWS Step Functions, Aqua can efficiently export and store large volumes of data without incurring excessive costs
- Automation – The Step Functions orchestration eliminates the need for manual intervention and cumbersome orchestration code in compute units, providing a streamlined and efficient data export process
- Enhanced security operations – By facilitating the scalable and cost-effective export and storage of security-related data, such as risks, vulnerabilities, and extensive inventory resource data, Aqua can enable more comprehensive and efficient threat detection, continuous monitoring, and security analysis for its customers
The integration of Step Functions and the aws_s3.query_export_to_s3 function has empowered Aqua to address their query data export challenges effectively. By using these AWS services, organizations can benefit from streamlined data export processes, enabling scalable and cost-effective data analysis for enhanced business operations. This solution serves as a testament to the transformative impact of thoughtful database optimization strategies, enabling organizations to thrive in a data-intensive and cost-conscious environment.
If you would like to learn more about the aws_s3.query_export_to_s3 function, please see our user guide documentation.
Please feel free to share your insights and questions about this topic in the comments section below. We would love to hear your thoughts.
About the Authors
Asaf Brezler is an Experienced Engineering Manager at Aqua, leading several SaaS cloud applications aimed for the largest Fortune 100 companies. He has over a decade of experience in embedded security and cloud security, gained through roles in various cybersecurity companies and the Israeli Military Intelligence.
Pini Dibask is a Senior Database Solutions Architect at AWS with 20 years of experience working with database technologies, focusing on relational databases. In his role, Pini works with Israel’s largest customers as their trusted advisor for AWS database architectures, best practices, and migrations.