Architecture
What does DMS extraction do?
The following diagram illustrates the logical architecture.
- Connect to the source database
- Extract the full data from the source database, then use change data capture (CDC) to extract ongoing changes. Only the changes are moved after the initial load, which is an efficient way of moving data between systems and also allows downstream applications to track any changes in the data
- Extract the source database metadata to be used in the rest of the pipeline and update the metadata store
- Validate the extracted data against the metadata
- Upload the data to the Analytical Platform data lake
- Expose the data and metadata for downstream processes to use
- Apply Logging, Monitoring and Alerting (LMA) in accordance with good practice
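The full-load-then-CDC step above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: it assumes each change record carries an `Op` field holding `I`, `U` or `D` (the convention AWS DMS uses when writing CDC output to S3) and a hypothetical primary-key column `id`. The function name `apply_changes` and the sample rows are made up for the example.

```python
from typing import Any

Row = dict[str, Any]


def apply_changes(snapshot: list[Row], changes: list[Row], key: str = "id") -> list[Row]:
    """Replay CDC records, in order, on top of a full-load snapshot."""
    # Index the full load by primary key so each change is a cheap lookup.
    current: dict[Any, Row] = {row[key]: row for row in snapshot}
    for change in changes:
        op = change["Op"]  # assumed operation marker: I, U or D
        row = {k: v for k, v in change.items() if k != "Op"}
        if op in ("I", "U"):
            # Inserts and updates both leave the latest version of the row.
            current[row[key]] = row
        elif op == "D":
            current.pop(row[key], None)
        else:
            raise ValueError(f"Unknown operation: {op!r}")
    return sorted(current.values(), key=lambda r: r[key])


# Hypothetical data: one full load followed by three captured changes.
full_load = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
cdc = [
    {"Op": "U", "id": 2, "name": "Robert"},
    {"Op": "I", "id": 3, "name": "Cy"},
    {"Op": "D", "id": 1, "name": "Ann"},
]
latest = apply_changes(full_load, cdc)
```

Because the change records are kept rather than overwritten, a downstream process could equally replay them up to a chosen point to see the table as it was at that time.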
How is DMS Extraction implemented?
Various serverless AWS data analytics services are used. This means AWS takes on the heavy lifting of:
- Providing and managing scalable, resilient, secure, and cost-effective infrastructural components
- Ensuring infrastructural components natively integrate with each other
The following AWS Services are used:
- AWS Database Migration Service (DMS) to extract the full data and/or CDC changes from the source databases to Parquet files
- AWS Lambda to validate the output data
- Amazon S3 to store the data at various stages of the pipeline
- AWS Glue Data Catalog to expose the metadata
- Amazon CloudWatch for logging and monitoring
- Amazon Simple Notification Service (SNS) for alerting errors to Slack
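As a rough illustration of the validation step, the sketch below shows how a Lambda function triggered by an S3 event notification might compare an extracted file's columns against the metadata. It is an assumption-laden sketch rather than the pipeline's implementation: `make_handler`, `validate_columns`, the injected `read_schema` callable, the bucket name and the object key are all hypothetical, and reading the real Parquet schema is deliberately left to the caller.

```python
from typing import Callable
from urllib.parse import unquote_plus

Schema = dict[str, str]  # column name -> type name


def s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    # Object keys arrive URL-encoded in S3 event notifications.
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def validate_columns(actual: Schema, expected: Schema) -> list[str]:
    """Compare a file's column names and types against the metadata."""
    errors = []
    for name, dtype in expected.items():
        if name not in actual:
            errors.append(f"missing column: {name}")
        elif actual[name] != dtype:
            errors.append(f"type mismatch for {name}: expected {dtype}, got {actual[name]}")
    errors.extend(f"unexpected column: {n}" for n in actual if n not in expected)
    return errors


def make_handler(read_schema: Callable[[str, str], Schema], expected: Schema):
    """Build a Lambda-style handler; read_schema is injected (hypothetical)."""
    def handler(event: dict, context: object) -> dict[str, list[str]]:
        # Map each object key to its list of validation errors (empty = valid).
        return {
            key: validate_columns(read_schema(bucket, key), expected)
            for bucket, key in s3_objects(event)
        }
    return handler


# Hypothetical metadata and a stubbed schema reader for illustration.
metadata = {"id": "int64", "name": "string"}
handler = make_handler(lambda bucket, key: {"id": "int64", "name": "int64"}, metadata)
event = {"Records": [{"s3": {"bucket": {"name": "landing"},
                             "object": {"key": "db/my+table/LOAD00000001.parquet"}}}]}
result = handler(event, None)
```

Injecting `read_schema` keeps the comparison logic testable without AWS access. In a real function, a non-empty error list would be the natural point to log to CloudWatch and publish an alert to SNS.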
The pipeline also:
- Uses create-a-derived-table to curate the data via Amazon Athena orchestrated using dbt
- Uses different AWS accounts on the Analytical Platform to facilitate and isolate resource management
- Provisions dev and preprod pipelines for testing deployment changes before deploying to production
- Extracts metadata from the source database to be used in various places along the pipeline. Please refer to metadata for more details
- Uses GitHub Actions to automate software workflows and run CI/CD pipelines. Please refer to deployment for more details
- Uses Pulumi to define and deploy Infrastructure as Code (IaC). Please refer to using pulumi for more details
The following diagram summarises the physical architecture for a single database and environment:
Please refer to components for a deeper dive into the individual components.
Last update: July 8, 2024
Created: July 8, 2024