Delius and Oasys Extraction
We extract data from the source Delius and OASys databases onto the Analytical Platform, MoJ's data analysis environment, which provides modern tools and key datasets for MoJ analysts.
This project was previously known as managed-pipelines.
What does this repository do?
The extraction of the following databases onto the Analytical Platform:
- Delius: The National Probation Service’s case management system
- OASys: Used to assess the risks and needs of eligible offenders in prisons and probation trusts
This data is uploaded to the Analytical Platform in a format compatible with analytical services such as Athena. This allows the curation process within create-a-derived-table to run daily, creating versioned data that supports both up-to-date and reproducible, time-dependent analysis.
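As a rough illustration of what "compatible with Athena" means for analysts, the extracted tables can be queried with standard tooling once they are registered in the data catalogue. The sketch below uses the awswrangler library with hypothetical database and table names (delius_extract, offender); the actual catalogue names will differ.

```python
import awswrangler as wr

# Hypothetical database name for illustration only; the real Glue/Athena
# catalogue layout may differ.
DATABASE = "delius_extract"

# Read a small sample of an extracted table into a pandas DataFrame via Athena.
df = wr.athena.read_sql_query(
    sql="SELECT * FROM offender LIMIT 10",
    database=DATABASE,
)

print(df.head())
```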
Please refer to architecture for more details on how the extraction process is implemented.
What does this repository not do?
Microservices
This extraction process is not intended for MoJ microservices such as pathfinder and interventions. Instead, the strategic solution is to push data from microservices to the Analytical Platform using a "self-service" approach, which gives microservice owners more control over their pipelines. This option is less suitable for legacy databases, hence the need for a "managed" approach to extracting their data.
Please contact #ask-data-engineering for questions and concerns about any existing data pipelines, whether managed or self-serviced, or to discuss and support new requirements.
Curation
The process of deduplicating and versioning the extracted data, which was previously handled by this repository through a number of Glue jobs, is now managed within create-a-derived-table.
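For context on what "versioned" means in practice, the sketch below shows one way the latest state of each record could be selected from a versioned table. The database, table, key, and snapshot column names (delius_derived, offender_versions, offender_id, extraction_timestamp) are hypothetical assumptions; create-a-derived-table defines the real structure.

```python
import awswrangler as wr

# Hypothetical curated database and versioning columns, for illustration only.
DATABASE = "delius_derived"

# Keep only the most recent version of each record, assuming a key column
# ("offender_id") and a snapshot column ("extraction_timestamp") exist.
sql = """
SELECT *
FROM (
    SELECT
        t.*,
        ROW_NUMBER() OVER (
            PARTITION BY offender_id
            ORDER BY extraction_timestamp DESC
        ) AS rn
    FROM offender_versions AS t
) AS ranked
WHERE rn = 1
"""

latest = wr.athena.read_sql_query(sql=sql, database=DATABASE)
```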
Denormalisation and Transformations
Data in heritage databases tends to be highly normalised to facilitate operational processes, which makes it harder to use in analytical queries. The extraction process does not transform or denormalise the data. This is the responsibility of downstream applications managed by the analytics engineers in the #dmet-hmpps and #data-modelling teams. Please reach out to #dmet-hmpps/#data-modelling, who can connect you with existing denormalised data assets, or discuss and support new requirements.
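To make the distinction concrete, here is a hedged sketch of the kind of denormalisation a downstream job might perform, joining two hypothetical normalised tables (offender and sentence) into a single flat table suitable for analysis. The table and column names are assumptions for illustration only; the real denormalised assets are produced downstream by the teams above.

```python
import awswrangler as wr

# Hypothetical source tables and columns, for illustration only.
DATABASE = "delius_extract"

# Flatten a one-to-many relationship (one offender, many sentences) into a
# single analytical table by joining on the shared key.
sql = """
SELECT
    o.offender_id,
    o.date_of_birth,
    s.sentence_id,
    s.sentence_start_date,
    s.sentence_length_days
FROM offender AS o
LEFT JOIN sentence AS s
    ON o.offender_id = s.offender_id
"""

flat = wr.athena.read_sql_query(sql=sql, database=DATABASE)
```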
Data Access
The extraction process does not manage access to the data. Access is controlled through the data-engineering-database-access repository. Please reach out to #data-modelling for guidance and support on the most appropriate data sources to request access to for your specific needs.
Data Documentation and Domain Knowledge
The extraction process does not automatically publish documentation about the data. Metadata, such as table and column names, can be manually updated in the data discovery tool. Please refer to the user guidance for more details. You can also subscribe to dedicated Slack channels where you can get support from other analysts working on the same data source:
Project Documentation
The homepage for documentation can be found here.
Warning
The documentation uses an experimental template which should only be used for prototyping.
Failures
The extraction process may occasionally fail, which means data may not be refreshed or may be unavailable, impacting downstream pipelines and applications. We have automated notifications for some of these failures, which are sent to #data-engineering-alerts-prod. We will communicate any ongoing issues to downstream users via #ask-data-engineering in the first instance. You will need an account on the ASD Slack Workspace.
Contact
Please send a Slack message to #dmet-hmpps. You will need an account on the ASD Slack Workspace.
Roadmap
Please refer to Roadmap.
Contributing
Please refer to contributing.
Licence
Unless stated otherwise, the codebase is released under the MIT License.