Managed Pipelines

Managed Pipelines pulls data from various MoJ legacy (heritage) databases into the Analytical Platform, the MoJ's data analysis environment, which provides modern tools and key datasets for MoJ analysts.

What does Managed Pipelines do?

Managed Pipelines pulls data from the following databases:

  • Delius: the National Probation Service’s case management system
  • OASys: used to assess the risks and needs of eligible offenders in prisons and probation trusts

Managed Pipelines uploads this data to the Analytical Platform, such that the data is:

  • refreshed on a regular (daily) basis to support up-to-date analysis
  • versioned to support reproducible and time-dependent analysis
  • compatible with analytical services such as Athena

Please refer to architecture for more details on how Managed Pipelines implements these criteria.
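
As an illustration of why versioning supports reproducible and time-dependent analysis, here is a minimal sketch. It assumes, purely for the example, that each daily extract is tagged with its extraction date; the column names, the data and the `as_of` helper are hypothetical and do not describe the actual Managed Pipelines schema or implementation, which is covered in the architecture documentation.

```python
from datetime import date

# Hypothetical rows: each daily extract is tagged with its extraction date.
# Column names and values are illustrative only, not the real schema.
SNAPSHOTS = [
    {"extraction_date": date(2023, 1, 1), "record_id": 1, "status": "active"},
    {"extraction_date": date(2023, 1, 2), "record_id": 1, "status": "closed"},
    {"extraction_date": date(2023, 1, 2), "record_id": 2, "status": "active"},
]


def as_of(rows, when):
    """Return the rows of the most recent extract taken on or before `when`."""
    dates = [r["extraction_date"] for r in rows if r["extraction_date"] <= when]
    if not dates:
        return []  # no extract existed yet at that point in time
    latest = max(dates)
    return [r for r in rows if r["extraction_date"] == latest]


# Re-running an analysis "as of" a fixed date always sees the same rows,
# even after later extracts have been loaded.
first_day = as_of(SNAPSHOTS, date(2023, 1, 1))
latest = as_of(SNAPSHOTS, date(2023, 1, 5))
```

Pinning an analysis to a date in this way is what makes a result repeatable after the source database has moved on.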

What does Managed Pipelines not do?

Microservices

Managed Pipelines is not intended for MoJ microservices such as pathfinder and interventions. Instead, the strategic solution is for microservices to push data to the Analytical Platform using a "self-service" approach, which gives microservice owners more control over their pipeline. This option is less suitable for legacy databases, hence the need for a “managed” approach to extract their data.

Please contact #ask-data-engineering for questions and concerns about any existing data pipelines, whether managed or self-serviced, or to discuss and support new requirements.

Denormalisation and Transformations

Data in heritage databases tends to be highly normalised to facilitate operational processes, which makes it harder to use in analytical queries. Managed Pipelines does not transform or denormalise the data. This is the responsibility of downstream applications managed by the #data-modelling team. Please reach out to #data-modelling, who can connect you with existing denormalised data assets, or discuss and support new requirements.
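
To show what denormalisation means in practice, the sketch below uses an in-memory SQLite database with three small, hypothetical tables (`person`, `event` and `event_type`). These names and rows are invented for illustration and do not reflect the Delius or OASys schemas, nor the way the #data-modelling team builds its assets. An analyst working on the normalised form has to join all three tables to answer a simple question; a denormalised asset would expose the joined result directly.

```python
import sqlite3

# Hypothetical normalised schema: facts are split across tables and
# linked by keys, which suits operational updates but not analysis.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_type (event_type_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        event_type_id INTEGER
    );
    INSERT INTO person VALUES (1, 'A'), (2, 'B');
    INSERT INTO event_type VALUES (10, 'assessment'), (20, 'review');
    INSERT INTO event VALUES (100, 1, 10), (101, 1, 20), (102, 2, 10);
""")


def denormalised_events(connection):
    """Join the lookup tables onto each event to produce one flat row per event."""
    return connection.execute("""
        SELECT e.event_id, p.name, t.description
        FROM event AS e
        JOIN person AS p ON p.person_id = e.person_id
        JOIN event_type AS t ON t.event_type_id = e.event_type_id
        ORDER BY e.event_id
    """).fetchall()


rows = denormalised_events(conn)
for row in rows:
    print(row)
```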

Data Access

Managed Pipelines does not manage access to the data. Access is controlled through the data-engineering-database-access repository. Please reach out to #data-modelling for guidance and support on which data sources are most appropriate for your specific needs.

Data Documentation and Domain Knowledge

Managed Pipelines does not automatically publish documentation about the data. The Managed Pipelines team can manually update metadata such as table and column names on the data discovery tool. Please refer to the user guidance for more details. You can also subscribe to dedicated Slack channels where you can get support from other analysts working on the same data source:

Project Documentation

The homepage for the Managed Pipelines documentation can be found here.

Warning

The documentation uses an experimental template, which should only be used for prototyping.

Failures

Managed Pipelines may occasionally fail, which means data is not refreshed or is unavailable. Notifications will be sent to #ask-data-engineering in the first instance. You will need an account on the ASD Slack Workspace.

Contact

Please send a Slack message on #managed-pipelines-and-modernisation. You will need an account on the ASD Slack Workspace.

Roadmap

Please refer to Managed Pipelines and Modernisation Roadmap. You will need an account on the dsdmoj Jira site.

Contributing

Please refer to contributing.

Licence

Unless stated otherwise, the codebase is released under the MIT License.