Skip to main content

Analytical Platform data access

Our data is extracted daily into the Analytical Platform, where it is further processed. (For how this is done, please refer to the Integrations page.)

Access checklist

  1. Ensure you have a GitHub account with digital.justice.gov.uk email and multi-factor authentication.
  2. Request to add your GitHub account to the moj-analytical-services organisation.

Control Panel user

Once you are in the Analytical Platform organisation:

  1. Log in to the control panel, which will be your starting point for all interactions.
  2. Set up multi-factor authentication.
  3. Ensure you fully log in at least once, as it creates your user for the following steps.

If you succeeded, you should see a similar screen: control panel screenshot

Data access

  1. Raise a pull request to add your username in the standard access definition.

    For example, if your GitHub user is sldblog, add alpha_user_sldblog to the end.

  2. Optionally, if you require access to live data, raise a pull request to add your username in the project access definition.

    For example, if your GitHub user is perlarem1, add alpha_user_perlarem1 to the end.

  3. If you need to chase approval, please ask in #ask-data-engineering. You will likely need to self-register to ASD Slack.

Test browse data

  1. Log in to the control panel.
  2. Click on the “You can access AWS services such as S3 and Athena via the AWS Console” link at the bottom (you may need to scroll down).
  3. You should see the AWS console, similar to this: AWS console screenshot
  4. Go to the Search bar and search for athena: AWS search for Athena screenshot
  5. Select “Athena”, then select “Query your data”, and “Launch query editor”.
  6. Select hmpps_interventions_prod as the database (or hmpps_interventions_dev for testing)
  7. Try the following in the “Query” box:

    select reference_number, sent_at
    from referral
    where sent_at >= '2022-12-20'
      and extraction_timestamp >= date_format(now(), '%Y%m%d')
    order by sent_at desc
    limit 5
    
  8. You should see something similar to: Athena sample result

Concepts for querying data

The language for querying data is called SQL.

Filtering for the extraction_timestamp is necessary as the Analytical Platform keeps a version of our data daily.

If we had a record in the live service for five days, we would have five copies for that one record. To only fetch the latest, we need to add the extraction_timestamp >= date_format(now(), '%Y%m%d') filter in our queries.

This page was last reviewed on 6 September 2024. It needs to be reviewed again on 6 March 2025 by the page owner #interventions-dev .
This page was set to be reviewed before 6 March 2025 by the page owner #interventions-dev. This might mean the content is out of date.