Analytical Platform data access
Our data is extracted daily into the Analytical Platform, where it is further processed. (For how this is done, please refer to the Integrations page.)
Access checklist
- Ensure you have a GitHub account with digital.justice.gov.uk email and multi-factor authentication.
- Request to add your GitHub account to the moj-analytical-services organisation.
Control Panel user
Once you are in the Analytical Platform organisation:
- Log in to the control panel, which will be your starting point for all interactions.
- Set up multi-factor authentication.
- Ensure you fully log in at least once, as it creates your user for the following steps.
If you succeeded, you should see a similar screen:
Data access
Raise a pull request to add your username in the standard access definition.
For example, if your GitHub user is
sldblog
, addalpha_user_sldblog
to the end.Optionally, if you require access to live data, raise a pull request to add your username in the project access definition.
For example, if your GitHub user is
perlarem1
, addalpha_user_perlarem1
to the end.If you need to chase approval, please ask in
#ask-data-engineering
. You will likely need to self-register to ASD Slack.
Test browse data
- Log in to the control panel.
- Click on the “You can access AWS services such as S3 and Athena via the AWS Console” link at the bottom (you may need to scroll down).
- You should see the AWS console, similar to this:
- Go to the Search bar and search for
athena
: - Select “Athena”, then select “Query your data”, and “Launch query editor”.
- Select
hmpps_interventions_prod
as the database (orhmpps_interventions_dev
for testing) Try the following in the “Query” box:
select reference_number, sent_at from referral where sent_at >= '2022-12-20' and extraction_timestamp >= date_format(now(), '%Y%m%d') order by sent_at desc limit 5
Concepts for querying data
The language for querying data is called SQL.
Filtering for the extraction_timestamp
is necessary as the Analytical Platform keeps a version of our data daily.
If we had a record in the live service for five days, we would have five copies for that one record.
To only fetch the latest, we need to add the extraction_timestamp >= date_format(now(), '%Y%m%d')
filter in our queries.