ADR-015 - Monitoring and Testing MS List Submission

Date: 2024-08-08

Status

✅ Accepted

Monitoring and Testing MS List Integration

Context

One of the features most requested by form owners is an alternative form submission option. This resulted in an integration with the MS Graph API.

Because we cannot hold long-term licences for SharePoint/Teams, or long-term acceptance test accounts in the dev tenancy, it is difficult to maintain long-lived, high-confidence acceptance tests that run on each deploy and verify the integration still behaves as expected.

We do have automated testing and monitoring around the integration. This document records what we are able to test and monitor.

Constraints

Forms with this feature enabled send submissions to the MS Graph API
The Graph API in some cases quietly discards answers within submissions, but returns a failure status code if we try to write to a column that does not exist
We cannot control or predict changes to the Graph API
We cannot have a long-term dev environment SharePoint site or acceptance test account
Even if we could, developer access to manage that SharePoint site would be difficult to maintain
We would need to clear out submissions from test SharePoint sites, either manually after each test run or via a data retention policy
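
As a rough illustration of the writes these constraints refer to, the sketch below builds a request for the Graph API "create list item" route. The site ID, list ID and column names are placeholders, and the helper name is hypothetical; this is not the service's actual code.

```python
import json

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_list_item_request(site_id, list_id, answers):
    """Return the URL and JSON body for creating one list item.

    `answers` maps column names to values. The column names used in the
    example below are hypothetical placeholders.
    """
    url = f"{GRAPH_BASE}/sites/{site_id}/lists/{list_id}/items"
    # Graph expects column values under a top-level "fields" object.
    body = json.dumps({"fields": answers})
    return {"method": "POST", "url": url, "body": body}


request = build_list_item_request(
    "example-site-id",
    "example-list-id",
    {"Title": "ABC-123", "full_name": "Test User"},
)
```

Per the constraint above, a key under "fields" that matches no column produces a failure status code, while other problems may pass silently.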

Automated tests

We have automated tests checking that we send payloads to the Graph API routes as expected
We have an acceptance test pointed at devl.justice.gov.uk that sends to a list alongside the other submission routes (email etc.) and verifies that the answers in the list are correct
Fetching all list items and filtering returns a maximum of 10 results, so instead we fetch only the latest list item and then get all its columns
We verify that this latest item has the expected reference number, which means overlapping AT runs can pollute each other's results
Data retention on list items deletes them after one day
Data retention on uploaded files is TBC
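
The acceptance test check described above can be sketched roughly as follows. This is a minimal illustration and not the real AT code: the function name, the shape of `items` (oldest first, each with a "fields" dict) and the `reference_number` column name are all assumptions.

```python
def verify_latest_item(items, expected_reference, expected_answers,
                       reference_column="reference_number"):
    """Check the newest list item against what the test run submitted.

    Returns a list of problems; an empty list means the item verified.
    """
    if not items:
        return ["list is empty"]

    fields = items[-1]["fields"]  # only the latest item is inspected
    actual_reference = fields.get(reference_column)
    if actual_reference != expected_reference:
        # An overlapping AT run may have written to the list after ours.
        return [f"latest item has reference {actual_reference!r}, "
                f"expected {expected_reference!r}"]

    problems = []
    for column, expected in expected_answers.items():
        if column not in fields:
            # The write was accepted but this answer did not arrive.
            problems.append(f"{column}: missing")
        elif fields[column] != expected:
            problems.append(f"{column}: got {fields[column]!r}, "
                            f"expected {expected!r}")
    return problems
```

Because only the latest item is inspected, a second run writing between our submission and our check makes the reference comparison fail even though our item was stored correctly.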

Monitoring and Alerting

We receive logs in Kibana, but little debug information is logged, to avoid logging personal information
Any non-success status code from the Graph API is forwarded to Sentry
Sentry errors alert in Slack
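
The forwarding rule above could look something like the sketch below. It is illustrative only: the function name and the `report` callback are hypothetical stand-ins for whatever error-reporting client the service uses, and the message deliberately contains no submission data.

```python
def report_graph_failure(status_code, form_id, report):
    """Forward a non-success Graph API status code to an error reporter.

    `report` is any callable that accepts a message string.
    Returns True if something was reported, False otherwise.
    """
    if 200 <= status_code < 300:
        return False
    # Include only non-personal identifiers, never the submission payload.
    report(f"MS Graph API returned {status_code} for form {form_id}")
    return True


# Usage with a plain list standing in for the error reporter.
captured = []
report_graph_failure(201, "example-form", captured.append)
report_graph_failure(400, "example-form", captured.append)
```

Note that a rule of this shape only sees status codes, so a response that reports success while discarding answers never reaches the reporter.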

Potential issues

The AT flow should prevent us from releasing a code change that breaks MS List submissions
If the Graph API changes, we won’t know until the ATs fail
If users are regenerating their lists, we won’t know
If the Graph API returns error codes for any deployed form, we will be alerted
But if the Graph API accepts the payload and quietly does not behave as expected, we won’t be alerted