Project Merlin: the volunteer matching engine¶
The Head of Programmes named the project herself, which should perhaps have been a warning. In the Arthurian tradition, Merlin’s gift for seeing the right answer to a problem was balanced by his tendency to be imprisoned in a tree at critical moments. The Head of Programmes was not aware of this parallel at the time of naming. She became somewhat more aware of it later.
The idea was straightforward. The Home has 340 active volunteers, each with a profile in Covenant and additional notes in the Burrow. Matching volunteers to care tasks requires a coordinator to manually cross-reference availability, skills, location, DBS status, and the somewhat ineffable quality of whether this volunteer would be well-suited to this particular resident’s needs. The process took several hours per week. An AI matching engine would reduce this to seconds.
Priya, who had built the Burrow out of a genuine desire to solve exactly this kind of coordination problem, offered to help. She had found a no-code ML platform called FlowCraft that offered a drag-and-drop interface for building classification models without writing code. The Head of Programmes approved the project in a fifteen-minute meeting. Nobody invited the IT manager.
The training data¶
The model needed to learn what a good match looked like. The logical training data was the historical record of which volunteers had been assigned to which tasks, drawn from the Burrow’s scheduling sheets.
The Burrow contained five years of scheduling history. It also contained, in the notes fields that various coordinators had populated over those five years, a set of assumptions and preferences that had accumulated like sediment at the bottom of a river: calm and invisible until disturbed.
Coordinator notes for high-priority resident tasks overwhelmingly featured the names of a recognisable subset of volunteers. Those volunteers shared certain characteristics: they had been with the Home for several years, they worked weekday daytime hours, and they had built relationships with the previous management team. Volunteers who had joined more recently, who worked evenings and weekends, and whose names were less familiar to the coordinators who wrote the notes, appeared less frequently in the high-priority assignments regardless of their documented skills and qualifications.
None of the coordinators who wrote these notes were making deliberate discriminatory choices. They were making reasonable judgements about who they trusted, based on their experience, under time pressure. The result was a five-year record in which certain groups of volunteers were systematically assigned to less visible, lower-engagement tasks, and that pattern was entirely consistent across the dataset.
FlowCraft ingested the data, trained the model, and reported an accuracy of 89 per cent against the historical assignments. This was accurate in the literal sense: the model had learned to replicate the historical pattern with high fidelity. The historical pattern was the problem.
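The 89 per cent figure is worth making concrete. A toy sketch (invented data, not FlowCraft's actual model) shows how a classifier scored against biased historical labels can look excellent while learning nothing but the bias:

```python
# Toy sketch (invented data): a model judged against biased historical
# labels scores highly by reproducing the bias, not by doing the task.

import random

random.seed(1)

# Each volunteer: (qualified_for_task, familiar_to_coordinators), independent.
volunteers = [(random.random() < 0.5, random.random() < 0.5) for _ in range(1000)]

# Historical label: coordinators assigned high-priority tasks to familiar names.
historical = [familiar for qualified, familiar in volunteers]

# The label the Home actually wanted: assignment by documented qualification.
fair = [qualified for qualified, familiar in volunteers]

# A model trained on the history simply learns to predict familiarity.
predictions = [familiar for qualified, familiar in volunteers]

acc_vs_history = sum(p == h for p, h in zip(predictions, historical)) / len(volunteers)
acc_vs_fair = sum(p == f for p, f in zip(predictions, fair)) / len(volunteers)

print(acc_vs_history)  # 1.0: perfect fidelity to the historical pattern
print(acc_vs_fair)     # roughly 0.5: chance-level at the real objective
```

The yardstick, not the model, is the weak point: accuracy against a tainted record measures fidelity to the taint.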
The missing question¶
The GDPR question that nobody asked: does automated decision-making about individuals’ suitability for roles require a data protection impact assessment under Article 35 and, in some circumstances, explicit consent from the individuals being assessed?
The answer is yes. Article 22 of the GDPR establishes specific requirements for decisions based solely on automated processing that produce legal or similarly significant effects on individuals. Whether a volunteer is assigned to a high-value care task or a lower-engagement role has effects on their experience, their development, and the record that will be used in future assessments of their suitability. The Head of Programmes had not heard of Article 22 in this context. Priya had not thought to ask. FlowCraft’s onboarding documentation did not mention it.
The model went live in February.
In production¶
The first three months produced no complaints, which the Head of Programmes noted with satisfaction. The matching was faster. The coordinators were pleased not to be doing the manual cross-referencing. The output looked authoritative, being generated by a system rather than a person, which gave it a quality of apparent objectivity that the previous manual process had lacked.
Nikolaj, the vampire supervisor of the night shift team, noticed something in month four. Several of the night shift volunteers, including three who had specifically requested daytime observation rotations as part of their development plans, had not been assigned to any daytime tasks since the system went live. Nikolaj, who had been a logistics coordinator in a previous career and had instincts about patterns, looked at the assignment history for the past four months and then at the assignment history for the four months before the system. The disparity was not subtle.
He sent a politely worded email to the Head of Programmes. The Head of Programmes looked at the numbers. She asked Priya to check the model. Priya looked at the model’s feature importance output. The top features were tenure, availability window (daytime versus evening), and what the Burrow called “coordinator familiarity score,” which was a derived metric Priya had added to capture how often a volunteer’s name appeared in coordinator notes. This feature was doing exactly what it was designed to do. The problem was what the coordinator notes encoded.
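The Burrow's actual formula for the familiarity score is not documented in the record, but a derived metric of this kind is easy to sketch. The version below (hypothetical, with invented volunteer names and notes; "Margaret" does not appear in the source material) simply counts how often each name turns up in coordinator notes and normalises against the most-mentioned volunteer:

```python
# Hypothetical reconstruction of a "coordinator familiarity score":
# mention counts in free-text notes, scaled to the range 0..1.

from collections import Counter

def familiarity_scores(volunteer_names, notes):
    """Map each volunteer name to a 0..1 score based on note mentions."""
    mentions = Counter()
    for note in notes:
        text = note.lower()
        for name in volunteer_names:
            mentions[name] += text.count(name.lower())
    peak = max(mentions.values(), default=0) or 1  # avoid dividing by zero
    return {name: mentions[name] / peak for name in volunteer_names}

notes = [
    "Asked the usual daytime crew; Margaret covered the garden visit.",
    "Margaret again for the high-priority rota. Margaret is reliable.",
    "Evening shift quiet; Theodora handled the rehabilitation session.",
]
scores = familiarity_scores(["Margaret", "Theodora"], notes)
print(scores)  # Margaret's score dwarfs Theodora's, on mention count alone
```

A metric like this faithfully measures who coordinators wrote about, which is exactly why it imports whatever the writing habits encoded.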
The subject access request¶
In month six, a volunteer named Theodora Brightmantle, a harpy who had joined the Home eighteen months earlier and who held qualifications in mythological creature rehabilitation that several of the longer-serving volunteers did not, submitted a subject access request. She wanted to know what data the Home held about her, how it was being processed, and on what basis automated decisions about her task assignments were being made.
The response to a subject access request involving automated decision-making must, under GDPR, include meaningful information about the logic involved, the significance of the processing, and the envisaged consequences for the data subject.
The IT manager, who had just been handed the SAR, sent a message to the Head of Programmes that contained only a question mark and a link to Article 22. The Head of Programmes forwarded it to Priya. Priya forwarded it to the FlowCraft support portal. FlowCraft’s response was that their platform’s model explainability outputs were available in the Pro tier, which the Home was not on.
The matching engine was suspended while the DPO reviewed the situation. Theodora’s SAR response arrived three weeks after the one-month statutory deadline. The Home issued an apology and a remediation plan. The remediation plan included, as its first item, a data protection impact assessment for Project Merlin that should have been completed before training began.
The coordinator familiarity score¶
The DPIA, once completed, identified the coordinator familiarity score as a proxy variable encoding historical bias, and recommended its removal. Priya rebuilt the model without it. The new model’s accuracy against the historical dataset dropped to 71 per cent. The Head of Programmes asked whether this was a problem.
The IT manager said that what the model was now less accurately reproducing was a five-year record of biased assignments, and that in this particular case, 71 per cent accurate against the historical data was better than 89 per cent.
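The IT manager's point can be illustrated with a toy comparison (invented data and proportions, not the Home's figures): removing the proxy lowers accuracy against the biased history while raising it against the outcome the Home actually cares about.

```python
# Toy comparison (invented data): the accuracy trade-off after removing
# a proxy feature that encodes historical bias.

import random

random.seed(2)

rows = []
for _ in range(1000):
    qualified = random.random() < 0.5
    familiar = random.random() < 0.5
    # History: familiarity drove most high-priority assignments;
    # qualifications decided only the remainder.
    assigned = familiar if random.random() < 0.8 else qualified
    rows.append((qualified, familiar, assigned))

def accuracy(predictions, targets):
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)

historical = [a for q, f, a in rows]     # what the old model was scored on
fair = [q for q, f, a in rows]           # what the Home actually wanted

with_proxy = [f for q, f, a in rows]     # model leaning on the familiarity score
without_proxy = [q for q, f, a in rows]  # rebuilt model, proxy removed

print(accuracy(with_proxy, historical))     # high: faithful to the bias
print(accuracy(without_proxy, historical))  # lower: "worse" on tainted labels
print(accuracy(without_proxy, fair))        # 1.0 in this toy: better at the real task
```

The drop from 89 to 71 per cent is the same shape as the first two printed numbers: a loss measured against labels that were themselves the problem.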
The Head of Programmes thought about this for a moment and then said: “Right. Yes. Of course.”
The Burrow was shut down as a training data source. A formal data audit of the scheduling history was commissioned. Project Merlin was renamed. The IT manager suggested renaming it something less prophetic. Nobody took the suggestion.