TalentScore: the screening tool with opinions¶
The HR manager discovered TalentScore through a recommendation from a colleague at another wildlife sanctuary. The tool scored volunteer applications against a desired profile, ranked candidates by fit, and generated a shortlist for human review in a fraction of the time the manual process required. The colleague said it had transformed their recruitment. The HR manager set up a trial account and ran the first batch of applications through it that afternoon.
She had not mentioned the trial to the IT manager, the business analyst, or the DPO. She mentioned it to the Head of Programmes, who said it sounded very useful.
What TalentScore knew¶
TalentScore was trained on historical volunteer recruitment outcomes from, according to its marketing materials, over forty organisations in the animal welfare and social care sector. The model had been validated, the marketing materials noted, against the judgements of experienced recruitment coordinators.
The business analyst found TalentScore’s model card on their website. It was not prominently linked. It was in a PDF under a Resources tab titled “Technical Documentation,” in a section called “Fairness and Bias Assessment.” The relevant paragraph read:
“Testing against held-out evaluation sets identified differential scoring patterns correlated with applicant name features. Specifically, names associated with certain ethnic and national-origin categories showed lower mean scores in the care assistant and animal handling role categories at a statistically significant rate (p < 0.01). The root cause is attributed to historical outcome disparities in the training data. Mitigation efforts are ongoing. Users are advised to apply human review to all shortlists generated by the system.”
The mitigation efforts had apparently been ongoing for two years, based on the model card’s version history. The advice to apply human review was in paragraph four of a technical document that HR had not read.
TalentScore had been in use at the Home for five months.
What TalentScore had decided¶
The business analyst ran the five months of application data through a simple demographic analysis, using applicant names as a proxy indicator — the same signal the model had evidently picked up on — to understand the pattern in the Home’s specific case.
The results were consistent with the model card’s description. Applicants with names associated with South Asian, Eastern European, and North African origins had been scored an average of 23 points lower than applicants with names associated with Anglo-Saxon or Western European origins, across the care assistant and animal handling roles that made up the majority of the Home’s volunteer intake. Several applicants who had been placed below the shortlist threshold would have been above it under name-blind scoring.
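The shape of the analyst's comparison can be sketched in a few lines. Everything here is illustrative: the group labels, scores, and threshold are invented sample data standing in for the Home's five months of application records, and real name-based inference is far messier than a hard-coded mapping.

```python
# Minimal sketch of a proxy-based score-gap analysis.
# The records below are invented; a real analysis would join actual
# application data to an (imperfect) name-origin inference step.
from statistics import mean

# Hypothetical records: (inferred name-origin group, TalentScore score)
applications = [
    ("group_a", 78), ("group_a", 82), ("group_a", 74), ("group_a", 80),
    ("group_b", 55), ("group_b", 61), ("group_b", 52), ("group_b", 58),
]

SHORTLIST_THRESHOLD = 65  # assumed cut-off for illustration

def mean_score_by_group(records):
    """Group scores by inferred category and return each group's mean."""
    groups = {}
    for group, score in records:
        groups.setdefault(group, []).append(score)
    return {g: mean(scores) for g, scores in groups.items()}

def score_gap(records, a="group_a", b="group_b"):
    """Difference between two groups' mean scores."""
    means = mean_score_by_group(records)
    return means[a] - means[b]

def below_threshold_counts(records, threshold=SHORTLIST_THRESHOLD):
    """How many applicants in each group fell below the shortlist cut."""
    counts = {}
    for group, score in records:
        counts[group] = counts.get(group, 0) + (score < threshold)
    return counts

print(mean_score_by_group(applications))
print(score_gap(applications))
print(below_threshold_counts(applications))
```

A fuller version would also re-rank the applications with name features removed, to see which rejected candidates would have cleared the threshold — the check that surfaced the veterinary professional's application.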
Among the applicants who had not made the shortlist was a recent veterinary professional with particular expertise in the treatment of distressed magical fauna, whose qualifications were exceptional and whose application ranked in the bottom quartile of TalentScore’s output. The IT manager, who had been following the analysis with the expression of someone watching a flood approach a building they are very fond of, pointed this out specifically.
The subject access request¶
Three of the rejected applicants submitted subject access requests within a two-week period. This was, the IT manager said to the business analyst, not a coincidence: it suggested that someone had mentioned the Home’s screening process in a community that had experience of exactly this kind of tool. The three SARs asked, among other things, for the logic of any automated decision-making applied to their applications.
Under Article 22 and the associated transparency provisions of Articles 13–15, data subjects are entitled to meaningful information about the logic of automated processing that significantly affects them. “Your application was scored by TalentScore, which has a documented bias against your name category, and we did not read the model card before deploying it” was accurate but not, the DPO observed, a response that the supervisory authority would find entirely satisfactory.
The TalentScore trial was suspended immediately. The IT manager drafted a response to the three SARs in consultation with the DPO. The response acknowledged the use of automated scoring, described TalentScore’s documented bias, confirmed that the Home should have applied additional human review, and offered each applicant a fresh evaluation of their application by a human coordinator.
One of the three applicants declined the re-evaluation and indicated they intended to escalate to the supervisory authority. The DPO noted this in the incident record.
The HR manager’s model card¶
HR’s position, once the situation was fully explained, was that the model card was not easy to find and that the tool had been recommended by a colleague who had not mentioned bias issues. Both of these things were true, and neither of them was a defence under the GDPR’s accountability principle.
The HR manager read the model card. She found it detailed, specific, and the kind of document that would have prompted an immediate conversation with the DPO had she read it before starting the trial. She said this to the IT manager, who appreciated the acknowledgement more than he showed.
The AI policy that emerged from the combined output of Oracle, Merlin, ClaimCraft, and TalentScore included, as one of its first requirements, that any AI system used for decisions about individuals must have its technical documentation reviewed and signed off by the IT function and the DPO before the first application is run. Not the first paid application. The first application, including trials.
The IT manager wrote this requirement himself. It was the only part of the policy document he drafted personally, and he drafted it in the tone of someone who had recently had a very specific and illuminating set of experiences.