AI, Bias and the Courts

AI is reshaping how courts make decisions, but without transparency and accountability, these tools risk amplifying bias, eroding civil rights, and undermining public trust in the justice system. This is a summary of Mallory Knodel’s keynote at the Michigan Judges Conference.

Image: old-fashioned, unbalanced scales. Photo by Piret Ilver / Unsplash.

By Mallory Knodel

I had the honor of delivering the keynote at the Michigan Judges Conference on the evolving relationship between artificial intelligence and the justice system. This is a compressed and edited version of what I said:

Artificial intelligence (AI) is already influencing how decisions are made in courtrooms. From pretrial risk assessments to predictive policing, AI is being used to evaluate people’s lives, sometimes without them even knowing. These systems promise objectivity, but they can reinforce discrimination instead of reducing it. As artificial intelligence becomes more embedded in judicial processes, courts must understand how bias arises in AI systems and recognize their responsibility to ensure that automated decisions do not undermine civil rights, due process, or public trust. 

AI doesn’t work like a brain. It works like a mirror. It reflects back the patterns in its training data, which often include the biases and inequalities of the past: historical discrimination, over-policing, or economic exclusion. Even if AI appears neutral, the questions asked and the assumptions baked into its design may not be. The result is a system that mirrors bias rather than correcting it. For courts, the challenge isn’t just understanding how AI works. It’s knowing how bias gets built into automated systems and recognizing the legal and moral responsibilities that follow. 

What’s at Stake: AI and the Role of the Courts

Artificial intelligence is increasingly used to help make decisions in areas like hiring, housing, public benefits, and criminal justice. A 2020 study found that these systems are already in use across nearly half of U.S. federal agencies, where they help enforce rules and determine eligibility for services. That figure is likely even higher now. This matters because AI can be difficult to understand, and it may produce biased results without clear explanations. For courts, this raises important questions about fairness, transparency, and how to ensure people’s rights are protected. Courts are not expected to become experts in these sometimes complex and technical matters, but they do need to be aware of how AI is used and where problems can arise. If courts cannot properly review AI-based decisions, it could weaken public trust in the legal system. Judges should expect to see cases involving AI bias in many areas and must be prepared to respond thoughtfully.

How Bias Gets Built In: From Design to Deployment

AI systems encompass technologies designed to replicate aspects of human intelligence, primarily to automate tasks at greater speed and scale. As my coauthor Michael Karanicolas and I explain in our evaluation for the National Institute of Standards and Technology, bias in artificial intelligence does not appear out of nowhere. It is built in at every stage of the process. From how a problem is defined, to how data is collected, to how an AI system is deployed, each step involves human decisions. Those decisions reflect assumptions, priorities, and often, a lack of accountability. AI systems promise prediction. But the real question isn’t “how good is the prediction?” It’s “is the prediction equally good for everyone?”

Design

Bias often begins with design. When we define a problem—say, “prevent crime”—we already make a value judgment. Are we asking who commits crime, or who gets caught? Are we comfortable using past policing data as a stand-in for future criminal behavior? Take predictive policing systems like PredPol, which is used by dozens of cities in the U.S. These tools are not biased simply because of one bad dataset. They are trained on arrest data that reflects racially biased policing practices, making them inherently discriminatory even when race is not an explicit input. These systems were designed and deployed in the context of over-policing, using arrest records from communities that had already been over-policed. As a result, the systems recommend more policing in the same neighborhoods, reinforcing the cycle rather than breaking it.
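
A toy simulation can make this loop concrete. The sketch below is not any vendor’s actual model; the neighborhoods, rates, and patrol counts are invented purely to show the mechanic: when arrests depend on where patrols are sent, and patrols are allocated based on past arrests, a historical disparity reproduces itself even when underlying behavior is identical.

```python
# Toy feedback-loop sketch. Both neighborhoods have the SAME underlying
# rate of offending; the only difference is that neighborhood A starts
# out more heavily patrolled. All numbers are invented for illustration.

true_rate = {"A": 0.10, "B": 0.10}   # identical underlying behavior
patrols = {"A": 70, "B": 30}         # historical over-policing of A

for year in range(5):
    # Arrests scale with patrol presence, not just with offending.
    arrests = {n: patrols[n] * true_rate[n] for n in patrols}

    # A naive "predictive" model allocates next year's 100 patrols
    # in proportion to last year's arrest counts.
    total = sum(arrests.values())
    patrols = {n: round(100 * arrests[n] / total) for n in arrests}

    print(f"year {year}: arrests={arrests} -> next patrols={patrols}")

# The 70/30 split persists indefinitely: the model keeps "confirming"
# the historical policing pattern, not the identical underlying behavior.
```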

Finding Data

This kind of feedback loop can also incentivize officers to collect and act on ever more granular information, much of it speculative. This phenomenon, where digital systems expand the scope of surveillance and bring more people under scrutiny for ever more tenuous reasons, is known as “digital net-widening.” Over time, the system itself helps redefine what is considered suspicious, embedding subjective and potentially discriminatory patterns into the infrastructure of policing. 

To use an example from outside the criminal justice system, hiring algorithms show similar problems. In one case, an AI system identified participation in high school lacrosse and the name “Jared” as top indicators of job performance. These traits obviously had no real connection to job skills; they simply mirrored the background of people who had already been hired. The algorithm learned to reward those patterns, which often correlated with race, class, and gender.

Training AI Models

During the training phase, developers fine-tune AI models to meet performance goals. However, these goals often overlook how the system performs across different demographics. Disparities in representation in data sets, especially among marginalized communities, can lead to uneven outcomes, meaning a model may appear accurate overall but perform poorly for specific groups. For example, the Gender Shades study by Joy Buolamwini and Timnit Gebru found that commercial facial analysis tools had error rates below 1 percent for lighter-skinned men but as high as 34 percent for darker-skinned women. These differences underscore the need to evaluate not just average performance, but how reliably and fairly the model works for everyone.
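
This kind of subgroup evaluation is straightforward to sketch in code. The example below uses invented predictions and labels, not the Gender Shades data or models; it only illustrates why an aggregate accuracy figure can hide large per-group gaps.

```python
import pandas as pd

# Invented example: model predictions alongside true labels and
# demographic attributes. In a real audit these would come from the
# system under review and a labeled benchmark dataset.
df = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1, 0, 1],
    "y_pred": [1, 0, 1, 0, 0, 0, 0, 1],
    "gender": ["M", "M", "F", "F", "F", "F", "M", "F"],
    "skin":   ["lighter", "darker", "lighter", "darker",
               "darker", "darker", "lighter", "lighter"],
})

errors = df["y_true"] != df["y_pred"]

# The headline number looks tolerable...
print("overall error rate:", errors.mean())

# ...but error rates for each gender x skin-tone combination tell a
# different story (the intersectional audit Gender Shades recommends).
print(errors.groupby([df["gender"], df["skin"]]).mean())
```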

Validating and Optimizing Outputs

Validation is often the final step before deployment, but it rarely offers meaningful transparency. In many cases, testing occurs behind closed doors, with limited opportunity for independent review or public accountability. Even when bias is identified, responses tend to focus on statistical adjustments rather than addressing broader structural issues. Some researchers have proposed more rigorous approaches. In the Gender Shades study mentioned above, Joy Buolamwini and Timnit Gebru recommend intersectional auditing, that is, evaluating performance across combinations of attributes like gender and skin tone, to expose disparities hidden in aggregate metrics.

They also advocate for balanced benchmark datasets that better reflect real-world diversity, including better representation of different genders, skin tones, and geographic backgrounds. They recommend transparent reporting of subgroup performance metrics, that is, showing how accurately the model performs for each demographic group rather than just overall averages. Finally, they call for adjustable classification thresholds, which give developers control over how sensitive the model is when making decisions so they can better balance fairness and accuracy across different populations.
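
As a rough illustration of the threshold point, the sketch below uses synthetic scores (every number is invented) to show how a single global cutoff can reject far more true positives from one group than another, and how a per-group threshold can equalize that error rate. Whether such adjustments are appropriate in a given legal context is a separate question; the code only shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores: the model is less confident on group B's true
# positives, so a single global threshold treats the groups differently.
groups = {
    "A": np.clip(rng.normal(0.75, 0.15, 500), 0, 1),  # scores of A's true positives
    "B": np.clip(rng.normal(0.60, 0.15, 500), 0, 1),  # scores of B's true positives
}

def false_negative_rate(pos_scores, threshold):
    # True positives scored below the threshold are wrongly rejected.
    return float((pos_scores < threshold).mean())

# One global cutoff at 0.5: group B's true positives are rejected far more often.
for name, scores in groups.items():
    print(name, "FNR at global 0.5 threshold:",
          round(false_negative_rate(scores, 0.5), 3))

# Group-aware calibration: choose each group's threshold so that roughly
# 10% of its true positives are rejected, equalizing that error rate.
for name, scores in groups.items():
    t = float(np.quantile(scores, 0.10))
    print(name, "threshold:", round(t, 3),
          "FNR:", round(false_negative_rate(scores, t), 3))
```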

Deploying AI

Once deployed, the AI becomes embedded in the system. Judges, clerks, and administrators often treat its output as a neutral recommendation, but it isn't neutral. It's opaque, and unlike a human witness, an algorithm can't be cross-examined.

Pretrial risk assessment tools like COMPAS have been used to assess over one million offenders since the algorithm was launched in 1998. The algorithm’s predictions have real consequences for bail, sentencing, and parole decisions across the United States. Yet despite its widespread use, COMPAS is a black box. It is known to use 137 factors to generate a risk score, and race is not one of them, but it relies on proxies like education level, employment status, and family history that introduce indirect racial bias. The specific weightings of those factors are also proprietary and not publicly disclosed. This means neither defendants nor courts can fully understand or challenge the score.

A 2016 investigation by ProPublica found that COMPAS systematically assigned higher risk scores to Black defendants, even when white defendants with similar or worse records were labeled low risk. In one case cited by ProPublica, 18-year-old Brisha Borden, a Black teenager with a juvenile record for misdemeanors, was rated high risk after briefly taking a child’s bike and scooter worth $80. Meanwhile, Vernon Prater, a 41-year-old white man with multiple armed robbery convictions, was rated low risk after shoplifting $86 worth of tools. Two years later, Borden had no new charges. Prater was in prison for stealing thousands of dollars' worth of electronics.

According to ProPublica, judges are not supposed to give longer sentences based on higher risk scores. However, judges have cited scores during sentencing. In one case, Judge Scott Horne of La Crosse County, Wisconsin, described a defendant as “identified, through the COMPAS assessment, as an individual who is at high risk to the community” and then imposed a sentence of eight years and six months in prison.

Risk scores are appealing in part because they promise to reduce human bias. If algorithms could reliably predict who might reoffend, they could help make bail, parole, incarceration and other decisions fairer and more targeted. Unfortunately, this false sense of neutrality makes it harder to challenge unfair outcomes and easier for institutions to rely on flawed tools.

AI Bias and the Limits of Current Law

Anti-discrimination laws in the United States are shaped by a human-centered view of decision-making. They rely on the assumption that discrimination stems from intent or motive, or from policies with unjustified outcomes. But AI systems do not make decisions based on intent. They detect patterns in data, often reflecting and amplifying existing structural inequalities without any person explicitly choosing to discriminate. The data-intensive systems behind these decisions, along with the cold and impassive way in which the results are delivered, create a sense of neutrality and fairness, especially when compared to the more subjective, instinct-driven choices that shaped traditional decision-making. 

This disconnect creates a serious challenge for the law: traditional legal standards do not map neatly onto algorithmic decision-making. Bias in AI typically arises from design choices, flawed data, or deployment practices, not from a conscious actor, which makes it difficult to apply legal doctrines built to assess human intent. Algorithms do not “intend” to discriminate; they replicate discriminatory patterns. As a result, plaintiffs must rely on showing disparate impact. Yet doing so can be nearly impossible when the systems involved are opaque, rely on proxy variables, or are protected as trade secrets.

Several recent cases highlight the challenges these tools pose for existing legal frameworks. This year in Colorado, the ACLU filed a complaint against Intuit and HireVue, arguing that video interview analysis software used in promotion decisions worked worse for Deaf and nonwhite applicants. In California, Workday faces a novel lawsuit that is the first case to challenge AI hiring tools under federal anti-discrimination law. The suit alleges that its screening software, which performs functions traditionally carried out by employers, discriminated against Black, disabled, and older job candidates. In 2023, New York City began requiring audits and public notice for automated hiring tools used by employers, a first-of-its-kind law aimed at increasing transparency and accountability. 

In law enforcement, tools like Cybercheck claim to identify digital evidence in criminal cases. However, investigations have revealed serious concerns about their accuracy, transparency, and oversight, prompting courts and prosecutors in several states to withdraw the tool’s use. Globally, similar concerns persist. In France and the Netherlands, governments used algorithmic systems to detect welfare fraud. These systems were later found to disproportionately target vulnerable households, including those on a low income, unemployed, or living in a disadvantaged neighborhood, leading to calls for bans and investigations. The Dutch data protection authority fined facial recognition company Clearview AI over $33 million for violating data protection laws, part of a broader trend of regulatory pushback against systems that operate without consent or transparency.

These examples illustrate a broader truth: biased, data-driven decision-making creates new opportunities for old legal deficiencies to be exploited. The challenge from AI technologies, ultimately, is fundamental to their predictive nature, since any predictive system will project the inequities of the past onto the future.

Tools and Frameworks: FAccT and Beyond

As courts grapple with the complex realities of AI bias, they need practical ways to interpret and challenge algorithmic systems. One of the most widely cited approaches is the FAccT framework: Fairness, Accountability, and Transparency. The framework offers a set of principles for evaluating AI systems and can serve as a guide for courts, regulators, and practitioners seeking to audit or intervene when automated tools produce discriminatory results.

  • Fairness is assessed based on the consistency of the system’s outcomes and impacts, not just its inputs or intentions. An AI system may never be explicitly given information about race, yet still produce biased results if it disadvantages individuals from certain racial backgrounds based on proxy variables like zip code or education level (the sketch after this list illustrates the proxy effect with synthetic data). Even when models perform similarly on average, recent research shows that predictive variance, or how much the predictions fluctuate, can differ significantly across groups, leading to less reliable outcomes for marginalized populations.
  • Accountability depends on whether responsibility for the system’s results can be clearly assigned, with appropriate mechanisms for oversight, redress, and correction. In other words, it is not enough to say “the system made the decision”; there must be a clear chain of human responsibility for how the system was developed, implemented, and used.
  • Transparency involves how easily the system’s processes, data, and decisions can be understood, explained, and scrutinized by those affected and by the courts. A tool that cannot provide a clear explanation for its outputs, or that hides behind proprietary algorithms, undermines meaningful review and due process.
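
To make the proxy-variable point in the fairness item concrete, here is a small synthetic sketch (all data, names, and numbers are invented): a model that is never shown the protected attribute still scores one group lower, because zip code effectively stands in for it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000

# Invented population: zip code is correlated with group membership, and
# historical favorable outcomes were skewed against group B even when
# qualifications were equal. The protected attribute is never a feature.
group = rng.integers(0, 2, n)                 # 0 = group A, 1 = group B
zip_code = np.where(group == 1,
                    rng.integers(0, 3, n),    # group B clustered in zips 0-2
                    rng.integers(2, 6, n))    # group A clustered in zips 2-5
qualified = rng.random(n) < 0.5               # equally qualified on average
past_favorable = (qualified & ((group == 0) | (rng.random(n) < 0.3))).astype(int)

X = pd.DataFrame({"zip_code": zip_code, "qualified": qualified.astype(int)})
model = LogisticRegression().fit(X, past_favorable)

scores = model.predict_proba(X)[:, 1]
for g, label in [(0, "group A"), (1, "group B")]:
    print(f"{label}: mean favorable score = {scores[group == g].mean():.2f}")
# The model never sees 'group', but zip_code lets it reconstruct the
# historical skew, so group B still receives systematically lower scores.
```

This is the kind of outcome-level check, rather than an input-level one, that the fairness principle asks courts to look for.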

For courts, applying FAccT means going beyond traditional legal standards. Because AI systems often lack clear intent or explanation, judges may need to consider new types of evidence, like how a model was trained, what data it used, and what mechanisms exist for contesting its outputs. Courts can push for documentation that clarifies how and why decisions were made. These kinds of materials can help bridge the gap between opaque technical systems and the demands of legal due process.

There are also a growing number of tools courts can reference. For example, AI Fairness 360 (AIF360), developed by IBM, is an open-source toolkit designed to detect and reduce bias in machine learning datasets and models. Another tool gaining traction is the use of Datasheets for datasets. These are structured documentation templates that describe a dataset’s motivation, composition, collection methods, intended uses, and limitations. Datasheets are now being adopted by some responsible AI teams in tech companies and research institutions to improve transparency.
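
For a sense of what such toolkits look like in practice, the snippet below is a minimal sketch based on AIF360’s documented dataset-metric interface; the data and column names are invented, and exact parameters may differ between library versions.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Invented example data: 1 = favorable outcome; 'priv' marks the
# privileged group for the attribute being audited.
df = pd.DataFrame({
    "outcome": [1, 0, 1, 1, 0, 1, 0, 0],
    "priv":    [1, 1, 1, 1, 0, 0, 0, 0],
    "feature": [0.2, 0.4, 0.1, 0.9, 0.3, 0.8, 0.5, 0.7],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["outcome"],
    protected_attribute_names=["priv"],
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"priv": 1}],
    unprivileged_groups=[{"priv": 0}],
)

# Ratio and difference of favorable-outcome rates across the two groups.
print("disparate impact:", metric.disparate_impact())
print("statistical parity difference:", metric.statistical_parity_difference())
```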

Justice in a Digitized Era: Courts, Public Trust, and the AI Future 

To preserve fairness and due process in an era of algorithmic decision-making, courts must demand transparency into how systems function and maintain mechanisms for individuals to contest and remedy unjust outcomes. If an individual is denied bail, housing, or a job due to an automated system, they deserve to know why. They also need a clear and accessible path to challenge that decision.

Automation should not erode meaningful access to justice. When AI tools are used in legal or quasi-legal contexts, courts must protect the right to be heard. This includes recognizing when the opacity of proprietary systems limits a person’s ability to contest outcomes. As digital systems increasingly influence human rights, the judiciary must act as a guardian of those rights. This does not require judges to become technologists, but it does demand that they ask hard questions: What does this tool do? Who does it affect? Can those impacted understand and respond?

Courts have a vital role to play in holding algorithmic systems accountable, not only by identifying harm but also by shaping meaningful remedies. As AI becomes increasingly embedded in legal and administrative processes, the justice system has the opportunity to influence the direction of technological development. By requiring that these systems uphold the principles of fairness, transparency, and accountability expected in court, judges can help ensure that AI evolves in ways that serve all of society more equitably.


Elsewhere from IX...

Building The Stack

We've been quietly building out our bookshop, The Stack, with reads on internet governance, digital rights, and the forces shaping our society online. Every time you buy a book through our shop or affiliate links, 10% goes to supporting the work we do at IX. Want to suggest a book or a reading list? Get in touch: editor@exchangepoint.tech


Support the Internet Exchange

If you find our emails useful, consider becoming a paid subscriber! You'll get access to our members-only Signal community where we share ideas, discuss upcoming topics, and exchange links. Paid subscribers can also leave comments on posts and enjoy a warm, fuzzy feeling.

Not ready for a long-term commitment? You can always leave us a tip.

Become A Paid Subscriber

From the Group Chat 👥 💬

This week in our Signal community, we talked about:

Internet Governance

Open Social Web

Digital Rights

Technology for Society

Privacy and Security

Upcoming Events

  • The Global Governance Programme of the Robert Schuman Centre for Advanced Studies (RSCAS) at the European University Institute (EUI) is hosting a session, “Shaping the Future of an Open and Free Internet: The Impact of Global Restrictions on Anti-Censorship and Encryption Tools.” June 20, 9:30pm CET. Residence Palace, Brussels. https://globalgovernanceprogramme.eui.eu/events/?id=579706
  • The AI, Media and Democracy Lab is hosting a conversation about the future of European digital infrastructure. For decades, European institutions and businesses have outsourced their infrastructural provision to American providers; policymakers, academics, and professionals are now questioning whether this is a viable model. June 20, 6:00pm CEST. Online. https://www.aim4dem.nl/events
  • Digital Rights in Asia-Pacific, jointly hosted by the Architects of Diversity, the Centre for Independent Journalism, the Initiative to Promote Tolerance and Prevent Violence, Sinar Project. August 26-30, Kuala Lumpur, Malaysia. https://drap.ac/25 

Careers and Funding Opportunities

Opportunities to Get Involved

What did we miss? Please send us a reply or write to editor@exchangepoint.tech.

💡
Want to see some of our week's links in advance? Follow us on Mastodon, Bluesky or LinkedIn, and don't forget to forward and share!
