By Mallory Knodel — Nov 17, 2023

AI has 99 problems

... but PR isn't one. Everyone is talking about AI, even me! Just like no one "goes online" anymore, AI is already everything everywhere all at once.

Tim Wu (gift link) centers actual harms in his analysis of the White House executive order on AI, to get you caught up.

And then there's the EU's AI Act, about which Melissa Heikkilä wrote: “One of my predictions is that we will see the AI regulatory landscape move from vague, high-level ethical guidelines to concrete, regulatory red lines as regulators in the EU finalize rules for the technology and US government agencies such as the Federal Trade Commission mull rules of their own.”

From the AI Act website artificialintelligenceact.eu: “There are several loopholes and exceptions in the proposed law. These shortcomings limit the Act’s ability to ensure that AI remains a force for good in your life. Currently, for example, facial recognition by the police is banned unless the images are captured with a delay or the technology is being used to find missing children.” (Edited.)

A risk, of course, is that the private sector will champion hasty regulations.

AI has 99 problems and bias is one.

I worked with my co-author Michael Karanicolas of UCLA on guidance to US judges on AI bias published by AAAS last year.
My CDT colleagues wrote a foundational report on large language models in non-English content analysis, and it's a great example of AI's limitations at the design, development and application levels.
Watch this excellent panel on "Centering Disability in AI, One Year After the Blueprint for an AI Bill of Rights" featuring Judy Brewer, Ariana Aboulafia.
A deep assumption of the latest "cryptowars" is that a modern encryption backdoor would rely on AI to scan user content at scale. This has ultimately proven unpopular in Europe: an A+ Big Win for human rights.
My dear colleague Ceyda Yolgörmez thinks you should read more about AI bias here: Data, Power and Bias in Artificial Intelligence; Mitigating Gender Bias in Machine Learning Data Sets; Participatory AI.

AI has 99 problems and only one of them is technical. It’s critical to confront the reality that AI is just modern data science with one major promise (or peril): It’s our only hope of making sense of the vast quantities of unstructured data produced by ubiquitous sensors and surveillance. It is this underlying motivation that needs our attention.

Valuating Automation Past and Present

AI is a large umbrella that encompasses many techniques and approaches, the most famous being computer vision, natural language processing, and machine learning. Yet automation and automated decision making have been with us for decades already. The first two, computer vision and natural language processing, predate machine learning and are fields in their own right: they began to use ML techniques like deep learning in the era of big data, but they've been branches of AI since the 1960s. Today, when most people talk about AI, they are talking about machine learning, especially the relatively recent advent of neural networks and deep learning.

Previously, machine learning was simply “Big Data,” the management of large data systems, which was professed to “predict” and influence investment, especially in conjunction with risk. Supply chains, financial systems, military operations, social movements, transportation, agriculture, and the environment are all complex systems that profit and power need to make sense of in a timely manner through the management of vast quantities of undifferentiated data. Today’s computational infrastructure makes possible the collection of these datafied systems, and the quest to process them has led to modern ML systems at scale.

Even before the inevitability of datafication, AI came with its own discontents (a la Freud), both from inside and outside the discipline. There is an established critical undercurrent that has been part of a “larger tradition of social criticism responding to the accelerating automation, computerization, and complexification of technological civilization in the twentieth century” (Garvey, 2021). Ultimately, this technological acceleration relies on people. The human aspect of ML implicates the roles of design and supervision, including the labeling of data, which can worsen bias and undermine fairness.

Indeed, a critical understanding of AI is important, especially when we consider that vast national budgets are planned on the premise that greater AI will be "beneficial" for society (see The White House, 2023; the European Commission, 2022).

The contemporary condition is that datafication has become essential to value creation and our societies are ever more shaped by large AI systems (Morozov, 2019). Since ML systems, or models, work on translations of quantified phenomena in the world, the value of their learning largely depends on quantity: the amount of data they have. Theoretically, if a model is given enough input about its context of operation (in aggregate, we assume, the surrounding world), then it should give out accurate conditions or outputs about that context, the world, as constructed by those who are in a position of framing in this mathematical, measurable sense. This position itself, however, is produced out of the epistemic cultures that treat the world-as-picture and thus work with the presumption that the truth of the world can be approximated, and indeed could be reached, if only one has enough knowledge (read: data) about the world.

If the only contribution modern machine learning AI can make is to take vast quantities of unstructured data and identify patterns and correlations, then there is really only one way to improve it and its “sense making” ability: more data.

Beyond Bad Big Data

Pointing out that AI applications have unintended consequences is not devastating to the technology; it is an invitation. It is not if, but when, AI fails (see The Fallacy of AI Functionality). And it is not enough to extensively document AI failures. Data-driven methods cannot themselves solve problems that are socially, historically and culturally complex, so for “bad AI” there must be a proposed way forward. And since AI is essentially sense-making out of unstructured data, the options are limited to two: the system can improve with more data, or the system must be limited in scope of application, at times to the point of simply not being used. The latter accepts that the system might never be better or more credible because of external limitations, implicit paradox or greatness of potential harm.

A Facebook executive once admitted that automated moderation of live active-shooter content on social media would not be possible, saying “Thankfully, we don't have a lot of examples of real people shooting other people.”

AI’s capabilities are flaunted in the face of easy problems. Facebook can easily detect nipples and so it does, but it can’t solve hate speech, so it doesn’t. Today’s world is one with both unsolved hate speech and fewer nipples, which adds insult to injury if you think about it.

In fact, with just regular, non-ML AI, there have been multiple crises of credibility for years. However, instead of pointing out “This AI is bad…” and crucially delivering a prescription, “it should no longer be used,” ML has been allowed to rescue the fate of AI. With more magic data science brought in, somehow everyone’s been convinced that a computer making sense of vast amounts of unstructured data is more fair than a computer just doing what it’s told, by people. Those people are biased but they can also be the human in the loop that holds automated decisions to account (Eubanks, 2018). Hopefully this rescue has been temporary as advocates, public servants and others with political will and attention to detail begin to realize that ML has not made bad AI better.

Yet appeals to restraint appear rarely. That more humans monitor AI’s application is almost never suggested, which is likely because AI is an austerity measure by large firms and governments. We are left with the implicit assumption that we need more tech, and more data (quantitatively and qualitatively), to make better systems and to keep innovating. Machine Learning 101 states outright: “The more sophisticated the underlying algorithm, the more data are needed.” Heudin’s renowned tautology places this technical truth in capitalist reality: “It's not who has the best algorithm that wins, it's who has the most data”.

This has implications for surveillance. As we have explained, there is only one thing that AI/ML has contributed scientifically: making some, any, sense out of vast quantities of unstructured data. This requires a harmful or insufficient AI/ML model to consume more data in order to improve. That data must come from somewhere. That data is our data. The AI/ML model needs that data, making the one technical problem that AI has– not enough data– also a social problem.

AI has 99 problems and it turns out all of them are social

So while some technologists may reach for more data to combat AI bias, others might use bias as an excuse to justify more data for the sake of more data. Still other techno-centric approaches suggest devising additional data analysis in parallel to understand, again through data, AI’s complex ethical problems. Tech-based approaches to solving the perils of AI end up costing public funds and create diffractions over the socio-political landscape.

A socio-technical approach would more directly reveal its limits within social, historical and cultural analysis. Rather than treating AI’s impacts as causal and separate from its social surroundings, we should confront AI’s problems as social ones.

The relationship between AI technologies and society is most obviously exemplified by the problem of bias. It is now widely understood that societal biases are easily replicated and even made worse by ML algorithms (Li et al., 2022). Attempts to create fairness and transparency in the computational processes still might end up reinforcing the idea that algorithmic bias is merely a data problem, and data fixes are costly (Hooker, 2021). Furthermore, bias correction can only be negotiated in a tradeoff between privacy and accuracy (Bagdasaryan, 2019).
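To make the replication point concrete, here is a minimal sketch in Python. It is a toy simulation under stated assumptions, not a description of any real system or of the cited studies: two hypothetical groups share the same true rate of positive outcomes, but the historical labels for group "B" are flipped to negative some of the time. The "model" is just a per-group frequency estimate, and the function name, parameters and default values are all invented for illustration.

```python
import random


def simulate_learned_rates(n_per_group, label_bias=0.3, true_rate=0.5, seed=0):
    """Return the positive-label rate a frequency-based model learns per group.

    Hypothetical setup: both groups have the same true rate of positive
    outcomes, but a positive outcome for group "B" is recorded as negative
    with probability `label_bias` (a stand-in for biased historical labels).
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    learned = {}
    for group in ("A", "B"):
        positives = 0
        for _ in range(n_per_group):
            outcome = rng.random() < true_rate  # what actually happened
            if group == "B" and outcome and rng.random() < label_bias:
                outcome = False  # what the biased record says happened
            positives += outcome
        # The "model": the share of positive labels seen for this group.
        learned[group] = positives / n_per_group
    return learned


# Compare what is learned from small and large datasets.
for n in (100, 10_000, 200_000):
    rates = simulate_learned_rates(n)
    print(n, round(rates["A"], 3), round(rates["B"], 3))
```

In this toy setup, adding data only makes the estimate for group "B" converge more tightly on the biased rate, true_rate × (1 − label_bias), rather than on the true one. The gap comes from how the labels were produced, which is a social fact that more data of the same kind does not repair.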

Shoshana Zuboff’s main contribution to this field is not the one you’re thinking of: the “instrumentarian power” of privatized, data-driven technologies effectively puts companies in the position of enforcing and disciplining human behavior. When we see biased AI, sometimes the bias is there on purpose. Michelle Alexander, Ruha Benjamin and Simone Browne view bias as much more than computer errors needing correction.

I have often reached for Amazon Rekognition as an anecdote to illustrate that, by the time a data-driven application has been launched into the world, so much money has been spent on research and development that the company is determined to make its money back. This means that human rights respecting institutions would reject it, and the company would necessarily turn to contracts with institutions that aren’t accountable to the human rights framework. This illustrates the larger point that scrutiny and intervention must happen earlier and more often in AI/ML development.

What is missing in current advocacy approaches is a deep analysis of the intertwinement between social processes and technological ones. Even a more honest accounting of the evolution of the data-driven methods employed towards capitalist and state sovereignty ends would bring us closer to understanding how AI offers little that is new, but is merely an extension of profit and power seeking.

In conclusion, the clarity that comes from a social approach to AI should guide action:

Technology community: AI’s deep technical problems are orthogonal to its social problems. It’s easy to conflate optimizing a data-driven predictive model with mitigating harms. Furthermore, parallels to standardization of internet networking are tenuous at best: the internet requires interoperation, whereas AI is foundationally about proprietary methods of prediction. Given the lack of tangible incentives to cooperate, coupled with the social, not technical, nature of its hardest problems, we can at best hope for policy norms, top-down guidance, and compliance-motivated standardization.

Regulators: It seems that the work ahead must be regulatory and focused on restraint. That restraint comes in the form of bans, limits on scope of application, or policies that mitigate data-driven methods’ low credibility in high stakes contexts. Only with heavy restrictions will companies be deterred from building systems that have the potential to inflict harm.

Advocates: For AI that demonstrably presents significant harms, it should be a rare occurrence that we push for improvements to the technology. Rather we should be quick to call for limits on its use and application. Harmful AI should lose credibility. The institutions that rely on flawed AI should bear the burden of repairing trust. Impose controls and restrictions and encourage ubiquitous encryption as a way for individuals to opt out of surveillance and the datafication of their lives. Prioritize making collective demands, a way to challenge Big Data at scale.

Everyone: Explore the limitations of the world-as-picture as they are spectacularly questioned in Adam Curtis’ three-part BBC series “All Watched Over By Machines of Loving Grace”. (And read more Philip K. Dick, while we’re at it.)

An overall reduction in data-driven methods and the ubiquity of AI is liberation from datafication. Fewer sensors and less surveillance elevate individual and collective rights, and free individuals and communities from massive bureaucratic institutions, neocolonialist global policy, as well as consumer and labor markets that are exploitative and extractive. In other words, less data and more life.
