
Creators Need AI Opt-Out Tools They Can Actually Use

If opt-out tools exist but no creator can use them, do they really protect anything?

Seventy-five percent of surveyed professional artists say they want to block AI crawlers from using their work, yet research suggests that even on platforms where blocking options are available, only a fraction actually use them. Why? Many opt-out tools exist, including robots.txt, NoAI meta tags, the TDM Reservation Protocol, and platform-specific directives like Google-Extended, but they are not easily accessible, and as a result adoption is negligible.

Opt Out… If You Can

The authors of "Somesite I Used To Crawl" surveyed 203 professional artists recruited through Discord channels and professional social media networks. Most were illustrators and digital 2D artists based in North America, and 87% made money from their work. The authors found that 59% of the artists had never heard of robots.txt, a decades-old web protocol that has been repurposed as the primary mechanism for creators to signal that their content should not be scraped for AI training. Even among those who had heard of it, the most common barrier to using it was simply not knowing how. Use of NoAI and NoImageAI meta tags, which let creators signal at the level of individual pages rather than an entire site that content should not be used for AI training, was similarly low: across the top 10,000 domains surveyed, only 17 sites used NoAI and 16 used NoImageAI.
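For readers curious what a page-level NoAI signal looks like in practice, here is a minimal Python sketch that scans a page's HTML for these directives. It assumes the directives are written as comma-separated values inside a `<meta name="robots">` tag, which is how they are commonly deployed; the class and function names are our own, and crawlers are under no obligation to honour the tags.

```python
from html.parser import HTMLParser

# Assumed directive names: the "noai"/"noimageai" convention discussed above.
AI_DIRECTIVES = {"noai", "noimageai"}


class RobotsMetaParser(HTMLParser):
    """Collects comma-separated directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def ai_opt_out_directives(html: str) -> set:
    """Return the NoAI-style directives a page declares (possibly empty)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives & AI_DIRECTIVES


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noai, noimageai"></head>'
    print(sorted(ai_opt_out_directives(page)))  # ['noai', 'noimageai']
```

The point of the sketch is how little markup is involved: one line in a page's head. The barrier the survey describes is knowing that the line exists and having a platform that lets you add it.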

The researchers also found that low adoption was not only a problem of creator literacy but also one of deliberate platform design choices. They reviewed eight website building and hosting platforms aimed at non-technical users: Squarespace, ArtStation, Wix (paid), Adobe Portfolio, Wix (free), Weebly, Shopify, and Carbonmade. Four provided no way for users to modify robots.txt at all, leaving the provider's default configuration in place. Of the remaining four, only Carbonmade disallowed AI crawlers by default. Wix's paid version allowed direct editing, yet none of the 1,100 artist websites in the researchers' dataset had edited their robots.txt despite the option being available. The researchers speculated that the interface may be to blame: when they attempted the edit themselves, they found it confusing. Squarespace was the only provider offering a dedicated AI crawler opt-out toggle, but only 17% of Squarespace users in the dataset had enabled it, a figure the authors note is low given that 75% of surveyed artists had expressed a desire to block AI crawlers when given the choice.
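To make concrete what an "edited robots.txt" involves, the sketch below uses Python's standard-library `urllib.robotparser` to check a hypothetical artist's robots.txt that disallows OpenAI's GPTBot and Google's Google-Extended token while leaving ordinary search crawling open. The file contents and URL are illustrative assumptions. Note that robots.txt is a request rather than an enforcement mechanism, and Google-Extended is a control token honoured by Google's existing crawlers rather than a separate crawler, so the check only shows what the file says about each name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI-training user-agent tokens,
# allow everyone else (including ordinary search crawlers).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory lines; no network call
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example-portfolio.test/gallery/piece-1.png"
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(agent, crawler_allowed(ROBOTS_TXT, agent, page))
    # GPTBot False
    # Google-Extended False
    # Googlebot True
```

The per-crawler structure is part of the usability problem: each AI crawler must be named individually, so a creator needs to know which tokens exist before they can block them.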

A further structural problem is that opt-out signals cannot follow content once it leaves a creator's own site, so content posted to social media carries few or no anti-crawler protections that creators control. Many AI training datasets are compiled by automatically scraping publicly available videos from platforms including TikTok, Instagram, YouTube, Facebook, Vimeo and Dailymotion, and platform terms of service are typically broad enough to permit this use by default. Although YouTube provides an opt-out system, many other platforms either allow user data to be used for AI training by default unless users adjust their privacy settings, or do not state their position on the matter at all. Even where opt-out mechanisms exist, they may have arrived too late, given the volume of content already scraped before they were introduced.

Another structural problem is that text and data mining (TDM), the automated computational analysis of large bodies of digital content used to train AI models, happens far faster than any rights reservation process can realistically respond. Automated tools can scrape, analyse and process vast amounts of data within seconds or minutes, whereas reserving rights typically involves manual steps. The result, according to Li et al., whose response to the UK government consultation on copyright and AI focused on the impact on the dance sector, is that platforms offer vague or broad terms of service that allow them to use user-generated content for AI training. Many users do not understand these terms, and some who are still working through them find that their content has already been mined.

Many researchers have identified platform terms of service as a key mechanism through which creator agency is limited. Quintais et al. found that platform terms of service have become increasingly complex over time, spread across multiple documents and versions in ways that make them very difficult to follow. Their research also found that platformisation, the process by which platforms accumulate governance power over content, tends to concentrate power both in platforms and in large rights holders, to the detriment of smaller and independent creators. Tools like Meta's Rights Manager, which allow rights holders to assert claims over content, were found to be effectively inaccessible to small creators, functioning in practice only for large institutional actors.

How About Some Compensation?

Sinha and Li found strong opposition to bulk content commodification, with 44% of respondents saying they would never accept their content being sold in bulk and a further 40% finding it negotiable only if compensated. Yet as Liu et al. and Quintais et al. document, platforms routinely permit exactly this through terms of service that have become increasingly complex and difficult for creators to understand or contest.

Independent creators also have limited ability to contest AI use of their content through formal channels. Litigation is viable only for those with the resources to absorb legal risk, which in practice means large commercial content creators or firms. Kretschmer, Margoni and Oruc note that legal uncertainty has encouraged AI developers to mine content and then destroy the training material, precisely because individual creators cannot reverse-engineer a trained model to prove infringement. Rodrigo further notes that Article 4 of the CDSM Directive, the EU provision that allows creators to reserve their works from commercial AI training, favours large incumbents with the legal and technical resources to operationalise rights reservations at scale, resources that independent creators lack.
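The TDM Reservation Protocol mentioned earlier is one attempt at a machine-readable way to express this kind of reservation. As we understand the W3C Community Group specification, a site can signal a reservation with a `tdm-reservation: 1` HTTP header or an equivalent `<meta name="tdm-reservation" content="1">` tag, as well as a `/.well-known/tdmrep.json` file not covered here. The Python sketch below checks the first two signals; the helper names are hypothetical, and it deliberately ignores the specification's rules for resolving conflicting signals, so treat it as an illustration rather than a compliant implementation.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional


class TdmMetaParser(HTMLParser):
    """Records the content of a <meta name="tdm-reservation"> tag, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.reservation: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "tdm-reservation":
            self.reservation = attr_map.get("content", "").strip()


def tdm_rights_reserved(headers: Mapping[str, str], html: str = "") -> bool:
    """Hypothetical helper: True if a header or meta tag sets tdm-reservation to 1.

    Simplification: ignores /.well-known/tdmrep.json, tdm-policy, and the
    specification's rules for resolving conflicting signals.
    """
    # HTTP header names are case-insensitive, so compare them in lowercase.
    normalised = {key.lower(): value.strip() for key, value in headers.items()}
    if normalised.get("tdm-reservation") == "1":
        return True
    parser = TdmMetaParser()
    parser.feed(html)
    parser.close()
    return parser.reservation == "1"


if __name__ == "__main__":
    print(tdm_rights_reserved({"TDM-Reservation": "1"}))  # True
    print(tdm_rights_reserved({}, '<meta name="tdm-reservation" content="1">'))  # True
    print(tdm_rights_reserved({"Content-Type": "text/html"}, "<p>Portfolio</p>"))  # False
```

Setting a header like this is trivial for a publisher that controls its own server, and often impossible for a creator on a hosted portfolio or social platform, which is the asymmetry Rodrigo describes.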

Unfortunately, no workable compensation mechanism currently exists specifically for independent creators. Of the proposals put forward to address this, Senftleben's is the most cited. He proposes an output-based levy charged to providers of generative AI systems whenever their system has the potential to substitute for human literary or artistic output, with funds distributed to individual creators through collecting societies and social and cultural funds. He acknowledges, however, that the collected funds are unlikely to reach individual creators if negotiations are dominated by large rights holders, meaning that even the most cited compensation proposal may reproduce the power asymmetry it is meant to correct.

The evidence points to a consistent and structural problem. The tools that platforms offer independent creators to manage how their content is used for AI training are limited and infrequently used. Where opt-out mechanisms exist, they are difficult to find, inconsistently applied across platforms, and have arrived too late for creators whose content has already been used to train AI. On revenue sharing, no workable mechanism currently exists for independent creators. Licensing deals are typically negotiated at the level of large publishers and stock libraries, not individuals.

Policymakers considering reform should note that the problem is not simply one of legal gaps. It is structural. Any intervention that does not account for the power asymmetry between platforms, large rights holders, and independent creators risks reproducing the same imbalance it sets out to correct.

Geneva's Got it Going On

At UN Tech Week Geneva (July 6–10), three major events converge at Palexpo and ITU HQ: the inaugural Global Dialogue on AI Governance, the WSIS Forum 2026, and the AI for Good Global Summit, alongside the WIPO Assemblies.

What To Watch For: A range of side events will take place from Sunday through Thursday, including a number that will be folded into the Global Dialogue’s agenda. These include events organized by MAP-AI, the Participatory AI Research & Practice Symposium, Partnership on AI, and UN Human Rights.


Support the Internet Exchange

If you find our emails useful, consider becoming a paid subscriber! You'll get access to our members-only Signal community where we share ideas, discuss upcoming topics, and exchange links. Paid subscribers can also leave comments on posts and enjoy a warm, fuzzy feeling.

Not ready for a long-term commitment? You can always leave us a tip.


This Week's Links

🚨

Stop press! Do you enjoy our links? Links are now available to paid subscribers only. Become a paid subscriber today.
