Dave Rahardja

@Sorro @emacsomancer I suspect Google Gemini is using Google’s normal search-engine scraper as a searchable source. In other words, I suspect their Gemini LLM is invoking an internal API to “search Google” (without the degraded search that the public is subject to), and then putting the search results in its context window to form an answer.

This is one reason I think OpenAI and Anthropic are at a huge disadvantage to Google when it comes to their LLMs dealing with current events and topics. You can block OpenAI and Anthropic scrapers, but you don’t want to block Google search crawlers, which “coincidentally” also feed Gemini.
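
For anyone who wants to see what that blocking asymmetry looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URL are made up for illustration, and `GPTBot` / `ClaudeBot` / `Googlebot` are assumed to be the user-agent tokens those crawlers announce. Note that robots.txt is advisory only: it shows what a well-behaved crawler would do, not what any crawler actually does, and it says nothing about whether search results end up in Gemini.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away the OpenAI and Anthropic crawlers,
# leave everything else (including Google's search crawler) allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Illustrative URL only; nothing is fetched.
result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "Googlebot"],
    "https://example.com/post",
)

for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

With this file, the two AI crawlers are turned away while the search crawler falls through to the catch-all rule, which is the trade-off described above: blocking the search crawler as well would mean dropping out of search results.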

faxmodem

@emacsomancer we should probably call them AP (Artificial Parrots)

Wandering Adventure Party

From Bruce Schneier: "All it takes to poison AI training data is to create a website:

Poisoning AI Training Data - Schneier on Security