State-controlled media can impact what AI chatbots learn and how they respond, analysis finds

New Delhi, May 14 (PTI) Content from state-controlled media can shape the information environments that AI chatbots learn from and draw on when responding to users' requests, according to an analysis.
    Researchers, including those from the universities of Oregon and California San Diego, found evidence that the recirculation of phrases under state media control -- the same wording moving through newspapers, apps, reposts and ordinary web pages until it looks like part of the broader information environment -- can leave detectable traces in artificial intelligence (AI) model behaviour.
    The study, published in the journal Nature, combines evidence from evaluating large language models in the local languages of 37 countries with a case study from China.
    "People often talk about AI as if it learns from the internet in some neutral way. It doesn't. It learns from information environments that have already been shaped by institutions and power, and those environments can leave measurable traces in what models say," co-first author Hannah Waight, assistant professor of sociology at the University of Oregon, said.
    Over the course of six studies, the team traced the pathway of content from online media to training data to model behaviour, combining results from analyses of open training data with experiments that trained small models.
    To trace the "institutional influence" through an AI model's training process, the authors first showed that content from state-coordinated media appears frequently in the training data.
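    As a rough illustration of that kind of frequency check (not the study's actual pipeline), the Python sketch below counts occurrences of distinctive phrases in a locally downloaded sample of an open training corpus; the file name and phrase list are hypothetical placeholders.

```python
# A minimal sketch of a training-data frequency check, assuming a locally
# downloaded plain-text shard of an open web-scale corpus. The file name
# and the phrase list are hypothetical placeholders, not the study's.
from collections import Counter

DISTINCTIVE_PHRASES = [
    "example state-media phrase one",   # hypothetical
    "example state-media phrase two",   # hypothetical
]
CORPUS_SHARD = "open_corpus_shard.txt"  # hypothetical local file

counts = Counter()
with open(CORPUS_SHARD, encoding="utf-8") as f:
    for line in f:  # stream line by line to keep memory use flat
        for phrase in DISTINCTIVE_PHRASES:
            counts[phrase] += line.count(phrase)

for phrase, n in counts.most_common():
    print(f"{n:8d}  {phrase}")
```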
    The researchers also found that commercial models had memorised distinctive phrases associated with content from state-coordinated media, suggesting that the content had been seen many times during training.
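    One common way to probe such memorisation, sketched below under assumptions of our own, is to feed a model the first half of a rare phrase and check whether it completes the rest verbatim; the model and phrase here are illustrative stand-ins, not the commercial systems or wording the study tested.

```python
# A minimal sketch of a memorisation probe, assuming a local HuggingFace
# causal model as a stand-in for the commercial models the study tested.
# The phrase is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # illustrative stand-in
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

phrase = "a distinctive multi-word slogan recirculated by state media"
prefix, expected = phrase[: len(phrase) // 2], phrase[len(phrase) // 2 :]

inputs = tok(prefix, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
continuation = tok.decode(out[0][inputs["input_ids"].shape[1]:])

# Verbatim continuation of a rare phrase is evidence the model saw it
# (likely repeatedly) during training.
print("memorised:", continuation.strip().startswith(expected.strip()))
```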
    "State-coordinated content is not just about what appears in official media. It is also about recirculation; the same phrasing moving through newspapers, apps, reposts and ordinary web pages until it looks like part of the broader information environment," author Brandon M. Stewart, associate professor of sociology at Princeton University, said.
    "Once state-coordinated content is in the training data, the model can launder it into what looks and sounds like neutral, objective information," Stewart said.
    The team further reasoned that a state's influence over the pre-training data should appear most clearly in that state's primary language -- for example, a question about the Chinese government framed in Chinese should produce a more pro-government answer than the same question asked in English.
    In responses to political questions about China, human raters judged the Chinese-prompted answer to be more favourable to the Chinese government more than 75 per cent of the time, the researchers said.
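    The paired-language probe itself is simple to reproduce in outline. The sketch below assumes an OpenAI-style chat API; the model name and the questions are illustrative stand-ins, and the favourability judgement in the study was made by human raters, not code.

```python
# A minimal sketch of the paired-language probe, assuming an OpenAI-style
# chat API; the model name and the questions are illustrative stand-ins.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION_EN = "How well does the Chinese government manage the economy?"
QUESTION_ZH = "中国政府对经济的管理做得如何？"  # the same question in Chinese

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in model name
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

# In the study itself, human raters then judged which of the paired
# answers was more favourable to the government; that step is manual.
print("EN:", ask(QUESTION_EN))
print("ZH:", ask(QUESTION_ZH))
```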
    In a cross-national study of 37 countries where a national language is largely concentrated within a single country, AI models portrayed governments and institutions from countries with stronger media control more favourably in the country's language than in English.
    The authors said the result is correlational, but added that it is consistent with the mechanism identified in the China case study.
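    The flavour of such a correlational check can be conveyed in a few lines, shown below with made-up toy numbers rather than the paper's data: a country-level media-control index is correlated against the gap in favourability between answers in the national language and in English.

```python
# A minimal sketch of the cross-national correlation, using made-up toy
# numbers rather than the paper's indices or favourability scores.
from scipy.stats import pearsonr

# Hypothetical per-country values: a media-control index and the gap in
# favourability between national-language and English answers.
media_control = [0.9, 0.7, 0.4, 0.2, 0.1]
favourability_gap = [0.35, 0.22, 0.10, 0.02, -0.01]

r, p = pearsonr(media_control, favourability_gap)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")  # correlation, not causation
```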
    "This is not evidence that AI companies set out to curry favour with those governments, or that those governments control media systems with chatbots in mind," co-author Margaret E. Roberts, professor of political science at the University of California San Diego, said.
    "States shape the information environment, the information environment shapes training data, and training data shapes model outputs. But going forward, our findings suggest that LLMs create new incentives for powerful actors to think strategically about the text they disseminate online," Roberts said.
