Politics of Questions in News: A Mixed-Methods Study of Interrogative Stances as Markers of Voice and Power
This paper investigates how interrogative stances function as markers of voice and power in French-language digital news. Analyzing over 1.2 million articles from 24 outlets (2023–2024) through a mixed-methods pipeline combining LLM pseudo-labeling and qualitative annotation, the authors operationalize pragmatic concepts like answerhood and dialogicity at scale. The study reveals that questions are sparse but structurally significant, predominantly serving framing functions rather than information-seeking, and centering elite actors over diffuse publics.
The paper makes a valuable contribution by bridging computational linguistics and media sociology to study questioning practices in news. The mixed-methods design—using Qwen3-30B to generate pseudo-labels for fine-tuning CamemBERT-large classifiers—provides a pragmatic balance between scale and interpretability, particularly given the finding that the binary detector achieves F1 0.78 while the six-way stance classifier reaches only macro-F1 0.51 against human gold standards. However, the reliance on embedding-based answer detection risks conflating semantic similarity with pragmatic resolution, and the teacher-student pipeline may propagate systematic biases from the LLM into downstream analyses.
The conceptual framework successfully integrates pragmatic theories of questions with computational text analysis, demonstrating that interrogative stances can be operationalized meaningfully at corpus scale. The finding that framing-procedural questions dominate (accounting for "just over half of all interrogatives") while explicitly leading and tag questions remain rare ("about 2%") offers robust evidence that journalistic questions primarily structure exposition. The systematic variation across editorial scales—showing higher interrogative density in thematic outlets ($\bar{I} \approx 0.036$) compared to transnational ones ($\bar{I} \approx 0.020$)—holds up across sensitivity checks, with the authors noting that varying the confidence threshold keeps mean article-level density in a narrow band ($ID_a = 0.0236$–$0.0261$).
Several limitations constrain the interpretive validity of the quantitative measures. First, the answer identification heuristic relies on cosine similarity between CamemBERT embeddings ($\geq 0.40$ threshold) to detect "answer-like" spans, which the authors acknowledge captures "semantic relatedness rather than manually verified resolution"; the 95.6% answerability rate should be read as an "upper bound on local textual uptake rather than a literal estimate of fully resolved answerhood." Second, the six-way stance classifier exhibits substantial confusion between pragmatically adjacent categories, with "information-seeking and rhetorical cases often absorbed into framing-procedural," and the teacher-student pipeline risks propagating LLM biases into the final models. Third, the NER-based "addressivity" indices conflate mention with address—detecting that an entity appears near a question does not establish accountability relations, a limitation the authors note but do not fully resolve methodologically.
The evidence supports the broad claim that interrogatives cluster around elite actors rather than diffuse publics, with "58.5% of questions mention at least one person" while "only 2.7% of questions mention a public or audience." The comparison to prior work on broadcast interviews (Clayman and Heritage) appropriately notes the shift from spoken interaction to written news, though the paper could more explicitly address how the absence of sequential turn-taking in written news affects the applicability of conversational accountability concepts. The authors effectively position their work against large-scale news studies that "treat all sentences alike, without distinguishing interrogatives from declaratives," but under-cite recent computational pragmatics work on question-answering in dialogue that might offer alternative methodological approaches.
The study provides substantial methodological transparency: code and derived annotations are available at https://gitlab.idiap.ch/socialcomputing/politics-of-questions, with detailed hyperparameters for CamemBERT training (learning rate $2 \times 10^{-5}$, batch size 16, early stopping patience 3) and answer identification (cosine threshold 0.40, window lengths $L \in \{1,2,3,4,5\}$). However, full reproducibility is blocked by copyright restrictions on the raw news corpus (1.2M articles from CCNews and Swiss outlets), and the paper explicitly states it lacks specification of "total compute resources and hardware" due to time constraints. The use of a local Qwen3-30B-A3B-Instruct instance for pseudo-labeling prevents external verification of the training data generation step, which is critical given that student models inherit "systematic biases from the teacher model."
Interrogatives in news discourse have been examined in linguistics and conversation analysis, but mostly in broadcast interviews and relatively small, often English-language corpora, while large-scale computational studies of news rarely distinguish interrogatives from declaratives or differentiate their functions. This paper brings these strands together through a mixed-methods study of the "Politics of Questions" in contemporary French-language digital news. Using over one million articles published between January 2023 and June 2024, we automatically detect interrogative stances, approximate their functional types, and locate textual answers when present, linking these quantitative measures to a qualitatively annotated subcorpus grounded in semantic and pragmatic theories of questions. Interrogatives are sparse but systematically patterned: they mainly introduce or organize issues, with most remaining cases being information-seeking or echo-like, while explicitly leading or tag questions are rare. Although their density and mix vary across outlets and topics, our heuristic suggests that questions are overwhelmingly taken up within the same article and usually linked to a subsequent answer-like span, most often in the journalist's narrative voice and less often through quoted speech. Interrogative contexts are densely populated with named individuals, organizations, and places, whereas publics and broad social groups are mentioned much less frequently, suggesting that interrogative discourse tends to foreground already prominent actors and places and thus exhibits strong personalization. We show how interrogative stance, textual uptake, and voice can be operationalized at corpus scale, and argue that combining computational methods with pragmatic and sociological perspectives can help account for how questioning practices structure contemporary news discourse.
Pick a starting point or write your own. Challenges run in the background, so you can keep reading while the AI investigates.
No challenges yet. Disagree with the review? Ask the AI to revisit a specific claim.