Feed - arxlens

Leveraging GCN-based Action Recognition for Teleoperation in Daily Activity Assistance

cs.RO cs.HC Thomas M. Kwok, Jiaan Li, Yue Hu · Apr 9, 2025

Remote caregiving robots could help older adults age in place, but conventional motion-mapping teleoperation forces caregivers into slow, unnatural postures that cause muscle fatigue. This paper proposes a simplified Spatio-Temporal Graph Convolutional Network (S-ST-GCN) that recognizes four single-arm daily-living actions from an RGB camera and triggers preset robot trajectories, decoupling the operator's motion from the robot's execution. A finite-state machine filters misclassifications before execution. While still a proof-of-concept, the framework offers a more intuitive alternative to direct motion-tracking teleoperation for assisting with activities of daily living.

Caregiving of older adults is an urgent global challenge, with many older adults preferring to age in place rather than enter residential care. However, providing adequate home-based assistance remains difficult, particularly in geographically vast regions. Teleoperated robots offer a promising solution, but conventional motion-mapping teleoperation imposes unnatural movement constraints on operators, leading to muscle fatigue and reduced usability. This paper presents a novel teleoperation framework that leverages action recognition to enable intuitive remote robot control. Using our simplified Spatio-Temporal Graph Convolutional Network (S-ST-GCN), the system recognizes human actions and executes corresponding preset robot trajectories, eliminating the need for direct motion synchronization. A finite-state machine (FSM) is integrated to enhance reliability by filtering out misclassified actions. Our experiments demonstrate that the proposed framework enables effortless operator movement while ensuring accurate robot execution. This proof-of-concept study highlights the potential of teleoperation with action recognition for enabling caregivers to remotely assist older adults during activities of daily living (ADLs). Future work will focus on improving the S-ST-GCN's recognition accuracy and generalization, integrating advanced motion planning techniques to further enhance robotic autonomy in older adult care, and conducting a user study to evaluate the system's telepresence and ease of control.
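The abstract does not say how the FSM filters misclassified actions. One plausible reading is a debounce: a predicted action triggers its preset trajectory only after the classifier emits it for several consecutive frames, and predictions are ignored while the robot is executing. The sketch below assumes that design. The state names, the action labels, and the `confirm_frames` threshold are hypothetical and not taken from the paper.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting for a stable prediction
    EXECUTING = auto()  # robot is running a preset trajectory


class ActionFilterFSM:
    """Debounce per-frame action labels before triggering a trajectory."""

    def __init__(self, confirm_frames=5):
        self.confirm_frames = confirm_frames  # assumed threshold
        self.state = State.IDLE
        self._candidate = None
        self._streak = 0

    def step(self, label):
        """Feed one predicted label; return an action to execute, or None."""
        if self.state is State.EXECUTING:
            return None  # ignore predictions while the robot is moving
        if label == self._candidate:
            self._streak += 1
        else:
            # A different label restarts the count, so isolated
            # misclassifications never reach the robot.
            self._candidate, self._streak = label, 1
        if self._streak >= self.confirm_frames:
            self.state = State.EXECUTING
            self._candidate, self._streak = None, 0
            return label
        return None

    def trajectory_done(self):
        """Call when the robot finishes its preset trajectory."""
        self.state = State.IDLE


def run(labels, confirm_frames=3):
    """Replay a label stream and collect the actions that would be triggered."""
    fsm = ActionFilterFSM(confirm_frames)
    triggered = []
    for label in labels:
        action = fsm.step(label)
        if action is not None:
            triggered.append(action)
            fsm.trajectory_done()  # demo only: assume instant completion
    return triggered
```

With `confirm_frames=3`, a stray `"wipe"` in the middle of a run of `"pour"` predictions resets the count instead of triggering a trajectory. The cost is a few frames of extra latency before each action starts.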


ShapDBM: Exploring Decision Boundary Maps in Shapley Space

cs.HC cs.LG Luke Watkin, Daniel Archambault, Alex Telea · Mar 23, 2026

ShapDBM addresses the fragmentation problem in Decision Boundary Maps (DBMs) by transforming data into Shapley space before applying dimensionality reduction. This creates more compact decision zones that reflect model behavior rather than raw data distribution, enabling high-quality visualization of complex datasets like SVHN where traditional data-space DBMs fail.

Decision Boundary Maps (DBMs) are an effective tool for visualising machine learning classification boundaries. Yet DBM quality strongly depends on the dimensionality reduction (DR) technique and the high-dimensional space used for the data points. For complex ML datasets, DR can create many mixed classes which, in turn, yield DBMs that are hard to use. We propose a new technique to compute DBMs by transforming data space into Shapley space and computing DR on it. Compared to standard DBMs computed directly from data, our maps have similar or higher quality metric values and visibly more compact, easier-to-explore decision zones.


Reading Between the Lines: How Electronic Nonverbal Cues shape Emotion Decoding

cs.CL cs.HC Taara Kumar, Kokil Jaidka · Mar 22, 2026

This paper investigates how users decode emotions in text-based communication through electronic nonverbal cues (eNVCs)—orthographic signals like elongation, punctuation, and emojis that approximate paralinguistic features. The authors propose a taxonomy grounded in nonverbal communication theory (kinesics and paralinguistics) and test it across three complementary studies: a content analysis developing a regex detection toolkit, a within-subjects experiment manipulating eNVC presence and sarcasm ($n=513$), and focus groups exploring interpretive strategies. The work identifies sarcasm as a critical boundary condition where eNVCs fail to aid interpretation and provides an open-source Python/R package for automated cue detection.

As text-based computer-mediated communication (CMC) increasingly structures everyday interaction, a central question re-emerges with new urgency: How do users reconstruct nonverbal expression in environments where embodied cues are absent? This paper provides a systematic, theory-driven account of electronic nonverbal cues (eNVCs) - textual analogues of kinesics, vocalics, and paralinguistics - in public microblog communication. Across three complementary studies, we advance conceptual, empirical, and methodological contributions. Study 1 develops a unified taxonomy of eNVCs grounded in foundational nonverbal communication theory and introduces a scalable Python toolkit for their automated detection. Study 2, a within-subject survey experiment, offers controlled causal evidence that eNVCs substantially improve emotional decoding accuracy and lower perceived ambiguity, while also identifying boundary conditions, such as sarcasm, under which these benefits weaken or disappear. Study 3, through focus group discussions, reveals the interpretive strategies users employ when reasoning about digital prosody, including drawing meaning from the absence of expected cues and defaulting toward negative interpretations in ambiguous contexts. Together, these studies establish eNVCs as a coherent and measurable class of digital behaviors, refine theoretical accounts of cue richness and interpretive effort, and provide practical tools for affective computing, user modeling, and emotion-aware interface design. The eNVC detection toolkit is available as a Python and R package at https://github.com/kokiljaidka/envc.
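To give a sense of what regex-based cue detection can look like, here is a minimal sketch covering four orthographic cues. The pattern names and regular expressions are illustrative assumptions. They are not the rules or the API of the authors' `envc` package, which should be consulted directly for the actual taxonomy.

```python
import re

# Hypothetical patterns for illustration only; not the envc package's rules.
PATTERNS = {
    # A letter repeated three or more times inside a word, e.g. "soooo".
    "elongation": re.compile(r"\b[A-Za-z]*([A-Za-z])\1{2,}[A-Za-z]*\b"),
    # Runs of two or more exclamation or question marks, e.g. "!!!" or "?!".
    "repeated_punctuation": re.compile(r"[!?]{2,}"),
    # Whole words of three or more capital letters, e.g. "NOT".
    "all_caps": re.compile(r"\b[A-Z]{3,}\b"),
    # Three or more consecutive periods.
    "ellipsis": re.compile(r"\.{3,}"),
}


def detect_envcs(text):
    """Return a dict mapping each cue found in `text` to its matched spans."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            found[name] = matches
    return found


if __name__ == "__main__":
    print(detect_envcs("That was soooo good!!! I am NOT kidding..."))
```

Surface patterns like these only flag that a cue is present. As the sarcasm results in Study 2 suggest, detecting a cue is not the same as decoding the emotion behind it.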


Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

cs.HC cs.AI Kuangzhe Xu, Yu Shen, Longjie Yan et al. · Mar 23, 2026

This paper argues that frictionless AI interfaces pose a systemic risk of "cognitive agency surrender"—the habitual abdication of human reasoning to algorithmic systems. Drawing on cognitive psychology, the authors theorize "Scaffolded Cognitive Friction" as a defense: intentionally injecting epistemic tension via Multi-Agent Systems (MAS) that expose structured disagreements (computational Devil's Advocates) to force System 2 activation. The work positions itself as a bridge between HCI, cognitive science, and AI governance.

The proliferation of Generative Artificial Intelligence has transformed benign cognitive offloading into a systemic risk of cognitive agency surrender. Driven by the commercial dogma of "zero-friction" design, highly fluent AI interfaces actively exploit human cognitive miserliness, prematurely satisfying the need for cognitive closure and inducing severe automation bias. To empirically quantify this epistemic erosion, we deployed a zero-shot semantic classification pipeline ($\tau=0.7$) on 1,223 high-confidence AI-HCI papers from 2023 to early 2026. Our analysis reveals an escalating "agentic takeover": a brief 2025 surge in research defending human epistemic sovereignty (19.1%) was abruptly suppressed in early 2026 (13.1%) by an explosive shift toward optimizing autonomous machine agents (19.6%), while frictionless usability maintained a structural hegemony (67.3%). To dismantle this trap, we theorize "Scaffolded Cognitive Friction," repurposing Multi-Agent Systems (MAS) as explicit cognitive forcing functions (e.g., computational Devil's Advocates) to inject germane epistemic tension and disrupt heuristic execution. Furthermore, we outline a multimodal computational phenotyping agenda -- integrating gaze transition entropy, task-evoked pupillometry, fNIRS, and Hierarchical Drift Diffusion Modeling (HDDM) -- to mathematically decouple decision outcomes from cognitive effort. Ultimately, intentionally designed friction is not merely a psychological intervention, but a foundational technical prerequisite for enforcing global AI governance and preserving societal cognitive resilience.


BadminSense: Enabling Fine-Grained Badminton Stroke Evaluation on a Single Smartwatch

cs.HC cs.AI Taizhou Chen, Kai Chen, Xingyu Liu et al. · Mar 23, 2026

BadminSense is a smartwatch-based system for fine-grained badminton stroke evaluation that aims to provide amateur players with professional-quality coaching feedback without requiring expensive external equipment. The system uses a single commercial smartwatch on the dominant wrist to segment and classify four stroke types, predict stroke quality on a 5-point Likert scale, and estimate shuttle impact location on the racket string area. The key innovation is enabling fine-grained quality assessment beyond simple activity recognition, targeting the gap between basic fitness tracking and professional coaching.

Evaluating badminton performance often requires expert coaching, which is rarely accessible for amateur players. We present BadminSense, a smartwatch-based system for fine-grained badminton performance analysis using wearable sensing. Through interviews with experienced badminton players, we identified four system design requirements with three implementation insights that guide the development of BadminSense. We then collected a badminton strokes dataset from 12 experienced badminton amateurs and annotated it with fine-grained labels, including stroke type, expert-assessed stroke rating, and shuttle impact location. Built on this dataset, BadminSense segments and classifies strokes, predicts stroke quality, and estimates shuttle impact location using vibration signals from an off-the-shelf smartwatch. Our evaluations show that


More Isn't Always Better: Balancing Decision Accuracy and Conformity Pressures in Multi-AI Advice

cs.HC cs.AI Yuta Tsuchiya, Yukino Baba · Mar 23, 2026

As users increasingly consult multiple large language models for decision support, a critical question arises: does increasing the number of AI advisors improve accuracy or amplify harmful conformity pressures? This paper investigates how panel size, within-panel consensus, and human-likeness of presentation shape human reliance and decision accuracy across three prediction tasks (income, recidivism, and dating). Through two crowdsourced experiments with 348 participants, the authors reveal a surprising non-monotonic relationship: three AI advisors improve accuracy over a single advisor, but five provide no additional benefit, while unanimous consensus fosters overreliance and wide disagreement creates confusion.

Just as people improve decision-making by consulting diverse human advisors, they can now also consult with multiple AI systems. Prior work on group decision-making shows that advice aggregation creates pressure to conform, leading to overreliance. However, the conditions under which multi-AI consultation improves or undermines human decision-making remain unclear. We conducted experiments with three tasks in which participants received advice from panels of AIs. We varied panel size, within-panel consensus, and the human-likeness of presentation. Accuracy improved for small panels relative to a single AI; larger panels yielded no gains. The level of within-panel consensus affected participants' reliance on AI advice: High consensus fostered overreliance; a single dissent reduced pressure to conform; wide disagreement created confusion and undermined appropriate reliance. Human-like presentations increased perceived usefulness and agency in certain tasks, without raising conformity pressure. These findings yield design implications for presenting multi-AI advice that preserve accuracy while mitigating conformity.


Dyadic: A Scalable Platform for Human-Human and Human-AI Conversation Research

cs.HC cs.AI cs.CL David M. Markowitz · Mar 23, 2026

Dyadic is a web-based platform for studying human-human and human-AI conversations through text or voice-based interaction. It attempts to solve the methodological gap in conversation research by providing turnkey tools for experimental manipulation, live monitoring, and in-situ survey delivery during ongoing chats. The core value proposition is lowering barriers to entry for researchers studying dyadic interaction processes without requiring programming expertise.

Conversation is ubiquitous in social life, but the empirical study of this interactive process has been thwarted by tools that are insufficiently modular and unadaptive to researcher needs. To relieve many constraints in conversation research, the current tutorial presents an overview and introduction to a new tool, Dyadic (https://www.chatdyadic.com/), a web-based platform for studying human-human and human-AI conversations using text-based or voice-based chats. Dyadic is distinct from other platforms by offering studies with multiple modalities, AI suggestions (e.g., in human-human studies, AI can suggest responses to a participant), live monitoring (e.g., researchers can evaluate, in real time, chats between communicators), and survey deployment (e.g., Likert-type scales, feeling thermometers, and open-ended text boxes can be sent to humans for in situ evaluations of the interaction), among other consequential features. No coding is required to operate Dyadic directly, and integrations with existing survey platforms are offered.


Deep Attention-based Sequential Ensemble Learning for BLE-Based Indoor Localization in Care Facilities

cs.LG cs.HC Minh Triet Pham, Quynh Chi Dang, Le Nhat Tan · Mar 22, 2026

This paper addresses BLE-based indoor localization in care facilities by shifting from independent-window classification to sequential learning. The proposed DASEL framework combines frequency-based feature engineering, bidirectional GRUs with attention mechanisms, and a two-level hierarchical ensemble to model temporal movement trajectories. Achieving a 53.1% improvement over traditional baselines on the ABC 2026 challenge dataset, the work demonstrates that capturing temporal dependencies is critical for accurate indoor localization in complex real-world environments.

Indoor localization systems in care facilities enable optimization of staff allocation, workload management, and quality of care delivery. Traditional machine learning approaches to Bluetooth Low Energy (BLE)-based localization treat each temporal measurement as an independent observation, fundamentally limiting their performance. To address this limitation, this paper introduces Deep Attention-based Sequential Ensemble Learning (DASEL), a novel framework that reconceptualizes indoor localization as a sequential learning problem. The framework integrates frequency-based feature engineering, bidirectional GRU networks with attention mechanisms, multi-directional sliding windows, and confidence-weighted temporal smoothing to capture human movement trajectories. Evaluated on real-world data from a care facility using 4-fold temporal cross-validation, DASEL achieves a macro F1 score of 0.4438, representing a 53.1% improvement over the best traditional baseline (0.2898).
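The reported relative improvement can be checked directly from the two macro F1 scores quoted in the abstract:

```latex
\[
\frac{0.4438 - 0.2898}{0.2898} \;=\; \frac{0.1540}{0.2898} \;\approx\; 0.531 \;=\; 53.1\%
\]
```

The 53.1% figure is therefore a relative gain. In absolute terms, macro F1 rises by 0.154.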


Agentic Personas for Adaptive Scientific Explanations with Knowledge Graphs

cs.AI cs.HC Susana Nunes, Tiago Guerreiro, Catia Pesquita · Mar 23, 2026

This paper tackles the limitation that XAI systems assume static user models, ignoring diverse epistemic stances among domain experts. The authors propose agentic personas—structured representations of expert reasoning strategies derived from clustered feedback and instantiated via LLMs—to condition reinforcement learning-based explanation generation on knowledge graphs. This enables adaptive explanations that align with specific interpretive preferences (mechanistic rigor vs. focused clarity) without requiring extensive individual-level human feedback, demonstrated in drug discovery with 22 expert participants.

AI explanation methods often assume a static user model, producing non-adaptive explanations regardless of expert goals, reasoning strategies, or decision contexts. Knowledge graph-based explanations, despite their capacity for grounded, path-based reasoning, inherit this limitation. In complex domains such as scientific discovery, this assumption fails to capture the diversity of cognitive strategies and epistemic stances among experts, preventing explanations that foster deeper understanding and informed decision-making. However, the scarcity of human experts limits the use of direct human feedback to produce adaptive explanations. We present a reinforcement learning approach for scientific explanation generation that incorporates agentic personas, structured representations of expert reasoning strategies, that guide the explanation agent towards specific epistemic preferences. In an evaluation of knowledge graph-based explanations for drug discovery, we tested two personas that capture distinct epistemic stances derived from expert feedback. Results show that persona-driven explanations match state-of-the-art predictive performance while persona preferences closely align with those of their corresponding experts. Adaptive explanations were consistently preferred over non-adaptive baselines (n = 22), and persona-based training reduces feedback requirements by two orders of magnitude. These findings demonstrate how agentic personas enable scalable adaptive explainability for AI systems in complex and high-stakes domains.


Optimizing Feature Extraction for On-device Model Inference with User Behavior Sequences

cs.LG cs.AI cs.HC Chen Gong, Zhenzhe Zheng, Yiliu Chen et al. · Mar 23, 2026

Machine learning models on mobile devices spend 61-86% of execution time extracting features from user behavior logs rather than running inference. This paper introduces AutoFeature, a graph-based engine that eliminates redundant operations across features and consecutive executions using directed acyclic graph optimization and intelligent caching. Tested across five industrial services including TikTok and e-commerce platforms, it achieves 1.33×-4.53× end-to-end latency reduction without accuracy loss.

Machine learning models are widely integrated into modern mobile apps to analyze user behaviors and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining high-quality user experiences. While prior research has primarily focused on accelerating model inference with given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction of feature extraction optimization by analyzing and eliminating redundant extraction operations across different model features and consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate the on-device feature extraction process without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction to formulate the extraction workflows of different input features as one directed acyclic graph; (2) graph optimization to identify and fuse redundant operation nodes across different features within the graph; and (3) efficient caching to minimize operations on overlapping raw data between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning search, video and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x-3.93x during daytime and 1.43x-4.53x at night.
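The idea behind designs (1) and (2) can be illustrated with a toy example: if two features share the same leading operations on the raw logs, those operations need to run only once. The sketch below is a simplification under stated assumptions. Each feature is a linear chain of named operations, and shared chain prefixes are memoized, which has the same effect as merging identical nodes in a DAG. The log schema, operation names, and feature names are hypothetical. This is not AutoFeature's implementation, and the cross-inference caching of design (3) is not shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical raw log row: (timestamp_seconds, event_type, value).
Row = Tuple[int, str, float]
Op = Tuple[str, Callable]  # (unique operation name, function)

CUTOFF = 3600  # assumed recency cutoff, in seconds


def clicks(rows): return [r for r in rows if r[1] == "click"]
def recent(rows): return [r for r in rows if r[0] >= CUTOFF]
def count(rows): return float(len(rows))
def total(rows): return float(sum(r[2] for r in rows))


# Two features that share the first two operations of their chains.
FEATURES: Dict[str, List[Op]] = {
    "recent_click_count": [("clicks", clicks), ("recent", recent), ("count", count)],
    "recent_click_value": [("clicks", clicks), ("recent", recent), ("total", total)],
}


def extract_naive(rows: List[Row], features: Dict[str, List[Op]]):
    """Run every feature's chain independently; return values and op count."""
    values, executed = {}, 0
    for name, chain in features.items():
        data = rows
        for _, fn in chain:
            data = fn(data)
            executed += 1
        values[name] = data
    return values, executed


def extract_fused(rows: List[Row], features: Dict[str, List[Op]]):
    """Execute each distinct chain prefix once and reuse its result."""
    cache, values, executed = {}, {}, 0
    for name, chain in features.items():
        data, path = rows, ()
        for op_name, fn in chain:
            path += (op_name,)  # the prefix identifies a node in the merged graph
            if path not in cache:
                cache[path] = fn(data)
                executed += 1
            data = cache[path]
        values[name] = data
    return values, executed
```

On these two features the naive version runs six operations and the fused version runs four, with identical outputs. Savings in a real system depend on how much the feature workflows actually overlap.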


Conversation Tree Architecture: A Structured Framework for Context-Aware Multi-Branch LLM Conversations

cs.CL cs.AI cs.HC Pranav Hemanth, Sampriti Saha · Mar 22, 2026

This paper tackles logical context poisoning—the degradation of LLM responses when flat, linear conversation structures force topically distinct threads to accumulate in a single unbounded context window. The core idea is the Conversation Tree Architecture (CTA), which models conversations as a directed rooted tree $\mathcal{T}=(V,E,r,W)$ where each node $v \in V$ maintains an isolated local context window $w_v$. Structured flow operations—downstream passing $\phi_{\downarrow}$, upstream merging $\psi_{\uparrow}$, and volatile nodes—govern how context moves between branches. This matters because current interfaces offer no middle ground between discarding context (new chat) and accumulating noise (linear threads).

Large language models (LLMs) are increasingly deployed for extended, multi-topic conversations, yet the flat, append-only structure of current conversation interfaces introduces a fundamental limitation: all context accumulates in a single unbounded window, causing topically distinct threads to bleed into one another and progressively degrade response quality. We term this failure mode logical context poisoning. In this paper, we introduce the Conversation Tree Architecture (CTA), a hierarchical framework that organizes LLM conversations as trees of discrete, context-isolated nodes. Each node maintains its own local context window; structured mechanisms govern how context flows between parent and child nodes, downstream on branch creation and upstream on branch deletion. We additionally introduce volatile nodes, transient branches whose local context must be selectively merged upward or permanently discarded before purging. We formalize the architecture's primitives, characterize the open design problems in context flow, relate our framework to prior work in LLM memory management, and describe a working prototype implementation. The CTA provides a principled foundation for structured conversational context management and extends naturally to multi-agent settings.
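A minimal sketch of the tree structure may help make the primitives concrete. It models each node's local context window as a list of messages, downstream passing as a filtered copy at branch creation, and upstream merging as a filtered append at branch deletion. The class name, method names, and predicate-based selection are assumptions for illustration. The paper leaves the selection policies as open design problems, and this is not the authors' prototype.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Node:
    name: str
    context: List[str] = field(default_factory=list)  # local context window
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    volatile: bool = False  # marks a transient branch
    inherited: int = 0      # number of messages copied from the parent

    def say(self, message: str) -> None:
        """Append a message to this node's local context only."""
        self.context.append(message)

    def branch(self, name: str,
               carry: Optional[Callable[[str], bool]] = None,
               volatile: bool = False) -> "Node":
        """Downstream passing: the child starts with a selected copy."""
        selected = [m for m in self.context if carry is None or carry(m)]
        child = Node(name, selected, parent=self,
                     volatile=volatile, inherited=len(selected))
        self.children.append(child)
        return child

    def close(self, merge: Optional[Callable[[str], bool]] = None) -> None:
        """Upstream merging on deletion; merge=None discards the branch."""
        if self.parent is None:
            raise ValueError("cannot close the root node")
        if merge is not None:
            own = self.context[self.inherited:]  # skip inherited messages
            self.parent.context.extend(m for m in own if merge(m))
        self.parent.children.remove(self)
        self.parent = None


if __name__ == "__main__":
    root = Node("root")
    root.say("plan trip to Kyoto")
    root.say("budget is 2000 USD")
    side = root.branch("visa", carry=lambda m: "Kyoto" in m, volatile=True)
    side.say("visa not required for 90 days")
    side.say("random tangent")
    side.close(merge=lambda m: "visa" in m)
    print(root.context)
```

Messages added in a branch never touch the parent's context until the branch is closed, and only those the merge predicate selects are carried upward. In this sketch that isolation is what keeps an off-topic tangent out of the main thread.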
