Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality
Jiahuan Pei, Irene Viola, Haochen Huang, Junxiao Wang, Moonisa A · 2024-05-16 · via cs.HC updates on arXiv.org

Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding language-based environments, particularly with the rapid development of large language models (LLMs). However, fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates an LLM with memory, planning, and interaction with XR tools, together with a vision-language agent, enabling the agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained assembly dialogue dataset synthesized automatically by the workflow using a commercial LLM. The dataset comprises multimodal instruction manuals, conversations, XR responses, and visual question answering. Finally, we present several prevailing open-source LLMs as benchmarks, assessing their performance with and without fine-tuning on the proposed dataset. We anticipate that this workflow will advance the development of smarter assistants for seamless user interaction in XR environments, fostering research in both the AI and HCI communities.
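The agent design described in the abstract (planning, memory, and calls to XR tools and a vision-language agent) can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the authors' implementation: the class `TrainingAssistant`, the tool functions `highlight_brick` and `describe_scene`, and the keyword-based `plan` method are all hypothetical stand-ins. In particular, the keyword matching replaces what would be an LLM call in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One remembered interaction: the request, the tool used, and its result."""
    user_input: str
    tool: str
    result: str


@dataclass
class TrainingAssistant:
    """Toy agent loop: plan, call a tool, store the outcome in memory."""
    tools: Dict[str, Callable[[str], str]]
    memory: List[Step] = field(default_factory=list)

    def plan(self, user_input: str) -> str:
        """Stand-in for an LLM planner: choose a tool name by keyword.

        A real system would prompt an LLM with the request and the memory.
        """
        text = user_input.lower()
        if "see" in text or "look" in text:
            return "vision"
        # Past experience: a follow-up like "again" reuses the last tool.
        if "again" in text and self.memory:
            return self.memory[-1].tool
        return "highlight"

    def act(self, user_input: str) -> str:
        """Run one turn and record it so later turns can refer back to it."""
        tool = self.plan(user_input)
        result = self.tools[tool](user_input)
        self.memory.append(Step(user_input, tool, result))
        return result


def highlight_brick(request: str) -> str:
    """Hypothetical XR tool: would highlight a brick in the headset view."""
    return "XR: highlighted the next brick"


def describe_scene(request: str) -> str:
    """Hypothetical vision-language agent: would answer from the camera feed."""
    return "Vision: two red 2x4 bricks are on the table"


def make_assistant() -> TrainingAssistant:
    """Build an assistant wired to the two hypothetical tools above."""
    return TrainingAssistant(
        tools={"highlight": highlight_brick, "vision": describe_scene}
    )
```

Calling `make_assistant().act("What do you see?")` routes the request to the vision stub, and a follow-up `act("again")` on the same instance reuses that tool because the previous step is stored in `memory`.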