Latest AI Research
Stay ahead of the curve with our curated collection of the most impactful Artificial Intelligence research papers.
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?
With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, in particular well-structured, information-rich inputs and static execution settings.
Leading Across the Spectrum of Human-AI Relationships: A Conceptual Framework for Increasingly Heterogeneous Teams
What shapes a consequential decision when human and artificial intelligence work on it together? The answer is becoming harder to see. A decision may look human-led after AI has set the frame, or appear automated while human judgment still carries decisive force.
Robust Learning on Heterogeneous Graphs with Heterophily: A Graph Structure Learning Approach
Heterogeneous graphs with heterophily have emerged as a powerful abstraction for modeling complex real-world systems, where nodes of different types and labels interact in diverse and often non-homophilous ways. Despite recent advances, robust representation learning for such graphs remains largely unexplored, particularly in the presence of noisy or misleading connectivity.
Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR
As LLMs become credible readers of earnings calls, investor-relations Q&A, guidance, and disclosure language, supervised financial NLP benchmarks increasingly function as decision evidence for model selection and deployment. A hidden assumption is that gold labels make such evidence objective.
TIO-SHACL: Comprehensive SHACL validation for TMF Intent Ontologies
Intent-based networking promises to revolutionize telecommunications network management by enabling operators to specify high-level goals rather than low-level configurations. The TM Forum Intent Ontology (TIO) provides a standardized vocabulary for expressing network intents, yet it lacks formal validation mechanisms to ensure intent correctness before admission.
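A minimal sketch, not taken from the paper, of how SHACL validation of an intent graph can be run in Python with rdflib and pySHACL. The tio: namespace URI, the shape, and the sample intent below are hypothetical placeholders used only to illustrate the validation step.

```python
# Validate a toy "intent" graph against a toy SHACL shape with pySHACL.
# Namespace, shape, and data are hypothetical illustrations, not the TIO spec.
from rdflib import Graph
from pyshacl import validate

# Hypothetical shape: every tio:Intent must carry at least one expectation.
shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix tio: <https://example.org/tio#> .

tio:IntentShape a sh:NodeShape ;
    sh:targetClass tio:Intent ;
    sh:property [
        sh:path tio:hasExpectation ;
        sh:minCount 1 ;
    ] .
"""

# Hypothetical intent instance that violates the shape (no expectation attached).
data_ttl = """
@prefix tio: <https://example.org/tio#> .
@prefix ex:  <https://example.org/net#> .

ex:intent1 a tio:Intent .
"""

shapes_graph = Graph().parse(data=shapes_ttl, format="turtle")
data_graph = Graph().parse(data=data_ttl, format="turtle")

conforms, _, report_text = validate(data_graph, shacl_graph=shapes_graph)
print(conforms)      # False: the intent lacks a tio:hasExpectation
print(report_text)   # Human-readable validation report
```

In this setup, the validation report is what would block a malformed intent before admission.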
Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems
As large language model (LLM) agents are deployed in high-stakes environments, the question of how to safely delegate subtasks to specialized sub-agents becomes critical. Existing work addresses multi-agent architecture selection at design time or provides broad empirical guidelines, but neither provides a runtime mechanism that dynamically adjusts the safety-efficiency trade-off as task context changes during execution.
CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations
Explainable AI (XAI) aims to improve user understanding and decisions when using AI models. However, despite innovations in XAI, recent user evaluations reveal that this goal remains elusive.
Heterogeneous Scientific Foundation Model Collaboration
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language.
Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective
Compositional generalization tests are often used to estimate the compositionality of LLMs. However, such tests have the following limitations: (1) they focus only on output results without considering LLMs' understanding of sample compositionality, resulting in explainability defects; (2) they rely on dataset partitioning to form a test set of combinations unseen in the training set, suffering from combination leakage issues.
End-to-End Evaluation and Governance of an EHR-Embedded AI Agent for Clinicians
Clinical AI systems require not just point-in-time evaluation but continuous governance: the ongoing practice of monitoring, evaluating, iterating, and re-evaluating performance throughout deployment. We present an end-to-end governance framework that integrates rubric validation, live deployment feedback, technical performance monitoring, and cost tracking, with controlled experimentation gating system changes before deployment.
METASYMBO: Multi-Agent Language-Guided Metamaterial Discovery via Symbolic Latent Evolution
Metamaterial discovery seeks microstructured materials whose geometry induces targeted mechanical behavior. Existing inverse-design methods can efficiently generate candidates, but they typically require explicit numerical property targets and are less suitable for early-stage exploration, where researchers often begin with incomplete constraints and qualitative intents expressed in natural language.
Machine Collective Intelligence for Explainable Scientific Discovery
Deriving governing equations from empirical observations is a longstanding challenge in science. Although artificial intelligence (AI) has demonstrated substantial capabilities in function approximation, the discovery of explainable and extrapolatable equations remains a fundamental limitation of modern AI, posing a central bottleneck for AI-driven scientific discovery.
Learning Rate Engineering: From Coarse Single Parameter to Layered Evolution
Learning rate scheduling has evolved from the single global fixed rate of early SGD to sophisticated layer-wise adaptive strategies. We systematize this evolution into five generations: (Gen1) global fixed learning rates, (Gen2) global scheduling, (Gen3) parameter-level adaptation, (Gen4) layer-level differentiation, and (Gen5) joint layer-time scheduling.
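A minimal sketch, not from the paper, of what Gen4/Gen5-style scheduling looks like in practice with PyTorch parameter groups: each layer gets its own base rate (layer-level differentiation), and a per-group schedule then varies that rate over training steps (joint layer-time scheduling). The model, rates, and decay rules are illustrative choices.

```python
# Layer-wise base rates plus per-layer time schedules (illustrative only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# Gen4: layer-level differentiation -- a smaller rate for the earlier layer.
param_groups = [
    {"params": model[0].parameters(), "lr": 1e-4},  # early layer
    {"params": model[2].parameters(), "lr": 1e-3},  # later layer
]
optimizer = torch.optim.SGD(param_groups, lr=1e-3, momentum=0.9)

# Gen5: joint layer-time scheduling -- each group follows its own decay curve.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=[
        lambda step: 0.5 ** (step // 100),  # early layer decays faster
        lambda step: 0.9 ** (step // 100),  # later layer decays slower
    ],
)

for step in range(300):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 32)).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()
```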
The Two Boundaries: Why Behavioral AI Governance Fails Structurally
Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater).
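A minimal sketch, not from the paper, of the three regions the abstract describes, treating a system's capabilities and its governance coverage as plain sets of effect names. All names below are illustrative.

```python
# The three regions as set operations over hypothetical effect names.
capabilities = {"send_email", "delete_record", "query_db"}   # what the system can do
governed     = {"send_email", "query_db", "transfer_funds"}  # what policy covers

governed_capabilities   = capabilities & governed  # the only useful region
ungoverned_capabilities = capabilities - governed  # risk: effects with no policy
governance_theater      = governed - capabilities  # policy for non-existent effects

print(governed_capabilities)    # {'send_email', 'query_db'}
print(ungoverned_capabilities)  # {'delete_record'}
print(governance_theater)       # {'transfer_funds'}
```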
Mechanized Foundations of Structural Governance: Machine-Checked Proofs for Governed Intelligence
We present five results in the theory of structural governance for cognitive workflow systems. Three are mechanized in Coq 8.
The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms
As AI transitions toward multi-agent systems (MAS) to solve complex workflows, research paradigms operate on the axiomatic assumption that agent collaboration mirrors the "Wisdom of the Crowd". We challenge this assumption by formalizing the Consensus Paradox: a phenomenon where agentic swarms prioritize internal architectural agreement over external logical truth.
OptimusKG: Unifying biomedical knowledge in a modern multimodal graph
Biomedical knowledge graphs (KGs) are widely used in the life sciences, yet many are derived from unstructured documents and therefore lack schema-level constraints, whereas graphs assembled from structured resources are difficult to harmonize into a unified representation. We present OptimusKG, a multimodal biomedical labeled property graph (LPG) built from structured and semi-structured resources to preserve factual, type-specific metadata across molecular, anatomical, clinical, and environmental domains.
AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling
Recent advances in multimodal large language models (MLLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data.
Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents
Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or retraining, and fundamentally cannot course-correct the agent in real time.
When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis
Democratic discourse analysis systems increasingly rely on multi-agent LLM pipelines in which distinct evaluator models are assigned adversarial roles to generate structured, multi-perspective assessments of political statements. A core assumption is that models will reliably maintain their assigned roles.