Latest AI Research
Stay ahead of the curve with our curated collection of the most impactful Artificial Intelligence research papers.
Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia
AI applications are increasingly being introduced into digital health. While technical performance has advanced rapidly, successful deployment depends largely on consumer attitudes, especially for patient-facing applications.
Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering
Medical retrieval-augmented generation (RAG) systems typically operate on text chunks extracted from biomedical literature, discarding the rich visual content (tables, figures, structured layouts) of original document pages. We propose MED-VRAG, an iterative multimodal RAG framework that retrieves and reasons over PMC document page images instead of OCR'd text.
Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation
Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is poorly characterized. We audit five recent frontier and grounding-aware VLMs, including Gemini 2.
Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning
The risks posed by AI features are increasing as they are rapidly integrated into software applications. In response, regulations and standards for safe and secure AI have been proposed.
Contextual Agentic Memory is a Memo, Not True Memory
Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, long-term learning, and security.
Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents
Current embodied agents are often limited to passive instruction-following or reactive need-satisfaction, lacking a stable, high-order value framework essential for long-term, self-directed behavior and resolving motivational conflicts. We introduce ValuePlanner, a hierarchical cognitive architecture that decouples high-level value scheduling from low-level action execution.
When Agents Evolve, Institutions Follow
Across millennia, complex societies have faced the same coordination problem of how to organize collective action among cognitively bounded and informationally incomplete individuals. Different civilizations developed different political institutions to answer the same basic questions of who proposes, who reviews, who executes, and how errors are corrected.
The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text
We introduce Target-Event-Agent Networks (TEA Nets) as a computational framework to extract subjects ("Agents"), verbs ("Events"), and objects ("Targets") from texts. Grounded in cognitive network science and artificial intelligence, TEA Nets are implemented as an open-source Python library.
Fairness for distribution network operations and planning
The incorporation of fairness into distribution network (DN) planning and operation has become a key goal of recent studies. The cost of implementing fairness, termed the price of fairness (PoF), captures the efficiency forgone in attaining social cohesion through fair outcomes.
From Context to Skills: Can Language Models Learn from Context Skillfully?
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context.
Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading
Current Large Language Model (LLM) evaluation frameworks utilize the same static prompt template across all models under evaluation. This differs from the common industry practice of using prompt optimization (PO) techniques to optimize the prompt for each model to maximize application performance.
Generative structure search for efficient and diverse discovery of molecular and crystal structures
Predicting stable and metastable structures is central to molecular and materials discovery, but remains limited by the cost of searching high-dimensional energy landscapes. Deep generative models offer efficient structure sampling, yet their outputs remain shaped by training data and can underexplore minima that are rare but physically relevant.
Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor
Large language models (LLMs) are commonly evaluated for political bias based on their responses to fixed questionnaires, which typically place frontier models on the political left. A parallel literature shows that LLMs are sycophantic: they adapt their answers to the views, identities, and expectations of the user.
WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning
We present WaferSAGE, a framework for wafer defect visual question answering using small vision-language models. To address data scarcity in semiconductor manufacturing, we propose a three-stage synthesis pipeline incorporating structured rubric generation for precise evaluation.
Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs
To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows), a dataset mapping how large language models reason about and report mathematics across human- and AI-like conditions.
Trace-Level Analysis of Information Contamination in Multi-Agent Systems
Reasoning over heterogeneous artifacts (PDFs, spreadsheets, slide decks, etc.) increasingly occurs within structured agent workflows that iteratively extract, transform, and reference external information.
SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation
Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial errors and collisions, in part because common scene representations (raw coordinates or verbose code) make it difficult for models to reason about 3D spatial relationships and physical constraints.
In-Context Examples Suppress Scientific Knowledge Recall in LLMs
Scientific reasoning rarely stops at what is directly observable; it often requires uncovering hidden structure from data. From estimating reaction constants in chemistry to inferring demand elasticities in economics, this latent structure recovery is what distinguishes scientific reasoning from curve fitting.
Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations
In black-box large language model (LLM) services, response reliability is often only partially observable at decision time, while stronger inference pathways incur substantial computational cost. This induces a budgeted sequential decision problem: for each request, the system must decide whether the default low-cost response is sufficiently reliable or whether additional computation should be allocated to improve response quality. In this paper, we propose Verifiable Observations for Risk-aware Inference Control (VEROIC), a framework for adaptive inference control in black-box LLM settings that formulates request-time control as a partially observable Markov decision process to capture partial observability and sequential budget coupling.
PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress.