Academic

Latest AI Research

Stay ahead of the curve with our curated collection of the most impactful Artificial Intelligence research papers.

Synthetic Computers at Scale for Long-Horizon Productivity Simulation

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.

Fri 1 May 2026

Authors: Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao

LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether correlation-based or learning-based, often generate redundant or irrelevant edges due to the noisy nature of EEG data.

Thu 30 Apr 2026

Authors: Lincan Li, Zheng Chen, Yushun Dong

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another.

Wed 29 Apr 2026

Authors: Yujun Wu, Dongxu Zhang, Xinchen Li, Jinhang Xu, Yiling Duan, Yumou Liu, Jiabao Pan, Xuanhe Zhou, Jingxuan Wei, Siyuan Li, Jintao Chen, Conghui He, Cheng Tan

Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People

Sign languages, of any geographical or accentual variation, understandably face continuous scrutiny under the ever-present popularity of verbal dictation and audism. Through this, many potential problems arise with the current lack of accessible communication for those who rely on such sign languages for essential conversation.

Tue 28 Apr 2026

Authors: Nina Seron-Abouelfadil, Poppy Fynes

Splitting Argumentation Frameworks with Collective Attacks and Supports

This work proposes novel splitting techniques for argumentation formalisms that incorporate supports between defeasible elements. We base our studies on bipolar set-based argumentation frameworks (BSAFs) which generalize argumentation frameworks with collective attacks (SETAFs), as well as bipolar argumentation frameworks (BAFs), by incorporating both collective attacks and supports.

Mon 27 Apr 2026

Authors: Matti Berthold, Lydia Blümel, Giovanni Buraglio, Anna Rapberger

Mapping the Methodological Space of Classroom Interaction Research: Scale, Duration, and Modality in an Age of AI

Research on classroom interaction has long been divided between large-scale observation and in-depth ethnographic work. We propose a framework mapping this methodological space along three dimensions (scale, duration, and modality), where a study's position shapes what it reveals and obscures.

Sun 26 Apr 2026

Authors: Dorottya Demszky, Edith Bouton, Alison Twiner, Sara Hennessy, Richard Correnti

What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation environments grows, so does the pressure to ship tasks quickly, often without thorough adversarial review of the verification logic.

Sat 25 Apr 2026

Authors: Ivan Bercovich

Characterizing the Consistency of the Emergent Misalignment Persona

Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this correspondence is across tasks and whether it varies across fine-tuning domains.

Fri 24 Apr 2026

Authors: Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

Large language models (LLMs) make reward design in reinforcement learning substantially more scalable, but generated rewards are not automatically reliable training objectives. Existing work has focused primarily on generating, evolving, or selecting reward candidates, while paying less attention to when such candidates can be verified and deployed during policy optimization.

Thu 23 Apr 2026

Authors: Feiyu Wu, Xu Zheng, Zhuocheng Wang, Yi ming Dai, Hui Li

Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems

Text-to-SQL (T2SQL) evaluation in production environments poses fundamental challenges that existing benchmarks do not address. Current evaluation methodologies, whether rule-based SQL matching or schema-dependent semantic parsers, assume access to ground-truth queries and structured database schema, constraints that are rarely satisfied in real-world deployments.

Wed 22 Apr 2026

Authors: Taslim Jamal Arif, Kuldeep Singh

Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

We present Collaborative Agent Reasoning Engineering (CARE), a disciplined methodology for engineering Large Language Model (LLM) agents in scientific domains. Unlike ad-hoc trial-and-error approaches, CARE specifies behavior, grounding, tool orchestration, and verification through reusable artifacts and systematic, stage-gated phases.

Tue 21 Apr 2026

Authors: Rahul Ramachandran, Nidhi Jha, Muthukumaran Ramasubramanian

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal models on scientific spectral understanding, covering 7 representative spectrum types with expert-annotated question-answer pairs.

Mon 20 Apr 2026

Authors: Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang

A Pattern Language for Resilient Visual Agents

Integrating multimodal foundation models into enterprise ecosystems presents a fundamental software architecture challenge. Architects must balance competing quality attributes: the high latency and non-determinism of vision language action (VLA) models versus the strict determinism and real-time performance required by enterprise control loops.

Sun 19 Apr 2026

Authors: Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from natural-language instructions. We compare three primary interaction paradigms, including domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents, by evaluating eight representative agents across 15 benchmark tasks and measuring visualization quality, efficiency, robustness, and computational cost.

Sat 18 Apr 2026

Authors: Jackson Vonderhorst, Kuangshi Ai, Haichao Miao, Shusen Liu, Chaoli Wang

D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

Despite recent progress in language models and agents for scientific data-driven discovery, further advancing their capabilities is held back by the absence of verifiable environments representing real-world scientific [...]. To fill this gap, we introduce D3-Gym, the first automatically constructed dataset with verifiable environments for scientific Data-Driven Discovery. D3-Gym comprises (1) 565 tasks sourced from 239 real scientific repositories across four disciplines where (2) each task is equipped with a natural language instruction, an executable environment with pre-installed dependencies, input dataset and artifact previews, a reference code solution, and an automatically synthesized evaluation script.

Fri 17 Apr 2026

Authors: Hanane Nour Moussa, Yifei Li, Zhuoyang Li, Yankai Yang, Cheng Tang, Tianshu Zhang, Nesreen K. Ahmed, Ali Payani, Ziru Chen, Huan Sun

From LLM-Driven Trading Card Generation to Procedural Relatedness: A Pokémon Case Study

Since the dawn of Trading Card Games, the genre has grown into a multi-billion-dollar industry engaging millions of analog and digital players worldwide. Popular TCGs rely on regular updates, balance adjustments, and rotating constraints to sustain engagement.

Thu 16 Apr 2026

Authors: Johannes Pfau, Panagiotis Vrettis

Splitting Assumption-Based Argumentation Frameworks

Assumption-Based Argumentation (ABA) is a well-established formalism for modelling and reasoning over debates, with a wide range of applications. However, the high computational complexity of core reasoning tasks in ABA poses a significant challenge for its applicability.

Wed 15 Apr 2026

Authors: Giovanni Buraglio, Wolfgang Dvorak, Stefan Woltran

Language Models Refine Mechanical Linkage Designs Through Symbolic Reflection and Modular Optimisation

Designing mechanical linkages involves combinatorial topology selection and continuous parameter fitting. We show that language models can systematically improve linkage designs through symbolic representations.

Tue 14 Apr 2026

Authors: João Pedro Gandarela, Thiago Rios, Stefan Menzel, André Freitas

LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning

Recent large language models (LLMs) have achieved impressive reasoning milestones but continue to struggle with high computational costs, logical inconsistencies, and sharp performance degradation on high-complexity problems. While neuro-symbolic methods attempt to mitigate these issues by coupling LLMs with symbolic reasoners, existing approaches typically rely on monotonic logics (e.

Mon 13 Apr 2026

Authors: Adam Ishay, Joohyung Lee

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

Graphical User Interface (GUI) agents have emerged as a promising paradigm for intelligent systems that perceive and interact with graphical interfaces visually. Yet supervised fine-tuning alone cannot handle long-horizon credit assignment, distribution shifts, and safe exploration in irreversible environments, making Reinforcement Learning (RL) a central methodology for advancing automation.

Sun 12 Apr 2026

Authors: Junan Hu, Jian Liu, Jingxiang Lai, Jiarui Hu, Yiwei Sheng, Shuang Chen, Jian Li, Dazhao Du, Song Guo