Improving Coherence and Persistence in Agentic AI for System Optimization

cs.AI cs.CL Pantea Karimi, Kimia Noorbakhsh, Mohammad Alizadeh, Hari Balakrishnan · Mar 22, 2026

Plain-language introduction

Designing high-performance system heuristics traditionally requires human experts to navigate multi-step conceptual shifts. This paper introduces Engram, an agentic architecture that sidesteps the 'coherence ceiling' of single-context LLM agents and the 'evolutionary neighborhood bias' of code-mutation systems by decoupling long-horizon exploration into sequential agent handoffs. Each agent distills findings into a persistent Research Digest, enabling cumulative progress without context degradation.

Critical review

Verdict

Bottom line

The paper presents a compelling architectural innovation for automated systems research. Engram's structured handoff mechanism—where agents archive discoveries into a persistent Digest before refreshing their context—provides a principled solution to context rot in long-horizon reasoning tasks. The empirical evaluation across multicast routing, LLM inference scheduling, and database query optimization demonstrates consistent improvements over both evolutionary baselines and prior agentic frameworks. However, the favorable comparison against Glia warrants scrutiny given significant author overlap between the two systems, and the paper understates the engineering complexity required to implement the Research Digest schema effectively.

“Engram exhibits superior performance across diverse domains including multi-cloud multicast, LLM inference request routing, and optimizing KV cache reuse in databases with natural language queries.”

Paper · Abstract

“Engram organizes exploration into a sequence of agents that iteratively design, test, and analyze mechanisms... Subsequent agents then begin with a fresh context window, reading the Research Digest to build on prior discoveries.”

Paper · Section 1
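The handoff pattern quoted above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Archive` and `ResearchDigest` classes, the `AgentFn` signature, and `toy_agent` are all hypothetical names introduced here, and the real system drives LLM agents through the deepagents library rather than a plain Python callable. The sketch only shows the control flow the paper describes: each agent starts from a fresh context that contains nothing but the Digest, then writes raw artifacts to the Archive and one distilled insight to the Digest.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Archive:
    """Persistent store of raw artifacts (code snapshots, logs, results)."""
    entries: List[dict] = field(default_factory=list)


@dataclass
class ResearchDigest:
    """Compact, persistent list of distilled insights (hypothetical schema)."""
    insights: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Text that a fresh agent would read at the start of its run.
        return "\n".join(f"- {s}" for s in self.insights)


# Hypothetical agent signature: takes the rendered Digest and returns
# (raw artifacts to archive, one distilled insight, best score reached).
AgentFn = Callable[[str], Tuple[dict, str, float]]


def run_handoffs(agent: AgentFn, num_agents: int) -> Tuple[Archive, ResearchDigest, float]:
    archive, digest = Archive(), ResearchDigest()
    best = float("-inf")
    for _ in range(num_agents):
        # Fresh context: the only state carried over is the Digest text.
        context = digest.render()
        artifacts, insight, score = agent(context)
        archive.entries.append(artifacts)   # raw material, kept for later lookup
        digest.insights.append(insight)     # distilled knowledge for the next agent
        best = max(best, score)
    return archive, digest, best


def toy_agent(context: str) -> Tuple[dict, str, float]:
    # Stand-in for an LLM agent: its score grows with the number of
    # inherited insights, purely to make the accumulation visible.
    seen = len(context.splitlines()) if context else 0
    return {"log": f"run with {seen} prior insights"}, f"insight {seen + 1}", float(seen + 1)


if __name__ == "__main__":
    _, digest, best = run_handoffs(toy_agent, num_agents=3)
    print(digest.render())
    print("best score:", best)
```

The design point the sketch isolates is that the agent function receives only `digest.render()`, never the previous agent's working context, so the quality of what gets distilled determines what later agents can build on.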

What holds up

The conceptual taxonomy is crisp: evolutionary methods score low on coherence and flexibility, iterative agentic methods score low on persistence, and Engram is positioned as high on all three (coherence, flexibility, and persistence). The multi-cloud multicast case study (Table 1, Fig. 1) illustrates the distinction. The ablation study isolates the design contributions: sequential agents without structured knowledge transfer underperform full Engram, which indicates that resetting the context alone is insufficient without the Digest mechanism. Notably, the system tolerates temporary regressions. It persists with MILP formulations despite intermediate cost explosions and eventually reaches superior solutions, which points to genuine multi-step conceptual navigation rather than local optimization.

“Engram tolerates temporary score degradation, uses the failure to diagnose what is missing, and then continues refining within the same algorithmic family until the optimization becomes tractable and wins.”

Paper · Section 4.1

“Removing the Digest has a larger negative impact, suggesting that Digest plays a more critical role in guiding future agents than raw information in Archive.”

Paper · Figure 11
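To make the "tolerates temporary score degradation" point concrete, here is a toy contrast between a search that abandons a direction at the first regression and one that keeps refining through a few bad steps. It is an illustration only: in Engram the persistence comes from the agent's reasoning and the Digest, not from a patience counter, and the score sequence below is invented for the example (higher is better here, whereas the multicast task in the paper minimizes cost).

```python
from typing import List


def greedy_search(path_scores: List[float]) -> float:
    """Follow a refinement path but stop at the first step that scores
    worse than the best seen so far."""
    best = path_scores[0]
    for score in path_scores[1:]:
        if score < best:
            break
        best = score
    return best


def tolerant_search(path_scores: List[float], patience: int) -> float:
    """Keep refining through up to `patience` consecutive non-improving
    steps before giving up on the direction."""
    best = path_scores[0]
    stale = 0
    for score in path_scores[1:]:
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale > patience:
                break
    return best


if __name__ == "__main__":
    # Hypothetical scores along one algorithmic family: the approach gets
    # worse for two steps before it becomes tractable and wins.
    path = [5.0, 3.0, 2.0, 6.0, 9.0]
    print(greedy_search(path))                # 5.0
    print(tolerant_search(path, patience=2))  # 9.0
```

With a patience of 1 the tolerant search also stops at 5.0, which mirrors the review's point: how long a method is willing to stay with a regressing idea decides whether it ever reaches the better region.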

Main concerns

The evaluation against Glia (Hamadanian* et al., 2025) involves shared authors and institutional knowledge, raising the question of whether Engram benefited from prompt engineering or implementation tuning that the baseline configurations did not receive. The paper omits cost analysis: with each approach run 10 times at a budget of 100 evaluation runs across nine tasks, API token expenditure is likely substantial, which limits accessibility. Additionally, while the Research Digest is architecturally central, its schema, summarization prompts, and retrieval mechanisms remain underspecified, constraining independent reproduction. The claim that Engram 'discovers' heuristics relies heavily on the quality of initial prompting: performance degrades significantly with minimal prompts (Fig. 7b), indicating the system requires substantial strategic guidance rather than operating autonomously from first principles.

“When removing the direction, evolutionary approaches and Glia remain trapped in Steiner-tree–style heuristics... In contrast, Engram discovers solver-backed designs.”

Paper · Section 4.1

“Method comparison with simple minimal prompt... Engram with o3 on average reaches better solutions than OpenEvolve, even when OpenEvolve uses gpt-5.2.”

Paper · Figure 7(b)

“We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow.”

Glia paper (Hamadanian* et al., 2025) · arXiv abstract
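The cost concern raised in this section can be framed as simple arithmetic. The sketch below is a back-of-the-envelope estimator, not a figure from the paper: the task count, runs per approach, and evaluation budget come from the review text, while `tokens_per_eval` and `usd_per_million_tokens` are placeholders that a reader would need to replace with measured token usage and the provider's actual pricing.

```python
def total_evaluations(tasks: int, runs_per_task: int, evals_per_run: int) -> int:
    """Number of evaluation runs for one approach across all tasks."""
    return tasks * runs_per_task * evals_per_run


def estimate_cost_usd(
    tasks: int,
    runs_per_task: int,
    evals_per_run: int,
    tokens_per_eval: int,
    usd_per_million_tokens: float,
) -> float:
    """Rough API cost for one approach, assuming a flat per-token price."""
    tokens = total_evaluations(tasks, runs_per_task, evals_per_run) * tokens_per_eval
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # 9 tasks, 10 runs per approach, 100 evaluation runs per run (from the review).
    print(total_evaluations(9, 10, 100))  # 9000

    # PLACEHOLDER values, not reported anywhere in the paper.
    placeholder_tokens_per_eval = 50_000
    placeholder_price = 10.0
    print(estimate_cost_usd(9, 10, 100, placeholder_tokens_per_eval, placeholder_price))
```

Even without real token counts, the 9,000 evaluation runs per approach show why reported token usage would matter for anyone trying to budget a replication.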

Evidence and comparison

The evidence strongly supports Engram's superiority over evolutionary methods (OpenEvolve, FunSearch, EoH), with the multi-cloud multicast results clearly illustrating neighborhood bias in evolutionary approaches that remain fixated on Steiner-tree heuristics. However, the comparison to 'Human SOTA' relies on previously published algorithms rather than concurrent human expert attempts under identical time constraints, potentially skewing the benchmark. The evaluation spans nine distinct problems, lending external validity, though the ADRS benchmark tasks appear to favor algorithmic reasoning over other system design dimensions. The paper fairly notes that Engram requires high-level direction prompts to achieve optimal results, avoiding overstated claims of fully autonomous discovery.

“A comment block in one top solution even emphasizes this point explicitly: 'The whole routine is heuristically efficient—no MILP solver invocation—yet it typically cuts total egress cost by >30%.'”

Paper · Section 4.1

“Engram exceeds Human SOTA on five of six tasks and improves over OpenEvolve on four.”

Paper · Table 3

Reproducibility

Reproducibility is partially addressed but incomplete. The implementation leverages the deepagents library on LangChain/LangGraph, yet no public repository URL for Engram itself is provided in the text. Hyperparameters for baseline methods appear in Table 2, and the evaluation uses the ADRS benchmark (UCB-ADRS, 2026) with a specified budget (each approach is run 10 times, with 100 evaluation runs per run). However, critical implementation details, including the exact Research Digest format, the system prompt templates (only partially shown in Appendix C), and the specific LLM API versions (o3, gpt-5.2), are insufficiently documented to enable exact replication. No token counts or cost metrics are reported, obscuring the economic feasibility of the approach.

“We have implemented Engram using the deepagents library built on LangChain and LangGraph (Chase, 2022).”

Paper · Section 3

“Experimental settings and hyperparameters... We run each approach 10 times and each run has a budget of 100 evaluation runs.”

Paper · Table 2
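The protocol quoted from Table 2 (10 runs per approach, each with a budget of 100 evaluation runs) can be sketched as a small harness. This is an assumption-laden illustration: the paper does not specify how per-run results are aggregated in the text quoted here, so reporting the mean and standard deviation of each run's best score is this sketch's choice, and `best_under_budget`, `summarize`, and `toy_method` are hypothetical names. Higher scores are treated as better.

```python
import random
import statistics
from typing import Callable, Dict, List, Union


def best_under_budget(
    evaluate: Callable[[random.Random], float], budget: int, rng: random.Random
) -> float:
    """Best (highest) score observed within `budget` evaluation calls."""
    return max(evaluate(rng) for _ in range(budget))


def summarize(
    evaluate: Callable[[random.Random], float],
    num_runs: int = 10,
    budget: int = 100,
    base_seed: int = 0,
) -> Dict[str, Union[int, float, List[float]]]:
    """Repeat the budgeted search `num_runs` times with fixed seeds and
    aggregate the per-run best scores."""
    per_run = [
        best_under_budget(evaluate, budget, random.Random(base_seed + i))
        for i in range(num_runs)
    ]
    return {
        "runs": num_runs,
        "budget": budget,
        "mean": statistics.mean(per_run),
        "stdev": statistics.stdev(per_run),  # needs num_runs >= 2
        "per_run": per_run,
    }


def toy_method(rng: random.Random) -> float:
    # Stand-in for "propose a candidate and score it on the benchmark".
    return rng.random()


if __name__ == "__main__":
    report = summarize(toy_method)
    print(report["runs"], report["budget"], round(report["mean"], 3))
```

Fixing the seed per run, as done here, is one way to make a reported table reproducible; for LLM-backed methods the analogous record would include the model identifier and API version, which the review notes are not fully documented.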

Abstract

Designing high-performance system heuristics is a creative, iterative process requiring experts to form hypotheses and execute multi-step conceptual shifts. While Large Language Models (LLMs) show promise in automating this loop, they struggle with complex system problems due to two critical failure modes: evolutionary neighborhood bias and the coherence ceiling. Evolutionary methods often remain trapped in local optima by relying on scalar benchmark scores, failing when coordinated multi-step changes are required. Conversely, existing agentic frameworks suffer from context degradation over long horizons or fail to accumulate knowledge across independent runs. We present Engram, an agentic researcher architecture that addresses these limitations by decoupling long-horizon exploration from the constraints of a single context window. Engram organizes exploration into a sequence of agents that iteratively design, test, and analyze mechanisms. At the conclusion of each run, an agent stores code snapshots, logs, and results in a persistent Archive and distills high-level modeling insights into a compact, persistent Research Digest. Subsequent agents then begin with a fresh context window, reading the Research Digest to build on prior discoveries. We find that Engram exhibits superior performance across diverse domains including multi-cloud multicast, LLM inference request routing, and optimizing KV cache reuse in databases with natural language queries.
