Neural Computers

cs.LG cs.AI Mingchen Zhuge, Changsheng Zhao, Haozhe Liu, Zijian Zhou, Shuming Liu, Wenyi Wang, Ernie Chang, Gael Le Lan, Junjie Fei, Wenxuan Zhang, Yasheng Sun, Zhipeng Cai, Zechun Liu, Yunyang Xiong, Yining Yang, Yuandong Tian, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber · Apr 7, 2026

Plain-language introduction

Neural Computers (NCs) propose a new machine form where computation, memory, and I/O are unified inside a learned latent runtime state rather than separated as in conventional computers or external as in agents. This work instantiates early NC prototypes as video models that roll out terminal and desktop interfaces from text, pixels, and actions—showing that basic I/O alignment and short-horizon control are learnable without privileged program state. The results demonstrate early runtime primitives but also highlight that symbolic stability, routine reuse, and runtime governance remain unsolved on the long path toward the envisioned Completely Neural Computer (CNC).

Critical review

Verdict

Bottom line

This is an ambitious position paper paired with early empirical prototypes that successfully demonstrate interface rendering and local action fidelity, yet fall short on the symbolic reasoning and long-horizon consistency required for the proposed CNC vision. The work clearly delineates the gap between current video-based NCs (essentially action-conditioned world models for interfaces) and the Turing-complete, universally programmable, behavior-consistent runtime the authors call a CNC. The empirical results validate that short-horizon control and I/O alignment are achievable, but the weak arithmetic-probe results (4% without reprompting, rising to 83% only with reprompting) and the lack of demonstrated routine reuse suggest the CNC roadmap remains largely aspirational.

“Our long-term goal is the Completely Neural Computer (CNC): the mature, general-purpose realization of this emerging machine form, with stable execution, explicit reprogramming, and durable capability reuse.”

Zhuge et al., Neural Computers · Section 1

“Current NCs already realize early runtime primitives, especially I/O alignment and short-horizon control, while stable reuse and general-purpose execution remain out of reach.”

Zhuge et al., Neural Computers · Section 4.1

What holds up

The video-based instantiation credibly establishes that neural networks can learn to render structured interface state and respond to local action inputs with measurable fidelity. The CLI prototype reaches 0.54 character-level OCR accuracy and 0.31 exact-line accuracy, while the GUI prototype reaches 98.7% cursor accuracy when given explicit visual conditioning. The ablation studies are thorough: they show that data quality dominates scale (110 hours of goal-directed data outperform 1,400 hours of random exploration), that internal action injection outperforms external conditioning (SSIM 0.863 vs 0.746), and that reprompting raises arithmetic-probe accuracy from 4% to 83%, revealing the models' strength as steerable renderers even if they are not native reasoners.

“Character accuracy increases from 0.03 at initialization to 0.54 at 60k steps, with exact-line matches reaching 0.31.”

Zhuge et al., Neural Computers · Section 3.1.4, Table 4
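The review does not reproduce the paper's exact metric definitions. A minimal sketch of the two CLI metrics, assuming (hypothetically) that character accuracy is one minus the Levenshtein distance normalized by reference length, clipped at zero, and that exact-line accuracy is the fraction of reference lines reproduced verbatim at the same position; the function names are illustrative, not the authors' code:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(predicted: str, reference: str) -> float:
    """Hypothetical metric: 1 - normalized edit distance, clipped to [0, 1]."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - levenshtein(predicted, reference) / len(reference))


def exact_line_accuracy(predicted: str, reference: str) -> float:
    """Hypothetical metric: share of reference lines matched verbatim, position by position."""
    ref_lines = reference.splitlines()
    pred_lines = predicted.splitlines()
    if not ref_lines:
        return 1.0
    hits = sum(1 for i, line in enumerate(ref_lines)
               if i < len(pred_lines) and pred_lines[i] == line)
    return hits / len(ref_lines)
```

A rendered screen reading `$ echo hi` / `hj` against the reference `$ echo hi` / `hi` scores about 0.92 on character accuracy but only 0.5 on exact-line accuracy, which is consistent with exact-line accuracy being the stricter of the two reported numbers.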

“Under this explicit visual conditioning, cursor accuracy improves to 98.7%.”

Zhuge et al., Neural Computers · Section 3.2.4, Experiment 8

“Reprompting improves symbolic probes (4%→83%; Figure 6), reinforcing the view that current models are strong renderers and conditionable interfaces rather than native reasoners.”

Zhuge et al., Neural Computers · Section 3.1.4, Experiment 6
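The SSIM comparison between internal and external action injection (0.863 vs 0.746) scores generated frames against reference frames on luminance, contrast, and structure. The review does not say which SSIM variant the paper uses; standard SSIM averages the statistic over local, typically Gaussian-weighted windows. The sketch below is a simplified single-window version over a flattened grayscale image, assuming pixel values in $[0, L]$ and the usual constants $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$:

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """Single-window SSIM between two equally sized, flattened grayscale images."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((v - mu_x) ** 2 for v in x)
    var_y = fmean((v - mu_y) ** 2 for v in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical frames score 1.0, and an inverted frame scores below zero because its covariance with the reference is negative. Windowed implementations such as scikit-image's `structural_similarity` are more sensitive to local layout, which matters for text-heavy interface frames.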

Main concerns

The central concern is the leap from impressive interface rendering to claims about future Turing-complete, self-contained computers. Current prototypes exhibit severe symbolic instability: without reprompting, the CLI model achieves only 4% on basic arithmetic probes, indicating that the latent state does not reliably encode symbolic computation. The evaluation is limited to open-loop rollouts against logged traces, so stability under closed-loop interaction and long-horizon task execution remains unverified. The paper also offers no empirical demonstration of the CNC-defining properties of routine reuse, installable capabilities, or behavior consistency—critical gaps given that these are posited as the primary advantages over agents and conventional computers. Finally, the comparison to Sora2 (71% arithmetic accuracy vs 4%) is under-explained, with only speculative hypotheses offered for the disparity.

“Table 5 shows that current video models, including this NC instantiation, struggle on these symbolic tasks. Wan2.1 achieves 0% accuracy, our NCCLIGen model reaches 4%, and Veo3.1 manages 2%.”

Zhuge et al., Neural Computers · Section 3.1.4, Experiment 5
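Table 5 reports each model's probe accuracy as a single percentage. A minimal sketch of how such a probe might be scored, assuming (hypothetically) that each probe is an addition typed into the simulated shell using Bash arithmetic expansion, and that a rollout counts as correct only if the last non-empty line of the rendered, OCR-decoded screen is exactly the correct integer. `make_probes` and `score_probes` are illustrative names, not the paper's evaluation code:

```python
import random
from typing import List, Tuple

Probe = Tuple[str, int]  # (command typed into the simulated shell, expected result)


def make_probes(n: int, seed: int = 0) -> List[Probe]:
    """Hypothetical probe set: two-operand additions via Bash $((...)) expansion."""
    rng = random.Random(seed)
    probes = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        probes.append((f"echo $(({a}+{b}))", a + b))
    return probes


def score_probes(rendered_screens: List[str], probes: List[Probe]) -> float:
    """Exact match: the last non-empty line on screen must be the correct integer."""
    if not probes or len(rendered_screens) != len(probes):
        raise ValueError("need one rendered screen per probe")
    hits = 0
    for screen, (_, expected) in zip(rendered_screens, probes):
        lines = [line.strip() for line in screen.splitlines() if line.strip()]
        if lines and lines[-1] == str(expected):
            hits += 1
    return hits / len(probes)
```

Exact-match scoring gives no partial credit for a plausible-looking but wrong digit, so it is a much stricter test than the character-level rendering metrics.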

“In the present prototypes, these prompts and actions are logged conditioning streams, so evaluation remains open-loop rather than closed-loop interaction with a live environment.”

Zhuge et al., Neural Computers · Section 3
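The open-loop/closed-loop distinction matters because in open-loop evaluation the action sequence is fixed by the log, so a rendering error never changes what happens next, whereas in closed-loop interaction the next action depends on what the model rendered. A toy sketch, assuming a hypothetical `model(frame, action) -> frame` step function and a `policy(frame) -> action` that stands in for a live user or agent; the toy terminal models are illustrative only:

```python
from typing import Callable, List

Frame = str
Action = str
Model = Callable[[Frame, Action], Frame]
Policy = Callable[[Frame], Action]


def open_loop(model: Model, first: Frame, logged_actions: List[Action]) -> List[Frame]:
    """Replay a fixed action log; actions never react to what the model renders."""
    frames = [first]
    for action in logged_actions:
        frames.append(model(frames[-1], action))
    return frames


def closed_loop(model: Model, first: Frame, policy: Policy, steps: int) -> List[Frame]:
    """Choose each action by reading the model's own most recent frame."""
    frames = [first]
    for _ in range(steps):
        action = policy(frames[-1])
        frames.append(model(frames[-1], action))
    return frames


def perfect_model(frame: Frame, action: Action) -> Frame:
    """Toy terminal: every keystroke is echoed to the screen."""
    return frame + action


def flawed_model(frame: Frame, action: Action) -> Frame:
    """Toy terminal with a rendering bug: the key 'x' is never echoed."""
    return frame if action == "x" else frame + action


def typing_policy(frame: Frame) -> Action:
    """Stand-in user: type 'x' until three are visible, then type '!'."""
    return "!" if frame.count("x") >= 3 else "x"
```

With the log `["x", "x", "x", "!"]`, `open_loop(flawed_model, "$ ", log)` still ends on `"$ !"` because the log supplies the `"!"` regardless of what was rendered, whereas `closed_loop(flawed_model, "$ ", typing_policy, 4)` keeps typing `x` and never gets there. An open-loop score therefore cannot show how rendering errors feed back into subsequent actions.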

“Substantial challenges remain in robust long-horizon reasoning, reliable symbolic processing, stable capability reuse, and explicit runtime governance.”

Zhuge et al., Neural Computers · Section 1

Evidence and comparison

The evidence supports the narrow claim that video models can learn interface dynamics from I/O traces, but it does not yet support the broader CNC thesis of a unified, programmable runtime. The comparison to related work is conceptually crisp: the authors clearly distinguish NCs from agents (which mediate external computers) and world models (which predict environment dynamics), yet the empirical gap between the current prototypes and these existing systems is not quantified. The arithmetic probe results (Table 5) suggest that Sora2 may have latent capabilities the authors' model lacks, but the paper offers only unverified hypotheses for this discrepancy rather than controlled experiments isolating model scale, data, or conditioning factors.

“Conventional computers are used directly, in today's agent stack, agents mediate existing computers while world models serve as a parallel predictive layer; NCs aim to unify these split functions within one learned runtime.”

Zhuge et al., Neural Computers · Section 4.3

“Sora2's 71% accuracy is a notable outlier and may reflect system-level advantages or additional training beyond our current setup.”

Zhuge et al., Neural Computers · Section 3.1.4, Table 5

Reproducibility

The paper provides substantial technical detail for reproduction, including exact model architectures (Wan2.1-based with DiT stacks), data pipelines (asciinema for CLI, vhs for Clean, Dockerized environments for GUI), and training regimes (∼15,000 H100 hours for CLIGen General, ∼7,000 for Clean, ∼23k GPU-hours per full pass for GUIWorld). Hyperparameters are specified (AdamW, lr $5 \times 10^{-5}$, weight decay $10^{-2}$, bfloat16, gradient clipping at 1.0), and the data engine construction is documented in depth. However, no code, model weights, or interactive demonstration environments had been released at the time of writing, which prevents independent verification of the reported results.
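For readers reimplementing the training setup, the stated optimizer settings correspond to AdamW (decoupled weight decay) combined with global-norm gradient clipping. The sketch below is a dependency-free, single-step version on a flat parameter list. The betas (0.9, 0.999) and epsilon 1e-8 are common defaults assumed here, not values reported in the review, and bfloat16 mixed precision is omitted. In PyTorch the same settings would normally be expressed with `torch.optim.AdamW` and `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[float], max_norm: float = 1.0) -> List[float]:
    """Rescale gradients so their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def adamw_step(params: List[float], grads: List[float], m: List[float], v: List[float],
               t: int, lr: float = 5e-5, weight_decay: float = 1e-2,
               betas: Tuple[float, float] = (0.9, 0.999),
               eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """One AdamW update; t counts steps from 1. Returns new params, m, v."""
    b1, b2 = betas
    new_p, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = b1 * mi + (1 - b1) * g          # first-moment estimate
        vi = b2 * vi + (1 - b2) * g * g      # second-moment estimate
        m_hat = mi / (1 - b1 ** t)           # bias corrections
        v_hat = vi / (1 - b2 ** t)
        p = p - lr * weight_decay * p        # decoupled decay: shrinks the weight directly
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_p.append(p)
        new_m.append(mi)
        new_v.append(vi)
    return new_p, new_m, new_v
```

Clipping is applied to the raw gradients before the optimizer step, e.g. `adamw_step(params, clip_by_global_norm(grads, 1.0), m, v, t=step)`. Keeping the decay outside the adaptive denominator is what distinguishes AdamW from Adam with L2 regularization.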

“Training NCCLIGen on CLIGen (General) requires ∼15,000 H100 GPU hours at batch size 1. Training on CLIGen (Clean) across both subsets requires ∼7,000 H100 GPU hours.”

Zhuge et al., Neural Computers · Section 3.1.3

“Runs use 64 GPUs for about 15 days, totaling about 23k GPU-hours per full pass.”

Zhuge et al., Neural Computers · Section 3.2.3
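The GUIWorld compute figure is internally consistent: 64 GPUs running for about 15 days gives

$$64\ \text{GPUs} \times 15\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} = 23{,}040\ \text{GPU-hours} \approx 23\text{k GPU-hours},$$

matching the quoted total per full pass.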

Abstract

We propose a new frontier: Neural Computers (NCs) -- an emerging machine form that unifies computation, memory, and I/O in a learned runtime state. Unlike conventional computers, which execute explicit programs, agents, which act over external execution environments, and world models, which learn environment dynamics, NCs aim to make the model itself the running computer. Our long-term goal is the Completely Neural Computer (CNC): the mature, general-purpose realization of this emerging machine form, with stable execution, explicit reprogramming, and durable capability reuse. As an initial step, we study whether early NC primitives can be learned solely from collected I/O traces, without instrumented program state. Concretely, we instantiate NCs as video models that roll out screen frames from instructions, pixels, and user actions (when available) in CLI and GUI settings. These implementations show that learned runtimes can acquire early interface primitives, especially I/O alignment and short-horizon control, while routine reuse, controlled updates, and symbolic stability remain open. We outline a roadmap toward CNCs around these challenges. If overcome, CNCs could establish a new computing paradigm beyond today's agents, world models, and conventional computers.