Qira Research · Patent Pending

AI that won't make things up about your stuff.

Paste your notes, a doc, an email thread — LOLM answers only from your material, cites the exact passage it used, and tells you plainly when the answer isn't in there, instead of inventing one. A 70B model writes; a local uncertainty-control layer keeps it honest and hands you an auditable receipt for every answer. The AI that knows when it doesn't know.

answers from your sources · cites the passage · says “not in here” honestly · a receipt for every run · patent pending
lolm-nfet-client · v0.1.1 · MIT · npmjs.com
$ npm install lolm-nfet-client

// stream a live run — every token, decision and receipt
import { runAgent, friendly } from 'lolm-nfet-client';

const run = await runAgent({
  command: 'why is the sky blue?',
  onEvent: ev => { const line = friendly(ev); if (line) console.log(line); },
  onProof: p  => console.log(p.verdict),     // honest receipt, every run
});

Architecture

Five streams. One fused representation.

Each token flows through five parallel streams that converge via learned, per-dimension fusion — the surface stays fluent while the latent core tracks what the sequence is actually doing.

$$o_t = g \odot \mathrm{LN}(W_h h_t) + (1 - g) \odot \mathrm{LN}(W_z z_t) + W_m m_t + W_r r_t$$
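Read term by term, the fusion rule is plain vector arithmetic. The sketch below is a minimal illustration, not LOLM's implementation: it assumes each stream is already a vector, each W is a dense matrix stored as an array of rows, LN is a layer norm without learned gain or bias, and g is a per-dimension vector in (0, 1). The helper names (matVec, layerNorm, fuse) are hypothetical.

```javascript
// Minimal sketch of the five-stream fusion rule (hypothetical helper names).
// Assumptions: vectors are plain arrays, W* are row-major matrices (arrays of rows),
// LN has no learned gain/bias, and g holds per-dimension gate values in (0, 1).

// Matrix-vector product: (rows x cols) times (cols) -> (rows).
function matVec(W, x) {
  return W.map(row => row.reduce((acc, w, j) => acc + w * x[j], 0));
}

// Layer norm over one vector: zero mean, unit variance (eps keeps it finite).
function layerNorm(x, eps = 1e-5) {
  const mean = x.reduce((a, b) => a + b, 0) / x.length;
  const variance = x.reduce((a, b) => a + (b - mean) ** 2, 0) / x.length;
  return x.map(v => (v - mean) / Math.sqrt(variance + eps));
}

// o_t = g ⊙ LN(W_h h_t) + (1 - g) ⊙ LN(W_z z_t) + W_m m_t + W_r r_t
function fuse({ g, Wh, h, Wz, z, Wm, m, Wr, r }) {
  const surface = layerNorm(matVec(Wh, h)); // surface decoder stream
  const latent = layerNorm(matVec(Wz, z));  // latent SSM stream
  const memory = matVec(Wm, m);             // persistent memory read, not normalized
  const regime = matVec(Wr, r);             // regime code embedding, not normalized
  return g.map((gi, i) =>
    gi * surface[i] + (1 - gi) * latent[i] + memory[i] + regime[i]);
}
```

With g set to all ones the latent term drops out and o_t reduces to LN(W_h h_t) + W_m m_t + W_r r_t; with all zeros the surface term drops out instead. Memory and regime enter ungated and un-normalized, exactly as the formula writes them.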

Surface Decoder

h — local token relationships

Pre-norm Transformer with rotary position embeddings. The fluent voice.

Latent SSM Core

z — slow latent dynamics

Selective state-space model (Mamba-style) with parallel scan. Tracks the order beneath the words.
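The parallel scan works because the core recurrence of a selective SSM is a chain of affine updates, and composing affine updates is associative. The sketch below shows this on a single scalar channel, assuming the input-dependent coefficients a_t and b_t have already been computed (in Mamba-style models they come from discretizing the state matrices with an input-dependent step size). It is a minimal illustration with hypothetical names, not LOLM's kernel.

```javascript
// Linear recurrence behind a selective SSM, on ONE scalar channel.
// Assumption: a[t] (decay) and b[t] (input term) are precomputed from the input.
// Recurrence: s_t = a_t * s_{t-1} + b_t, starting from s_{-1} = 0.

// Sequential reference: T dependent steps, one after another.
function sequentialScan(a, b) {
  const out = [];
  let s = 0;
  for (let t = 0; t < a.length; t++) {
    s = a[t] * s + b[t];
    out.push(s);
  }
  return out;
}

// Applying step (a1, b1) and then (a2, b2) is itself an affine step:
// s -> a2 * (a1 * s + b1) + b2 = (a2 * a1) * s + (a2 * b1 + b2).
// This operator is associative, which is what permits a parallel scan.
function combine([a1, b1], [a2, b2]) {
  return [a2 * a1, a2 * b1 + b2];
}

// Hillis-Steele inclusive scan: log2(T) rounds. Every update inside one round
// reads only the previous round's array, so the round could run fully in parallel.
function parallelScan(a, b) {
  let cur = a.map((at, t) => [at, b[t]]);
  for (let offset = 1; offset < cur.length; offset *= 2) {
    cur = cur.map((elem, t) => (t >= offset ? combine(cur[t - offset], elem) : elem));
  }
  // With s_{-1} = 0, the accumulated b component of each prefix is exactly s_t.
  return cur.map(([, bt]) => bt);
}
```

Both functions return the same states. The parallel version does roughly T·log2(T) combines instead of T, a trade that only pays off when each round is spread across many cores at once.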

Regime Layer

r — discrete phase detection

Gumbel-Softmax over 64 codes with causal-conv neighbor interaction. Gradient-isolated so no code collapses.
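Gumbel-Softmax is the standard way to choose among discrete codes while keeping gradients: add Gumbel noise to the code logits and take a temperature-scaled softmax. The sketch below shows only that step over K codes (64 in the regime layer); the causal-conv neighbor interaction and the gradient isolation are not shown, and the temperature tau and the injectable rng are hypothetical parameters added so the example can be tested.

```javascript
// Minimal Gumbel-Softmax sketch over K regime codes (K = 64 in the text).
// Assumptions: logits come from an upstream projection; tau and rng are hypothetical knobs.

// Standard Gumbel noise: -log(-log(U)) with U ~ Uniform(0, 1).
function gumbel(rng = Math.random) {
  const u = Math.min(Math.max(rng(), 1e-12), 1 - 1e-12); // keep both logs finite
  return -Math.log(-Math.log(u));
}

// softmax((logits + gumbel) / tau): a differentiable relaxation of sampling one code.
function gumbelSoftmax(logits, tau = 1.0, rng = Math.random) {
  const y = logits.map(l => (l + gumbel(rng)) / tau);
  const max = Math.max(...y); // subtract the max for numerical stability
  const exps = y.map(v => Math.exp(v - max));
  const sum = exps.reduce((acc, e) => acc + e, 0);
  return exps.map(e => e / sum);
}

// Entropy of a code distribution in nats: -sum(p * log p), skipping zeros.
function entropy(p) {
  return -p.reduce((acc, pi) => (pi > 0 ? acc + pi * Math.log(pi) : acc), 0);
}
```

Lower tau pushes each sample toward a one-hot code; higher tau spreads probability across codes. The entropy helper is one plausible reading of the "regime entropy" observable the controller reads.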

Persistent Memory

m — cross-sequence state

Three banks — episodic, semantic, self — with gated, chunked read/write that keeps gradients flowing.

Manifestation Gate

g — surface vs latent arbitration

Per-dimension sigmoid deciding, feature by feature, whether the surface or the latent stream speaks.

The order beneath the words is load-bearing.

Why this matters for you, not just the math. The latent core carries a minority of the representation — yet remove it and the fluent surface collapses. That latent signal is exactly what the controller reads to tell when the model is on solid ground and when it isn't. It's the difference between an AI that sounds sure and one that can measure whether it should be.

Evidence

Every claim here is a mechanism you can check.

No claim on this page rests on an adjective. The control decisions, the receipts, the math checks and the confidence map are all things you can reproduce in the live demo or read in the code. The architecture is validated independently on NVIDIA H200 and Google TPU v4 against parameter-matched baselines; the full benchmark numbers and ablations live in the code and the proof pack, kept off this page on purpose.

The NFET Agent

Control from latent dynamics.

Every token, the model measures four observables of its own trajectory — logit entropy, hidden drift, gate balance, regime entropy. A learned controller maps them to five actions that drive a real agent loop.
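The page names the four observables but not their formulas. The sketch below is one plausible set of definitions, all of them assumptions: logit entropy as the Shannon entropy of the next-token softmax, hidden drift as one minus the cosine similarity between consecutive hidden states, gate balance as the mean of g (the surface share), and regime entropy as the entropy of the regime-code distribution. The function name observe is hypothetical.

```javascript
// Hypothetical definitions of the four per-token observables named in the text.
// The exact formulas (e.g. drift as 1 - cosine similarity) are assumptions.

function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map(l => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// Shannon entropy in nats, skipping zero-probability entries.
function entropy(p) {
  return -p.reduce((acc, pi) => (pi > 0 ? acc + pi * Math.log(pi) : acc), 0);
}

function cosine(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] * u[i];
    nv += v[i] * v[i];
  }
  return dot / (Math.sqrt(nu * nv) || 1); // guard against zero vectors
}

// One token's telemetry; prevHidden is null for the first token.
function observe({ logits, hidden, prevHidden, gate, regimeProbs }) {
  return {
    logitEntropy: entropy(softmax(logits)),                       // spread of the next-token distribution
    hiddenDrift: prevHidden ? 1 - cosine(hidden, prevHidden) : 0, // how far the representation moved
    gateBalance: gate.reduce((a, b) => a + b, 0) / gate.length,   // mean surface share of g
    regimeEntropy: entropy(regimeProbs),                          // how undecided the regime layer is
  };
}
```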

“Prompted agents ask the model whether it is uncertain. This agent measures it.”
Noise-Driven Functional Emergence Theory (NFET), applied

0 · continue: trajectory healthy; keep generating.
1 · retrieve: sustained uncertainty; go get evidence from memory or the web.
2 · verify: representation jumped while uncertain; check the draft.
3 · branch: stuck in a rut; fork alternatives and keep the healthiest.
4 · finalize: confident and stable; wrap up and answer.

command ──► generate segment ──► decide(telemetry, head)
                   ▲                        │
                   │      continue ─────────┤
                   │      retrieve ─────────┼──► evidence injected mid-run
                   │      verify ───────────┼──► critique fed back to the draft
                   │      branch ───────────┼──► healthiest fork survives
                   └────────────────────────┘
                          finalize ─────────────► answer + proof receipt
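The loop in the diagram can be sketched as ordinary async code. Everything passed in (generateSegment, decide, retrieve, verify, branch) is a hypothetical stand-in for components this page does not document, and maxSegments is an assumed budget. The point is the shape: control runs between segments, evidence and critiques feed later segments, and only finalize (or the budget) ends the run.

```javascript
// Minimal sketch of the segment-level control loop in the diagram.
// All injected functions are hypothetical stand-ins, not LOLM's API.
const ACTIONS = ['continue', 'retrieve', 'verify', 'branch', 'finalize']; // indices 0..4 as listed

async function runLoop(command, { generateSegment, decide, retrieve, verify, branch }, maxSegments = 8) {
  let draft = '';
  const context = []; // evidence and critiques fed into later segments
  const receipt = []; // what the controller actually did, segment by segment
  for (let i = 0; i < maxSegments; i++) {          // maxSegments: assumed budget
    const { text, telemetry } = await generateSegment(command, draft, context);
    draft += text;
    const action = ACTIONS[decide(telemetry)];      // decide returns an action index
    receipt.push({ segment: i, action, telemetry });
    if (action === 'finalize') break;               // confident and stable: answer
    if (action === 'retrieve') context.push(await retrieve(command, draft)); // evidence injected mid-run
    else if (action === 'verify') context.push(await verify(draft));         // critique fed back
    else if (action === 'branch') draft = await branch(command, draft);      // healthiest fork survives
    // 'continue' falls through and the next segment is generated
  }
  return { answer: draft, receipt, context };
}
```

Nothing here touches the writer's per-token sampling: each action only changes what the next segment sees, which matches how the page describes control over a frontier model.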

A calibrated heuristic control policy works from the first run, with no training. The control head then trains on the workspace's own logged traffic and takes over only when it is confident; otherwise the heuristic decides. Every run emits an honest receipt of what the controller actually did, and that receipt never claims the answer beat a baseline. A frontier mode lets a large reasoner generate while LOLM's latent machinery monitors and controls, and the whole workspace speaks the open Model Context Protocol (MCP), so it plugs into any agent stack that speaks MCP.
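The hand-off between the trained head and the heuristic reduces to a confidence check. In the sketch below, the z-scored telemetry inputs, the rule thresholds, the rule order and the 0.8 confidence cut-off are all hypothetical; only the structure (head when confident, heuristic otherwise) comes from the text.

```javascript
// Sketch of "trained head when confident, heuristic otherwise".
// Thresholds, rule order and z-scored inputs are hypothetical illustrations.
const ACTIONS = ['continue', 'retrieve', 'verify', 'branch', 'finalize'];

// Heuristic over z-scored segment telemetry (z = (x - mean) / std vs. recent history).
function heuristicAction(z) {
  if (z.entropy > 1.0 && z.drift > 2.0) return 'verify';              // jumped while uncertain
  if (z.entropy > 1.0) return 'retrieve';                             // sustained uncertainty
  if (z.drift < -1.5 && z.regimeEntropy < -1.5) return 'branch';      // stuck in a rut
  if (z.entropy < -1.0 && Math.abs(z.drift) < 1.0) return 'finalize'; // confident and stable
  return 'continue';
}

// headProbs: the trained head's softmax over ACTIONS, or null before it exists.
function decide(z, headProbs, minConfidence = 0.8) {
  if (headProbs) {
    const best = headProbs.indexOf(Math.max(...headProbs));
    if (headProbs[best] >= minConfidence) return { action: ACTIONS[best], source: 'head' };
  }
  return { action: heuristicAction(z), source: 'heuristic' };
}
```

Returning the source alongside the action is what lets a receipt state which policy actually made each call.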

Autonomy, bounded

It can act on its own — and it earns the right.

The same measured uncertainty that drives the controller also gates real action. LOLM acts autonomously only when its calibrated probability of being correct clears a risk-tiered bar: generous for read-only actions, with near-zero tolerance for error on anything that touches money. Every action's outcome is verified before the receipt may say it happened.

Five honest levels

L1 → L5

From receipt-monitored answers to a bounded persistent agent that maintains goals, memory, verification, scheduling and tools. The system reports its real current level — never a higher one. It's live at /api/demo/agent/level.

Gated tools, verified outcomes

read · reversible only

It runs read and reversible tools on its own — each gated by measured uncertainty and its outcome independently checked. Money, sending, deletion and deploys are hard-gated to a human, no matter how confident it is. That ceiling is in the math, not a policy doc.

Bold where cheap, humble where not

the safety asymmetry

It acts where a mistake is recoverable and escalates where it isn't. A missing uncertainty signal is a reason to ask, never a license to act — and it stops itself at a budget, a safety limit, or when nothing more is worth doing.
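The rules in this section (a risk-tiered bar, hard gates for irreversible actions, "ask when the signal is missing" and a budget stop) compose into a small decision function. The tier names, thresholds and budget accounting below are hypothetical; the page fixes only their ordering and direction.

```javascript
// Sketch of a risk-tiered autonomy gate. Tier names and thresholds are hypothetical.
// Read-only is lenient, reversible is stricter, and money / sending / deletion /
// deploys are never autonomous, whatever the confidence.
const TIERS = {
  read: { autonomous: true, minPCorrect: 0.6 },
  reversible: { autonomous: true, minPCorrect: 0.9 },
  money: { autonomous: false },
  send: { autonomous: false },
  delete: { autonomous: false },
  deploy: { autonomous: false },
};

// Returns 'act', 'ask' (escalate to a human) or 'stop'.
function gate({ tier, pCorrect, spent, budget }) {
  if (spent >= budget) return 'stop';           // stop itself at the budget
  const rule = TIERS[tier];
  if (!rule || !rule.autonomous) return 'ask';  // unknown or hard-gated tier: always a human
  if (typeof pCorrect !== 'number' || Number.isNaN(pCorrect)) return 'ask'; // missing signal: ask, never act
  return pCorrect >= rule.minPCorrect ? 'act' : 'ask';
}
```

Because hard-gated tiers return before pCorrect is ever read, no confidence value, however high, can unlock them.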

Contribution, isolated

What each layer actually adds.

The fair question a reviewer asks: is the intelligence the big model, the telemetry, or the control? Here is what changes — and what does not — at each layer. We name what we have not measured rather than imply it.

| Variant | What it adds | Changes the answer? | What's proven |
|---|---|---|---|
| 70B only | the frontier voice | baseline | the capability is the frontier model's, not ours |
| 70B + passive telemetry | per-token uncertainty + confidence spans | No (observer only) | the telemetry is real, measured per token |
| 70B + NFET control | retrieve / verify / branch / audit decisions | Yes: the run's path (segment-level), not the 70B's per-token sampling | control fires and is consumed; gated retrieval proven to lift answer correctness 0 → 7/8 on facts the writer can't otherwise know, with no regression on knowns; broader reasoning-quality lift still open |
| 4B only | a small local model | baseline | weak / inconsistent on hard reasoning (stated plainly) |
| 4B + NFET | inline graft (rewrites token logits) + control | Yes: per token and path | the graft measurably changes local generation; gated retrieval proven to lift correctness (baseline 0 → up to 88% on unknowable facts, no regression on knowns); broader quality lift still open |

Grounded in the real test battery (artifacts/lolm-real-tests): the controller was invoked on 100% of NFET runs, control actions fired in a minority of them, and the one clear win was retrieval grounding. None of this proves the answer beat a baseline; it proves the controller acted on measured uncertainty, and the raw trace is sealed and inspectable.

Live Demo

Watch it think.

Real runs: a 70B model (Llama 3.3) writes the answer; LOLM's local graft re-reads it per token for uncertainty and, at each segment boundary, decides whether to check notes, verify, or stop. Control acts between segments — it does not steer the 70B's tokens — and on most easy runs it stays out of the way. Every decision badge you see was made from measured telemetry — entropy, drift, gate, regime — not from a prompt. Tap a recorded run, or type your own below.

[Live demo panel: logit entropy · gate (surface share) · hidden drift · regime entropy · last control]

Positioning

What it does that others don't.

These are mechanism claims, not benchmark claims — every row is something you can verify yourself in the demo above or in the code.

| | Plain chatbot | Prompted agent (ReAct-style) | LOLM-NFET agent |
|---|---|---|---|
| How it decides to check facts or verify | It doesn't | Asks itself in words: a self-report, which models are famously bad at | Measures entropy, drift, gate and regime in its own activations, every token |
| Can you see which words it was unsure of? | No | No; it can only re-state the whole answer | Yes: it highlights the exact spans it measured as least confident, from its own per-token signals |
| Can you see why it acted? | | Reasoning text (post-hoc; can confabulate) | Numbers: the exact telemetry and z-scores behind every decision |
| An honest receipt of what it actually did | None | None | Every run: what it retrieved, verified, or stopped on, red/yellow/green, with a math check. It does not claim the answer beat a baseline. |
| Learns from your usage | No | No (or cloud fine-tuning) | Yes, locally: the controller retrains on your own logged runs |
| Runs on | Someone else's cloud | A frontier-model API | 2 vCPUs, no GPU, no API key, fully private (this very demo) |
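The "which words was it unsure of" row can be illustrated with a short span finder. The sketch assumes one entropy value per token, scores each token as a z-score against the answer's own tokens, and uses a hypothetical cut-off of 1.5; the product's actual signals and thresholds may differ.

```javascript
// Sketch: mark the least-confident spans of an answer from per-token entropy.
// Assumptions: one entropy per token, z-scores within the answer, hypothetical zCut = 1.5.

function zScores(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const std = Math.sqrt(xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length) || 1; // flat input -> all zeros
  return xs.map(x => (x - mean) / std);
}

// Merge consecutive flagged tokens into [start, end) spans.
function lowConfidenceSpans(tokens, entropies, zCut = 1.5) {
  const z = zScores(entropies);
  const spans = [];
  let start = -1;
  for (let i = 0; i <= tokens.length; i++) {
    const flagged = i < tokens.length && z[i] > zCut; // i === length closes any open span
    if (flagged && start < 0) start = i;
    if (!flagged && start >= 0) {
      spans.push({ start, end: i, text: tokens.slice(start, i).join(''), maxZ: Math.max(...z.slice(start, i)) });
      start = -1;
    }
  }
  return spans;
}
```

Keeping maxZ on each span is what allows the interface to show the number behind a highlight rather than just the highlight.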

The honest caveat: the demo's 0.6B research model will not out-write GPT-class systems. The claim is about the control mechanism, which is model-agnostic: the same graft rides on any open backbone (0.6B and 4B shipped; 32B targeted), and a hybrid mode lets the latent machinery monitor a frontier model while it does the writing.

Use it

What you can build with this.

Everything below ships in the repo today — labels are honest about maturity.

A private assistant over YOUR notes

works today · fully local

Point the importer at a markdown folder or Obsidian vault. When the agent's uncertainty spikes, it retrieves from your facts, ranked by relevance and importance and never sent off your machine, and the receipt shows exactly what it used.
make import-notes NOTES=~/vault && make agent-ui
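A ranking that blends relevance with importance can be sketched in a few lines. Here relevance is plain term overlap, a stand-in for whatever scoring the repo actually uses; the per-note importance field and the 0.7 / 0.3 blend are hypothetical. Everything runs in-process, in keeping with the local-only design.

```javascript
// Sketch of local note ranking by relevance and importance.
// Assumptions: term-overlap relevance (stand-in), importance in [0, 1] per note,
// hypothetical weights wRel = 0.7 and wImp = 0.3. No network access.

function terms(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Fraction of query terms that appear in the note.
function relevance(query, note) {
  const q = terms(query);
  if (q.size === 0) return 0;
  const n = terms(note);
  let hits = 0;
  for (const t of q) if (n.has(t)) hits++;
  return hits / q.size;
}

function rankNotes(query, notes, k = 3, wRel = 0.7, wImp = 0.3) {
  return notes
    .map(note => ({ ...note, rel: relevance(query, note.text) }))
    .filter(note => note.rel > 0) // importance alone never pulls in an unrelated note
    .map(note => ({ ...note, score: wRel * note.rel + wImp * note.importance }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Returning the matched notes with their ids and scores is what a receipt needs to show exactly what was used.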

Uncertainty telemetry for any open model

works today · model-agnostic

The graft rides any Hugging Face backbone (0.6B and 4B shipped; 32B targeted) and streams per-token entropy, drift, gate balance, regime entropy and control logits over a documented SSE protocol.
npm install lolm-nfet-client
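On the wire, SSE is a simple text format, so a client can be sketched without any library. The parser below follows the standard SSE framing rules for a complete, buffered stream; the "token" and "decision" event names and their JSON fields in the example are hypothetical, so check the repo's documented protocol for the real ones.

```javascript
// Minimal parser for a buffered Server-Sent Events stream, following the standard
// framing: "field: value" lines, a blank line dispatches an event, multiple data
// lines join with "\n", and lines starting with ":" are comments (keep-alives).

function parseSSE(text) {
  const events = [];
  let type = 'message'; // default event type
  let data = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === '') {                                   // blank line: dispatch
      if (data.length) events.push({ type, data: data.join('\n') });
      type = 'message';
      data = [];
      continue;
    }
    if (line.startsWith(':')) continue;                  // comment / keep-alive
    const idx = line.indexOf(':');
    const field = idx === -1 ? line : line.slice(0, idx);
    let value = idx === -1 ? '' : line.slice(idx + 1);
    if (value.startsWith(' ')) value = value.slice(1);   // drop one optional leading space
    if (field === 'event') type = value;
    else if (field === 'data') data.push(value);
  }
  return events; // an event without a closing blank line is discarded
}

// Example with hypothetical event names and fields:
const sample =
  'event: token\ndata: {"text":"Hi","entropy":0.4}\n\n' +
  ': keep-alive\n\n' +
  'event: decision\ndata: {"action":"retrieve"}\n\n';
for (const ev of parseSSE(sample)) console.log(ev.type, JSON.parse(ev.data));
```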

A latent co-processor for frontier agents

works today · MCP

The whole workspace speaks the Model Context Protocol — plug it into Claude Code or Claude Desktop and a frontier agent gains local memory, the NFET control loop, and telemetry tools it can call like any other tool.

Big model writes, LOLM watches

built · bring an API key

Hybrid mode: a frontier model generates while the local latent machinery re-reads every token for control telemetry — measured uncertainty driving segment-level control (retrieve / verify / stop), not steering the big model's prose. The bridge ships in the repo; it activates with an Anthropic API key.

Reference

Cite LOLM.

BibTeX
@article{leonard2026lolm,
  title  = {LOLM: Language Modeling Beyond the Surface with Hybrid
            Transformer-SSM Latent Order Fields},
  author = {Leonard, Bryan and Leonard, Brandyn},
  year   = {2026},
  note   = {Qira LLC. Provisional patent application No. 64002166.}
}

Code is private during patent review and available under the LOLM Community License — free for research, education, and small entities. Access and commercial licensing on request via imagineqira.com.