
#17749: feat(agents): Adds RLM harness: infinite context window (for some tasks)

by cezarc1 · open · 2026-02-16 04:44
Labels: docs, commands, agents, size: XL
Adds a Recursive Language Model (RLM) engine. Instead of dumping everything into the context window and hoping for the best (yolo), the model gets a sandboxed JS REPL where it can write code, see the output, and iterate. Think of it as giving the agent a notebook to work through problems step by step. The model now uses code to express logic instead of its own context window. This is based on the [RLM paper](https://arxiv.org/pdf/2512.24601) and [DSPy's reference implementation](https://dspy.ai/api/modules/RLM/#when-to-use-rlm).

<img width="6612" height="4900" alt="Fig2" src="https://github.com/user-attachments/assets/7dfa467c-2596-4d23-9a76-b8268aae60e4" />

## Why

For tasks over large context (long documents, multi-file searches, information aggregation), RLMs consistently outperform standard LLM calls by leveraging symbolic logic (read: code generation). The key insight from the [paper](https://arxiv.org/pdf/2512.24601): the REPL becomes the model's working memory instead of the context window, so the model can strategically explore and accumulate results without context degradation. See results:

<img width="914" height="249" alt="rlm" src="https://github.com/user-attachments/assets/2193a1e3-21ed-4d0e-8b8a-4cb34ea5ffa6" />

## Scope

- **RLM harness** (`harness-rlm.ts`) — Sandboxed VM loop with bounded iterations, depth-limited recursion, object store for intermediates, retry logic, and an extract-fallback when iterations run out.
- **Runtime API** — `context_read/search`, `repo_read/search`, `llm_query`, `tool_call`, `get_var/set_var`, `print`, `submit`.
- **REPL prompt aligned with DSPy** — Explore first, iterate small, verify before submitting, use `llm_query` for semantics, submit only after seeing outputs.
- **`rlm_call` agent tool** + **`/rlm` slash command** — programmatic and interactive entry points.
- **Config** — `tools.rlm.*` with tunables for depth, iterations, LLM budget, and timeout.

## How to enable/use

Disabled by default.
Flip one flag:

```yaml
tools:
  rlm:
    enabled: true
```

Then: `/rlm summarize this project`, or let the agent use `rlm_call` on its own. Optional tunables: `maxDepth` (0–8), `maxIterations` (1–96), `maxLlmCalls` (1–2048), `timeoutSeconds`, `extractOnMaxIterations`.

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

Adds an RLM (Recursive Language Model) harness that gives agents a sandboxed JavaScript REPL to iteratively solve tasks requiring large-context exploration. Instead of dumping everything into the context window, the model writes small code snippets, observes outputs, and uses symbolic retrieval/slicing to work through problems step by step. The REPL becomes the working memory instead of the context window.

**Major changes:**

- Core RLM harness (`harness-rlm.ts`) with bounded iterations, depth-limited recursion, object store for intermediates, retry logic, and extract-fallback when iterations run out
- Runtime API: `context_read/search`, `repo_read/search`, `llm_query`, `tool_call`, `get_var/set_var`, `print`, `submit`
- `rlm_call` agent tool and `/rlm` slash command for programmatic and interactive entry points
- Config schema (`tools.rlm.*`) with tunables for depth, iterations, LLM budget, timeout
- Disabled by default; requires explicit `tools.rlm.enabled: true`

**Implementation quality:**

- Comprehensive test coverage including behavior tests for edge cases (unhandled rejections, Promise handling, retries)
- Proper resource cleanup with tmpdir management
- Good error handling with retry logic for transient failures
- Security-conscious: workspace path validation, file size limits, binary detection, rate limiting
- Well-documented constants and clear separation of concerns

<h3>Confidence Score: 4/5</h3>

- This PR is safe to merge with minor considerations
- The implementation is well-designed with comprehensive tests, proper resource management, and security controls.
The RLM harness includes workspace path validation, file size limits, recursion depth limits, and timeout enforcement. The feature is disabled by default and opt-in. The VM sandbox limitation is acknowledged and acceptable given the threat model. Code follows repository conventions and includes extensive behavioral tests for edge cases. Minor attention needed around VM context isolation and ensuring tool policy integration is complete.
- Pay close attention to the VM sandbox implementation in `src/commands/agent/harness-rlm.ts` (lines 1675–1695) to ensure the limited context is sufficient and tool access boundaries are properly enforced

<sub>Last reviewed commit: eb81c9a</sub>

<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
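The harness behavior described in the PR — bounded iterations, an explicit `submit`, and an extract-fallback when the budget runs out — can be sketched roughly as follows. This is a hypothetical skeleton for illustration only (the type names and the `step`/`extract` callbacks are invented here); the actual implementation lives in `harness-rlm.ts` and runs model-written code in a sandboxed VM.

```typescript
// Hypothetical sketch of an RLM-style driver loop: run up to maxIterations
// model/REPL steps, stop early on an explicit submit, and fall back to an
// extraction pass over the transcript when the iteration budget is exhausted.
type StepResult = { submitted: boolean; answer?: string; output: string };

interface LoopOptions {
  maxIterations: number;           // cf. tools.rlm.maxIterations (1-96)
  extractOnMaxIterations: boolean; // cf. tools.rlm.extractOnMaxIterations
}

function runRlmLoop(
  step: (iteration: number, transcript: string[]) => StepResult,
  extract: (transcript: string[]) => string,
  opts: LoopOptions,
): string {
  const transcript: string[] = [];
  for (let i = 0; i < opts.maxIterations; i++) {
    const result = step(i, transcript); // model writes code, sandbox runs it
    transcript.push(result.output);     // REPL output feeds the next step
    if (result.submitted && result.answer !== undefined) {
      return result.answer;             // explicit submit() ends the loop early
    }
  }
  if (opts.extractOnMaxIterations) {
    return extract(transcript);         // extract-fallback: salvage an answer
  }
  throw new Error("RLM loop exhausted iterations without an answer");
}
```

The key property this sketch shows: intermediate results accumulate in the transcript (the REPL working memory), not in a single prompt, so only the per-step outputs ever need to fit in context.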

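For reference, the tunables listed in the PR combined into one config block. The option names and ranges come from the description above; the specific values chosen here are illustrative, not documented defaults:

```yaml
tools:
  rlm:
    enabled: true
    maxDepth: 2                  # 0-8: recursion depth for nested RLM calls
    maxIterations: 32            # 1-96: REPL steps before extract-fallback
    maxLlmCalls: 256             # 1-2048: budget for llm_query sub-calls
    timeoutSeconds: 120          # illustrative value
    extractOnMaxIterations: true # salvage an answer when iterations run out
```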