#15339: fix: BM25 score normalization and FTS5 query join operator

by echoVic open 2026-02-13 09:15

stale size: S

Cluster: Memory and Language Support Enhancements

## Summary

Fix two bugs in memory hybrid search that effectively broke keyword-based relevance ranking.

## Bug 1: BM25 scores always 1.0 (#15224)

`bm25RankToScore()` used `Math.max(0, rank)` but SQLite FTS5's `bm25()` returns **negative** values (lower = better match). Since rank is always negative, `Math.max(0, rank)` always returned 0, making every score `1/(1+0) = 1.0`.

**Fix:** `Math.abs(rank)`; now more relevant results get higher scores.

## Bug 2: FTS5 multi-word queries return zero results (#15226)

`buildFtsQuery()` joined tokens with `AND`, requiring ALL tokens in a single chunk. For queries like "meeting budget review", this returned zero results.

**Fix:** Use `OR` instead. BM25 scoring naturally ranks documents matching more terms higher, so precision is preserved while recall improves dramatically.

## Changes

- `src/memory/hybrid.ts`: `Math.max(0, rank)` → `Math.abs(rank)`, `AND` → `OR`
- `src/memory/hybrid.test.ts`: Updated tests to verify negative rank differentiation and OR join

Closes #15224
Closes #15226

Signed-off-by: echoVic <nicepeng@foxmail.com>

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>

This PR updates the memory hybrid search helpers to (1) normalize SQLite FTS5 `bm25()` ranks into a 0–1 score, and (2) change multi-token FTS queries from `AND`-joined to `OR`-joined tokens. Tests were updated to reflect the new query join operator and to assert that negative BM25 ranks are differentiated.

These helpers are used by `src/memory/manager.ts` and `src/memory/manager-search.ts` to build the `MATCH` query string and to convert `bm25(chunks_fts)` ranks into a `textScore` that is merged with vector similarity scores.

<h3>Confidence Score: 2/5</h3>

- Not safe to merge as-is due to incorrect BM25 score normalization that can invert keyword relevance ordering.
- While the `OR` join change aligns with the described recall fix, the BM25 normalization change uses `Math.abs(rank)`, which is not monotonic with FTS5’s negative-is-better BM25 ranks when combined with `ORDER BY rank ASC`. This can cause higher-quality (more negative) matches to receive lower `textScore` than worse matches, undermining keyword relevance.
- src/memory/hybrid.ts (bm25RankToScore) and any call sites relying on rank ordering (src/memory/manager-search.ts)

<sub>Last reviewed commit: a6187d8</sub>
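
To make the normalization concern concrete, here is a minimal sketch comparing the two variants discussed in this PR with one possible order-preserving alternative. The function names `scoreWithMax`, `scoreWithAbs`, and `scoreMonotonic` are hypothetical, and the rank values are illustrative rather than measured. The `1 / (1 + x)` shape is taken from the PR description; the real `bm25RankToScore()` may differ in detail.

```typescript
// Hypothetical re-creations for illustration only; not the code in src/memory/hybrid.ts.

// Behaviour before this PR: negative ranks clamp to 0, so every score is 1.0.
function scoreWithMax(rank: number): number {
  return 1 / (1 + Math.max(0, rank));
}

// Behaviour after this PR: scores differ, but a larger |rank| yields a smaller score.
function scoreWithAbs(rank: number): number {
  return 1 / (1 + Math.abs(rank));
}

// One possible alternative (an assumption, not part of this PR):
// treat -rank as relevance and map it to [0, 1) so better matches score higher.
function scoreMonotonic(rank: number): number {
  const relevance = Math.max(0, -rank);
  return relevance / (1 + relevance);
}

// Illustrative FTS5-style ranks: more negative means a better match.
const strongMatch = -4.2;
const weakMatch = -0.8;

console.log("max:", scoreWithMax(strongMatch), scoreWithMax(weakMatch));
console.log("abs:", scoreWithAbs(strongMatch), scoreWithAbs(weakMatch));
console.log("monotonic:", scoreMonotonic(strongMatch), scoreMonotonic(weakMatch));
```

With these inputs the `Math.max` variant cannot tell the two matches apart, the `Math.abs` variant ranks the weak match above the strong one, and the alternative keeps the strong match on top. Whether that alternative fits the way `textScore` is merged with vector similarity is a separate design question.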
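
The join-operator change can be sketched in the same way. `buildQuery` below is a hypothetical stand-in for `buildFtsQuery()`; the tokenization regex and the quoting of each token are assumptions, and the real helper may handle input differently.

```typescript
// Hypothetical stand-in for buildFtsQuery(); tokenization and quoting are assumptions.
function buildQuery(raw: string, joiner: "AND" | "OR"): string | null {
  // Keep runs of ASCII letters, digits, and underscores as tokens.
  const tokens: string[] = raw.match(/[A-Za-z0-9_]+/g) ?? [];
  if (tokens.length === 0) return null;
  // Quote each token so FTS5 treats it as a string rather than as query syntax.
  return tokens.map((t) => `"${t}"`).join(` ${joiner} `);
}

// AND requires every token to appear in the same chunk.
console.log(buildQuery("meeting budget review", "AND"));
// OR matches chunks containing any token and leaves ordering to BM25.
console.log(buildQuery("meeting budget review", "OR"));
```

Because `OR` widens the candidate set, the final ordering depends entirely on the rank-to-score conversion, which is why the two changes in this PR interact.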