While SFT distillation meaningfully improves overall performance over the base model, the gap between the two approaches is most apparent when combined with test-time compute. On in-distribution tasks, SFT benefits substantially from parallel sampling (69.1 → 75.3), yet on out-of-distribution tasks the gains are negligible (59.4 → 59.6). This suggests that distillation teaches the model to imitate task-specific expert behavior, which scales well within the training distribution but fails to generalize beyond it. In contrast, KARL benefits from test-time compute both in- and out-of-distribution, indicating that RL develops more general search capabilities rather than task-specific heuristics.
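For readers unfamiliar with the mechanism, "parallel sampling" here is best-of-N selection: draw several candidate answers and keep the one a scorer ranks highest. A minimal sketch, with a toy `sample_answer` and `score` standing in for the model and verifier (neither is from the quoted work):

```python
import random

# Hypothetical stand-ins for a model's sampler and a verifier;
# they only illustrate the best-of-N mechanism.
def sample_answer(prompt: str, rng: random.Random) -> int:
    # Pretend the model's answers are noisy guesses around the true value 42.
    return 42 + rng.randint(-5, 5)

def score(prompt: str, answer: int) -> float:
    # A verifier/reward signal: higher is better (closer to 42 here).
    return -abs(answer - 42)

def best_of_n(prompt: str, n: int, seed: int = 0) -> int:
    """Parallel sampling: draw n candidates, keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [sample_answer(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("What is 6 * 7?", n=16))
```

Because the selected candidate is at least as good as any single sample under the same scorer, best-of-N can only help when the scorer reflects true quality; the quoted result is that this extra compute pays off for SFT only in-distribution, while it helps KARL in both regimes.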