Search: [memory]

[2510.27246] Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs

llm · paper · memory

March 23, 2026 at 8:40:12 AM EDT

[2603.13875] GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

A solid, well-executed paper with a clean idea and good ablations, but limited in ambition by its small scale and synthetic-heavy evaluation. The core insight, that gradient-based memory writing with meta-learned initialization beats forward-only writing, is believable and likely to hold at larger scale, though the computational tradeoff gets harder.

llm · paper · memory

March 22, 2026 at 1:08:55 PM EDT

https://arxiv.org/abs/2603.13875

[2603.16862] Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory

llm · paper · memory

March 22, 2026 at 10:06:35 AM EDT

https://arxiv.org/abs/2603.16862

Attention-Residuals/Attention_Residuals.pdf at master · MoonshotAI/Attention-Residuals

ai · llm · paper · memory

March 16, 2026 at 11:05:29 AM EDT

https://github.com/MoonshotAI/Attention-Residuals/blob/master/Attention_Residuals.pdf

[2602.10715] Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents

Experiments across diverse backbone models, retrieval-based methods, and memory systems demonstrate that cognitive memory remains challenging and reveal failures not captured by existing benchmarks.

ai · llm · memory · agent · paper

March 8, 2026 at 4:16:04 PM EDT

https://arxiv.org/abs/2602.10715

[2508.00031] Git Context Controller: Manage the Context of LLM-based Agents like Git

Large language model (LLM) based agents have shown impressive capabilities by interleaving internal reasoning with external tool use. However, as these agents are deployed in long-horizon workflows, such as coding for a big, long-term project, context management becomes a critical bottleneck. We introduce Git-Context-Controller (GCC), a structured context management framework inspired by software version control systems. GCC elevates context to a versioned memory hierarchy, like Git. It structures agent memory as a persistent file system with explicit operations: COMMIT, BRANCH, MERGE, and CONTEXT, enabling milestone-based checkpointing, exploration of alternative plans, and structured reflection. Our approach empowers agents to manage long-term goals, isolate architectural experiments, and recover or hand off memory across sessions and agents. Empirically, agents equipped with GCC achieve state-of-the-art performance on the SWE-Bench-Lite benchmark, resolving 48.00% of software bugs, outperforming 26 competitive systems. In a self-replication case study, a GCC-augmented agent builds a new CLI agent from scratch, achieving 40.7% task resolution, compared to only 11.7% without GCC. The code is released at: this https URL
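To make the four operations concrete, here is a toy sketch of what a versioned agent memory could look like. It is not the GCC implementation: the `ContextStore` class, its in-memory layout (the paper describes a persistent file system), and the merge rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str       # milestone summary written by the agent
    notes: list[str]   # working notes captured at this checkpoint


@dataclass
class ContextStore:
    """Hypothetical toy store: branch name -> ordered list of commits."""
    branches: dict[str, list[Commit]] = field(default_factory=lambda: {"main": []})
    current: str = "main"

    def commit(self, message: str, notes: list[str]) -> None:
        # COMMIT: checkpoint a milestone on the current branch.
        self.branches[self.current].append(Commit(message, list(notes)))

    def branch(self, name: str) -> None:
        # BRANCH: fork the current history to explore an alternative plan.
        self.branches[name] = list(self.branches[self.current])
        self.current = name

    def merge(self, source: str, into: str = "main") -> None:
        # MERGE (assumed rule): append commits the target does not have yet.
        target = self.branches[into]
        target.extend(c for c in self.branches[source] if c not in target)
        self.current = into

    def context(self, last_n: int = 3) -> str:
        # CONTEXT: render recent milestones as text for the next prompt.
        recent = self.branches[self.current][-last_n:]
        return "\n".join(f"[{self.current}] {c.message}" for c in recent)


store = ContextStore()
store.commit("set up repo", ["cloned project"])
store.branch("try-cache")
store.commit("added cache layer", ["LRU, 128 entries"])
store.merge("try-cache")
print(store.context())
```

The point of the sketch is only that an experiment on `try-cache` stays isolated from `main` until it is merged, which is the property the abstract attributes to BRANCH and MERGE.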

ai · paper · llm · memory

February 17, 2026 at 2:31:12 PM EST

https://arxiv.org/abs/2508.00031

[2510.02219] Contrastive Retrieval Heads Improve Attention-Based Re-Ranking

The strong zero-shot and long-context capabilities of recent Large Language Models (LLMs) have paved the way for highly effective re-ranking systems. Attention-based re-rankers leverage attention weights from transformer heads to produce relevance scores, but not all heads are created equal: many contribute noise and redundancy, thus limiting performance. To address this, we introduce CoRe heads, a small set of retrieval heads identified via a contrastive scoring metric that explicitly rewards high attention heads that correlate with relevant documents, while downplaying heads with higher attention that correlate with irrelevant documents. This relative ranking criterion isolates the most discriminative heads for re-ranking and yields a state-of-the-art list-wise re-ranker. Extensive experiments with three LLMs show that aggregated signals from CoRe heads, constituting less than 1% of all heads, substantially improve re-ranking accuracy over strong baselines. We further find that CoRe heads are concentrated in middle layers, and pruning the computation of the final 50% of model layers preserves accuracy while significantly reducing inference time and memory usage.
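The abstract does not give the scoring formula, so the following is only one plausible shape for a contrastive head-selection step. It assumes each head's attention has already been summed per document, and it scores a head by how much more attention the relevant document gets than the strongest irrelevant one. The function names and the margin-style score are hypothetical, not the paper's metric.

```python
def contrastive_score(doc_attention: list[float], relevant: int) -> float:
    # Reward attention on the relevant document, penalize the best distractor.
    positive = doc_attention[relevant]
    negatives = [a for i, a in enumerate(doc_attention) if i != relevant]
    return positive - max(negatives)


def select_heads(examples: list[tuple[list[list[float]], int]], top_k: int) -> list[int]:
    # examples: (per_head_attention, relevant_index) pairs, where
    # per_head_attention[h][d] is head h's attention mass on document d.
    num_heads = len(examples[0][0])
    totals = [0.0] * num_heads
    for per_head, relevant in examples:
        for h, doc_attention in enumerate(per_head):
            totals[h] += contrastive_score(doc_attention, relevant)
    ranked = sorted(range(num_heads), key=lambda h: totals[h], reverse=True)
    return ranked[:top_k]


# Three heads, three documents, document 0 is the relevant one.
examples = [
    ([[0.7, 0.2, 0.1],       # head 0: clearly prefers the relevant document
      [0.3, 0.4, 0.3],       # head 1: prefers a distractor
      [0.34, 0.33, 0.33]],   # head 2: nearly uniform
     0),
]
print(select_heads(examples, top_k=1))
```

Under this assumed score, a head with high attention overall but no preference for the relevant document ranks low, which matches the abstract's description of a relative rather than absolute criterion.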

llm · paper · memory

February 13, 2026 at 10:23:10 AM EST

https://arxiv.org/abs/2510.02219

[2410.02642] Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers

Information retrieval (IR) systems have played a vital role in modern digital life and have cemented their continued usefulness in this new era of generative AI via retrieval-augmented generation. With strong language processing capabilities and remarkable versatility, large language models (LLMs) have become popular choices for zero-shot re-ranking in IR systems. So far, LLM-based re-ranking methods rely on strong generative capabilities, which restricts their use to either specialized or powerful proprietary models. Given these restrictions, we ask: is autoregressive generation necessary and optimal for LLMs to perform re-ranking? We hypothesize that there are abundant signals relevant to re-ranking within LLMs that might not be used to their full potential via generation. To more directly leverage such signals, we propose in-context re-ranking (ICR), a novel method that leverages the change in attention pattern caused by the search query for accurate and efficient re-ranking. To mitigate the intrinsic biases in LLMs, we propose a calibration method using a content-free query. Due to the absence of generation, ICR only requires two (O(1)) forward passes to re-rank N documents, making it substantially more efficient than generative re-ranking methods that require at least O(N) forward passes. Our novel design also enables ICR to be applied to any LLM without specialized training while guaranteeing a well-formed ranking. Extensive experiments with two popular open-weight LLMs on standard single-hop and multi-hop information retrieval benchmarks show that ICR outperforms RankGPT while cutting the latency by more than 60% in practice. Through detailed analyses, we show that ICR's performance is especially strong on tasks that require more complex re-ranking signals. Our findings call for further exploration of novel ways of utilizing open-weight LLMs beyond text generation.
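The two-pass idea can be sketched without a model. This assumes you already have, for each pass, the attention that the query tokens place on every context token, plus the token span of each document. Subtracting the content-free pass from the real-query pass is an assumption about how the calibration is applied, and `doc_scores` and `icr_rank` are hypothetical names.

```python
def doc_scores(token_attention: list[float], spans: list[tuple[int, int]]) -> list[float]:
    # Sum the attention that falls inside each document's [start, end) span.
    return [sum(token_attention[start:end]) for start, end in spans]


def icr_rank(query_attn: list[float], calib_attn: list[float],
             spans: list[tuple[int, int]]) -> list[int]:
    # Pass 1: attention under the real query.
    # Pass 2: attention under a content-free query.
    raw = doc_scores(query_attn, spans)
    bias = doc_scores(calib_attn, spans)
    calibrated = [r - b for r, b in zip(raw, bias)]
    # Every document gets exactly one score, so the ranking is always well formed.
    return sorted(range(len(spans)), key=lambda i: calibrated[i], reverse=True)


spans = [(0, 2), (2, 4), (4, 6)]                 # three documents, two tokens each
query_attn = [0.1, 0.1, 0.3, 0.2, 0.2, 0.1]      # made-up numbers for illustration
calib_attn = [0.2, 0.1, 0.3, 0.1, 0.05, 0.05]
print(icr_rank(query_attn, calib_attn, spans))
```

In this made-up example the second document has the most raw attention, but much of that is also present under the content-free query, so the third document ranks first after calibration. The cost is two forward passes regardless of the number of documents, as the abstract states.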

llm · paper · memory

February 13, 2026 at 10:22:58 AM EST

https://arxiv.org/abs/2410.02642

Observational Memory | Memory | Mastra Docs

Observations

When message history tokens exceed a threshold (default: 30,000), the Observer creates observations — concise notes about what happened.

When observations exceed their threshold (default: 40,000 tokens), the Reflector condenses them — combining related items and reflecting on patterns.

The result is a three-tier system:

Recent messages: Exact conversation history for the current task
Observations: A log of what the Observer has seen
Reflections: Condensed observations when memory becomes too long
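The threshold logic above can be sketched as follows. This is not Mastra's API (Mastra is a TypeScript framework): the `TieredMemory` class, the whitespace token counter, and the choice to clear a tier once it has been summarized are all assumptions. The Observer and Reflector are passed in as plain callables standing in for model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


@dataclass
class TieredMemory:
    observe: Callable[[list[str]], str]   # stand-in for the Observer
    reflect: Callable[[list[str]], str]   # stand-in for the Reflector
    message_limit: int = 30_000           # default threshold quoted above
    observation_limit: int = 40_000       # default threshold quoted above
    messages: list[str] = field(default_factory=list)
    observations: list[str] = field(default_factory=list)
    reflections: list[str] = field(default_factory=list)

    def add_message(self, text: str) -> None:
        self.messages.append(text)
        # Tier 1 -> 2: too much raw history, so write an observation.
        if sum(map(count_tokens, self.messages)) > self.message_limit:
            self.observations.append(self.observe(self.messages))
            self.messages.clear()  # assumption: observed messages are dropped
        # Tier 2 -> 3: too many observations, so condense them.
        if sum(map(count_tokens, self.observations)) > self.observation_limit:
            self.reflections.append(self.reflect(self.observations))
            self.observations.clear()  # assumption: condensed notes are dropped


# Tiny limits so the cascade is visible without thousands of tokens.
memory = TieredMemory(
    observe=lambda msgs: f"{len(msgs)} messages seen",
    reflect=lambda obs: f"{len(obs)} observations condensed",
    message_limit=5,
    observation_limit=4,
)
for text in ["one two three", "four five six", "a b c", "d e f"]:
    memory.add_message(text)
print(memory.reflections)
```

A real system would likely keep some recent messages after observing rather than clearing the tier; the sketch only shows how the two thresholds chain.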

ai · agent · memory

February 12, 2026 at 12:55:28 PM EST

https://mastra.ai/docs/memory/observational-memory

RAW.works - Recursive Language Models as Memory Systems

we were able to demonstrate a “Top-5” LongMemEval result with very minimal modifications to dspy.RLM, just some helper functions to process the “multi-chat” sessions

ai · agent · memory

February 12, 2026 at 12:34:34 PM EST

https://raw.works/recursive-language-models-as-memory-systems/

AMD Ryzen Memory Tweaking & Overclocking Guide | TechPowerUp

ryzen · memory · oc

January 16, 2020 at 12:11:55 AM EST

https://www.techpowerup.com/review/amd-ryzen-memory-tweaking-overclocking-guide/