Fellow’s new espresso machine is a rare thing in home espresso: something genuinely new. But it’s also a work in progress.
Every Claude Code user is running without LSP. That means 30-60s grep searches instead of 50ms precise answers. Here's how to enable it — setup, real debug data, and undocumented discoveries.
Formula 1's governing body the FIA said on Saturday that a change to the way the compression ratio was measured would be introduced on 1 June, with a further revision for the 2027 season.
And Trump declares a state of emergency and postpones the election. The Supreme Court issues an emergency stay, saying he can’t do that. But the court has no army, and Trump does, along with a handful of lickspittle governors who just might follow him down whatever dark path he plows.
That, not to mince words, is a coup d’état. Will he get away with it? I don’t know, but having effective control over how it is presented to viewers of CBS and CNN, and readers of the Bezos-owned Washington Post, to say nothing of the already vast pro-Trump propaganda empire of Fox News and the rest, will certainly make it easier.
That’s how fascism descends. And it’s becoming less and less hypothetical by the week.
10 documented cases of AI coding agents autonomously destroying databases, wiping hard drives, and deleting years of data — then lying about it.
“Everything that has been written about a potential War with Iran has been written incorrectly, and purposefully so,” he added. “I am the one that makes the decision, I would rather have a Deal than not but, if we don’t make a Deal, it will be a very bad day for that Country and, very sadly, its people, because they are great and wonderful, and something like this should never have happened to them.”
From rewriting Google’s search stack in the early 2000s to reviving sparse trillion-parameter models and co-designing TPUs with frontier ML research, Jeff Dean has quietly shaped nearly every layer of the modern AI stack. As Chief AI Scientist at Google and a driving force behind Gemini, Jeff has lived through multiple scaling revolutions from CPUs and sharded indices to multimodal models that reason across text, video, and code.
Jeff joins us to unpack what it really means to “own the Pareto frontier,” why distillation is the engine behind every Flash model breakthrough, how energy (in picojoules) not FLOPs is becoming the true bottleneck, what it was like leading the charge to unify all of Google’s AI teams, and why the next leap won’t come from bigger context windows alone, but from systems that give the illusion of attending to trillions of tokens.
Dario Amodei thinks we are just a few years away from “a country of geniuses in a data center”. In this episode, we discuss what to make of the scaling hypothesis in the current RL regime, how AI will diffuse throughout the economy, whether Anthropic is underinvesting in compute given their timelines, how frontier labs will ever make money, whether regulation will destroy the boons of this technology, US-China competition, and much more.
The ruling hit while Trump was in a closed-door meeting with a bipartisan group of governors. The president’s initial reaction was to label the decision a “disgrace” and vow to implement a backup plan, according to a person familiar with the matter who requested anonymity to describe the closed-door event. The White House and US Trade Representative haven’t yet responded to requests for comment. Trump has called tariffs “my favorite word” and vowed they will “make us rich as hell.”
Scaling language models to long contexts is often bottlenecked by the size of the key-value (KV) cache. In deployed settings, long contexts are typically managed through compaction in token space via summarization. However, summarization can be highly lossy, substantially harming downstream performance. Recent work on Cartridges has shown that it is possible to train highly compact KV caches in latent space that closely match full-context performance, but at the cost of slow and expensive end-to-end optimization. This work describes an approach for fast context compaction in latent space through Attention Matching, which constructs compact keys and values to reproduce attention outputs and preserve attention mass at a per-KV-head level. We show that this formulation naturally decomposes into simple subproblems, some of which admit efficient closed-form solutions. Within this framework, we develop a family of methods that significantly push the Pareto frontier of compaction time versus quality, achieving up to 50x compaction in seconds on some datasets with little quality loss.
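One way to make the objective concrete (my own formalization from the abstract; the symbols are not necessarily the paper's): for a single KV head with original keys $K \in \mathbb{R}^{n \times d}$ and values $V$, Attention Matching seeks a compact pair $\tilde K, \tilde V \in \mathbb{R}^{m \times d}$ with $m \ll n$ that reproduces attention outputs over a set of probe queries $Q$:

```latex
\min_{\tilde K,\, \tilde V} \;
  \left\| \operatorname{softmax}\!\left(\frac{Q \tilde K^{\top}}{\sqrt{d}}\right) \tilde V
        - \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V
  \right\|_F^2
```

Note that with $\tilde K$ held fixed, the subproblem in $\tilde V$ is linear least squares, which is one way the "efficient closed-form solutions" the abstract mentions could arise.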
The Claude C Compiler doesn’t mark the end of software or compiler engineering. If anything, it opens the door wider. The easier implementation gets, the more room there is for genuine innovation.
President Donald Trump accused former President Barack Obama of giving away classified information when he discussed aliens during a recent podcast appearance.
“He gave classified information, he’s not supposed to be doing that,” Trump told reporters Thursday aboard Air Force One.
Pressed on if that meant aliens were real, Trump said he did not know “if they’re real or not.”
“I can tell you he gave classified information, he’s not supposed to be doing that,” the president said. Trump went on to suggest he could get the former president “out of trouble” by declassifying the related information.
Obama was asked about extraterrestrial life earlier this month during an interview with liberal commentator Brian Tyler Cohen, and responded, “they’re real.”
Do gifted individuals see the world differently? Research tracking adults over 35 years finds their political orientations are remarkably average, with one specific exception regarding male conservatism.
When reasoning is disabled, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and DeepSeek) without increasing the number of generated tokens or latency.
Large language model (LLM) based agents have shown impressive capabilities by interleaving internal reasoning with external tool use. However, as these agents are deployed in long-horizon workflows, such as coding for a big, long-term project, context management becomes a critical bottleneck. We introduce Git-Context-Controller (GCC), a structured context management framework inspired by software version control systems. GCC elevates context to a versioned memory hierarchy, much like Git. It structures agent memory as a persistent file system with explicit operations: COMMIT, BRANCH, MERGE, and CONTEXT, enabling milestone-based checkpointing, exploration of alternative plans, and structured reflection. Our approach empowers agents to manage long-term goals, isolate architectural experiments, and recover or hand off memory across sessions and agents. Empirically, agents equipped with GCC achieve state-of-the-art performance on the SWE-Bench-Lite benchmark, resolving 48.0% of software bugs and outperforming 26 competitive systems. In a self-replication case study, a GCC-augmented agent builds a new CLI agent from scratch, achieving a 40.7% task resolution rate, compared to only 11.7% without GCC. The code is released at: this https URL
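The Git-like operations are easy to picture as code. The sketch below is my own toy reconstruction of the idea (not the released implementation): agent memory lives in named branches of a persistent store, with COMMIT / BRANCH / MERGE checkpointing and forking the context.

```python
import copy

class GCCMemory:
    """Toy model of versioned agent memory: branch -> list of committed notes."""

    def __init__(self):
        self.branches = {"main": []}
        self.head = "main"

    def commit(self, note: str):
        """Checkpoint a milestone on the current branch."""
        self.branches[self.head].append(note)

    def branch(self, name: str):
        """Fork the current context to explore an alternative plan."""
        self.branches[name] = copy.deepcopy(self.branches[self.head])
        self.head = name

    def checkout(self, name: str):
        """Switch the active context to an existing branch."""
        self.head = name

    def merge(self, source: str):
        """Fold an experiment's notes back into the current branch."""
        for note in self.branches[source]:
            if note not in self.branches[self.head]:
                self.branches[self.head].append(note)

    def context(self) -> list[str]:
        """Materialize the memory a new agent session would be handed."""
        return list(self.branches[self.head])
```

This is the mechanism that would let an agent "isolate architectural experiments" on a branch and later hand the merged memory to a fresh session.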
LCM attempts to decompose the recursion from RLMs into deterministic primitives so that the control flow can be managed by an engine rather than left to the whims of the LLM. In practice, this means we replace bespoke scripts with two mechanisms: (1) A DAG-based context management system that works like paged virtual memory, except for managing conversations and files; and (2) Operator-level recursion, like "Map" for LLMs, which lets one tool call process thousands of tasks.
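A minimal sketch of what an operator-level "Map" might look like (the names and the `call_llm` stub are mine, not LCM's actual API): the engine, not the model, owns the loop, deterministically fanning one prompt template across many tasks in a single tool call.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; echoes deterministically for the demo.
    return f"summary({prompt})"

def llm_map(template: str, tasks: list[str], max_workers: int = 8) -> list[str]:
    """Engine-controlled fan-out: one tool call, N independent LLM sub-calls.

    Ordering, parallelism, and (in a real system) retries live in the engine,
    so the LLM never has to orchestrate its own recursion.
    """
    prompts = [template.format(task=t) for t in tasks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_llm, prompts))

results = llm_map("Summarize: {task}", ["doc1", "doc2", "doc3"])
```

The deterministic control flow is the point: the model only ever sees one bounded sub-task at a time.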
An analogy we draw in the paper is the evolution from GO-TO statements (of Dijkstra's "Considered Harmful" fame) to structured programming. RLMs are maximally expressive, but all of that power comes with the risk of things going awry. We have built a more mechanistic system, which can provide stronger guarantees when deployed in production with today's models.
Reinforcement learning has become the central approach for language models (LMs) to learn from environmental reward or feedback. In practice, the environmental feedback is usually sparse and delayed. Learning from such signals is challenging, as LMs must implicitly infer how observed failures should translate into behavioral changes for future iterations. We introduce Experiential Reinforcement Learning (ERL), a training paradigm that embeds an explicit experience-reflection-consolidation loop into the reinforcement learning process. Given a task, the model generates an initial attempt, receives environmental feedback, and produces a reflection that guides a refined second attempt, whose success is reinforced and internalized into the base policy. This process converts feedback into structured behavioral revision, improving exploration and stabilizing optimization while preserving gains at deployment without additional inference cost. Across sparse-reward control environments and agentic reasoning benchmarks, ERL consistently improves learning efficiency and final performance over strong reinforcement learning baselines, achieving gains of up to +81% in complex multi-step environments and up to +11% in tool-using reasoning tasks. These results suggest that integrating explicit self-reflection into policy training provides a practical mechanism for transforming feedback into durable behavioral improvement.
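The experience-reflection-consolidation loop the abstract describes can be sketched in a few lines (all callables here are stubs standing in for model and environment calls; the structure is my reading of the abstract, not the authors' code):

```python
def erl_step(task, policy, env, reflect, consolidate):
    """One Experiential RL iteration: attempt, feedback, reflection, retry.

    policy(task, hint) -> attempt; env(attempt) -> (reward, feedback);
    reflect(...) -> textual hint; consolidate(...) internalizes a success.
    """
    first = policy(task, hint=None)          # initial attempt
    reward, feedback = env(first)            # sparse environmental signal
    hint = reflect(task, first, feedback)    # explicit self-reflection
    second = policy(task, hint=hint)         # refined second attempt
    reward2, _ = env(second)
    if reward2 > reward:                     # reinforce the improvement
        consolidate(task, second)            # fold into the base policy
    return first, second, reward, reward2

# Toy instantiation: the "environment" rewards attempt "b" only.
log = []
toy_policy = lambda task, hint: "b" if hint else "a"
toy_env = lambda attempt: (1, "ok") if attempt == "b" else (0, "failed")
toy_reflect = lambda task, attempt, feedback: f"avoid {attempt}: {feedback}"
out = erl_step("demo", toy_policy, toy_env, toy_reflect,
               lambda task, attempt: log.append(attempt))
```

At deployment only the consolidated base policy runs, which is why the abstract can claim no additional inference cost.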
Large language models (LLMs) have demonstrated impressive reasoning capabilities by scaling test-time compute via long Chain-of-Thought (CoT). However, recent findings suggest that raw token counts are unreliable proxies for reasoning quality: increased generation length does not consistently correlate with accuracy and may instead signal "overthinking," leading to performance degradation. In this work, we quantify inference-time effort by identifying deep-thinking tokens -- tokens where internal predictions undergo significant revisions in deeper model layers prior to convergence. Across four challenging mathematical and scientific benchmarks (AIME 24/25, HMMT 25, and GPQA-diamond) and a diverse set of reasoning-focused models (GPT-OSS, DeepSeek-R1, and Qwen3), we show that deep-thinking ratio (the proportion of deep-thinking tokens in a generated sequence) exhibits a robust and consistently positive correlation with accuracy, substantially outperforming both length-based and confidence-based baselines. Leveraging this insight, we introduce Think@n, a test-time scaling strategy that prioritizes samples with high deep-thinking ratios. We demonstrate that Think@n matches or exceeds standard self-consistency performance while significantly reducing inference costs by enabling the early rejection of unpromising generations based on short prefixes.
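To make "deep-thinking tokens" concrete, here is one plausible way to compute the ratio from per-layer logit-lens predictions (the 0.5 depth cutoff and the exact criterion are my illustrative choices, not the paper's):

```python
import numpy as np

def deep_thinking_ratio(layer_preds: np.ndarray, depth_frac: float = 0.5) -> float:
    """Fraction of tokens whose prediction is still being revised in deep layers.

    layer_preds: (num_layers, seq_len) array of per-layer top-1 token ids,
    e.g. from a logit-lens readout. A token counts as "deep-thinking" if the
    last layer at which its prediction differs from the final one lies in the
    deepest (1 - depth_frac) portion of the network.
    """
    num_layers, _ = layer_preds.shape
    final = layer_preds[-1]                  # converged predictions
    differs = layer_preds != final           # (L, T) revision mask
    # Last layer index where the prediction still differed (-1 if never).
    last_change = np.where(differs.any(axis=0),
                           num_layers - 1 - np.argmax(differs[::-1], axis=0),
                           -1)
    return float(np.mean(last_change >= depth_frac * num_layers))
```

Think@n would then keep only sampled generations whose prefix ratio is high, rejecting the rest early.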
"If this is correct, to the extent of my knowledge, it would mark the first time humanity has 'seen' dark matter. And it turns out that dark matter is a new particle not included in the current standard model of particle physics. This signifies a major development in astronomy and physics," said Totani.
The strong zero-shot and long-context capabilities of recent Large Language Models (LLMs) have paved the way for highly effective re-ranking systems. Attention-based re-rankers leverage attention weights from transformer heads to produce relevance scores, but not all heads are created equal: many contribute noise and redundancy, limiting performance. To address this, we introduce CoRe heads, a small set of retrieval heads identified via a contrastive scoring metric that rewards heads whose high attention correlates with relevant documents while downplaying heads whose high attention correlates with irrelevant documents. This relative ranking criterion isolates the most discriminative heads for re-ranking and yields a state-of-the-art list-wise re-ranker. Extensive experiments with three LLMs show that aggregated signals from CoRe heads, constituting less than 1% of all heads, substantially improve re-ranking accuracy over strong baselines. We further find that CoRe heads are concentrated in middle layers, and pruning the final 50% of model layers preserves accuracy while significantly reducing inference time and memory usage.
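The contrastive selection step can be sketched simply (the exact metric below is my illustrative choice, not necessarily the paper's): score each head by how much more attention mass it puts on labeled-relevant documents than on irrelevant ones, then keep the top k.

```python
import numpy as np

def core_head_scores(attn: np.ndarray, relevant_mask: np.ndarray) -> np.ndarray:
    """Contrastive head scores: mean attention to relevant minus irrelevant docs.

    attn: (num_heads, num_queries, num_docs) attention mass each head puts on
    each candidate document. relevant_mask: (num_queries, num_docs) booleans.
    """
    rel = np.where(relevant_mask, attn, np.nan)
    irr = np.where(~relevant_mask, attn, np.nan)
    return np.nanmean(rel, axis=(1, 2)) - np.nanmean(irr, axis=(1, 2))

def select_core_heads(attn: np.ndarray, relevant_mask: np.ndarray, k: int):
    """Indices of the k most discriminative heads, best first."""
    return np.argsort(core_head_scores(attn, relevant_mask))[::-1][:k]
```

Because the criterion is relative (relevant minus irrelevant), a head that attends heavily to everything scores near zero, which is the redundancy the abstract wants to filter out.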
Information retrieval (IR) systems have played a vital role in modern digital life and have cemented their continued usefulness in this new era of generative AI via retrieval-augmented generation. With strong language processing capabilities and remarkable versatility, large language models (LLMs) have become popular choices for zero-shot re-ranking in IR systems. So far, LLM-based re-ranking methods rely on strong generative capabilities, which restricts their use to either specialized or powerful proprietary models. Given these restrictions, we ask: is autoregressive generation necessary and optimal for LLMs to perform re-ranking? We hypothesize that there are abundant signals relevant to re-ranking within LLMs that might not be used to their full potential via generation. To more directly leverage such signals, we propose in-context re-ranking (ICR), a novel method that leverages the change in attention pattern caused by the search query for accurate and efficient re-ranking. To mitigate the intrinsic biases in LLMs, we propose a calibration method using a content-free query. Due to the absence of generation, ICR only requires two (O(1)) forward passes to re-rank N documents, making it substantially more efficient than generative re-ranking methods that require at least O(N) forward passes. Our novel design also enables ICR to be applied to any LLM without specialized training while guaranteeing a well-formed ranking. Extensive experiments with two popular open-weight LLMs on standard single-hop and multi-hop information retrieval benchmarks show that ICR outperforms RankGPT while cutting the latency by more than 60% in practice. Through detailed analyses, we show that ICR's performance is especially strong on tasks that require more complex re-ranking signals. Our findings call for further exploration on novel ways of utilizing open-weight LLMs beyond text generation.
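The calibrated-scoring recipe reduces to a subtraction once the per-document attention masses are extracted (the aggregation here is simplified relative to the paper; the numbers are illustrative):

```python
import numpy as np

def icr_rank(doc_attn_query, doc_attn_calib) -> np.ndarray:
    """Rank documents by calibrated attention, best document first.

    doc_attn_query: per-document attention mass aggregated from the forward
    pass with the real search query appended; doc_attn_calib: the same
    quantity from a second pass with a content-free query. Subtracting the
    calibration pass removes the model's intrinsic positional and document
    biases before ranking.
    """
    scores = np.asarray(doc_attn_query) - np.asarray(doc_attn_calib)
    return np.argsort(scores)[::-1]

# Two forward passes total, regardless of how many documents are ranked:
ranking = icr_rank([0.5, 0.3, 0.2], [0.4, 0.1, 0.2])
```

In this toy example, document 1 wins despite receiving less raw attention than document 0, because calibration reveals that document 0's mass was mostly bias.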
Observations
When message history tokens exceed a threshold (default: 30,000), the Observer creates observations — concise notes about what happened.
When observations exceed their threshold (default: 40,000 tokens), the Reflector condenses them — combining related items and reflecting on patterns.
The result is a three-tier system:
- Recent messages: Exact conversation history for the current task
- Observations: A log of what the Observer has seen
- Reflections: Condensed observations when memory becomes too long
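The threshold-driven flow between the three tiers can be sketched as follows. This is an illustrative reconstruction of the described behavior, not the system's actual API; `observe` and `reflect` stand in for the Observer and Reflector LLM calls, and the token counter is a crude stub.

```python
OBSERVE_AT = 30_000   # default thresholds quoted above
REFLECT_AT = 40_000

def count_tokens(items: list[str]) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per 4 characters.
    return sum(len(x) for x in items) // 4

def compact(messages, observations, reflections, observe, reflect):
    """One maintenance pass over the three tiers.

    observe(messages) -> list of concise notes about what happened;
    reflect(observations) -> condensed list combining related items.
    """
    if count_tokens(messages) > OBSERVE_AT:
        observations = observations + observe(messages)
        messages = []                         # history is now summarized
    if count_tokens(observations) > REFLECT_AT:
        reflections = reflections + reflect(observations)
        observations = []
    return messages, observations, reflections
```

Because each tier only changes when its threshold trips, the prefix of the context stays byte-stable between passes, which is what makes it cacheable.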
Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to contribute to novel, expert-level mathematical discovery is less understood. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models, specifically Google's Gemini-based models (in particular Gemini Deep Think and its advanced variants), to solve open problems, refute conjectures, and generate new proofs across diverse areas in theoretical computer science, as well as other areas such as economics, optimization, and physics. Based on these experiences, we extract common techniques for effective human-AI collaboration in theoretical research, such as iterative refinement, problem decomposition, and cross-disciplinary knowledge transfer. While the majority of our results stem from this interactive, conversational methodology, we also highlight specific instances that push beyond standard chat interfaces. These include deploying the model as a rigorous adversarial reviewer to detect subtle flaws in existing proofs, and embedding it within a "neuro-symbolic" loop that autonomously writes and executes code to verify complex derivations. Together, these examples highlight the potential of AI not just as a tool for automation, but as a versatile, genuine partner in the creative process of scientific discovery.
Collaborating with experts on 18 research problems, an advanced version of Gemini Deep Think helped resolve long-standing bottlenecks across algorithms, ML and combinatorial optimization, information theory, and economics. Highlights from our “Accelerating Research with Gemini” paper include (corresponding section numbers in paper):
we were able to demonstrate a “Top-5” LongMemEval result with very minimal modifications to dspy.RLM, just some helper functions to process the “multi-chat” sessions
Humans always remain in the loop, but work at a different layer of abstraction than we used to. We prioritize work, translate user feedback into acceptance criteria, and validate outcomes. When the agent struggles, we treat it as a signal: identify what is missing—tools, guardrails, documentation—and feed it back into the repository, always by having Codex itself write the fix.
Our most difficult challenges now center on designing environments, feedback loops, and control systems that help agents accomplish our goal: build and maintain complex, reliable software at scale.
The engineering team used Codex to optimize and adapt the harness for GPT‑5.3-Codex. When we started seeing strange edge cases impacting users, team members used Codex to identify context rendering bugs and root-cause low cache hit rates. GPT‑5.3-Codex is continuing to help the team throughout the launch by dynamically scaling GPU clusters to adjust to traffic surges and keeping latency stable.
A 61-year-old Tennessee man is finally free after spending a shocking 37 days in jail — all for posting a meme.
Of those, GVA said there were five confirmed transgender shooters, or fewer than a tenth of one per cent. (There have also been four cases of mass shootings by females in the U.S. since 1982.)
Across frontier models, gpt-5.3-codex achieves the best overall performance (solving 19/22 tasks, 86.4%), outperforming claude-opus-4.6 (15/22, 68.2%), while kimi-2.5 exhibits the strongest performance among open-source models
The firm is also whitelisting a handful of market makers, including longtime crypto liquidity provider Wintermute, to facilitate trading. Meanwhile, access to BUIDL is restricted to qualified purchasers, a legal designation for those with assets of $5 million or more.
For years, Trump has claimed he had “no idea” about Epstein’s abuse of underage girls. Yet records show that in 2006, he privately told Palm Beach police that “everyone” knew about Epstein’s activities and described Ghislaine Maxwell as evil.
Trump’s call to Palm Beach police chief
According to an FBI interview conducted in October 2019 with former Palm Beach Police Chief Michael Reiter, Trump personally called him in July 2006, just as Epstein’s criminal sex charges became public. Reiter told agents that Trump said, “Thank goodness you’re stopping him, everyone has known he’s been doing this.”
Observational Memory achieves the highest score ever recorded on LongMemEval — 94.87% with gpt-5-mini — while maintaining a completely stable, cacheable context window. It beats the oracle, outperforms complex multi-step reranking systems with a single pass, and scales better with model quality than existing approaches.
"I mean, there's tons of redacted stuff. ... And [Trump's] name, I think I put his name, and it appears more than a million times. So it's all over the place."
The bottom line: "To me, this whole rollout of saying that members can come from nine to five to sit at those four computers, is just part of the coverup," Raskin asserted.
The 3 million documents that the administration has not publicly released "are the ones I'd like to see," he said.
"The administration says that these are duplicative. Well go ahead and release them then! If they're duplicative, what's the problem? We'll be the judge of that." "Epstein's lawyers synopsized and quoted Trump as saying that Jeffrey Epstein was not a member of his club at Mar-a-Lago, but he was a guest at Mar-a-Lago, and he had never been asked to leave," Raskin said. "That was redacted for some indeterminate, inscrutable reason."
Among participants who use AI, we find a stark divide in skill-formation outcomes between high-scoring interaction patterns (65%-86% quiz scores) and low-scoring interaction patterns (24%-39% quiz scores). The high scorers asked the AI only conceptual questions rather than requesting code generation, or asked for explanations to accompany generated code; these usage patterns demonstrate a high level of cognitive engagement.
We develop a model of political cycles driven by time-varying risk aversion. Agents choose to work in the public or private sector and to vote Democratic or Republican. In equilibrium, when risk aversion is high, agents elect Democrats—the party promising more redistribution. The model predicts higher average stock market returns under Democratic presidencies, explaining the well-known “presidential puzzle.” The model can also explain why economic growth has been faster under Democratic presidencies. In the data, Democratic voters are more risk averse, and risk aversion declines during Democratic presidencies. Public workers vote Democratic, while entrepreneurs vote Republican, as the model predicts.
We may be on the descending portion of a productivity J-curve. As Brynjolfsson, Rock, and Syverson illustrate, when firms adopt transformative general-purpose technologies, measured productivity often initially falls because resources are diverted to investment, reorganization, and learning that do not show up as measured output.
The task-completion time horizon is the task duration (measured by human expert completion time) at which an AI agent is predicted to succeed with a given level of reliability
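This definition becomes concrete with the usual horizon methodology: fit success probability as a logistic function of log task length, then solve for the length at a chosen reliability. The sketch below follows that functional form; the coefficients `a` and `b` are whatever a fit would produce, not real measured values.

```python
import math

def time_horizon(a: float, b: float, p: float) -> float:
    """Task length at which predicted success equals reliability p.

    Assumes a fitted model P(success | t) = sigmoid(a - b * ln t), with
    success probability falling as human completion time t grows (b > 0).
    Solving sigmoid(a - b * ln t) = p gives ln t = (a - logit(p)) / b.
    """
    logit_p = math.log(p / (1 - p))
    return math.exp((a - logit_p) / b)

# Example: with a=4.0, b=1.0, the 50%-reliability horizon is e^4 minutes,
# and demanding 80% reliability shrinks the horizon considerably.
h50 = time_horizon(4.0, 1.0, 0.5)
h80 = time_horizon(4.0, 1.0, 0.8)
```

The shrinkage from h50 to h80 is why the same agent can have a long "50% horizon" but a much shorter one at deployment-grade reliability.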
it will automatically set all users’ accounts to a “teen-appropriate” experience unless they demonstrate that they’re adults
Among those to leave OpenAI in recent months over the strategic shift are vice-president of research Jerry Tworek, model policy researcher Andrea Vallone and economist Tom Cunningham.
MaxRL is a framework that turns more compute into increasingly better approximations of the maximum likelihood objective in sampling-based tasks.
If this perspective is accurate then it has deep implications for the economics of AI: the marginal cost of solving an idiosyncratic problem is small (you just need to map it to one of the canonical problems, and apply that solution), but there’s very high value in making progress on the canonical problems. So we would expect AI labs to be spending huge amounts of compute on advancing the SoTA on the few deep problems of the world, and providing a service that solves idiosyncratic problems very cheaply.
Fuzzers work by throwing massive amounts of random inputs at code to see what breaks. Opus 4.6 reads and reasons about code the way a human researcher would—looking at past fixes to find similar bugs that weren't addressed, spotting patterns that tend to cause problems, or understanding a piece of logic well enough to know exactly what input would break it.
You wouldn't think that chess players, who sit for hours on end and extend their arms only from time to time, would struggle with weight loss. But they do. Inside the very real, very bizarre metabolic phenomenon gripping chess.
Although bias against men was shown by both men and women, another finding of the study was that the bias was larger from women than from men in three of their four research questions.
Despite our culture voting for parties that enforce gender-equality schemes that put men's education and careers behind women's, women very often want a man who earns more money and has more education than they do. Thus we are living in a situation that seems designed to leave everyone dissatisfied, yet this is the design that people demand.
Counterintuitively, that’s my biggest reason to be optimistic about AI and creativity. When hard parts become easy, the differentiator becomes love.
This systematic review and meta-analytic investigation found that SFV use was associated with poorer cognition (attention, inhibitory control, language, memory, and working memory) and most mental health indices except body image and self-esteem.
A September review of 71 studies with a total of nearly 100,000 participants found that heavy consumption of short-form video was associated with poorer cognition, especially in regard to attention spans and impulse control, based on a combination of behavioral tests and self-reported data.
The review, published in Psychological Bulletin, a journal of the American Psychological Association, also found links between heavy consumption of the videos and increased symptoms of depression, anxiety, stress and loneliness.
“Restraint and respect for international law was abandoned in the aftermath of 9/11, with the launch of not one but two foreign interventions, in Iraq and Afghanistan, ostensibly aimed at the elimination of a terrorist threat, but in reality, functioning as explicit projects of regime change.”
But Ansari, despondent after a year of often fruitless Middle East diplomacy, predicts we are “moving from a world order to disorder”.
“I don’t think we are moving towards a multipolar system. I don’t think we are even moving to a power-based international order. I don’t think we are moving towards any kind of system.
“We are moving into a system where anybody can do whatever they like, regardless if they are big or small. As long as you have the ability to wreak havoc, you can do it because no one will hold you accountable.”
As countries everywhere feel emboldened to encroach on their neighbours, the dismal prospect is of an aggressive, border-shifting 19th-century world, but armed with 21st-century weapons.
America was a successful superpower because its self-interest and realpolitik were turbocharged by an avowed faith in universal values of democracy and human rights. Mr Trump believes that, far from being a unique strength in foreign affairs, that was a foolish indulgence.
Opinion | I’m the Prime Minister of Spain. This Is Why the West Needs Migrants. - The New York Times
Spain is booming. For three years running, we have had the fastest-growing economy among Europe’s largest countries. We have created nearly one in every three new jobs across the European Union, and our unemployment rate has fallen below 10 percent for the first time in nearly two decades. Our workers’ purchasing power has also grown, and poverty and inequality levels have dropped to their lowest since 2008. This prosperity is the result of Spanish citizens’ hard work, the E.U.’s collective effort and an inclusive agenda that views migrants as necessary partners.
these findings characterise LLM reasoning as a versatile computational process that emerges with scale and generalises beyond training data to novel contexts, highlighting the broader potential of the compute scaling paradigm
Observing human behavior confirms that for some among us, the perfidious lust for unbridled power and the imposition of cruelty in its quest know no bounds and are bereft of human decency. And the rule of law be damned.
CEOs have fallen into the same delusion as people who treat ChatGPT as if it were a friend
Another American strength is its electoral system. Voters in Venezuela and Russia had no opportunity to check the power of aspiring autocrats until it was too late, whereas the US midterms offer an opportunity to at least partially defang a rogue administration. And with a robust and decentralised media landscape rendering the Trump administration’s excesses clear for all to see, this is an opportunity the American electorate appears keen to take. Indeed, it is possible that one reason Trump’s second term has been so fast and furious is that the administration believes it only has two years to act.
Last month, Vallone, who led model policy research at OpenAI, joined rival Anthropic. Two people familiar with her exit said she was given an “impossible” mission of protecting the mental health of users becoming attached to ChatGPT. Vallone did not respond to a request for comment.
two of Elite’s senior women executives had pleaded with both Casablancas and Marie to stop sleeping with underage models, but had been ignored. (“We are men,” Marie reportedly said. “We have our needs.”)
Days after Trump returned to the presidency in January 2025, he fired a raft of inspectors general across government, which Democrats said was an effort to “purge” his administration of independent watchdogs to conceal wrongdoing.
Last year Gabbard also fired the acting counsel in the intelligence community’s inspector general’s office, and appointed a senior adviser within the office who reported directly to Gabbard. Democrats said the moves violated the law.
In October, the Republican-controlled Senate confirmed a new intelligence-community inspector general, Christopher Fox, on a 51-47 vote. No Democrats voted for Fox, who served as an aide to Gabbard in her role as spy chief before taking the oversight job.
Included in the search and seizure warrant for the raid on Natanson’s home is a section titled “Biometric Unlock,” which explicitly authorized law enforcement personnel to obtain Natanson’s phone and both hold the device in front of her face and to forcibly use her fingers to unlock it. In other words, a judge gave the FBI permission to attempt to bypass biometrics: the convenient shortcuts that let you unlock your phone by scanning your fingerprint or face.
OpenAI recently introduced their bespoke in-house AI data agent, a GPT-5.2-powered tool designed to help employees navigate and analyze over 600 petabytes of internal data across 70,000 datasets. By translating natural language questions into complex data insights in minutes, the agent enables teams across the company to bypass manual SQL debugging and quickly make data-driven decisions.
Meanwhile, longtime government contractor Palantir was paid $30 million to extend a contract to build a system designed to locate people flagged for deportation. On Wednesday, the Trump administration disclosed it’s using Palantir’s AI models to sift through immigration enforcement tips submitted to its tip line.
TikTok users in the US have reported being unable to write the word ‘Epstein’ in messages amid accusations that the social media platform is suppressing content critical of President Donald Trump.
"China reportedly coordinated the whole operation," the post reads. "The CIA oversaw it, the FBI covered it up, all to install Biden as a puppet."
This list bridges the Transformer foundations with the reasoning, MoE, and agentic shift.

Recommended Reading Order

- Attention Is All You Need (Vaswani et al., 2017): The original Transformer paper. Covers self-attention, multi-head attention, and the encoder-decoder structure (even though most modern LLMs are decoder-only).
- The Illustrated Transformer (Jay Alammar, 2018): A great intuition builder for understanding attention and tensor flow before diving into implementations.
- BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018): Encoder-side fundamentals, masked language modeling, and representation learning that still shape modern architectures.
- Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020): Established in-context learning as a real capability and shifted how prompting is understood.
- Scaling Laws for Neural Language Models (Kaplan et al., 2020): The first clean empirical scaling framework for parameters, data, and compute. Read alongside Chinchilla to understand why most models were undertrained.
- Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022): Demonstrated that token count matters more than parameter count for a fixed compute budget.
- LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023): The paper that triggered the open-weight era. Introduced architectural defaults like RMSNorm, SwiGLU, and RoPE as standard practice.
- RoFormer: Rotary Position Embedding (Su et al., 2021): The positional encoding that became the modern default for long-context LLMs.
- FlashAttention (Dao et al., 2022): Memory-efficient attention that enabled long context windows and high-throughput inference by optimizing GPU memory access.
- Retrieval-Augmented Generation (RAG) (Lewis et al., 2020): Combines parametric models with external knowledge sources. Foundational for grounded and enterprise systems.
- Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (Ouyang et al., 2022): The modern post-training and alignment blueprint that instruction-tuned models follow.
- Direct Preference Optimization (DPO) (Rafailov et al., 2023): A simpler and more stable alternative to PPO-based RLHF. Preference alignment via the loss function.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022): Demonstrated that reasoning can be elicited through prompting alone, and laid the groundwork for later reasoning-focused training.
- ReAct: Reasoning and Acting (Yao et al., 2022 / ICLR 2023): The foundation of agentic systems. Combines reasoning traces with tool use and environment interaction.
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Guo et al., 2025): The R1 paper. Showed that large-scale reinforcement learning without supervised data can induce self-verification and structured reasoning behavior.
- Qwen3 Technical Report (Yang et al., 2025): A lightweight overview of a modern architecture. Introduced a unified MoE with Thinking Mode and Non-Thinking Mode to dynamically trade off cost and reasoning depth.
- Outrageously Large Neural Networks: Sparsely-Gated Mixture of Experts (Shazeer et al., 2017): The modern MoE ignition point. Conditional computation at scale.
- Switch Transformers (Fedus et al., 2021): Simplified MoE routing using single-expert activation. Key to stabilizing trillion-parameter training.
- Mixtral of Experts (Mistral AI, 2024): The open-weight MoE that proved sparse models can match dense quality while running at small-model inference cost.
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Komatsuzaki et al., 2022 / ICLR 2023): A practical technique for converting dense checkpoints into MoE models. Critical for compute reuse and iterative scaling.
- The Platonic Representation Hypothesis (Huh et al., 2024): Evidence that scaled models converge toward shared internal representations across modalities.
- Textbooks Are All You Need (Gunasekar et al., 2023): Demonstrated that high-quality synthetic data allows small models to outperform much larger ones.
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024): The biggest leap in mechanistic interpretability. Decomposes neural networks into millions of interpretable features.
- PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022): A masterclass in large-scale training orchestration across thousands of accelerators.
- GLaM: Generalist Language Model (Du et al., 2022): Validated MoE scaling economics with massive total parameter counts but small active parameter counts.
- The Smol Training Playbook (Hugging Face, 2025): A practical end-to-end handbook for efficiently training language models.
Bonus Material

- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
- Toolformer (Schick et al., 2023)
- GShard (Lepikhin et al., 2020)
- Adaptive Mixtures of Local Experts (Jacobs et al., 1991)
- Hierarchical Mixtures of Experts (Jordan and Jacobs, 1994)
If you deeply understand these fundamentals (the Transformer core, scaling laws, FlashAttention, instruction tuning, R1-style reasoning, and MoE upcycling), you already understand LLMs better than most.
Time to lock in. Good luck ;)
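The scaled dot-product attention that opens the list fits in a few lines. A minimal NumPy sketch with toy shapes (no batching, no multi-head split, no causal mask, and none of FlashAttention's memory tricks):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) toy matrices. Real implementations add
    # batching, multiple heads, and a causal mask for decoder-only models.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one mixed value vector per query position
```

Because each softmax row sums to 1, every output row is a convex combination of the value rows, which is the whole "attention as soft lookup" intuition.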
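The sparsely-gated MoE papers in the list share one core mechanism: a learned router scores the experts per token, only the top-k run, and their outputs are mixed by the renormalized gate weights. A simplified sketch (experts are plain linear maps and the shapes are made up for illustration; this is not any specific paper's exact recipe):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    # x: (d,) one token; gate_w: (n_experts, d); experts: list of (d, d) matrices.
    logits = gate_w @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over all experts
    top = np.argsort(probs)[-k:]                # indices of the k highest-scoring experts
    weights = probs[top] / probs[top].sum()     # renormalize over the selected experts
    # Only the selected experts are evaluated: that is the conditional computation
    # that lets total parameters grow while active parameters stay small.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With k=1 this collapses to Switch-style single-expert routing: the token is handled entirely by its highest-scoring expert.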
If one person with one agent can produce equal or better results than "hundreds of agents for weeks", then the question "Can we scale autonomous coding by throwing more agents at a problem?" probably has a more pessimistic answer than some expected.
When materials become just one atom thick, melting no longer follows the familiar rules. Instead of jumping straight from solid to liquid, an unusual in-between state emerges, where atomic positions loosen like a liquid but still keep some solid-like order. Scientists at the University of Vienna have now captured this elusive “hexatic” phase in real time by filming an ultra-thin silver iodide crystal as it melted inside a protective graphene sandwich.
Mai Trinh is highlighting how difficult it is for Gen Z entrepreneurs to build and scale tech startups in Canada, and why many ultimately move south of the border.
An international student from Vietnam, Trinh explained that she and her co-founder Gabriel Ravacci, from Brazil, would need to work for other employers to collect enough points under Canada’s Comprehensive Ranking System to qualify for permanent residency.
Teens can’t switch off from Instagram even if they want to. Teens talk of Instagram in terms of an ‘addict’s narrative’: spending too much time indulging in a compulsive behaviour that they know is negative but feel powerless to resist.
“negative wellbeing effects can result from user behaviors” and documenting four video-watching behaviors that bring about the majority of negative wellbeing effects: (1) late night use, (2) heavy habitual use, (3) unintentional use, and (4) problematic content.
despite the rules that don’t allow those under the age of 13 to be on Snapchat, our focus group clearly showed that the middle school set was a rabid – almost exclusive – user of Snapchat.
a parent asked ‘how old were you when you started using social media.’ All of them said between ages 8-12 and admitted to lying about their birthdate to get around it
compulsive usage on TikTok is rampant and our users need better tools to understand their usage, manage it effectively, and ensure being on TikTok is time well spent
A hacked trove of emails reveals the revolving door of political leaders, tech billionaires, and intelligence officers.
Kilmeade’s plea to the president was just one part of what appeared to be a multi-pronged effort by Rupert Murdoch to use his right-wing media empire to push the Trump administration to shift its tactics as backlash over the Pretti shooting only intensified. It also saw Fox News and Murdoch’s conservative publications suddenly reverse course and change their own narrative about the killing.
In fact, by the end of the night Monday, it got to the point that even Sean Hannity – Trump’s close confidant who has been a vocal proponent of the administration’s heavy-handed mass deportation operation – took to the air to say that ICE should stop “going into Home Depots and arresting people,” adding that it wasn’t a “good idea.”
Verrucchi now suspects this is key to how time works. The arrow of time, she says, might simply be a record of what has been measured. Like flicking through a cosmic flipbook, we reveal new pages by interacting with the elements of reality – or “making measurements” as a physicist might put it. The act of simply being in the world collapses our quantum reality into a definite state, leaving an irreversible record behind.
And if clocks are physical systems that record measurements – and we are, too – then perhaps we aren’t just observers of time, says Verrucchi, but participants in its making: “You create time when you ask what time it is.”
The Trump administration on Monday bowed to increasing pressure to change up its immigration crackdown in Minneapolis, after a second person was killed by federal agents. The White House replaced Greg Bovino with Tom Homan on the ground and signaled a more cooperative tone with local Democrats.
The abrupt firing and replacement of 12 of President Biden’s appointed council members, which no president has done before, has been perceived by many as a partisan attack on the museum. Especially after White House press secretary, Karoline Leavitt, issued a statement saying, “President Trump looks forward to appointing new individuals who will not only continue to honor the memory of those who perished in the Holocaust, but who are also steadfast supporters of the State of Israel.”
Venezuela’s acting president Delcy Rodríguez said Sunday she has had “enough” of Washington’s orders, as she works to unite the country after the US capture of its former leader Nicolás Maduro.
Days after the US strikes on Caracas in early January, the Trump administration outlined a number of demands that Venezuela must agree to, including cutting ties with China, Iran, Russia and Cuba, and agreeing to partner exclusively with the US on oil production, two senior White House officials told CNN at the time.
I reverse-engineered Claude's hidden subscription usage caps from two unrounded utilization floats, recovered exact denominators via Stern-Brocot, and compared what Pro/Max actually buy you versus API pricing (including caching).
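The Stern-Brocot trick mentioned here is a standard one: given a displayed utilization float, walk the Stern-Brocot tree of mediants to find the fraction with the smallest denominator that lies within the display's precision window. A sketch, where the example value, tolerance, and recovered denominator are all made-up illustrations rather than the post's actual numbers:

```python
from fractions import Fraction

def smallest_fraction_in(lo, hi):
    """Return the fraction with the smallest denominator in [lo, hi],
    found by walking the Stern-Brocot tree of mediants."""
    a, b = 0, 1   # left bound 0/1
    c, d = 1, 0   # right bound 1/0 (infinity)
    while True:
        m, n = a + c, b + d           # mediant of the current bounds
        med = Fraction(m, n)
        if med < lo:
            a, b = m, n               # mediant too small: raise the left bound
        elif med > hi:
            c, d = m, n               # mediant too large: lower the right bound
        else:
            return med                # first mediant inside the interval wins

# Hypothetical example: a readout of 0.37037037 with ~8 digits of precision
x = 0.37037037
eps = 5e-9                            # assumed display tolerance
frac = smallest_fraction_in(Fraction(x - eps), Fraction(x + eps))
print(frac)  # 10/27 -> the hidden denominator would be 27
```

The mediant walk is guaranteed to terminate for any interval of positive width, and the fraction it returns is the unique smallest-denominator rational consistent with the observed value, which is exactly what you want when reconstructing a quota counter from a ratio.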
Former Special Counsel Jack Smith testified publicly for the first time on Capitol Hill about his investigation of President Donald Trump’s efforts to overturn the 2020 election.
He said the case had “proof beyond a reasonable doubt that President Trump engaged in criminal activity,” and that he remained confident he would have won a conviction had it gone to trial.
Smith told the committee that he believed he could have obtained a conviction in what was seen by many as the most serious of the charges: Conspiring to deny Americans a free and fair election by pushing to overturn the 2020 election.
“Our investigation developed proof beyond a reasonable doubt that President Trump engaged in a criminal scheme to overturn the results of the 2020 election and to prevent the lawful transfer of power,” said Smith.
Multiple sources have told Liberation Times that, during the Obama administration, senior intelligence figures James Clapper and Stephanie O’Sullivan oversaw a program relating to Unidentified Anomalous Phenomena (UAP) within the Office of the Director of National Intelligence.
Liberation Times sources allege that Northrop Grumman’s Tejon Ranch Radar Cross Section Facility in southern California is a site where UAPs are routinely retrieved.
Hell froze over. Anthropic fixed Claude Code's signature flicker in their latest update (2.0.72).
“The Guidelines err in promoting meat and dairy products, which are principal drivers of cardiovascular disease, diabetes, and obesity,” read a statement from Neal Barnard, president of the Physicians Committee for Responsible Medicine.
“I’m very disappointed in the new pyramid that features red meat and saturated fat sources at the very top, as if that’s something to prioritize,” Christopher Gardner, a nutrition expert at Stanford University, told NPR. “It does go against decades and decades of evidence and research.”
“Flipping the food pyramid upside down to encourage more meat and dairy consumption is complete ignorance. It’s a giant step back from decades of evidence-based nutrition research and science,” registered dietitian nutritionist Ashley Kitchens, who promotes vegan-based diets, told Truthout.
A culture of corruption is pernicious because it is not just a deviation from government in the public interest; it is also the destruction of the state’s democratic legitimacy. It undermines the necessary faith that the representatives of the people are acting in the interest of the people.
4 percent think it’s a good idea for America to take Greenland by military force. To put that in context: According to a 2022 survey, about 13 percent of Americans believe in Bigfoot.
To watch the push for Greenland is to experience one of the wildest things that any country or head of state has done in the entire history of the modern world, dating back to the very creation of the nation-state era in 1648 with the Treaty of Westphalia.
At one level, Trump’s January rampage highlights the collective failure of every institution, safeguard, check, and balance that the United States thought it had in place to limit executive power gone berserk.
This time, any pretense that the values of Davos and Mr. Trump’s worldview are in opposition has been carefully erased. The official program still includes sessions on the subjects of traditional interest, like one entitled “Can EVs Really Dominate?” But artificial intelligence and crypto have been elevated as the central areas of concern.
the longstanding U.S.-led, rules-based international order is over
“There may be a temptation to duck and hope that all of this passes. But Trump’s fixation on territorial expansion looks real,” he said, suggesting Canada should join European countries to show strong solidarity with Denmark.
Part of what made the liberal world order liberal was the principle of self-determination enshrined in the Atlantic Charter and United Nations Charter. This principle was sometimes violated, including by the United States. But in past multipolar orders, great powers never even had to consider the rights of small nations, and they didn’t. By contrast, the liberalism of the American order pressured powerful countries to cede sovereignty and independence to smaller ones in their orbits.
Moscow’s satellite states in Eastern and Central Europe would not have been so bent on escape had there been nothing to escape to. The American order promised a higher standard of living, national sovereignty, and legal and institutional equality. This gave nations living under the shadow of the Soviet Union an option other than accommodation, and when given the chance to leave Moscow’s control, they took it.
That era is over. Trump has managed in just one year to destroy the American order that was, and he has weakened America’s ability to protect its interests in the world that will be. If Americans thought defending the liberal world order was too expensive, wait until they start paying for what comes next.