Moonshot’s Kimi K2.5 is 'open,' 595GB, and built for agent swarms — Reddit wants a smaller one

Two days after releasing what analysts call the most powerful open-source AI model ever created, researchers from China's Moonshot AI logged onto Reddit to face a restless audience. The Beijing-based startup had reason to show up. Kimi K2.5 had just landed headlines about closing the gap with American AI giants and testing the limits of U.S. chip export controls. But the developers waiting on r/LocalLLaMA, a forum where engineers trade advice on running powerful language models on everything from a single consumer GPU to a small rack of prosumer hardware, had a different concern. They wanted to know when they could actually use it.

The three-hour Ask Me Anything session became an unexpectedly candid window into frontier AI development in 2026 — not the polished version that appears in corporate blogs, but the messy reality of debugging failures, managing personality drift, and confronting a fundamental tension that defines open-source AI today. Moonshot had published the model's weights for anyone to download and customize. The file runs roughly 595 gigabytes. For most of the developers in the thread, that openness remained theoretical.

Three Moonshot team members participated under the usernames ComfortableAsk4494, zxytim, and ppwwyyxx. Over approximately 187 comments, they fielded questions about architecture, training methodology, and the philosophical puzzle of what gives an AI model its "soul." They also offered a picture of where the next round of progress will come from — and it wasn't simply "more parameters."

Developers asked for smaller models they can actually run, and Moonshot acknowledged it has a problem

The very first wave of questions treated Kimi K2.5 less like a breakthrough and more like a logistics headache. One user asked bluntly why Moonshot wasn't creating smaller models alongside the flagship. "Small sizes like 8B, 32B, 70B are great spots for the intelligence density," they wrote. Another said huge models had become difficult to celebrate because many developers simply couldn't run them. A third pointed to American competitors as size targets, requesting coder-focused variants that could fit on modest GPUs.

Moonshot's team didn't announce a smaller model on the spot. But it acknowledged the demand in terms that suggested the complaint was familiar. "Requests well received!" one co-host wrote. Another noted that Moonshot's model collection already includes some smaller mixture-of-experts models on Hugging Face, while cautioning that small and large models often require different engineering investments.

The most revealing answer came when a user asked whether Moonshot might build something around 100 billion parameters optimized for local use. The Kimi team responded by floating a different compromise: a 200 billion or 300 billion parameter model that could stay above what it called a "usability threshold" across many tasks.

That reply captured the bind open-weight labs face. A 200-to-300 billion parameter model would broaden access compared to a trillion-parameter system, but it still assumes multi-GPU setups or aggressive quantization. The developers in the thread weren't asking for "somewhat smaller." They were asking for models sized for the hardware they actually own — and for a roadmap that treats local deployment as a first-class constraint rather than a hobbyist afterthought.
The team said scaling laws are hitting diminishing returns, and pointed to a different kind of progress

As the thread moved past hardware complaints, it turned to what many researchers now consider the central question in large language models: have scaling laws begun to plateau? One participant asked directly whether scaling had "hit a wall." A Kimi representative replied with a diagnosis that has become increasingly common across the industry. "The amount of high-quality data does not grow as fast as the available compute," they wrote, "so scaling under the conventional 'next token prediction with Internet data' will bring less improvement."

Then the team offered its preferred escape route. It pointed to Agent Swarm, Kimi K2.5's ability to coordinate up to 100 sub-agents working in parallel, as a form of "test-time scaling" that could open a new path to capability gains. In the team's framing, scaling doesn't have to mean only larger pretraining runs. It can also mean increasing the amount of structured work done at inference time, then folding those insights back into training through reinforcement learning.

"There might be new paradigms of scaling that can possibly happen," one co-host wrote. "Looking forward, it's likely to have a model that learns with less or even zero human priors." The claim implies that the unit of progress may be shifting from parameter count and pretraining loss curves toward systems that can plan, delegate, and verify — using tools and sub-agents as building blocks rather than relying on a single massive forward pass.

Agent Swarm works by keeping each sub-agent's memory separate from the coordinator

On paper, Agent Swarm sounds like a familiar idea in a new wrapper: many AI agents collaborating on a task. The AMA surfaced the more important details — where the memory goes, how coordination happens, and why orchestration doesn't collapse into noise.

A developer raised a classic multi-agent concern. At a scale of 100 sub-agents, an orchestrator agent often becomes a bottleneck, both in latency and in what the community calls "context rot" — the degradation in performance that occurs as a conversation history fills with internal chatter and tool traces until the model loses the thread. A Kimi co-host answered with a design choice that matters for anyone building agent systems in enterprise settings. The sub-agents run with their own working memory and send back results to the orchestrator, rather than streaming everything into a shared context. "This allows us to scale the total context length in a new dimension!" they wrote.

Another developer pressed on performance claims. Moonshot has publicly described Agent Swarm as capable of achieving about a 4.5-times speedup on suitable workflows, but skeptics asked whether that figure simply reflects how parallelizable a given task is. The team agreed: it depends. In some cases, the system decides that a task doesn't require parallel agents and avoids spending the extra compute. It also described sub-agent token budgets as something the orchestrator must manage, assigning each sub-agent a task of appropriate size.

Read as engineering rather than marketing, Moonshot was describing a familiar enterprise pattern: keep the control plane clean, bound the outputs from worker processes, and avoid flooding a coordinator with logs it can't digest.
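That pattern is easy to sketch. The toy Python below is a minimal illustration of the design as described in the AMA, not Moonshot's actual implementation, and `call_llm` is a hypothetical stub standing in for any chat-completion call. The key property: each sub-agent starts from a fresh message list and returns only a bounded final result, so the coordinator's context grows with the number of tasks rather than with every intermediate tool call.

```python
# Illustrative sketch only: context-isolated sub-agents with bounded outputs.
# call_llm() is a hypothetical stub, not a real library function.
from concurrent.futures import ThreadPoolExecutor

def call_llm(messages: list[dict], max_tokens: int) -> str:
    """Stand-in for a chat-completion API call; wire up a real endpoint here."""
    raise NotImplementedError

def run_subagent(task: str, token_budget: int) -> str:
    # Each worker gets a fresh context: its tool traces and internal chatter
    # never leak into the orchestrator's conversation history.
    messages = [
        {"role": "system", "content": "Complete the task. Reply with a short result."},
        {"role": "user", "content": task},
    ]
    # The token budget bounds what this worker can send back upstream.
    return call_llm(messages, max_tokens=token_budget)

def orchestrate(tasks: list[str], token_budget: int = 512) -> list[str]:
    # The coordinator only ever sees each worker's bounded final answer,
    # so its context scales with len(tasks), not with every tool call.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda t: run_subagent(t, token_budget), tasks))
```

A production system would add task sizing, retries, and result verification on top, but the context isolation is what keeps the coordinator from drowning in worker logs.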
Reinforcement learning compute will keep increasing, especially for training agents

The most consequential shift hinted at in the AMA wasn't a new benchmark score. It was a statement about priorities. One question asked whether Moonshot was moving compute from "System 1" pretraining to "System 2" reinforcement learning — shorthand for shifting from broad pattern learning toward training that explicitly rewards reasoning and correct behavior over multi-step tasks. A Kimi representative replied that RL compute will keep increasing, and suggested that new RL objective functions are likely, "especially in the agent space."

That line reads like a roadmap. As models become more tool-using and task-decomposing, labs will spend more of their budget training models to behave well as agents — not merely to predict tokens. For enterprises, this matters because RL-driven improvements often arrive with tradeoffs. A model can become more decisive, more tool-happy, or more aligned to reward signals that don't map neatly onto a company's expectations. The AMA didn't claim Moonshot had solved those tensions. It did suggest the team sees reinforcement learning as the lever that will matter more in the next cycle than simply buying more GPUs.

When asked about the compute gap between Moonshot and American labs with vastly larger GPU fleets, the team was candid. "The gap is not closing I would say," one co-host wrote. "But how much compute does one need to achieve AGI? We will see." Another offered a more philosophical framing: "There are too many factors affecting available compute. But no matter what, innovation loves constraints."

The model sometimes calls itself Claude, and Moonshot explained why that happens

Open-weight releases now come with a standing suspicion: did the model learn too much from competitors? That suspicion can harden quickly into accusations of distillation, where one AI learns by training on another AI's outputs. A user raised one of the most uncomfortable claims circulating in open-model circles — that K2.5 sometimes identifies itself as "Claude," Anthropic's flagship model. The implication was heavy borrowing.

Moonshot didn't dismiss the behavior. Instead it described the conditions under which it happens. With the right system prompt, the team said, the model has a high probability of answering "Kimi," particularly in thinking mode. But with an empty system prompt, the model drifts into what the team called an "undefined area," which reflects pretraining data distributions rather than deliberate training choices.

Then it offered a specific explanation tied to a training decision. Moonshot said it had upsampled newer internet coding data during pretraining, and that this data appears more associated with the token "Claude" — likely because developers discussing AI coding assistants frequently reference Anthropic's model.

The team pushed back on the distillation accusation with benchmark results. "In fact, K2.5 seems to outperform Claude on many benchmarks," one co-host wrote. "HLE, BrowseComp, MMMU Pro, MathVision, just to name a few."

For enterprise adopters, the important point isn't the internet drama. It's that identity drift is a real failure mode — and one that organizations can often mitigate by controlling system prompts rather than leaving the model's self-description to chance. The AMA treated prompt governance not as a user-experience flourish, but as operational hygiene.
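As a concrete illustration of that hygiene, the sketch below pins the model's identity in the system prompt through an OpenAI-compatible chat API. The endpoint URL and model identifier are placeholders rather than confirmed values; the point is simply that the self-description comes from the deployer's prompt instead of pretraining priors.

```python
# Minimal prompt-governance sketch: pin the model's self-identity up front.
# The base_url and model name are placeholders, not confirmed values.
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

SYSTEM_PROMPT = (
    "You are Kimi, an AI assistant built by Moonshot AI. "
    "When asked who or what you are, always answer as Kimi."
)

response = client.chat.completions.create(
    model="kimi-k2.5",  # placeholder identifier
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        # Per the AMA, leaving the system prompt empty lets this question
        # drift into an "undefined area" shaped by pretraining data.
        {"role": "user", "content": "Who are you?"},
    ],
)
print(response.choices[0].message.content)
```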
Users said the model lost its personality, and Moonshot admitted that "soul" is hard to measure

A recurring theme in the thread was that K2.5's writing style feels more generic than earlier Kimi models. Users described it as more like a standard "helpful assistant" — a tone many developers now see as the default personality of heavily post-trained models. One user said they loved the personality of Kimi K2 and asked what happened. A Kimi co-host acknowledged that each new release brings some personality change and described personality as subjective and hard to evaluate. "This is a quite difficult problem," they wrote. The team said it wants to improve the issue and make personality more customizable per user.

In a separate exchange about whether strengthening coding capability compromises creative writing and emotional intelligence, a Kimi representative argued there's no inherent conflict if the model is large enough. But maintaining "writing taste" across versions is difficult, they said, because the reward model is constantly evolving. The team relies on internal benchmarks — a kind of meta-evaluation — to track creative writing progress and adjust reward models accordingly.

Another response went further, using language that would sound unusual in a corporate AI specification but familiar to people who use these tools daily. The team talked about the "soul" of a reward model and suggested the possibility of storing a user "state" reflecting taste and using it to condition the model's outputs.

That exchange points to a product frontier that enterprises often underestimate. Style drift isn't just aesthetics. It can change how a model explains decisions, how it hedges, how it handles ambiguity, and how it interacts with customers and employees. The AMA made clear that labs increasingly treat "taste" as both an alignment variable and a differentiator — but it remains hard to measure and even harder to hold constant across training runs.

Debugging emerged as the unglamorous truth behind frontier AI research

The most revealing cultural insight came in response to a question about surprises during training and reinforcement learning. A co-host answered with a single word, bolded for emphasis: debugging. "Whether it's pre-training or post-training, one thing constantly manifests itself as the utmost priority: debugging," they wrote. The comment illuminated a theme running through the entire session.

When asked about their "scaling ladder" methodology for evaluating new ideas at different model sizes, zxytim offered an anecdote about failure. The team had once hurried to incorporate Kimi Linear, an experimental linear-attention architecture, into the previous model generation. It failed the scaling ladder at a certain scale. They stepped back and went through what the co-host called "a tough debugging process," and after months finally made it work.

"Statistically, most ideas that work at small scale won't pass the scaling ladder," they continued. "Those that do are usually simple, effective, and mathematically grounded. Research is mostly about managing failure, not celebrating success."

For technical leaders evaluating AI vendors, the admission is instructive. Frontier capability doesn't emerge from elegant breakthroughs alone. It emerges from relentless fault isolation — and from organizational cultures willing to spend months on problems that might not work.

Moonshot hinted at what comes next, including linear attention and continual learning

The AMA also acted as a subtle teaser for Kimi's next generation. Developers asked whether Kimi K3 would adopt Moonshot's linear attention research, which aims to handle long context more efficiently than traditional attention mechanisms.
Team members suggested that linear approaches are a serious option. "It's likely that Kimi Linear will be part of K3," one wrote. "We will also include other optimizations." In another exchange, a co-host predicted K3 "will be much, if not 10x, better than K2.5."

The team also highlighted continual learning as a direction it is actively exploring, suggesting a future where agents can work effectively over longer time horizons — a critical enterprise need if agents are to handle ongoing projects rather than single-turn tasks. "We believe that continual learning will improve agency and allow the agents to work effectively for much longer durations," one co-host wrote.

On Agent Swarm specifically, the team said it plans to make the orchestration scaffold available to developers once the system becomes more stable. "Hopefully very soon," they added.

What the AMA revealed about the state of open AI in 2026

The session didn't resolve every question. Some of the most technical prompts — about multimodal training recipes, defenses against reward hacking, and data governance — were deferred to a forthcoming technical report. That's not unusual. Many labs now treat the most operationally decisive details as sensitive.

But the thread still revealed where the real contests in AI have moved. The gap that matters most isn't between China and the United States, or between open and closed. It's the gap between what models promise and what systems can actually deliver. Orchestration is becoming the product. Moonshot isn't only shipping a model. It's shipping a worldview that says the next gains come from agents that can split work, use tools, and return structured results fast. Open weights are colliding with hardware reality, as developers demand openness that runs locally rather than openness that requires a data center. And the battleground is shifting from raw intelligence to reliability — from beating a benchmark by two points to debugging tool-calling discipline, managing memory in multi-agent workflows, and preserving the hard-to-quantify "taste" that determines whether users trust the output.

Moonshot showed up on Reddit in the wake of a high-profile release and a growing geopolitical narrative. The developers waiting there cared about a more practical question: When does "open" actually mean "usable"? In that sense, the AMA didn't just market Kimi K2.5. It offered a snapshot of an industry in transition — from larger models to more structured computation, from closed APIs to open weights that still demand serious engineering to deploy, and from celebrating success to managing failure.

"Research is mostly about managing failure," one of the Moonshot engineers had written. By the end of the thread, it was clear that deployment is, too.

SolarWinds Fixes Four Critical Web Help Desk Flaws With Unauthenticated RCE and Auth Bypass

SolarWinds has released security updates to address multiple vulnerabilities impacting SolarWinds Web Help Desk, including four critical flaws that could result in authentication bypass and remote code execution (RCE). The list of vulnerabilities is as follows - CVE-2025-40536 (CVSS score: 8.1) - A security control bypass vulnerability that could allow an unauthenticated

Tomodachi Life: Living The Dream Teases Weddings And Babies, Releases In April

Nintendo dove deep this morning on its upcoming simulation game, Tomodachi Life: Living The Dream. The Switch game (notably not Switch 2, though Nintendo reminds that you can play it on Switch 2) is a sequel to Tomodachi Life, the 3DS game that released in North America in 2014. You can read our review of that game here. Tomodachi Life: Living The Dream releases April 16.

In the game you create Miis and watch them live out their lives on Yoomian Island as you interact with them and create scenarios for the characters to meet and interact. The original game was renowned for being bizarre, and though Living the Dream's reveal teased some of that strangeness, today's presentation focused on the more typical parts of the game.

You create the characters using familiar Mii creation tools and can dictate their gender (male, female, or nonbinary) as well as their dating preferences. Unlike previous Mii creation tools, you can now create elements like face paint, and that customization seems to extend to nearly every facet of the game. You can design the exteriors of buildings, the ground, clothing, and even draw pets for the Miis to have. The presentation showed one Mii walking with a hand-drawn dog.

Once you have Miis, you can select elements of their personalities, gift them quirks like the animations they use to eat, and even dictate familiar phrases they will use. You can also play god and place your Miis next to each other to force them to interact and hopefully become friends. You can even choose what they talk about by typing in topics, which an automated, robotic voice will dictate. You can also push characters to become roommates, with up to eight characters able to live together in one large house.

You can also witness elaborate romantic scenarios. Miis can develop crushes on each other, profess their love, and even be turned down. A scenario in the presentation showed one character being confronted by multiple suitors and having to choose. The Miis are meant to be characters with their own agency, so even though you choose elements like where they go and how they act, such as professing their love to their crush, it is up to the characters to make the final decisions.

At the end of the presentation, Nintendo showcased a wedding scene and even showed a baby crawling on the ground. It's unclear if those elements will just be performative visual moments, or if there will be opportunities to create families that birth children who grow up to become their own Miis. The game happens in real time (meaning your Miis will be living their lives even when you're not playing), so it remains to be seen how that part of the game will work.

Alongside all the character interactions, Nintendo also showed off how much control you have over the design of the island. It feels very Animal Crossing in this way. Different stores sell food and clothing and offer photo gallery opportunities, there is a news station that gives frequent updates, and you can build and relocate everything on the island and even change the land mass.

In a complementary way, it looks like Nintendo is trying to merge elements of The Sims (creating and watching characters live out their lives) and Animal Crossing (living on and designing an island while witnessing interpersonal character interactions). We will see if the game can live up to the reputation of either of those franchises when it releases on April 16.

AirTag 1 vs. AirTag 2 Buyer's Guide: All 15+ Differences Compared

Apple's new AirTag introduces a series of small improvements, so how does it compare to the original model from 2021?

The second-generation AirTag arrives five years after the original, bringing improvements to tracking range, speaker output, and internal design while retaining the same outward design and accessory compatibility. At the same time, first-generation AirTags remain available from some retailers at reduced prices, raising the question of whether the newer model is worth choosing over the original, or whether the earlier AirTag still makes sense as a lower-cost option.

The comparison below outlines every difference between the two generations, including Apple-announced feature upgrades and hardware changes identified through teardowns. While both models perform the same core function of tracking items through the Find My network, there are some small differences worth noting:

| AirTag (first-generation, 2021) | AirTag (second-generation, 2026) |
| --- | --- |
| First-generation Ultra Wideband chip | Second-generation Ultra Wideband chip |
| Shorter Precision Finding range | Up to 50% farther Precision Finding range |
| Precision Finding on iPhone only | Precision Finding on iPhone and Apple Watch (Series 9 and later and Ultra 2) |
| Earlier Bluetooth implementation | Upgraded Bluetooth with increased range |
| Bluetooth identifiers rotate at standard intervals | Bluetooth identifiers rotate more frequently |
| Standard speaker volume | Up to 50% louder speaker |
| Chime note in F | Chime note in G |
| Works on earlier supported iOS versions | Requires iOS 26.2.1 or later |
| Reset without a required wait between battery removals | Reset requires battery out for at least five seconds each cycle |
| 11g weight | 11.8g weight (around 7% heavier) |
| Back text listing "Assembled in China" and "Designed by Apple" | Back text in all-caps listing IP67, NFC, and Find My |
| Thicker main PCB | Thinner main PCB with revised battery connectors, plus additional test pads and markings |
| Smaller speaker coil | Slightly larger speaker coil |
| Speaker magnet more easily removable | Speaker magnet more firmly secured and harder to remove |
| Wider box with flat printed text and plastic pull tabs | Redesigned narrower box with updated artwork, raised UV-printed text, and paper pull tabs |
| Folio-style inner tray holding up to two rows of two AirTags | Redesigned inner tray with simpler design holding up to four AirTags |

For buyers choosing between the two AirTag models, the decision depends less on basic tracking and more on how and where an AirTag is typically used. Both generations rely on the same Find My network for long-distance location updates, offer similar battery life, and work with the same accessories, so neither model is considerably better for general item tracking.

The second-generation AirTag is likely to benefit users who frequently rely on Precision Finding rather than approximate location. The extended Ultra Wideband range makes it easier to determine the specific location of items, while the louder speaker improves audibility in noisy spaces or when an AirTag is buried inside a bag or suitcase. Support for Precision Finding on compatible Apple Watch models also makes the newer AirTag more convenient for users who often leave their iPhone behind.

The first-generation AirTag remains a practical option for the overwhelming majority of use cases, such as tracking keys, backpacks, or household items that are usually misplaced within short distances.
If available at a meaningful discount, it may offer better value for users who do not need Precision Finding at extended range, do not use an Apple Watch for item location, or simply want basic Find My functionality at the lowest cost.

For existing AirTag owners, there is no pressing need to upgrade. For new buyers, the second-generation AirTag offers the most complete feature set and greater flexibility going forward, acting as a moderate specification bump over the previous model.

Here's Everything Apple Released This Week

Following a quiet start to 2026, the final week of January has been a busy one for Apple so far. There are new versions of the AirTag and the Black Unity band for the Apple Watch, and the Apple Creator Studio bundle is now available. Apple also released iOS 26.2.1 and watchOS 26.2.1 updates, and iOS 26.3 beta testing continues.

While the launch of Apple Creator Studio would have been a fitting opportunity for Apple to unveil new MacBook Pro models with M5 Pro and M5 Max chips, unfortunately it looks like that proved to be nothing more than wishful thinking. We have recapped our coverage of these topics below.

New AirTag

- Apple Unveils New AirTag With Longer Range, Louder Speaker, and More
- 10+ Things to Know About the New AirTag 2
- New AirTag's Improved Precision Finding Requires These iPhone Models
- AirTag 2: These Airlines Offer Feature That Helps Find Your Lost Bags
- Precision Finding on Apple Watch Doesn't Work With the Original AirTag
- Teardown Reveals AirTag 2 is Full of Hidden Changes

New Black Unity Band

- Apple Introduces New Black Unity Apple Watch Band

Apple Creator Studio

- Apple's 'Creator Studio' App Bundle Now Available for $12.99 Per Month
- Apple Creator Studio Hands-On: What You Get for $12.99 Per Month
- Apple Updates Final Cut Pro and Logic Pro With These New Features
- Apple Updates Keynote, Numbers, and Pages Apps With New Free and Paid Features
- Pixelmator Pro Launches on iPad With Apple Pencil Support and More
- Apple Stops Selling $200 'Pro Apps' Bundle With Final Cut Pro and More

Software Updates

- Apple Releases iOS 26.2.1 With AirTag 2 Support
- Apple Releases watchOS 26.2.1, Adding Precision Finding Support for AirTag 2
- Apple Seeds Third Betas of iOS 26.3 and iPadOS 26.3 to Developers
- Apple Seeds Third Betas of iOS 26.3 and iPadOS 26.3 to Public Beta Testers
- iOS 26.3 Adds Privacy Setting to Limit Carrier Location Tracking
- Warning: These Continuity Features Are Broken on Latest iOS 26.3 and iPadOS 26.3 Betas
- iPhone 5s Gets New Software Update 13 Years After Launch