Caio Theodoro
Mar 2026 · 19 min read

Why I Stopped Trusting My ARIMA Models and Started Simulating Entire Economies Instead

How MiroFish's Swarm Intelligence Engine Is Rewriting the Rules of Demand Forecasting — And What Happens When You Feed It FRED, Macro Signals, and 40 Years of History

ML Engineering

By a senior AI engineer who has spent too many nights debugging time-series pipelines and is finally excited about something again.


The Problem Nobody Wants to Admit

Here's the dirty secret of demand forecasting: most of our models are embarrassingly bad at the moments when accuracy matters most.

I've spent the last four years building forecasting systems at scale, everything from retail inventory prediction to SaaS revenue projections. I've tuned LightGBM models, stacked LSTMs on top of transformers, and thrown every FRED macro series I could find at gradient-boosted trees. And 90% of the time, these models do fine. They catch the seasonal patterns. They track the trend. They impress stakeholders in quarterly reviews.

But then something happens. A tariff announcement. A Fed rate decision that nobody expected. A viral social media moment that shifts consumer sentiment overnight. And suddenly, the model that was nailing MAPE at 4.2% last month is off by 35%, and you're staring at a warehouse full of inventory nobody wants or an empty shelf that should have been full three weeks ago.

The fundamental issue is that traditional demand forecasting treats the world like a physics problem — deterministic, reducible to equations, solvable with enough features and enough data. But demand is driven by people. People who react to each other, who form opinions based on what their neighbors are buying, who panic-buy when the news cycle gets scary, who suddenly decide oat milk is over and cashew milk is in because a TikTok went viral.

That's why MiroFish caught my attention, and why I think it represents a genuinely new paradigm for how we think about demand forecasting.


What MiroFish Actually Is (No Buzzwords, I Promise)

MiroFish is an open-source swarm intelligence engine that predicts outcomes by simulating emergent social behavior, rather than by fitting statistical models to historical data. It was built by Guo Hangjiang, an undergraduate at Beijing University of Posts and Telecommunications, in roughly ten days. Within 24 hours of a rough demo, Chen Tianqiao — the billionaire founder of Shanda Group — committed around $4.1 million to incubate the project. It hit #1 on GitHub's global trending list above OpenAI and Google repositories and has accumulated over 45,000 stars as of this writing.

The core idea is beguilingly simple: instead of asking "what does the historical data predict?", MiroFish asks "what would thousands of simulated people do when faced with this situation?"

You provide "seed material": news articles, financial reports, policy documents, datasets, market research, whatever describes the current state of the world relevant to your question. You describe your prediction need in natural language. The workflow then runs in five stages:

First, it uses GraphRAG (Graph-based Retrieval Augmented Generation) to parse your input and extract entities and relationships, building a structured knowledge graph. It's not treating your data as a flat bag of text or a column of numbers. It's mapping who the key players are, how they're connected, what forces and pressures exist.

Second, it generates agent personas from that graph — each with distinct backgrounds, stances, personalities, and behavioral logic. An "Environment Configuration Agent" sets up the simulation parameters and rules of the world.

Third, the simulation runs on two parallel social platforms simultaneously (one resembling Twitter's short-form dynamics, the other resembling Reddit's threaded discussion format). Agents post, debate, argue, form coalitions, shift opinions, and influence each other. Their memories update continuously as simulated time advances, powered by Zep Cloud for persistent long-term storage.

Fourth, a dedicated ReportAgent analyzes the emergent outcomes and compiles a human-readable forecast that synthesizes what the agent population collectively arrived at.

Fifth, you can enter the simulation yourself. You can interview any individual agent. You can ask the ReportAgent follow-up questions. You can inject new variables and watch what happens. This isn't a black box that spits out a number — it's a living system you can interrogate.

The simulation engine under the hood is OASIS (Open Agent Social Interaction Simulations) from CAMEL-AI, which supports up to one million agents with 23 different social actions: following, commenting, reposting, liking, muting, searching, and more.
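
To make "emergent" concrete, here is a toy sketch in plain Python. It is not MiroFish or OASIS code and it uses no LLM; `simulate_opinions` and its parameters are names I made up for illustration. Each agent holds a purchase propensity between 0 and 1 and, every round, nudges it toward the opinion of one randomly chosen peer.

```python
import random

def simulate_opinions(n_agents=200, n_rounds=30, influence=0.3, seed=0):
    """Toy opinion dynamics: each round, every agent moves toward one random peer."""
    rng = random.Random(seed)
    # Opinion = propensity to buy, in [0, 1]; starting values are random.
    opinions = [rng.random() for _ in range(n_agents)]
    for _ in range(n_rounds):
        updated = []
        for own in opinions:
            peer = rng.choice(opinions)  # "read a post" from a random agent
            updated.append(own + influence * (peer - own))
        opinions = updated
    return opinions

def spread(values):
    """Distance between the most and least enthusiastic agent."""
    return max(values) - min(values)

start = simulate_opinions(n_rounds=0)   # same seed, so same starting population
final = simulate_opinions(n_rounds=30)
print(f"spread before: {spread(start):.2f}, after: {spread(final):.2f}")
```

No agent is told what the consensus should be, yet the spread of opinions narrows as the rounds go by. MiroFish swaps the one-line update rule for LLM-driven personas, memory, and platform mechanics, but the forecast is still read off the population rather than computed by any single agent.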


Why This Matters for Demand Forecasting Specifically

Traditional demand forecasting fails at precisely the inflection points where accuracy has the highest dollar value, and the reason is architectural: statistical models are trained on historical correlations, and correlations break down when the underlying causal dynamics shift.

Consider a practical example. You're forecasting demand for consumer electronics ahead of the holiday season. Your model has CPI, consumer sentiment index, unemployment rate, and three years of historical sales data. Then, a week before Black Friday, the Fed announces an unexpected rate hold, a major competitor launches a surprise price war, and a prominent tech reviewer publishes a scathing video that goes viral. Your ARIMA model doesn't know what to do with any of this. Your gradient-boosted tree might catch the macro shift a quarter later, when the CPI data finally reflects it.

MiroFish handles this differently because it's not fitting curves to past data. It's simulating how different market participants react to each other in real time. You seed it with the current macro environment, the competitor landscape, the social media sentiment, and you watch thousands of simulated consumers, retailers, analysts, and media outlets interact. What emerges isn't a point estimate — it's a scenario landscape, a distribution of plausible futures shaped by the same social dynamics that drive real demand.

This is the architectural insight that makes swarm intelligence relevant to demand forecasting in a way that traditional ML approaches simply aren't: demand is an emergent property of collective human behavior, and the only honest way to forecast an emergent property is to simulate the system that generates it.


Feeding the Beast: How Macro Data, FRED, and Historical Series Become Seed Material

Here's where things get practical. MiroFish isn't a plug-and-play forecasting API. It's an engine that requires thoughtful input design — and this is actually where the demand forecasting application becomes powerful, because the quality of your seed material determines the quality of your simulation.

FRED and Macroeconomic Indicators as World-Building Data

The Federal Reserve Economic Data (FRED) database is, in my opinion, the single most underutilized resource in applied demand forecasting. It contains over 800,000 time series covering everything from the federal funds rate to county-level unemployment to the University of Michigan Consumer Sentiment Index. Most ML practitioners dump a handful of FRED series into their feature matrix and call it a day.

MiroFish inverts this relationship. Instead of treating FRED data as numerical features in a regression, you use it as context for world-building. You're not feeding the Consumer Price Index into a model as a floating-point number. You're writing the seed material that says: "CPI has risen 3.8% year-over-year, the highest in six months. Real wages have stagnated. Grocery prices are up 5.2% while energy costs remain elevated. The Fed has held rates steady at 4.25-4.50% through the first quarter of 2026, and forward guidance suggests two cuts are likely before year-end."

This matters because the agents in MiroFish respond to this information the way real people do — not as coefficients in a regression equation, but as contextual knowledge that shapes their behavior, risk tolerance, and purchasing decisions. A simulated retail consumer who "knows" that inflation is sticky and wages are flat behaves very differently from one who "knows" that prices are falling and employment is strong. And when thousands of these agents interact, the collective demand signal that emerges is grounded in realistic behavioral responses to actual economic conditions.
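
As a minimal sketch of that translation step, the snippet below turns two index levels into the kind of sentence an agent can read. The function names and the input numbers are illustrative assumptions; they are not real FRED observations and not part of MiroFish.

```python
def year_over_year(current, year_ago):
    """Percent change versus the same period a year earlier."""
    return (current / year_ago - 1.0) * 100.0

def describe_indicator(name, current, year_ago):
    """Render one indicator as a plain-language sentence for the seed document."""
    change = year_over_year(current, year_ago)
    if change == 0:
        return f"{name} has held steady year-over-year."
    direction = "risen" if change > 0 else "fallen"
    return f"{name} has {direction} {abs(change):.1f}% year-over-year."

# Illustrative index levels only.
sentence = describe_indicator("CPI", 103.8, 100.0)
print(sentence)  # CPI has risen 3.8% year-over-year.
```

In practice you would add the qualitative framing by hand ("the highest in six months", "real wages have stagnated"), since that context is what the agents actually react to.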

The key FRED series I've found most useful for demand forecasting seed material include:

Consumer-side indicators like the Consumer Price Index (CPIAUCSL), Personal Consumption Expenditures (PCE), Real Disposable Personal Income (DSPIC96), the Consumer Sentiment Index (UMCSENT), and Retail Sales data (RSXFS). These tell your agents what it "feels like" to be a consumer right now.

Labor market indicators like the Unemployment Rate (UNRATE), Nonfarm Payrolls (PAYEMS), Job Openings (JTSJOL), and the Quits Rate (JTSQUR). The labor market drives consumer confidence, and consumer confidence drives spending behavior. Research has shown that including macroeconomic variables like CPI and Consumer Sentiment in forecasting models can provide significantly greater explanatory power, and some studies have achieved forecasting error reductions of over 50% by incorporating leading macroeconomic indicators.

Financial conditions indicators like the Federal Funds Rate (FEDFUNDS), the 10-Year Treasury Yield (DGS10), the S&P 500 (SP500), and the Chicago Fed National Financial Conditions Index (NFCI). These shape business investment, credit availability, and the mood of the institutional players in your simulation.

Supply-side indicators like the Producer Price Index (PPIACO), Industrial Production (INDPRO), Inventory-to-Sales Ratios (ISRATIO), and Import Price Indexes. These set the constraints on what's available, at what cost, for your simulated economy.

The trick is not to dump all of these into a single seed document. It's to synthesize them into a coherent narrative of the current economic environment. MiroFish's GraphRAG will do some of this extraction for you, but the more structured and narrative your seed material is, the richer the knowledge graph becomes, and the more realistic the agent behavior.
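
One way to keep that synthesis organized is to group series by theme and write one paragraph per theme. The sketch below assumes you already have a one-sentence summary per series, written by hand or by an upstream step. `SEED_SERIES` and `build_seed_document` are hypothetical names, and the catalog only contains series IDs named above.

```python
# Series IDs as listed in the text, grouped by the theme they inform.
SEED_SERIES = {
    "Consumer": ["CPIAUCSL", "PCE", "DSPIC96", "UMCSENT", "RSXFS"],
    "Labor market": ["UNRATE", "PAYEMS", "JTSJOL", "JTSQUR"],
    "Financial": ["FEDFUNDS", "DGS10", "SP500"],
    "Supply-side": ["PPIACO", "INDPRO", "ISRATIO"],
}

def build_seed_document(summaries):
    """summaries maps a series ID to one narrative sentence about it."""
    sections = []
    for theme, series_ids in SEED_SERIES.items():
        sentences = [summaries[s] for s in series_ids if s in summaries]
        if sentences:  # skip themes with nothing to say
            sections.append(f"{theme} conditions: " + " ".join(sentences))
    return "\n\n".join(sections)

# Placeholder sentences, not real data.
doc = build_seed_document({
    "UNRATE": "Unemployment has edged up over the past quarter.",
    "CPIAUCSL": "Consumer prices are rising faster than wages.",
})
print(doc)
```

The output is a short document with one paragraph per theme, which is easier for a graph-extraction step to work with than a table of raw values.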

Historical Demand Data as Agent Memory

This is where things get really interesting. MiroFish agents have persistent memory via Zep Cloud, and you can seed that memory with historical patterns.

Rather than feeding a time series into a statistical model, you can embed historical demand patterns into the seed material as context that agents internalize. For instance: "This product category has historically seen a 22% demand spike in Q4, followed by a 15% pullback in Q1. During the 2023 holiday season, demand exceeded forecasts by 31% due to a viral social media campaign. The 2024 season was flat as post-pandemic normalization continued."

The agents don't "run" a statistical model on this history. They carry it as contextual knowledge — the same way a real retail buyer or supply chain manager carries years of pattern recognition in their head. When the simulation runs, their decisions are informed by this history but not mechanically determined by it. They can deviate when the simulated conditions warrant it, the same way an experienced human forecaster overrides the model when they smell something changing.
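
A small sketch of how a sales history can be boiled down to statements like these. The quarterly figures are invented for illustration, and `q4_uplift` and `memory_statement` are hypothetical helpers, not MiroFish or Zep functions.

```python
def q4_uplift(quarterly_units):
    """quarterly_units is [Q1, Q2, Q3, Q4]; returns Q4 vs. the Q1-Q3 average, in percent."""
    baseline = sum(quarterly_units[:3]) / 3
    return (quarterly_units[3] / baseline - 1.0) * 100.0

def memory_statement(year, quarterly_units):
    """One sentence of 'institutional memory' for the seed material."""
    uplift = q4_uplift(quarterly_units)
    return (f"In {year}, Q4 demand ran {uplift:+.0f}% versus the average "
            f"of the first three quarters.")

# Invented unit sales per quarter.
history = {2023: [1000, 1000, 1000, 1220], 2024: [1100, 1100, 1100, 1100]}
statements = [memory_statement(year, units) for year, units in sorted(history.items())]
for line in statements:
    print(line)
```

The numbers only get you the skeleton. The causes ("a viral social media campaign", "post-pandemic normalization") still have to be written in by someone who knows the category.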

Real-Time Signals as Dynamic Variable Injection

One of MiroFish's most powerful features for demand forecasting is what the project describes as the "God's-eye view" — the ability to dynamically inject variables into a running simulation. This maps directly to the demand forecasting use case of scenario planning and stress testing.

Say your simulation is running a baseline demand forecast for Q3. Midway through, you inject: "Breaking: Major port strike shuts down West Coast shipping for two weeks." The agents — importers, retailers, consumers, media — react in real time. You watch supply chain anxiety cascade through the system. Some consumers panic-buy. Some retailers switch suppliers. Some delay purchases entirely. The emergent demand pattern shifts in ways that no static model could anticipate, because the cascade effects are driven by agent-to-agent interaction.

You can pull these injection scenarios from anywhere: breaking news feeds, social media sentiment shifts, commodity price spikes, geopolitical developments, competitor announcements. The beauty of the injection mechanism is that it tests your demand forecast against exactly the kind of exogenous shocks that break traditional models.
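
Mechanically, an injection is an event scheduled for a given round that changes what agents see from then on. The toy loop below shows only that scheduling idea with a single shared "supply anxiety" number. `ToySimulation` is a made-up class, and the decay rate and shock size are arbitrary assumptions; MiroFish's own injection interface is not shown here.

```python
from dataclasses import dataclass, field

@dataclass
class ToySimulation:
    anxiety: float = 0.1                             # shared supply anxiety in [0, 1]
    injections: dict = field(default_factory=dict)   # round -> (headline, shock size)
    log: list = field(default_factory=list)          # which headlines fired, and when

    def step(self, round_no):
        if round_no in self.injections:
            headline, shock = self.injections[round_no]
            self.anxiety = min(1.0, self.anxiety + shock)
            self.log.append((round_no, headline))
        else:
            self.anxiety *= 0.9                      # anxiety fades when nothing new happens
        return self.anxiety

sim = ToySimulation(
    injections={5: ("Major port strike shuts down West Coast shipping", 0.6)}
)
trajectory = [sim.step(r) for r in range(10)]
print([round(a, 2) for a in trajectory])
```

The trajectory drifts down, jumps at round 5, then decays again. In a real swarm run the jump is not a hard-coded number but the result of agents reading the headline and reacting to each other.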


The Architecture of a Swarm-Based Demand Forecasting Pipeline

Having spent a few weeks prototyping with MiroFish, here's the architecture I've converged on for a demand forecasting application. It's not a production-ready pipeline — MiroFish is still at v0.1.2 — but it's a blueprint that I think will become standard within the next two years.

Layer 1: Data Ingestion and Synthesis

Pull the latest FRED data via the FRED API. Pull internal sales history from your data warehouse. Pull social sentiment from your monitoring tools. Pull competitor intelligence from your market research feeds. Synthesize all of this into a structured seed document — part narrative, part data, part knowledge graph primer. This is the "state of the world" your simulation will be built on.
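
For the FRED part of that layer, a minimal standard-library sketch looks like this. It builds a request URL for the FRED `series/observations` endpoint and parses the JSON it returns. The API key and the sample payload are placeholders, and the helper names are my own.

```python
import json
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"

def observations_url(series_id, api_key, start="2020-01-01"):
    """Build the request URL for one series."""
    query = urlencode({
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start,
    })
    return f"{FRED_OBSERVATIONS_URL}?{query}"

def parse_observations(payload):
    """Return (date, value) pairs, skipping FRED's "." marker for missing values."""
    data = json.loads(payload)
    return [
        (obs["date"], float(obs["value"]))
        for obs in data["observations"]
        if obs["value"] != "."
    ]

# Hand-written sample in the shape of a FRED response, not real data.
sample = ('{"observations": [{"date": "2026-01-01", "value": "4.1"}, '
          '{"date": "2026-02-01", "value": "."}]}')
url = observations_url("UNRATE", "YOUR_API_KEY")
rows = parse_observations(sample)
print(url)
print(rows)
```

To fetch live data you would pass the URL to `urllib.request.urlopen` (or your HTTP client of choice) and hand the response body to `parse_observations`.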

The seed document is the single most important artifact in the entire pipeline. Garbage in, garbage out applies with exponential force to swarm simulations, because a poorly constructed world produces agents whose behavior doesn't map to reality.

Layer 2: Simulation Configuration

Define your agent population. For a consumer goods demand forecast, you might want: a mix of consumer archetypes (price-sensitive, brand-loyal, impulse-driven, research-heavy), a cohort of retail decision-makers, a handful of market analysts, some media agents who amplify signals, and perhaps institutional buyers or wholesale agents. The GraphRAG step will help generate these, but you'll want to review and tune the personas to ensure your population reflects your actual market.

Set your simulation parameters: how many rounds, what platforms (the Twitter-like platform for rapid sentiment cascades, the Reddit-like platform for deeper discussion and analysis), what social actions are enabled.
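
I find it useful to write the configuration down as data before touching the tool, so the population mix can be reviewed like any other modeling assumption. The structure below is my own convention, not MiroFish's configuration schema, and every share and count in it is an illustrative guess.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationConfig:
    rounds: int
    platforms: tuple
    population: dict  # archetype -> share of the agent population

    def validate(self):
        total = sum(self.population.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"population shares sum to {total}, expected 1.0")
        if self.rounds <= 0:
            raise ValueError("rounds must be positive")
        return self

    def headcount(self, n_agents):
        """Convert shares into agent counts for a population of n_agents."""
        return {name: round(share * n_agents) for name, share in self.population.items()}

config = SimulationConfig(
    rounds=30,
    platforms=("twitter_like", "reddit_like"),
    population={
        "price_sensitive": 0.30, "brand_loyal": 0.25, "impulse_driven": 0.15,
        "research_heavy": 0.15, "retail_buyer": 0.08, "analyst": 0.04, "media": 0.03,
    },
).validate()
counts = config.headcount(200)
print(counts)
```

Checking the shares against your real customer segmentation at this stage is much cheaper than discovering after a run that the simulated market was skewed toward one archetype.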

Layer 3: Simulation Execution

MiroFish runs the simulation on its dual-platform architecture. Agents interact, debate, form opinions, change their minds. The temporal memory updates continuously. The system automatically tracks your prediction question throughout the simulation.

This is where the swarm intelligence actually happens. No single agent is "predicting demand." Demand emerges from the collective behavior of agents making individual decisions — exactly like real markets. You're not fitting a model. You're growing a market.

Layer 4: Analysis and Extraction

The ReportAgent synthesizes what happened. You can also enter the simulation and interview specific agents — a powerful feature for understanding why the demand forecast came out the way it did. Ask the simulated price-sensitive consumer why they delayed their purchase. Ask the retail buyer why they over-ordered. The explanatory power here vastly exceeds anything you get from SHAP values on a gradient-boosted tree.

Layer 5: Scenario Branching

Run the simulation multiple times with different injected variables. What happens if inflation accelerates? What if a competitor drops prices 20%? What if a supply chain disruption hits? Each branch produces a scenario forecast, and the distribution across scenarios gives you something traditional forecasting rarely provides: a genuine uncertainty estimate grounded in behavioral dynamics rather than statistical confidence intervals.
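
Once each branch has produced a demand figure, turning the branches into an uncertainty band is ordinary descriptive statistics. The sketch below pools invented results from three scenarios and reports the 10th percentile, median, and 90th percentile. The scenario names echo the questions above; the numbers and the `summarize_scenarios` helper are assumptions for illustration.

```python
import statistics

def summarize_scenarios(outcomes):
    """outcomes maps scenario name -> simulated demand figures from repeated runs.

    Pooling the runs treats every scenario as equally likely, which is itself
    an assumption you may want to replace with explicit weights.
    """
    pooled = sorted(x for runs in outcomes.values() for x in runs)
    deciles = statistics.quantiles(pooled, n=10)  # 9 cut points: P10 ... P90
    return {
        "p10": deciles[0],
        "median": statistics.median(pooled),
        "p90": deciles[8],
    }

# Invented unit forecasts, four runs per scenario.
outcomes = {
    "baseline": [10200, 9800, 10050, 9950],
    "inflation accelerates": [9100, 8900, 9300, 9000],
    "competitor cuts prices 20%": [7600, 7900, 7400, 8100],
}
band = summarize_scenarios(outcomes)
print(band)
```

The width between `p10` and `p90` is the number to bring to a planning meeting: it says how far apart the plausible futures are, not just where the middle sits.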


The Polymarket Proof Point: When Swarm Predictions Meet Real Money

The most compelling validation of MiroFish's forecasting architecture doesn't come from an academic benchmark — it comes from prediction markets. Multiple developers have connected MiroFish to Polymarket trading bots, using swarm simulations to identify mispricings in real money markets.

One well-documented case involved a trader who simulated 2,847 digital agents before every trade on Polymarket's short-term Bitcoin price markets. Over 338 trades, they reported $4,266 in profit, with one position returning 1,655% in five minutes. The edge was conceptually simple: Polymarket is a crowd-behavior market, and MiroFish is a crowd-behavior simulator. When the simulated crowd's consensus diverged from what Polymarket was pricing, the bot entered the trade.

Another experiment, more rigorous and more illuminating, simulated 200 agents to predict whether maritime shipping in the Strait of Hormuz would return to normal by the end of April 2026. The researcher assigned the agents different roles (government officials, media, military, energy companies, traders, citizens) and ran 100+ rounds of interaction. The organic group consensus was a 47.9% probability, while Polymarket's market price was 31%, a gap of 16.9 percentage points. Interestingly, the agents whose views most closely matched Polymarket's pricing were the most cautious ones: pessimists with the most domain expertise, whose opinions surfaced naturally during the simulation rather than through interviews.
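
The arithmetic behind that gap, and why a trader would care about it, fits in a few lines. The expected-profit figure is only meaningful under a strong assumption, flagged in the code: that the simulated probability is the true one. Nothing in the experiment establishes that.

```python
simulated_probability = 0.479  # organic consensus of the simulated agents
market_price = 0.31            # price of a YES share that pays $1 if the event happens

# Gap between the swarm's view and the market's view, in percentage points.
gap_points = (simulated_probability - market_price) * 100

# Expected profit per YES share, ASSUMING the simulated probability is correct.
expected_profit = simulated_probability * 1.0 - market_price

print(f"gap: {gap_points:.1f} points, expected profit per share: ${expected_profit:.3f}")
```

If the market is right and the swarm is wrong, the same trade has a negative expected value, which is why a single divergence like this is a hypothesis to test rather than a signal to act on.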

This isn't a crystal ball. Nobody should treat MiroFish output as ground truth. But it's a proof of concept that swarm intelligence can produce actionable signals in real financial markets. And since consumer demand, like a prediction market, is driven by crowd behavior, it suggests the same approach can produce actionable signals for demand forecasting.


What Swarm Intelligence Does That Traditional Models Can't

Let me be concrete about the specific advantages I've seen in demand forecasting contexts:

Cascade effect modeling. When a tariff is announced, a traditional model sees a feature change in the next quarter's data. MiroFish simulates the cascade: importers scramble to pre-order, retailers respond by adjusting shelf space, consumers see news coverage and either panic-buy or defer, media coverage amplifies or dampens the reaction. The demand signal that emerges reflects the social dynamics of the disruption, not just its first-order economic effect.

Sentiment contagion. Consumer sentiment isn't just a number in a survey. It spreads through social networks, gets amplified by media coverage, and creates feedback loops. A MiroFish simulation captures this contagion — agents influencing other agents — in a way that a Consumer Sentiment Index feature in a regression simply cannot.

Heterogeneous agent behavior. Real markets are not composed of a single "representative consumer." They contain price-sensitive shoppers, brand loyalists, impulse buyers, and careful researchers. A swarm simulation with diverse agent personas captures how these different populations respond differently to the same stimulus, and how their interactions produce the aggregate demand signal.

Narrative-level explanation. When a traditional model's forecast is wrong, you can look at feature importances, but you can't ask it why. When a MiroFish simulation produces an unexpected result, you can go into the simulation, interview the agents, and understand the social dynamics that produced the outcome. This is transformative for supply chain decision-making, where trust in the forecast is as important as the forecast itself.


The Honest Limitations

I want to be real about what MiroFish cannot do in its current state, because over-promising is how we destroy useful technologies.

No published benchmarks. As of March 2026, there are no rigorous published studies comparing MiroFish's predictions against actual outcomes in a systematic way. The Polymarket results are anecdotal. The system produces plausible scenarios, and those scenarios feel behaviorally realistic, but "feels right" is not validation.

LLM costs are nontrivial. Running hundreds of agents through dozens of simulation rounds means a lot of LLM API calls. The MiroFish docs recommend starting with fewer than 40 rounds for a reason. For enterprise-scale demand forecasting that runs daily across thousands of SKUs, the compute economics don't work yet. This will change as inference costs continue to fall, but right now it's a constraint.
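
A back-of-the-envelope estimate makes the constraint tangible. Every input below is a placeholder I picked for round numbers, not a measured MiroFish figure or a quoted API price; plug in your own.

```python
def simulation_cost_usd(agents, rounds, calls_per_agent_round,
                        tokens_per_call, usd_per_million_tokens):
    """Rough LLM spend for one simulation run."""
    total_tokens = agents * rounds * calls_per_agent_round * tokens_per_call
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Placeholder assumptions: 200 agents, 40 rounds, one 2,000-token call
# per agent per round, $1 per million tokens.
per_run = simulation_cost_usd(200, 40, 1, 2_000, 1.0)
per_day_1000_skus = per_run * 1_000

print(f"one run: ${per_run:,.2f}; one run per SKU across 1,000 SKUs: ${per_day_1000_skus:,.2f}")
```

Under those assumptions a single run is cheap, but one run per SKU per day is not, and scenario branching multiplies the bill again by the number of branches.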

Seed material quality is everything. If your seed document poorly represents the current state of the world, your simulation will be confidently wrong. There's no statistical fallback — no historical data to anchor the model if the world-building is off. This requires genuine domain expertise in constructing the input.

It's not a replacement for statistical models. MiroFish excels at the qualitative, behavioral, emergent dynamics that traditional models miss. It does not excel at the precise quantitative point estimates that supply chains need for daily inventory management. The most promising approach — and several commentators have noted this — is combining swarm simulation with conventional forecasting to cross-validate outputs.
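
One simple way to combine the two is to let the statistical model supply the baseline and let the simulation supply a small set of weighted scenarios around it. The sketch below does that with illustrative numbers: a 10,000-unit baseline and a single downside scenario. The probabilities and the `blend` helper are assumptions, not output from any real run.

```python
def blend(baseline_units, scenarios):
    """scenarios: list of (label, probability, fractional change vs. baseline)."""
    total_probability = sum(p for _, p, _ in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    outcomes = [
        (label, p, baseline_units * (1 + change))
        for label, p, change in scenarios
    ]
    expected = sum(p * units for _, p, units in outcomes)
    return expected, outcomes

expected, outcomes = blend(
    10_000,  # point estimate from the statistical model
    [
        ("baseline holds", 0.6, 0.0),
        ("hawkish Fed plus competitor promotion", 0.4, -0.30),
    ],
)
print(f"probability-weighted demand: {expected:,.0f} units")
for label, p, units in outcomes:
    print(f"  {label}: {units:,.0f} units at {p:.0%}")
```

The weighted figure matters less than the spread: planning for 10,000 units while carrying a 40% chance of 7,000 is a different inventory decision than planning for 10,000 with no stated downside.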


Where This Goes Next

The convergence of swarm intelligence engines like MiroFish with macroeconomic data pipelines (FRED, Bloomberg, Refinitiv), alternative data feeds (social sentiment, satellite imagery, credit card transaction data), and traditional forecasting stacks (Prophet, LightGBM, temporal fusion transformers) is, I believe, the next major shift in applied demand forecasting.

The specific technical developments I'm watching:

Automated FRED-to-seed-material pipelines that continuously synthesize the latest macroeconomic data into narrative context documents suitable for MiroFish ingestion. This eliminates the manual bottleneck in seed material creation and enables near-real-time simulation refreshes.

Hybrid ensemble architectures where a traditional statistical model provides the baseline point estimate, and a MiroFish simulation provides the scenario distribution and the behavioral uncertainty band around that estimate. The statistical model tells you "demand will be approximately 10,000 units." The swarm simulation tells you "but if the Fed surprises hawkish and competitor X runs a promotion, the agents suggest a 30% downside scenario with 40% probability."

Domain-specific agent libraries pre-configured for common demand forecasting contexts — retail, CPG, automotive, pharma, tech hardware — so that practitioners don't need to build personas from scratch every time.

Calibration against historical shocks. Running MiroFish simulations against known historical disruptions (the 2020 pandemic, the 2021 supply chain crisis, the 2022 inflation spike, the 2025 tariff waves) and measuring how well the simulated emergent behavior matches what actually happened. This is the path to the rigorous benchmarks the system currently lacks.
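
The scoring side of such a backtest needs nothing exotic. As a sketch, the two functions below compare a simulated demand path with what actually happened: MAPE for magnitude, and a direction hit rate for whether the simulation at least got the turns right. The two series are invented for illustration, not taken from any real backtest.

```python
def mape(actual, simulated):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    errors = [abs(a - s) / abs(a) for a, s in zip(actual, simulated)]
    return 100.0 * sum(errors) / len(errors)

def direction_hit_rate(actual, simulated):
    """Share of period-over-period moves where both series moved the same way."""
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        simulated_move = simulated[i] - simulated[i - 1]
        if actual_move * simulated_move > 0:
            hits += 1
    return hits / (len(actual) - 1)

# Invented demand index around a shock: drop, rebound, partial pullback.
actual = [100, 80, 120, 110]
simulated = [100, 90, 110, 115]

print(f"MAPE: {mape(actual, simulated):.1f}%")
print(f"direction hit rate: {direction_hit_rate(actual, simulated):.0%}")
```

For a tool whose selling point is catching inflection points, I would weight the direction hit rate around the shock more heavily than the overall MAPE.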


The Bottom Line

MiroFish is not magic. It's not a crystal ball. It's an architecture — a fundamentally different way of thinking about forecasting that prioritizes behavioral emergence over statistical extrapolation.

For demand forecasting specifically, it fills a gap that I've felt acutely in my career: the gap between what the numbers say and what the market actually does when human psychology takes the wheel. FRED data, macroeconomic indicators, historical demand series, real-time news signals — all of these are more powerful as seed material for a swarm simulation than as features in a regression, because they inform behavior rather than parameterize a curve.

We're early. The tooling is immature. The benchmarks don't exist yet. The cost profile doesn't support production-scale daily forecasting for most organizations. But the architectural insight — that demand is emergent, that you forecast emergent properties by simulating the system that generates them, and that thousands of LLM-powered agents with diverse personas and persistent memory can produce that simulation at useful fidelity — is sound.

I've been in this field long enough to know that most things that seem revolutionary turn out to be incremental. But occasionally, you encounter an idea that makes you rethink the entire problem from the ground up.

Swarm intelligence for demand forecasting is that kind of idea. And MiroFish, messy and early as it is, is the best available implementation of it today.


MiroFish is open-source and available at github.com/666ghj/MiroFish. FRED data is freely accessible at fred.stlouisfed.org.