Muse Spark is not another chatbot refresh. It is Meta’s first model from the new Superintelligence Labs, built over nine months after scrapping Llama 4’s stack, and it is already powering meta.ai. After testing it against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, the pattern is clear: it wins on health and visual reasoning, holds its own on science, and loses badly on code and autonomous office work.
If you are deciding whether to log in with your Facebook or Instagram account to use it, this is the operator’s guide, not the press release.
⚡Quick Verdict Card
Best For
Health Q&A, chart interpretation, visual reasoning, free daily assistant
Skip For
Coding (needs more testing), autonomous workflows (needs more testing), abstract puzzles, privacy-sensitive tasks
Cost
Free with Meta account login (no subscription required)
Overall Score
AI Index: 52/100 – Top 5 overall, behind GPT-5.4 (57) and Claude (53)
What Muse Spark Actually Is
Think of it as a multimodal reasoning model with three gears. It takes text, image, and voice input, and returns text only for now. It runs inside Meta’s apps, free to use, and requires a Meta account login.
Meta built it specifically for personal, everyday reasoning. The company highlights two flagship abilities: multi-agent parallel reasoning for hard problems, and health reasoning trained with over 1,000 physicians. It also scores unusually well on visual STEM tasks like chart and diagram interpretation.
The backstory matters. In June 2025, Meta spent $14.3 billion to acquire a 49% nonvoting stake in Scale AI and brought in its cofounder Alexandr Wang as Meta’s first-ever chief AI officer. Wang’s team scrapped the Llama 4 architecture entirely and rebuilt from scratch – new infrastructure, new data pipelines, new training approach. Muse Spark (originally code-named Avocado) is the result. It was trained with over 10x less compute than Llama 4 Maverick while scoring higher, which tells you how broken the previous stack was.
This is also Meta’s first closed-source model – a sharp U-turn from Zuckerberg’s “open source AI is the path forward” manifesto. Wang has said they plan to open-source future versions, but right now Muse Spark’s weights are locked. The developer community is split: some see it as a necessary pivot after Llama 4 failed to gain traction, others see it as Meta closing the gates now that they finally have a competitive reasoning model.
The Three Speeds You Will Actually Use
⚡
Instant Mode DEFAULT
Fast, conversational, good for lookups, quick summaries, and casual chat. Use this for 80% of tasks. Lowest latency thanks to 10x compute efficiency over Llama 4.
🧠
Thinking Mode STEP-BY-STEP
Adds step-by-step reasoning. Slower, but more thorough analysis. Use for document review, chart explanation, or multi-step problems. ARC-AGI-2 score: 42.5 (weak vs GPT-5.4’s 76.1).
🔮
Contemplating Mode DIFFERENTIATOR
Instead of one model thinking longer, Muse Spark spins up multiple agents that reason in parallel and synthesize. This pushed the model to 50.2% on Humanity’s Last Exam and 38.3% on FrontierScience Research – both ahead of GPT-5.4 and Gemini. Rolling out gradually.
Benchmark Reality Check
Do not buy the headline rank. Look at the splits. Muse Spark scores 52 on the Artificial Analysis Intelligence Index – that puts it in the top 5 overall, but behind GPT-5.4 (57), Gemini 3.1 Pro (57), and Claude Opus 4.6 (53). The interesting story is in the category breakdowns.
📊 Benchmark Comparison Dashboard
Scores normalized for visual comparison. Higher is better.
Muse Spark
GPT-5.4
Claude Opus 4.6
Gemini 3.1 Pro
🏆 AI Analysis Index
Muse Spark
52
GPT-5.4
57 👑
Claude 4.6
53
Gemini 3.1
57 👑
🩺 HealthBench Hard
Muse Spark
42.8 👑
GPT-5.4
40.1
Gemini 3.1
20.6
📈 CharXiv Reasoning
Muse Spark
86.4 👑
GPT-5.4
82.8
Gemini 3.1
80.2
🧪 Humanity’s Last Exam
Muse Spark
50.2% 👑
GPT-5.4
43.9%
Gemini 3.1
48.4%
💻 Terminal-Bench Coding
Muse Spark
59.0
GPT-5.4
75.1 👑
Gemini 3.1
68.5
🧩 ARC-AGI-2 Reasoning
Muse Spark
42.5
GPT-5.4
76.1 👑
Gemini 3.1
76.5 👑
🤖 GDPval-AA Agentic (ELO)
Muse Spark
1,444
GPT-5.4
1,674 👑
Claude 4.6
1,607
⚙️ Token Efficiency – Cost to Complete Full Index
Muse Spark
58M
tokens
GPT-5.4
120M
tokens (2x more)
Claude 4.6
157M
tokens (2.7x more)
Gemini 3.1
~58M
tokens
Fewer tokens = faster responses and cheaper to run at scale. Muse Spark matches Gemini and uses half what GPT-5.4 needs.
Where It Beats the Others
Health and Nutrition
This is the clearest win. With physicians in the training loop, Muse Spark gives structured, factual responses with interactive breakdowns. In testing, it correctly identified macros from a photo of airport snacks and ranked them by protein without manual label reading, while Gemini hallucinated one brand and GPT-5.4 gave generic advice. The 42.8 HealthBench Hard score puts it definitively ahead of GPT-5.4’s 40.1 and Gemini’s 20.6.
Visual Data
On charts, diagrams, and photos of real objects, it trails Gemini by 2 points on MMMU-Pro (80.5% vs 82.4%) but beats everyone on CharXiv at 86.4. If your work involves interpreting figures from papers or dashboards, this is your free tool. It correctly reads log scales where GPT-5.4 missed the axis transformation in our tests.
Scientific Reasoning in Contemplating Mode
For hard, open-ended science questions, the parallel agents help. Instead of a single model reasoning for longer (like Gemini Deep Think or GPT Pro), Contemplating mode orchestrates multiple agents that reason in parallel and synthesize. As Meta explains it: “To spend more test-time reasoning without drastically increasing latency, we can scale the number of parallel agents that collaborate to solve hard problems.” It outperformed both GPT-5.4 Pro and Gemini Deep Think on Humanity’s Last Exam with 50.2%.
Where It Loses
🚫 Where Muse Spark Falls Short
💻 Coding16-point gap
Terminal-Bench: 59.0 vs GPT-5.4’s 75.1. Writes functional Python but misses edge cases, produces verbose refactors. In our test, it introduced a subtle bug with mutable defaults. Use Claude Opus 4.6 or GPT-5.4 for all code tasks.
🧩 Abstract Reasoning34-point gap
ARC-AGI-2: 42.5 vs 76+ for both GPT-5.4 and Gemini. For novel pattern puzzles or lateral thinking tasks, Muse Spark is not competitive.
🤖 Agentic Office Work230 ELO gap
GDPval-AA: 1,444 ELO vs GPT-5.4’s 1,674. It will not reliably run end-to-end workflows across tools – filling spreadsheets, navigating sites, chaining desktop tasks. Needs hand-holding.
Best Use Cases by Job Role
Use this as a buying filter. If your day involves messy inputs and a finished artifact, Muse Spark fits. If you need autonomous execution, skip it.
🎯 Role-by-Role Decision Matrix
🩺
Health Coaches, Clinicians, Patients
HealthBench: 42.8 (best in class)
✅ Best Muse Spark Uses
Photo nutrition analysisExercise form feedbackLab concept explainersMacro breakdown from food photos
Coding analysis (GPT-5.4)Statistical modelingData pipeline work
📱
Marketers and Creators
Free multimodal + fast Instant mode
✅ Best Muse Spark Uses
Ad creative critiqueInstagram image analysisQuick copy variantsVisual content audit
⛔ Use Something Else For
Campaign automation (no API)API workflowsCopywriting (Claude)
⚙️
Ops Managers and Founders
Vision + parallel reasoning for complex queries
✅ Best Muse Spark Uses
Photo troubleshootingWhiteboard to summaryTravel researchQuick visual analysis
⛔ Use Something Else For
Spreadsheet automationCRM updatesMulti-step office workflows Try Manus AI or GPT-5.4 for these tasks
🚫
Developers
Terminal-Bench: 59.0 – Not a coding model
⛔ Avoid for Now
All code tasksCode reviewDebuggingRefactoring
💡 Instead Use
Claude Opus 4.6 for code review and debugging. GPT-5.4 for general coding.
6 Tested Prompts With Variance Across Models
These are written as work orders – the format that gets the best results from Muse Spark. Each includes what to expect versus other LLMs.
1Health: Airport Snack Photo Analysis
Act as a registered dietitian. Analyze this photo of snacks. Output a table with: item name, estimated protein, carbs, fat, calories, and a health score 1-10. Then recommend the top 2 high-protein options under 250 calories. Stop and ask if any item is unclear.
Muse Spark Identified 5/6 items, sortable table with explanations GPT-5.4 Accurate macros but no visual grounding Gemini Mislabeled one bar brand
🏆 Winner: Muse Spark – free and visually grounded
2Research: Chart From a Paper
Thinking mode. Explain this bar chart step by step. First describe axes and units. Second, list the top 3 trends. Third, suggest one limitation of the data. Output in bullet points.
Muse Spark Correctly read log scales, detailed breakdown GPT-5.4 Missed the axis transformation Claude More concise but less accurate on values
🏆 Winner: Muse Spark – CharXiv strength on display
3Science: Frontier Reasoning
Contemplating mode. Evaluate three possible explanations for why protein X stabilizes under heat shock. Use parallel reasoning, then synthesize the most plausible mechanism with supporting evidence and one counter-argument.
Muse Spark Three distinct agent perspectives + synthesis GPT-5.4 Faster but shallower Gemini DT Good depth, slower
Instant mode. I photographed the error code E24 on my Bosch dishwasher. Tell me what it means, the most common fix I can try safely, and when to call a technician. Include safety warnings.
Muse Spark Correct diagnosis (drain issue), step-by-step fix, safety warnings GPT-5.4 Same answer but without seeing the photo
🤝 Tie – but Muse Spark is free and multimodal
5Marketing: Instagram Creative Critique
Analyze this Instagram ad screenshot. Score hook, clarity, and CTA from 1-10. Suggest 3 specific copy changes to improve CTR for UAE audience. Keep tone friendly, not salesy.
Muse Spark Actionable feedback + localized UAE suggestions Gemini Slightly better at visual layout critique Claude Best rewritten copy
💡 Best workflow: Muse for quick audit, Claude for rewrite
6Coding: Refactor Python Function
Refactor this Python function for readability and add type hints. Preserve behavior.
Muse Spark Completed but introduced subtle mutable defaults bug GPT-5.4 Passed tests first try Claude Passed tests first try
⚠️ Do not use Muse Spark for code yet
How to Prompt Muse Spark for Best Results
Stop writing chatty prompts. Use this work order structure that matches how the model was optimized:
📋Muse Spark Work Order TemplateCopy template
Act as a[role].
Your task is to[specific deliverable].
Use these inputs only:[text, image, voice note].
Success looks like:[clear definition of done].
Output format:[table, bullets, 3-paragraph memo].
Constraints:[tone, length, avoid medical advice, cite sources].
Stop and ask if:[image unclear, data missing, cost or risk high].
⚠️ The last line matters most. Muse Spark will otherwise keep trying to be helpful and burn time, especially in Thinking mode.
Privacy, Cost, and Platform Notes
💰
Cost: Free
Free with a Meta account. No subscription. More generous than GPT-5.4 Pro or Claude Opus, which sit behind paywalls. API is private preview only – no public pricing yet.
📱
Platforms
Available now on meta.ai and the Meta AI app. Rolling out to WhatsApp, Instagram, Facebook, Messenger, and Ray-Ban Meta glasses in coming weeks.
🔒
Privacy Warning
You must log in with Facebook or Instagram. Meta’s privacy policy sets few limits on how data shared with its AI can be used. The model will unlock features citing your shared content. Treat it like a public tool. Do not upload patient data, contracts, or anything sensitive.
The $14.3 Billion Backstory That Explains Everything
Muse Spark does not make sense without the context. In June 2025, Zuckerberg was reportedly unhappy with how Llama models lagged behind OpenAI and Anthropic. Meta spent $14.3 billion to acquire a 49% nonvoting stake in Scale AI and hired its cofounder Alexandr Wang as Meta’s first-ever chief AI officer. Wang and Zuckerberg then went on a talent acquisition spree, reportedly offering AI researchers at rival labs pay packages climbing into the hundreds of millions of dollars when equity was included, pulling talent from OpenAI, Anthropic, and Google.
Wang’s team created Meta Superintelligence Labs and scrapped the entire Llama 4 stack. Nine months later, Muse Spark is the result. It was code-named Avocado internally, it is Meta’s first proprietary model, and it is positioned as “the first step toward a personal superintelligence.” The developer community is watching closely because Meta has historically been the open-source standard bearer in AI – and this model is closed.
Final Verdict: When to Choose Muse Spark
✅ Use Muse Spark When
✓ Health, nutrition, or fitness explanation from text or photos
✓ Interpreting charts, diagrams, or visual data quickly
✓ Frontier science reasoning (Contemplating mode)
✓ You want a free, fast assistant inside Instagram or WhatsApp
✓ Photo-based troubleshooting or visual analysis
✓ Quick ad creative audits and visual feedback
⛔ Skip Muse Spark When
✗ Writing or debugging code
✗ Reliable autonomous workflows across tools
✗ Solving novel abstract puzzles
✗ You cannot accept Meta account login for privacy
✗ You need API access for automation
✗ Complex multi-step office tasks (spreadsheets, CRM)
For most knowledge workers, the smart setup is hybrid: Muse Spark for health and vision, GPT-5.4 or Claude for code and agents (see our best AI for business breakdown), Gemini for top-tier multimodal when you need that extra 2% accuracy. That combination covers the gaps without paying for three subscriptions, because Muse Spark is free.
🧭 The Hybrid Stack for 2026
HEALTH & VISION
Muse Spark
Free
CODE & AGENTS
GPT-5.4 / Claude
$20-30/mo
TOP MULTIMODAL
Gemini 3.1 Pro
$20/mo
Frequently Asked Questions
Is Muse Spark free to use?
▼
Yes. Muse Spark is completely free with a Meta account (Facebook or Instagram login). There is no subscription tier. The API is currently in private preview for select partners only with no public pricing announced. For most users, the free web and app access provides full functionality across all three modes: Instant, Thinking, and Contemplating.
How does Contemplating mode work?
▼
Unlike standard chain-of-thought reasoning where one model thinks for longer, Contemplating mode spins up multiple AI agents that reason in parallel about the same problem and then synthesize their findings. Meta describes it as scaling “the number of parallel agents that collaborate to solve hard problems” without drastically increasing latency. It is rolling out gradually and works best for frontier science and complex open-ended questions.
Is Muse Spark better than ChatGPT (GPT-5.4)?
▼
It depends on the task. Muse Spark beats GPT-5.4 on health reasoning (42.8 vs 40.1 on HealthBench), chart interpretation (86.4 vs 82.8 on CharXiv), and frontier science reasoning (50.2% vs 43.9% on Humanity’s Last Exam). GPT-5.4 wins on coding (75.1 vs 59.0 on Terminal-Bench), abstract reasoning (76.1 vs 42.5 on ARC-AGI-2), and agentic workflows (1,674 vs 1,444 ELO on GDPval-AA). Overall AI Index: GPT-5.4 scores 57 vs Muse Spark’s 52.
Is Muse Spark open source like Llama?
▼
No. Muse Spark is Meta’s first proprietary, closed-source AI model – a significant departure from the open-source Llama series. Alexandr Wang has stated plans to open-source future Muse models, but Muse Spark’s weights are not publicly available. This has been controversial in the developer community, as it contradicts Zuckerberg’s earlier “open source AI is the path forward” position.
Can I use Muse Spark for medical advice?
▼
Muse Spark excels at health-related explanations and nutrition analysis – it was trained with over 1,000 physicians and scores best-in-class on HealthBench Hard. However, it should not replace professional medical consultation for diagnosis, treatment plans, or drug interaction checks. Use it for understanding lab concepts, analyzing food photos for macros, or getting exercise form feedback – not for clinical decision-making.
Is my data safe with Muse Spark?
▼
You must log in with Facebook or Instagram. Meta’s privacy policy sets few limits on how data shared with its AI system can be used. The company has announced features that will cite content you share across Instagram, Facebook, and Threads. Treat Muse Spark like a public tool: do not upload patient data, contracts, sensitive financial information, or anything you would not post publicly. Use a dedicated account if possible.
Should developers use Muse Spark for coding?
▼
No, not yet. Muse Spark scores 59.0 on Terminal-Bench versus GPT-5.4’s 75.1 – a 16-point gap. In practical testing, it writes functional Python but misses edge cases and introduced a subtle bug with mutable defaults during our refactoring test. Both GPT-5.4 and Claude Opus 4.6 passed the same tests first try. Stick with Claude for code review and debugging, or GPT-5.4 for general coding tasks.
Where is Muse Spark available?
▼
Muse Spark is available now on meta.ai (web) and the Meta AI app. It will roll out to WhatsApp, Instagram, Facebook, Messenger, and Ray-Ban Meta AI glasses in the coming weeks. There is no standalone desktop app. API access is limited to select partners in private preview, with no public API or pricing announced yet.
What is Meta Superintelligence Labs?
▼
Meta Superintelligence Labs (MSL) is a new AI division within Meta, created after CEO Mark Zuckerberg grew dissatisfied with the progress of Llama models. It is led by Alexandr Wang, former cofounder and CEO of Scale AI, whom Meta brought in as its first-ever chief AI officer through a $14.3 billion deal for a 49% nonvoting stake in Scale AI. MSL has recruited researchers from OpenAI, Anthropic, and Google with pay packages reportedly reaching hundreds of millions in equity.
How does Muse Spark compare to Claude Opus 4.6?
▼
Muse Spark (AI Index: 52) trails Claude Opus 4.6 (53) slightly on overall performance. Muse Spark wins on health (42.8 HealthBench) and visual reasoning (86.4 CharXiv). Claude dominates coding and agentic work (1,607 vs 1,444 ELO on GDPval-AA). A key difference: Muse Spark is free while Claude Opus is behind a paywall. For code review, debugging, and copy rewriting, Claude remains superior. For visual and health tasks, Muse Spark is the better free option.
This article was co-authored with AI and tested by Ahmad Lala. Benchmarks sourced from Artificial Analysis Intelligence Index v4.0, Meta AI blog, and independent testing conducted April 8-9, 2026. Model versions tested: Muse Spark (launch build), GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro.
Ahmad works in AI agent operations at G42. Before that, he spent 17 years working in communications and software development. He builds and maintains AI workflows in production daily and writes the blueprints he wishes someone had given him when he started. Every guide on this site is AI-assisted and human-tested. All articles reflect his personal opinions and thoughts and do not necessarily represent those of his employer (G42).