MODEL INTELLIGENCE
Enterprise Model Market Map. Hosted Platforms vs Open Weight
A pragmatic lens on procurement, security, latency, cost, and governance tradeoffs. What changes matter right now, which platforms are production ready, and when to self host vs use hosted APIs.
Updated
May 2026: reflects GPT-5.5, Claude Opus 4.7, Gemini 3.1, Llama 4, Grok 4.3, DeepSeek V4, Mistral Large 3
Coverage
OpenAI, Anthropic, Google, Meta, xAI, DeepSeek, Mistral, Cohere + deployment tradeoffs
Use this for
Vendor selection, cost modeling, compliance planning, PoC scoping
Platform Overview
Major platforms as of May 2026, with current models, strengths, and procurement readiness.
OpenAI
Hosted API
Enterprise-Ready
Key Models
GPT-5.5GPT-5.5 ProGPT-5.4o-series reasoning
Strengths
- Frontier reasoning, agentic coding, and computer use (GPT-5.5 Pro)
- Now available on AWS Bedrock alongside Azure OpenAI for multi-cloud procurement
- Mature tooling ecosystem, prompt caching, batch API for cost control
Considerations
- Data retention varies by deployment (API, Azure, Bedrock all differ)
- Pro tier pricing premium over standard GPT-5.5 for highest reasoning
- Version pinning recommended; minor releases (5.4 → 5.5) shift behavior
Anthropic
Hosted API
Enterprise-Ready
Key Models
Claude Opus 4.7Claude Sonnet 4.6Claude Haiku 4.5
Strengths
- Frontier safety posture, strong instruction following, agentic workflows
- Advisor tool: Sonnet or Haiku executor + Opus on-demand guidance for cost-efficient agents
- AWS Bedrock and GCP Vertex AI integration for FedRAMP, HIPAA, BAA
Considerations
- Output token limits still apply per response; chain calls for long generation
- Opus pricing premium; mix tiers via advisor pattern for production economics
- Cloud-managed deployments required for most regulated industries
Google (Gemini)
Hosted API
Enterprise-Ready
Key Models
Gemini 3.1 ProGemini 3 ProGemini 3 Flash
Strengths
- 1M token context, native multimodal (text, image, video, audio) in one API
- Adaptive thinking levels (minimal, low, medium, high) for cost vs depth control
- Gemini Enterprise Agent Platform consolidates Vertex AI for agentic governance
Considerations
- Vertex AI is transitioning to Gemini Enterprise Agent Platform; track migration timeline
- Pricing differs across Vertex AI, Agent Platform, and direct API
- Model Garden adds third-party (Anthropic, Llama, Gemma) under one billing umbrella
Meta (Llama)
Open-Weight
Self-Serve
Key Models
Llama 4 Scout (17B active, 10M ctx)Llama 4 Maverick (17B active, 128 experts)Llama 4 Behemoth (preview)
Strengths
- Natively multimodal MoE architecture, industry-leading 10M token context (Scout)
- Run on-prem or in VPC for full data control; fits a single H100 (Scout)
- Apache-style permissive use; no per-token API costs at fixed infra cost
Considerations
- Requires GPU infrastructure and ML ops maturity (inference servers, scaling)
- Behemoth flagship not yet GA; production stack typically pairs Llama with Claude or GPT
- Fine-tuning expertise needed for domain-specific lift over hosted competitors
xAI (Grok)
Hosted API
Mixed
Key Models
Grok 4.3Grok 4 FastGrok Voice
Strengths
- Aggressive pricing ($1.25 / $2.50 per M tokens) for frontier-tier reasoning
- 1M token context, function calling, structured outputs, prompt caching
- Voice cloning suite alongside text and vision for multimodal agents
Considerations
- Younger enterprise compliance posture; review DPA, data residency, retention
- Less mature governance ecosystem vs OpenAI/Anthropic on Bedrock
- Brand and content-moderation profile differs; review for regulated audiences
DeepSeek
Hybrid
Mixed
Key Models
DeepSeek V4 ProDeepSeek V4 FlashDeepSeek V3.2DeepSeek R1
Strengths
- Open-weight under MIT (code + weights on Hugging Face) with frontier reasoning
- 1.6T parameter MoE (49B active) at fraction of US-platform pricing
- Strong on coding and reasoning benchmarks; available hosted or self-deployed
Considerations
- China-headquartered; enterprise procurement may face data sovereignty review
- Hosted API data policy distinct from open-weight self-deployment
- Promotional pricing windows; budget against list rates for production planning
Mistral AI
Hybrid
Mixed
Key Models
Mistral Large 3Mistral Medium 3Mistral Small 4Voxtral TTS
Strengths
- Hosted API plus Apache 2.0 open-weight (Large 3: 41B active / 675B total)
- Medium 3 delivers ~90% of Sonnet-class quality at significantly lower cost
- Forge enterprise platform with Mistral Vibe agent for fine-tuning lifecycle
Considerations
- EU-headquartered; strong fit for European data sovereignty mandates
- Smaller ecosystem vs OpenAI/Anthropic; verify SDK and tooling parity
- Open-weight tier requires self-hosting expertise; hosted tier on Azure Foundry
Cohere
Hosted API
Enterprise-Ready
Key Models
Command A (111B)Command R+Embed v4Rerank
Strengths
- Enterprise RAG focus: Command A 256K context, 150% higher throughput than R+
- Embed v4 Matryoshka embeddings (256/512/1024/1536 dims) for unified retrieval
- Deployment flexibility: Cohere Platform, AWS Bedrock, GCP, Azure, on-prem
Considerations
- Best fit for RAG, agents, and multilingual; less general-purpose chat
- Two H100s required for Command A self-deployment
- Pricing structure differs from pure token-based APIs; model availability varies by cloud
Tradeoff Matrix: Hosted API vs Open-Weight
Key decision dimensions for selecting deployment approach. No universal answer; tradeoffs depend on your constraints.
Data Control
Hosted API
Data sent to vendor; review DPA, BAA, zero-retention options, region pinning
Open-Weight
Full control; run in your VPC, on-prem, or air-gapped environments
Recommendation
Use open-weight (Llama 4, Mistral Large 3, DeepSeek V4) for PII/PHI; hosted APIs with zero-retention for everything else
Cost Structure
Hosted API
Per-token pricing with prompt caching; tier mixing now standard (Haiku executor + Opus advisor)
Open-Weight
Fixed infra cost; H100 cluster + ML team predictable but capital-intensive
Recommendation
Hosted APIs for MVP and bursty volume; open-weight breaks even around 10M-50M requests/month depending on model size
Latency
Hosted API
Streaming first-token 200-600ms typical; prompt caching cuts repeat prompt latency 60-80%
Open-Weight
Controlled by your infra; can hit sub-100ms p95 with vLLM/TensorRT-LLM tuning
Recommendation
Hosted APIs for async workflows; open-weight for sub-100ms real-time UX or sustained high-throughput inference
Compliance
Hosted API
SOC 2, FedRAMP High, HIPAA via Bedrock / Azure / Vertex; both OpenAI and Anthropic now available on AWS Bedrock
Open-Weight
Your compliance posture; vendor only provides weights, no data path
Recommendation
Cloud-integrated APIs (Azure OpenAI, AWS Bedrock for OpenAI + Anthropic, Vertex/Agent Platform for Gemini) cover most regulated industries
Context Windows
Hosted API
1M tokens standard at frontier (GPT-5.5, Gemini 3.1 Pro, Grok 4.3)
Open-Weight
Llama 4 Scout pushes to 10M; DeepSeek V4 Pro at 1M; verify quality at long-context tail
Recommendation
Frontier hosted APIs for general 1M-token workflows; Llama 4 Scout for codebase-scale or full-archive retrieval
Agent + Tool Use
Hosted API
Function calling, structured outputs, computer use, code execution all GA at frontier tier
Open-Weight
Tool use depends on inference framework; Llama 4 + Mistral Small 4 (Devstral) competitive
Recommendation
GPT-5.5 Pro, Claude Opus 4.7, and Gemini 3.1 Pro lead on agentic; pair with Haiku/Flash executors for cost
Model Updates
Hosted API
Frequent minor releases (4.5 → 4.6 → 4.7) shift behavior; version pinning required for production
Open-Weight
You control upgrade cadence; weights are immutable artifacts
Recommendation
Pin specific snapshots in production; canary new versions in staging with eval suite before promotion
Want a vendor selection tailored to your use case?
We deliver model selection guidance customized to your requirements: cost envelope, latency targets, data residency, compliance needs, and existing cloud contracts. Output includes vendor shortlist, cost modeling, PoC plan, and procurement ready RFP criteria.