MODEL INTELLIGENCE

Enterprise Model Market Map. Hosted Platforms vs Open Weight

A pragmatic lens on procurement, security, latency, cost, and governance tradeoffs. What changes matter right now, which platforms are production ready, and when to self host vs use hosted APIs.

Updated

May 2026: reflects GPT-5.5, Claude Opus 4.7, Gemini 3.1, Llama 4, Grok 4.3, DeepSeek V4, Mistral Large 3

Coverage

OpenAI, Anthropic, Google, Meta, xAI, DeepSeek, Mistral, Cohere + deployment tradeoffs

Use this for

Vendor selection, cost modeling, compliance planning, PoC scoping

Platform Overview

Major platforms as of May 2026, with current models, strengths, and procurement readiness.

OpenAI

Hosted API

Enterprise-Ready

Key Models

GPT-5.5GPT-5.5 ProGPT-5.4o-series reasoning

Strengths

Frontier reasoning, agentic coding, and computer use (GPT-5.5 Pro)
Now available on AWS Bedrock alongside Azure OpenAI for multi-cloud procurement
Mature tooling ecosystem, prompt caching, batch API for cost control

Considerations

Data retention varies by deployment (API, Azure, Bedrock all differ)
Pro tier pricing premium over standard GPT-5.5 for highest reasoning
Version pinning recommended; minor releases (5.4 → 5.5) shift behavior

Anthropic

Hosted API

Enterprise-Ready

Key Models

Claude Opus 4.7Claude Sonnet 4.6Claude Haiku 4.5

Strengths

Frontier safety posture, strong instruction following, agentic workflows
Advisor tool: Sonnet or Haiku executor + Opus on-demand guidance for cost-efficient agents
AWS Bedrock and GCP Vertex AI integration for FedRAMP, HIPAA, BAA

Considerations

Output token limits still apply per response; chain calls for long generation
Opus pricing premium; mix tiers via advisor pattern for production economics
Cloud-managed deployments required for most regulated industries

Google (Gemini)

Hosted API

Enterprise-Ready

Key Models

Gemini 3.1 ProGemini 3 ProGemini 3 Flash

Strengths

1M token context, native multimodal (text, image, video, audio) in one API
Adaptive thinking levels (minimal, low, medium, high) for cost vs depth control
Gemini Enterprise Agent Platform consolidates Vertex AI for agentic governance

Considerations

Vertex AI is transitioning to Gemini Enterprise Agent Platform; track migration timeline
Pricing differs across Vertex AI, Agent Platform, and direct API
Model Garden adds third-party (Anthropic, Llama, Gemma) under one billing umbrella

Meta (Llama)

Open-Weight

Self-Serve

Key Models

Llama 4 Scout (17B active, 10M ctx)Llama 4 Maverick (17B active, 128 experts)Llama 4 Behemoth (preview)

Strengths

Natively multimodal MoE architecture, industry-leading 10M token context (Scout)
Run on-prem or in VPC for full data control; fits a single H100 (Scout)
Apache-style permissive use; no per-token API costs at fixed infra cost

Considerations

Requires GPU infrastructure and ML ops maturity (inference servers, scaling)
Behemoth flagship not yet GA; production stack typically pairs Llama with Claude or GPT
Fine-tuning expertise needed for domain-specific lift over hosted competitors

xAI (Grok)

Hosted API

Mixed

Key Models

Grok 4.3Grok 4 FastGrok Voice

Strengths

Aggressive pricing ($1.25 / $2.50 per M tokens) for frontier-tier reasoning
1M token context, function calling, structured outputs, prompt caching
Voice cloning suite alongside text and vision for multimodal agents

Considerations

Younger enterprise compliance posture; review DPA, data residency, retention
Less mature governance ecosystem vs OpenAI/Anthropic on Bedrock
Brand and content-moderation profile differs; review for regulated audiences

DeepSeek

Hybrid

Mixed

Key Models

DeepSeek V4 ProDeepSeek V4 FlashDeepSeek V3.2DeepSeek R1

Strengths

Open-weight under MIT (code + weights on Hugging Face) with frontier reasoning
1.6T parameter MoE (49B active) at fraction of US-platform pricing
Strong on coding and reasoning benchmarks; available hosted or self-deployed

Considerations

China-headquartered; enterprise procurement may face data sovereignty review
Hosted API data policy distinct from open-weight self-deployment
Promotional pricing windows; budget against list rates for production planning

Mistral AI

Hybrid

Mixed

Key Models

Mistral Large 3Mistral Medium 3Mistral Small 4Voxtral TTS

Strengths

Hosted API plus Apache 2.0 open-weight (Large 3: 41B active / 675B total)
Medium 3 delivers ~90% of Sonnet-class quality at significantly lower cost
Forge enterprise platform with Mistral Vibe agent for fine-tuning lifecycle

Considerations

EU-headquartered; strong fit for European data sovereignty mandates
Smaller ecosystem vs OpenAI/Anthropic; verify SDK and tooling parity
Open-weight tier requires self-hosting expertise; hosted tier on Azure Foundry

Cohere

Hosted API

Enterprise-Ready

Key Models

Command A (111B)Command R+Embed v4Rerank

Strengths

Enterprise RAG focus: Command A 256K context, 150% higher throughput than R+
Embed v4 Matryoshka embeddings (256/512/1024/1536 dims) for unified retrieval
Deployment flexibility: Cohere Platform, AWS Bedrock, GCP, Azure, on-prem

Considerations

Best fit for RAG, agents, and multilingual; less general-purpose chat
Two H100s required for Command A self-deployment
Pricing structure differs from pure token-based APIs; model availability varies by cloud

Tradeoff Matrix: Hosted API vs Open-Weight

Key decision dimensions for selecting deployment approach. No universal answer; tradeoffs depend on your constraints.

Data Control

Hosted API

Data sent to vendor; review DPA, BAA, zero-retention options, region pinning

Open-Weight

Full control; run in your VPC, on-prem, or air-gapped environments

Recommendation

Use open-weight (Llama 4, Mistral Large 3, DeepSeek V4) for PII/PHI; hosted APIs with zero-retention for everything else

Cost Structure

Hosted API

Per-token pricing with prompt caching; tier mixing now standard (Haiku executor + Opus advisor)

Open-Weight

Fixed infra cost; H100 cluster + ML team predictable but capital-intensive

Recommendation

Hosted APIs for MVP and bursty volume; open-weight breaks even around 10M-50M requests/month depending on model size

Latency

Hosted API

Streaming first-token 200-600ms typical; prompt caching cuts repeat prompt latency 60-80%

Open-Weight

Controlled by your infra; can hit sub-100ms p95 with vLLM/TensorRT-LLM tuning

Recommendation

Hosted APIs for async workflows; open-weight for sub-100ms real-time UX or sustained high-throughput inference

Compliance

Hosted API

SOC 2, FedRAMP High, HIPAA via Bedrock / Azure / Vertex; both OpenAI and Anthropic now available on AWS Bedrock

Open-Weight

Your compliance posture; vendor only provides weights, no data path

Recommendation

Cloud-integrated APIs (Azure OpenAI, AWS Bedrock for OpenAI + Anthropic, Vertex/Agent Platform for Gemini) cover most regulated industries

Context Windows

Hosted API

1M tokens standard at frontier (GPT-5.5, Gemini 3.1 Pro, Grok 4.3)

Open-Weight

Llama 4 Scout pushes to 10M; DeepSeek V4 Pro at 1M; verify quality at long-context tail

Recommendation

Frontier hosted APIs for general 1M-token workflows; Llama 4 Scout for codebase-scale or full-archive retrieval

Agent + Tool Use

Hosted API

Function calling, structured outputs, computer use, code execution all GA at frontier tier

Open-Weight

Tool use depends on inference framework; Llama 4 + Mistral Small 4 (Devstral) competitive

Recommendation

GPT-5.5 Pro, Claude Opus 4.7, and Gemini 3.1 Pro lead on agentic; pair with Haiku/Flash executors for cost

Model Updates

Hosted API

Frequent minor releases (4.5 → 4.6 → 4.7) shift behavior; version pinning required for production

Open-Weight

You control upgrade cadence; weights are immutable artifacts

Recommendation

Pin specific snapshots in production; canary new versions in staging with eval suite before promotion

Want a vendor selection tailored to your use case?

We deliver model selection guidance customized to your requirements: cost envelope, latency targets, data residency, compliance needs, and existing cloud contracts. Output includes vendor shortlist, cost modeling, PoC plan, and procurement ready RFP criteria.

Request selection guidance Explore more insights