AI API Compatibility Tester

Paste an OpenAI-format request and see what carries over to Anthropic, Gemini, Mistral, Cohere, Groq and Ollama, with the rewritten body for each.

This AI API compatibility tester takes the OpenAI-format request you already have and shows what carries over to every other major provider in 2026, then gives you the rewritten body for each one. You paste a Chat Completions request and we map ten features (basic chat, system message, temperature, max_tokens, tools, vision, JSON mode, streaming, seed and logprobs) against OpenAI, Anthropic, Gemini, Mistral, Cohere, Groq and Ollama. Every cell is a clear supported, partial or not-supported flag with a one-line caveat. Open a provider tab and you get runnable SDK code, plus notes that point to the cheaper or cleaner path, such as Anthropic prompt caching or the correct Gemini role names. We built it as a deterministic translator, so it rewrites your request without ever running it. There are no keys to enter, and everything runs in your browser, so nothing you paste is uploaded.

100% in your browser. Nothing you type ever leaves this page.

AI API request translator and provider compatibility matrix for OpenAI, Anthropic, Gemini, Mistral, Cohere, Groq and Ollama

This AI API compatibility tester takes an OpenAI-format request and shows what carries over to every major provider in 2026, then gives you the rewritten body for each. You paste a Chat Completions request, and we check ten features (basic chat, system message, temperature, max_tokens, tools, vision, JSON mode, streaming, seed and logprobs) against OpenAI, Anthropic, Gemini, Mistral, Cohere, Groq and Ollama. Each provider tab carries runnable SDK code plus short notes on the cheaper or cleaner path. It helps anyone planning a migration or a multi-provider fallback. Everything runs in your browser, so nothing you paste is uploaded.

OpenAI-format request body (JSON)

Why an OpenAI-format request is a useful pivot

An AI API compatibility check starts from one practical fact: somewhere around 2024 the OpenAI Chat Completions schema quietly became the shape everyone copied (messages with system, user, assistant roles, tools, temperature, max_tokens, stream, and optionally response_format). Anthropic, Mistral, Groq, Together, Fireworks, every local server I have touched (Ollama, LM Studio, llama.cpp's server). They all expose some flavor of an OpenAI-compatible endpoint at /v1/chat/completions. Gemini and Cohere kept their own native schemas. Even they ship an OpenAI adapter, though. So here is my rule of thumb. Write to the OpenAI shape, translate outward, and you have handed yourself the least painful exit the day you want to leave.

Here is the part nobody warns you about, though. "OpenAI-compatible" is a spectrum, not a checkbox. Plain chat (messages in, completion out) ports cleanly and you will barely notice. Tools mostly work. It is how arguments get parsed and how parallel calls come back that varies just enough to bite you. Vision exists on OpenAI, Anthropic, Gemini, except each one wants the image wrapped its own special way. JSON mode? A mess across the board, honestly. Streaming runs everywhere, then the chunk format drifts at the edges and your parser chokes. This tool lays every feature against every provider so you can price the move before you commit.

What the feature matrix covers in 2026

I picked the ten features that actually decide whether a port is an afternoon or a week. Basic chat messages. System message as a real field, versus something you cram into the first user turn. Temperature and max_tokens behaving the same way by default. Tools / function calling, single and parallel. Vision inputs (image_url and base64). JSON mode / structured outputs with schema enforcement. Streaming over Server-Sent Events. response_format with an explicit schema. seed for reproducible sampling. And logprobs, when you want token-level confidence. Every cell is a green check, a red cross, or a partial flag carrying a one-line "yeah, but" caveat.

Whenever a provider drops a new flagship, I refresh the matrix. The 2026 baseline I am testing against: GPT-5 turbo and GPT-5 mini on OpenAI, Claude Opus 4.7 and Sonnet 4.6 on Anthropic, Gemini 3.0 Pro on Google, Mistral Large 3 and Codestral 2 on Mistral, Command R+ 2026 on Cohere, Llama 4 405B and Mixtral 8x22B served through Groq, and whatever GGUF you feel like running through Ollama or llama.cpp.

How the request translation works

For each provider you get a snippet you can actually run, written in that provider's own SDK. Under the hood it is doing five boring conversions, the kind that are easy to botch. Role normalisation (Anthropic pulls system into its own field, Gemini folds it into the first user turn). Tool schema mapping (OpenAI wraps things in type: function with a nested function object, Anthropic wants a flat tools array, Gemini wants function_declarations). Vision reshaping (OpenAI takes a content array with image_url, Anthropic uses type: image with source, Gemini wants inline data plus a mime type). Parameter renaming, where Mistral keeps max_tokens but Gemini insists on maxOutputTokens. Then lining up the response envelope, because every provider wraps the model output a little differently.

Copy it as a starting point, not a finished thing. The naive translation nails maybe 80 percent of cases, in my experience. That last 20 percent? It needs provider-specific tuning. Which is exactly what the warnings under each snippet are for ("Anthropic prefers prompt caching on the first 1024 tokens, consider adding cache_control"), so you know which follow-ups earn your time and which you can ignore.

Common provider gotchas the tool calls out

Anthropic system message: it is a top-level system string, not a message with role system. Your messages array has to start with user. Skip that and you will get a 400.
Gemini message format: the roles are user and model, not assistant. Trips up everyone the first time. Content lives inside a parts array, too.
Mistral tool calls: parallel tool calling only works on Large 3, last I checked. The smaller models hand tools back one at a time.
Cohere "preamble" vs "system": Cohere calls it preamble, not system, on the v2/chat endpoint. The newer messages-style v2 takes both. Old v1 does not, so check which one you are on.
Groq streaming: to save bytes, SSE chunks drop delta.role after the first one. Assume it is on every chunk and your client breaks. Ask me how I know.
Ollama JSON mode: it is format: json at the top level of the request, not response_format the way OpenAI does it.
Vision base64 size limits: OpenAI takes up to 20 MB per image, Gemini 20 MB. Anthropic caps at 5 MB, though, and that is the one that will surprise you. The tool flags any embedded image too big for where it is headed.

When to keep the OpenAI client versus when to switch SDKs

Honestly? The compatibility layer is good enough that most teams just keep the official openai Node or Python client and re-point it at a new base URL and key. That alone covers OpenAI, Groq, Together, Fireworks, Ollama. Mistral too, once you override a couple of headers. Anthropic and Gemini are where I would actually swap to the native SDK. They expose prompt caching, server-side tool use, grounded responses the OpenAI client just cannot reach. So my rule. Switch SDKs only when the feature you want is provider-native (Anthropic's caching, Gemini grounding with search). Otherwise stay put. The diff is not worth it, I think, unless you are chasing one of those.

Frequently asked questions

Does the tool make any actual API calls?

Nope. Every translation runs right in your browser. No key to enter. Nothing gets sent to any provider, and the request body you paste never leaves the page. Think of it as a deterministic translator sitting on top of the OpenAI schema. It rewrites your request. It doesn't run it.

Why do some translations show warnings?

A warning shows up when the straight translation would run fine, but the provider has a cheaper or better way to get there. A few I flag: Anthropic prompt caching when you reuse the same system message, Gemini grounding for factual queries, sane Cohere preamble lengths. The plain output is already correct. The warning is just pointing at the optimisation you would probably want anyway.

What model does each provider use in the example code?

By default it reaches for the 2026 flagship: GPT-5 turbo on OpenAI, Claude Sonnet 4.6 on Anthropic, Gemini 3.0 Pro on Google, Mistral Large 3 on Mistral, Command R+ 2026 on Cohere, Llama 4 405B via Groq, and llama3.3:70b for Ollama. Sensible defaults. Not gospel. Swap the model string for whatever you are actually running before you ship it.

Does it handle the new responses API from OpenAI?

Partly. This version is built around the classic Chat Completions schema. The Responses API that landed in late 2025 is half-supported in the OpenAI tab. Translate out to other providers, though, and it falls back to the Chat Completions path. Why? Frankly, nobody else has implemented the Responses shape yet.

Can I use this output in production?

Please do not paste it straight into prod and walk away. The snippets are a head start, not a finished build. You will still want provider-specific error handling. Retries with exponential backoff when you hit rate limits. Cost tracking, some observability. I give you the request shape. The production wiring around it stays your job.