Manage LLM Providers

Pinchy is model-agnostic. You bring the API key (or local URL) for whichever LLM provider you trust, and every agent picks a model from the providers you’ve enabled. This guide covers what to do after the initial setup wizard — adding more providers, switching models, removing keys.

To configure the very first provider during the setup wizard, see Installation. For the air-gapped local Ollama setup, see the dedicated Ollama (Local) guide.

Supported providers

Provider	Auth	What you get
Anthropic	API key	Claude family — strongest tool calling and reasoning
OpenAI	API key	GPT-4o family, o-series reasoning models
Google	API key	Gemini family — long context, fast and cheap
Ollama Cloud	API key	Hosted open-source models (Kimi, Qwen, Mistral, Gemini Flash) via ollama.com
Ollama (Local)	URL	Fully air-gapped local inference — see Ollama setup

You can have any combination of providers configured at the same time. Each agent picks one model from one provider.

Add a provider

Go to Settings → LLM Provider
Click the provider you want to add
Paste your API key (or for local Ollama, enter the URL)
Click Save

Pinchy validates the credentials immediately by making a test call to the provider’s /models endpoint. If the key works, the provider activates and its models become available in every agent’s model dropdown within a few seconds.
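If you want to check a key by hand before pasting it in, a similar request can be built yourself. The sketch below is not Pinchy's validation code: `build_models_request` is a hypothetical helper, and the URLs and headers are assumptions taken from the OpenAI and Anthropic public API references. It only builds the request and does not send it.

```python
import urllib.request

# Assumed model-listing endpoints from each provider's public API reference.
# Pinchy's own validation may call different URLs.
MODELS_ENDPOINTS = {
    "openai": "https://api.openai.com/v1/models",
    "anthropic": "https://api.anthropic.com/v1/models",
}


def build_models_request(provider: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists a provider's models."""
    url = MODELS_ENDPOINTS[provider]
    if provider == "anthropic":
        # Anthropic expects the key in x-api-key plus an API version header.
        headers = {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    else:
        # OpenAI-style bearer authentication.
        headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the result to `urllib.request.urlopen(request, timeout=10)` sends it. A 200 response with a model list suggests the key is usable, while an `HTTPError` with status 401 usually means the provider rejected the key.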

API keys are encrypted at rest with AES-256-GCM. They never appear in logs, audit events, or error messages.

Change an agent’s model

Each agent uses one model at a time. To change it:

Open the agent’s chat
Click the gear icon next to the agent name → General
Pick a new model from the Model dropdown
Click Save

The dropdown shows every model from every configured provider. Models are grouped by provider, so you can quickly compare options.

Switch the default provider

The “default provider” is the one Pinchy reaches for when creating new agents. You can change it at Settings → LLM Provider by clicking Set as default on any configured provider. Existing agents keep their current model — only newly created agents pick up the new default.

Remove a provider

Go to Settings → LLM Provider
Find the provider in the list
Click Remove

If any agent currently uses a model from the removed provider, the chat will fail to start until you assign that agent a model from a still-configured provider. Pinchy will not silently re-assign agents.
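Before removing a provider, it can help to list the agents that would be left without a working model. A minimal sketch, assuming you have an agent-to-model mapping at hand and that model identifiers follow the `provider/model` form used elsewhere in this guide; the agent names and the `agents_using_provider` helper are hypothetical.

```python
def agents_using_provider(agents: dict[str, str], provider: str) -> list[str]:
    """Return the names of agents whose model ID starts with '<provider>/'."""
    # The trailing slash keeps "ollama" from matching "ollama-cloud/...".
    prefix = provider + "/"
    return sorted(name for name, model in agents.items() if model.startswith(prefix))


# Hypothetical agents, using model IDs that appear in this guide.
agents = {
    "Support Bot": "anthropic/claude-sonnet-4.5",
    "Research": "ollama-cloud/deepseek-v4-pro",
    "Drafts": "ollama-cloud/kimi-k2-thinking",
}

affected = agents_using_provider(agents, "ollama-cloud")
print(affected)
```

Every agent in the returned list needs a new model assigned before its chat will start again.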

How costs are tracked

Tokens used through every provider are recorded in the Usage Dashboard at /usage. Cost is estimated using the per-model prices baked into Pinchy’s model config — provider invoices remain the source of truth. Local Ollama records token counts but always shows zero cost.
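The estimate boils down to token counts multiplied by a per-model price. The sketch below illustrates that arithmetic only: the prices, the model names, and the split into input and output rates per million tokens are assumptions, not values from Pinchy's model config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_million: float   # USD per 1M input tokens (assumed unit)
    output_per_million: float  # USD per 1M output tokens (assumed unit)


# Hypothetical prices, for illustration only.
PRICES = {
    "example-cloud/model-a": ModelPrice(3.00, 15.00),
    "ollama/local-model": ModelPrice(0.0, 0.0),  # local inference: zero cost
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one model's recorded token usage."""
    price = PRICES[model]
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000
```

With these made-up prices, 200,000 input tokens and 50,000 output tokens on `example-cloud/model-a` come to 0.60 + 0.75 = 1.35 USD, while the local model reports 0 regardless of volume.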

Troubleshooting

When a provider returns an error, Pinchy shows it directly in the chat as a distinct error card with the agent name and the provider’s error message. Admins see a hint pointing to Settings → LLM Provider; non-admin users see a prompt to contact their administrator. For transient errors (rate limits, timeouts), the card suggests trying again.

Model-unavailable errors

When the provider returns a server error (HTTP 5xx) that Pinchy recognises as a model availability problem — for example, the model has been discontinued or is temporarily offline — the chat shows a structured model-unavailable bubble instead of a raw error message. The bubble includes:

The agent name and the model identifier that failed
A short plain-English explanation of what happened
A collapsible section with the raw technical details if you need them for support
A Switch model → link that takes you directly to the agent’s model settings so you can pick a replacement in one click

This avoids having to hunt through menus when a model goes down.

Upstream schema/format errors (`thought_signature`)

When the chat surfaces a transient upstream issue bubble mentioning thought_signature, the upstream provider has rejected the request because of a known schema defect — most commonly Gemini 3 dropping its required thought_signature field on a tool-call replay turn (upstream openclaw/openclaw#72879, tracked in Pinchy as #338). The model itself is healthy; the next replay of the same message usually succeeds.

What to do:

Click Retry on the failed message. The vast majority of these clear on the first retry.
If retries fail repeatedly for the same agent on the same model, the upstream issue is currently more severe than usual. Open the audit page filtered by eventType=agent.upstream_format_error to confirm the frequency, then either pause that agent or switch it to a different model family (e.g. anthropic/claude-sonnet-4.5 or ollama-cloud/deepseek-v4-pro) until upstream stabilises. For a fleet-wide picture across every error shape (not just upstream_format_error), filter by eventType=chat.agent_error and group the results by detail->>'errorClass' — that umbrella event fires unconditionally for every error chunk plus the silent-stream timeout.
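The fleet-wide grouping can also be reproduced offline. A minimal sketch, assuming the audit events have been exported as a list of JSON-like objects with `eventType` and a nested `detail.errorClass` field; the export shape, the `count_by_error_class` helper, and the sample `errorClass` values are hypothetical.

```python
from collections import Counter


def count_by_error_class(events: list[dict]) -> Counter:
    """Count chat.agent_error events per detail.errorClass."""
    return Counter(
        event.get("detail", {}).get("errorClass", "unknown")
        for event in events
        if event.get("eventType") == "chat.agent_error"
    )


# Hypothetical exported events.
events = [
    {"eventType": "chat.agent_error", "detail": {"errorClass": "upstream_format_error"}},
    {"eventType": "chat.agent_error", "detail": {"errorClass": "upstream_format_error"}},
    {"eventType": "chat.agent_error", "detail": {"errorClass": "rate_limit"}},
    {"eventType": "chat.agent_error", "detail": {}},
    {"eventType": "agent.upstream_format_error", "detail": {}},
]

counts = count_by_error_class(events)
print(counts.most_common())
```

A class that dominates the counts for one agent or one model family is the signal to pause that agent or switch its model.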

Pinchy will ship a fix for this once OpenClaw closes the matching OpenAI-compat path bug (upstream openclaw/openclaw#34008). The native Google provider path is already fixed in OpenClaw 2026.5.18.

Removed model: `ollama-cloud/kimi-k2-thinking`

ollama-cloud/kimi-k2-thinking has been removed from Pinchy’s supported model list. Agents that used this model will show the model-unavailable bubble the next time they receive a message. To resolve it:

Click Switch model → in the error bubble, or open the agent’s settings manually (gear icon → General → Model).
Select a replacement — ollama-cloud/deepseek-v4-pro is a good like-for-like option for reasoning-heavy workloads.
Click Save. The agent is ready immediately.

“Invalid API key” — Double-check the key with the provider’s own dashboard. Anthropic keys start with sk-ant-, OpenAI keys with sk-, Google keys are typically AIza....
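A quick prefix check catches the common mistake of pasting a key into the wrong provider's form. The sketch below uses only the prefixes listed above; `guess_provider` is a hypothetical helper, and a matching prefix says nothing about whether the key is still valid.

```python
from typing import Optional


def guess_provider(api_key: str) -> Optional[str]:
    """Guess which provider a key belongs to from its prefix, or None."""
    key = api_key.strip()  # stray whitespace from copy and paste is common
    # Test "sk-ant-" before "sk-": Anthropic keys also match the shorter prefix.
    if key.startswith("sk-ant-"):
        return "anthropic"
    if key.startswith("sk-"):
        return "openai"
    if key.startswith("AIza"):
        return "google"
    return None
```

A `None` result does not prove the key is wrong, since Google keys only typically start with `AIza` and Ollama Cloud keys are not covered here.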

“Your credit balance is too low” — The provider account has run out of credits. Top up on the provider’s billing page.

“Rate limit exceeded” — Too many requests in a short window. Wait a moment and try again. If this happens often, check your plan’s rate limits on the provider’s dashboard.

“Could not reach the provider” — Network problem between your Pinchy instance and the provider. If you’re running behind a strict firewall, allowlist the provider’s API hostname.

“No compatible models found” — The provider responded but none of its models support tool calling. For Ollama-local, pull a tool-capable model like qwen3.5:9b. For cloud providers, this should not happen — file an issue if it does.

The model dropdown is empty after adding a key — Pinchy caches the model list for one hour for cloud providers. Try waiting a minute, or remove and re-add the provider to force a refresh. Local Ollama is always fetched live.
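The behaviour described here is that of a time-to-live cache. The sketch below shows the general technique, not Pinchy's implementation: `ModelListCache`, its `fetch` callback, and the idea that re-adding a provider drops the cached entry are assumptions made for illustration.

```python
import time
from typing import Callable


class ModelListCache:
    """Caches each provider's model list for a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], list[str]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def get(self, provider: str) -> list[str]:
        """Return the cached list while it is fresh, otherwise fetch again."""
        now = self._clock()
        entry = self._entries.get(provider)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        models = self._fetch(provider)
        self._entries[provider] = (now, models)
        return models

    def invalidate(self, provider: str) -> None:
        """Drop the cached list so the next get() fetches a fresh one."""
        self._entries.pop(provider, None)
```

In this model, an empty list cached right after adding a key stays empty until the entry expires or is invalidated, which is why a forced refresh can help when waiting does not.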

Ollama Cloud returns HTTP 500: "Internal Server Error (ref: …)" on every retry — Ollama Cloud occasionally retires a model from its serving fleet without surfacing a model_not_found error or a deprecation header. The model can still appear on ollama.com/library/<model> and pull fine for self-hosted Ollama, but the hosted ollama.com/v1 endpoint no longer routes to it, so every request to that one model ends in a generic upstream 500 with a fresh reference ID. We first saw this with kimi-k2-thinking on 2026-05-08; consecutive retries each produced different upstream reference IDs, all 500. If retries fail consistently and other Ollama Cloud models on the same key still work, treat the failing model as unavailable and switch the agent to another Ollama Cloud model (for example Kimi K2.5, Kimi K2.6, or DeepSeek V4). If every Ollama Cloud model fails, the issue is transient — wait a moment and retry.
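The decision rule in the previous paragraph can be written down as a small triage function. This is a sketch of the reasoning only: `triage_ollama_cloud_500` is a hypothetical helper, and the boolean inputs stand for results you collected by hand (True means the request succeeded).

```python
def triage_ollama_cloud_500(retry_results: list[bool],
                            other_model_results: list[bool]) -> str:
    """Classify repeated HTTP 500s from one Ollama Cloud model.

    retry_results: outcomes of retries against the failing model.
    other_model_results: outcomes of requests to other Ollama Cloud
    models made with the same key.
    """
    if any(retry_results):
        return "ok"  # a retry went through, nothing to change
    if not other_model_results:
        return "inconclusive"  # no comparison data collected yet
    if any(other_model_results):
        # The key and the service work, only this model is not being served.
        return "model-unavailable"
    return "transient"  # every model failed: wait a moment and retry
```

For example, three failed retries combined with two working sibling models yields `model-unavailable`, which is the case where switching the agent to another Ollama Cloud model is the right move.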