Deploy and Call a Yamify Inference API

Inference endpoints are attached to your Yamify workspace and managed alongside your deployed apps.

Yamify can create an inference endpoint inside your workspace and expose it through an OpenAI-compatible chat completions API.

This is useful when you want to power:

AI features in your own app
client-facing copilots
n8n workflows
OpenClaw-backed automations

What you get

When you deploy an inference API, Yamify returns:

a projectId
a workspace association
an endpoint URL
a bearer token for calling that endpoint

The endpoint follows this pattern:

POST /api/v1/inference/{projectId}/chat/completions
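As a minimal sketch, the path pattern above can be assembled from a base URL and a projectId. The host `https://yamify.example` is a placeholder, not a documented Yamify address, and the helper name `inference_url` is hypothetical; use the endpoint URL Yamify returns when it is available.

```python
from urllib.parse import quote


def inference_url(base_url: str, project_id: str) -> str:
    """Build the chat completions URL for one inference project."""
    # Percent-encode the project ID so unexpected characters cannot alter the path.
    safe_id = quote(project_id, safe="")
    return f"{base_url.rstrip('/')}/api/v1/inference/{safe_id}/chat/completions"


# Placeholder host and project ID, for illustration only.
print(inference_url("https://yamify.example", "proj_123"))
```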

Supported providers

Current provider options are:

openai
openrouter
groq
deepseek

Typical flow

1. Choose a workspace

The workspace must already have a Yam available. Yamify will attach the inference endpoint to that workspace.

2. Create the endpoint

You create the endpoint with:

a project name
a provider
a model
a provider API key
optional temperature
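The inputs listed above can be checked before you submit them. The sketch below is only an illustration: this page does not document a creation API, so the class name `EndpointSpec` and its field names are assumptions. The provider list is the one given under Supported providers.

```python
from dataclasses import dataclass
from typing import Optional

# Provider options listed on this page.
SUPPORTED_PROVIDERS = {"openai", "openrouter", "groq", "deepseek"}


@dataclass
class EndpointSpec:
    """Hypothetical container for the values needed to create an endpoint."""
    project_name: str
    provider: str
    model: str
    provider_api_key: str
    temperature: Optional[float] = None  # optional generation setting

    def validate(self) -> None:
        """Raise ValueError if a required value is missing or unsupported."""
        if self.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"Unsupported provider: {self.provider!r}")
        for name in ("project_name", "model", "provider_api_key"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


spec = EndpointSpec("support-brain", "openai", "gpt-4o-mini", "provider-key-placeholder", 0.2)
spec.validate()  # passes silently when every value is acceptable
```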

3. Store the returned token safely

The returned bearer token is what your app uses to call the endpoint. Treat it like an app secret.
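One common way to treat the token as a secret is to read it from an environment variable instead of hard-coding it. In this sketch the variable name `YAMIFY_INFERENCE_TOKEN` is an assumption, and the `Authorization: Bearer` header is the conventional format for bearer tokens rather than something this page spells out.

```python
import os


def load_token(env_var: str = "YAMIFY_INFERENCE_TOKEN") -> str:
    """Read the bearer token from the environment and fail loudly if it is absent."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set")
    return token


def auth_headers(token: str) -> dict:
    """Headers for a JSON request authenticated with a bearer token."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Keeping the token out of source code also makes it easier to rotate without redeploying.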

4. Call the endpoint like an OpenAI chat API

Your product or workflow sends a JSON body containing the messages, the model, and any optional generation settings such as temperature.

Example request shape

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "Summarize this customer ticket."
    }
  ],
  "temperature": 0.2
}
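The request body above could be sent from Python using only the standard library, roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the URL and token are whatever Yamify returned for your project, and the response is assumed to be JSON because the endpoint is described as OpenAI-compatible.

```python
import json
import urllib.request


def build_chat_request(url: str, token: str, prompt: str,
                       model: str = "gpt-4o-mini",
                       temperature: float = 0.2) -> urllib.request.Request:
    """Prepare a POST request carrying the example body shown above."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the body easy to inspect and test without making a network call.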

Common use cases

Frontend apps built in Lovable or Cursor

Use Yamify as the inference backend while your frontend lives elsewhere.

OpenClaw and n8n automations

Centralize model access behind one endpoint and reuse it across workflows.

Multi-client agencies

Create one inference endpoint per client workspace so credentials and runtime logic stay isolated.
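On the application side, per-client isolation might look like a small registry that maps each client to its own project and its own token variable. Every name below (the clients, project IDs, and environment variable names) is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientEndpoint:
    project_id: str      # projectId returned when the endpoint was created
    token_env_var: str   # environment variable holding that endpoint's token


# Placeholder entries; one endpoint per client workspace.
CLIENT_ENDPOINTS = {
    "acme": ClientEndpoint("acme-project-id", "YAMIFY_TOKEN_ACME"),
    "globex": ClientEndpoint("globex-project-id", "YAMIFY_TOKEN_GLOBEX"),
}


def endpoint_for(client: str) -> ClientEndpoint:
    """Look up the endpoint for a client, never falling back to another client's."""
    try:
        return CLIENT_ENDPOINTS[client]
    except KeyError:
        raise KeyError(f"No inference endpoint registered for client {client!r}") from None
```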

Best practices

Use one workspace per client or environment when isolation matters
Keep provider keys scoped to the endpoint that needs them
Use descriptive endpoint names like support-brain, lead-score, or ops-copilot
Rotate tokens if they are exposed in logs or demos
Keep latency-sensitive traffic on the inference endpoint, not on MCP

Failure modes to watch

Workspace not found: the workspace ID is wrong or not yours
No Yam configured: the workspace exists, but Yam creation is incomplete
Inference project is not ready: the deployment exists but is not yet in the ready state
Unauthorized: the bearer token is missing or invalid
Provider configuration missing: the provider key or model was not stored properly
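A caller could translate these failures into next steps with a small lookup. This page does not say how the errors are delivered (HTTP status, JSON field, or plain text), so matching on the message text is an assumption, and the action names are illustrative.

```python
# Messages come from the list above; the suggested actions are illustrative.
FAILURE_ACTIONS = {
    "Workspace not found": "check_workspace_id",
    "No Yam configured": "finish_yam_setup",
    "Inference project is not ready": "retry_later",
    "Unauthorized": "check_token",
    "Provider configuration missing": "check_provider_config",
}


def classify_failure(message: str) -> str:
    """Return a suggested action for an error message, or 'unknown'."""
    lowered = message.lower()
    for known, action in FAILURE_ACTIONS.items():
        if known.lower() in lowered:
            return action
    return "unknown"
```

Of the five, only the not-ready case is likely to resolve by waiting; the others point to configuration or credentials that need fixing.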

Validation checklist

After you deploy:

Confirm the project appears in the Yam application list.
Confirm the endpoint URL is present.
Send a test prompt and verify you receive a provider response.
Add application-side retries and timeout handling before shipping to users.
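Application-side retries can be sketched as a wrapper with exponential backoff. The attempt count, the delays, and the choice of which exceptions are worth retrying are all assumptions to adjust for your own client; authentication and configuration errors should generally not be retried.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Pair this with a per-request timeout so a stalled connection fails quickly enough to be retried.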

When to use this instead of OpenClaw

Use the Inference API when you want:

a programmable model backend
direct app-to-model requests
standard API integration from code

Use OpenClaw when you want:

an interactive agent UI
human-facing control workflows
skill-pack-based agent behavior
