# OpenAI API

Use Basepod's LLM with any OpenAI-compatible client.

## API Endpoint

When running an MLX model, the API is available at:
- Local: http://localhost:11434/v1
- External: https://llm.your-domain.com/v1
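
To confirm the server is reachable before wiring up a client, you can hit the models endpoint described further below. A minimal sketch, assuming Python with the `requests` package installed:

```python
import requests

# Hypothetical connectivity check: list the models the server is currently serving.
resp = requests.get("http://localhost:11434/v1/models", timeout=5)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])
```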

## Chat Completions

### Request

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'
```

### Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705555555,
  "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 10,
    "total_tokens": 25
  }
}
```

## Using with SDKs

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"  # No auth required locally
)

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=500
)

print(response.choices[0].message.content)
```

### JavaScript/TypeScript

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'not-needed'
});

const response = await client.chat.completions.create({
  model: 'mlx-community/Llama-3.2-3B-Instruct-4bit',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Write a haiku about programming.' }
  ]
});

console.log(response.choices[0].message.content);
```

### cURL

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Streaming

Enable streaming for real-time responses:

### Python

```python
stream = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Write a story."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### JavaScript

```typescript
const stream = await client.chat.completions.create({
  model: 'mlx-community/Llama-3.2-3B-Instruct-4bit',
  messages: [{ role: 'user', content: 'Write a story.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Model ID |
| `messages` | array | required | Conversation history |
| `max_tokens` | int | 1000 | Max response length |
| `temperature` | float | 0.7 | Randomness (0-2) |
| `top_p` | float | 1.0 | Nucleus sampling |
| `stream` | bool | false | Enable streaming |
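
Any of these can be overridden per request. A short illustration reusing the Python `client` from the SDK example above (the values are arbitrary, not recommendations):

```python
# Illustrative overrides of the defaults listed in the table above.
response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Name three uses for a local LLM."}],
    max_tokens=200,    # cap the response length
    temperature=0.2,   # lower values give more deterministic output
    top_p=0.9,         # nucleus sampling cutoff
    stream=False,      # set True to stream tokens as they are generated
)
print(response.choices[0].message.content)
```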

## Available Models

Get the list of available models:
```bash
curl http://localhost:11434/v1/models
```

Response:

```json
{
  "data": [
    {
      "id": "mlx-community/Llama-3.2-3B-Instruct-4bit",
      "object": "model",
      "owned_by": "mlx-community"
    }
  ]
}
```
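
The same information is available through the OpenAI SDK. A small sketch reusing the Python `client` configured earlier:

```python
# List model IDs via the SDK instead of raw curl.
for model in client.models.list().data:
    print(model.id)
```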

## Integration Examples

### LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed",
    model="mlx-community/Llama-3.2-3B-Instruct-4bit"
)

response = llm.invoke("Hello!")
print(response.content)
```

### LlamaIndex

```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="http://localhost:11434/v1",
    api_key="not-needed",
    model="mlx-community/Llama-3.2-3B-Instruct-4bit"
)
```
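
The snippet above only constructs the client. A minimal usage sketch follows; `is_chat_model=True` is an assumption about the Basepod server, needed only if it exposes just the chat completions route:

```python
# Hypothetical usage of the `llm` configured above.
# If the server only implements /v1/chat/completions, construct
# OpenAILike(..., is_chat_model=True) so complete() is routed through chat.
response = llm.complete("Hello!")
print(response.text)
```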

### Vercel AI SDK

```typescript
import { createOpenAI } from '@ai-sdk/openai';

const openai = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'not-needed'
});

const model = openai('mlx-community/Llama-3.2-3B-Instruct-4bit');
```