# OpenAI API

Use Basepod's LLM with any OpenAI-compatible client.

## API Endpoint

When running an MLX model, the API is available at:
- Local: http://localhost:11434/v1
- External: https://llm.your-domain.com/v1
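
To confirm the server is reachable before wiring up a client, you can hit the models endpoint described further below. A minimal sketch, assuming Python with the `requests` package installed:

```python
import requests

# Hypothetical connectivity check: list the models the server is currently serving.
resp = requests.get("http://localhost:11434/v1/models", timeout=5)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])
```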

## Chat Completions

### Request

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'
```

### Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705555555,
  "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 10,
    "total_tokens": 25
  }
}
```

## Using with SDKs

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"  # No auth required locally
)

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=500
)

print(response.choices[0].message.content)
```

### JavaScript/TypeScript

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'not-needed'
});

const response = await client.chat.completions.create({
  model: 'mlx-community/Llama-3.2-3B-Instruct-4bit',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Write a haiku about programming.' }
  ]
});

console.log(response.choices[0].message.content);
```

### cURL

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Streaming

Enable streaming for real-time responses:

### Python

```python
stream = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Write a story."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### JavaScript

```typescript
const stream = await client.chat.completions.create({
  model: 'mlx-community/Llama-3.2-3B-Instruct-4bit',
  messages: [{ role: 'user', content: 'Write a story.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Model ID |
| `messages` | array | required | Conversation history |
| `max_tokens` | int | 1000 | Max response length |
| `temperature` | float | 0.7 | Randomness (0-2) |
| `top_p` | float | 1.0 | Nucleus sampling |
| `stream` | bool | false | Enable streaming |
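
Any of these can be overridden per request. A short illustration reusing the Python `client` from the SDK example above (the values are arbitrary, not recommendations):

```python
# Illustrative overrides of the defaults listed in the table above.
response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Name three uses for a local LLM."}],
    max_tokens=200,    # cap the response length
    temperature=0.2,   # lower values give more deterministic output
    top_p=0.9,         # nucleus sampling cutoff
    stream=False,      # set True to stream tokens as they are generated
)
print(response.choices[0].message.content)
```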

## Available Models

Get the list of available models:
```bash
curl http://localhost:11434/v1/models
```

Response:

```json
{
  "data": [
    {
      "id": "mlx-community/Llama-3.2-3B-Instruct-4bit",
      "object": "model",
      "owned_by": "mlx-community"
    }
  ]
}
```
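
The same information is available through the OpenAI SDK. A small sketch reusing the Python `client` configured earlier:

```python
# List model IDs via the SDK instead of raw curl.
for model in client.models.list().data:
    print(model.id)
```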

## Integration Examples

### LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed",
    model="mlx-community/Llama-3.2-3B-Instruct-4bit"
)

response = llm.invoke("Hello!")
print(response.content)
```

### LlamaIndex

```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="http://localhost:11434/v1",
    api_key="not-needed",
    model="mlx-community/Llama-3.2-3B-Instruct-4bit"
)
```
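
The snippet above only constructs the client. A minimal usage sketch follows; `is_chat_model=True` is an assumption about the Basepod server, needed only if it exposes just the chat completions route:

```python
# Hypothetical usage of the `llm` configured above.
# If the server only implements /v1/chat/completions, construct
# OpenAILike(..., is_chat_model=True) so complete() is routed through chat.
response = llm.complete("Hello!")
print(response.text)
```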

### Vercel AI SDK

```typescript
import { createOpenAI } from '@ai-sdk/openai';

const openai = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'not-needed'
});

const model = openai('mlx-community/Llama-3.2-3B-Instruct-4bit');
```