OpenAI

Call the OpenAI API from your FlareX app — chat completions, streaming, embeddings, retries, and cost control.

Updated 2026-04-25 00:00 UTC

OpenAI's API is one HTTP call to /v1/chat/completions (or its successors). The SDK is a thin wrapper. This page covers what trips most people up: streaming, retries, token limits, and cost control.

API key

Get a key at platform.openai.com → API keys, then add it to Secrets as OPENAI_API_KEY.

For Claude, see the Anthropic walkthrough: it follows the same shape with a different env var (ANTHROPIC_API_KEY).

Pattern 1: Single completion

Smallest useful integration:

Add a /summarize endpoint. POST { text: string } → { summary: string }.
Use gpt-4o-mini. Truncate input to 8K tokens. Cap output at 200 tokens.

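import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function summarize(text: string): Promise<{ summary: string }> {
  const resp = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    max_tokens: 200, // cap output length
    messages: [
      { role: 'system', content: "Summarize the user's text in 2-3 sentences." },
      // Rough truncation: at about 4 characters per token, 32,000 characters is close to 8K tokens.
      { role: 'user', content: text.slice(0, 32_000) },
    ],
  });
  // message.content is typed as string | null, so fall back to an empty string.
  return { summary: resp.choices[0]?.message.content ?? '' };
}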

Pattern 2: Streaming

For chat-like interfaces, streaming makes the response feel instant. Use SSE end-to-end:

app.get('/chat-stream', async (req, reply) => {
  reply.raw.writeHead(200, {
    'content-type': 'text/event-stream',
    'cache-control': 'no-cache',
    'connection': 'keep-alive',
  });

  const stream = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    stream: true,
    messages: [...],
  });

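  // Abort the upstream request when the client disconnects, so generation (and billing) stops.
  reply.raw.on('close', () => stream.controller.abort());

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? '';
    if (delta) reply.raw.write(`data: ${JSON.stringify({ delta })}\n\n`);
  }
  // Only write the terminator if the connection is still open.
  if (!reply.raw.destroyed) {
    reply.raw.write('data: [DONE]\n\n');
    reply.raw.end();
  }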
});

Tell FlareX:

Add a streaming /chat endpoint. SSE response. Frame format: data: {json}\n\n.
End with data: [DONE]. On client disconnect, abort the OpenAI stream
to stop the cost meter.
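On the consuming side, the frame format described here can be decoded with a small buffer-based parser. This is a minimal sketch, assuming the server emits exactly the data: {json}\n\n frames and the data: [DONE] terminator; parseSseFrames and SseParseResult are hypothetical names, not part of any SDK.

```typescript
// Result of feeding the buffered text through the parser once.
interface SseParseResult {
  deltas: string[]; // text fragments decoded from complete frames
  done: boolean;    // true once the [DONE] sentinel has been seen
  rest: string;     // trailing partial frame to prepend to the next chunk
}

// Hypothetical helper: splits the buffer on blank lines and decodes each complete frame.
function parseSseFrames(buffer: string): SseParseResult {
  const deltas: string[] = [];
  let done = false;
  const frames = buffer.split('\n\n');
  // The last element is either '' or an incomplete frame; keep it for later.
  const rest = frames.pop() ?? '';
  for (const frame of frames) {
    if (!frame.startsWith('data: ')) continue; // skip anything that is not a data frame
    const payload = frame.slice('data: '.length);
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { delta?: unknown };
    if (typeof parsed.delta === 'string') deltas.push(parsed.delta);
  }
  return { deltas, done, rest };
}
```

Network chunks can split a frame anywhere, which is why the parser returns rest: prepend it to the next chunk before parsing again.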

Heads up

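If you forget to abort on client disconnect, the OpenAI request keeps running and you keep paying until it completes. Always wire up a close handler on the response, for example reply.raw.on('close', () => stream.controller.abort()). Prefer the response over the request here: in recent Node.js versions the request's own 'close' event can fire once the request has been read, not when the client goes away.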

Pattern 3: Tool use / function calling

For structured output, use OpenAI's tool calling rather than parsing free-text:

Add a /classify endpoint. POST { text: string } → { category, confidence }.
Use gpt-4o-mini with tool calling — declare a `submit_classification`
tool with category enum and confidence number, force tool_choice to that
tool, parse the args from the response.

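This is far more reliable than asking the model to "respond with JSON". Forcing tool_choice means the reply arrives as arguments to your tool rather than as free text, and setting strict: true on the function definition constrains those arguments to the schema instead of merely instructing the model.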
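The tool declaration and argument parsing might look like the sketch below. The category list, the ToolCallResponse type, and parseClassification are assumptions made for illustration; only the tool name submit_classification comes from the prompt above. Pass submitClassificationTool in tools and forcedToolChoice as tool_choice when creating the completion.

```typescript
// Hypothetical category list; replace with your own labels.
const CATEGORIES = ['billing', 'bug', 'feature_request', 'other'] as const;
type Category = (typeof CATEGORIES)[number];

interface Classification {
  category: Category;
  confidence: number;
}

// Tool definition in the shape the Chat Completions `tools` parameter expects.
const submitClassificationTool = {
  type: 'function' as const,
  function: {
    name: 'submit_classification',
    description: 'Record the category and confidence for the input text.',
    parameters: {
      type: 'object',
      properties: {
        category: { type: 'string', enum: [...CATEGORIES] },
        confidence: { type: 'number' },
      },
      required: ['category', 'confidence'],
      additionalProperties: false,
    },
  },
};

// Forces the model to call the tool instead of replying with free text.
const forcedToolChoice = {
  type: 'function' as const,
  function: { name: 'submit_classification' },
};

// Minimal structural type covering only the response fields read below.
interface ToolCallResponse {
  choices: Array<{
    message: { tool_calls?: Array<{ function: { name: string; arguments: string } }> };
  }>;
}

function isCategory(value: unknown): value is Category {
  return typeof value === 'string' && (CATEGORIES as readonly string[]).includes(value);
}

// Extracts and validates the tool arguments; throws if the reply is not usable.
function parseClassification(resp: ToolCallResponse): Classification {
  const call = resp.choices[0]?.message.tool_calls?.[0];
  if (!call || call.function.name !== 'submit_classification') {
    throw new Error('Model did not call submit_classification');
  }
  const args = JSON.parse(call.function.arguments) as { category?: unknown; confidence?: unknown };
  const { category, confidence } = args;
  if (!isCategory(category)) throw new Error('Unknown category');
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    throw new Error('Confidence must be a number between 0 and 1');
  }
  return { category, confidence };
}
```

Validating the parsed arguments is a deliberate choice: the arguments arrive as a JSON string, so a range check on confidence still has to happen in your code.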

Cost control

1. Start with the smallest model

gpt-4o-mini is more than 10× cheaper per token than gpt-4o and good enough for most jobs (classification, summarization, simple drafting). Don't reach for the flagship model unless the smaller one is actually failing.

2. Cap `max_tokens`

The output cost is per-token. If your users only need a paragraph, cap at 200 — not 4096.
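The effect of the cap is simple arithmetic, sketched below. The price is a placeholder chosen for illustration, not a real rate, and worstCaseOutputCostUsd is a hypothetical helper; look up the current price for your model.

```typescript
// Upper bound on what a single response can cost in output tokens.
function worstCaseOutputCostUsd(maxTokens: number, usdPerMillionOutputTokens: number): number {
  return (maxTokens / 1_000_000) * usdPerMillionOutputTokens;
}

// PLACEHOLDER price for illustration only; not an actual OpenAI rate.
const EXAMPLE_USD_PER_MILLION = 1.0;

const cappedCost = worstCaseOutputCostUsd(200, EXAMPLE_USD_PER_MILLION);
const uncappedCost = worstCaseOutputCostUsd(4096, EXAMPLE_USD_PER_MILLION);

// The ratio does not depend on the price: 4096 / 200 is about 20.5.
const worstCaseRatio = uncappedCost / cappedCost;
```

The cap bounds the worst case per request; typical responses may finish well under either limit.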

3. Cache aggressively

Identical input can reuse a previous output, so cache by content hash:

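import crypto from 'node:crypto';

// Stable cache key: SHA-256 of the serialized messages.
function hashKey(messages: unknown[]): string {
  return crypto.createHash('sha256').update(JSON.stringify(messages)).digest('hex');
}

// `redis` (an ioredis-style client) and `openaiCall` are assumed to be defined elsewhere in your app.
async function cachedCompletion(messages: unknown[]) {
  const key = `openai:${hashKey(messages)}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit: no API call, no cost

  const fresh = await openaiCall(messages);
  await redis.setex(key, 3600, JSON.stringify(fresh)); // TTL is in seconds (1 hour here)
  return fresh;
}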

Tell FlareX:

Cache /summarize responses by SHA-256 of the input text + model id.
TTL 24h. Skip cache if request has ?fresh=1.
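The TTL and ?fresh=1 behavior can be sketched without Redis. TtlCache and cachedOrFresh are hypothetical names, and the in-memory Map stands in for whatever store your app uses. One assumption to note: a fresh request skips the cache read but still stores its result.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for Redis, with an injectable clock so expiry is testable.
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;

// `fresh` mirrors the ?fresh=1 flag: skip the read, but still refresh the stored value.
async function cachedOrFresh<V>(
  cache: TtlCache<V>,
  key: string,
  fresh: boolean,
  compute: () => Promise<V>,
): Promise<V> {
  if (!fresh) {
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
  }
  const value = await compute();
  cache.set(key, value);
  return value;
}
```

A route handler would build the key from the input hash and model id, then call cachedOrFresh(cache, key, query.fresh === '1', () => summarize(text)), where summarize is your own model call.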

Retries + rate limits

OpenAI returns 429 with a Retry-After header when you're throttled. The SDK retries 2× by default with exponential backoff — bump it for production:

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 5,
  timeout: 60_000,
});
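If you call the endpoint with plain HTTP instead of the SDK, you have to compute the wait yourself. The sketch below shows one way to honor Retry-After and fall back to capped exponential backoff with jitter. The function name, base delay, cap, and jitter factor are all assumptions for illustration, not the SDK's internal values.

```typescript
// Delay in milliseconds before retry number `attempt` (0-based).
// Prefers the server's Retry-After header; otherwise uses capped exponential backoff with jitter.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  random: () => number = Math.random,
  nowMs: number = Date.now(),
): number {
  if (retryAfter !== null) {
    const seconds = Number(retryAfter);
    if (retryAfter.trim() !== '' && Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000; // header given as a number of seconds
    }
    const dateMs = Date.parse(retryAfter); // header may also be an HTTP date
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  const baseMs = 500;  // assumed starting delay
  const capMs = 8_000; // assumed upper bound
  const backoff = Math.min(capMs, baseMs * 2 ** attempt);
  // Subtract up to 25% so concurrent clients do not retry in lockstep.
  return backoff * (1 - 0.25 * random());
}
```

The random and nowMs parameters exist only so the function is deterministic in tests; production callers can omit them.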

For sustained throughput beyond your default rate limit, request a higher tier in the OpenAI dashboard.

Token counting + truncation

Models have hard input limits (gpt-4o-mini: ~128K). For long documents, you must chunk + summarize, then summarize the summaries. Tell FlareX:

For inputs over 100K tokens, chunk into 50K-token slices, summarize
each chunk in parallel, then summarize the chunk-summaries. Use
tiktoken for accurate token counting.
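The chunk-then-summarize flow could be sketched as follows. The encode, decode, and summarize dependencies are hypothetical and injected, so the sketch does not assume a particular tokenizer API; in practice encode and decode would be backed by tiktoken and summarize by a model call.

```typescript
// Splits an already-tokenized document into slices of at most `chunkSize` tokens.
function chunkTokens<T>(tokens: readonly T[], chunkSize: number): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error('chunkSize must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < tokens.length; i += chunkSize) {
    chunks.push(tokens.slice(i, i + chunkSize));
  }
  return chunks;
}

// Hypothetical dependencies, injected so the sketch stays library-agnostic.
interface SummarizerDeps {
  encode(text: string): number[];           // text to token ids
  decode(tokens: number[]): string;         // token ids back to text
  summarize(text: string): Promise<string>; // one model call
}

// Summarizes directly when the input fits; otherwise summarizes chunks in parallel,
// then summarizes the joined chunk summaries.
async function summarizeLong(
  text: string,
  deps: SummarizerDeps,
  maxInputTokens = 100_000,
  chunkSize = 50_000,
): Promise<string> {
  const tokens = deps.encode(text);
  if (tokens.length <= maxInputTokens) return deps.summarize(text);
  const chunks = chunkTokens(tokens, chunkSize);
  const partials = await Promise.all(chunks.map((chunk) => deps.summarize(deps.decode(chunk))));
  return deps.summarize(partials.join('\n\n'));
}
```

Running every chunk in parallel is the fastest option but can hit rate limits on very long inputs, so you may want to bound the concurrency.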

Errors you'll see

Status	Meaning	What to do
401	Invalid key	Check Secrets. Keys can be revoked and may have been rotated
429	Rate limited or out of credits	Honor `Retry-After`. If you are out of credits, top up or switch to a cheaper model
500	OpenAI hiccup	Retry with backoff (SDK does this automatically)
503	Overloaded	Same as 500; usually transient
400 (`context_length_exceeded`)	Input too long	Truncate or chunk
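The table can be turned into a small decision function. This is a sketch: ErrorAction and actionForError are hypothetical names, and the insufficient_quota code used to tell exhausted credits from throttling is an assumption to verify against the error body you actually receive.

```typescript
type ErrorAction = 'check_key' | 'retry' | 'top_up' | 'shrink_input' | 'fail';

// `code` is the error code from the response body, when one is present.
function actionForError(status: number, code?: string): ErrorAction {
  if (code === 'context_length_exceeded') return 'shrink_input';
  if (status === 401) return 'check_key';
  if (status === 429) {
    // Exhausted credits will not recover by waiting, so do not retry them.
    return code === 'insufficient_quota' ? 'top_up' : 'retry';
  }
  if (status >= 500) return 'retry';
  return 'fail';
}
```

Separating the two kinds of 429 matters because retrying a quota error only delays the failure.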

What's next

3rd-party APIs overview — fetch + retry + cache patterns
Webhooks — for async OpenAI workflows (file uploads, fine-tuning)
Build an API service — wrap an LLM as your own API
