ChatGPT API - Integrating AI in Web Applications#

The OpenAI API is one of the most powerful tools available to developers looking to enrich their applications with artificial intelligence capabilities. From text generation and document analysis to building advanced chatbots, the ChatGPT API opens the door to a new generation of web applications. In this guide, we will cover all the key aspects of integration, from the basics to advanced optimization techniques.

OpenAI API Overview#

OpenAI provides a RESTful API that allows interaction with language models from any programming language. The key components of the ecosystem include:

Chat Completions API - the primary endpoint for conversation and text generation
Embeddings API - generating vector representations of text
Moderation API - content filtering for safety
Assistants API - advanced assistants with memory and tools
Images API (DALL-E) - image generation and editing
Audio API (Whisper, TTS) - speech transcription and voice synthesis

Installation and Configuration#

Let's start by installing the official Node.js SDK:

npm install openai

Client configuration:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

You can obtain your API key from the platform.openai.com dashboard. Never embed the key directly in your code - always use environment variables.

GPT-4 and GPT-3.5 Models#

OpenAI offers several models that differ in capabilities, speed, and cost:

GPT-4 Turbo (gpt-4-turbo)#

Latest version of GPT-4 with a 128K token context window
Knowledge up to April 2024
Excellent for complex tasks: code analysis, multi-step reasoning, long text generation
Cost: ~$10/1M input tokens, ~$30/1M output tokens

GPT-4o (gpt-4o)#

Optimized version of GPT-4 - faster and cheaper
Multimodal - supports text, images, and audio
128K token context window
Cost: ~$2.50/1M input tokens, ~$10/1M output tokens

GPT-3.5 Turbo (gpt-3.5-turbo)#

Fast and economical model for simpler tasks
16K token context window
Ideal for: classification, summarization, simple chatbots
Cost: ~$0.50/1M input tokens, ~$1.50/1M output tokens

// Model selection based on task complexity
const model = taskComplexity === "high"
  ? "gpt-4-turbo"
  : taskComplexity === "medium"
    ? "gpt-4o"
    : "gpt-3.5-turbo";

Chat Completions API - Fundamentals#

The Chat Completions API is the main endpoint for interacting with GPT models. Communication happens through an array of messages with roles:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function chatCompletion() {
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [
      {
        role: "system",
        content:
          "You are a web development expert. " +
          "You answer concisely and provide code examples.",
      },
      {
        role: "user",
        content: "How do I implement middleware in Next.js?",
      },
    ],
    temperature: 0.7,
    max_tokens: 1000,
  });

  return response.choices[0].message.content;
}

Conversation Roles#

system - instructions defining the model's behavior (persona, style, constraints)
user - messages from the user
assistant - model responses (used for conversation continuity)

Key Parameters#

| Parameter | Description | Range | |-----------|-------------|-------| | temperature | Response creativity | 0.0 - 2.0 | | max_tokens | Maximum response length | 1 - model limit | | top_p | Nucleus sampling | 0.0 - 1.0 | | frequency_penalty | Penalty for word repetition | -2.0 - 2.0 | | presence_penalty | Penalty for topic repetition | -2.0 - 2.0 | | stop | Stop sequences | string[] |

Streaming - Real-Time Responses#

Streaming allows you to display responses token by token, significantly improving the user experience:

async function streamChat(userMessage: string) {
  const stream = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: userMessage },
    ],
    stream: true,
  });

  let fullResponse = "";

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    fullResponse += content;
    process.stdout.write(content); // Real-time display
  }

  return fullResponse;
}

Streaming in Next.js with Server-Sent Events#

// app/api/chat/route.ts
import { NextRequest } from "next/server";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function POST(req: NextRequest) {
  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || "";
        if (content) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ content })}\n\n`)
          );
        }
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}

Function Calling and Tool Use#

Function calling allows the model to invoke developer-defined functions, enabling integration with external APIs and databases:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Tool definitions
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Gets the current weather for a given city",
      parameters: {
        type: "object",
        properties: {
          city: {
            type: "string",
            description: "City name, e.g., London",
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "Temperature unit",
          },
        },
        required: ["city"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "search_products",
      description:
        "Searches products in the database based on a query",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "Search phrase" },
          category: { type: "string", description: "Product category" },
          maxPrice: { type: "number", description: "Maximum price" },
        },
        required: ["query"],
      },
    },
  },
];

// Function implementations
async function getWeather(city: string, unit = "celsius") {
  const response = await fetch(
    `https://api.weather.example/current?city=${city}&unit=${unit}`
  );
  return response.json();
}

async function searchProducts(
  query: string,
  category?: string,
  maxPrice?: number
) {
  return db.products.findMany({
    where: {
      name: { contains: query },
      ...(category && { category }),
      ...(maxPrice && { price: { lte: maxPrice } }),
    },
  });
}

// Handling function calls
async function handleToolCalls(userMessage: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{ role: "user", content: userMessage }],
    tools,
    tool_choice: "auto",
  });

  const message = response.choices[0].message;

  if (message.tool_calls) {
    const toolResults = await Promise.all(
      message.tool_calls.map(async (toolCall) => {
        const args = JSON.parse(toolCall.function.arguments);

        let result;
        switch (toolCall.function.name) {
          case "get_weather":
            result = await getWeather(args.city, args.unit);
            break;
          case "search_products":
            result = await searchProducts(
              args.query,
              args.category,
              args.maxPrice
            );
            break;
          default:
            result = { error: "Unknown function" };
        }

        return {
          role: "tool" as const,
          tool_call_id: toolCall.id,
          content: JSON.stringify(result),
        };
      })
    );

    // Second call with tool results
    const finalResponse = await openai.chat.completions.create({
      model: "gpt-4-turbo",
      messages: [
        { role: "user", content: userMessage },
        message,
        ...toolResults,
      ],
    });

    return finalResponse.choices[0].message.content;
  }

  return message.content;
}

Embeddings - Vector Representations of Text#

Embeddings are numerical representations of text in vector space. They enable semantic search, clustering, and content comparison:

// Generating an embedding
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    dimensions: 1536,
  });

  return response.data[0].embedding;
}

// Calculating cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

// Semantic search
async function semanticSearch(
  query: string,
  documents: { id: string; text: string; embedding: number[] }[]
) {
  const queryEmbedding = await generateEmbedding(query);

  const results = documents
    .map((doc) => ({
      ...doc,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 5);

  return results;
}

Embedding Models#

| Model | Dimensions | Cost/1M tokens | |-------|------------|----------------| | text-embedding-3-small | 512-1536 | ~$0.02 | | text-embedding-3-large | 256-3072 | ~$0.13 | | text-embedding-ada-002 | 1536 | ~$0.10 |

Token Management#

Tokens are the fundamental billing unit in the OpenAI API. Effective token management is crucial for cost control:

import { encoding_for_model } from "tiktoken";

// Counting tokens
function countTokens(text: string, model = "gpt-4-turbo"): number {
  const enc = encoding_for_model(model as any);
  const tokens = enc.encode(text);
  enc.free();
  return tokens.length;
}

// Trimming context to token limit
function trimMessages(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  maxTokens: number
): OpenAI.Chat.Completions.ChatCompletionMessageParam[] {
  const systemMessage = messages.find((m) => m.role === "system");
  const conversationMessages = messages.filter((m) => m.role !== "system");

  let totalTokens = systemMessage
    ? countTokens(systemMessage.content as string)
    : 0;
  const trimmed: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [];

  // Iterate from newest messages
  for (let i = conversationMessages.length - 1; i >= 0; i--) {
    const msgTokens = countTokens(
      conversationMessages[i].content as string
    );
    if (totalTokens + msgTokens > maxTokens) break;
    totalTokens += msgTokens;
    trimmed.unshift(conversationMessages[i]);
  }

  if (systemMessage) trimmed.unshift(systemMessage);
  return trimmed;
}

Rate Limiting and Error Handling#

OpenAI enforces request limits (RPM - requests per minute) and token limits (TPM - tokens per minute). Here is a robust handling strategy:

import OpenAI from "openai";

// Class with retry and exponential backoff
class OpenAIClient {
  private client: OpenAI;
  private maxRetries: number;

  constructor(apiKey: string, maxRetries = 3) {
    this.client = new OpenAI({ apiKey });
    this.maxRetries = maxRetries;
  }

  async chatCompletion(
    params: OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming
  ) {
    let lastError: Error | null = null;

    for (let attempt = 0; attempt < this.maxRetries; attempt++) {
      try {
        return await this.client.chat.completions.create(params);
      } catch (error) {
        lastError = error as Error;

        if (error instanceof OpenAI.RateLimitError) {
          const waitTime = Math.pow(2, attempt) * 1000;
          console.warn(
            `Rate limit hit - waiting ${waitTime}ms (attempt ${attempt + 1})`
          );
          await new Promise((resolve) => setTimeout(resolve, waitTime));
          continue;
        }

        if (error instanceof OpenAI.APIError) {
          if (error.status && error.status >= 500) {
            const waitTime = Math.pow(2, attempt) * 1000;
            await new Promise((resolve) => setTimeout(resolve, waitTime));
            continue;
          }
        }

        throw error;
      }
    }

    throw lastError;
  }
}

Cost Optimization#

API costs can grow quickly. Here are proven optimization strategies:

1. Choosing the Right Model#

function selectModel(task: string): string {
  const simpleTaskPatterns = [
    /classification|categorize/i,
    /summarize|summary/i,
    /translate|translation/i,
    /format|formatting/i,
  ];

  const isSimple = simpleTaskPatterns.some((p) => p.test(task));
  return isSimple ? "gpt-3.5-turbo" : "gpt-4o";
}

2. Response Caching#

import { Redis } from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

async function cachedCompletion(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  model: string
) {
  const cacheKey = `openai:${model}:${JSON.stringify(messages)}`;
  const cached = await redis.get(cacheKey);

  if (cached) {
    return JSON.parse(cached);
  }

  const response = await openai.chat.completions.create({
    model,
    messages,
  });

  await redis.setex(cacheKey, 3600, JSON.stringify(response));
  return response;
}

3. Prompt Compression#

// Instead of verbose instructions, use concise prompts
// BAD:
const verbosePrompt =
  "I would like you to analyze the following text and generate " +
  "a brief summary that contains the most important information...";

// GOOD:
const concisePrompt =
  "Summarize the text in 2-3 sentences, preserving key facts:";

4. Request Batching#

async function batchProcess(items: string[], batchSize = 5) {
  const results: string[] = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const promises = batch.map((item) =>
      openai.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [{ role: "user", content: `Process: ${item}` }],
      })
    );

    const batchResults = await Promise.all(promises);
    results.push(
      ...batchResults.map(
        (r) => r.choices[0].message.content || ""
      )
    );
  }

  return results;
}

Prompt Engineering - Best Practices#

The quality of model responses heavily depends on prompt quality. Here are key techniques:

Few-Shot Prompting#

const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
  {
    role: "system",
    content: "You classify customer reviews as: positive, negative, or neutral.",
  },
  {
    role: "user",
    content: "Great product, I recommend it to everyone!",
  },
  {
    role: "assistant",
    content: "positive",
  },
  {
    role: "user",
    content: "Product is okay, nothing special.",
  },
  {
    role: "assistant",
    content: "neutral",
  },
  {
    role: "user",
    content: "Terrible quality, never again.",
  },
];

Chain-of-Thought#

const systemPrompt = `You are a data analysis expert.
When answering:
1. Identify the key input data
2. Describe your reasoning step by step
3. Provide conclusions with justification
4. End with a concrete recommendation

Mark initial thoughts with the <thinking> tag and the final answer with the <answer> tag.`;

Structured Output#

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [
    {
      role: "system",
      content:
        "Return responses exclusively in JSON format. " +
        "Schema: { summary: string, keywords: string[], sentiment: string, score: number }",
    },
    {
      role: "user",
      content: `Analyze this review: "${reviewText}"`,
    },
  ],
  response_format: { type: "json_object" },
});

Building a Conversational Chatbot#

A complete chatbot implementation with conversation memory:

import OpenAI from "openai";

interface ConversationMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

class Chatbot {
  private openai: OpenAI;
  private conversations: Map<string, ConversationMessage[]>;
  private systemPrompt: string;
  private maxMessages: number;

  constructor(systemPrompt: string, maxMessages = 20) {
    this.openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    this.conversations = new Map();
    this.systemPrompt = systemPrompt;
    this.maxMessages = maxMessages;
  }

  private getConversation(sessionId: string): ConversationMessage[] {
    if (!this.conversations.has(sessionId)) {
      this.conversations.set(sessionId, [
        { role: "system", content: this.systemPrompt },
      ]);
    }
    return this.conversations.get(sessionId)!;
  }

  async sendMessage(
    sessionId: string,
    userMessage: string
  ): Promise<string> {
    const conversation = this.getConversation(sessionId);
    conversation.push({ role: "user", content: userMessage });

    // Trim to message limit
    if (conversation.length > this.maxMessages + 1) {
      const system = conversation[0];
      const recent = conversation.slice(-this.maxMessages);
      conversation.length = 0;
      conversation.push(system, ...recent);
    }

    const response = await this.openai.chat.completions.create({
      model: "gpt-4o",
      messages: conversation,
      temperature: 0.7,
      max_tokens: 800,
    });

    const assistantMessage =
      response.choices[0].message.content || "";
    conversation.push({ role: "assistant", content: assistantMessage });

    return assistantMessage;
  }

  clearConversation(sessionId: string): void {
    this.conversations.delete(sessionId);
  }
}

// Usage
const supportBot = new Chatbot(
  "You are a customer support assistant for TechShop. " +
    "You help with orders, returns, and product information. " +
    "You respond politely and concisely in English."
);

RAG Integration (Retrieval-Augmented Generation)#

RAG allows the model to answer questions based on your own data, eliminating hallucinations:

import OpenAI from "openai";
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Finding relevant documents
async function findRelevantDocs(
  query: string,
  limit = 5
): Promise<{ content: string; source: string; similarity: number }[]> {
  const queryEmbedding = await generateEmbedding(query);

  const result = await pool.query(
    `SELECT content, source,
            1 - (embedding <=> $1::vector) as similarity
     FROM documents
     WHERE 1 - (embedding <=> $1::vector) > 0.7
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [JSON.stringify(queryEmbedding), limit]
  );

  return result.rows;
}

// RAG pipeline
async function ragQuery(userQuestion: string): Promise<string> {
  // 1. Find relevant documents
  const docs = await findRelevantDocs(userQuestion);

  // 2. Build context
  const context = docs
    .map((d) => `[Source: ${d.source}]\n${d.content}`)
    .join("\n\n---\n\n");

  // 3. Generate response
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [
      {
        role: "system",
        content:
          "Answer questions based solely on the provided context. " +
          "If the context does not contain the answer, state this explicitly. " +
          "Cite sources in your response.",
      },
      {
        role: "user",
        content: `Context:\n${context}\n\nQuestion: ${userQuestion}`,
      },
    ],
    temperature: 0.3,
  });

  return response.choices[0].message.content || "";
}

Safety and Content Moderation#

OpenAI provides a Moderation API for filtering unsafe content:

// Content moderation
async function moderateContent(
  text: string
): Promise<{ flagged: boolean; categories: string[] }> {
  const response = await openai.moderations.create({ input: text });
  const result = response.results[0];

  const flaggedCategories = Object.entries(result.categories)
    .filter(([, flagged]) => flagged)
    .map(([category]) => category);

  return {
    flagged: result.flagged,
    categories: flaggedCategories,
  };
}

// Moderation middleware for chatbot
async function safeChatMiddleware(
  userMessage: string,
  handler: (msg: string) => Promise<string>
): Promise<string> {
  // Check user message
  const inputModeration = await moderateContent(userMessage);
  if (inputModeration.flagged) {
    console.warn("Blocked message:", inputModeration.categories);
    return "I'm sorry, but I cannot process this message.";
  }

  // Generate response
  const response = await handler(userMessage);

  // Check model response
  const outputModeration = await moderateContent(response);
  if (outputModeration.flagged) {
    console.warn("Blocked response:", outputModeration.categories);
    return "I'm sorry, I am unable to provide a response to this question.";
  }

  return response;
}

Security Best Practices#

Input validation - check message length and content from users
Rate limiting - restrict the number of requests per user
Prompt injection protection - separate system instructions from user data
Logging - monitor API usage and suspicious patterns
Cost controls - set budget limits in the OpenAI dashboard

Next.js Integration - Complete API Route Example#

Full implementation of a chat endpoint in Next.js App Router:

// app/api/chat/route.ts
import { NextRequest, NextResponse } from "next/server";
import OpenAI from "openai";
import { Redis } from "ioredis";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const redis = new Redis(process.env.REDIS_URL!);

// Rate limiting
async function checkRateLimit(ip: string): Promise<boolean> {
  const key = `ratelimit:chat:${ip}`;
  const requests = await redis.incr(key);
  if (requests === 1) await redis.expire(key, 60);
  return requests <= 10; // max 10 req/min
}

// Input validation
function validateInput(body: unknown): {
  valid: boolean;
  messages?: OpenAI.Chat.Completions.ChatCompletionMessageParam[];
  error?: string;
} {
  if (!body || typeof body !== "object") {
    return { valid: false, error: "Invalid input data" };
  }

  const { messages } = body as { messages: unknown };
  if (!Array.isArray(messages) || messages.length === 0) {
    return { valid: false, error: "No messages provided" };
  }

  if (messages.length > 50) {
    return { valid: false, error: "Too many messages" };
  }

  return { valid: true, messages };
}

export async function POST(req: NextRequest) {
  try {
    // Rate limiting
    const ip = req.headers.get("x-forwarded-for") || "unknown";
    const allowed = await checkRateLimit(ip);
    if (!allowed) {
      return NextResponse.json(
        { error: "Too many requests. Please try again in a minute." },
        { status: 429 }
      );
    }

    // Validation
    const body = await req.json();
    const validation = validateInput(body);
    if (!validation.valid) {
      return NextResponse.json(
        { error: validation.error },
        { status: 400 }
      );
    }

    // Moderate the last message
    const lastMessage = validation.messages![
      validation.messages!.length - 1
    ];
    if (lastMessage.role === "user") {
      const moderation = await openai.moderations.create({
        input: lastMessage.content as string,
      });
      if (moderation.results[0].flagged) {
        return NextResponse.json(
          { error: "Message violates usage policies." },
          { status: 400 }
        );
      }
    }

    // Streaming response
    const stream = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        {
          role: "system",
          content:
            "You are a helpful assistant. " +
            "You respond concisely and substantively.",
        },
        ...validation.messages!,
      ],
      stream: true,
      temperature: 0.7,
      max_tokens: 1000,
    });

    const encoder = new TextEncoder();
    const readable = new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of stream) {
            const content = chunk.choices[0]?.delta?.content;
            if (content) {
              controller.enqueue(
                encoder.encode(`data: ${JSON.stringify({ content })}\n\n`)
              );
            }
          }
          controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        } catch (err) {
          controller.error(err);
        } finally {
          controller.close();
        }
      },
    });

    return new Response(readable, {
      headers: {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        Connection: "keep-alive",
      },
    });
  } catch (error) {
    console.error("Chat API error:", error);
    return NextResponse.json(
      { error: "Internal server error" },
      { status: 500 }
    );
  }
}

Comparison with Claude API and Gemini API#

When choosing an AI API provider, it is worth comparing the key differences:

| Feature | OpenAI (GPT-4) | Anthropic (Claude 3) | Google (Gemini Pro) | |---------|-----------------|---------------------|---------------------| | Context window | 128K | 200K | 1M+ | | Function calling | Yes | Yes (tool use) | Yes | | Streaming | Yes | Yes | Yes | | Multimodal | Yes (vision, audio) | Yes (vision) | Yes (vision, audio, video) | | Structured output | JSON mode | JSON with XML | JSON mode | | Price (input/1M) | from $2.50 | from $3 | from $1.25 | | Price (output/1M) | from $10 | from $15 | from $5 |

When to Choose OpenAI?#

Broadest ecosystem - Assistants API, fine-tuning, DALL-E, Whisper
Function calling - most mature implementation
Community - largest knowledge base and examples
Fine-tuning - ability to customize models with your own data

When to Consider Alternatives?#

Claude - long context (200K), better instruction following, safety focus
Gemini - million-token context window, Google Cloud integration, lower pricing
Local models (Ollama/Llama) - full data privacy, no API costs

Unified Abstraction Layer#

// Common interface for different AI providers
interface AIProvider {
  chat(messages: Message[], options?: ChatOptions): Promise<string>;
  stream(
    messages: Message[],
    options?: ChatOptions
  ): AsyncIterable<string>;
  embed(text: string): Promise<number[]>;
}

class OpenAIProvider implements AIProvider {
  async chat(messages: Message[], options?: ChatOptions) {
    const response = await openai.chat.completions.create({
      model: options?.model || "gpt-4o",
      messages: messages as any,
    });
    return response.choices[0].message.content || "";
  }

  async *stream(messages: Message[], options?: ChatOptions) {
    const stream = await openai.chat.completions.create({
      model: options?.model || "gpt-4o",
      messages: messages as any,
      stream: true,
    });
    for await (const chunk of stream) {
      yield chunk.choices[0]?.delta?.content || "";
    }
  }

  async embed(text: string) {
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: text,
    });
    return response.data[0].embedding;
  }
}

Summary#

The ChatGPT API is a powerful tool that opens enormous possibilities for web application developers. Key takeaways:

Choose models wisely - GPT-4 for complex tasks, GPT-3.5 for simple operations
Implement streaming - significantly improves user experience
Leverage function calling - integrates AI with your application's business logic
Use RAG - eliminates hallucinations and allows the use of your own data
Optimize costs - caching, appropriate model selection, prompt compression
Prioritize security - moderation, rate limiting, input validation

Need AI Integration in Your Application?#

MDS Software Solutions Group specializes in integrating AI solutions with web applications. We build intelligent chatbots, RAG systems, semantic search engines, and automation tools powered by OpenAI API, Claude, and other LLM models.

Contact us to discuss how artificial intelligence can streamline your business and give you a competitive edge. We offer consultations, AI architecture design, and full implementation - from prototype to production.