Runpod can be integrated with any system that supports custom endpoint configuration. Integration is straightforward: any library or framework that accepts a custom base URL for API calls will work with Runpod without specialized adapters or connectors. This means you can use Runpod with tools like n8n, CrewAI, LangChain, and many others by simply pointing them to your Runpod endpoint URL.

Deployment options

Runpod offers four deployment options for endpoint integrations:

Public Endpoints

Public Endpoints are pre-deployed AI models that you can use without setting up your own Serverless endpoint. They’re vLLM-compatible and return OpenAI-compatible responses, so you can get started quickly or test things out without deploying infrastructure. The following Public Endpoint URLs are available for OpenAI-compatible models:
# Public Endpoint for Qwen3 32B AWQ
https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1

# Public Endpoint for IBM Granite 4.0 H Small
https://api.runpod.ai/v2/granite-4-0-h-small/openai/v1
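Because the responses are OpenAI-compatible, you can point the official openai Python client directly at a Public Endpoint. The sketch below assumes your Runpod API key is stored in a RUNPOD_API_KEY environment variable, and the model identifier passed to the client is an assumption; check the endpoint's details in the Runpod console for the exact value it expects.

# Minimal sketch: chat completion against the Qwen3 32B AWQ Public Endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1",
    api_key=os.environ["RUNPOD_API_KEY"],  # your Runpod API key
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",  # assumed model identifier; confirm in the console
    messages=[{"role": "user", "content": "Summarize what a Serverless endpoint is."}],
)
print(response.choices[0].message.content)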

vLLM workers

vLLM workers provide an inference engine that returns OpenAI-compatible responses, making them ideal for tools that expect OpenAI’s API format. When you deploy a vLLM endpoint, access it using the OpenAI-compatible API at:
https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1
Where ENDPOINT_ID is your Serverless endpoint ID.
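The same openai client pattern works against a vLLM worker once you substitute your endpoint ID into the base URL. The sketch below also enables streaming; the endpoint ID and model name are placeholders, so use the model your endpoint actually serves.

# Minimal sketch: streaming chat completion against a vLLM worker.
import os
from openai import OpenAI

ENDPOINT_ID = "your_endpoint_id"  # replace with your Serverless endpoint ID

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=os.environ["RUNPOD_API_KEY"],
)

stream = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",  # assumed: the model configured on your endpoint
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
for chunk in stream:
    # Print tokens as they arrive; some chunks may carry no content.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)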

SGLang workers

SGLang workers also return OpenAI-compatible responses, offering optimized performance for certain model types and use cases.

Load balancing endpoints

Load balancing endpoints let you create custom endpoints where you define your own inputs and outputs. This gives you complete control over the API contract and is ideal when you need custom behavior beyond standard inference patterns.
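Because the API contract is yours, there is no fixed request shape. The following is a hypothetical sketch using the requests library against an invented /generate route; the URL, route, payload fields, and response shape all stand in for whatever schema your own worker defines, so substitute the values from your deployment.

# Hypothetical sketch: calling a custom route on a load balancing endpoint.
import os
import requests

ENDPOINT_URL = "https://your-load-balancing-endpoint-url"  # from your endpoint details
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

payload = {"prompt": "Classify this ticket as bug or feature request."}  # your own schema
resp = requests.post(f"{ENDPOINT_URL}/generate", json=payload, headers=headers, timeout=60)
resp.raise_for_status()
print(resp.json())  # the response shape is whatever your worker returns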

Model configuration for compatibility

Some models require specific vLLM environment variables to work with external tools and frameworks. You may need to set a custom chat template or tool call parser to ensure your model returns responses in the format your integration expects. For example, you can configure the Qwen/qwen3-32b-awq model for OpenAI compatibility by adding these environment variables in your vLLM endpoint settings:
ENABLE_AUTO_TOOL_CHOICE=true
REASONING_PARSER=qwen3
TOOL_CALL_PARSER=hermes
These settings enable automatic tool choice selection and set the right parsers for the Qwen3 model to work with tools that expect OpenAI-formatted responses. For more information about tool calling configuration and available parsers, see the vLLM tool calling documentation.
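To confirm the parsers are working, you can send a tool-calling request and check that the response contains structured tool_calls rather than raw text. This sketch assumes the openai client, a placeholder endpoint ID, and a hypothetical get_weather tool defined only for testing.

# Minimal sketch: verifying tool calling on a configured vLLM endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # your vLLM endpoint
    api_key=os.environ["RUNPOD_API_KEY"],
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for testing
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",  # assumed: the model your endpoint serves
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
# With the parsers configured, tool calls come back in OpenAI's structured format.
print(response.choices[0].message.tool_calls)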

Integration tutorials

Step-by-step tutorials are available for integrating Runpod with popular tools such as n8n, CrewAI, and LangChain.

Compatible frameworks

The same integration pattern works with any framework that supports custom OpenAI-compatible endpoints, including:
  • CrewAI: A framework for orchestrating role-playing autonomous AI agents.
  • LangChain: A framework for developing applications powered by language models.
  • AutoGen: Microsoft’s framework for building multi-agent conversational systems.
  • Haystack: An end-to-end framework for building search systems and question answering.
  • n8n: A workflow automation tool with AI integration capabilities.
Configure these frameworks to use your Runpod endpoint URL as the base URL, and provide your Runpod API key for authentication.
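As one concrete example, LangChain’s ChatOpenAI wrapper accepts a custom base URL. The sketch below assumes the langchain-openai package is installed and uses placeholder endpoint and model values.

# Minimal sketch: pointing LangChain at a Runpod OpenAI-compatible endpoint.
import os
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # your endpoint's OpenAI route
    api_key=os.environ["RUNPOD_API_KEY"],
    model="Qwen/Qwen3-32B-AWQ",  # assumed: the model your endpoint serves
)

print(llm.invoke("Give me three names for a GPU-themed coffee shop.").content)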

Third-party integrations

For infrastructure management and orchestration, Runpod integrates with:
  • dstack: Simplified Pod orchestration for AI/ML workloads.
  • SkyPilot: Multi-cloud execution framework.
  • Mods: AI-powered command-line tool.