Deployment options
Runpod offers four deployment options for endpoint integrations:

Public Endpoints
Public Endpoints are pre-deployed AI models that you can use without setting up your own Serverless endpoint. They’re vLLM-compatible and return OpenAI-compatible responses, so you can get started quickly or test things out without deploying infrastructure. The following Public Endpoint URLs are available for OpenAI-compatible models:

vLLM workers
vLLM workers provide an inference engine that returns OpenAI-compatible responses, making it ideal for tools that expect OpenAI’s API format. When you deploy a vLLM endpoint, access it through the endpoint’s OpenAI-compatible API URL, where ENDPOINT_ID is your Serverless endpoint ID.
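As a minimal sketch of what this looks like in practice, the snippet below points the official OpenAI Python client at a vLLM endpoint. The base URL pattern and model name shown here are assumptions for illustration; substitute the values displayed for your own endpoint in the Runpod console.

```python
from openai import OpenAI

# Sketch: point the OpenAI client at a Runpod vLLM endpoint.
# The base URL pattern and model name are assumptions; replace them
# with the values shown for your endpoint and your Runpod API key.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # ENDPOINT_ID = your Serverless endpoint ID
    api_key="RUNPOD_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/qwen3-32b-awq",  # whichever model your worker serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```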
SGLang workers
SGLang workers also return OpenAI-compatible responses, offering optimized performance for certain model types and use cases.

Load balancing endpoints
Load balancing endpoints let you create custom endpoints where you define your own inputs and outputs. This gives you complete control over the API contract and is ideal when you need custom behavior beyond standard inference patterns.
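As a rough sketch of what defining your own API contract can look like, the handler below exposes a custom route with its own request and response shapes. The framework choice (FastAPI), route name, and field names are illustrative assumptions, not a prescribed pattern for load balancing endpoints.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical request/response schema: with a custom endpoint, you decide the fields.
class SummarizeRequest(BaseModel):
    text: str
    max_words: int = 50

class SummarizeResponse(BaseModel):
    summary: str

@app.post("/summarize", response_model=SummarizeResponse)
def summarize(req: SummarizeRequest) -> SummarizeResponse:
    # Placeholder logic; a real worker would call your model here.
    words = req.text.split()[: req.max_words]
    return SummarizeResponse(summary=" ".join(words))
```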
Model configuration for compatibility

Some models require specific vLLM environment variables to work with external tools and frameworks. You may need to set a custom chat template or tool call parser to ensure your model returns responses in the format your integration expects. For example, you can configure the Qwen/qwen3-32b-awq model for OpenAI compatibility by adding these environment variables in your vLLM endpoint settings:
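The exact variables depend on the model and worker image. As an illustration only (the variable names and values below are assumptions, not the documented configuration for this model), enabling tool calling for a Qwen-family model might look like:

```
ENABLE_AUTO_TOOL_CHOICE=true
TOOL_CALL_PARSER=hermes
```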
Integration tutorials
Follow these step-by-step tutorials to integrate Runpod with popular tools:

- Integrate with n8n: Connect Runpod to n8n for AI-powered workflow automation.
- Integrate with CrewAI: Use Runpod to power autonomous AI agents in CrewAI.
Compatible frameworks
The same integration pattern works with any framework that supports custom OpenAI-compatible endpoints (a minimal connection sketch follows the list), including:

- CrewAI: A framework for orchestrating role-playing autonomous AI agents.
- LangChain: A framework for developing applications powered by language models.
- AutoGen: Microsoft’s framework for building multi-agent conversational systems.
- Haystack: An end-to-end framework for building search systems and question answering.
- n8n: A workflow automation tool with AI integration capabilities.
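For instance, a LangChain chat model can be pointed at a Runpod endpoint by overriding the OpenAI base URL. This is a minimal sketch assuming the langchain-openai package and the same base URL pattern used above; swap in your own endpoint ID, API key, and model name.

```python
from langchain_openai import ChatOpenAI

# Sketch: reuse the OpenAI-compatible base URL of your Runpod endpoint.
# The URL pattern and model name are assumptions; use your endpoint's values.
llm = ChatOpenAI(
    model="Qwen/qwen3-32b-awq",
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
    api_key="RUNPOD_API_KEY",
)

print(llm.invoke("Summarize what a Serverless endpoint is in one sentence.").content)
```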