What you’ll learn
In this tutorial, you’ll learn how to:
- Deploy a vLLM worker on Runpod Serverless.
- Configure your vLLM endpoint for OpenAI compatibility.
- Connect CrewAI to your Runpod endpoint.
- Test your integration with a simple agent.
Requirements
- You’ve created a Runpod account.
- You’ve created a Runpod API key.
- You have CrewAI installed in your development environment.
- (Optional) For gated models, you’ve created a Hugging Face access token.
Step 1: Deploy a vLLM worker on Runpod
First, you’ll deploy a vLLM worker to serve your language model.

1. Create a new vLLM endpoint

Open the Runpod console and navigate to the Serverless page. Click New Endpoint and select vLLM under Ready-to-Deploy Repos.
2. Configure your endpoint
For more details on vLLM deployment options, see Deploy a vLLM worker.
- Enter the model name or Hugging Face model URL (e.g., `openchat/openchat-3.5-0106`).
- Expand the Advanced section:
  - Set Max Model Length to `8192` (or an appropriate context length for your model).
  - Depending on your model, you may need to enable tool calling and set an appropriate reasoning parser.
- Click Next.
- Click Create Endpoint.
3. Copy your endpoint ID
Once deployed, navigate to your endpoint in the Runpod console and copy the Endpoint ID. You’ll need this to connect your endpoint to CrewAI.
Step 2: Connect CrewAI to your Runpod endpoint
Now you’ll configure CrewAI to use your Runpod endpoint as an OpenAI-compatible API.

1. Open LLM connections settings
Open the CrewAI dashboard and look for the LLM connections section.
2. Select the custom OpenAI provider
Under Provider, select custom-openai-compatible from the dropdown menu.
3. Add your Runpod API key
Configure the connection with your Runpod credentials:
- For `OPENAI_API_KEY`, use your Runpod API key. You can find or create API keys in the Runpod console.
4. Configure the base URL
For `OPENAI_API_BASE`, add the base URL for your vLLM’s OpenAI-compatible endpoint:

https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1

Replace `ENDPOINT_ID` with your actual endpoint ID from Step 1.
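Since the base URL follows a fixed pattern, it can be built programmatically. A minimal Python sketch, assuming the standard Runpod serverless URL scheme; `abc123xyz` is a hypothetical placeholder endpoint ID:

```python
def runpod_openai_base_url(endpoint_id: str) -> str:
    """Build the OpenAI-compatible base URL for a Runpod serverless endpoint."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"

# "abc123xyz" is a placeholder; substitute the endpoint ID you copied in Step 1.
print(runpod_openai_base_url("abc123xyz"))
# → https://api.runpod.ai/v2/abc123xyz/openai/v1
```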
5. Test the connection
Click Fetch Available Models to test the connection. If successful, CrewAI will retrieve the list of models available on your endpoint.
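You can also verify the connection outside the CrewAI dashboard by querying the endpoint’s `/models` route directly. A minimal, stdlib-only sketch; the `RUNPOD_ENDPOINT_ID` and `RUNPOD_API_KEY` environment variable names are hypothetical, chosen here for illustration:

```python
import json
import os
import urllib.request

def models_url(endpoint_id: str) -> str:
    # The OpenAI-compatible /models route on a Runpod serverless vLLM endpoint.
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1/models"

def list_models(endpoint_id: str, api_key: str) -> list[str]:
    """Return the model IDs served by the vLLM worker."""
    req = urllib.request.Request(
        models_url(endpoint_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload["data"]]

# Only make the network call when credentials are available.
if os.environ.get("RUNPOD_ENDPOINT_ID") and os.environ.get("RUNPOD_API_KEY"):
    print(list_models(os.environ["RUNPOD_ENDPOINT_ID"], os.environ["RUNPOD_API_KEY"]))
```

A successful call returns the same model list that Fetch Available Models shows in the dashboard.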
Step 3: Test your integration
Verify that your CrewAI agents can use your Runpod endpoint.

1. Create a test agent
Create a simple CrewAI agent that uses your Runpod endpoint for its language model.
2. Run a test task
Assign a simple task to your agent and run it to verify that it can communicate with your Runpod endpoint.
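The two steps above (creating an agent and assigning it a test task) can be sketched in code as well. A minimal sketch assuming CrewAI’s Python `LLM`, `Agent`, `Task`, and `Crew` classes, the `openchat/openchat-3.5-0106` model from Step 1, and the same hypothetical `RUNPOD_ENDPOINT_ID` / `RUNPOD_API_KEY` environment variables:

```python
# Smoke test of the CrewAI + Runpod integration. Assumes the `crewai`
# package is installed and credentials are set in the environment; the
# model name and env var names below are illustrative assumptions.
import os

def build_crew():
    from crewai import Agent, Crew, LLM, Task

    llm = LLM(
        # The "openai/" prefix tells CrewAI to treat this as an
        # OpenAI-compatible API rather than OpenAI itself.
        model="openai/openchat/openchat-3.5-0106",
        base_url=f"https://api.runpod.ai/v2/{os.environ['RUNPOD_ENDPOINT_ID']}/openai/v1",
        api_key=os.environ["RUNPOD_API_KEY"],
    )
    agent = Agent(
        role="Researcher",
        goal="Answer questions concisely",
        backstory="A helpful assistant served by a Runpod vLLM worker.",
        llm=llm,
    )
    task = Task(
        description="In one sentence, explain what vLLM is.",
        expected_output="A single-sentence answer.",
        agent=agent,
    )
    return Crew(agents=[agent], tasks=[task])

# Only kick off the crew when credentials are available.
if os.environ.get("RUNPOD_ENDPOINT_ID") and os.environ.get("RUNPOD_API_KEY"):
    print(build_crew().kickoff())
```

If the kickoff completes and prints an answer, the agent is successfully routing its LLM calls through your Runpod endpoint.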
3. Monitor requests
Monitor requests from your CrewAI agents in the endpoint details page of the Runpod console.
4. Verify responses
Confirm that your agent is receiving appropriate responses from your model running on Runpod.
Next steps
Now that you’ve integrated Runpod with CrewAI, you can:
- Build complex multi-agent systems using your Runpod endpoint to serve the necessary models.
- Explore other integration options.
- Learn more about OpenAI compatibility features in vLLM.