Previously, I wrote about running the DeepSeek R1 1.5B model on an AWS EC2 instance.

In this post, we will explore how to deploy it to an EC2 instance and expose its API to the outside world.

1.5B is the smallest R1 model; even so, running it requires at least 4 GB of RAM, so the smallest suitable instance is a t3.medium (2 vCPUs, 4 GB of RAM).

When creating the instance, make sure to allow HTTP traffic (port 80) in its security group.
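
If you prefer the CLI to the console, a sketch of the same ingress rule with the AWS CLI looks like this (the security group ID is a placeholder for your own):

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 80 \
  --cidr 0.0.0.0/0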

User data script (EC2 runs it as root on first boot):

#!/usr/bin/env bash

# Install Ollama (the installer also starts a systemd service on the
# default 127.0.0.1:11434; the manual serve below listens on port 80 instead).
curl -fsSL https://ollama.com/install.sh | sh

# Bind the API to all interfaces on port 80. User data already runs as root,
# so sudo is unnecessary here; dropping it also avoids sudo stripping the
# exported OLLAMA_HOST from the environment.
export OLLAMA_HOST=0.0.0.0:80
ollama serve &

# Give the server a few seconds to start accepting connections.
sleep 5

# The ollama client reads OLLAMA_HOST too, so this talks to the server on port 80.
ollama pull deepseek-r1:1.5b
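
For reference, launching the instance from the CLI with that script might look like the following; the AMI ID, key name, and security group ID are placeholders for your own region and account:

aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.medium \
  --key-name my-key \
  --security-group-ids sg-0123456789abcdef0 \
  --user-data file://user-data.sh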

Once the instance is up and the model has been pulled, make a request to its public DNS name:

curl ec2-13-61-178-152.eu-north-1.compute.amazonaws.com/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Hello. Introduce yourself.",
  "stream": false
}' | jq

Response:

{
  "model": "deepseek-r1:1.5b",
  "created_at": "2025-01-30T19:41:16.945227397Z",
  "response": "<think>\n\n</think>\n\nHello! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.",
  "done": true,
  "done_reason": "stop",
  "context": [
    ...
  ],
  "total_duration": 9775275177,
  "load_duration": 25811605,
  "prompt_eval_count": 9,
  "prompt_eval_duration": 1060000000,
  "eval_count": 44,
  "eval_duration": 8688000000
}
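
The full JSON is verbose; if you only want the generated text, let jq extract the response field. Note that R1 models wrap their chain-of-thought in <think>...</think> tags (empty here for this short prompt), which you may want to strip on the client side:

curl -s ec2-13-61-178-152.eu-north-1.compute.amazonaws.com/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Hello. Introduce yourself.",
  "stream": false
}' | jq -r '.response'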
