Exploring Runpod Serverless: Create Workers From Templates
Eliot Cowley · 2025-09-03 · via Runpod Blog.

Runpod Serverless is a cloud computing solution designed for short-lived, event-driven tasks. Runpod automatically manages the underlying infrastructure so you don’t have to worry about scaling or maintenance. You only pay for the compute time that you actually use, so you don’t pay when your application is idle.

You configure an endpoint for your Serverless application with compute resources and other settings, and workers process requests that arrive at that endpoint. You create a handler function that defines how workers process incoming requests and return results. Runpod automatically starts and stops workers based on demand to optimize resource usage and minimize cost.

When a client sends a request to your endpoint, it is put into a queue and waits for a worker to become available. A worker processes the request using your handler function and returns a result to the client.
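The flow described above can be sketched in plain Python. This is only a local illustration of the queue-and-worker pattern, not how Runpod implements it: the handler, the `run_endpoint` helper, and the worker count are all hypothetical names chosen for the example.

```python
import queue
import threading


def handler(event):
    """Hypothetical handler: returns the prompt in upper case."""
    return event["input"]["prompt"].upper()


def worker(requests, results):
    """Pull queued requests until a None stop signal arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        request_id, event = item
        results[request_id] = handler(event)


def run_endpoint(events, num_workers=2):
    """Queue every event, let the workers drain the queue, return results by ID."""
    requests = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(requests, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for request_id, event in enumerate(events):
        requests.put((request_id, event))
    for _ in threads:
        requests.put(None)  # one stop signal per worker
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    events = [{"input": {"prompt": "hello"}}, {"input": {"prompt": "world"}}]
    print(run_endpoint(events))
```

Each request waits in the queue until one of the worker threads is free, which mirrors the request lifecycle described above on a much smaller scale.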

Diagram of a Runpod serverless endpoint where a request queue initializes workers that return output to the user

You can certainly create custom workers from scratch, but in most cases it’s easiest to start with a template. Runpod provides several templates to help you get started. Let’s create workers using a few of these templates.

What you’ll learn

In this blog post you’ll learn how to:

  • Create a Serverless worker from a template on GitHub
  • Test a worker on your local computer
  • Deploy a worker to Runpod Serverless from a GitHub repository

Requirements

worker-basic

The worker-basic template is a minimal Serverless example. When the endpoint receives a request, Runpod spins up a worker to execute the handler function, which in this case prints out some text and sleeps for a few seconds.

Let’s try testing this template locally:

  1. Open a terminal on your local computer.
  2. Clone the worker-basic repository on GitHub:
  3. Open the worker-basic folder in your preferred code editor. Take a look through the files:
    • Dockerfile: Configures the environment for a Docker container. Notice that it configures Python and installs the necessary packages before calling the handler function.
    • README.md: Instructions for deploying the worker.
    • requirements.txt: Sets the Python packages for Docker to install.
    • rp_handler.py: Script containing the handler function for the worker.
    • test_input.json: Mock input data to test the handler function.
  4. Create a Python virtual environment:
  5. Activate the Python virtual environment.
    • On macOS/Linux:
    • On Windows:
  6. Install the Runpod SDK:
  7. Run rp_handler.py. The script will automatically read test_input.json as input, passing it to the handler function as an event:
    • You should get output similar to the following:
  8. Take a look at test_input.json. Notice that the input object matches the input that the handler function took. Now change the prompt and seconds fields and rerun the handler function. You should see output that matches the new input:

In this example, the worker simply prints some text and sleeps for a given number of seconds. In a real application, you would replace this with functionality like running a Large Language Model (LLM) or performing some other compute-intensive operation. We will try doing this later.

Let’s look through rp_handler.py so we can understand how it works:

The handler(event) function is the entry point for the worker.

event is a dictionary containing the request input in the input key. Here, we store the input values in local variables, print them to the console, and sleep.

When we run the script, it calls runpod.serverless.start, which starts the worker and registers handler as the function that processes each request.
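The explanation above can be sketched as follows. This is a reconstruction from the description, not the repository's actual file: the return value and the default for seconds are assumptions. The call to runpod.serverless.start is left as a comment so that the sketch runs without the SDK installed.

```python
import time


def handler(event):
    """Process one request: read the input, log it, sleep, return a result."""
    job_input = event["input"]  # the request payload lives under "input"
    prompt = job_input.get("prompt")
    seconds = job_input.get("seconds", 0)  # assumed default when omitted

    print(f"Received prompt: {prompt}")
    print(f"Sleeping for {seconds} seconds...")
    time.sleep(seconds)

    return prompt  # assumed return value, chosen for illustration


if __name__ == "__main__":
    # In the real worker this line hands control to the SDK instead:
    #   runpod.serverless.start({"handler": handler})
    # Here a mock event stands in for test_input.json.
    mock_event = {"input": {"prompt": "Hello World", "seconds": 0}}
    print(handler(mock_event))
```

Calling the handler directly with a dictionary like this is a quick way to check its logic before involving the SDK or Docker.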

We will learn how to deploy a worker later. For now, let’s check out another template.

worker-template

  1. Open a terminal on your local computer.
  2. Clone the worker-template repository on GitHub:
  3. Open the worker-template folder in your preferred code editor. Look through the files, paying particular attention to the Dockerfile. Note that it uses the runpod/base image, which includes CUDA, multiple versions of Python, uv, Jupyter Notebook, and common dependencies.
  4. Create a Python virtual environment:
  5. Activate the Python virtual environment.
    • On macOS/Linux:
    • On Windows:
  6. Install the Runpod SDK:
  7. Run handler.py. The script will automatically read test_input.json as input, passing it to the handler function as an event:
    • You should get output similar to the following:
  8. Take a look at test_input.json. Notice that the input object matches the input that the handler function took. Now change the name field and rerun the handler function. You should see output that matches the new input:

In this example, the worker simply prints some text. In a real application, you would replace this with functionality like running a Large Language Model (LLM) or performing some other compute-intensive operation. We will try doing this later.

Let’s look through handler.py so we can understand how it works:

As the comments mention, if your handler function uses an LLM, you should load it at the start of your script rather than in the handler function itself so that it’s not loaded every time the handler function is called.
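A minimal sketch of that advice, assuming a hypothetical load_model function that stands in for an expensive model load. The counter exists only to show that the load happens once, no matter how many requests the handler serves.

```python
LOAD_COUNT = 0


def load_model():
    """Hypothetical stand-in for an expensive model load."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return lambda text: text[::-1]  # toy "model" that reverses its input


# Loaded once, when the worker process starts
MODEL = load_model()


def handler(job):
    """Reuse the already-loaded model for every request."""
    return MODEL(job["input"]["text"])


if __name__ == "__main__":
    for text in ["abc", "hello"]:
        print(handler({"input": {"text": text}}))
    print(f"Model loads: {LOAD_COUNT}")
```

If load_model were called inside handler instead, every request would pay the loading cost again.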

The handler(job) function is the entry point for the worker.

job is a dictionary containing the request input in the input key. Here, we store the input value name in a local variable and print it to the console.

The runpod.serverless.start function starts the worker and registers handler as the function that processes each request.
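To make the local test flow concrete, here is a sketch that loads a JSON file and passes it to a handler as the job. It is an illustration under assumptions, not the template's actual code: the greeting text, the "World" default, and the run_local_test helper are hypothetical, and the SDK call is omitted so the sketch runs on its own.

```python
import json
import tempfile
from pathlib import Path


def handler(job):
    """Greet the name supplied in the request input."""
    name = job["input"].get("name", "World")  # assumed default
    greeting = f"Hello, {name}!"
    print(greeting)
    return greeting


def run_local_test(path):
    """Load a JSON test file and pass its contents to the handler as the job."""
    job = json.loads(Path(path).read_text())
    return handler(job)


if __name__ == "__main__":
    # Write a throwaway file shaped like test_input.json, then run it
    with tempfile.TemporaryDirectory() as tmp:
        test_file = Path(tmp) / "test_input.json"
        test_file.write_text(json.dumps({"input": {"name": "Runpod"}}))
        run_local_test(test_file)
```

Changing the name value in the file and rerunning produces a matching greeting, which is the same experiment the steps above describe.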

Deploy a worker from GitHub

Now that we have learned how to create a simple worker from a template, let’s learn how to deploy it:

  1. Sign in to GitHub and fork the worker-basic or worker-template repository. Alternatively, you can create a new repository and copy the files from one of the templates into it.
  2. Open the Settings page in the Runpod Console.
  3. Under Connections, find the GitHub card and select Connect.

Runpod settings page with the GitHub Connect button highlighted under Connections

  4. Sign in to your GitHub account.
  5. Choose which repositories Runpod can access:
    • All repositories: Access to all current and future repositories.
    • Only select repositories: Choose specific repositories. In this case, make sure you select the template repository that you forked.
  6. GitHub redirects you back to your Runpod settings, where you should see that your GitHub account is now connected. You can edit the connection settings at any time by selecting Edit Connection.

Runpod Connections panel showing a connected GitHub account with an Edit Connection button

  7. In the left sidebar, under Manage, select Serverless.

Runpod settings page with Serverless highlighted in the sidebar and GitHub account connected

  8. Select New Endpoint.

Runpod Serverless page with the New Endpoint button highlighted

  9. Under Import Git Repository, use the search bar to find the repository that you forked from the worker template and select it.

Runpod new serverless endpoint screen with a Git repository search list

  10. Configure the deployment settings:
    • Select which Branch to deploy from.
    • Enter the Dockerfile Path from the root of the repository.
    • Select Next.

Configure GitHub Repository step with branch, Dockerfile path, and Next button highlighted

  11. Configure the endpoint settings:
    • Enter an Endpoint Name.
    • Select a GPU Configuration. For this example, the 16 GB GPU is sufficient.
      Note: Make sure you have credits in your Runpod account.
    • Select Deploy Endpoint.
  12. If Runpod successfully deploys your endpoint, it redirects you to the endpoint’s page. Select the Builds tab to check on the initial build and wait for it to finish. Once it’s finished, wait for Runpod to roll out the workers. When the endpoint is ready, it should look like this:

Runpod endpoint Builds tab showing a completed build for the worker-basic endpoint

  13. Select the Requests tab. You should see a sample request similar to the following:
  14. Select Run to send the request to the endpoint:

Runpod endpoint Requests tab with a test JSON input and the Run button highlighted

  15. Runpod sends the request to the queue, where it waits for an available worker. When a worker becomes available, Runpod assigns the request to it, and the worker executes the handler function using the given input. You can view the status of the request in the console:

Runpod request results showing a completed job with JSON output, delay, and execution times
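The console is not the only way to send a request. As a hedged sketch, the same request could be built in Python for the endpoint's HTTP API. The URL pattern and bearer-token header reflect my understanding of Runpod's REST API and should be checked against the current documentation; the endpoint ID and API key below are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_run_request(endpoint_id, api_key, payload):
    """Build (but do not send) a POST request that queues a job on an endpoint."""
    # Assumed URL pattern; verify against the current Runpod API docs
    url = f"https://api.runpod.ai/v2/{endpoint_id}/run"
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder credentials for illustration only
    request = build_run_request("YOUR_ENDPOINT_ID", "YOUR_API_KEY", {"name": "Runpod"})
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it. The payload is wrapped in an input key so that it matches what the handler function reads.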

Next steps

Congratulations, you have successfully created a worker from a template repository and deployed it from GitHub! These examples were very basic, but there are many other more practical templates available, which we will explore in future blog posts. You can also check them out yourself on GitHub.

Try modifying your handler function to do something more interesting, like having an LLM process a query, or running compute-intensive code. You can also implement GitHub Actions for Continuous Integration/Continuous Deployment to automatically test and deploy every time you push to your repository.
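As a starting point for automated testing, here is a minimal sketch of a unit test that a CI job could run on every push. The handler shown is a hypothetical stand-in defined inline so the example is self-contained; in a real repository you would import your own handler instead.

```python
import unittest


def handler(job):
    """Hypothetical stand-in for the worker's handler function."""
    name = job["input"].get("name", "World")
    return f"Hello, {name}!"


class HandlerTest(unittest.TestCase):
    def test_uses_name_from_input(self):
        self.assertEqual(handler({"input": {"name": "Runpod"}}), "Hello, Runpod!")

    def test_falls_back_to_default_name(self):
        self.assertEqual(handler({"input": {}}), "Hello, World!")

    def test_rejects_job_without_input(self):
        # A job with no "input" key should fail loudly rather than pass silently
        with self.assertRaises(KeyError):
            handler({})
```

A CI step could run this file with python -m unittest and fail the build when any test fails, before a new version of the worker is deployed.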

Author profile: Eliot Cowley