Introduction to LLMs for Beginners

We're going to build a command-line Topic Explainer that takes any subject and breaks it down for a chosen audience, from absolute beginner to expert. This is a solid first project if you are just getting started with LLMs because it teaches system prompts, message history, and streaming in one small script. I have shipped dozens of these internal tools, and this is the exact pattern I reach for first.

What you'll need

Python 3.10 or newer.
The OpenAI SDK: pip install openai
An Oxlo.ai API key from https://portal.oxlo.ai. The free tier includes 60 requests per day across 16 models, which is plenty for this tutorial.

Step 1: Send your first prompt

Before we add any abstractions, we will wire up the Oxlo.ai client and make a single chat completion to verify the endpoint and credentials. I am using llama-3.3-70b here because it is a reliable general-purpose flagship model.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

user_message = "Explain how a large language model works."

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": user_message},
    ],
)

print(response.choices[0].message.content)
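The client above uses a placeholder string for the key. One way to avoid pasting a real key into the script is to read it from the environment. This is a minimal sketch, assuming the key lives in the OXLO_API_KEY variable that the "Run it" section exports; the helper name get_api_key is hypothetical and not part of the OpenAI SDK.

```python
import os
from typing import Mapping


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Oxlo.ai key from the environment (hypothetical helper)."""
    # Assumption: the key is stored in OXLO_API_KEY, as in the "Run it" section.
    key = env.get("OXLO_API_KEY", "").strip()
    if not key:
        # Fail early with a readable message instead of an auth error later.
        raise RuntimeError("Set OXLO_API_KEY before running the script.")
    return key
```

You could then pass api_key=get_api_key() to OpenAI(...) in place of the placeholder string.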

Step 2: Add a system prompt

Raw completions can wander. We will lock the behavior down with a system prompt so the assistant always explains topics at the requested level and keeps answers concise. Here is the system prompt I use for this agent. You can tune the rules later.

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

Now we pass it into the messages array.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

user_message = "Explain how a large language model works at a beginner level."

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ],
)

print(response.choices[0].message.content)

Step 3: Wrap it in a function

Hard-coded messages are fine for one-offs, but we want a reusable function that accepts a topic and a level. This keeps the setup code clean and makes the agent easier to test.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

def explain_topic(topic: str, level: str = "beginner") -> str:
    user_message = f"Explain '{topic}' at a {level} level."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(explain_topic("how neural networks learn", "beginner"))
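To make the "easier to test" point concrete, here is a sketch of how the function could be exercised without a network call. It is a variation on the code above, not the same function: it assumes you pass the client in as an argument, and FakeClient is a hypothetical stand-in that only mimics the shape of client.chat.completions.create. The system prompt is shortened for brevity.

```python
from types import SimpleNamespace

SYSTEM_PROMPT = "You are a patient technical tutor."  # shortened for the sketch


def build_messages(topic: str, level: str = "beginner") -> list:
    """Build the messages array for one explanation request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Explain '{topic}' at a {level} level."},
    ]


def explain_topic(client, topic: str, level: str = "beginner") -> str:
    """Same logic as above, but the client is injected so it can be faked."""
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=build_messages(topic, level),
    )
    return response.choices[0].message.content


class FakeClient:
    """Hypothetical test double: records calls and returns a canned reply."""

    def __init__(self, reply: str):
        self.calls = []

        def create(**kwargs):
            self.calls.append(kwargs)
            message = SimpleNamespace(content=reply)
            return SimpleNamespace(choices=[SimpleNamespace(message=message)])

        self.chat = SimpleNamespace(completions=SimpleNamespace(create=create))
```

With this shape, a test can call explain_topic(FakeClient("canned"), "recursion") and inspect both the return value and the messages that would have been sent.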

Step 4: Stream the response

Waiting for the full response to return feels slow. We will enable streaming and print chunks as they arrive. On Oxlo.ai, popular models like llama-3.3-70b have no cold starts, so the first token hits the terminal quickly.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

def explain_topic_stream(topic: str, level: str = "beginner"):
    user_message = f"Explain '{topic}' at a {level} level."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        stream=True,
    )
    for chunk in response:
        # Some chunks may carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

if __name__ == "__main__":
    explain_topic_stream("how transformers handle attention", "beginner")
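If you also want the full text after streaming, the chunk loop can be pulled into a helper that prints and collects at the same time. This is a sketch under assumptions: collect_stream and fake_chunk are hypothetical names, and the fake chunks only imitate the choices[0].delta.content shape used above so the loop can be tried offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each text fragment as it arrives and return the joined reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensively skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:  # content can be None or empty on some chunks
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk for local testing (not an SDK type)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

In the streaming function above, the for loop could then be replaced by a single call such as reply = collect_stream(response).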

Step 5: Give it memory

A real tutor answers follow-ups. We will keep a messages list in memory and append each user question and assistant reply so the context persists across turns. This is the simplest possible conversation loop.

import os

from openai import OpenAI

# Read the key exported in the "Run it" section instead of hard-coding it.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=os.environ["OXLO_API_KEY"])

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

def run_tutor():
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
    ]
    print("Topic Explainer is ready. Type 'quit' to exit.")
    while True:
        user_input = input("\nTopic or question: ").strip()
        if not user_input:
            continue  # ignore blank lines instead of sending an empty message
        if user_input.lower() == "quit":
            break
        messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=messages,
            stream=True,
        )
        assistant_reply = ""
        for chunk in response:
            # Some chunks may carry no choices or an empty delta; skip those.
            if chunk.choices and chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                assistant_reply += text
                print(text, end="", flush=True)
        print()
        messages.append({"role": "assistant", "content": assistant_reply})

if __name__ == "__main__":
    run_tutor()
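The loop above appends every turn, so the messages list keeps growing for as long as the session runs, and models have a finite context window. Here is a minimal sketch of one way to cap it by keeping the system prompt plus the most recent entries. The helper name trim_history and the default of 10 messages are assumptions for illustration, not something the tutorial or the SDK prescribes.

```python
def trim_history(messages: list, max_messages: int = 10) -> list:
    """Keep the system prompt plus the most recent max_messages entries."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    # A slice of [-0:] would return everything, so handle zero explicitly.
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent
```

One option is to call messages = trim_history(messages) right before each client.chat.completions.create call. An even max_messages drops history in whole user and assistant pairs at that point, but older follow-ups lose their context, so pick the limit with that trade-off in mind.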

Run it

Save the final script as tutor.py, export your key, and run it. Here is a sample session I recorded earlier today.

$ export OXLO_API_KEY="oxlo_..."
$ python tutor.py

Topic Explainer is ready. Type 'quit' to exit.

Topic or question: Explain how LLMs predict the next word at a beginner level

Imagine you are playing a game where you read a sentence and guess the next word. You have read every book, article, and web page on the internet, so you have a good sense of what word usually comes next. An LLM does exactly that, but with math. It turns words into numbers, looks at the pattern of the sentence so far, and outputs the most likely next word. Then it adds that word back to the sentence and repeats the process until it finishes.

Topic or question: What are those numbers called?

They are called embeddings, or vectors. Each word gets mapped to a long list of numbers that capture its meaning, so similar words end up close together in that number space.

Wrap-up and next steps

You now have a working conversational agent that runs against Oxlo.ai with request-based pricing. That means you can send long system prompts or multi-turn conversations without watching token costs scale, which makes this pattern cheap to experiment with. Two concrete next steps: swap in deepseek-v3.2 for math or coding explanations if you want to see stronger reasoning on a free-tier model, or add a Gradio UI so non-technical teammates can use it. If you want to see how the flat per-request pricing compares for heavier workloads, check the details at https://oxlo.ai/pricing.
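If you plan to swap models often, a command-line flag saves editing the script each time. This is a small sketch using argparse from the standard library; the --model flag and the parse_args helper are hypothetical additions, and the default mirrors the model used throughout this tutorial.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for the tutor script (hypothetical flag)."""
    parser = argparse.ArgumentParser(description="Topic Explainer")
    parser.add_argument(
        "--model",
        default="llama-3.3-70b",
        help="model name sent with each request",
    )
    return parser.parse_args(argv)
```

You would still need to change run_tutor to accept the model name and pass parse_args().model into it; after that, something like python tutor.py --model deepseek-v3.2 would select the other model.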
