惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

N
News and Events Feed by Topic
Malwarebytes
Malwarebytes
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cybersecurity and Infrastructure Security Agency CISA
F
Future of Privacy Forum
C
Cisco Blogs
T
The Exploit Database - CXSecurity.com
A
Arctic Wolf
S
Securelist
K
Kaspersky official blog
S
Schneier on Security
T
ThreatConnect
T
Tenable Blog
Spread Privacy
Spread Privacy
T
True Tiger Recordings
AWS News Blog
AWS News Blog
F
Fox-IT International blog
量子位
T
Threatpost
V
Vulnerabilities – Threatpost
C
CERT Recently Published Vulnerability Notes
Cisco Talos Blog
Cisco Talos Blog
GbyAI
GbyAI
宝玉的分享
宝玉的分享
腾讯CDC
G
Google Developers Blog
aimingoo的专栏
aimingoo的专栏
Cyberwarzone
Cyberwarzone
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
U
Unit 42
雷峰网
雷峰网
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Simon Willison's Weblog
Simon Willison's Weblog
O
OpenAI News
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The GitHub Blog
The GitHub Blog
The Register - Security
The Register - Security
MyScale Blog
MyScale Blog
小众软件
小众软件
A
About on SuperTechFans
Last Week in AI
Last Week in AI
Y
Y Combinator Blog
博客园 - 三生石上(FineUI控件)
美团技术团队
Google Online Security Blog
Google Online Security Blog
P
Proofpoint News Feed
MongoDB | Blog
MongoDB | Blog

DEV Community

AI-Discovered Vulnerabilities Need A Triage Queue, Not A Panic Channel AI Agent Workboards Need Audit Controls Before They Need More Agents Demystifying DevRel: What It Actually Is (And Why Should You Become One?) Your AI, Your Device, Your Data - Introducing Aide QuietPulse - Mood Tracker Principal Components in TypeScript (Part 3) The pgAudit Attribution Gap: Why Role-Level Logging Fails GDPR and How to Close It Gemma 4 CAD Orchestrator I built a local Postgres triage co-pilot because HIPAA says I can't paste plans into ChatGPT or Claude Live Holographic Editor In Fractal Time Everbench: A document management system with Local Intelligence Instanton in Fractal Time The Hidden Features of Claude How I Built an AI News Brief with Next.js, Supabase, Vercel, and GPT-4o-mini How We Built a Multi-Agent AI Documentation System (And What We Learned) I got tired of writing post-mortems — so I built RCAi for SREs MIA: A Futuristic AI Desktop Assistant Built with Voice, Gestures, and Controlled Chaos Best Programming Language for Backend Web Development: PHP vs Python PayPal Alternatives for Indian Businesses: Best Payment Gateways for International Card Payments (2026) Gemma 4 Made Me Rethink Local AI: Not Just Text, But Images Too Clean Architecture in .NET Explained (The Dependency Rule) I Compiled Rust to WebAssembly and Made My JavaScript 6 Faster Outlook.com Is the Final Boss of 'Just Send an Email' Conditional Statements and Control Flow in Python Insults & Cutlasses, Local LLM Sword Fighting on Melee Island Production Lab: ECS Fargate + Prometheus + Grafana + Loki + Alloy + Node Exporter How 12 AI agent frameworks handle human approval (most badly) The Four-Index Reality: Why AI Search Isn't One Thing I Scanned 1 Million AI Services. Here's What Worries Me More Than the Vulnerabilities Managing multiple docker hub accounts using docker-use System Design Interview: Decentralized Web Crawler Metric Cardinality: High or Low? 4 Steps to Making the Right Choice 로컬 LLM 셋업 가이드 (v23) GEO vs SEO in 2026 — What Google's May Guidance Changed Cursor Review 2026 — Honest 'Not For Me' Take From a VSCode User Hello from rikuq — a practitioner blog for solo AI SaaS founders Why DevOps Engineers Need Practical Tutorials, Not Just Theory AI Agents in CI/CD: Give Them Context, Not Production Authority Now I See Why Translators Are Panicking Over AI—Should Coders Panic Too? Why I Track HRV Every Morning (And How It Actually Changes My Day) Diffusion Language Models: How NVIDIA's Nemotron-Labs DLM Is Killing Token-by-Token Generation Chatbots GPT pour le support client : ce que les équipes françaises ont réellement besoin de savoir I Hit the 1,232-Byte Wall So You Don't Have To Google Just Rebuilt the Search Box (Again) — But This Time It's Different Aether: A local Android assistant built with Gemma 4 BoxAgnts Introduction (1) — Out of the Box mkdev: trusted HTTPS for localhost, mapped by name Just one question, one answer. Why Java Still Rules the Programming World in 2026 Four Architectures for Letting Claude Edit Elementor (and Why We Shipped Clone-and-Mutate) yard-yaml 0.1.1: safer UTF-8 handling for YAML documentation I Built a Mac App That Keeps Your Clipboard in Sync Across All Your Android Devices Stop Using UUIDs: Why B2B SaaS Needs ULIDs in Laravel 🐘 I'm a non-technical founder who built a Slack approval tool. Here's what actually broke first. Open-Sourcing Our Game AI Stack — SDKs, Templates, and CLI Tools for NPC Dialogue I Built an AI System That Makes 1,000 Decisions a Day. Here's Where I Drew the Line. Lets Encrypt DNS Challenge with Traefik and AWS Route 53 Building an agent-ready website: how to make your site readable for ChatGPT, Perplexity and autonomous agents A productivity tool with GitHub as your cloud database How We Built Dynamic NPC Dialogue with LLMs — Lessons from Early Access cmux: The Native macOS Terminal Built for Running AI Coding Agents in Parallel Deep Atlantic Storage: Rewriting in Rust How I Built a Bulk Image Optimizer with $0 Server Costs Using Vanilla JS and Canvas API Humans and Machines read differently, I think I have a fix? Claude Code Deleted 92 Images Without Asking. This Happens More Than You Think. Method Calling Stack in Java I Built Schedule Sensei & Pushed It to GitHub – Here's What's Inside (And I Need Your Help 👀) OIC: From a Working Toast Watcher to a General "Watch It for Me" Agent Memory is two-thirds of what an AI chip costs to build The XState persistence problem is five years old. Here is what we built to finally solve it. i added MCP support to my SaaS in an afternoon. here's the whole thing. Framework: Link Building ☁️ Importing existing S3 buckets into Terraform state made easy with terraform import existing s3 bucket I Built a Token System on Solana (Without Any Backend Code) 터미널 AI 에이전트 구축 (v21) I Built an AI 3D Model Generator — Here's How I Handle Meshes in the Browser 🛡️ PromptGuard: I Built a Local AI Privacy Firewall That Sanitizes Your Prompts Before They Leave Your Machine PostgreSQL WAL Bloat: Why Automatic Management Is Often Insufficient? Seven PRs Before Lunch: Parallel Claude Code Tabs Plus Audit-Before-Bump Deployment using all three Kubernetes probes Qwen 3.6 Has Four Tiers. Here's How to Route Without Burning Cash. RAG 시스템 실전 구축 (v21) How I handle my errors in PHP The Blind Spot in Treasure Hunt Engine Configuration: Long-Term Server Health Run NVIDIA NIM on Your Own GPU — Same API, Different Endpoint Webflow SEO Implementation 로컬 LLM 셋업 가이드 (v21) How Logs Travel From Your EKS Pod to Datadog 𝗦𝘁𝗼𝗽 𝗖𝗿𝗮𝗺𝗺𝗶𝗻𝗴 𝗙𝗼𝗿 𝗘𝘅𝗮𝗺𝘀, 𝗦𝘁𝗮𝗿𝘁 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗥𝗲𝗮𝗹 𝗦𝗸𝗶𝗹𝗹𝘀 How to Use EXPLAIN ANALYZE in PostgreSQL: A Visual Guide gRPC Performance: tonic (Rust) vs grpc-go Benchmarked at Scale Hack The Box (HTB): Cap Machine (Full Walkthrough) Visual Search Optimization studygemma: AI study buddy for CS students Architectural Tradeoffs in Webhook Idempotency and SaaS API Versioning One Open Source Project a Day (No. 75): Understand Anything - The AI Engine That Turns Any Codebase Into an Explorable Knowledge Graph From mock-only-works to real-world-works: 48 hours of reCAPTCHA debugging I built a free music tool AI Talking Avatar Pipelines Broke Our Ad CTR by 3.7% 800G to 400G Breakout: How to Scale 400G Networks with 800G Ports
Gemma 4 GenAI Coach - GenAI Concepts Made Easy with an Interactive Playground
koushalya200 · 2026-05-25 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4.

ModelX GenAI Interactive Playground: Learning Gemma 4, Grounding, and Agentic Design

What I Built

I built ModelX GenAI Interactive Playground, an opinionated learning portal that helps developers understand how to design, ground, and debug Gemma 4–powered chatbot and agentic applications.

Instead of being “just another chat UI”, the app is structured as a guided playground:

  • A multi‑turn chatbot vs agentic playground where you can flip between simple chat and tool‑using agent flows.
  • A grounding playground to compare:
    • prompt‑based grounding,
    • tool/MCP‑style grounding, and
    • RAG over a small in‑memory vector index.
  • A RAG & vector DB lab to reason about chunking, top‑k, token budgets, and context window pressure.
  • A prompt injection & safety lab where you can experiment with red‑team prompts and see simple safety heuristics in action.
  • A metrics & system‑design view that shows per‑turn token usage and latency for the current session, and generates a dynamic Mermaid diagram of the selected architecture.
  • A Gemma‑specific guide, coach, and quiz that walk the user through model choices and tradeoffs (26B A4B vs 31B, edge vs server), grounded in the official Gemma documentation.1
  • A Google AI Studio context caching demo that shows how to create cached context for Gemma 4 and compare usage/latency with and without caching.2

The goal is not just to “host a model”, but to make the design decisions around Gemma‑based systems visible and learnable.

Demo

  • Live app: https://modelx-genai-interactive-playground-ulwenmfdutovvcfyffv4x5.streamlit.app/
  • Recommended walkthrough:
    1. Start at “⭐ How to use this portal (start here)” in the left sidebar.
    2. Pick your backend (OpenRouter or Google AI Studio) and a Gemma 4 model.
    3. Go to Grounding playground to see how the same question changes under different grounding strategies.
    4. Move to Playground (LLM & Agent) to compare chatbot vs agentic flows and watch per‑turn metrics.
    5. Visit Prompt injection & safety lab and RAG & Vector DB lab for deeper experiments.
    6. Finish at Metrics scorecard & system design and Google context caching demo to inspect your telemetry and caching behavior.

(You can embed a Loom / YouTube walkthrough or GIF here once you record it.)

Code

Key modules:

  • app.py – Streamlit entrypoint, backend selection (OpenRouter vs Google AI Studio), navigation, and the persistent “active backend/model” header.
  • openrouter_client.py – Thin wrapper around OpenRouter’s OpenAI‑compatible chat API (Gemma 4 26B A4B / 31B free routes) with token/latency metrics.
  • google_ai_client.py – Thin client around the Gemini API generateContent endpoint for Gemma 4 models.
  • playground.py – Multi‑turn chatbot vs agent loop with configurable context strategies.
  • grounding_playground.py – Side‑by‑side grounding experiments (prompt grounding, tool/MCP‑style, and RAG).
  • rag_lab.py – RAG/vector DB design lab with token‑budget reasoning.
  • safety_lab.py – Prompt injection & safety heuristics.
  • metrics_scorecard.py – Per‑session summary and per‑turn metrics view (stored in st.session_state.interactions).
  • diagrams.py – Mermaid system diagram generator driven by user selections.
  • gemma_guide.py / gemma_coach.py – Gemma‑focused guide, coach, and quiz.
  • google_context_cache_demo.py – Google AI Studio context caching demo using cachedContents.
  • how_to_use.py, ui_flow_explainer.py, infra_explainer.py – documentation pages inside the app.

How I Used Gemma 4

Model choice

The playground is built around the Gemma 4 family, focusing on:

  • Gemma 4 26B A4B – Mixture‑of‑Experts configuration (26B total, 4B active per token), tuned for strong reasoning and long‑context workloads on small servers.
  • Gemma 4 31B – dense 31B model for maximum quality on server deployments, with strong reasoning, coding, and long‑context capabilities.

These sizes are a sweet spot for an educational portal:

  • Small enough to be usable via hosted APIs.
  • Large enough to clearly show the impact of prompt design, context trimming, grounding, and RAG on quality.

The UI lets you switch models in the sidebar, and the active backend + model are always shown at the top of the main area so users know exactly which configuration they’re exploring.

Backends: OpenRouter and Google AI Studio (user‑selectable)

The app supports both OpenRouter and Google AI Studio, but always uses exactly one backend at a time. The user:

  1. Chooses which backends to configure in the sidebar.
  2. Enters:
    • An OpenRouter API key (for easy access to free Gemma 4 26B A4B / 31B routes), or
    • A Google AI Studio key (for direct Gemini API calls to Gemma 4 models).
  3. Picks the active backend for this session via a radio button.

Under the hood:

  • When OpenRouter is active:

    • openrouter_client.py calls https://openrouter.ai/api/v1/chat/completions with the selected Gemma 4 model ID and normalizes token/latency usage.
    • This is a nice “batteries‑included” path for users who don’t want to set up Google AI Studio immediately.
  • When Google AI Studio is active:

    • google_ai_client.py calls https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent with the user’s AI Studio key.
    • The app surfaces usageMetadata fields such as promptTokenCount and candidatesTokenCount for metrics, and google_context_cache_demo.py shows how to use cachedContents to get cachedContentTokenCount for context caching.

This separation makes it easy for learners to switch between “router‑style” and “direct provider” patterns without changing any of the UI or agent logic.

Context caching (Google AI Studio)

The Google context caching demo page walks through the Gemini API’s context caching for Gemma 4:

  1. Create cached context

    • Calls POST /v1beta/cachedContents with a large, stable prompt (e.g. Gemma 4 documentation and system instructions).
    • Stores the returned cachedContents/... name in st.session_state.
  2. Compare calls

    • Call A (no cache): uses generateContent with full context + question in one prompt.
    • Call B (with cache): uses cachedContent: "cachedContents/..." and sends only the new question.
  3. Inspect usage and latency

    • The app shows usageMetadata for both calls:
      • promptTokenCount
      • candidatesTokenCount
      • totalTokenCount
      • cachedContentTokenCount (for the cached call)
    • Users can see how many tokens are reused from cache and how latency changes.

This gives a very concrete, visual way to understand why context caching matters for long prompts and how Gemma 4 can take advantage of it via the Gemini API.

Agentic patterns & grounding

The Playground (LLM & Agent) and Grounding playground expose agent plumbing in a transparent way:

  • Chatbot vs agentic mode:

    • Chatbot mode: forwards the multi‑turn messages list to Gemma.
    • Agentic mode:
    • Runs toy tools (math evaluator, pseudo search).
    • Injects tool outputs as labeled assistant turns at the end of the context.
    • Applies a context strategy:
      • Keep all turns.
      • Sliding window (last N turns).
      • Summarize older history when beyond a threshold.
  • Grounding playground:

    • Prompt‑based grounding: paste “docs” text into the system prompt and compare answers with/without that text.
    • Tool/MCP‑style grounding: simulate a docs/search MCP server by pasting tool output and injecting it either as:
    • an assistant turn, or
    • part of the system prompt.
    • RAG‑style grounding:
    • Uses an in‑memory TF‑IDF + cosine similarity index (lightweight stand‑in for FAISS) to retrieve top‑k chunks from a corpus.
    • Appends those chunks to the prompt, mirroring a full RAG pipeline without adding heavy dependencies.

The RAG & Vector DB lab then lets users play with chunk size and top‑k and explains how those choices affect token budgets, latency, and KV/cache pressure in Gemma‑based systems.

System Design

The portal is intentionally small and composable: a thin agentic shell around Gemma 4 with strong observability and multiple “labs” that share the same backend, model, and session state.

Architecture diagram

Components

  • UI (Streamlit)

    • Sidebar:
    • Backend selection (OpenRouter vs Google AI Studio).
    • Model selection per backend.
    • Context strategy.
    • Navigation between all playgrounds and docs pages.
    • Main area:
    • Chat/agent playground.
    • Grounding, RAG, safety, and caching labs.
    • Metrics table and system diagram.
    • “How to use this portal”, UI–Agent flow, and Infra/Serving docs.
  • Session state & controller

    • Shared st.session_state holds:
    • chat_history per page,
    • interactions (per‑turn metrics),
    • architecture_choices,
    • context_strategy,
    • backend_config.
    • Agent controller:
    • Builds messages from history.
    • Injects grounding/tool/RAG context as separate turns.
    • Applies context strategy before calling the active backend client.
  • Metrics & system design

    • Every model call appends an entry to interactions with tokens, latency, model, app type, framework, context strategy, and (optionally) safety flags.
    • metrics_scorecard.py:
    • Computes per‑session aggregates.
    • Displays a per‑turn table for the current session only.
    • diagrams.py:
    • Generates a Mermaid diagram based on architecture_choices so the UI always reflects the user’s current design.
  • Labs

    • Grounding, RAG, safety, and caching labs all use the same backend client and session state, but with different wiring patterns. This keeps the mental model consistent: change the wiring, not the model.

How I Used this Portal for Learning

To help new users, I added a dedicated page:

⭐ How to use this portal (start here)

It explains:

  • How to select backends/models.
  • A recommended learning path:
    • Grounding → Playground → Safety → RAG → Metrics → Gemma guide/coach → Context caching.
  • Where to find deeper technical details:
    • UI–Model–Agent flow shows exactly how user input, session state, agent logic, and model calls fit together.
    • Infra & Serving 101 maps these patterns to OpenRouter, Google AI Studio, and vLLM/local.

By the end, users don’t just know how to call Gemma 4; they have an experiential understanding of how to architect, ground, observe, and iterate Gemma‑powered systems across multiple serving options.


  1. Based on Gemma 4 docs and overviews. 

  2. Based on the Gemini API context caching docs.