AI Travel Assistant Powered by Gemma 4; With Streaming, Image Input, and Visual Recommendation Cards
Developer on · 2026-05-23 · via DEV Community

This is a submission for the Gemma 4 Challenge: Build with Gemma 4


What I Built

Planning a trip used to mean bouncing between five browser tabs — one for flights, one for hotels, one for itineraries, one for Reddit threads, and one you forgot you opened. I wanted to collapse that into a single conversation.

Gemma Travel Assistant is an AI-powered chat app that helps you plan trips from scratch. Tell it your budget, your vibe, your dates. Ask follow-up questions. Upload a photo of somewhere you saw on Instagram and ask "where is this, and what should I do there?" It remembers everything you said earlier in the conversation and uses it to give you better answers.

What makes it feel different from a plain chatbot:

  • It doesn't just write paragraphs. When Gemma recommends hotels or destinations, the app parses those recommendations out of the response and renders them as visual cards — name, location, type badge (hotel / destination / restaurant), star rating, price range. You can scan five options in three seconds instead of reading five bullet points.

  • Responses stream token by token. You start reading the answer while Gemma is still writing it. For a full 5-day itinerary that can be 600+ words, this makes the experience feel instant instead of frozen.

  • It understands images natively. Drop in a photo — a landscape, a hotel lobby, a plate of food — and the model uses it as context. No extra vision pipeline, no OCR. Gemma 4 handles it directly.


Demo

Example conversation:

You: Plan a 5-day trip to Kyoto in October, budget around $1500, I love temples and local food

Gemma: Here's a day-by-day itinerary for Kyoto in October — peak foliage season, so I've planned around the best viewing spots...
(streams in, then suggestion cards appear below for ryokans and restaurants)

You: (uploads a photo of a bamboo forest)

Gemma: That's Arashiyama Bamboo Grove in western Kyoto. It's already on day 3 of your itinerary — here are the best times to visit to beat the crowds...

GitHub: https://github.com/mushahidmehdi/gemma-travel-assistant


Code

Stack:

| Layer | Choice |
|---|---|
| Framework | Next.js 16 (App Router) |
| Model | Gemma 4 31B Dense via OpenRouter |
| Styling | Tailwind CSS |
| Markdown | ReactMarkdown |
| Icons | Lucide React |

Project structure:

src/
├── app/
│   ├── api/chat/route.ts   # Streaming SSE proxy → OpenRouter
│   ├── layout.tsx
│   └── page.tsx            # Centered card layout
└── components/
    ├── ChatInterface.tsx   # Input, image upload, message list
    ├── ChatMessage.tsx     # Bubble renderer + suggestion parser
    └── SuggestionCard.tsx  # Hotel / destination / restaurant cards



How I Used Gemma 4

Choosing the model

I went with Gemma 4 31B Dense (google/gemma-4-31b-it). Here's why that specific model, not the others:

The E2B / E4B models are designed for edge and mobile — brilliant for offline use, but I needed server-grade reasoning quality for multi-day itineraries with budget constraints, visa tips, and local context. A 2B model can hallucinate confidently about things it doesn't know well.

The 26B MoE model is optimized for throughput. For a travel assistant where a single user sends a message and waits for the reply, throughput wasn't the bottleneck. Quality and coherence over a long conversation were.

The 31B Dense hits the right balance: strong enough to produce well-structured, accurate travel advice, consistent enough to reliably follow formatting instructions (more on that below), and available on OpenRouter's free tier so anyone can clone the repo and run it without a credit card.

The 128K context window was the other deciding factor. Planning a real trip is a long conversation. By the time you've discussed your budget, chosen a region, rejected two hotel options, added a day trip, and asked about visa requirements, you've accumulated thousands of tokens of context. With a smaller window, the app has to truncate or summarize older turns, and early constraints such as the budget are the first to drop out. With 128K, a typical planning conversation fits in every request, so nothing has to be trimmed.
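To make the context-window argument concrete, here is a minimal sketch of what an app with a small window would have to do. Everything in it is an assumption for illustration: the `ChatTurn` shape, the `trimToBudget` helper, and the rough "4 characters per token" estimate are not taken from the repo, and a real tokenizer would give different counts.

```typescript
// Hypothetical message shape; the real app may store more fields.
interface ChatTurn {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic only: about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk from newest to oldest and keep turns until the budget is used up.
// The newest turn is always kept, even if it alone exceeds the budget.
function trimToBudget(history: ChatTurn[], maxTokens: number): ChatTurn[] {
  const kept: ChatTurn[] = [];
  let total = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (kept.length > 0 && total + cost > maxTokens) break;
    kept.unshift(history[i]);
    total += cost;
  }
  return kept;
}

const history: ChatTurn[] = [
  { role: 'user', content: 'My budget is $1500 for five days in Kyoto.' },
  { role: 'assistant', content: 'Noted. '.repeat(2000) }, // stands in for a long itinerary
  { role: 'user', content: 'Add a day trip to Nara.' },
];

const small = trimToBudget(history, 2000);   // the budget message is dropped
const large = trimToBudget(history, 128000); // all three turns are kept
```

With the small budget, the turn that stated the $1500 limit is the one that disappears, which is exactly the failure a large window avoids.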

Streaming the response

The API route doesn't buffer the full response. It parses OpenRouter's SSE stream as it arrives and forwards each text delta to the browser:

// src/app/api/chat/route.ts
// `response` is the upstream OpenRouter response obtained earlier in the handler.
const encoder = new TextEncoder();
const stream = new ReadableStream({
  async start(controller) {
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let buffer = ''; // holds a partial SSE line split across network chunks

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // stream: true keeps multi-byte characters intact across chunk boundaries
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // the last element may be an incomplete line

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6).trim();
        if (data === '[DONE]') { controller.close(); return; }
        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices?.[0]?.delta?.content;
          if (content) controller.enqueue(encoder.encode(content));
        } catch { /* skip malformed chunks */ }
      }
    }
    controller.close();
  },
});

return new Response(stream, {
  headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
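The snippet above starts from an existing `response`. As a hedged sketch of how that upstream request might be built, the helper below assembles a streaming call to OpenRouter's chat completions endpoint. The helper name `buildOpenRouterRequest`, the `OutgoingMessage` type, and the `OPENROUTER_API_KEY` variable name are assumptions for illustration; only the model ID comes from this post.

```typescript
interface OutgoingMessage {
  role: 'system' | 'user' | 'assistant';
  content: unknown; // a string, or an array of content parts when an image is attached
}

// Hypothetical helper: returns the URL and fetch options for a streaming call.
function buildOpenRouterRequest(messages: OutgoingMessage[], apiKey: string) {
  return {
    url: 'https://openrouter.ai/api/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'google/gemma-4-31b-it', // model ID named in this post
        messages,
        stream: true, // ask for SSE chunks instead of a single JSON body
      }),
    },
  };
}

// Usage inside a route handler (not executed here; the env var name is assumed):
//   const { url, init } = buildOpenRouterRequest(messages, process.env.OPENROUTER_API_KEY!);
//   const response = await fetch(url, init);
const example = buildOpenRouterRequest(
  [{ role: 'user', content: 'Plan a weekend in Lisbon' }],
  'sk-test',
);
```

Keeping the key in the route handler rather than the client means the browser only ever talks to `/api/chat`.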


On the client, ChatInterface reads the stream chunk by chunk and appends to the last message in state, so React re-renders progressively as tokens arrive.
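The "append to the last message" step can be sketched as a small pure function. This is an illustration under assumptions, not the repo's code: `UiMessage` and `appendToLastMessage` are hypothetical names, and the loop stands in for chunks arriving from the stream.

```typescript
interface UiMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Returns a new array in which the last message has `chunk` appended.
// The input array and its objects are left untouched, which is what
// React needs in order to detect the change and re-render.
function appendToLastMessage(messages: UiMessage[], chunk: string): UiMessage[] {
  if (messages.length === 0) return messages;
  const last = messages[messages.length - 1];
  return [...messages.slice(0, -1), { ...last, content: last.content + chunk }];
}

const initial: UiMessage[] = [
  { role: 'user', content: 'Plan a trip to Kyoto' },
  { role: 'assistant', content: '' }, // empty placeholder that fills up while streaming
];

// Simulate three chunks arriving from the server.
let state = initial;
for (const chunk of ['Here is ', 'your ', 'itinerary.']) {
  state = appendToLastMessage(state, chunk);
}
```

In a component this would typically be called from a functional state update, for example `setMessages(prev => appendToLastMessage(prev, chunk))`, where `setMessages` is an assumed `useState` setter.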

Structured output via prompting

I didn't use a formal structured output API. Instead, the system prompt tells Gemma to append a fenced suggestions block at the end of any response that involves specific recommendations:

When suggesting places, format your hotel/destination/restaurant recommendations
as a JSON block at the end of your response:

```suggestions
[{
  "name": "Nishiyama Onsen Keiunkan",
  "location": "Yamanashi, Japan",
  "type": "hotel",
  "rating": 4.9,
  "price": "$$$",
  "description": "The world's oldest hotel, operating since 705 AD..."
}]
```

ChatMessage then does two things: strips that block from the visible text (so it doesn't appear as raw JSON in the bubble), and passes the parsed array to SuggestionCard components:

function parseSuggestions(content: string) {
  const match = content.match(/```suggestions\n([\s\S]*?)```/);
  if (!match) return { text: content, suggestions: [] };

  const text = content.replace(/```suggestions\n[\s\S]*?```/, '').trim();
  try {
    return { text, suggestions: JSON.parse(match[1]) };
  } catch {
    return { text: content, suggestions: [] }; // graceful fallback
  }
}


If Gemma omits the block — for a conversational reply like "Great, let's add a day trip!" — the component falls through cleanly and just shows the text bubble. No crashes, no empty card rows.
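Because the block comes from prompting rather than a structured output API, the parsed JSON is not guaranteed to have the expected shape. One possible safeguard, sketched here under assumptions, is a type guard that drops malformed entries before they reach the cards. The `Suggestion` interface is inferred from the example in the system prompt, and `isSuggestion` and `validSuggestions` are hypothetical helpers, not functions from the repo.

```typescript
type SuggestionType = 'hotel' | 'destination' | 'restaurant';

// Shape inferred from the example in the system prompt.
interface Suggestion {
  name: string;
  location: string;
  type: SuggestionType;
  rating: number;
  price: string;
  description: string;
}

// Runtime check that an unknown value has every field a card needs.
function isSuggestion(value: unknown): value is Suggestion {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === 'string' &&
    typeof v.location === 'string' &&
    (v.type === 'hotel' || v.type === 'destination' || v.type === 'restaurant') &&
    typeof v.rating === 'number' &&
    typeof v.price === 'string' &&
    typeof v.description === 'string'
  );
}

// Keeps only well-formed entries; anything that is not an array yields [].
function validSuggestions(parsed: unknown): Suggestion[] {
  return Array.isArray(parsed) ? parsed.filter(isSuggestion) : [];
}

const checked = validSuggestions([
  {
    name: 'Nishiyama Onsen Keiunkan',
    location: 'Yamanashi, Japan',
    type: 'hotel',
    rating: 4.9,
    price: '$$$',
    description: 'placeholder description',
  },
  { name: 'Missing fields' }, // dropped: incomplete
  'not an object',            // dropped: wrong type
]);
```

Filtering entry by entry means one bad item from the model costs a single card rather than the whole row.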

Multimodal input

Image uploads are encoded as base64 data URLs and injected into the last user message as an image_url content part. This is the OpenAI-style multimodal message format that OpenRouter accepts for vision-capable models such as Gemma 4:

if (msg.role === 'user' && image && isLastMessage) {
  return {
    role: 'user',
    content: [
      { type: 'text', text: msg.content },
      { type: 'image_url', image_url: { url: image } }, // base64 data URL
    ],
  };
}


Gemma 4's native vision understands the image without any preprocessing on my end — no external OCR, no separate vision model call. The model sees both the image and the conversation history and responds in context.
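For context, the image-injection fragment shown earlier reads like the body of a mapping callback. A self-contained sketch of that surrounding step might look like the following. The names `StoredMessage`, `ContentPart`, `ApiMessage`, and `buildApiMessages` are assumptions for illustration, and the data URL is a truncated placeholder rather than a real image.

```typescript
interface StoredMessage {
  role: 'user' | 'assistant';
  content: string;
}

type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface ApiMessage {
  role: 'user' | 'assistant';
  content: string | ContentPart[];
}

// Converts chat history into API messages. Only the final user message
// carries the image; earlier turns stay as plain text.
function buildApiMessages(messages: StoredMessage[], image: string | null): ApiMessage[] {
  return messages.map((msg, index): ApiMessage => {
    const isLastMessage = index === messages.length - 1;
    if (msg.role === 'user' && image && isLastMessage) {
      return {
        role: 'user',
        content: [
          { type: 'text', text: msg.content },
          { type: 'image_url', image_url: { url: image } }, // base64 data URL
        ],
      };
    }
    return { role: msg.role, content: msg.content };
  });
}

const payload = buildApiMessages(
  [
    { role: 'user', content: 'Plan 5 days in Kyoto' },
    { role: 'assistant', content: 'Here is a draft itinerary.' },
    { role: 'user', content: 'Where is this?' },
  ],
  'data:image/png;base64,iVBORw0KGgo=', // placeholder, not a real image
);
```

Attaching the image only to the latest turn keeps the rest of the history as plain text, so the large base64 payload is not repeated on earlier messages.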


Building this made me appreciate how much the context window size and multimodal capability change what's actually possible in a single conversation. A travel assistant that forgets what you said three messages ago, or that can't look at a photo you found, is just a fancier search box. Gemma 4 31B makes it feel like talking to someone who's actually paying attention.