Using Gemma 4 2B to Assist Community Health Workers
The Ecstatic · 2026-05-25 · via DEV Community

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Problem Statement

Nigeria has a shortage of health workers: the doctor-to-patient ratio stands at 1:9000. The shortage is felt most in rural and semi-urban areas.

Community Health Extension Workers (CHEWs) exist to bridge this gap in rural and underserved communities.

These workers undergo 2-3 years of training and can provide care for select conditions, referring patients to secondary and tertiary health facilities if needed.

In 2022, the Nigerian Federal Ministry of Health published the latest Standard Treatment Guidelines for Nigeria (STG) to provide evidence-based clinical protocols for the diagnosis, prevention, and management of common diseases and clinical conditions within the Nigerian context.

AI as a Solution

Health workers make treatment decisions based on the STG. What if we could help them get detailed treatment information grounded in the STG?

This article draws inspiration from existing projects that have built similar solutions. All of them, however, are web-based and depend on internet access and large LLMs.

We propose using the Gemma 4 2B model to power a RAG (retrieval-augmented generation) solution that delivers treatment guidance to health workers.

Why Gemma 4 2B?

It's a lightweight model that can run on mobile phones. At about 7 GB, it can be loaded on a high-end phone, which means we can deploy a full RAG solution on phones or tablets.

This matters because health workers can keep using the AI solution where there is no internet, a common problem in rural and underserved communities.

Proof of Concept

To demonstrate the approach, we build a simple RAG application on a Windows system, using models hosted locally on the same machine.

Requirements

  • Ollama
  • Visual Studio
  • .NET AI Framework Libraries.

Setting Up

We use Ollama to download and run models locally. We start by signing up on Ollama and downloading the Ollama Windows application.

After installing the Ollama application, we check if it's properly installed by running ollama in PowerShell:
Image showing ollama running

We also pull some models using Ollama:
> ollama pull embeddinggemma
> ollama pull gemma4:e2b

Here we pull two models: embeddinggemma, which generates the embeddings for our RAG solution, and gemma4:e2b, which serves as the reasoning LLM:
Image showing pulling of local models

Developing the RAG Solution

The source code can be found here.

First, we create a Console project in Visual Studio and add some important libraries:

dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package OllamaSharp
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.SemanticKernel.Connectors.InMemory --prerelease


We added the AI extension and vector store libraries to interact with our model and generate embeddings. We also added OllamaSharp to talk to our Ollama-hosted models, locally or in the cloud.

Dataset

The STG has been extracted and formatted into JSON to help developers. We can download it from https://github.com/chisomrutherford/nigeria-clinical-guidelines-dataset/tree/main.

Implementation

Our implementation steps are straightforward:

  1. Load dataset
  2. Use an embedding model to generate embeddings for the dataset
  3. Save all embeddings in an in-memory vector store
  4. When a user queries, generate an embedding for the user's query
  5. Do a vector search against the in-memory vector store.
  6. Use retrieved data to enhance the prompt.
  7. Send the prompt to the Gemma 4 2B model and show the response.
  8. Save the response and allow the user to chat with the model.
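
Steps 4 and 5 come down to comparing the query embedding with every stored embedding and keeping the closest matches. The sketch below shows that idea with plain arrays and cosine similarity. It is an illustration only, not the code the in-memory vector store runs internally; the names VectorSearchSketch, CosineSimilarity and TopMatches are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VectorSearchSketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns the ids of the `top` stored vectors most similar to the query, best first.
    public static List<int> TopMatches(
        IReadOnlyDictionary<int, float[]> store, float[] query, int top)
    {
        return store
            .OrderByDescending(kv => CosineSimilarity(kv.Value, query))
            .Take(top)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

Calling VectorSearchSketch.TopMatches(store, queryVector, 3) on a dictionary of id-to-embedding pairs plays the same role as the vector search in step 5. A linear scan like this is reasonable for a dataset of a few thousand entries held in memory.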

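First, we read the model settings and create the Ollama client used for embeddings:

using Microsoft.Extensions.Configuration;
using OllamaSharp;

// Read the model names and the Ollama endpoint from configuration.
string? model = _configuration["Ollama:ModelName"];
string? embedModel = _configuration["Ollama:EmbedModelName"];
string url = _configuration["Ollama:BaseUrl"]
    ?? throw new InvalidOperationException("Ollama:BaseUrl is not configured.");

var client = new HttpClient { BaseAddress = new Uri(url) };

// Despite its name, ollamaGen is the client that generates embeddings.
var ollamaGen = new OllamaApiClient(client, embedModel);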


Before creating the vector store, we define the record type it will hold:

using Microsoft.Extensions.VectorData;

public class ClinicalGuidelineVector
{
    // Unique key of the record in the vector store.
    [VectorStoreKey]
    public int Id { get; set; }

    [VectorStoreData]
    public string ConditionName { get; set; } = "";

    // The full guideline entry, serialized, so it can be placed in the prompt later.
    [VectorStoreData]
    public string RawJson { get; set; } = "";

    // Dimensions must match the embedding model's output size.
    [VectorStoreVector(Dimensions: 768, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Embedding { get; set; } = new ReadOnlyMemory<float>();
}


The Dimensions value must match the embedding model's output size exactly (768 here). To check the size for a model, run this in PowerShell:

ollama show embeddinggemma:latest


We can then see the model information:
Image showing a model's detailed information
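
A dimension mismatch is easier to diagnose if it is caught before the record reaches the vector store. The helper below is a small sketch of such a guard; EmbeddingChecks and EnsureDimensions are hypothetical names, not part of OllamaSharp or the vector store libraries.

```csharp
using System;

public static class EmbeddingChecks
{
    // Throws if the embedding length differs from the dimension declared on the record.
    public static void EnsureDimensions(ReadOnlyMemory<float> embedding, int expected)
    {
        if (embedding.Length != expected)
        {
            throw new InvalidOperationException(
                $"Embedding has {embedding.Length} dimensions, expected {expected}.");
        }
    }
}
```

It could be called as EmbeddingChecks.EnsureDimensions(embedding, 768) right after an embedding is generated, so that switching to a different embedding model fails with a clear message.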

Next, we create and initialize our vector store using an in-memory vector store:

var vectorStore = new InMemoryVectorStore();

VectorStoreCollection<int, ClinicalGuidelineVector> collection =
    vectorStore.GetCollection<int, ClinicalGuidelineVector>("clinical_guidelines");

await collection.EnsureCollectionExistsAsync();


After this, we load the data and generate our embeddings:

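// LoadData and BuildSearchText are helper methods not shown here.
var data = await LoadData();

if (data.Count == 0)
{
    return;
}

int id = 0;
foreach (var item in data)
{
    // Text that represents this guideline entry for semantic search.
    var text = BuildSearchText(item);

    // One input string yields one embedding.
    var embeddingResponse = await ollamaGen.EmbedAsync(text);
    var embedding = embeddingResponse.Embeddings[0];

    var vector = new ClinicalGuidelineVector
    {
        Id = id++,
        ConditionName = item.ConditionName!,
        RawJson = JsonSerializer.Serialize(item),
        Embedding = embedding
    };

    await collection.UpsertAsync(vector);

    // Log the ID that was stored, not the already-incremented counter.
    Console.WriteLine("Inserted vector ID {0}", vector.Id);
}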


Here, we first build a string of the search text and then generate an embedding for the text. Finally, we save the embedding, a serialized string of that data, and the condition name.
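
BuildSearchText is not shown in this article. As a rough idea of what such a helper might do, the sketch below joins the non-empty fields of an entry into one labelled string. Apart from ConditionName, the field names (ClinicalFeatures, Treatment) and the GuidelineEntry type are assumptions made for illustration; the real dataset schema may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one dataset entry; the real JSON fields may differ.
public record GuidelineEntry(
    string? ConditionName,
    string? ClinicalFeatures,
    string? Treatment);

public static class SearchText
{
    // Joins the non-empty fields into one labelled string for the embedding model.
    public static string BuildSearchText(GuidelineEntry item)
    {
        var parts = new List<(string Label, string? Value)>
        {
            ("Condition", item.ConditionName),
            ("Clinical features", item.ClinicalFeatures),
            ("Treatment", item.Treatment),
        };

        return string.Join("\n", parts
            .Where(p => !string.IsNullOrWhiteSpace(p.Value))
            .Select(p => $"{p.Label}: {p.Value!.Trim()}"));
    }
}
```

Labelling each field gives the embedding model some context about what the text describes, and skipping empty fields keeps noise out of the embedded string.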

When a user asks a question, we first convert the question into an embedding and search the vector store for the entries most relevant to it:

Console.Write("Your prompt: ");
var query = Console.ReadLine();

// Nothing to search for if the input is empty.
if (string.IsNullOrWhiteSpace(query))
{
    return;
}

// Embed the question with the same model used for the dataset.
var queryEmbed = await ollamaGen.EmbedAsync(query);
var queryVector = queryEmbed.Embeddings[0];

// Retrieve the three closest guideline entries.
IAsyncEnumerable<VectorSearchResult<ClinicalGuidelineVector>> results =
    collection.SearchAsync(queryVector, top: 3);

List<ClinicalGuidelineVector> clinicalGuidelines = [];

await foreach (var result in results)
{
    clinicalGuidelines.Add(result.Record);
}


Then we flatten the retrieved guidelines and add them to the user's prompt before sending it to Gemma 4 2B:

// Switch to the reasoning model for generation.
model = "gemma4:e2b";

IChatClient chatClient = new OllamaApiClient(client, model);

List<ChatMessage> messages = [];
messages.Add(new ChatMessage(ChatRole.System, Constants.SystemPrompt));

// Flatten the retrieved guidelines into a single context string.
var context = string.Join("\n\n",
    clinicalGuidelines.Select(m => m.RawJson));

var prompt = string.Format(Constants.PromptTemplate, context, query);

messages.Add(new ChatMessage(ChatRole.User, prompt));

var chatResponse = chatClient.GetStreamingResponseAsync(messages);

// Print each chunk as it arrives and accumulate the full text.
var fullResponse = new StringBuilder(); // requires using System.Text;

await foreach (var response in chatResponse)
{
    fullResponse.Append(response.Text);
    Console.Write(response.Text);
}

// Keep the reply in the history so the conversation can continue.
messages.Add(new ChatMessage(ChatRole.Assistant, fullResponse.ToString()));


Here, we first set the model to the reasoning model, gemma4:e2b. Next, we add the system prompt. We then flatten the retrieved guidelines and use them to augment the user's prompt, and pass the whole message list to gemma4:e2b. The AI-generated response is also saved in the message history so the conversation can continue.
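
Constants.SystemPrompt and Constants.PromptTemplate are not shown in this article. The sketch below shows one plausible shape for them, given that string.Format is called with the context first and the query second. The wording of both strings and the PromptSketch class are assumptions for illustration, not the author's actual prompts.

```csharp
public static class PromptSketch
{
    // Hypothetical stand-in for Constants.SystemPrompt.
    public const string SystemPrompt =
        "You assist community health workers. Answer only from the guideline " +
        "excerpts provided. If they do not cover the question, say so.";

    // Hypothetical stand-in for Constants.PromptTemplate.
    // {0} receives the retrieved guideline JSON, {1} the user's question.
    public const string PromptTemplate =
        "Guideline excerpts:\n{0}\n\nQuestion: {1}";

    public static string BuildPrompt(string context, string query) =>
        string.Format(PromptTemplate, context, query);
}
```

The retrieved JSON is full of curly braces, but because it is passed as an argument and is not part of the format string, string.Format leaves it untouched. Only literal braces written in the template itself would need to be doubled.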

Challenges

We have seen how to implement a RAG solution that helps health workers use the STG. However, it comes with one challenge. Although the Gemma 4 2B model is sized and specified to run on mobile phones, in practice it only works on high-end phones. These may not be affordable for the target health workers (CHEWs), many of whom rely on everyday mobile phones as their work tools.

Conclusion

Gemma 4 2B can be a game-changer in the health sector because it lets health workers draw on the government-approved STG wherever they work. The solution can be deployed to assist community health workers in underserved communities, although the cost of high-end mobile phones may limit how widely it can be rolled out.