One Open Source Project a Day (No. 103): Open-Generative-AI - An Open Source AI Video and Image Creation Hub
冬奇Lab · 2026-05-17 · via 掘金

Introduction

"Creative freedom belongs to everyone, unfiltered and unconstrained."

This is the 103rd article in the "One Open Source Project a Day" series. Today, the project we are introducing is Open-Generative-AI.

In AI video and image generation, powerful platforms such as Kling, Sora, and Midjourney have emerged, but their closed-source ecosystems, subscription fees, and strict content moderation (guardrails) constrain many creators. Open-Generative-AI is an open-source alternative to these platforms that integrates more than 200 models and gives users an unfiltered, customizable, and self-hostable creative environment.

What You Will Learn

  • Core Concepts: How to build a unified multi-model AI creation center.
  • Main Features: Text-to-image, image-to-image, text-to-video, image-to-video, audio-driven lip sync, and more.
  • Technical Highlights: Supports local inference on Electron desktop (sd.cpp and Wan2GP) and remote GPU offloading.
  • Application Scenarios: From personal artistic creation to automated media pipeline construction.
  • Competitive Advantages: No content filtering, zero subscription fees, fully private deployment.

Prerequisites

  • Basic understanding of generative AI (Diffusion Models, Video Generation).
  • Familiarity with a JavaScript/TypeScript development environment.
  • Basic Docker/Node.js deployment knowledge.

Project Background

Project Introduction

Open-Generative-AI is a free and open-source AI studio for images, videos, movies, and lip-syncing. Its core value lies in the concept of an "Infinite Budget" film workflow: creators can break free from expensive subscription services and use top models such as Flux, Kling, and Wan 2.2 on local or self-hosted servers. Besides a web interface, it offers a desktop client and can also serve as a backend skill library for AI coding agents (such as Claude Code).

Author/Team Introduction

  • Author: Anil-matcha
  • Background: An active open-source developer focused on AI toolchains and media processing.
  • Project created: 2024 (updated frequently).

Project Data

  • ⭐ GitHub Stars: 14.5k+
  • 🍴 Forks: 2.5k+
  • 📦 Version: v1.0.9 (Latest)
  • 📄 License: MIT
  • 🌐 Website: muapi.ai/open-genera…

Key Features

Core Purpose

Open-Generative-AI provides a highly integrated interface that lets users invoke a range of generative AI models with simple configuration (such as an API key or a local model path), covering the complete workflow from creative concept to final render.

Use Cases

  1. Short Video / Film Creation
    • Use Cinema Studio's professional camera controls (focal length, aperture) to generate high-quality shots.
  2. Podcast/Marketing Video Production
    • Use Lip Sync Studio to make static portraits speak according to audio, creating talking-head videos.
  3. Private/Unfiltered Creation
    • Avoid the content restrictions of commercial platforms by running unfiltered models on local machines.
  4. Automated AI Media Pipeline
    • By integrating the skill library, AI agents can automatically run the chain "prompt generation -> generation -> editing -> stitching".
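
The four-stage chain in the last use case can be pictured as a list of steps, each consuming the previous step's output. The following is a minimal sketch under stated assumptions: `PipelineContext`, `stubStep`, and `runPipeline` are hypothetical names invented for illustration and are not part of the project's skill library, and the steps are synchronous stubs, whereas real steps would call models or editors asynchronously.

```typescript
// Hypothetical shape of the data passed between steps (illustration only).
interface PipelineContext {
  idea: string;   // the creative brief the agent starts from
  log: string[];  // names of the steps that have run, in order
}

type PipelineStep = (ctx: PipelineContext) => PipelineContext;

// Creates a stub step that only records its own name.
// A real step would call a generation model, an editor, and so on.
function stubStep(name: string): PipelineStep {
  return (ctx) => ({ idea: ctx.idea, log: ctx.log.concat(name) });
}

// Runs the steps strictly in order, feeding each result into the next step.
function runPipeline(steps: PipelineStep[], initial: PipelineContext): PipelineContext {
  return steps.reduce((ctx, step) => step(ctx), initial);
}

const mediaSteps: PipelineStep[] = [
  stubStep("prompt generation"),
  stubStep("generation"),
  stubStep("editing"),
  stubStep("stitching"),
];

const pipelineResult = runPipeline(mediaSteps, { idea: "30-second product teaser", log: [] });
console.log(pipelineResult.log.join(" -> "));
```

Keeping every step behind the same function signature means an agent can reorder, skip, or retry individual stages without changing the runner.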

Quick Start

There are two ways to quickly experience it:

1. Online browser usage: visit muapi.ai to try the four studio modes directly.

2. Local deployment (source code installation)

# Clone the repository
git clone https://github.com/Anil-matcha/Open-Generative-AI.git
cd Open-Generative-AI

# Install dependencies
pnpm install

# Start the development server
pnpm dev

# Build the desktop app (Electron)
npm run electron:build

Core Features

  1. Image Studio
    • Supports 50+ text-to-image models and 55+ image-to-image models.
  2. Video Studio
    • Covers 40+ text-to-video models and 60+ image-to-video models, with intelligent mode switching.
  3. Lip Sync Studio
    • 9 dedicated models, supporting lip-sync video generation from portrait images or existing videos.
  4. Cinema Studio
    • An interface designed for cinematic-quality visuals, with professional camera controls.
  5. Local Inference
    • Built-in sd.cpp supports Apple Silicon (Metal) and CUDA/ROCm; supports Wan2GP remote GPU servers.
  6. Multi-Image Input
    • Allows uploading up to 14 reference images to specific editing models.
  7. Workflow Studio
    • A node-based editor for visually building and running multi-step AI pipelines.

Project Advantages

| Comparison Item | Open-Generative-AI | Commercial AI platforms (Sora/Midjourney) | Traditional open-source UIs (Automatic1111) |
|---|---|---|---|
| Number of models | 200+ (cross-vendor integration) | Single-vendor models only | Mainly Stable Diffusion |
| Content filtering | None (user-controlled) | Extremely strict | None |
| Deployment method | Web/Desktop/Self-hosted | Cloud only | Complex local installation |
| Integration capability | Extremely strong (API + SDK + CLI) | Closed | Plugin-driven |

Project Deep Dive

Architecture Design: Two Local Inference Engines

The flexibility of the Open-Generative-AI desktop client lies in how it handles local compute power.

1. Built-in sd.cpp (Bundled)

This is a C++ engine based on stable-diffusion.cpp, directly packaged within the application.

  • Advantage: Out-of-the-box usage, with Metal acceleration support for Mac M-series chips. It supports not only SD 1.5/SDXL but also new models like Z-Image.
  • Technical Details: The app drives generation by calling the sd-cli binary, so it does not depend on a complex Python environment.
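
To make the "call a CLI instead of a Python stack" idea concrete, here is a minimal sketch of how a desktop app might assemble the argument list for such a binary. It is an illustration, not the project's code: `SdCliOptions` and `buildSdCliArgs` are hypothetical names, and the flags (`-m`, `-p`, `-o`, `-W`, `-H`, `--steps`, `--seed`) follow the upstream stable-diffusion.cpp command line as an assumption; the bundled sd-cli may accept a different set.

```typescript
// Hypothetical options object (illustration only).
interface SdCliOptions {
  modelPath: string;   // path to a local model file
  prompt: string;      // text prompt
  outputPath: string;  // where the generated image is written
  width?: number;
  height?: number;
  steps?: number;
  seed?: number;
}

// Builds an argument array for the CLI. Flag names are assumed from
// upstream stable-diffusion.cpp and may differ in the bundled binary.
function buildSdCliArgs(opts: SdCliOptions): string[] {
  const args = ["-m", opts.modelPath, "-p", opts.prompt, "-o", opts.outputPath];
  if (opts.width !== undefined) args.push("-W", String(opts.width));
  if (opts.height !== undefined) args.push("-H", String(opts.height));
  if (opts.steps !== undefined) args.push("--steps", String(opts.steps));
  if (opts.seed !== undefined) args.push("--seed", String(opts.seed));
  return args;
}

const exampleArgs = buildSdCliArgs({
  modelPath: "models/example.safetensors", // placeholder path
  prompt: "a lighthouse at dusk",
  outputPath: "out.png",
  steps: 20,
});
console.log(exampleArgs.join(" "));
```

In an Electron main process, an array like this would typically be handed to Node's `child_process.spawn` together with the binary path. Passing arguments as an array rather than a single shell string keeps the prompt text from being interpreted by a shell.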

2. Wan2GP (Remote Engine)

Models such as Wan 2.2 and Hunyuan Video require high-performance NVIDIA GPUs. Because their runtimes are typically CUDA-based, they cannot run efficiently on a Mac directly.

  • Solution: Users can run the Wan2GP server on a Linux machine with a GPU, and Open-Generative-AI acts as a client connected via URL.
  • Significance: It enables cross-platform computing power scheduling, allowing Mac users to also leverage top-tier video models.
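
The split between the bundled engine and a remote server can be sketched as a small routing function. This is a sketch under stated assumptions, not the project's implementation: the `Engine` type, the `needsCuda` flag, and `selectEngine` are hypothetical names, and the sketch only normalizes the user-supplied URL without calling any Wan2GP endpoint.

```typescript
// Hypothetical engine descriptor (illustration only).
type Engine =
  | { kind: "local-sd-cpp" }
  | { kind: "remote-wan2gp"; baseUrl: string };

interface EngineRequest {
  needsCuda: boolean;  // assumed flag: true for models such as Wan 2.2
  remoteUrl?: string;  // user-configured address of the GPU server, if any
}

// Picks the bundled engine when the model can run locally,
// otherwise requires a well-formed http(s) URL for the remote server.
function selectEngine(req: EngineRequest): Engine {
  if (!req.needsCuda) {
    return { kind: "local-sd-cpp" };
  }
  const url = (req.remoteUrl || "").trim();
  if (!/^https?:\/\/[^\s\/]+/i.test(url)) {
    throw new Error("This model needs a remote GPU server URL (http or https).");
  }
  // Drop trailing slashes so request paths can be appended consistently.
  return { kind: "remote-wan2gp", baseUrl: url.replace(/\/+$/, "") };
}

const localEngine = selectEngine({ needsCuda: false });
const remoteEngine = selectEngine({ needsCuda: true, remoteUrl: "http://192.168.1.20:7860/" });
console.log(localEngine.kind, remoteEngine.kind);
```

Failing early when no server URL is configured gives the user a clear message before any generation request is attempted.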

Key Implementation: Intelligent Workflow Switching

The project has undergone significant UI/UX optimization. When a user enters Image or Video Studio, the system monitors in real time whether a reference image has been uploaded.

  • If none is uploaded, the model list automatically switches to the Text-to-Image/Video model collection.
  • Once the user uploads an image, the list immediately switches to the Image-to-Image/Video models (e.g., Kling i2v, LTX Video i2v).

This state-based intelligent routing greatly reduces the complexity of user operations.
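
The routing described above amounts to deriving the visible model list from two pieces of UI state: which studio is open and whether a reference image is attached. The following is a minimal sketch of that idea; the types, the function names, and the catalog entries are hypothetical and do not reflect the project's real model identifiers or code.

```typescript
// Hypothetical types and catalog entries (illustration only).
type Studio = "image" | "video";
type Mode = "text-to-image" | "image-to-image" | "text-to-video" | "image-to-video";

interface ModelEntry {
  id: string;  // illustrative identifier, not a real model id
  mode: Mode;
}

// Derives the active mode purely from UI state.
function resolveMode(studio: Studio, hasReferenceImage: boolean): Mode {
  if (studio === "image") {
    return hasReferenceImage ? "image-to-image" : "text-to-image";
  }
  return hasReferenceImage ? "image-to-video" : "text-to-video";
}

// Filters the catalog down to the models that match the active mode.
function visibleModels(catalog: ModelEntry[], studio: Studio, hasReferenceImage: boolean): ModelEntry[] {
  const mode = resolveMode(studio, hasReferenceImage);
  return catalog.filter((m) => m.mode === mode);
}

const demoCatalog: ModelEntry[] = [
  { id: "example-t2i", mode: "text-to-image" },
  { id: "example-i2i", mode: "image-to-image" },
  { id: "example-t2v", mode: "text-to-video" },
  { id: "example-i2v", mode: "image-to-video" },
];

// Video Studio before and after the user uploads a reference image.
const beforeUpload = visibleModels(demoCatalog, "video", false).map((m) => m.id);
const afterUpload = visibleModels(demoCatalog, "video", true).map((m) => m.id);
console.log(beforeUpload, afterUpload);
```

Because the list is computed from state rather than stored, removing the reference image switches the list back without any extra bookkeeping.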


Project Address & Resources

Official Resources

Related Resources

Target Audience

  • Digital Artists & Film Creators: Looking for low-cost, unrestricted creative tools.
  • AI Developers: Engineers who want to quickly integrate multi-model capabilities.
  • Open Source Enthusiasts: Prefer private deployment and self-hosted applications.

Welcome to my Personal Homepage, where you can find more useful knowledge and interesting products.