惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

人人都是产品经理
人人都是产品经理
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
P
Proofpoint News Feed
T
Tailwind CSS Blog
Hacker News - Newest:
Hacker News - Newest: "LLM"
G
GRAHAM CLULEY
Engineering at Meta
Engineering at Meta
Blog — PlanetScale
Blog — PlanetScale
量子位
GbyAI
GbyAI
C
Cybersecurity and Infrastructure Security Agency CISA
Know Your Adversary
Know Your Adversary
阮一峰的网络日志
阮一峰的网络日志
P
Privacy International News Feed
T
Tenable Blog
Cisco Talos Blog
Cisco Talos Blog
P
Privacy & Cybersecurity Law Blog
T
Tor Project blog
L
Lohrmann on Cybersecurity
S
Secure Thoughts
Y
Y Combinator Blog
S
Securelist
H
Hackread – Cybersecurity News, Data Breaches, AI and More
有赞技术团队
有赞技术团队
月光博客
月光博客
Cyberwarzone
Cyberwarzone
H
Heimdal Security Blog
博客园 - 聂微东
Latest news
Latest news
The Hacker News
The Hacker News
小众软件
小众软件
T
Troy Hunt's Blog
Google Online Security Blog
Google Online Security Blog
D
DataBreaches.Net
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
Martin Fowler
Martin Fowler
罗磊的独立博客
www.infosecurity-magazine.com
www.infosecurity-magazine.com
U
Unit 42
Vercel News
Vercel News
T
The Blog of Author Tim Ferriss
F
Fortinet All Blogs
SecWiki News
SecWiki News
MongoDB | Blog
MongoDB | Blog
C
Check Point Blog
aimingoo的专栏
aimingoo的专栏
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Stack Overflow Blog
Stack Overflow Blog
WordPress大学
WordPress大学

Towards AI

The Verified Identity Agent Bridge | Towards AI You Can’t Prompt Your Away Your LLM Problems | Towards AI The Free Agent Trap | Towards AI Your Agentic Loop Will Drift. Here Is the KL Divergence Equation That Measures How Far It Has Wandered From Its Original Instruction. | Towards AI Building AI Agents in Rust — part 3 | Towards AI Self-Hosting Airflow at Home: Automating Stock Price Data Collection | Towards AI The 76-Hour Frontier: How the Takedown of Claude Fable 5 Birthed the Military-Industrial-AI Complex | Towards AI I Trained a Markdown File to Boost GPT-5.5 by 23 Points — It Shouldn't Work | Towards AI We Replaced ChatGPT With a Local AI Server. Six Months of Honest Data. | Towards AI What Really Makes Cars Pollute? A Data Science Deep Dive into CO₂ Emissions | Towards AI Training GPT-2 From Scratch on a GTX1050 | Towards AI Principal Component Analysis (PCA): Theory, Mathematics, and Applications Build a Zero-Cost Web Automation Pipeline With OpenRouter, OpenClaw, and MediaUse I Gave Qwen3.7-Plus a Screenshot and It Found the Exact Pixel to Click for $0.40 Beyond the Prompt: Why Autonomous AI Agents Are Replacing the Chatbot Moonshot Cracked Claude Code’s Playbook with an MIT Terminal Agent and a $0.60 Model Connections, Roles, and Warehouses: Getting CoCo Desktop Production-Ready from Day One My First $5,000 Month Writing About AI Engineering on Medium Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents TOON: Beyond JSON for LLMs Claude Code Casual, Pro, Elite: The Three Working Personas of Claude Code Mastery MiniMax M3 Decodes 1M Tokens 15x Faster — and It Shouldn’t Be This Cheap Using Amazon SQS for AI Agent Orchestration I Ran a 1.5B-Active Model on My Laptop That Embarrassed a 26B by 46 Points How to Build a Self-Improving Company with AI Part 3 — Implementation/Engine-Level: Choosing the Runtime That Gives You These for Free Part 2 — Serve-Level Speed: System Design That Stabilizes P95/P99 3-Part Series: LLM Latency in Production (Part 1) Claude Code: The AI Coding Partner Changing How Developers Build Software Claude Code Pitfalls: Claude Code Won’t Do What You Told It: A Troubleshooting Catalog Full-Stack Data Scientists for the Agentic Coding World Building Production-Grade AI Skills with Snowflake Cortex AI Function Studio I Tried 10 AI Agent Frameworks in 2026 — Here’s the Honest Guide I Wish I Had Earlier How One Spring Boot Optimization Saved Our Startup $30,000 a Year Inside Palantir AIP: How the World’s Most Controversial AI Platform Actually Works What Is a Reverse Proxy? (And Why Every Backend Developer Should Care) What Claude Opus 4.8 Actually Changes If You’re Building Agents QWEN 3.7 Max Worked For 35 Hrs Straight And The Results Were Mind-blowing When LLMs Meet Knowledge Graphs on the Battlefield Fine-Tuning is Dead: Why Context Orchestration Won in 2026 5 Things Broke When I Shipped a RAG + MCP Agent to Production. Google Co-Scientist: Hyper Scaling Research and Discovery Microsoft Just Embarrassed Browser Web Agents — 1,000 Lines Made GPT-5.4 Beat Opus 4.6 on 200 Web Tasks The Modern Data Stack Is Broken — Here’s How to Fix It With AI, Governance, and Real Architecture Building Production MCP Servers: What the Spec Won’t Tell You When Should an Agent Stop? The Anatomy of Termination Harness Engineering: The Layer That Matters More Than the Model AI Engineers Who Can’t Debug Are Getting Fired (Here’s How I Debug with Claude Code) Claude Code Memory: Why You Keep Explaining the Same Thing to Claude (and the Five Layers That Fix It) Claude Code Subagents: The Claude Code Feature You Skip Every Day (And Why It Quietly Wrecks Your Sessions) Agentic AI and the SMB Banking Advantage Claude Code: Spec-Driven Development — Why Your AI Coding Sessions Fall Apart at Hour Three The Real Cost of Agentic AI Nobody Budgets For SVM : 40 must visit Interview Questions (Part 2) Your AI Agent Works Perfectly in the Demo. Here Are the 6 Ways It Dies in Production. Unleashing the Power of ONNX for Speedier SBERT Inference Terraform vs CI/CD for Serverless Deployments Merve Noyan Stopped Writing Training Scripts — Her Agent Just Fine-Tuned 18 Models Solo for $11.40 Why Your Sales Forecast Is Always 20% Wrong (And How To Make It 12% Wrong) Genetic Cubic n{C/A} Ratios For Elementary Robotics Design Top 20 AdaBoost Interview Questions & Answers (Part 2 of 2) Agentic AI Vs AI Agents — What Are the Key Differences? LAI #127: The Infrastructure Layer of AI Is Becoming the Product Anthropic Caught Its Own AI Planning to Blackmail Engineers RNNs Cannot Think What Transformers Think Cheaply. ICLR 2026 Proved the Gap Is Exponential. Time Series Made So Easy My Aunt Got It on the Second Read Claude Cowork 101 | Towards AI Is 3-Bit KV Cache the Holy Grail? A Reality Check on Google’s TurboQuant LangGraph Multi-Agent Architecture: Building a Self-Critiquing AI Debate System AutoML on Autopilot | Towards AI I Ran This Open-Source AI Tool on a Messy Codebase and Got 71x Fewer Tokens — Here Is Exactly What Happened Month in 4 Papers (April 2026) AI Kept Forgetting My Notes. Fixing That Taught Me How It Actually Works. How ChatGPT Makes You Addicted Crack ML Interviews with Confidence: K-Nearest Neighbors (KNN 20 Q&A) The Event-Driven Blueprint: How I Scaled a Spring Boot System to 10 Million Kafka Messages/Day Building Vector Search? Why FAISS Alone Isn’t Enough TAI #202: GPT-5.5 Moves Codex Into Real Work Machine Learning System Design -The Model Serving Triangle, With One Forward Pass Flowing Through Every Trade-off (Part3) AI Orchestration in Action: How MuleSoft and LLMs Fuel the Future of Enterprise AI GPT-4 Has 1.8 Trillion Parameters. It Uses 2% of Them Per Token. Part 20: Data Manipulation in Multi-Dimensional Aggregation A Fundamental Introduction to Genetic Algorithm -Part Two TAI #200: Anthropic’s Mythos Capability Step Change and Gated Release From Notebook to Production: Running ML in the Real World (Part 4) Sqribble’s Template‑Driven Document Automation Anthropic Just Shipped the Layer That’s Already Going to Zero Long-Term vs Short-Term Memory for AI Agents: A Practical Guide Without the Hype The L1 Loss Gradient, Explained From Scratch Your Postcode Is Deciding Your Care. I Built a Pipeline to Prove It. I Directed AI Agents to Build a Tool That Stress-Tests Incentive Designs. Here’s What It Found. Your System Prompt Is the Product — Not the Feature The LLM Wiki Trend Has a Retention Problem Nobody Mentions Top 20 Data Preparation Interview Questions and Answers (Part 2 of 2) LAI #122: Word Embeddings Started in 1948, Not With Word2Vec Top 15 Computer Vision Datasets [2026] 40 Generative AI Interview Questions That Actually Get Asked in 2026 (With Answers)
Beyond Chat: Processing Images, PDFs, and Documents with the OpenAI Adapter in Oracle Integration Cloud | Towards AI
Sarfaraz Merchant · 2026-06-18 · via Towards AI

Originally published on Towards AI.

Exploring File Uploads, Image Processing, and Document Extraction using the Native OpenAI Adapter in OIC.

Beyond Chat: Processing Images, PDFs, and Documents with the OpenAI Adapter in Oracle Integration Cloud

When Oracle introduced the OpenAI Adapter in Oracle Integration Cloud (OIC), most of the examples I came across focused on text generation and chatbot-style interactions.

Naturally, I assumed that if I wanted to process files such as images, PDFs, or Word documents, I would probably need to build custom REST integrations and call the OpenAI APIs directly.

While exploring the adapter, I noticed a few operations related to file management. That got me curious.

Can the OpenAI Adapter upload files?

Can those files be processed later using a File ID?

Can we extract structured data from invoices, images, or PDF documents without building custom REST calls?

To answer these questions, I decided to build a small proof of concept.

The results were quite interesting.

Using only the native OpenAI Adapter, I was able to:

  • Upload files to OpenAI
  • Retrieve and reuse File IDs
  • Process images and PDF documents
  • Extract structured JSON responses
  • Analyze multilingual content, including Arabic and Persian documents

In this article, I’ll walk through the approach, the architecture, and a few practical use cases that I tested while building the POC.

By the end, you’ll have a reusable pattern that can be applied to invoice extraction, document processing, content analysis, and many other AI-powered integration scenarios.

OpenAI Adapter provides native operations for uploading and processing files.

Understanding the Architecture

While building the POC, I decided to separate the solution into two integrations.

The first integration is responsible for uploading a file to OpenAI and obtaining a File ID.

The second integration uses that File ID together with a prompt to process the document and return a structured response.

I found this approach cleaner because the uploaded file becomes a reusable asset. The same file can be analyzed multiple times using different prompts without uploading it again.

Integration 1 — Upload File

The first integration accepts a file and uploads it using the OpenAI Adapter’s Upload File operation.

Supported examples include:

  • Images (JPG, PNG)
  • PDF documents
  • Microsoft Word documents

The adapter returns a unique File ID similar to:

file-xxxxxxxxxxxxxxxx

This File ID can then be stored, logged, or passed to another integration for processing.

Integration 2 — Process File

The second integration receives:

  • A File ID
  • Processing instructions

It then invokes the OpenAI Adapter using the Responses operation.

Depending on the type of document and the prompt provided, OpenAI can:

  • Extract invoice data
  • Analyze documents
  • Summarize content
  • Return structured JSON
  • Process multilingual text

The response can be consumed directly within OIC and mapped to downstream systems.

Solution Architecture.

High-Level Flow

The overall process looks like this:

Document
(Image/PDF/DOCX)
|
v
+------------------+
| Upload File |
| OpenAI Adapter |
+------------------+
|
v
File ID
|
v
+------------------+
| Responses API |
| OpenAI Adapter |
+------------------+
|
v
Structured Output
(JSON)

One thing I particularly liked about this pattern is that it relies entirely on the native OpenAI Adapter. There are no custom REST calls, no manual API payload construction, and no need to manage Base64-encoded content.

Upload FIle to OpenAI Account.
Process OpenAI Files.

Building Integration 1 — Uploading Files to OpenAI

The first integration is responsible for uploading a document to OpenAI and obtaining a File ID that can be reused later.

In my case, I tested the following file types:

  • JPG images
  • PNG images
  • PDF documents
  • Microsoft Word documents

Step 1 — Create the OpenAI Connection

Start by creating a connection using the OpenAI Adapter.

Provide:

  • OpenAI API Key
  • Connection Name
  • Security Configuration

Once the connection is successfully tested, it can be used within your integration.

OpenAI Connection.

Step 2 — Create the Upload File Integration

Create an App-Driven Orchestration.

2.1 Add Trigger Rest Adapter to Accept File as input with:

– Select the multipart attachment processing options

  • Request is multipart with payload
  • Multipart request is of type multipart/form-data with HTML form payload

2.2 Add the OpenAI Adapter and select the following operation:

Upload File

This operation allows OIC to upload a file directly to OpenAI without invoking the REST API manually.

Step 3 — Configure the Request Mapping

The Upload File operation expects two important pieces of information:

Purpose

The purpose determines how the uploaded file will be used.

For image processing:

vision

For document processing (PDF, DOCX, TXT, etc.):

user_data

File Content

Map the incoming attachment or file reference to the adapter’s File element.

Example mapping:

RequestWrapper
├── Purpose
└── File
└── streamReference

The adapter handles the upload process and stores the file within OpenAI.

Upload File Mapper.

Step 4 — Execute the Integration

After activation, invoke the integration with a sample document.

If the upload succeeds, the response contains a File ID.

Example:

file-3UoRxBRyqV7pJRgPC4WSgi

This File ID becomes the bridge between the upload step and the document processing step.

Uploaded File to OpenAI Accuont.
Storage FIles in OpenAI Account.

Lesson Learned: File Names Matter

While testing the Upload File operation, I ran into an issue that took a little time to troubleshoot.

Some uploads failed when the file name contained spaces or special characters.

For example:

WhatsApp Image 2026-06-05 at 7.25.23 PM.jpeg
Error if the File Name contains space.

After renaming the file to a simpler format without spaces or special characters, the upload completed successfully.

Example:

invoice_20260605.jpeg

If you encounter unexpected upload failures, one of the first things to check is the file name.

Although this may vary depending on the adapter version and environment, using clean file names is a good practice and helped avoid issues during testing.

Why the File ID Matters

Initially, I assumed I would need to send Base64 content to OpenAI every time I wanted to analyze a document.

The File ID approach is much cleaner.

Once the document is uploaded:

  • The file can be reused
  • Multiple prompts can be executed against the same file
  • There is no need to repeatedly transfer the document

This makes the pattern both efficient and easy to maintain.

At this stage, we have successfully uploaded a document and obtained a File ID.

The next step is where things become interesting: using that File ID to extract meaningful information from the document.

“This was the point where I realized the adapter was capable of much more than chat-based interactions. The File ID effectively turns the uploaded document into a reusable asset that can be processed multiple times.”

Building Integration 2 — Processing Files Using the Responses API

Now that the document has been uploaded and we have a File ID, the next step is to ask OpenAI to process the file.

This is where the Responses operation comes into play.

The Responses API allows us to provide:

  • Instructions (prompt)
  • File reference (File ID)
  • Expected output format

and receive a structured response.

Step 1 — Create the Processing Integration

Create a second integration.

This integration accepts:

FileId — Contains the OpenAI File ID returned by the Upload File integration.

FileType — Determines the type of file being processed. For images, I passed input_image, while for PDFs and Word documents, I passed input_file.

instruction — Contains the prompt or instructions that should be sent to the model.

Add the OpenAI Adapter and select:

Responses
OpenAI Responses.

Step 2 — Build the Request

The request consists of two content elements:

Content 1 — Instructions

The first content item contains the instructions we want the model to follow.

Example:

Extract invoice information from the attached document.
Return valid JSON only.
Include:
- Invoice Number
- Date
- Items
- Tax
- Total Amount

In the mapper:

Type = input_text
Text = <your prompt>

Content 2 — File Reference

The second content item references the uploaded file.

For image files:

Type = input_image
File Id = file-xxxxxxxx

For PDF and Word documents:

Type = input_file
File Id = file-xxxxxxxx

The File ID comes from the Upload File integration we created earlier.

At runtime, OpenAI retrieves the uploaded file and processes it together with the prompt.

Responses Mapping.

Inside the Responses request, I configured the following mappings:

  • Modelgpt-4.1
  • Roleuser
  • Content 1 Typeinput_text
  • Content 1 Text → Query Parameter instruction
  • Content 2 Type → Query Parameter FileType
  • Content 2 File ID → Query Parameter FileId

This design allowed me to use the same integration for multiple scenarios simply by changing the values passed at runtime.

For example, when processing a restaurant invoice image, I passed:

  • FileType = input_image
  • FileId = OpenAI File ID
  • instruction = “Extract invoice details and return valid JSON”

For a PDF document, I only changed the FileType and instruction:

  • FileType = input_file
  • FileId = OpenAI File ID
  • instruction = “Extract page-wise content and return valid JSON”

The integration logic remained exactly the same.

Step 3 — Configure the Model

For my testing, I used GPT-4.1.

Recommended settings:

Temperature = 0

When extracting structured information, deterministic responses are generally easier to consume downstream.

Step 4 — Execute the Integration

After activation, invoke the integration using:

  • File ID
  • Processing prompt
  • File Type

The model analyzes the file and returns a response.

A typical response might look like:

{
"Result" : "```json\n{\n \"Invoice Number\": \"123982857\",\n \"Date\": \"23/04/2022\",\n \"Items\": [\n {\"Item Name\": \"شاورما دجاج عربية عصملية\", \"Qty\": 1, \"Total\": 8.000},\n {\"Item Name\": \"حمص\", \"Qty\": 2, \"Total\": 4.000},\n {\"Item Name\": \"فتوش\", \"Qty\": 1, \"Total\": 3.000},\n {\"Item Name\": \"تبولة\", \"Qty\": 1, \"Total\": 2.500},\n {\"Item Name\": \"بطاطا مقلية\", \"Qty\": 1, \"Total\": 2.000},\n {\"Item Name\": \"خبز صاج\", \"Qty\": 2, \"Total\": 1.000},\n {\"Item Name\": \"شاورما لحم عربية\", \"Qty\": 1, \"Total\": 8.000},\n {\"Item Name\": \"كولا زجاج\", \"Qty\": 2, \"Total\": 4.000},\n {\"Item Name\": \"لبن\", \"Qty\": 2, \"Total\": 4.000},\n {\"Item Name\": \"مية 250 مل\", \"Qty\": 2, \"Total\": 2.000}\n ],\n \"Tax\": 2.420,\n \"Total Amount\": 46.570\n}\n```"
}

What impressed me most was that the model not only extracted the invoice totals, but also correctly identified Arabic item names, quantities, and line totals, and returned them in a clean JSON structure.

Within OIC, this response can be parsed and then can be mapped directly to downstream applications or business processes.

Processed Image Data.

Use Case 1 — Restaurant Invoice Extraction

To validate the solution, I started with a restaurant invoice written in Arabic.

The objective was simple:

  • Upload the invoice image
  • Process it using the OpenAI Adapter
  • Extract business data in a structured format

Rather than returning a textual description of the invoice, the model was instructed to generate JSON containing:

  • Invoice Number
  • Invoice Date
  • Item Details
  • Tax Amount
  • Total Amount

The result was surprisingly accurate.

The model successfully identified:

  • Arabic item names
  • Item quantities
  • Line totals
  • Tax information
  • Grand total

This demonstrates how the OpenAI Adapter can be used as a lightweight document extraction service directly within Oracle Integration Cloud.

Potential use cases include:

  • Supplier invoice processing
  • Expense receipts
  • Restaurant bills
  • Retail receipts
  • Purchase orders

For organizations already using OIC, this pattern can significantly reduce manual data entry and simplify document-driven workflows.

Use Case 2 — Processing Persian and Arabic Documents

After validating the invoice extraction scenario, I wanted to see how the adapter would perform with larger documents.

For this test, I used Persian and Arabic PDF documents containing multiple pages of text.

The goal was different from the invoice use case.

Instead of extracting business fields, I wanted to:

  • Preserve the original language
  • Maintain page boundaries
  • Separate paragraphs
  • Return a structured response suitable for further processing

The prompt instructed the model to return page-wise JSON.

A simplified response structure looked like this:

{
"pages": [
{
"page_number": 1,
"paragraphs": [
"...",
"..."
]
}
]
}

The results were encouraging.

The model was able to:

  • Read multilingual content
  • Separate pages
  • Preserve paragraph structure
  • Return machine-readable JSON

This opens up interesting possibilities for:

  • Digitizing legacy documents
  • Knowledge extraction
  • Content indexing
  • Search and retrieval solutions
  • Document migration projects

One important observation is that document understanding and transcription accuracy are two different goals.

For scenarios where verbatim OCR accuracy is critical, Oracle Document Understanding may still be a valuable companion service. However, for document analysis and content structuring, the OpenAI Adapter performed remarkably well.

Lessons Learned

  • Use clean file names — avoid spaces and special characters when uploading files.
  • Upload once, reuse many times — leverage the returned File ID instead of sending file content repeatedly.
  • Parameterize your integration — File ID, File Type, and Instructions make the solution highly reusable.
  • Prompt engineering matters — clear instructions significantly improve extraction quality and consistency.
  • Expect an additional parsing step — JSON responses are returned inside the Result field.
  • One integration can support multiple document types — images, PDFs, and Word documents can be processed using the same pattern.
  • The OpenAI Adapter is more than chat — it can extract, understand, and structure information from documents and images.

Source Code

The complete Oracle Integration Cloud package (.car), sample prompts, and supporting assets used in this article are available on GitHub.

GitHub Repository:

https://github.com/sarfarazmerchant/oic-openai-adapter-file-processing.git

Feel free to download the package, import it into your OIC environment, and adapt it to your own document-processing use cases.

Final Thoughts

This proof of concept demonstrated that the OpenAI Adapter in Oracle Integration Cloud can do much more than chat interactions. By combining the Upload File and Responses operations, I was able to process images, PDFs, and Word documents, and transform them into structured JSON ready for downstream integrations.

The pattern is simple:

Upload File

Get File ID

Process with Responses API

Receive Structured Data

For me, this was a practical introduction to AI-powered document processing in OIC, and I hope it helps other integration developers explore similar use cases.

Have you explored the OpenAI Adapter in OIC? I’d love to hear about your use cases and experiences.

Published via Towards AI

Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.

15 engineers. 100,000+ students. Towards AI Academy teaches what actually survives production.

Start free — no commitment:

6-Day Agentic AI Engineering Email Guide — one practical lesson per day

Agents Architecture Cheatsheet — 3 years of architecture decisions in 6 pages

Our courses:

AI Engineering Certification — 90+ lessons from project selection to deployed product. The most comprehensive practical LLM course out there.

Agent Engineering Course — Hands on with production agent architectures, memory, routing, and eval frameworks — built from real enterprise engagements.

AI for Work — Understand, evaluate, and apply AI for complex work tasks.

Note: Article content contains the views of the contributing authors and not Towards AI.