Reading text and images from a PDF file with iText 7
佛西亚 · 2021-09-04 · via 博客园 - 佛西亚

Reading text

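using System.Windows.Forms; // MessageBox; assumes a WinForms application
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

// fileName is the path of the PDF to read. iText page numbers start at 1.
using (PdfReader reader = new PdfReader(fileName))
using (PdfDocument pdfDocument = new PdfDocument(reader))
{
    for (int i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
    {
        // Extract the text of page i with the default extraction strategy.
        string pdfContentString = PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i));
        MessageBox.Show(pdfContentString);
    }
}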

Reading images

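using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// imageFileName is a composite format string for the output paths:
// ImageRenderListener fills {0} with the image index and {1} with the file extension.
using (PdfReader reader = new PdfReader(fileName))
using (PdfDocument pdfDocument = new PdfDocument(reader))
{
    IEventListener strategy = new ImageRenderListener(imageFileName);
    PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy);
    for (var i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
    {
        // The listener receives an event for every image drawn on the page.
        parser.ProcessPageContent(pdfDocument.GetPage(i));
    }
}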


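using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Kernel.Pdf.Xobject;

// Saves every image found while a PdfCanvasProcessor walks the page content.
public class ImageRenderListener : IEventListener
{
    // Composite format string for output paths: {0} = image index, {1} = file extension.
    string format;
    // Running counter so that each extracted image gets its own file name.
    int index = 0;

    public ImageRenderListener(string format)
    {
        this.format = format;
    }

    public void EventOccurred(IEventData data, EventType type)
    {
        // Only image events are of interest; text and path events are ignored.
        if (data is ImageRenderInfo imageData)
        {
            try
            {
                PdfImageXObject imageObject = imageData.GetImage();
                if (imageObject != null)
                {
                    string path = string.Format(format, index++, imageObject.IdentifyImageFileExtension());
                    File.WriteAllBytes(path, imageObject.GetImageBytes());
                }
            }
            catch
            {
                // Images that cannot be decoded or written are skipped silently.
            }
        }
    }

    public ICollection<EventType> GetSupportedEvents()
    {
        // Returning null tells iText that the listener accepts all event types.
        return null;
    }
}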
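
The post never shows what imageFileName should look like. Since ImageRenderListener passes it to string.Format together with the image index and the file extension, it has to be a composite format string with {0} and {1} placeholders. The following is a minimal sketch of that naming step only, without iText; the pattern "image_{0}.{1}" and the helper BuildImagePath are hypothetical names chosen for illustration, not part of the original code.

```csharp
using System;

public static class ImagePathDemo
{
    // Mirrors the string.Format call inside ImageRenderListener:
    // {0} is the running image index, {1} the extension reported by iText.
    public static string BuildImagePath(string format, int index, string extension)
    {
        return string.Format(format, index, extension);
    }

    public static void Main()
    {
        // Hypothetical pattern; any format string with {0} and {1} works.
        string imageFileName = "image_{0}.{1}";
        Console.WriteLine(BuildImagePath(imageFileName, 0, "png")); // image_0.png
        Console.WriteLine(BuildImagePath(imageFileName, 1, "jpg")); // image_1.jpg
    }
}
```

Passing a plain file name without placeholders would make every image overwrite the same file, so the index placeholder is the part that matters most. A directory prefix can be included in the pattern as long as the directory already exists, because File.WriteAllBytes does not create it.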