惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

酷 壳 – CoolShell
酷 壳 – CoolShell
T
Threatpost
Latest news
Latest news
N
News | PayPal Newsroom
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Help Net Security
Help Net Security
D
Darknet – Hacking Tools, Hacker News & Cyber Security
AI
AI
Simon Willison's Weblog
Simon Willison's Weblog
TaoSecurity Blog
TaoSecurity Blog
The Last Watchdog
The Last Watchdog
L
LINUX DO - 热门话题
Google DeepMind News
Google DeepMind News
T
Threat Research - Cisco Blogs
O
OpenAI News
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
The Exploit Database - CXSecurity.com
NISL@THU
NISL@THU
Application and Cybersecurity Blog
Application and Cybersecurity Blog
S
Securelist
小众软件
小众软件
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
Martin Fowler
Martin Fowler
S
SegmentFault 最新的问题
Cisco Talos Blog
Cisco Talos Blog
云风的 BLOG
云风的 BLOG
AWS News Blog
AWS News Blog
GbyAI
GbyAI
N
News and Events Feed by Topic
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
美团技术团队
Engineering at Meta
Engineering at Meta
A
About on SuperTechFans
博客园 - 三生石上(FineUI控件)
S
Schneier on Security
博客园 - 聂微东
V2EX - 技术
V2EX - 技术
T
Troy Hunt's Blog
SecWiki News
SecWiki News
S
Secure Thoughts
B
Blog RSS Feed
Hugging Face - Blog
Hugging Face - Blog
WordPress大学
WordPress大学
腾讯CDC
H
Heimdal Security Blog
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
Apple Machine Learning Research
Apple Machine Learning Research
月光博客
月光博客
www.infosecurity-magazine.com
www.infosecurity-magazine.com
P
Privacy International News Feed

博客园 - 菜鸟吴迪

Lucene搜索结果排序问题(按时间倒序排的替代解决方法) 利用Lucene.net搜索引擎进行多条件搜索的做法 (转) (转)Razor引擎学习:RenderBody,RenderPage和RenderSection 读取txt文本,批量插入到数据库中! ASP.NET MVC3 返回多个实体或泛型 MySQL数据库集群Master-Slave模式安装摘要 Jquery each - 菜鸟吴迪 - 博客园 堆?栈?(转) 数据库并发控制!!! Velocity脚本简明教程 Visual Studio的调试技巧 《C#与.NET3.5高级程序设计(第4版)》笔记10 sql分页问题,老话题!! sql查询连续三天之内都有记录的人员!! Asp.net Mvc Controller接收可控的数组或字典类型的实现方法: SELECT INTO 和 INSERT INTO SELECT 两种表复制语句 (转) httpContext里面的东西!!! c#闭包!! c#新特性!!!
使用HtmlAgilityPack批量抓取网页数据
菜鸟吴迪 · 2011-01-07 · via 博客园 - 菜鸟吴迪

转:http://www.cnblogs.com/chuncn/archive/2009/09/07/1561564.html

登录的处理。因为有些网页数据需要登陆后才能提取。这里要使用ieHTTPHeaders来提取登录时的提交信息。

抓取网页

 HtmlAgilityPack.HtmlDocument htmlDoc;

            
if (!string.IsNullOrEmpty(登录URL))
            
{
                htmlDoc 
= htmlWeb.Load(登录URL, 提交的用户验证信息, 获取数据的网页URL);
            }

            
else
            
{
                htmlDoc 
= htmlWeb.Load(获取数据的网页URL);
            }

        

-------------------------------------------------------------------------------------------

using System;

using System.Collections.Generic;

using System.Text;

using Microsoft.VisualStudio.TestTools.WebTesting;

using HtmlAgilityPack;

public class WebTest1Coded : WebTest

{

public override IEnumerator<WebTestRequest> GetRequestEnumerator()

{

WebTestRequest request1 = new WebTestRequest("http://www.microsoft.com/");

request1.ValidateResponse += new EventHandler<ValidationEventArgs>(request1_ValidateResponse);

yield return request1;

}

void request1_ValidateResponse(object sender, ValidationEventArgs e)

{

//load the response body string as an HtmlAgilityPack.HtmlDocument

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(e.Response.BodyString);

//locate the "Nav" element

HtmlNode navNode = doc.GetElementbyId("Nav");

//pick the first <li> element

HtmlNode firstNavItemNode = navNode.SelectSingleNode(".//li");

//validate the first list item in the Nav element says "Windows"

e.IsValid = firstNavItemNode.InnerText == "Windows";

}

}