Sharing a Script for Scraping Historical Weather Data
Ivan Zou · 2014-01-22 · via 博客园 - Ivan Zou

A recent project of mine needed past weather data. I found the site 天气后报 (www.tianqihoubao.com) and built the scraper for it in Spider Studio (SS). Here is how it works.

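An analysis of the site shows that it provides two kinds of information:

1. Province-to-city relationships

2. Weather records

We create data structures to match: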

[Serializable]
public class Province
{
    public string ProvinceName;
    public string ProvinceUrl;
}

[Serializable]
public class City
{
    public Province Province;
    public string CityName;
    public string CityUrl;
}

[Serializable]
public class WeatherDataSet
{
    public City City;
    public string Title;
    public string Url;
}

[Serializable]
public class WeatherData
{
    public WeatherDataSet DataSet;
    public string Date;
    public string TextWeather;
    public string Temp;
    public string Wind;
}
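
Because every record keeps a reference to its parent (WeatherData, then WeatherDataSet, then City, then Province), a single weather row is enough to recover its full context. The sketch below flattens one record into a tab-separated line. `WeatherExport` and `ToTsvLine` are hypothetical helpers that are not part of SS, and the classes are repeated in compact form only so the sketch compiles on its own.

```csharp
using System;

// Same field layout as the classes above, repeated so this sketch compiles on its own.
[Serializable] public class Province { public string ProvinceName; public string ProvinceUrl; }
[Serializable] public class City { public Province Province; public string CityName; public string CityUrl; }
[Serializable] public class WeatherDataSet { public City City; public string Title; public string Url; }
[Serializable] public class WeatherData
{
    public WeatherDataSet DataSet;
    public string Date;
    public string TextWeather;
    public string Temp;
    public string Wind;
}

// Hypothetical helper (not part of SS).
public static class WeatherExport
{
    // Walks the parent references upward and joins the fields with tabs.
    public static string ToTsvLine(WeatherData d)
    {
        City city = d.DataSet.City;
        return string.Join("\t", new string[]
        {
            city.Province.ProvinceName,
            city.CityName,
            d.Date,
            d.TextWeather,
            d.Temp,
            d.Wind
        });
    }
}
```

Storing the parent object instead of just its name costs nothing extra in memory, since all rows of one month share the same WeatherDataSet instance.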

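>> The site covers 34 municipalities, provinces, and special administrative regions in total. The full list is at: http://www.tianqihoubao.com/lishi/index.htm

The corresponding selector statement is: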

var list = Default.SelectNodes("#content DT a");

>> Each province has a list of its cities, for example: http://www.tianqihoubao.com/lishi/hebei.htm

The corresponding selector statement is:

var list = Default.SelectNodes("#content DD a");

>> Each city has a list of historical weather records, for example: http://www.tianqihoubao.com/lishi/shijiazhuang.html

The corresponding selector statement is:

var list = Default.SelectNodes("#content>div.pcity a");

>> Opening any one of those records gives the weather data for that month:

The corresponding selector statement is:

var list = Default.SelectNodes("#content>table.b tr:gt(0)");
foreach(var item in list)
{
    var date = item.SelectSingleNode("td:eq(0)").Text();
    var textWeather = item.SelectSingleNode("td:eq(1)").Text();
    var temp = item.SelectSingleNode("td:eq(2)").Text();
    var wind = item.SelectSingleNode("td:eq(3)").Text();
}
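
The loop keeps every cell as raw text. If numeric values are needed later, that text has to be parsed. The sketch below assumes the temperature cell looks like `3℃ / -6℃` (high first, then low). That format is an assumption about the page, not something the selectors above guarantee, and `TempParser` is a hypothetical helper that is not part of SS.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of SS).
public static class TempParser
{
    // Matches an optionally signed run of ASCII digits, e.g. "3" or "-6".
    private static readonly Regex Number = new Regex(@"-?[0-9]+");

    // Returns true and sets high/low only when exactly two integers are found in the cell text.
    public static bool TryParse(string cell, out int high, out int low)
    {
        high = 0;
        low = 0;
        if (cell == null) return false;

        MatchCollection found = Number.Matches(cell);
        if (found.Count != 2) return false;

        // Invariant culture keeps "-" as the negative sign regardless of OS locale.
        return int.TryParse(found[0].Value, NumberStyles.AllowLeadingSign,
                            CultureInfo.InvariantCulture, out high)
            && int.TryParse(found[1].Value, NumberStyles.AllowLeadingSign,
                            CultureInfo.InvariantCulture, out low);
    }
}
```

For a cell such as `3℃ / -6℃` this yields high = 3 and low = -6. Text containing any other count of numbers is rejected rather than guessed at, so rows with an unexpected layout can be logged and inspected by hand.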

Wrap each of the four selector statements in a method, and bind the results to the data structures defined at the start:

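public List<Province> GetProvinceList() {...} // get the municipalities/provinces/special administrative regions
public List<City> GetCityList(Province province) {...} // get the cities of one province
public List<WeatherDataSet> GetWeatherDataSet(City city) {...} // get the historical record sets of one city
public List<WeatherData> GetWeatherData(WeatherDataSet ds) {...} // get the weather rows of one record set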
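
Since each method takes the output of the previous level as its input, a complete crawl is just four nested loops. The sketch below shows that shape in a generic form so it compiles without SS. `Crawler` and `CrawlAll` are hypothetical names, and the four delegates stand in for the methods listed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of SS): walks a four-level hierarchy depth-first.
public static class Crawler
{
    // TP/TC/TS/TD stand for the province, city, record-set and data-row types.
    public static List<TD> CrawlAll<TP, TC, TS, TD>(
        Func<List<TP>> getProvinces,
        Func<TP, List<TC>> getCities,
        Func<TC, List<TS>> getDataSets,
        Func<TS, List<TD>> getData)
    {
        var all = new List<TD>();
        foreach (TP province in getProvinces())
        {
            foreach (TC city in getCities(province))
            {
                foreach (TS dataSet in getDataSets(city))
                {
                    // Each call here corresponds to one page load in the real script.
                    all.AddRange(getData(dataSet));
                }
            }
        }
        return all;
    }
}
```

Assuming SS accepts an extra static class in a script, the call could look like `Crawler.CrawlAll<Province, City, WeatherDataSet, WeatherData>(GetProvinceList, GetCityList, GetWeatherDataSet, GetWeatherData)`. A full crawl loads a large number of pages, so it is worth pausing between requests and saving results as you go.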

>> The complete script (paste it into SS and it runs as-is):

SS can be downloaded from: http://www.gdtsearch.com/products.spiderstudio.docapi.htm

public void Run()
{
    Logger.ClearAll();
    Default.ScriptErrorsSuppressed = true;

    // Level 1: list every province-level region.
    var pl = GetProvinceList();
    foreach(var p in pl)
    {
        Logger.Log(p.ProvinceName);
        Logger.Log(p.ProvinceUrl);
    }
    // Level 2: as a sample, list the cities of the second province only.
    var cl = GetCityList(pl[1]);
    foreach(var c in cl)
    {
        Logger.Log(c.Province.ProvinceName);
        Logger.Log(c.Province.ProvinceUrl);
        Logger.Log(c.CityName);
        Logger.Log(c.CityUrl);
    }
    // Level 3: list the monthly record sets of the second city in that province.
    var ds = GetWeatherDataSet(cl[1]);
    foreach(var d in ds)
    {
        Logger.Log(d.City.CityName);
        Logger.Log(d.Title);
        Logger.Log(d.Url);
    }
    // Level 4: fetch the weather rows of the first record set.
    var dl = GetWeatherData(ds[0]);
    foreach(var d in dl)
    {
        Logger.Log(d.DataSet.Title);
        Logger.Log(d.Date);
        Logger.Log(d.TextWeather);
        Logger.Log(d.Temp);
        Logger.Log(d.Wind);
    }
}


public List<Province> GetProvinceList()
{
    Default.Navigate("http://www.tianqihoubao.com/lishi/index.htm");
    Default.Ready("#content DT");
    var list = Default.SelectNodes("#content DT a");
    var result = new List<Province>();
    foreach(var item in list)
    {
        var p = new Province();
        p.ProvinceName = item.Text();
        p.ProvinceUrl = item.Attr("href");
        p.ProvinceUrl = new Uri(Default.Url, p.ProvinceUrl).ToString();
        result.Add(p);
    }
    return result;
}

public List<City> GetCityList(Province province)
{
    Default.Navigate(province.ProvinceUrl);
    Default.Ready("#content DD");
    var list = Default.SelectNodes("#content DD a");
    var result = new List<City>();
    foreach(var item in list)
    {
        var c = new City();
        c.Province = province;
        c.CityName = item.Text();
        c.CityUrl = item.Attr("href");
        c.CityUrl = new Uri(Default.Url, c.CityUrl).ToString();
        result.Add(c);
    }
    return result;
}

public List<WeatherDataSet> GetWeatherDataSet(City city)
{
    Default.Navigate(city.CityUrl);
    Default.Ready("#content>div.pcity");
    var list = Default.SelectNodes("#content>div.pcity a");
    var result = new List<WeatherDataSet>();
    foreach(var item in list)
    {
        var ds = new WeatherDataSet();
        ds.Title = item.Text();
        ds.Url = item.Attr("href");
        ds.Url = new Uri(Default.Url, ds.Url).ToString();
        ds.City = city;
        result.Add(ds);
    }
    return result;
}

public List<WeatherData> GetWeatherData(WeatherDataSet ds)
{
    Default.Navigate(ds.Url);
    Default.Ready("#content>table.b");
    var list = Default.SelectNodes("#content>table.b tr:gt(0)");
    var result = new List<WeatherData>();
    foreach(var item in list)
    {
        var d = new WeatherData();
        d.DataSet = ds;
        d.Date = item.SelectSingleNode("td:eq(0)").Text();
        d.TextWeather = item.SelectSingleNode("td:eq(1)").Text();
        d.Temp = item.SelectSingleNode("td:eq(2)").Text();
        d.Wind = item.SelectSingleNode("td:eq(3)").Text();
        result.Add(d);
    }
    return result;
}

[Serializable]
public class Province
{
    public string ProvinceName;
    public string ProvinceUrl;
}

[Serializable]
public class City
{
    public Province Province;
    public string CityName;
    public string CityUrl;
}

[Serializable]
public class WeatherDataSet
{
    public City City;
    public string Title;
    public string Url;
}

[Serializable]
public class WeatherData
{
    public WeatherDataSet DataSet;
    public string Date;
    public string TextWeather;
    public string Temp;
    public string Wind;
}
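
Each Get method in the script turns the href it scraped into an absolute address with `new Uri(Default.Url, href)`, which suggests that `Default.Url` is a `Uri`. The two-argument `System.Uri` constructor resolves a relative reference against a base URI. Here is a small standalone sketch of that behavior, where `UrlHelper` is a hypothetical name and the hrefs in the comments are illustrative rather than copied from the site.

```csharp
using System;

// Hypothetical helper (not part of SS).
public static class UrlHelper
{
    // Resolves href against the page it was found on.
    // Root-relative ("/lishi/hebei.htm") and page-relative ("hebei.htm") hrefs
    // are both resolved; an href that is already absolute passes through unchanged.
    public static string ToAbsolute(string pageUrl, string href)
    {
        Uri baseUri = new Uri(pageUrl);
        return new Uri(baseUri, href).ToString();
    }
}
```

Resolving against the current page instead of concatenating strings means the script keeps working whether the site emits relative or absolute links.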


>> Result of running the script: