Code for Chapter 2, "Handling Data", of Visualize This (《鲜活的数据》)
慢步前行 · 2017-09-11 · via 博客园 - 慢步前行

2.1.3 Collecting data automatically

import urllib2
page = urllib2.urlopen("https://www.wunderground.com/history/airport/ZHCC/2017/9/8/DailyHistory.html")
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(page)
images = soup.findAll('img')
first_image = images[0]
print first_image
wxvalue = soup.findAll(attrs={"class":"wx-value"})
print wxvalue
print wxvalue[0]
# print wxvalue[0].span.string  # fails here: AttributeError: 'NoneType' object has no attribute 'string'
print wxvalue[0].contents[0].string
for m in range(1, 13):
    for d in range(1, 32):
 
      # Stop once past the last day of the month (February is treated as 28 days, so Feb 29, 2016 is skipped)
      if (m == 2 and d > 28):
        break
      elif (m in [4, 6, 9, 11] and d > 30):
        break
 
      # Open wunderground.com url
      timestamp = '2016' + str(m) + str(d)
      print "Getting data for " + timestamp
      #url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
      url = "https://www.wunderground.com/history/airport/ZHCC/2016/" + str(m) + "/" + str(d) + "/DailyHistory.html"
      page = urllib2.urlopen(url)
 
      # Get temperature from page
      soup = BeautifulSoup(page)
      # dayTemp = soup.body.nobr.b.string
      dayTemp = soup.findAll(attrs={"class":"wx-value"})[0].contents[0].string
 
      # Format month for timestamp
      if len(str(m)) < 2:
        mStamp = '0' + str(m)
      else:
        mStamp = str(m)
 
      # Format day for timestamp
      if len(str(d)) < 2:
        dStamp = '0' + str(d)
      else:
        dStamp = str(d)
 
      # Build timestamp
      timestamp = '2016' + mStamp + dStamp
 
      # Print timestamp and temperature as one CSV line (print already appends the newline)
      print timestamp + ',' + dayTemp

Run the script from the terminal:

python get-weather-data.py
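
The nested month/day loop above hard-codes the month lengths and treats February as 28 days. As a minimal sketch (Python 3, standard library only, unlike the Python 2 script above), the valid dates of a year could instead be generated with `calendar.monthrange`, which also accounts for leap years. The names `BASE_URL` and `daily_urls` are hypothetical; the URL pattern is copied from the script above, and no request is sent here.

```python
import calendar

# URL pattern copied from the script above; the site layout may have changed since.
BASE_URL = ("https://www.wunderground.com/history/airport/"
            "{airport}/{y}/{m}/{d}/DailyHistory.html")


def daily_urls(year, airport="ZHCC"):
    """Yield (timestamp, url) pairs for every calendar day of `year`."""
    for month in range(1, 13):
        # monthrange returns (weekday of the 1st, number of days in the month)
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            # Zero-padded YYYYMMDD timestamp, as built by hand in the script above
            timestamp = "%04d%02d%02d" % (year, month, day)
            url = BASE_URL.format(airport=airport, y=year, m=month, d=day)
            yield timestamp, url


pairs = list(daily_urls(2016))
print(len(pairs))   # number of days generated for 2016
print(pairs[0])     # first (timestamp, url) pair
```

Each pair can then be fed to whatever download and parsing code is in use, so the date logic stays separate from the scraping.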

2.2.3 Formatting with code

1. Converting CSV to XML

import csv
reader = csv.reader(open('wunder-data.txt', 'r'), delimiter=",")
print '<weather_data>'

for row in reader:
    print '<observation>'
    print '<date>' + row[0] + '</date>'
    print '<temperature>' + row[1] + '</temperature>'
    print '</observation>'

print '</weather_data>'

Run the script from the terminal:

python csv2xml.py >wunder-data1.xml

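# Alternatively, write the XML directly to a file instead of redirecting stdout
import csv
reader = csv.reader(open('wunder-data.txt', 'r'), delimiter=",")
f = open('wunder-data.xml', 'w')
f.write('<weather_data>')
for row in reader:
    f.write( '<observation>')
    f.write( '<date>' + row[0] + '</date>')
    f.write( '<temperature>' + row[1] + '</temperature>')
    f.write( '</observation>')

f.write( '</weather_data>')
f.close()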
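
Building the XML by string concatenation works for this data, but it would not escape characters such as `&` or `<`. A minimal sketch of the same conversion with the standard library's `xml.etree.ElementTree` follows (Python 3). The function name `csv_to_xml` and the two sample rows are made up for illustration and stand in for `wunder-data.txt`.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_file):
    """Build a <weather_data> tree from rows of date,temperature."""
    root = ET.Element("weather_data")
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        obs = ET.SubElement(root, "observation")
        ET.SubElement(obs, "date").text = row[0]
        ET.SubElement(obs, "temperature").text = row[1]
    return root


# Made-up sample rows; a real run would pass open('wunder-data.txt') instead.
sample = io.StringIO("20160101,26\n20160102,34\n")
xml_text = ET.tostring(csv_to_xml(sample), encoding="unicode")
print(xml_text)
```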

2. Converting XML to CSV

from BeautifulSoup import BeautifulStoneSoup
f = open('wunder-data.xml', 'r')
xml = f.read()
soup = BeautifulStoneSoup(xml)
observations = soup.findAll('observation')
for o in observations:
    print o.date.string + "," + o.temperature.string

Run the script from the terminal:

python xml2csv.py >wunder-data1.txt
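
The script above relies on BeautifulSoup 3's `BeautifulStoneSoup`. As a sketch under the assumption that only the standard library is available (Python 3), the same XML-to-CSV step could be written with `xml.etree.ElementTree` and `csv.writer`. The function name `xml_to_csv` and the sample XML are hypothetical stand-ins for `wunder-data.xml`.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text, out_file):
    """Write one date,temperature line per <observation> element."""
    root = ET.fromstring(xml_text)
    writer = csv.writer(out_file, lineterminator="\n")
    for obs in root.iter("observation"):
        writer.writerow([obs.findtext("date"), obs.findtext("temperature")])


# Made-up sample in the same shape as the XML produced earlier.
sample_xml = (
    "<weather_data>"
    "<observation><date>20160101</date><temperature>26</temperature></observation>"
    "<observation><date>20160102</date><temperature>34</temperature></observation>"
    "</weather_data>"
)
buffer = io.StringIO()
xml_to_csv(sample_xml, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```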

3. Converting CSV to JSON

import csv
reader = csv.reader(open('wunder-data.txt', 'r'), delimiter=",")
print '{ "observations": ['
rows_so_far = 0
for row in reader:
    
    rows_so_far += 1
    
    print '{' 
    print '"date": ' + '"' + row[0] + '", '
    print '"temperature": ' + row[1] 
    
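    # Every object except the last needs a trailing comma; this assumes wunder-data.txt has exactly 365 rows
    if rows_so_far < 365: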
        print " },"
    else:
        print " }"
    
print "] }"

Run the script from the terminal:

python csv2json.py >wunder-data1.json
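
The hand-written JSON above depends on knowing the row count in advance so that the last object gets no trailing comma. A minimal sketch that avoids the counter by collecting the rows and letting the standard `json` module handle commas and quoting (Python 3) is shown below. The function name `csv_to_json` and the sample rows are hypothetical, and temperatures are assumed to be whole numbers, as in the `int(row[1])` call used later in this post.

```python
import csv
import io
import json


def csv_to_json(csv_file):
    """Return a JSON string with one object per date,temperature row."""
    observations = []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        # Assumption: the temperature column holds integers
        observations.append({"date": row[0], "temperature": int(row[1])})
    return json.dumps({"observations": observations}, indent=1)


# Made-up sample rows standing in for wunder-data.txt
json_text = csv_to_json(io.StringIO("20160101,26\n20160102,34\n"))
print(json_text)
```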

4. Adding new logic to the loop

import csv
reader = csv.reader(open('wunder-data.txt', 'r'), delimiter=",")
for row in reader:
    if int(row[1]) <= 32:
        is_freezing = '1'
    else:
        is_freezing = '0'
    
    print row[0] + "," + row[1] + "," + is_freezing

Run the script from the terminal:

python freezingInfo.py >wunder-data-fz.txt
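
The freezing flag above can also be wrapped in a small function so the threshold is explicit and easy to change. This is only a sketch in Python 3; `add_freezing_flag` and `FREEZING_F` are hypothetical names, the sample rows are made up, and the threshold of 32 assumes the temperatures are in degrees Fahrenheit, as the original comparison implies.

```python
import csv
import io

FREEZING_F = 32  # threshold from the script above; assumes Fahrenheit


def add_freezing_flag(csv_file, out_file, threshold=FREEZING_F):
    """Copy date,temperature rows and append 1 (freezing) or 0 (not freezing)."""
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        is_freezing = "1" if int(row[1]) <= threshold else "0"
        writer.writerow([row[0], row[1], is_freezing])


# Made-up sample rows standing in for wunder-data.txt
source = io.StringIO("20160101,26\n20160102,32\n20160103,34\n")
flagged = io.StringIO()
add_freezing_flag(source, flagged)
flagged_text = flagged.getvalue()
print(flagged_text)
```

If the data were in Celsius instead, calling `add_freezing_flag(source, flagged, threshold=0)` would apply the equivalent cutoff.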