An Approach to Reconstructing Precipitation Series

俟河清

SatyrLee · 2025-05-09 · via 俟河清

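A few days ago I received a batch of rainfall records measured over intervals of different lengths, and needed to turn them into an hourly precipitation series. While processing them I ran into two problems: the interval lengths are inconsistent, and some records contradict each other. After checking against historical records, I came up with a workaround that is admittedly not very rigorous, implemented it in code, and wrote it up below.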

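The simplest approach would be to divide each long-interval value by its duration and stitch the results together. But the raw data has a problem: the precipitation for 14:00-16:00 is 0.5 mm, while the precipitation for 14:00-15:00 is 0.8 mm. If interval precipitation is defined as an accumulated total, this is a clear contradiction, because the longer interval contains the shorter one and so cannot have a smaller total.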
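The contradiction above can be checked mechanically. The following is a minimal sketch, not the code used in this post: it assumes each record is a hypothetical `(start, end, value_mm)` tuple with values read as accumulated totals, and it flags every pair in which a shorter interval lies inside a longer one yet reports more precipitation. The date and the third record are made up for illustration; only the 0.5 mm and 0.8 mm values come from the data described above.

```python
from datetime import datetime


def find_conflicts(records):
    """Return (outer, inner) pairs where inner lies inside outer but reports more rain.

    Each record is a (start, end, value_mm) tuple; values are read as totals.
    """
    conflicts = []
    for outer in records:
        o_start, o_end, o_val = outer
        for inner in records:
            i_start, i_end, i_val = inner
            contained = o_start <= i_start and i_end <= o_end
            shorter = (i_end - i_start) < (o_end - o_start)
            # A sub-interval total can never exceed the total of the interval containing it.
            if contained and shorter and i_val > o_val:
                conflicts.append((outer, inner))
    return conflicts


# Hypothetical date and third record; 0.5 mm and 0.8 mm are the values quoted above.
records = [
    (datetime(2025, 5, 1, 14), datetime(2025, 5, 1, 16), 0.5),
    (datetime(2025, 5, 1, 14), datetime(2025, 5, 1, 15), 0.8),
    (datetime(2025, 5, 1, 16), datetime(2025, 5, 1, 17), 0.2),
]

conflicts = find_conflicts(records)
for outer, inner in conflicts:
    print(f"{inner[0]:%H:%M}-{inner[1]:%H:%M} reports {inner[2]} mm, "
          f"more than {outer[2]} mm for {outer[0]:%H:%M}-{outer[1]:%H:%M}")
```

The pairwise loop is quadratic in the number of records, which is fine for a quick audit of a small dataset but would need an interval index for a large one.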

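However, we cannot simply treat these records as outliers and discard them, because the same problem appears in many time intervals. It occurred to me that the values might instead be precipitation normalized to an hourly rate. That cannot be concluded offhand either; it needs data to back it up. Fortunately, some websites publish historical precipitation data that can be used for a rough comparison.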
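To make the two readings concrete, here is a small sketch that tests the 14:00-16:00 and 14:00-15:00 pair under each of them. The function names and the `"total"` and `"hourly_rate"` labels are illustrative assumptions, not part of the dataset's documentation. Passing this check does not prove the hourly-rate reading is correct; it only shows that the reading removes this particular contradiction.

```python
def implied_total(value, hours, reading):
    """Total precipitation (mm) implied by a record under a given reading."""
    if reading == "total":
        return value            # value is already the accumulated amount
    if reading == "hourly_rate":
        return value * hours    # value is mm per hour, so scale by duration
    raise ValueError(f"unknown reading: {reading!r}")


def is_consistent(outer_value, outer_hours, inner_total, reading):
    """True if the enclosing interval holds at least as much as its 1-hour sub-interval."""
    return implied_total(outer_value, outer_hours, reading) >= inner_total


# 0.5 over two hours versus 0.8 mm measured in the first of those hours.
for reading in ("total", "hourly_rate"):
    ok = is_consistent(0.5, 2, 0.8, reading)
    print(f"{reading}: {'consistent' if ok else 'conflict'}")
```

Under the hourly-rate reading the two-hour record implies 1.0 mm in total, which leaves room for the 0.8 mm hour. The comparison against external historical data is still what decides between the readings.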

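Consider how a rain gauge works: it derives precipitation from the depth of the water it collects. A gauge typically has a container, and as precipitation falls into it, the water level rises with the accumulated amount. The gauge records the water level at regular intervals and converts it into precipitation.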
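As a rough illustration of that principle, the sketch below turns a list of cumulative water-level readings into per-interval precipitation by differencing consecutive readings. It is an assumption-laden toy, not a model of any particular instrument: the readings are invented, and a drop in level is assumed to mean the container was emptied between two readings.

```python
def increments_from_levels(levels_mm):
    """Convert cumulative gauge levels (mm) into precipitation per recording interval."""
    increments = []
    for prev, curr in zip(levels_mm, levels_mm[1:]):
        diff = curr - prev
        if diff < 0:
            # Assumption: the gauge was emptied, so the new level is all new rain.
            increments.append(curr)
        else:
            increments.append(diff)
    return increments


# Invented readings: steady rise, one dry interval, then a reset.
levels = [0.0, 0.5, 1.25, 1.25, 0.25]
print(increments_from_levels(levels))
```

Seen this way, a record covering several hours is naturally an accumulated total, which is why a smaller value for the longer interval looks suspicious.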

import os

import pandas as pd


def read_xlsx(file_path):
    """
    Read an Excel file and return a DataFrame.
    """
    # Fail early with a clear message if the file is missing.
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"The file {file_path} does not exist.")
    return pd.read_excel(file_path)


def first_processing(df, t_begin_col, t_end_col, data_col):
    """
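    Build the hourly skeleton from the records that span exactly one hour.

    INPUT: pandas.DataFrame, t_begin_col, t_end_col, data_col
    OUTPUT: pandas.DataFrame with one row per hour; "type" is 1 where a
            1-hour record exists and 0 where the hour is still empty
    """
    # Note: the time columns of the caller's DataFrame are converted in place;
    # further_processing relies on them being datetimes.
    df[t_begin_col] = pd.to_datetime(df[t_begin_col])
    df[t_end_col] = pd.to_datetime(df[t_end_col])
    # Drop records whose end precedes their start.
    df = df[df[t_begin_col] <= df[t_end_col]].copy()
    global_begin = df[t_begin_col].min()
    global_end = df[t_end_col].max()
    full_range = pd.date_range(start=global_begin, end=global_end, freq='h')
    # Map each hour start to the value of its 1-hour record, if any.
    one_hour = pd.Timedelta(hours=1)
    mapping = {}
    for _, row in df.iterrows():
        if row[t_end_col] - row[t_begin_col] == one_hour:
            mapping[row[t_begin_col]] = row[data_col]
    # Hours without a 1-hour record start at 0.0 and are marked type 0.
    new_rows = [
        {t_begin_col: ts, data_col: mapping.get(ts, 0.0), "type": 1 if ts in mapping else 0}
        for ts in full_range
    ]
    return pd.DataFrame(new_rows)


def further_processing(df, t_begin_col, t_end_col, data_col, target_df, type_val):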
    """
    Spread each record spanning exactly type_val hours over the hours in its
    window that are still empty (value 0).

    INPUT: pandas.DataFrame, t_begin_col, t_end_col, data_col, target_df, type_val
    OUTPUT: pandas.DataFrame (target_df, modified in place)
    """
    interval = pd.Timedelta(hours=type_val)
    filtered_df = df[(df[t_end_col] - df[t_begin_col] == interval)].drop_duplicates(
        subset=[t_begin_col, t_end_col, data_col])
    for _, row in filtered_df.iterrows():
        period_start = row[t_begin_col]
        period_end = period_start + interval
        selected_value = row[data_col]
        mask = (target_df[t_begin_col] >= period_start) & (target_df[t_begin_col] < period_end)
        sub_df = target_df.loc[mask]
        # Amount already assigned to hours inside this window.
        existing_sum = sub_df[data_col].sum()
        remain_value = selected_value - existing_sum
        # A non-positive remainder means the record conflicts with, or is already
        # covered by, shorter records, so it is skipped.
        if remain_value > 0:
            # Index labels of the hours in the window that are still empty.
            empty_idx = sub_df.index[sub_df[data_col] == 0]
            if len(empty_idx) > 0:
                target_df.loc[empty_idx, data_col] = remain_value / len(empty_idx)
                target_df.loc[empty_idx, "type"] = type_val
    return target_df


def prep_extract(df, prep0, time, excel_name):
    """
    Extract runs of consecutive hours whose precipitation exceeds prep0 and
    that last at least `time` hours, writing each run to its own sheet.

    INPUT: pandas.DataFrame, prep0, time, excel_name
    OUTPUT: Excel file with sequences of rainfall
    """
    results = []
    current_sequence = []
    for _, row in df.iterrows():
        if row['prep'] > prep0:
            current_sequence.append(row)
        else:
            # The run has ended; keep it only if it is long enough.
            if len(current_sequence) >= time:
                results.append(current_sequence)
            current_sequence = []
    # Flush a run that extends to the end of the series.
    if len(current_sequence) >= time:
        results.append(current_sequence)
    if not results:
        # An Excel workbook needs at least one sheet, so skip writing.
        print("No qualifying rainfall sequences found.")
        return
    with pd.ExcelWriter(excel_name) as writer:
        for i, sequence in enumerate(results):
            pd.DataFrame(sequence).to_excel(writer, sheet_name=f"Seq_{i+1}", index=False)


if __name__ == "__main__":
    file_path = 'data.xlsx'
    df = read_xlsx(file_path)
    t_begin_col = 'begin'
    t_end_col = 'end'
    data_col = 'prep'
    # Interval lengths (in hours) to fold in, shortest first.
    types = list(range(2, 16))
    hourly_df = first_processing(df, t_begin_col, t_end_col, data_col)
    for i in types:
        hourly_df = further_processing(df, t_begin_col, t_end_col, data_col, hourly_df, i)
    hourly_df.to_excel('hourly.xlsx', index=False)
    # Keep runs of at least 5 consecutive hours with more than 2 mm in each hour.
    prep_extract(hourly_df, 2, 5, 'prep_extract.xlsx')
    print(hourly_df)
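The redistribution rule in `further_processing` can be easier to follow on a toy case. The sketch below restates it with a plain list of hourly values instead of a DataFrame. `fill_interval` and its index-based window are hypothetical simplifications for illustration, not a replacement for the function above, and the numbers are invented apart from the 0.8 mm and 0.5 mm pair.

```python
def fill_interval(hourly, start, length, total):
    """Spread what is left of `total` evenly over the empty (zero) hours of a window.

    hourly: list of hourly values in mm, where 0 marks an hour not yet filled
    start, length: the window covers indices start .. start + length - 1
    """
    window = range(start, start + length)
    known_sum = sum(hourly[i] for i in window)
    remaining = total - known_sum
    empty = [i for i in window if hourly[i] == 0]
    # Skip the record when shorter records already meet or exceed its total.
    if remaining > 0 and empty:
        share = remaining / len(empty)
        for i in empty:
            hourly[i] = share
    return hourly


# A 3-hour record of 2.0 mm where the first hour is already known to be 0.5 mm.
print(fill_interval([0.5, 0, 0, 0], 0, 3, 2.0))

# The conflicting pair from the data: the 2-hour record is smaller than its first hour.
print(fill_interval([0.8, 0], 0, 2, 0.5))
```

In the first call the remaining 1.5 mm is split over the two empty hours; in the second the record is ignored, so the conflicting long-interval value never overrides the 1-hour measurement. One consequence of using 0 as the "empty" marker, both here and in the script, is that an hour measured as dry is indistinguishable from an hour with no record and may be overwritten.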
