In today's world, data manipulation and analysis are essential for understanding varied phenomena and making informed decisions. A common task in data analysis is resampling time series data, which means changing the frequency of the data, either by upsampling (increasing the frequency) or downsampling (decreasing the frequency). In this article, we discuss how to backward fill while upsampling time series data using the powerful Python library Pandas.
Backward Fill in Time Series Data
When we upsample time series data, we increase the number of data points, which usually leaves the newly created data points without values. To fill these gaps, we can use several methods. One such method is backward fill, also known as backfilling. Backward filling replaces each missing value with the next available value in the series.
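As a minimal sketch of the idea, the snippet below applies a backward fill to a tiny Series. The values are made up for illustration and are not taken from the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values: two known readings, each preceded by gaps.
s = pd.Series([np.nan, np.nan, 10.0, np.nan, 20.0])

# bfill() copies the next valid value backward into each gap.
filled = s.bfill()
print(filled.tolist())  # [10.0, 10.0, 10.0, 20.0, 20.0]
```

Each gap takes the value that follows it, so a gap at the very end of a series would stay empty because nothing comes after it.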
The Pandas Library
Python's Pandas library is an essential tool for data manipulation, offering a wide range of functionality for handling data structures such as DataFrames and time series data. Pandas has built-in features that make it easy to work with time series data, such as resampling and filling missing values, which lets us perform a backward fill after upsampling with very little code.
Solution: Backward Fill with Pandas
To show how to apply a backward fill after upsampling time series data with Pandas, consider a simple example. We start by importing the required libraries and creating a sample time series dataset.
import pandas as pd
import numpy as np

# Create a sample time series dataset
date_rng = pd.date_range(start='2022-01-01', end='2022-01-10', freq='D')
data = np.random.randint(0, 100, size=(len(date_rng), 1))
df = pd.DataFrame(date_rng, columns=['date'])
df['value'] = data
Now that we have our sample data, we proceed with upsampling and applying the backward fill. In this example, we upsample from a daily frequency to an hourly frequency:
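# Upsample the data to hourly frequency
# ('h' is the hourly alias; recent pandas releases deprecate the uppercase 'H')
df.set_index('date', inplace=True)
hourly_df = df.resample('h').asfreq()

# Apply the backward fill method to fill missing values
# (bfill() replaces fillna(method='bfill'), which recent pandas releases deprecate)
hourly_df = hourly_df.bfill()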
In the code above, we first set the 'date' column as the index and then resample the data to an hourly frequency using resample() followed by asfreq(). The resulting DataFrame contains missing values (NaN) at every new hourly timestamp, because only the original daily timestamps carry data. We then backward fill those missing values with bfill(), which is the current spelling of the older fillna(method='bfill') form.
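The effect of each step can be checked by counting missing values before and after the fill. The sketch below is one way to do that; the seeded random generator and the use of the dates directly as the index are assumptions made here for reproducibility, not part of the original example.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so the run is reproducible.
rng = np.random.default_rng(seed=0)
date_rng = pd.date_range(start='2022-01-01', end='2022-01-10', freq='D')
df = pd.DataFrame({'value': rng.integers(0, 100, size=len(date_rng))},
                  index=date_rng)

# Upsample to hourly frequency; new timestamps have no value yet.
hourly_df = df.resample('h').asfreq()
missing_before = int(hourly_df['value'].isna().sum())

# Backward fill: each gap takes the value of the next daily observation.
hourly_df = hourly_df.bfill()
missing_after = int(hourly_df['value'].isna().sum())

print(len(hourly_df))   # 217 rows: 9 days * 24 hours + the final midnight
print(missing_before)   # 207 gaps: 217 rows minus 10 original observations
print(missing_after)    # 0
```

Because the last hourly timestamp coincides with the last daily observation, every gap has a later value to draw from and nothing remains missing.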
Step-by-Step Explanation
Let's break the code down to understand it better:
1. We first import the Pandas and NumPy libraries:
import pandas as pd
import numpy as np
2. We create a sample time series dataset, using the date_range() function from Pandas to generate daily dates and NumPy to generate random integers:
date_rng = pd.date_range(start='2022-01-01', end='2022-01-10', freq='D')
data = np.random.randint(0, 100, size=(len(date_rng), 1))
df = pd.DataFrame(date_rng, columns=['date'])
df['value'] = data
3. Next, we set the 'date' column as the index and resample the data to an hourly frequency with the resample() and asfreq() functions:
df.set_index('date', inplace=True)
hourly_df = df.resample('h').asfreq()
4. Finally, we fill the missing values in the upsampled DataFrame with a backward fill, using bfill() (equivalent to the older fillna() call with the 'bfill' method):
hourly_df = hourly_df.bfill()
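Steps 3 and 4 can also be combined, since the resampler object exposes its own bfill() method. The following is a sketch using a small hypothetical dataset with fixed values, chosen here so the output is easy to verify by hand.

```python
import pandas as pd

# Hypothetical daily data with fixed values (not the article's random data).
idx = pd.date_range(start='2022-01-01', periods=3, freq='D')
daily = pd.DataFrame({'value': [10, 20, 30]}, index=idx)

# Upsample to hourly and backward fill in one chained call.
hourly = daily.resample('h').bfill()

print(len(hourly))                                         # 49
print(hourly.loc[pd.Timestamp('2022-01-01 00:00'), 'value'])  # 10
print(hourly.loc[pd.Timestamp('2022-01-01 01:00'), 'value'])  # 20
```

The original midnight observation keeps its own value, while every hour after it takes the value of the following day. The two-step form with asfreq() is still useful when you want to inspect the gaps before filling them.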
Conclusion
In this article, we explored how to backward fill after upsampling time series data using the powerful Pandas library in Python. By understanding and applying these techniques, we can manipulate and analyze time series data effectively, gain useful insights, and make informed decisions.