Solved: pandas read parquet from s3

In today's fashion-driven world, working with large datasets is commonplace, and pandas is a popular Python library that provides powerful, easy-to-use data manipulation tools. Among the many data formats available, Parquet is widely used for its efficient columnar storage and compact, compressed encoding. Amazon S3 is a popular place to store your files, and combining it with pandas can considerably improve your workflow. In this article, we will look at how to read Parquet files from Amazon S3 using the pandas library.

To read Parquet files from S3, you need to understand the key components and libraries involved. The two main libraries we will use are pandas and s3fs. Pandas handles the data manipulation, while s3fs provides the connection to Amazon S3.

import pandas as pd
import s3fs

Pandas Library

pandas is an open-source library that provides powerful data manipulation and analysis tools for Python. It is widely used in the data science community because of its flexibility and its ability to work with many data formats, including Parquet files. With pandas, you can load, explore, and transform data easily, which lets you quickly analyze and understand the patterns and trends in your data.

S3fs Library

S3fs is a Pythonic file interface to Amazon S3 objects. It builds on botocore, the library that underlies Boto3, and exposes a filesystem-style interface, which makes it much easier to work with S3 objects as if they were local files. Through s3fs, you can read and write files on S3, list and delete objects, and perform other file operations directly from Python.

Now that you understand the libraries involved, let's go through a step-by-step explanation of reading Parquet files from S3 using pandas and s3fs.
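As a rough illustration of this file-like interface, the sketch below lists the Parquet files under a prefix. The helper name `list_parquet_files` is hypothetical and the bucket path is a placeholder; the helper only assumes it is given a filesystem object whose `ls` method returns plain path strings when called with `detail=False`, as `s3fs.S3FileSystem` does.

```python
def list_parquet_files(fs, prefix):
    """Return the paths under `prefix` that end in .parquet, sorted by name.

    `fs` is expected to be an s3fs.S3FileSystem (or any object with a
    compatible `ls` method); this helper is illustrative, not part of s3fs.
    """
    paths = fs.ls(prefix, detail=False)  # plain path strings, no metadata
    return sorted(p for p in paths if p.endswith(".parquet"))


# Usage (needs s3fs installed and valid AWS credentials; placeholder bucket):
# import s3fs
# fs = s3fs.S3FileSystem(anon=False)
# print(list_parquet_files(fs, "your_bucket/path/to/your/parquet"))
```

Taking the filesystem as an argument keeps the helper independent of how the credentials were configured.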

  1. Install pandas and s3fs - First, install the pandas and s3fs libraries with pip, along with a Parquet engine such as pyarrow, which pandas needs in order to read Parquet files:
pip install pandas s3fs pyarrow
  2. Import the libraries - Start by importing both pandas and s3fs:
import pandas as pd
import s3fs
  3. Set up the configuration - Provide your Amazon S3 credentials either by passing them directly to s3fs or by setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:
fs = s3fs.S3FileSystem(
  key='your_aws_access_key_id',
  secret='your_aws_secret_access_key'
)
  4. Read the Parquet file from S3 - Use pandas and s3fs to read your Parquet file. The storage_options dictionary is forwarded to s3fs, so its keys are the same arguments that S3FileSystem accepts. To use explicit keys instead of environment variables, pass the same key and secret values here:
file_path = 's3://your_bucket/path/to/your/parquet/file.parquet'
# anon=False requests an authenticated connection using the configured credentials
df = pd.read_parquet(file_path, storage_options={"anon": False})
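The steps above can be wrapped in a small helper. This is only a sketch under stated assumptions: the function names `storage_options_from_env` and `read_parquet_from_s3` are hypothetical, the path is the placeholder used above, and it assumes a pandas version that supports the `storage_options` argument of `read_parquet`.

```python
import os


def storage_options_from_env(env=None):
    """Build a storage_options dict for s3fs from AWS environment variables."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        # "key" and "secret" are the argument names S3FileSystem expects
        return {"key": key, "secret": secret}
    # No explicit keys: let s3fs fall back to its default credential lookup
    return {"anon": False}


def read_parquet_from_s3(path, columns=None, env=None):
    """Read a Parquet file from S3 into a DataFrame (illustrative helper)."""
    import pandas as pd  # imported here so the helper above works without pandas

    return pd.read_parquet(
        path,
        columns=columns,  # optionally load only a subset of columns
        storage_options=storage_options_from_env(env),
    )


# Usage (needs pandas, s3fs, a Parquet engine, and valid credentials):
# df = read_parquet_from_s3("s3://your_bucket/path/to/your/parquet/file.parquet")
```

Because Parquet is columnar, passing `columns` can reduce how much data has to be loaded when only a few fields are needed.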

After completing these steps, you should have successfully read your Parquet file from S3, and the DataFrame df now holds your S3 data in tabular form.

In this article, we have seen how to access and read Parquet files from Amazon S3 using the pandas library for data manipulation and s3fs for S3 connectivity. These tools can considerably improve your data processing workflows and let you focus on extracting insights and understanding current trends in the fashion world. From analyzing different style combinations to exploring the history and evolution of clothing fashions, pandas makes it easier to uncover the hidden gems in your data.
