Solved: pandas read Parquet from S3

In today's fashion-driven world, working with large datasets is common, and pandas is a popular Python library that provides powerful, easy-to-use tools for data manipulation. Among the many data formats available, Parquet is widely used for its efficient columnar storage and compression. Amazon S3 is a popular storage option for your files, and combining it with pandas can greatly improve your workflow. In this article, we will look at how to read Parquet files from Amazon S3 using the pandas library.

To read Parquet files from S3, you need to understand the key components and libraries involved. The two main libraries we will use are pandas and s3fs. pandas handles the data processing, while s3fs provides the connection to Amazon S3.

import pandas as pd
import s3fs

Pandas Library

pandas is an open-source library that provides powerful tools for data manipulation and analysis in Python. It is widely used in the data science community because of its flexibility and its ability to work with many data formats, including Parquet files. With pandas you can easily load, analyze, and transform data, which lets you quickly explore and understand the patterns and trends in your data.

S3fs Library

s3fs is a Pythonic, file-like interface for seamless access to Amazon S3 objects. It is built on top of aiobotocore and fsspec, which makes it remarkably easy to work with S3 objects as if they were local files. With s3fs you can read and write files on S3, list and delete objects, and perform other file operations directly from Python.
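The file operations described above can be sketched as two small helpers. This is a minimal illustration, not part of s3fs itself: the function names `list_parquet_files` and `looks_like_parquet` are hypothetical, and `fs` is assumed to be an `s3fs.S3FileSystem` instance (or any object that offers the same `ls` and `open` methods).

```python
PARQUET_MAGIC = b"PAR1"  # every Parquet file starts with these four bytes


def list_parquet_files(fs, prefix):
    """Return the paths under `prefix` that end in .parquet, sorted."""
    # With detail=False, ls returns plain path strings instead of info dicts.
    paths = fs.ls(prefix, detail=False)
    return sorted(p for p in paths if p.endswith(".parquet"))


def looks_like_parquet(fs, path):
    """Check the leading magic bytes of an object without downloading all of it."""
    # The object is opened like a local binary file; only 4 bytes are read.
    with fs.open(path, "rb") as f:
        return f.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC
```

With real credentials in place, these would be called as `list_parquet_files(s3fs.S3FileSystem(), "your_bucket/path")`, where the bucket and path are placeholders.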

Now that you understand the libraries involved, let's walk step by step through reading Parquet files from S3 using pandas and s3fs.

  1. Install pandas and s3fs - First, install the pandas and s3fs libraries with pip. pandas also needs a Parquet engine such as pyarrow (or fastparquet) to read Parquet files:
pip install pandas s3fs pyarrow
  2. Import the libraries - Start by importing both pandas and s3fs:
import pandas as pd
import s3fs
  3. Set up the configuration - Provide your Amazon S3 credentials either by passing them directly to s3fs or by setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:
fs = s3fs.S3FileSystem(
  key='your_aws_access_key_id',
  secret='your_aws_secret_access_key'
)
  4. Read the Parquet file from S3 - Use pandas and s3fs to read your Parquet file:
file_path = 's3://your_bucket/path/to/your/parquet/file.parquet'
# Open the object through the authenticated filesystem from step 3
with fs.open(file_path, 'rb') as f:
    df = pd.read_parquet(f)
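As a variation on the steps above, credentials can be taken from environment variables and handed to pandas through `storage_options`, which pandas forwards to s3fs when it sees an `s3://` path. The sketch below assumes that setup. The helper names `build_storage_options` and `read_parquet_from_s3` are hypothetical, and support for `AWS_SESSION_TOKEN` is an addition not mentioned in the steps.

```python
import os


def build_storage_options(env=None):
    """Build the storage_options dict that pandas forwards to s3fs."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        options = {"key": key, "secret": secret}
        # Temporary credentials also carry a session token (assumption: optional).
        token = env.get("AWS_SESSION_TOKEN")
        if token:
            options["token"] = token
        return options
    # No explicit keys: request authenticated access and let s3fs
    # fall back to its default credential lookup.
    return {"anon": False}


def read_parquet_from_s3(path, columns=None, env=None):
    """Read a Parquet file at an s3:// path into a DataFrame."""
    # Imported lazily so build_storage_options works without pandas installed.
    import pandas as pd

    return pd.read_parquet(
        path,
        columns=columns,  # None reads every column
        storage_options=build_storage_options(env),
    )
```

Calling `read_parquet_from_s3('s3://your_bucket/path/to/your/parquet/file.parquet', columns=['col_a'])` would then load a single column, where `col_a` is a placeholder column name. Keeping credentials in the environment avoids hard-coding secrets in source files.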

After completing these steps, you should have successfully read your Parquet file from S3, and the DataFrame 'df' now holds your S3 data in tabular form.

In this article, we have seen how to access and read Parquet files from Amazon S3 using the pandas library for data manipulation and s3fs for seamless S3 connectivity. These tools can greatly improve your data processing workflow and let you focus on extracting insights and understanding the latest trends in the fashion world. From exploring combinations of different styles to analyzing the history and evolution of clothing styles, pandas makes it easier to uncover hidden gems in your data.
