Solved: pandas read parquet from s3

In today's fashion-driven world, working with large datasets is routine, and pandas is a popular Python library that provides powerful, easy-to-use tools for data manipulation. Among the many data formats, Parquet is widely used for its efficient columnar storage and compression. Amazon S3 is a popular choice for storing your files, and integrating it with pandas can greatly improve your workflow. In this article, we will look at how to read Parquet files from Amazon S3 using the pandas library.

To solve the problem of reading Parquet files from S3, you need to understand the main components and libraries involved. The two main libraries we will use are pandas and s3fs. Pandas handles the data processing, while s3fs provides the connection to Amazon S3.

import pandas as pd
import s3fs

Pandas Library

Pandas is an open-source library that provides powerful data manipulation and analysis tools for Python. It is widely used by the data science community because of its flexibility and its ability to work with many data formats, including Parquet files. With pandas, you can easily load, analyze, and manipulate data, letting you quickly explore and understand the patterns and trends in your data.

S3fs Library

S3fs is a Pythonic file interface for seamless access to Amazon S3 objects. It builds on botocore and the fsspec file-system interface, which makes it easy to work with S3 objects as if they were local files. With s3fs, you can read and write files on S3, list and delete objects, and perform other file operations directly from Python.
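As a rough illustration of the file-like interface described above, the sketch below lists the Parquet objects under a prefix. The helper name `list_parquet_files` and the bucket path in the comments are hypothetical; the only s3fs call it relies on is `ls`, and it assumes the file system object returns plain path strings when `detail=False`.

```python
def list_parquet_files(fs, prefix):
    """Return the paths directly under `prefix` that end in .parquet, sorted.

    `fs` is expected to be an s3fs.S3FileSystem (or any fsspec-style
    file system exposing ls()).
    """
    # detail=False asks for plain path strings instead of metadata dicts
    paths = fs.ls(prefix, detail=False)
    return sorted(p for p in paths if p.endswith(".parquet"))


# Possible usage against a real bucket (placeholder names, not executed here):
#   import s3fs
#   fs = s3fs.S3FileSystem(anon=False)
#   list_parquet_files(fs, "your_bucket/path/to/your/parquet")
```

Taking the file system as a parameter keeps the helper independent of how credentials are configured.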

Now that you understand the libraries involved, let's walk through a step-by-step explanation of reading Parquet files from S3 using pandas and s3fs.

  1. Install pandas and s3fs - First, install both the pandas and s3fs libraries with pip. pandas also needs a Parquet engine such as pyarrow to read Parquet files:
pip install pandas s3fs pyarrow
  2. Import the libraries - Start by importing both pandas and s3fs:
import pandas as pd
import s3fs
  3. Set up the configuration - Provide your Amazon S3 credentials either by passing them directly to s3fs or by setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:
fs = s3fs.S3FileSystem(
  key='your_aws_access_key_id',
  secret='your_aws_secret_access_key'
)
  4. Read the Parquet file from S3 - Use pandas to read your Parquet file. pandas opens the s3:// path through s3fs and forwards storage_options to s3fs.S3FileSystem as keyword arguments, so the options go at the top level of the dictionary. Explicit credentials can be passed the same way, using the key and secret options:
file_path = 's3://your_bucket/path/to/your/parquet/file.parquet'
# anon=False makes s3fs use your AWS credentials (for example, the environment variables from step 3)
df = pd.read_parquet(file_path, storage_options={"anon": False})
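The two credential options from the configuration step can be sketched as one small helper. This is only an illustration: the function names `s3_storage_options` and `read_parquet_from_s3` are hypothetical, and botocore normally picks up the AWS environment variables on its own, so the helper mainly shows how explicit credentials map onto the `key` and `secret` options that s3fs accepts.

```python
import os


def s3_storage_options(env=None):
    """Build a storage_options dict for pandas from AWS environment variables."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        # Explicit credentials, passed to s3fs.S3FileSystem as key/secret
        return {"key": key, "secret": secret}
    # No explicit keys: let s3fs fall back to the default AWS credential lookup
    return {"anon": False}


def read_parquet_from_s3(path, env=None):
    """Read a Parquet file from an s3:// path into a DataFrame."""
    # Imported here so s3_storage_options() can be used without pandas installed
    import pandas as pd

    return pd.read_parquet(path, storage_options=s3_storage_options(env))
```

Calling `read_parquet_from_s3('s3://your_bucket/path/to/your/parquet/file.parquet')` would then behave like the final step above, with the credentials taken from the environment when they are set.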

After completing these steps, you should have successfully read your Parquet file from S3, and the DataFrame 'df' now holds your S3 data in tabular form.

In this article, we saw how to access and read Parquet files from Amazon S3 using pandas for data manipulation and s3fs for seamless S3 integration. These tools can greatly improve your data processing and let you focus on extracting insights and understanding the latest trends in the fashion world. From exploring different style combinations to analyzing the history and evolution of clothing trends, pandas makes it easier to uncover hidden gems in your data.
