Solved: pandas read parquet from S3

Working with large datasets has become commonplace, including in trend-driven fields such as fashion, and pandas is a popular Python library that provides powerful, easy-to-use data manipulation tools. Among the many available file formats, Parquet is widely used for its efficient columnar storage and compact file size. Amazon S3 is a popular place to store your files, and combining it with pandas can noticeably simplify your workflow. In this article, we look at how to read Parquet files from Amazon S3 using pandas.

To read Parquet files from S3, you first need to understand the main components and libraries involved. The two main libraries we will use are pandas and s3fs. pandas handles the data processing, while s3fs provides the connection to Amazon S3.

import pandas as pd
import s3fs
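Before running the imports above, it can help to confirm that the required packages are actually installed. The following is a minimal sketch using only the standard library; the function name `missing_packages` is hypothetical, and the assumption that either pyarrow or fastparquet serves as the Parquet engine reflects the two engines pandas supports.

```python
import importlib.util


def missing_packages(required=("pandas", "s3fs"),
                     engines=("pyarrow", "fastparquet")):
    """Return the names of packages that still need to be installed.

    Hypothetical helper: `required` must all be present, while any one
    entry of `engines` is enough.
    """
    def installed(name):
        # find_spec returns None when a top-level package cannot be found
        return importlib.util.find_spec(name) is not None

    missing = [name for name in required if not installed(name)]
    # pandas needs at least one Parquet engine to read Parquet files
    if not any(installed(name) for name in engines):
        missing.append(" or ".join(engines))
    return missing


if __name__ == "__main__":
    print(missing_packages() or "all dependencies found")
```

An empty list means the environment is ready; otherwise the returned names can be passed to pip.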

Pandas Library

pandas is an open-source library that provides powerful data manipulation and analysis tools for Python. It is widely used in data science thanks to its flexibility and its ability to work with many data formats, including Parquet files. With pandas you can easily load, explore, and manipulate data, which lets you quickly analyze and understand the patterns in your data.
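As a small illustration of loading, exploring, and manipulating data, here is a sketch that works on an in-memory DataFrame. The column names and values are invented for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration
df = pd.DataFrame(
    {
        "category": ["shirts", "shoes", "shirts", "hats"],
        "units_sold": [120, 45, 80, 30],
    }
)

# Explore: look at the first rows and the column types
print(df.head())
print(df.dtypes)

# Manipulate: total units per category, and rows above a threshold
totals = df.groupby("category")["units_sold"].sum()
top_sellers = df[df["units_sold"] > 50]
print(totals)
print(top_sellers)
```

The same operations apply unchanged to a DataFrame loaded from a Parquet file.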

S3fs Library

s3fs is a Pythonic file interface to Amazon S3. It builds on top of botocore and implements the fsspec filesystem interface, which makes it straightforward to work with S3 objects as though they were local files. It should not be confused with s3fs-fuse, a separate tool that mounts buckets through FUSE (Filesystem in Userspace). With s3fs you can read and write files on S3, list and delete objects, and perform other file operations directly from Python.
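To make the "S3 objects as files" idea concrete, the sketch below lists the Parquet objects under a prefix. It relies only on the `glob` method that fsspec-style filesystems such as `s3fs.S3FileSystem` provide; the function name `list_parquet_files` and the bucket path in the usage comment are placeholders chosen for this example.

```python
def list_parquet_files(fs, prefix):
    """Return sorted s3:// URLs of the .parquet objects directly under prefix.

    `fs` is expected to be an fsspec-style filesystem, for example an
    s3fs.S3FileSystem instance; only its glob() method is used here.
    """
    pattern = prefix.rstrip("/") + "/*.parquet"
    urls = []
    for path in fs.glob(pattern):
        # glob() usually returns "bucket/key" paths without the protocol
        urls.append(path if path.startswith("s3://") else "s3://" + path)
    return sorted(urls)


# Usage (requires s3fs and valid AWS credentials, so it is left commented out):
# import s3fs
# fs = s3fs.S3FileSystem(anon=False)
# print(list_parquet_files(fs, "your_bucket/path/to/your/parquet"))
```

Passing the filesystem in as an argument keeps the helper easy to exercise with a stand-in object when no bucket is available.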

Now that you understand the libraries involved, let's walk step by step through reading Parquet files from S3 using pandas and s3fs.

  1. Install pandas, s3fs, and a Parquet engine - First, install pandas and s3fs with pip. pandas also needs a Parquet engine such as pyarrow to read Parquet files:
pip install pandas s3fs pyarrow
  2. Import the libraries - Start by importing pandas and s3fs:
import pandas as pd
import s3fs
  3. Set up credentials - Provide your Amazon S3 credentials either by passing them directly to s3fs or by setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:
# Placeholder credentials; prefer environment variables over hard-coded keys
fs = s3fs.S3FileSystem(
  key='your_aws_access_key_id',
  secret='your_aws_secret_access_key'
)
  4. Read the Parquet file from S3 - Use pandas and s3fs to read the Parquet file. pandas forwards storage_options to s3fs as keyword arguments, so the options are given directly rather than nested under an "s3" key:
file_path = 's3://your_bucket/path/to/your/parquet/file.parquet'
# anon=False tells s3fs to use your configured credentials
df = pd.read_parquet(file_path, storage_options={"anon": False})
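The steps above can be condensed into a pair of small helpers. This is a sketch under stated assumptions, not the only way to do it: the function names `storage_options_from_env` and `read_parquet_from_s3` are hypothetical, and the helper simply makes explicit the environment-variable lookup that botocore would otherwise perform on its own. The `key`, `secret`, `token`, and `anon` options are the ones accepted by `s3fs.S3FileSystem`.

```python
import os


def storage_options_from_env(env=None):
    """Build s3fs storage options from AWS environment variables.

    Hypothetical helper: returns explicit keys when both are set, and
    otherwise asks s3fs to resolve credentials itself (anon=False).
    """
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        return {"anon": False}
    options = {"key": key, "secret": secret}
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        # Temporary credentials also carry a session token
        options["token"] = token
    return options


def read_parquet_from_s3(path, columns=None, env=None):
    """Read one Parquet file from S3 into a DataFrame."""
    # Imported here so the helper above stays usable without pandas installed
    import pandas as pd

    return pd.read_parquet(
        path,
        columns=columns,  # optionally load only a subset of columns
        storage_options=storage_options_from_env(env),
    )


# Usage (placeholder path; needs pandas, s3fs, a Parquet engine, and credentials):
# df = read_parquet_from_s3("s3://your_bucket/path/to/your/parquet/file.parquet")
```

Because Parquet is columnar, passing `columns` can reduce the amount of data transferred when only a few fields are needed.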

After completing these steps, you should have successfully read your Parquet file from S3, and the DataFrame df now holds your S3 data in tabular form.

In this article, we saw how to access and read Parquet files from Amazon S3 using pandas for data processing and s3fs for seamless S3 connectivity. These tools can streamline your data processing and let you focus on extracting insights, for example understanding the latest trends in the fashion world. From exploring different outfit combinations to analyzing the history and evolution of clothing trends, pandas makes it easier to uncover hidden gems in your data.
