Solved: filtering pandas rows by fuzzy values

In data analysis, it is common to work with large datasets that need cleaning and processing. One problem that comes up often is filtering rows based on fuzzy (approximate) values, especially with text data. Pandas, a popular Python library for data manipulation, combines well with a string-matching library to handle this. In this article, we will look at how to use Pandas to filter rows by fuzzy values, walk through the code step by step, and discuss the libraries and functions that can help with similar problems.

To tackle this problem, we will use the Pandas library together with the fuzzywuzzy library, which calculates the similarity between strings. The fuzzywuzzy library is built on Levenshtein distance, a measure based on the number of edits (insertions, deletions, or substitutions) needed to turn one string into another.
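To make the idea of edit distance concrete, here is a minimal sketch of the classic dynamic-programming way to compute Levenshtein distance. The function name `levenshtein` is chosen for illustration only, and this is not fuzzywuzzy's own implementation; it simply counts the edits described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are more alike; libraries such as fuzzywuzzy turn this kind of measure into a score from 0 to 100.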

Installing and Importing the Required Libraries

First, we need to install and import the required libraries. You can use pip to install both Pandas and fuzzywuzzy:

pip install pandas
pip install fuzzywuzzy

Once they are installed, import the libraries in your Python code:

import pandas as pd
from fuzzywuzzy import fuzz, process

Filtering Rows Based on Fuzzy Values

Now that we have imported the required libraries, let's create a fictional dataset and show how to filter rows based on fuzzy values. In this example, our dataset contains garment names and their corresponding styles.

data = {'Garment': ['T-shirt', 'Polo shirt', 'Jeans', 'Leather jacket', 'Winter coat'],
        'Style': ['Casual', 'Casual', 'Casual', 'Biker', 'Winter']}
df = pd.DataFrame(data)

Suppose we want to keep only the rows whose garment names are similar to "Tee shirt". To do this, we will use the fuzzywuzzy library.

search_string = "Tee shirt"
threshold = 70

def filter_rows(df, column, search_string, threshold):
    return df[df[column].apply(lambda x: fuzz.token_sort_ratio(x, search_string)) >= threshold]

filtered_df = filter_rows(df, 'Garment', search_string, threshold)

In the code above, we define a function filter_rows that takes four parameters: the DataFrame, the column name, the search string, and the similarity threshold. It returns the DataFrame filtered to the rows whose score meets or exceeds the threshold, where the score is computed with the fuzz.token_sort_ratio function from the fuzzywuzzy library.
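To give a feel for what a token-sort comparison does, the sketch below approximates the idea using only the standard library: lowercase both strings, replace punctuation with spaces, sort the words, and compare the results. The name `approx_token_sort_ratio` is hypothetical, and the scores come from `difflib.SequenceMatcher`, so they may differ from what `fuzz.token_sort_ratio` returns.

```python
import re
from difflib import SequenceMatcher


def approx_token_sort_ratio(a: str, b: str) -> int:
    """Rough 0-100 similarity that ignores case, punctuation, and word order.

    Illustration only: an approximation of the token-sort idea,
    not fuzzywuzzy's implementation.
    """
    def normalize(text: str) -> str:
        # Lowercase, turn non-alphanumeric runs into spaces, then sort the words.
        tokens = re.sub(r"[^0-9a-z]+", " ", text.lower()).split()
        return " ".join(sorted(tokens))

    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


# Word order does not matter once the tokens are sorted.
print(approx_token_sort_ratio("shirt polo", "Polo shirt"))
print(approx_token_sort_ratio("T-shirt", "Tee shirt"))
print(approx_token_sort_ratio("Jeans", "Tee shirt"))
```

Sorting the tokens first is what makes this style of comparison tolerant of reordered words, which plain character-by-character comparison would penalize.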

Understanding the Code Step by Step

  • First, we create a DataFrame called df that holds our dataset.
  • Next, we define our search string as "Tee shirt" and set a similarity threshold of 70. You can adjust the threshold to the level of similarity you want: higher values demand closer matches.
  • We then create a function called filter_rows, which scores each row's value in the given column against the search string (a similarity score derived from Levenshtein distance) and keeps the rows that score at or above the threshold.
  • Finally, we call the filter_rows function to obtain our filtered DataFrame, filtered_df.
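When choosing a threshold, it can help to look at the scores themselves before filtering. The sketch below adds a score column and sorts by it. The helper names `simple_ratio` and `with_scores` are hypothetical, and the default scorer is a standard-library stand-in based on `difflib.SequenceMatcher`; passing `fuzz.token_sort_ratio` as the `scorer` argument should work the same way, assuming fuzzywuzzy is installed.

```python
from difflib import SequenceMatcher

import pandas as pd


def simple_ratio(a: str, b: str) -> int:
    """Stand-in scorer (0-100), case-insensitive; not fuzzywuzzy's scorer."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def with_scores(df, column, search_string, scorer=simple_ratio):
    """Return a copy of df with a 'score' column, best matches first."""
    scored = df.copy()  # leave the caller's DataFrame untouched
    scored["score"] = scored[column].apply(
        lambda value: scorer(value, search_string)
    )
    return scored.sort_values("score", ascending=False)


data = {'Garment': ['T-shirt', 'Polo shirt', 'Jeans', 'Leather jacket', 'Winter coat'],
        'Style': ['Casual', 'Casual', 'Casual', 'Biker', 'Winter']}
df = pd.DataFrame(data)

ranked = with_scores(df, 'Garment', 'Tee shirt')
print(ranked)
```

Seeing where the scores cluster makes it easier to pick a cutoff that separates real matches from near misses, rather than guessing a number up front.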

In conclusion, Pandas combined with the fuzzywuzzy library is an effective tool for filtering rows based on fuzzy values. Understanding these libraries and their functions lets us manipulate data with confidence and handle more complex data-processing tasks.
