Home > Blockchain >  How Filter dataframe with dictionary in pandas?
How Filter dataframe with dictionary in pandas?

Time:11-15

My sample DataFrame is:

file Voucher_stand
90.txt SCA
90.txt SCB
60.txt WFA
60.txt WFO
90.txt SCA
50.txt SCA
80.txt SCA
100.txt SCA

I want to define a dictionary based on which I can select some columns, but each file has its own code in the Voucher_stand column, how can I filter this DataFrame by dictionary based on Voucher_stand and file instead of using the list every time?

For example, define Voucher_Stand and files in a dictionary and then filter my DataFrame based on this dictionary:

code = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}

CodePudding user response:

Try this:

pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True)

You can get it in a dict form by calling .to_dict():

pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True).to_dict()

CodePudding user response:

You could iterate over the dictionary using items and filter the desired rows using equal comparison for the key and isin for the list of Voucher_stand codes. Then, get the row's index as list and save it into a variable. Use iloc to select only those rows that match the criteria.

import pandas as pd

df = pd.read_csv('sample.csv', sep='\s ')
print(df)

CODE = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}

filtered_rows = []
for k,v in CODE.items():
    filtered_rows  = df[(df['file'] == k) & df['Voucher_stand'].isin(v)].index.to_list()

result = df.iloc[filtered_rows]
print(result)

Output from result

     file Voucher_stand
0  90.txt           SCA
1  90.txt           SCB
4  90.txt           SCA
2  60.txt           WFA
3  60.txt           WFO
  • Related