My sample DataFrame is:
file | Voucher_stand |
---|---|
90.txt | SCA |
90.txt | SCB |
60.txt | WFA |
60.txt | WFO |
90.txt | SCA |
50.txt | SCA |
80.txt | SCA |
100.txt | SCA |
I want to define a dictionary based on which I can select some columns, but each file has its own code in the Voucher_stand column, how can I filter this DataFrame by dictionary based on Voucher_stand and file instead of using the list every time?
For example, define Voucher_Stand and files in a dictionary and then filter my DataFrame based on this dictionary:
code = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}
CodePudding user response:
Try this:
pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True)
You can get it in a dict
form by calling .to_dict()
:
pd.DataFrame(df.set_index('file').groupby('file')['Voucher_stand'].apply(list)).T.reset_index(drop=True).to_dict()
CodePudding user response:
You could iterate over the dictionary using items
and filter the desired rows using equal comparison for the key and isin
for the list of Voucher_stand
codes. Then, get the row's index as list and save it into a variable. Use iloc
to select only those rows that match the criteria.
import pandas as pd
df = pd.read_csv('sample.csv', sep='\s ')
print(df)
CODE = {'90.txt':['SCA', 'SCB'], '60.txt':['WFA', 'WFO']}
filtered_rows = []
for k,v in CODE.items():
filtered_rows = df[(df['file'] == k) & df['Voucher_stand'].isin(v)].index.to_list()
result = df.iloc[filtered_rows]
print(result)
Output from result
file Voucher_stand
0 90.txt SCA
1 90.txt SCB
4 90.txt SCA
2 60.txt WFA
3 60.txt WFO