Python: Filter a DataFrame with Dynamic Arguments

Time:08-01

Hi, I want to filter a DataFrame dynamically, using conditions passed in as command-line arguments.

This is my idea so far:

import sys

import pandas as pd

tr = pd.read_csv("sales.csv")

def filtr(*arg2):
    fltr = tr.loc[(tr[arg2[0]] arg2[1] arg2[2]) arg2[3] ....]
    print(fltr)
    
filtr(*sys.argv[1:])

## python test.py "Unit Cost" "==" 4 "&" .......

I had the idea of treating (tr[arg2[0]] arg2[1] arg2[2]) as a repeatable unit and iterating over it, but I don't know how.

Edit: data example:

{'Region': {0: 'Sub-Saharan Africa', 1: 'Europe', 2: 'Middle East and North Africa', 3: 'Sub-Saharan Africa', 4: 'Europe', 5: 'Sub-Saharan Africa', 6: 'Asia', 7: 'Asia', 8: 'Sub-Saharan Africa', 9: 'Central America and the Caribbean', 10: 'Sub-Saharan Africa', 11: 'Europe', 12: 'Europe', 13: 'Asia', 14: 'Middle East and North Africa', 15: 'Australia and Oceania', 16: 'Central America and the Caribbean', 17: 'Europe', 18: 'Middle East and North Africa', 19: 'Europe'}, 'Country': {0: 'Chad', 1: 'Latvia', 2: 'Pakistan', 3: 'Democratic Republic of the Congo', 4: 'Czech Republic', 5: 'South Africa', 6: 'Laos', 7: 'China', 8: 'Eritrea', 9: 'Haiti', 10: 'Zambia', 11: 'Bosnia and Herzegovina', 12: 'Germany', 13: 'India', 14: 'Algeria', 15: 'Palau', 16: 'Cuba', 17: 'Vatican City', 18: 'Lebanon', 19: 'Lithuania'}, 'Item Type': {0: 'Office Supplies', 1: 'Beverages', 2: 'Vegetables', 3: 'Household', 4: 'Beverages', 5: 'Beverages', 6: 'Vegetables', 7: 'Baby Food', 8: 'Meat', 9: 'Office Supplies', 10: 'Cereal', 11: 'Baby Food', 12: 'Office Supplies', 13: 'Household', 14: 'Clothes', 15: 'Snacks', 16: 'Beverages', 17: 'Beverages', 18: 'Personal Care', 19: 'Snacks'}, 'Sales Channel': {0: 'Online', 1: 'Online', 2: 'Offline', 3: 'Online', 4: 'Online', 5: 'Offline', 6: 'Online', 7: 'Online', 8: 'Online', 9: 'Online', 10: 'Offline', 11: 'Offline', 12: 'Online', 13: 'Online', 14: 'Offline', 15: 'Offline', 16: 'Online', 17: 'Online', 18: 'Offline', 19: 'Offline'}, 'Order Priority': {0: 'L', 1: 'C', 2: 'C', 3: 'C', 4: 'C', 5: 'H', 6: 'L', 7: 'C', 8: 'L', 9: 'C', 10: 'M', 11: 'M', 12: 'C', 13: 'C', 14: 'C', 15: 'L', 16: 'H', 17: 'L', 18: 'H', 19: 'H'}, 'Order Date': {0: '1/27/2011', 1: '12/28/2015', 2: '1/13/2011', 3: '9/11/2012', 4: '10/27/2015', 5: '7/10/2012', 6: '2/20/2011', 7: '4/10/2017', 8: '11/21/2014', 9: '7/4/2015', 10: '7/26/2016', 11: '10/20/2012', 12: '2/22/2015', 13: '8/27/2016', 14: '6/21/2011', 15: '9/19/2013', 16: '11/15/2015', 17: '4/6/2015', 18: '4/12/2010', 19: 
'9/26/2011'}, 'Order ID': {0: 292494523, 1: 361825549, 2: 141515767, 3: 500364005, 4: 127481591, 5: 482292354, 6: 844532620, 7: 564251220, 8: 411809480, 9: 327881228, 10: 773452794, 11: 479823005, 12: 498603188, 13: 151717174, 14: 181401288, 15: 500204360, 16: 640987718, 17: 206925189, 18: 221503102, 19: 878520286}, 'Ship Date': {0: '2/12/2011', 1: '1/23/2016', 2: '2/1/2011', 3: '10/6/2012', 4: '12/5/2015', 5: '8/21/2012', 6: '3/20/2011', 7: '5/12/2017', 8: '1/10/2015', 9: '7/20/2015', 10: '8/24/2016', 11: '11/15/2012', 12: '2/27/2015', 13: '9/2/2016', 14: '7/21/2011', 15: '10/4/2013', 16: '11/30/2015', 17: '4/27/2015', 18: '5/19/2010', 19: '10/2/2011'}, 'Units Sold': {0: 4484, 1: 1075, 2: 6515, 3: 7683, 4: 3491, 5: 9880, 6: 4825, 7: 3330, 8: 2431, 9: 6197, 10: 724, 11: 9145, 12: 6618, 13: 5338, 14: 9527, 15: 441, 16: 1365, 17: 2617, 18: 6545, 19: 2530}, 'Unit Price': {0: 651.21, 1: 47.45, 2: 154.06, 3: 668.27, 4: 47.45, 5: 47.45, 6: 154.06, 7: 255.28, 8: 421.89, 9: 651.21, 10: 205.7, 11: 255.28, 12: 651.21, 13: 668.27, 14: 109.28, 15: 152.58, 16: 47.45, 17: 47.45, 18: 81.73, 19: 152.58}, 'Unit Cost': {0: 524.96, 1: 31.79, 2: 90.93, 3: 502.54, 4: 31.79, 5: 31.79, 6: 90.93, 7: 159.42, 8: 364.69, 9: 524.96, 10: 117.11, 11: 159.42, 12: 524.96, 13: 502.54, 14: 35.84, 15: 97.44, 16: 31.79, 17: 31.79, 18: 56.67, 19: 97.44}, 'Total Revenue': {0: 2920025.64, 1: 51008.75, 2: 1003700.9, 3: 5134318.41, 4: 165647.95, 5: 468806.0, 6: 743339.5, 7: 850082.4, 8: 1025614.59, 9: 4035548.37, 10: 148926.8, 11: 2334535.6, 12: 4309707.78, 13: 3567225.26, 14: 1041110.56, 15: 67287.78, 16: 64769.25, 17: 124176.65, 18: 534922.85, 19: 386027.4}, 'Total Cost': {0: 2353920.64, 1: 34174.25, 2: 592408.95, 3: 3861014.82, 4: 110978.89, 5: 314085.2, 6: 438737.25, 7: 530868.6, 8: 886561.39, 9: 3253177.12, 10: 84787.64, 11: 1457895.9, 12: 3474185.28, 13: 2682558.52, 14: 341447.68, 15: 42971.04, 16: 43393.35, 17: 83194.43, 18: 370905.15, 19: 246523.2}, 'Total Profit': {0: 566105.0, 1: 16834.5, 2: 
411291.95, 3: 1273303.59, 4: 54669.06, 5: 154720.8, 6: 304602.25, 7: 319213.8, 8: 139053.2, 9: 782371.25, 10: 64139.16, 11: 876639.7, 12: 835522.5, 13: 884666.74, 14: 699662.88, 15: 24316.74, 16: 21375.9, 17: 40982.22, 18: 164017.7, 19: 139504.2}}

CodePudding user response:

Just use eval(). Here is the code:

import pandas as pd

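def filter_df(df, args_list):
    # Build one "(df['col'] <op> value)" string per [column, operator, value] entry.
    constraints = []
    for col, symbol, value in args_list:
        # Index with df['col'] rather than df.col so that column names
        # containing spaces (e.g. "Unit Cost") also work.
        constraint = "(df[{!r}] {} {})".format(col, symbol, value)
        constraints.append(constraint)

    # Combine all conditions with a logical AND.
    filter_str = " & ".join(constraints)

    # eval() runs the string as Python code, so only pass trusted input.
    return df[eval(filter_str)]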

data = {
    "COL_A": [1,2,3,2,4,6],
    "COL_B": [1,10,100,20,20,40],
    "COL_C": ["aaa", "bbb", "zzz", "xxx", "xxx", "xxx"]
}
df = pd.DataFrame(data)

args_list = [["COL_A", "<=", "4"], ["COL_C", "==", "'xxx'"]]

df2 = filter_df(df, args_list)

This is df:

   COL_A  COL_B COL_C
0      1      1   aaa
1      2     10   bbb
2      3    100   zzz
3      2     20   xxx
4      4     20   xxx
5      6     40   xxx

After filtering with COL_A <= 4 & COL_C == 'xxx', this is df2:

   COL_A  COL_B COL_C
3      2     20   xxx
4      4     20   xxx
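
Because eval() executes whatever string it is given, it is risky when the conditions come from user input. Here is a minimal sketch of an alternative that avoids eval() by looking each operator symbol up in a table built from the standard operator module. The names OPS and filter_df_ops are made up for this illustration, and it assumes values are passed as real Python objects (4, "xxx") rather than as strings of Python source:

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical lookup table: only these operator symbols are accepted.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def filter_df_ops(df, conditions):
    """Keep the rows that satisfy every (column, symbol, value) triple."""
    # Each comparison yields a boolean Series aligned with df's index.
    masks = [OPS[symbol](df[col], value) for col, symbol, value in conditions]
    if not masks:
        return df
    # AND all the masks together, then use the result as a row selector.
    return df[reduce(operator.and_, masks)]


df = pd.DataFrame({
    "COL_A": [1, 2, 3, 2, 4, 6],
    "COL_B": [1, 10, 100, 20, 20, 40],
    "COL_C": ["aaa", "bbb", "zzz", "xxx", "xxx", "xxx"],
})

df2 = filter_df_ops(df, [("COL_A", "<=", 4), ("COL_C", "==", "xxx")])
print(df2)
```

An unknown symbol raises a KeyError instead of being executed, and column names with spaces work because columns are always accessed as df[col].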

CodePudding user response:

How about this?

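def filter_greater_than(df, conditions):
    # conditions maps column name -> threshold.
    # Keep only the rows that exceed every threshold.
    for column, threshold in conditions.items():
        df = df[df[column] > threshold]

    return df


Invoke it with:

df = filter_greater_than(df, {"Unit Cost": 500, "Unit Price": 500})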

Result:

print(df.shape)
(5,14)

Note: This approach only works when every condition is a > comparison. If you need to mix several operators, you will need a more general approach.
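
One possible way to support several operators, sketched under assumptions: the conditions arrive as a flat list of tokens (for example sys.argv[1:]) in the order column, operator, value, connector, column, and so on. The tokens are checked against a whitelist and turned into a string for DataFrame.query, which accepts backtick-quoted column names such as `Unit Cost`. The names build_query, ALLOWED_SYMBOLS and CONNECTORS are hypothetical helpers, not part of pandas:

```python
import math

import pandas as pd

# Hypothetical whitelists of accepted comparison symbols and connectors.
ALLOWED_SYMBOLS = {"==", "!=", "<", "<=", ">", ">="}
CONNECTORS = {"&": "and", "|": "or"}


def build_query(columns, tokens):
    """Turn flat tokens (col, op, value, connector, col, ...) into a query string."""
    columns = set(columns)
    parts = []
    i = 0
    while True:
        triple = tokens[i:i + 3]
        if len(triple) != 3:
            raise ValueError(f"incomplete condition: {triple}")
        col, symbol, raw = triple
        if col not in columns:
            raise ValueError(f"unknown column: {col!r}")
        if symbol not in ALLOWED_SYMBOLS:
            raise ValueError(f"unsupported operator: {symbol!r}")
        # Command-line values arrive as strings: use a number when the text
        # parses as a finite float, otherwise a quoted string literal.
        try:
            number = float(raw)
            literal = repr(number) if math.isfinite(number) else repr(raw)
        except ValueError:
            literal = repr(raw)
        parts.append(f"(`{col}` {symbol} {literal})")
        i += 3
        if i == len(tokens):
            break
        if tokens[i] not in CONNECTORS:
            raise ValueError(f"unsupported connector: {tokens[i]!r}")
        parts.append(CONNECTORS[tokens[i]])
        i += 1
    return " ".join(parts)


# Small stand-in for the sales data; tokens stands in for sys.argv[1:].
df = pd.DataFrame({
    "Unit Cost": [524.96, 31.79, 502.54],
    "Unit Price": [651.21, 47.45, 668.27],
    "Region": ["Asia", "Europe", "Europe"],
})
tokens = ["Unit Cost", ">", "500", "&", "Region", "==", "Europe"]

query = build_query(df.columns, tokens)
result = df.query(query)
print(query)
print(result)
```

On a real command line the operators and connectors need quoting so the shell does not interpret them, e.g. python test.py "Unit Cost" ">" 500 "&" Region "==" Europe. When "&" and "|" are mixed, the resulting "and" binds more tightly than "or", as in ordinary Python.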
