Filtering dataframe in a loop with use of config file values

Time:07-21

I have the following toy dataset

import pandas as pd

data = {"Subject":["1","2","3","3","4","5","5"],
    "date": ["2020-05-01 16:54:25","2020-05-03 10:31:18","2020-05-08 10:10:40","2020-05-08 10:10:42","2020-05-06 09:30:40","2020-05-07 12:46:30","2020-05-07 12:55:10"],
    "Accept": ["True","False","True","True","False","True","True"],
    "Amount" : [150,30,32,32,300,100,50],
    "accept_1": ["True","False","True","True","False","True","True"],
    "amount_1" : [20,30,32,32,150,100,30],
    "Transaction":["True","True","False","False","True","True","False"],
    "Label":["True","True","True","False","True","True","True"]}
data = pd.DataFrame(data)

and a small toy config file

config = [{"colname": "Accept","KeepValue":"True","RemoveTrues":"True"},
    {"colname":"Transaction","KeepValue":"False","RemoveTrues":"False"}]

I want to loop through the config and apply these filters to the dataset one after another: once the first filter has been applied, the next filter should be applied to the already filtered data, and so on.

When I run the following code, the first filter is applied as expected, but the second filter is then applied to the original data rather than to the filtered result.

for i in range(len(config)):
    filtering = config[i]
    if filtering["RemoveTrues"] == "True":
        col = filtering["colname"]
        test  = data[data[col] == filtering["KeepValue"]]
        print(test)
    else:
        col = filtering["colname"]
        test = data[(data[col]== filtering["KeepValue"]) | data["Label"]]
        print(test)

How can I apply the first filter to the data, then the second filter to the filtered data, and so on? I need to use a loop because the filters come from the configuration file.

CodePudding user response:

I'd suggest changing your True/False strings to booleans. To make each filter build on the previous one, assign the filtered result back to df inside the loop instead of creating an extra test variable; the new value of df then persists into the next iteration.

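df = pd.DataFrame(data)

config = [{"colname": "Accept","KeepValue":"True","RemoveTrues":"True"},
    {"colname":"Transaction","KeepValue":"False","RemoveTrues":"False"}]

for conf in config:
    if conf["RemoveTrues"] == "True":
        # keep only the rows whose value matches KeepValue
        df = df[df[conf['colname']] == conf['KeepValue']]
        print(df)
    else:
        # also keep rows whose Label is "True"; Label still holds strings here,
        # so compare explicitly (the string "False" is non-empty and thus truthy)
        df = df[(df[conf['colname']] == conf["KeepValue"]) | (df["Label"] == "True")]
        print(df)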
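
The suggestion to switch from strings to booleans could look roughly like the sketch below. It assumes that the flag columns contain only the strings "True" and "False" and that the config values can be parsed the same way; the names flag_cols, keep_value and remove_trues are made up for illustration, and the toy data is reduced to the columns the filters touch.

```python
import pandas as pd

# Reduced copy of the toy data; the flag columns start out as strings.
df = pd.DataFrame({
    "Subject": ["1", "2", "3", "3", "4", "5", "5"],
    "Accept": ["True", "False", "True", "True", "False", "True", "True"],
    "Transaction": ["True", "True", "False", "False", "True", "True", "False"],
    "Label": ["True", "True", "True", "False", "True", "True", "True"],
})

config = [
    {"colname": "Accept", "KeepValue": "True", "RemoveTrues": "True"},
    {"colname": "Transaction", "KeepValue": "False", "RemoveTrues": "False"},
]

# Assumption: these columns contain only the strings "True" and "False".
flag_cols = ["Accept", "Transaction", "Label"]  # hypothetical helper list
for col in flag_cols:
    df[col] = df[col] == "True"  # elementwise comparison yields a bool column

for conf in config:
    # Parse the config strings into real booleans as well.
    keep_value = conf["KeepValue"] == "True"
    remove_trues = conf["RemoveTrues"] == "True"

    mask = df[conf["colname"]] == keep_value
    if not remove_trues:
        # Label is now boolean, so it can be OR-ed into the mask directly.
        mask = mask | df["Label"]
    df = df[mask]  # reassign so the next filter sees the filtered rows

print(df)
```

With real booleans the Label column can be used as a mask without any string comparison, which is the main benefit of the conversion.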

CodePudding user response:

From what I understand, you want each filter to be applied to the result of the previous one. In your code, every iteration filters "data", the original dataframe, and stores the result in "test", so the next iteration starts from the original data again. Instead, create "test" once before the loop as a copy of the original, then filter "test" and assign the result back to "test" in each iteration so that the next iteration sees the filtered rows.

test = data.copy() # copy the original dataframe once; the loop then works on this copy
for i in range(len(config)):
    filtering = config[i]
    if filtering["RemoveTrues"] == "True":
        col = filtering["colname"]
        test = test[test[col] == filtering["KeepValue"]] # filter "test" and store the result back in "test" for the next iteration
        print(test)
    else:
        col = filtering["colname"]
        # Label holds strings, so compare with "True" explicitly rather than using the column as a mask
        test = test[(test[col] == filtering["KeepValue"]) | (test["Label"] == "True")] # same pattern: filter "test", keep the result in "test"
        print(test)
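
If the filters are needed in more than one place, the same loop can be wrapped in a small function. This is only a sketch: apply_filters is a hypothetical name, the toy data is reduced to the relevant columns, and the strings are kept exactly as in the question. It also shows the difference the chaining makes on this data: applied to the original frame, the second filter alone keeps all 7 rows, while the chained version keeps 5.

```python
import pandas as pd


def apply_filters(frame, config):
    """Apply each config entry in order, each one to the previous result."""
    result = frame  # boolean indexing returns new frames, so the input is not modified
    for conf in config:
        mask = result[conf["colname"]] == conf["KeepValue"]
        if conf["RemoveTrues"] != "True":
            # Label holds strings, so compare explicitly.
            mask = mask | (result["Label"] == "True")
        result = result[mask]
    return result


data = pd.DataFrame({
    "Subject": ["1", "2", "3", "3", "4", "5", "5"],
    "Accept": ["True", "False", "True", "True", "False", "True", "True"],
    "Transaction": ["True", "True", "False", "False", "True", "True", "False"],
    "Label": ["True", "True", "True", "False", "True", "True", "True"],
})

config = [
    {"colname": "Accept", "KeepValue": "True", "RemoveTrues": "True"},
    {"colname": "Transaction", "KeepValue": "False", "RemoveTrues": "False"},
]

filtered = apply_filters(data, config)               # both filters, chained
second_only = apply_filters(data, config[1:])        # second filter on the original data
print(len(data), len(second_only), len(filtered))    # prints: 7 7 5
```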