Filter out values that are in a list

Time:10-20

I have a pandas dataframe that looks like this, but with more columns. The values next to each "Department" are stored in lists, and the lists are aligned by position: in the first row, for example, Jenny is located in New York and is 30 years old:

Index  Department  Team                          Location                                Age
0      Accounting  [Jenny, Juliet, John, Mark]   [New York, Madrid, Los Angeles, Paris]  [30,32,33,21]
1      Production  [Romeo, Michael, Lara]        [New York, Glasgow, London]             [32,26,42]
2      Management  [Marco, Patrick, Will, Lisa]  [Barcelona, Delhi, Paris, Jakarta]      [32,54,21,42]
3      Compliance  [Claire, Franco, Maria]       [Barcelona, Rom, Madrid]                [23,42,21]

For example, I would like to filter out Romeo, that is, drop the entries Romeo, New York and 32. How can I do this with pandas?

Edit: To clarify, I would like to keep the initial pandas dataframe, since I need to use it further. The end result should look just like the dataframe above, only without Romeo, New York and 32 in the second row. Would it be possible to filter these out if only the Department "Production" and the name "Romeo" are given?

CodePudding user response:

Given the edit, in which OP asks to keep the initial dataframe but filter out the rows that satisfy all of those specific conditions, there are various options.
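To try the options below, the dataframe from the question can be rebuilt roughly as follows. This is only a sketch: the values are copied from the question, and the variable name df is chosen to match the code in this answer.

```python
import pandas as pd

# Sample data copied from the question; each list column is aligned by position.
df = pd.DataFrame({
    "Department": ["Accounting", "Production", "Management", "Compliance"],
    "Team": [
        ["Jenny", "Juliet", "John", "Mark"],
        ["Romeo", "Michael", "Lara"],
        ["Marco", "Patrick", "Will", "Lisa"],
        ["Claire", "Franco", "Maria"],
    ],
    "Location": [
        ["New York", "Madrid", "Los Angeles", "Paris"],
        ["New York", "Glasgow", "London"],
        ["Barcelona", "Delhi", "Paris", "Jakarta"],
        ["Barcelona", "Rom", "Madrid"],
    ],
    "Age": [[30, 32, 33, 21], [32, 26, 42], [32, 54, 21, 42], [23, 42, 21]],
})
```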

Option 1

Using a list comprehension

df_new = df[[not ('Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age']) for x in df.to_dict('records')]]

[Out]:

   Department  ...               Age
0  Accounting  ...  [30, 32, 33, 21]
2  Management  ...  [32, 54, 21, 42]
3  Compliance  ...      [23, 42, 21]

Option 2

Another option is to use pandas.DataFrame.apply with a custom lambda function, as follows

df_new = df[~df.apply(lambda x: 'Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age'], axis=1)]

[Out]:

   Department  ...               Age
0  Accounting  ...  [30, 32, 33, 21]
2  Management  ...  [32, 54, 21, 42]
3  Compliance  ...      [23, 42, 21]
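One caveat: Options 1 and 2 test each list independently, so a row also matches when 'Romeo', 'New York' and 32 belong to different people in that row. If the three values should belong to the same person, a position-aligned check with zip is one possibility. The sketch below uses a hypothetical helper has_member and a hypothetical extra row ("Sales") that is not in the question, only to show the difference.

```python
import pandas as pd

# Row 0 is from the question; row 1 ("Sales") is a hypothetical row added
# only to show where the two checks disagree.
df = pd.DataFrame({
    "Department": ["Production", "Sales"],
    "Team": [["Romeo", "Michael", "Lara"], ["Romeo", "Anna"]],
    "Location": [["New York", "Glasgow", "London"], ["Madrid", "New York"]],
    "Age": [[32, 26, 42], [40, 32]],
})

def has_member(row, name, location, age):
    """Return True if a single position in the row's lists matches all three values."""
    return any(
        (t, l, a) == (name, location, age)
        for t, l, a in zip(row["Team"], row["Location"], row["Age"])
    )

# Independent membership checks, as in Option 2: both rows match.
loose = df.apply(lambda x: 'Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age'], axis=1)

# Position-aligned check: only the Production row matches.
aligned = df.apply(has_member, axis=1, args=("Romeo", "New York", 32))

df_new = df[~aligned]
```

Here the Sales row is kept, because its Romeo is in Madrid and aged 40; the independent checks would have dropped it as well.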

Notes:

  • There are some limitations on using .apply(): with axis=1 it calls the Python function once per row, so it is not vectorized and can be slow on large dataframes.

  • Depending on the use case, one can chain .reset_index(drop=True) at the end, to end up with the following

       Department  ...               Age
    0  Accounting  ...  [30, 32, 33, 21]
    1  Management  ...  [32, 54, 21, 42]
    2  Compliance  ...      [23, 42, 21]
    
  • If, on the other hand, one wants to keep a dataframe of the filtered rows (the rows that satisfy those specific requirements), the logic is not that different.

    • Considering the method used above for Option 1, it would be as follows

       df_new = df[['Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age'] for x in df.to_dict('records')]]
      
       [Out]:
      
          Department  ...           Age
       1  Production  ...  [32, 26, 42]
      
    • Considering the method used above for Option 2, it would be as follows

       df_new = df[df.apply(lambda x: 'Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age'], axis=1)]
      
       [Out]:
      
          Department  ...           Age
       1  Production  ...  [32, 26, 42]
      
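Both options above drop the entire Production row. The edit in the question asks instead for the row to stay, with only Romeo's entries removed from the lists, given just the Department and the name. A minimal sketch of that, assuming the list columns are aligned by position; the helper drop_member and its parameters are hypothetical names, not part of pandas:

```python
import pandas as pd

# Two rows from the question are enough to show the idea.
df = pd.DataFrame({
    "Department": ["Accounting", "Production"],
    "Team": [["Jenny", "Juliet", "John", "Mark"], ["Romeo", "Michael", "Lara"]],
    "Location": [["New York", "Madrid", "Los Angeles", "Paris"], ["New York", "Glasgow", "London"]],
    "Age": [[30, 32, 33, 21], [32, 26, 42]],
})

def drop_member(df, department, name, list_cols=("Team", "Location", "Age")):
    """Return a copy of df with `name` and its aligned entries removed from `department`."""
    new_cols = {col: [] for col in list_cols}
    for _, row in df.iterrows():
        in_department = row["Department"] == department
        # Positions to keep: everything except `name` in the chosen department.
        keep = [i for i, member in enumerate(row["Team"])
                if not (in_department and member == name)]
        for col in list_cols:
            new_cols[col].append([row[col][i] for i in keep])
    out = df.copy()
    for col in list_cols:
        # Fresh lists are assigned, so the lists in the original df are untouched.
        out[col] = pd.Series(new_cols[col], index=df.index)
    return out

df_new = drop_member(df, "Production", "Romeo")
```

After this, the Production row of df_new holds [Michael, Lara], [Glasgow, London] and [26, 42], while df itself is unchanged.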