I have a pandas dataframe that looks like this, but with more columns. The values in the columns beside "Department" are lists, and the lists are aligned by position: for example, in the first row Jenny is located in New York and is 30 years old:
Index | Department | Team | Location | Age |
---|---|---|---|---|
0 | Accounting | [Jenny, Juliet, John, Mark] | [New York, Madrid, Los Angeles, Paris] | [30,32,33,21] |
1 | Production | [Romeo, Michael, Lara] | [New York, Glasgow, London] | [32,26,42] |
2 | Management | [Marco, Patrick, Will, Lisa] | [Barcelona, Delhi, Paris, Jakarta] | [32,54,21,42] |
3 | Compliance | [Claire, Franco, Maria] | [Barcelona, Rom, Madrid] | [23,42,21] |
For example, I would like to filter out the data for Romeo, i.e. drop Romeo, New York and 32. How can I do this with Pandas?
Edit: To clarify, I would like to keep the initial Pandas dataframe, since I have to use it further. The end result should look just like the dataframe above, only without Romeo, New York and 32 in the second row. Would it be possible to filter these out if only the Department "Production" and the name "Romeo" are given?
CodePudding user response:
Given the edit, OP wants to keep the initial dataframe but filter out the rows that satisfy all of those specific conditions. There are various options to do that.
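For anyone who wants to run the options in this answer, the example dataframe can be rebuilt roughly as follows. This is a minimal sketch that copies the values from the question's table; the question mentions further columns, which are not reproduced here.

```python
import pandas as pd

# Rebuild the example dataframe from the question. Each cell in Team,
# Location and Age holds a Python list, and the lists are aligned by position.
df = pd.DataFrame({
    "Department": ["Accounting", "Production", "Management", "Compliance"],
    "Team": [
        ["Jenny", "Juliet", "John", "Mark"],
        ["Romeo", "Michael", "Lara"],
        ["Marco", "Patrick", "Will", "Lisa"],
        ["Claire", "Franco", "Maria"],
    ],
    "Location": [
        ["New York", "Madrid", "Los Angeles", "Paris"],
        ["New York", "Glasgow", "London"],
        ["Barcelona", "Delhi", "Paris", "Jakarta"],
        ["Barcelona", "Rom", "Madrid"],
    ],
    "Age": [
        [30, 32, 33, 21],
        [32, 26, 42],
        [32, 54, 21, 42],
        [23, 42, 21],
    ],
})

print(df)
```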
Option 1
Using a list comprehension
df_new = df[[not ('Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age']) for x in df.to_dict('records')]]
[Out]:
Department ... Age
0 Accounting ... [30, 32, 33, 21]
2 Management ... [32, 54, 21, 42]
3 Compliance ... [23, 42, 21]
Option 2
Another option is to use pandas.DataFrame.apply with a custom lambda function, as follows
df_new = df[~df.apply(lambda x: 'Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age'], axis=1)]
[Out]:
Department ... Age
0 Accounting ... [30, 32, 33, 21]
2 Management ... [32, 54, 21, 42]
3 Compliance ... [23, 42, 21]
Notes:
There are some limitations on using .apply(). In particular, with axis=1 it calls the function once per row in Python, which can be slow on large dataframes.
Depending on the use case, one can chain .reset_index(drop=True) at the end, to end up with the following
Department ... Age
0 Accounting ... [30, 32, 33, 21]
1 Management ... [32, 54, 21, 42]
2 Compliance ... [23, 42, 21]
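As a quick check of the .reset_index(drop=True) note, the sketch below applies the Option 2 filter to a shortened copy of the example data and renumbers the index. The shortened dataframe is an assumption made for brevity; the original df is left unchanged because the filter returns a new object.

```python
import pandas as pd

# Shortened version of the question's data (assumption: three rows are
# enough to show the effect on the index).
df = pd.DataFrame({
    "Department": ["Accounting", "Production", "Management"],
    "Team": [["Jenny", "Juliet"], ["Romeo", "Michael", "Lara"], ["Marco", "Patrick"]],
    "Location": [["New York", "Madrid"], ["New York", "Glasgow", "London"], ["Barcelona", "Delhi"]],
    "Age": [[30, 32], [32, 26, 42], [32, 54]],
})

# Boolean mask: True for rows that satisfy all three conditions.
mask = df.apply(
    lambda x: "Romeo" in x["Team"] and "New York" in x["Location"] and 32 in x["Age"],
    axis=1,
)

# Drop the matching rows, then renumber the remaining rows from 0.
df_new = df[~mask].reset_index(drop=True)

print(df_new.index.tolist())  # [0, 1]
print(len(df))                # 3, the original dataframe still has all rows
```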
If, on the other hand, one wants to keep a dataframe of the filtered rows (the rows that satisfy those specific requirements), the logic is not that different.
Considering the method used above for Option 1, it would be as follows
df_new = df[['Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age'] for x in df.to_dict('records')]]
[Out]:
Department ... Age
1 Production ... [32, 26, 42]
Considering the method used above for Option 2, it would be as follows
df_new = df[df.apply(lambda x: 'Romeo' in x['Team'] and 'New York' in x['Location'] and 32 in x['Age'], axis=1)]
[Out]:
Department ... Age
1 Production ... [32, 26, 42]
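The options above keep or drop whole rows. If the goal is instead to keep the "Production" row and remove only Romeo's entries from its lists, given just the department and the name as asked in the edit, one possible approach is sketched below. The helper name drop_member and its list_cols parameter are hypothetical and not part of pandas; the sketch assumes the lists in each row are aligned by position, as described in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Accounting", "Production", "Management", "Compliance"],
    "Team": [["Jenny", "Juliet", "John", "Mark"], ["Romeo", "Michael", "Lara"],
             ["Marco", "Patrick", "Will", "Lisa"], ["Claire", "Franco", "Maria"]],
    "Location": [["New York", "Madrid", "Los Angeles", "Paris"], ["New York", "Glasgow", "London"],
                 ["Barcelona", "Delhi", "Paris", "Jakarta"], ["Barcelona", "Rom", "Madrid"]],
    "Age": [[30, 32, 33, 21], [32, 26, 42], [32, 54, 21, 42], [23, 42, 21]],
})


def drop_member(frame, department, name, list_cols=("Team", "Location", "Age")):
    """Hypothetical helper: return a copy of `frame` in which every entry
    belonging to `name` is removed from the list columns of `department`."""
    new_cols = {col: [] for col in list_cols}
    for _, row in frame.iterrows():
        # One keep-flag per team member; False only for the member to drop.
        # If the name occurs more than once in the department, all occurrences go.
        keep = [
            not (row["Department"] == department and member == name)
            for member in row["Team"]
        ]
        for col in list_cols:
            # Build new lists so the lists in the original frame are not mutated.
            new_cols[col].append([v for v, k in zip(row[col], keep) if k])

    out = frame.copy()
    for col in list_cols:
        out[col] = pd.Series(new_cols[col], index=frame.index)
    return out


result = drop_member(df, "Production", "Romeo")
print(result.loc[1, "Team"])  # ['Michael', 'Lara']
print(df.loc[1, "Team"])      # ['Romeo', 'Michael', 'Lara'], original is untouched
```

Building new lists rather than editing the existing ones in place matters here, because DataFrame.copy() does not copy the Python list objects stored inside the cells.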