I currently have a pandas Dataframe with 5 different instances
df = {
'Name': ['John', 'Mark', 'Kevin', 'Ron', 'Amira'
'ID': [110,111,112,113,114]
'Job title': ['xox','xoy','xoz','yow','uyt']
'Manager': ['River' 'Trevor', 'John', 'Lydia', 'Connor']
'M2': ['Shaun', 'Mary', 'Ronald', 'Cary', 'Miranda']
'M3': ['Clavis', 'Sharon', 'Randall', 'Mark', Doug']
'M4': ['Pat', 'Karen', 'Brad', 'Chad', 'Anita']
'M5': ['Ty', 'Jared', 'Bill', 'William', 'Bob']
'Location': ['US', 'US', 'JP', 'CN', 'JA']
}
list = ['River', 'Pat', 'Brad', 'William', 'Clogah']
I need to filter and drop all rows in the pandas dataframe that contain 0 values from my list and also those that contain more than one value from my list. In the case above the instances in row 1 and row 2 would be dropped because there's two of the names in the specific row within the list.
IN ROW 1 i.e. (1: 'John', 110, 'xox, 'River', 'Shaun', 'Clavis', 'Pat', 'Ty', 'US'): SEE BELOW -> IT WOULD BE DROPPED BECAUSE BOTH 'River' and 'Pat' are listed in the list
IN ROW 2 i.e. (2: 'Mark', 111, 'xoy, 'Trevor', 'Mary', 'Sharon', 'Karen', 'Jared', 'US'): SEE BELOW -> IT WOULD BE DROPPED BECAUSE BOTH 'Trevor' and 'Jared' are listed in the list
IN ROW 5 i.e. (5: 'Amira', 114, 'uyt', 'Connor', 'Miranda', 'Doug', 'Anita', 'Bob', 'JA'): SEE BELOW -> IT WOULD BE DROPPED BECAUSE the row does not contain any values from my list.
The two other instances would be kept.
Original Printed DF
0: 'Name', 'ID', 'Job title', 'Manager', 'M2', 'M3', 'M4', 'M5', 'Location'
1: 'John', 110, 'xox, 'River', 'Shaun', 'Clavis', 'Pat', 'Ty', 'US'
2: 'Mark', 111, 'xoy, 'Trevor', 'Mary', 'Sharon', 'Karen', 'Jared', 'US'
3: 'Kevin', 112, 'xoz, 'John', 'Ronald', 'Randall', 'Brad', 'Bill', 'JP
4: 'Ron', 113, 'yow', 'Lydia', 'Cary', 'Mark', 'Chad', 'William', 'CN'
5: 'Amira', 114, 'uyt', 'Connor', 'Miranda', 'Doug', 'Anita', 'Bob', 'JA'
Filtered Printed DF
3: 'Kevin', 112, 'xoz, 'John', 'Ronald', 'Randall', 'Brad', 'Bill', 'JP',
4: 'Ron', 113, 'yow', 'Lydia', 'Cary', 'Mark', 'Chad', 'William', 'CN',
The current process only filters out rows that don't contain a value equal to any value in my managers list. I want to keep rows with one manager from the list but not rows without mangers from the lis
CodePudding user response:
Not the prettiest way to achieve this, but this will work:
d = {
"Name": ["John", "Mark", "Kevin", "Ron", "Amira"],
"ID": [110, 111, 112, 113, 114],
"Job title": ["xox", "xoy", "xoz", "yow", "uyt"],
"M1": ["River", "Trevor", "John", "Lydia", "Connor"],
"M2": ["Shaun", "Mary", "Ronald", "Cary", "Miranda"],
"M3": ["Clavis", "Sharon", "Randall", "Mark", "Doug"],
"M4": ["Pat", "Karen", "Brad", "Chad", "Anita"],
"M5": ["Ty", "Jared", "Bill", "William", "Bob"],
"Location": ["US", "US", "JP", "CN", "JA"],
}
df = pd.DataFrame(d)
# Isolate managers in their own DataFrame
managers = ["River", "Pat", "Trevor", "Jared", "Connor"]
df_managers = df[["M1", "M2", "M3", "M4", "M5"]]
# Assess any one employee has less than two managers and isolate those employees
less_than_two_managers = []
for i in range(df_managers.shape[0]):
if len(set(df_managers.iloc[i]).intersection(set(managers))) < 2:
less_than_two_managers.append(True)
else:
less_than_two_managers.append(False)
df["LT two managers"] = less_than_two_managers
df[df["LT two managers"] == True]
CodePudding user response:
here you go:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Mark', 'Kevin', 'Ron', 'Amira'],
'ID': [110, 111, 112, 113, 114],
'Job title': ['xox', 'xoy', 'xoz', 'yow', 'uyt'],
'Manager': ['River', 'Trevor', 'John', 'Lydia', 'Connor'],
'M2': ['Shaun', 'Mary', 'Ronald', 'Cary', 'Miranda'],
'M3': ['Clavis', 'Sharon', 'Randall', 'Mark', 'Doug'],
'M4': ['Pat', 'Karen', 'Brad', 'Chad', 'Anita'],
'M5': ['Ty', 'Jared', 'Bill', 'William', 'Bob'],
'Location': ['US', 'US', 'JP', 'CN', 'JA']}
)
managers = ['River', 'Pat', 'Trevor', 'Jared', 'Connor']
mask = df.applymap(lambda x: x in managers)
filtered_df = df[mask.values.sum(axis=1) < 2]
print(filtered_df)
to filter also the 0 (so only 1 manager will stay):
filtered_df = df[mask.values.sum(axis=1) == 1]
CodePudding user response:
Vectorial solution using a mask:
m = (df.filter(regex=r'^M')
.apply(lambda s: s.isin(lst))
.sum(1).eq(1)
)
out = df.loc[m]
Output:
Name ID Job title Manager M2 M3 M4 M5 Location
2 Kevin 112 xoz John Ronald Randall Brad Bill JP
3 Ron 113 yow Lydia Cary Mark Chad William CN