I have a data frame like this:
df:
ID Names
3 [Ally, Ben, Cris]
5 [Bruno, Coleen, Flyn]
2 [Dave, Bob]
7 [Rob, Ally, Bob]
11 [Jill, Tom, Sal]
The Names column is a list of names. Some of them could be repeated.
I want to filter the data frame on Names columns where the names start with either A or B or D.
So my output should look like this:
ID Names
3 [Ally, Ben]
5 [Bruno]
2 [Dave, Bob]
7 [Ally, Bob]
Reproducible input:
df = pd.DataFrame({'ID': [3, 5, 2, 7, 11],
'Names': [['Ally', 'Ben', 'Cris'],
['Bruno', 'Coleen', 'Flyn'],
['Dave', 'Bob'],
['Rob', 'Ally', 'Bob'],
['Jill', 'Tom', 'Sal']]
})
CodePudding user response:
You can use a list comprehension to filter the names, and boolean indexing to filter the rows:
target = {'A', 'B', 'D'}
df['Names'] = [[n for n in l if n[0] in target] for l in df['Names']]
df = df[df['Names'].str.len().gt(0)]
Or using explode
, and groupby.agg
:
s = (df['Names']
.explode()
.loc[lambda x: x.str[0].isin(['A', 'B', 'D'])]
.groupby(level=0).agg(list)
)
df = df.loc[s.index].assign(Names=s)
output:
ID Names
0 3 [Ally, Ben]
1 5 [Bruno]
2 2 [Dave, Bob]
3 7 [Ally, Bob]
CodePudding user response:
just a slight variance of this solution:
df = (df.explode('Names').
query("Names.str[0].isin(['A','B','D'])",engine='python').
groupby('ID').
agg(list).
reset_index())
print(df)
'''
ID Names
0 2 [Dave, Bob]
1 3 [Ally, Ben]
2 5 [Bruno]
3 7 [Ally, Bob]