I need to group by a column, remove the unwanted values from each group, and then unpack (unzip) the result back into one row per value.
My dataset looks like this:
Text        tag
drink coke  mic
eat pizza   mic
eat fruits  yes
eat banana  yes
eat banana  mic
eat fruits  mic
eat pizza   no
eat pizza   mic
eat pizza   yes
drink coke  yes
drink coke  no
drink coke  no
drink coke  yes
I used this to group the tags by text:
df = pd.DataFrame(df.groupby(['Text'])['tag'].apply(lambda x: list(x.values)))
Text        labels
eat pizza   [mic,no,mic,yes]
eat fruits  [yes,mic]
eat banana  [yes,mic]
drink coke  [mic,yes,no,no,yes]
If a group's labels list contains both a 'no' and a 'yes', I need to remove those values from the list and then unpack it back into rows.
The output should look like this.
Text        tag
drink coke  mic
eat pizza   mic
eat fruits  yes
eat banana  yes
eat banana  mic
eat fruits  mic
eat pizza   mic
CodePudding user response:
Doing:
# Does each row's group contain both 'yes' and 'no'?
contains_both = (df.groupby('Text')['tag']
                   .transform(lambda x: all(i in x.values for i in ('yes', 'no'))))
# Keep a row if its group doesn't contain both 'yes' and 'no';
# if it does, keep only the rows that are neither 'yes' nor 'no'.
df = df[~contains_both | ~df.tag.isin(['yes', 'no'])]
print(df)
Output:
         Text  tag
0  drink coke  mic
1   eat pizza  mic
2  eat fruits  yes
3  eat banana  yes
4  eat banana  mic
5  eat fruits  mic
7   eat pizza  mic
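For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question. It is a variant rather than the exact code above: it swaps the lambda for two `transform('any')` calls, and the names `has_yes`, `has_no`, and `result` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Text": ["drink coke", "eat pizza", "eat fruits", "eat banana",
             "eat banana", "eat fruits", "eat pizza", "eat pizza",
             "eat pizza", "drink coke", "drink coke", "drink coke",
             "drink coke"],
    "tag": ["mic", "mic", "yes", "yes", "mic", "mic", "no", "mic",
            "yes", "yes", "no", "no", "yes"],
})

# For every row, does its Text group contain a 'yes'? A 'no'?
has_yes = df["tag"].eq("yes").groupby(df["Text"]).transform("any")
has_no = df["tag"].eq("no").groupby(df["Text"]).transform("any")
contains_both = has_yes & has_no

# Keep a row unless its group has both values and the row is one of them.
result = df[~contains_both | ~df["tag"].isin(["yes", "no"])]
print(result)
```

The original row order and index labels are preserved, because rows are only filtered and never regrouped.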
FYI, your df calculation could be shortened to:
df = df.groupby('Text', as_index=False)['tag'].agg(list)
# Output:
         Text                      tag
0  drink coke  [mic, yes, no, no, yes]
1  eat banana               [yes, mic]
2  eat fruits               [yes, mic]
3   eat pizza      [mic, no, mic, yes]
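If you do want the group, filter, unpack route described in the question, something like the following should work. This is only a sketch: `drop_yes_no_if_both`, `grouped`, and `unpacked` are made-up names, and `explode` with `ignore_index` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Text": ["drink coke", "eat pizza", "eat fruits", "eat banana",
             "eat banana", "eat fruits", "eat pizza", "eat pizza",
             "eat pizza", "drink coke", "drink coke", "drink coke",
             "drink coke"],
    "tag": ["mic", "mic", "yes", "yes", "mic", "mic", "no", "mic",
            "yes", "yes", "no", "no", "yes"],
})

# Collect the tags of each Text value into a list.
grouped = df.groupby("Text", as_index=False)["tag"].agg(list)


def drop_yes_no_if_both(tags):
    """Remove 'yes' and 'no' only when the list contains both of them."""
    if "yes" in tags and "no" in tags:
        return [t for t in tags if t not in ("yes", "no")]
    return tags


grouped["tag"] = grouped["tag"].apply(drop_yes_no_if_both)

# explode() turns each list element back into its own row.
unpacked = grouped.explode("tag", ignore_index=True)
print(unpacked)
```

Two differences from the mask approach: the rows come back sorted by Text instead of in their original order, and a group whose list ends up empty would show up as a single row with NaN in tag, so you may want a `dropna` afterwards.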