I have a DataFrame with various columns, three of which hold lists (each cell contains a list). Within a row, the values in these three columns are mutually exclusive.
  vot_in_favour vot_against vot_abstention
0     [A, B, C]          []         [D, E]
1     [A, D, E]         [C]            [B]
2        [B, C]         [A]         [D, E]
I have another column, label, which holds a single value (A, B, C, D or E). I want a column vote which has the name of the list column that contains the label, like the following:
  label            vote
0     A   vot_in_favour
1     C     vot_against
2     D  vot_abstention
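For anyone who wants to reproduce this, here is a minimal sketch of the sample frame. It only contains the four relevant columns with the values from the tables above; the name expected_vote is just something I made up to hold the desired result.

```python
import pandas as pd

# Sample data copied from the tables above; the real frame has more columns.
df = pd.DataFrame({
    "vot_in_favour": [["A", "B", "C"], ["A", "D", "E"], ["B", "C"]],
    "vot_against": [[], ["C"], ["A"]],
    "vot_abstention": [["D", "E"], ["B"], ["D", "E"]],
    "label": ["A", "C", "D"],
})

# Desired result, written out by hand so any answer can be compared against it.
expected_vote = pd.Series(
    ["vot_in_favour", "vot_against", "vot_abstention"], name="vote"
)
```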
I tried something like df1['vote'] = df.drop("label", axis=1).isin(df["label"]).any(1), but I do not know how to make this match any value inside the lists. I have looked at similar questions, but the list columns are posing a challenge.
Thanks in advance for any help you can provide.
CodePudding user response:
Try this
# explode all three list columns so that each cell holds a single value
exp_df = df.explode('vot_in_favour').explode('vot_against').explode('vot_abstention')
# compare every exploded value with the row's label (pop removes 'label' from exp_df)
matches = exp_df.eq(exp_df.pop('label'), axis=0)
# collapse back to one row per original index, then take the first column with a match
df['vote'] = matches.groupby(level=0).any().idxmax(axis=1)
print(df)
  vot_in_favour vot_against vot_abstention label            vote
0     [A, B, C]          []         [D, E]     A   vot_in_favour
1     [A, D, E]         [C]            [B]     C     vot_against
2        [B, C]         [A]         [D, E]     D  vot_abstention
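If the frame is small, a plain row-wise lookup may be easier to read than exploding. Note that this is a different technique (DataFrame.apply with a Python membership test), sketched under the assumption that each label appears in at most one of the three lists. The names find_vote and vote_cols are made up for this example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "vot_in_favour": [["A", "B", "C"], ["A", "D", "E"], ["B", "C"]],
    "vot_against": [[], ["C"], ["A"]],
    "vot_abstention": [["D", "E"], ["B"], ["D", "E"]],
    "label": ["A", "C", "D"],
})

vote_cols = ["vot_in_favour", "vot_against", "vot_abstention"]

def find_vote(row):
    """Return the first vote column whose list contains the row's label, else None."""
    for col in vote_cols:
        if row["label"] in row[col]:
            return col
    return None

# apply with axis=1 passes each row to find_vote as a Series.
df["vote"] = df.apply(find_vote, axis=1)
print(df[["label", "vote"]])
```

This will likely be slower than a vectorised approach on large frames, but it does not depend on where the label sits inside its list.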
CodePudding user response:
You can melt, explode and filter with loc:
(df
.reset_index()
.melt(id_vars=['index', 'label'], var_name='vote')
.explode('value')
.set_index('index')
.loc[lambda d: d['label'].eq(d['value']), ['label', 'vote']]
)
Output:
      label            vote
index
0         A   vot_in_favour
1         C     vot_against
2         D  vot_abstention
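To store the result as the vote column the question asks for, the filtered values can be assigned back by index. This is only a sketch, assuming df has a unique, unnamed index and each label matches at most one list. Rows without a match would end up as NaN. The name matches is made up for this example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "vot_in_favour": [["A", "B", "C"], ["A", "D", "E"], ["B", "C"]],
    "vot_against": [[], ["C"], ["A"]],
    "vot_abstention": [["D", "E"], ["B"], ["D", "E"]],
    "label": ["A", "C", "D"],
})

# Same melt/explode/filter chain as above, keeping only the 'vote' column.
matches = (
    df.reset_index()
    .melt(id_vars=["index", "label"], var_name="vote")
    .explode("value")
    .set_index("index")
    .loc[lambda d: d["label"].eq(d["value"]), "vote"]
)

# Assignment aligns on the index, so each original row receives its own match.
df["vote"] = matches
print(df[["label", "vote"]])
```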