Get aggregated column with names of other columns (which are list columns)

Time:05-19

I have a dataframe with various columns, three of which are columns with lists (each cell has a list). These three columns have mutually exclusive values.

  vot_in_favour vot_against vot_abstention
0     [A, B, C]          []         [D, E]
1     [A, D, E]         [C]            [B]
2        [B, C]         [A]         [D, E]

I have another column, label, which holds a single label (A, B, C, D or E). I want a column vote that contains the name of the list column in which that row's label appears, like the following:

  label vote
0   A   vot_in_favour
1   C   vot_against
2   D   vot_abstention

I tried something like df1['vote'] = df.drop("label", axis=1).isin(df["label"]).any(1), but I do not know how to make it match any value inside the lists. I have looked at similar questions, but the list columns pose a challenge. Thanks in advance for any help you can provide.
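
For anyone who wants to reproduce this, here is a minimal sketch of the sample data, rebuilt from the two tables above. The label column is assumed to live in the same dataframe as the list columns, and the column order is an assumption.

```python
import pandas as pd

# Sample frame reconstructed from the tables in the question.
# Each cell of the three vot_* columns is a plain Python list.
df = pd.DataFrame({
    "vot_in_favour": [["A", "B", "C"], ["A", "D", "E"], ["B", "C"]],
    "vot_against": [[], ["C"], ["A"]],
    "vot_abstention": [["D", "E"], ["B"], ["D", "E"]],
    "label": ["A", "C", "D"],
})

print(df)
```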

CodePudding user response:

Try this:

# explode all three list columns so that each row holds one candidate value per column
exp_df = df.explode('vot_in_favour').explode('vot_against').explode('vot_abstention')
# compare every exploded value with the row's label (pop removes label from exp_df)
matches = exp_df.eq(exp_df.pop('label'), axis=0)
# collapse back to one row per original index, then take the first column that matched
df['vote'] = matches.groupby(level=0).any().idxmax(axis=1)
print(df)
  vot_in_favour vot_against vot_abstention label            vote
0     [A, B, C]          []         [D, E]     A   vot_in_favour
1     [A, D, E]         [C]            [B]     C     vot_against
2        [B, C]         [A]         [D, E]     D  vot_abstention
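
If exploding feels heavy for larger lists, a plain row-wise lookup is another option. This is only a sketch of a different technique (DataFrame.apply with a Python membership test), not part of the answer above. The helper name find_vote and the list_cols variable are made up for illustration, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed layout).
df = pd.DataFrame({
    "vot_in_favour": [["A", "B", "C"], ["A", "D", "E"], ["B", "C"]],
    "vot_against": [[], ["C"], ["A"]],
    "vot_abstention": [["D", "E"], ["B"], ["D", "E"]],
    "label": ["A", "C", "D"],
})

# Hypothetical name: the columns whose cells are lists.
list_cols = ["vot_in_favour", "vot_against", "vot_abstention"]


def find_vote(row):
    """Return the name of the first list column containing the row's label, or None."""
    return next((col for col in list_cols if row["label"] in row[col]), None)


# Apply the lookup once per row.
df["vote"] = df.apply(find_vote, axis=1)
print(df[["label", "vote"]])
```

Unlike idxmax, this returns None when the label is in none of the lists, which may be easier to spot than a wrong column name.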

CodePudding user response:

You can melt, explode and filter with loc:

(df
 .reset_index()
 .melt(id_vars=['index', 'label'], var_name='vote')
 .explode('value')
 .set_index('index')
 .loc[lambda d: d['label'].eq(d['value']), ['label', 'vote']]
)

output:

      label            vote
index                      
0         A   vot_in_favour
1         C     vot_against
2         D  vot_abstention
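
To get the result back onto the original dataframe as a vote column, the filtered frame can be assigned by index. The following is a sketch under the assumption that each label appears in at most one list per row (the question says the lists are mutually exclusive); the names long and matches are illustrative, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed layout).
df = pd.DataFrame({
    "vot_in_favour": [["A", "B", "C"], ["A", "D", "E"], ["B", "C"]],
    "vot_against": [[], ["C"], ["A"]],
    "vot_abstention": [["D", "E"], ["B"], ["D", "E"]],
    "label": ["A", "C", "D"],
})

# Long format: one row per (original index, column name, list element).
long = (
    df.reset_index()
      .melt(id_vars=["index", "label"], var_name="vote")
      .explode("value")
      .set_index("index")
)

# Keep only the rows where the list element equals the label.
matches = long.loc[long["label"].eq(long["value"]), "vote"]

# Assignment aligns on the index, so each original row gets its own match;
# rows without any match would end up as NaN.
df["vote"] = matches
print(df)
```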