Home > Enterprise >  How to create a DataFrame column based on multiple columns values
How to create a DataFrame column based on multiple columns values

Time:03-12

I've a DataFrame that looks like this

Name Result_cat_1 Result1 Result_cat_2 Result_2 Result_cat_3 Result_3
Andrew Rock 12 plays Paper 4 plays Scissors 45 plays
John Paper 8 plays Scissors 15 plays Rock 76 plays
Ronald Scissors 6 plays Rock 3 plays Paper 23 plays

The end result should be like this:

Name Rock Paper Scissors
Andrew 12 plays 4 plays 45 plays
John 76 plays 8 plays 15 plays
Ronald 3 plays 23 plays 6 plays

I've tried

df['Rock'] = df[Result1].where(df[Result_cat_1] == 'Rock')
df['Rock'] = df[Result2].where(df[Result_cat_2] == 'Rock')
df['Rock'] = df[Result3].where(df[Result_cat_3] == 'Rock')

but the column contains only the df[Result_cat_3] overwriting the previous 2

CodePudding user response:

We could stack the "Results_cat" columns and "Results" columns separately; build a DataFrame with the stacked Series; then pivot:

df = df.set_index('Name')
out = (df.filter(like='Result_cat')
       .stack()
       .droplevel(1)
       .reset_index(name='result')
       .assign(play=df[['Result1', 'Result_2', 'Result_3']]
               .stack().to_numpy())
       .pivot('Name', 'result', 'play')
       .rename_axis(index=[None], columns=[None]))

Output:

           Paper      Rock  Scissors
Andrew   4 plays  12 plays  45 plays
John     8 plays  76 plays  15 plays
Ronald  23 plays   3 plays   6 plays

CodePudding user response:

In your case do

s1 = df.set_index('Name').iloc[:,::2].stack()
s2 = df.set_index('Name').iloc[:,1::2].stack()
out = pd.crosstab(s1.index.get_level_values(0),s1.values,s2.values,aggfunc='first')
Out[764]: 
col_0      Paper      Rock  Scissors
row_0                               
Andrew   4 plays  12 plays  45 plays
John     8 plays  76 plays  15 plays
Ronald  23 plays   3 plays   6 plays
  • Related