Home > Back-end >  Pandas: map new column to existing column based on condition
Pandas: map new column to existing column based on condition

Time:07-13

I have a table like this:

id q1_score q2_score q3_score q4_score quarter
1 -0.77 -0.55 0.21 3.42 q1
2 -0.77 -0.55 0.21 3.42 q1
3 -0.77 -0.55 0.21 3.42 q2
4 -0.77 -0.55 0.21 3.42 q3
5 -0.77 -0.55 0.21 3.42 q4

I want to create a new column quarter_score based on the value of quarter.

  1. If quarter==q1 then quarter_score=value from q1_score of that row.
  2. If quarter==q2 then quarter_score=value from q2_score of that row.
  3. so forth ...

My code looks like this

mapper = {'q1': df['q1_score'], 'q2':df['q2_score'], 'q3': df['q3_score'], 'q4': df['q4_score']}
df['quarter_score'] = df['quarter'].map(mapper)

but it does not take single value, it takes whole rows for that column for each row.

id q1_score q2_score q3_score q4_score quarter quarter_score
1 -0.77 -0.55 0.21 3.42 q1 0 -0.77 1 -0.77 2 -0.77
2 -0.77 -0.55 0.21 3.42 q1 0 -0.77 1 -0.77 2 -0.77
3 -0.77 -0.55 0.21 3.42 q2 0 -0.55 1 -0.55 2 -0.55
4 -0.77 -0.55 0.21 3.42 q3 0 0.21 1 0.21 2 0.21
5 -0.77 -0.55 0.21 3.42 q4 0 3.42 1 3.42 2 3.42

My final output should be like this below.

id q1_score q2_score q3_score q4_score quarter quarter_score
1 -0.77 -0.55 0.21 3.42 q1 -0.77
2 -0.77 -0.55 0.21 3.42 q1 -0.77
3 -0.77 -0.55 0.21 3.42 q2 -0.55
4 -0.77 -0.55 0.21 3.42 q3 0.21
5 -0.77 -0.55 0.21 3.42 q4 3.42

CodePudding user response:

Try:

df["quarter_score"] = df.apply(lambda x: x[x["quarter"]   "_score"], axis=1)
print(df)

Prints:

   id  q1_score  q2_score  q3_score  q4_score quarter  quarter_score
0   1     -0.77     -0.55      0.21      3.42      q1          -0.77
1   2     -0.77     -0.55      0.21      3.42      q1          -0.77
2   3     -0.77     -0.55      0.21      3.42      q2          -0.55
3   4     -0.77     -0.55      0.21      3.42      q3           0.21
4   5     -0.77     -0.55      0.21      3.42      q4           3.42

CodePudding user response:

For huge datasets, going down to numpy's level can be interesting

df['quarter_score'] = df.to_numpy()[np.arange(len(df)), 
                                    df['quarter'].str[-1].astype(int) - 1]
  • Related