find highest value in group, and add value of other column to a new row in python pandas-CodePudding

i have the table:

person	score	Job type
person 1	6.5	job 1
person 1	4.3	job 2
person 2	1.2	job 1
person 2	3.4	job 2
person 2	4.3	job 3

i want to ad a column with the job type, with highest score, like this:

person	score	Job type	Higest score
person 1	6.5	job 1	job 1
person 1	4.3	job 2	job 1
person 2	1.2	job 1	job 3
person 2	3.4	job 2	job 3
person 3	4.3	job 3	job 3

Any idea how can I achieve this?

CodePudding user response：

One approach could be as follows:

Use df.groupby on column person, and get idxmax for column score, wrapped inside transform.
Use the result inside df.loc to select the correct entries from Job type, and add Series.to_numpy to keep only the values (dropping the index values, which won't match).
Assign to the new column.

df['Highest score'] = df.loc[df.groupby('person')['score'].transform('idxmax'),
                             'Job type'].to_numpy()

print(df)

     person  score Job type Highest score
0  person 1    6.5    job 1         job 1
1  person 1    4.3    job 2         job 1
2  person 2    1.2    job 1         job 3
3  person 2    3.4    job 2         job 3
4  person 2    4.3    job 3         job 3

CodePudding user response：

First, let's sort the data by score. Then let's group by person and get the last value of each group. The last value is the highest value:

mask=df.sort_values('score').groupby('person').tail(1).rename(columns={'Job type':'Higest_score'})

final=df.merge(mask[['person','Higest_score']],how='left')
final
'''
    person      score   Job type    Higest_score
0   person 1    6.5     job 1       job 1
1   person 1    4.3     job 2       job 1
2   person 2    1.2     job 1       job 3
3   person 2    3.4     job 2       job 3
4   person 2    4.3     job 3       job 3

'''

CodePudding user response：

df.groupby('person')['score'].transform(lambda x:df.loc[x.idxmax(), 'Job type'])

result:

0    job 1
1    job 1
2    job 3
3    job 3
4    job 3
Name: score, dtype: object

make result to Higest score column

CodePudding user response：

With merge()

df = pd.DataFrame(
    {'person': ['person 1', 'person 1', 'person 2', 'person 2', 'person 2'], 'score': [6.5, 4.3, 1.2, 3.4, 4.3],
     'Job type': ['job 1', 'job 2', 'job 1', 'job 2', 'job 3']})

df = df.merge(df.loc[df.groupby('person', as_index=False).score.idxmax().score, ['person', 'Job type']], on='person',
              how='right', suffixes=('', '_')).rename(columns={'Job type_': 'Higest score'})
print(df)

     person  score Job type Higest score
0  person 1    6.5    job 1        job 1
1  person 1    4.3    job 2        job 1
2  person 2    1.2    job 1        job 3
3  person 2    3.4    job 2        job 3
4  person 2    4.3    job 3        job 3