i have the table:
person | score | Job type |
---|---|---|
person 1 | 6.5 | job 1 |
person 1 | 4.3 | job 2 |
person 2 | 1.2 | job 1 |
person 2 | 3.4 | job 2 |
person 2 | 4.3 | job 3 |
i want to ad a column with the job type, with highest score, like this:
person | score | Job type | Higest score |
---|---|---|---|
person 1 | 6.5 | job 1 | job 1 |
person 1 | 4.3 | job 2 | job 1 |
person 2 | 1.2 | job 1 | job 3 |
person 2 | 3.4 | job 2 | job 3 |
person 3 | 4.3 | job 3 | job 3 |
Any idea how can I achieve this?
CodePudding user response:
One approach could be as follows:
- Use
df.groupby
on columnperson
, and getidxmax
for columnscore
, wrapped insidetransform
. - Use the result inside
df.loc
to select the correct entries fromJob type
, and addSeries.to_numpy
to keep only the values (dropping the index values, which won't match). - Assign to the new column.
df['Highest score'] = df.loc[df.groupby('person')['score'].transform('idxmax'),
'Job type'].to_numpy()
print(df)
person score Job type Highest score
0 person 1 6.5 job 1 job 1
1 person 1 4.3 job 2 job 1
2 person 2 1.2 job 1 job 3
3 person 2 3.4 job 2 job 3
4 person 2 4.3 job 3 job 3
CodePudding user response:
First, let's sort the data by score. Then let's group by person and get the last value of each group. The last value is the highest value:
mask=df.sort_values('score').groupby('person').tail(1).rename(columns={'Job type':'Higest_score'})
final=df.merge(mask[['person','Higest_score']],how='left')
final
'''
person score Job type Higest_score
0 person 1 6.5 job 1 job 1
1 person 1 4.3 job 2 job 1
2 person 2 1.2 job 1 job 3
3 person 2 3.4 job 2 job 3
4 person 2 4.3 job 3 job 3
'''
CodePudding user response:
df.groupby('person')['score'].transform(lambda x:df.loc[x.idxmax(), 'Job type'])
result:
0 job 1
1 job 1
2 job 3
3 job 3
4 job 3
Name: score, dtype: object
make result to Higest score
column
CodePudding user response:
With merge()
df = pd.DataFrame(
{'person': ['person 1', 'person 1', 'person 2', 'person 2', 'person 2'], 'score': [6.5, 4.3, 1.2, 3.4, 4.3],
'Job type': ['job 1', 'job 2', 'job 1', 'job 2', 'job 3']})
df = df.merge(df.loc[df.groupby('person', as_index=False).score.idxmax().score, ['person', 'Job type']], on='person',
how='right', suffixes=('', '_')).rename(columns={'Job type_': 'Higest score'})
print(df)
person score Job type Higest score
0 person 1 6.5 job 1 job 1
1 person 1 4.3 job 2 job 1
2 person 2 1.2 job 1 job 3
3 person 2 3.4 job 2 job 3
4 person 2 4.3 job 3 job 3