add a new column based on a group without grouping

Time:08-09

I have this reproducible data set and need to add a column that holds, for each year, the source of the row whose usage is 'best'.

import pandas as pd

df_in = pd.DataFrame({
    'year': [ 5, 5, 5, 
             10, 10, 
             15, 15, 
             30, 30, 30 ],
    'usage': ['farm', 'best', '',
               'manual', 'best',
               'best',  'city',
               'random', 'best', 'farm'  ],
    'value': [0.825, 0.83, 0.85,
              0.935, 0.96,
              1.12, 1.305,
              1.34, 1.34, 1.455],       
    'source': ['wood', 'metal', 'water',
               'metal', 'water',
               'wood',  'water',
               'wood', 'metal', 'water'  ]})

Desired outcome:

print(df)
   year   usage  value source   best
0     5    farm  0.825   wood  metal
1     5    best  0.830  metal  metal
2     5          0.850  water  metal
3    10  manual  0.935  metal  water
4    10    best  0.960  water  water
5    15    best  1.120   wood   wood
6    15    city  1.305  water   wood
7    30  random  1.340   wood  metal
8    30    best  1.340  metal  metal
9    30    farm  1.455  water  metal

Is there a way to do that without grouping? Currently, I'm using:

grouped = df_in.groupby('usage').get_group('best')
grouped = grouped.rename(columns={'source': 'best'})
df = df_in.merge(grouped[['year','best']],how='outer', on='year')

CodePudding user response:

You could just query:

df_in.merge(df_in.query('usage=="best"')[['year','source']]
            .drop_duplicates('year')  # you might not need/want this line if `best` is unique per year (or doesn't need to be in the output)
            .rename(columns={'source':'best'}),
            on='year', how='left')

Output:

   year   usage  value source   best
0     5    farm  0.825   wood  metal
1     5    best  0.830  metal  metal
2     5          0.850  water  metal
3    10  manual  0.935  metal  water
4    10    best  0.960  water  water
5    15    best  1.120   wood   wood
6    15    city  1.305  water   wood
7    30  random  1.340   wood  metal
8    30    best  1.340  metal  metal
9    30    farm  1.455  water  metal
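
To illustrate the comment on the drop_duplicates line, here is a minimal sketch of what a left merge does when a year carries more than one 'best' row. The frame df_dup and its values are made up for this illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: year 5 has two rows tagged 'best' (not in the original question).
df_dup = pd.DataFrame({
    'year':   [5, 5, 5, 10],
    'usage':  ['best', 'best', 'farm', 'best'],
    'source': ['metal', 'wood', 'water', 'water'],
})

# Lookup table: one row per 'best' entry, with 'source' renamed to 'best'.
lookup = (df_dup.query('usage == "best"')[['year', 'source']]
          .rename(columns={'source': 'best'}))

# Without de-duplication, every year-5 row matches two lookup rows.
without_dedup = df_dup.merge(lookup, on='year', how='left')

# Keeping only the first 'best' per year preserves the original row count.
with_dedup = df_dup.merge(lookup.drop_duplicates('year'), on='year', how='left')

print(len(df_dup), len(without_dedup), len(with_dedup))
```

In this sketch the merge without de-duplication returns more rows than the input, while the de-duplicated version keeps one output row per input row. Which 'best' row should win when there are several is a choice the data owner has to make; drop_duplicates simply keeps the first.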

CodePudding user response:

Here is a way using .loc and .map():

(df_in.assign(best=df_in['year']
              .map(df_in.loc[df_in['usage'].eq('best'), ['year', 'source']]
                   .set_index('year')
                   .squeeze())))

Output:

   year   usage  value source   best
0     5    farm  0.825   wood  metal
1     5    best  0.830  metal  metal
2     5          0.850  water  metal
3    10  manual  0.935  metal  water
4    10    best  0.960  water  water
5    15    best  1.120   wood   wood
6    15    city  1.305  water   wood
7    30  random  1.340   wood  metal
8    30    best  1.340  metal  metal
9    30    farm  1.455  water  metal
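
A hedged variant of the .map() approach for less tidy data might look like the sketch below. The frame df_gap is hypothetical: it has a year with no 'best' row. The sketch selects the 'source' column explicitly instead of calling .squeeze(), because .squeeze() collapses a one-row, one-column frame to a scalar. It also de-duplicates before set_index, on the assumption that .map() needs a unique lookup index.

```python
import pandas as pd

# Hypothetical data: year 20 has no 'best' row (not in the original question).
df_gap = pd.DataFrame({
    'year':   [5, 5, 20],
    'usage':  ['farm', 'best', 'city'],
    'source': ['wood', 'metal', 'water'],
})

# Series indexed by year, holding the source of each 'best' row.
best_by_year = (df_gap.loc[df_gap['usage'].eq('best')]
                .drop_duplicates('year')   # guard: keep one 'best' row per year
                .set_index('year')['source'])

# Years missing from the lookup come back as NaN.
result = df_gap.assign(best=df_gap['year'].map(best_by_year))
print(result)
```

Rows whose year has no 'best' entry end up with NaN in the new column, which matches what a left merge would produce.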