Home > database >  how to change values of rows based on groupby
how to change values of rows based on groupby

Time:12-16

I have the following df:

    df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,3],'ALGO':['A','A','B','B','B','B','B','A'],'VALUES':[0.8,0.2,0.4,0.3,0.3,0.8,0.2,1]})

    ID  ALGO    VALUES
0   1   A   0.80
1   1   A   0.20
2   1   B   0.40
3   1   B   0.30
4   1   B   0.30
5   2   B   0.80
6   2   B   0.20
7   3   A   1.00

I wish to change the VALUES (x10) if the ID's Algo has A. So in the above df only ID 1 and 3 has A's algo hence, the values should be x10.

To achieve this I do:

def foo(id_data):     
    if any(id_data['ALGO']=='A'):
        id_data['VALUES']=id_data['VALUES']*10
    return id_data['VALUES']

new_column  = df.groupby('ID').apply(lambda x: foo(x))
df["VALUES"] = new_column.reset_index(level=0, drop=True)

desired results:

    ID  ALGO    VALUES
0   1   A   8.00
1   1   A   2.00
2   1   B   4.00
3   1   B   3.00
4   1   B   3.00
5   2   B   0.80
6   2   B   0.20
7   3   A   10.00

My Question: say I wish to iterate over the group of IDs and make the adjustment there (inside my for loop), how should it be done?

m = df.groupby(['ID'])
for i in m:
    if any(i['ALGO']=='A'): //this of course will not work
          i['VALUES']=i['VALUES']*10

CodePudding user response:

This should work for you (I had to change the var names because of some errors).

Note that below I created a new list new_groups and appended each group to it, because changing the groups don't actually modify the dataframe. Then I use pd.concat(new_groups) to join them all together vertically.

new_groups = []
groups = df.groupby(['ID'])
for idx, group in groups:
    if (group['ALGO'] == 'A').any():
          group['VALUES'] = group['VALUES'] * 10

    new_groups.append(group)
 
new_df = pd.concat(new_groups)

Output:

>>> new_df
   ID ALGO  VALUES
0   1    A     8.0
1   1    A     2.0
2   1    B     4.0
3   1    B     3.0
4   1    B     3.0
5   2    B     0.8
6   2    B     0.2
7   3    A    10.0

A much more efficient solution, though, would be:

df['VALUES'] = df.groupby('ID').apply(lambda x: x['VALUES'] * (10 if x['ALGO'].isin(['A']).any() else 1)).reset_index(drop=True)

Output:

   ID ALGO  VALUES
0   1    A     8.0
1   1    A     2.0
2   1    B     4.0
3   1    B     3.0
4   1    B     3.0
5   2    B     0.8
6   2    B     0.2
7   3    A    10.0

CodePudding user response:

Use update:

for _, sub in df.groupby('ID'):
    if any(sub['ALGO'] == 'A'):
        df.update(sub['VALUES'] * 10)
print(df)

# Output:
   ID ALGO  VALUES
0   1    A     8.0
1   1    A     2.0
2   1    B     4.0
3   1    B     3.0
4   1    B     3.0
5   2    B     0.8
6   2    B     0.2
7   3    A    10.0

CodePudding user response:

I believe this works:

df.loc[df["ALGO"] == "A", "VALUEs"] *= 10

CodePudding user response:

Another option using transform. This should be more efficient than using loop:

df['VALUES'] = np.where(df.groupby('ID')['ALGO'].transform(lambda x: (x=='A').any()), df['VALUES'].to_numpy()*10, df['VALUES'].to_numpy())

Output

   ID ALGO  VALUES
0   1    A     8.0
1   1    A     2.0
2   1    B     4.0
3   1    B     3.0
4   1    B     3.0
5   2    B     0.8
6   2    B     0.2
7   3    A    10.0
  • Related