I have the following df:
df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,3],'ALGO':['A','A','B','B','B','B','B','A'],'VALUES':[0.8,0.2,0.4,0.3,0.3,0.8,0.2,1]})
ID ALGO VALUES
0 1 A 0.80
1 1 A 0.20
2 1 B 0.40
3 1 B 0.30
4 1 B 0.30
5 2 B 0.80
6 2 B 0.20
7 3 A 1.00
I wish to change the VALUES (x10) if the ID's Algo has A. So in the above df only ID 1 and 3 has A's algo hence, the values should be x10.
To achieve this I do:
def foo(id_data):
if any(id_data['ALGO']=='A'):
id_data['VALUES']=id_data['VALUES']*10
return id_data['VALUES']
new_column = df.groupby('ID').apply(lambda x: foo(x))
df["VALUES"] = new_column.reset_index(level=0, drop=True)
desired results:
ID ALGO VALUES
0 1 A 8.00
1 1 A 2.00
2 1 B 4.00
3 1 B 3.00
4 1 B 3.00
5 2 B 0.80
6 2 B 0.20
7 3 A 10.00
My Question: say I wish to iterate over the group of IDs and make the adjustment there (inside my for loop), how should it be done?
m = df.groupby(['ID'])
for i in m:
if any(i['ALGO']=='A'): //this of course will not work
i['VALUES']=i['VALUES']*10
CodePudding user response:
This should work for you (I had to change the var names because of some errors).
Note that below I created a new list new_groups
and appended each group to it, because changing the groups don't actually modify the dataframe. Then I use pd.concat(new_groups)
to join them all together vertically.
new_groups = []
groups = df.groupby(['ID'])
for idx, group in groups:
if (group['ALGO'] == 'A').any():
group['VALUES'] = group['VALUES'] * 10
new_groups.append(group)
new_df = pd.concat(new_groups)
Output:
>>> new_df
ID ALGO VALUES
0 1 A 8.0
1 1 A 2.0
2 1 B 4.0
3 1 B 3.0
4 1 B 3.0
5 2 B 0.8
6 2 B 0.2
7 3 A 10.0
A much more efficient solution, though, would be:
df['VALUES'] = df.groupby('ID').apply(lambda x: x['VALUES'] * (10 if x['ALGO'].isin(['A']).any() else 1)).reset_index(drop=True)
Output:
ID ALGO VALUES
0 1 A 8.0
1 1 A 2.0
2 1 B 4.0
3 1 B 3.0
4 1 B 3.0
5 2 B 0.8
6 2 B 0.2
7 3 A 10.0
CodePudding user response:
Use update
:
for _, sub in df.groupby('ID'):
if any(sub['ALGO'] == 'A'):
df.update(sub['VALUES'] * 10)
print(df)
# Output:
ID ALGO VALUES
0 1 A 8.0
1 1 A 2.0
2 1 B 4.0
3 1 B 3.0
4 1 B 3.0
5 2 B 0.8
6 2 B 0.2
7 3 A 10.0
CodePudding user response:
I believe this works:
df.loc[df["ALGO"] == "A", "VALUEs"] *= 10
CodePudding user response:
Another option using transform
. This should be more efficient than using loop:
df['VALUES'] = np.where(df.groupby('ID')['ALGO'].transform(lambda x: (x=='A').any()), df['VALUES'].to_numpy()*10, df['VALUES'].to_numpy())
Output
ID ALGO VALUES
0 1 A 8.0
1 1 A 2.0
2 1 B 4.0
3 1 B 3.0
4 1 B 3.0
5 2 B 0.8
6 2 B 0.2
7 3 A 10.0