pandas: add values which were calculated after grouping to a column in the original dataframe

Time:12-22

I have a pandas DataFrame and want to fill a new column ('new') with one value per group, where the groups come from .groupby() on another column ('A'), so that every row in a group gets that group's value.

At the moment I am doing it in several steps:
1- loop through all unique values of column A
2- calculate the value to add (by running a function on a different column, e.g. 'B')
3- store that value in a separate list (just one value per group!)
4- zip the list of unique group keys (df['A'].unique()) with that list
5- loop again through the zipped pairs to store the values in the dataframe.

This is very inefficient and takes a long time to run.
Is there a native pandas way to do this in fewer steps that will run faster?

Example code:

import numpy as np
import pandas as pd

# df and myfunction are defined elsewhere
mylist = []
df_groups = df.groupby('A')
groups = df['A'].unique()
for group in groups:
    g = df_groups.get_group(group)
    idxmin = g.index.min()                   # first row of the group (smallest index label)
    example = g.loc[idxmin]
    mylist.append(myfunction(example['B']))  # one call per group
zipped = zip(groups, mylist)
df['new'] = np.nan
for group, val in zipped:
    df.loc[df['A'] == group, 'new'] = val

A better way to do that would be highly appreciated.


EDIT 1:
I could just run myfunction on every row of the dataframe, but since it's a heavy function, that would also take very long, so I would prefer to run it as few times as possible (that is, once per group).

Answer:

If I understand the ask correctly, please try groupby().transform(). This example uses min; you can replace it with your own function.

import pandas as pd

data = {
  "calories": [400, 300, 300, 400],
  "duration": [50, 40, 45, 35]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)
# transform returns one value per row: each row gets the minimum 'duration' of its 'calories' group
df['min_value_duration'] = df.groupby('calories')['duration'].transform('min')

print(df)
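To apply this to the question's setting, min can be swapped for a callable that runs the heavy function on the first value of each group. This is a minimal sketch, assuming columns 'A' and 'B' as in the question; `myfunction` here is a hypothetical stand-in (it just multiplies by 10) and the small DataFrame is made-up illustrative data. Note that `s.iloc[0]` picks the group's first row in the DataFrame's current order, which matches the question's `g.index.min()` only when the index is sorted ascending.

```python
import pandas as pd


def myfunction(b):
    # Hypothetical stand-in for the question's expensive function.
    return b * 10


# Made-up example data with the question's column names.
df = pd.DataFrame({
    "A": ["x", "y", "x", "z", "y"],
    "B": [1, 2, 3, 4, 5],
})

# transform() passes each group's 'B' values as a Series to the lambda;
# the scalar it returns is broadcast back to every row of that group.
df["new"] = df.groupby("A")["B"].transform(lambda s: myfunction(s.iloc[0]))
print(df)
```

Because transform aligns its result with the original index, df['new'] has one entry per row, and all rows of a group share the same value.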

Reference: https://www.analyticsvidhya.com/blog/2020/03/understanding-transform-function-python/
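If you want to be explicit that the heavy function runs exactly once per group, another option (an alternative sketch, not part of the answer's approach) is to pick one row per group, call the function on those rows only, and map the results back by group key. It uses the same hypothetical `myfunction` and made-up data as above; the `calls` list exists only to show how many times the function runs. `sort_index()` followed by `drop_duplicates(subset="A")` keeps the row with the smallest index label in each group, mirroring `g.loc[g.index.min()]` from the question.

```python
import pandas as pd

calls = []  # records each call, only to demonstrate the call count


def myfunction(b):
    # Hypothetical stand-in for the question's expensive function.
    calls.append(b)
    return b * 10


# Made-up example data with the question's column names.
df = pd.DataFrame({
    "A": ["x", "y", "x", "z", "y"],
    "B": [1, 2, 3, 4, 5],
})

# One row per group: the row with the smallest index label for each 'A' value.
first_rows = df.sort_index().drop_duplicates(subset="A", keep="first")

# Call myfunction once per group; the result is a Series indexed by group key.
per_group = first_rows.set_index("A")["B"].map(myfunction)

# Look up each row's group key to broadcast the per-group value to every row.
df["new"] = df["A"].map(per_group)
print(df)
```

Since `Series.map` with a function calls it once per element of `per_group`, the number of calls equals the number of distinct values in 'A', and both loops from the question disappear.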
