Home > Back-end >  Dataframe column value based on aggregation of several columns
Dataframe column value based on aggregation of several columns

Time:05-02

Say I have a pandas dataframe as below:

A   B   C
1   4   0.1
2   3   0.5
4   1   0.7
5   2   0.2
7   5   0.6

I want to loop through the rows in the dataframe, and for each row perform on aggregation on columns A and B as:

Agg = row[A] / row[A]   row[B]

A   B   C     Agg
1   4   0.1   0.2
2   3   0.5   0.4
4   1   0.7   0.8
5   2   0.2   0.7
7   5   0.6   0.6

For all values of Agg > 0.6, get their corresponding column C values into a list, i.e. 0.7 and 0.2 in this case.

Last step is to get the minimum of the list i.e. min(list) = 0.2 in this instance.

CodePudding user response:

We could use vectorized operations: add for addition, rdiv for division (for A/(A B)), gt for greater than comparison and loc for the filtering:

out = df.loc[df['A'].add(df['B']).rdiv(df['A']).gt(0.6), 'C'].min()

We could also derive the same result using query much more concisely:

out = df.query('A/(A B)>0.6')['C'].min()

Output:

0.2

CodePudding user response:

Instead of iterating, you can try creating an aggregate function and apply it across all rows.

def aggregate(row):
  return row["A"] / (row["A"]   row["B"])

df["Agg"] = round(df.apply(aggregate, axis = 1), 1)
df[df["Agg"] > 0.6]["C"].min()

Output -

0.2
  • Related