Say I have a pandas dataframe as below:
A B C
1 4 0.1
2 3 0.5
4 1 0.7
5 2 0.2
7 5 0.6
I want to loop through the rows in the dataframe, and for each row perform on aggregation on columns A and B as:
Agg = row[A] / row[A] row[B]
A B C Agg
1 4 0.1 0.2
2 3 0.5 0.4
4 1 0.7 0.8
5 2 0.2 0.7
7 5 0.6 0.6
For all values of Agg > 0.6, get their corresponding column C values into a list, i.e. 0.7
and 0.2
in this case.
Last step is to get the minimum of the list i.e. min(list) = 0.2
in this instance.
CodePudding user response:
We could use vectorized operations: add
for addition, rdiv
for division (for A/(A B)), gt
for greater than comparison and loc
for the filtering:
out = df.loc[df['A'].add(df['B']).rdiv(df['A']).gt(0.6), 'C'].min()
We could also derive the same result using query
much more concisely:
out = df.query('A/(A B)>0.6')['C'].min()
Output:
0.2
CodePudding user response:
Instead of iterating, you can try creating an aggregate
function and apply it across all rows.
def aggregate(row):
return row["A"] / (row["A"] row["B"])
df["Agg"] = round(df.apply(aggregate, axis = 1), 1)
df[df["Agg"] > 0.6]["C"].min()
Output -
0.2