Is there an efficient way to compute column values in Pandas using values from previous rows based o

Time:12-05

Consider looping through my DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Price': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400],
    'Count': [0, 0, 0, 0, 0, 0, 0, 0, 0]
})

for idx in df.index:
    # Only rows above the threshold extend the running count.
    if df.loc[idx, 'Price'] > 1500:
        if idx > 0:
            df.loc[idx, 'Count'] = df.loc[idx - 1, 'Count'] + 1

Resulting in:

   Price  Count
0   1000      0
1   1000      0
2   1000      0
3   2000      1
4   2000      2
5   2000      3
6   2000      4
7   1400      0
8   1400      0

Is there a more efficient way to do this?

CodePudding user response:

Use mask to hide the rows where Price is 1500 or less, then use cumsum over the remaining rows to build the counter:

df['Count'] = df.mask(df['Price'] <= 1500)['Count'].add(1).cumsum().fillna(0).astype(int)
print(df)

# Output:
   Price  Count
0   1000      0
1   1000      0
2   1000      0
3   2000      1
4   2000      2
5   2000      3
6   2000      4
7   1400      0
8   1400      0
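
One caveat, as a rough sketch: the cumulative sum above keeps counting across separate runs, so it matches the loop only when there is a single stretch of prices above 1500. The variant below subtracts the running total recorded at the most recent row at or below the threshold, so each run restarts. The name `run_counter` and the sample frame `df2` are made up for illustration. Unlike the loop, it numbers a run that starts at row 0 from 1 rather than 0.

```python
import pandas as pd

def run_counter(price: pd.Series, threshold: float = 1500) -> pd.Series:
    """Count consecutive rows above `threshold`, restarting after each gap."""
    above = price > threshold
    running = above.cumsum()  # rows above the threshold seen so far
    # Running total at the latest row at or below the threshold (0 before any).
    baseline = running.where(~above).ffill().fillna(0).astype(int)
    return running - baseline

# Hypothetical data with two separate runs above 1500.
df2 = pd.DataFrame({'Price': [1000, 2000, 2000, 1400, 2000, 2000, 2000]})
df2['Count'] = run_counter(df2['Price'])
print(df2['Count'].tolist())  # [0, 1, 2, 0, 1, 2, 3]
```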

CodePudding user response:

Create pseudo-groups using Series.cumsum, then use groupby.cumcount to generate the within-group counts:

groups = df.Price.le(1500).cumsum()
df['Count'] = df.Price.gt(1500).groupby(groups).cumcount()

#    Price  Count
# 0   1000      0
# 1   1000      0
# 2   1000      0
# 3   2000      1
# 4   2000      2
# 5   2000      3
# 6   2000      4
# 7   1400      0
# 8   1400      0
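
As a quick sanity check, the grouped version can be compared against the original loop on data with several runs. This is only a sketch: `loop_count`, `grouped_count`, and the `prices` list are hypothetical names and values, and the loop is rewritten over a plain list under the assumption that Count starts at all zeros.

```python
import pandas as pd

def loop_count(price):
    """Reference: the question's loop, written over a plain Python list."""
    counts = [0] * len(price)
    for i in range(1, len(price)):  # row 0 is never updated, as in the loop
        if price[i] > 1500:
            counts[i] = counts[i - 1] + 1
    return counts

def grouped_count(price: pd.Series, threshold: float = 1500) -> pd.Series:
    # Each row at or below the threshold starts a new pseudo-group.
    groups = price.le(threshold).cumsum()
    return price.gt(threshold).groupby(groups).cumcount()

# Hypothetical data: a run starting at row 0, then two more runs.
prices = [2000, 2000, 1000, 2000, 1400, 2000, 2000, 2000]
assert grouped_count(pd.Series(prices)).tolist() == loop_count(prices)
```

On this input both versions agree, including for the run that begins on the first row.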