Fill Pandas dataframe based on conditions from same row values (based on Excel COUNTIFS formula)

Time:09-17

I have a table like the following, and I want to fill a new column based on conditions from other columns. In this case:

  • Having the same value in Cond1
  • Amount being different than zero
  • Being from previous months

The column "CountPreviousMonth" is the one I need to fill in:

Cond1  Amount  Month  CountPreviousMonth
a      10      1      0
a      20      2      1
a      15      3      2
b      10      1      0
b      0       2      1
b      15      3      1
c      5       1      0
c      25      2      1
c      15      3      2

When Month is 1 the count is zero because it is the first month. For Cond1=b the count stays at 1 in Month 3 because the Amount in Month 2 was zero.

In Excel I used COUNTIFS, but I would like to do this in Python. I could use a for loop, but the real table has many rows, so that would not be efficient. Is there a better way to calculate it?
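For reference, the three conditions can be written as a plain row-by-row loop. This is only a minimal sketch of the slow baseline the question wants to avoid; the helper name `countifs_loop` is made up for illustration, and the DataFrame is rebuilt from the sample table above.

```python
import pandas as pd

# Sample data from the question, without the target column.
df = pd.DataFrame({
    "Cond1": list("aaabbbccc"),
    "Amount": [10, 20, 15, 10, 0, 15, 5, 25, 15],
    "Month": [1, 2, 3, 1, 2, 3, 1, 2, 3],
})

def countifs_loop(frame):
    """Hypothetical helper: row-by-row equivalent of the COUNTIFS logic."""
    counts = []
    for row in frame.itertuples(index=False):
        matches = frame[
            (frame["Cond1"] == row.Cond1)   # same value in Cond1
            & (frame["Amount"] != 0)        # Amount different from zero
            & (frame["Month"] < row.Month)  # strictly earlier month
        ]
        counts.append(len(matches))
    return counts

df["CountPreviousMonth"] = countifs_loop(df)
print(df)
```

Each row triggers a full scan of the table, so the cost grows quadratically with the number of rows, which is why a vectorized approach is preferable for large tables.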

CodePudding user response:

First replace Month with missing values where Amount is 0, then use a custom lambda function with Series.shift and forward-fill the missing values:

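# Shift each group down by one row (0 for the first row), then carry the last valid value forward
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
# Blank out Month where Amount is 0, then apply f within each Cond1 group
df['count'] = df['Month'].mask(df['Amount'].eq(0)).groupby(df['Cond1']).transform(f)
print(df)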
  Cond1  Amount  Month  CountPreviousMonth  count
0     a      10      1                   0      0
1     a      20      2                   1      1
2     a      15      3                   2      2
3     b      10      1                   0      0
4     b       0      2                   1      1
5     b      15      3                   1      1
6     c       5      1                   0      0
7     c      25      2                   1      1
8     c      15      3                   2      2
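One caveat: this approach returns the number of the last earlier month with a non-zero Amount, which matches the count on the sample data but can differ when a group starts with a zero Amount. The sketch below compares it with a count-based variant (a running count of non-zero amounts minus the current row). The data here is made up for illustration, and the variant assumes the rows are sorted by Month within each Cond1 group with one row per month.

```python
import pandas as pd

# Made-up data: group b starts with a zero Amount.
df = pd.DataFrame({
    "Cond1": list("aaabbb"),
    "Amount": [10, 20, 15, 0, 10, 15],
    "Month": [1, 2, 3, 1, 2, 3],
})

# Shift/ffill approach from above: yields the last earlier non-zero Month number.
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
shift_ffill = df["Month"].mask(df["Amount"].eq(0)).groupby(df["Cond1"]).transform(f)

# Count-based variant: 1 where Amount is non-zero, 0 otherwise.
nonzero = df["Amount"].ne(0).astype(int)
# Running count per group, minus the current row's own contribution,
# leaves the number of earlier rows with a non-zero Amount.
df["count"] = nonzero.groupby(df["Cond1"]).cumsum() - nonzero

print(df.assign(shift_ffill=shift_ffill))
```

On this data the two columns agree everywhere except the last row of group b, where only one earlier month has a non-zero Amount.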

CodePudding user response:

Shift Amount down by one row and check which values are not equal to 0:

arr = df.Amount.shift().ne(0)

Get a boolean mask where Month is 1:

repl = df.Month.eq(1)

Index arr with repl to set the first month of each group to True:

arr[repl] = True

Group by Cond1, run a cumulative sum, and finally subtract 1 so that every group starts at 0:

df.assign(CountPreviousMonth = arr.groupby(df.Cond1).cumsum().sub(1))
  Cond1  Amount  Month  CountPreviousMonth
0     a      10      1                   0
1     a      20      2                   1
2     a      15      3                   2
3     b      10      1                   0
4     b       0      2                   1
5     b      15      3                   1
6     c       5      1                   0
7     c      25      2                   1
8     c      15      3                   2
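The steps above can be put together as one runnable sketch. It rebuilds the sample table and assumes, as the steps do, that the rows are sorted by Month within each Cond1 group and that every group starts at Month 1. The explicit `astype(int)` is an addition here to make the summed type unambiguous.

```python
import pandas as pd

# Sample data from the question, without the target column.
df = pd.DataFrame({
    "Cond1": list("aaabbbccc"),
    "Amount": [10, 20, 15, 10, 0, 15, 5, 25, 15],
    "Month": [1, 2, 3, 1, 2, 3, 1, 2, 3],
})

# True where the row above has a non-zero Amount (the first row compares NaN with 0, so True).
arr = df["Amount"].shift().ne(0)

# Rows that open a group: the row above belongs to another group, so force True.
repl = df["Month"].eq(1)
arr[repl] = True

# Cumulative sum per group, minus 1 so that each group starts at 0.
result = df.assign(
    CountPreviousMonth=arr.astype(int).groupby(df["Cond1"]).cumsum().sub(1)
)
print(result)
```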