I have a table like the following, and I want to fill a new column based on conditions from other columns. In this case:
- Having the same value in Cond1
- Amount being different from zero
- Being from an earlier month
The column "CountPreviousMonth" is the one I need to fill in:
Cond1 | Amount | Month | CountPreviousMonth |
---|---|---|---|
a | 10 | 1 | 0 |
a | 20 | 2 | 1 |
a | 15 | 3 | 2 |
b | 10 | 1 | 0 |
b | 0 | 2 | 1 |
b | 15 | 3 | 1 |
c | 5 | 1 | 0 |
c | 25 | 2 | 1 |
c | 15 | 3 | 2 |
When Month is 1, the count is zero because it is the first month. For Cond1 = b, the count stays at 1 in month 3 because the Amount in month 2 was zero.
In Excel I used COUNTIFS, but I would like to do this in Python. I could use a for loop, but the real table has many rows, so that would not be efficient. Is there a better way to calculate it?
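For reference, the sample table can be rebuilt as a DataFrame, together with a plain loop that mirrors the COUNTIFS logic: for each row, count rows with the same Cond1, an earlier Month and a non-zero Amount. This is a sketch for checking results on small data (the helper name `count_previous` is hypothetical), not the efficient approach being asked for:

```python
import pandas as pd

# Sample data from the table above
df = pd.DataFrame({
    'Cond1': list('aaabbbccc'),
    'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15],
    'Month': [1, 2, 3] * 3,
})

# COUNTIFS-style reference (hypothetical helper): for each row, count rows
# with the same Cond1, an earlier Month and a non-zero Amount.
# Each row scans the whole frame, so this is quadratic in the number of rows.
def count_previous(df):
    counts = []
    for cond, month in zip(df['Cond1'], df['Month']):
        mask = (df['Cond1'] == cond) & (df['Month'] < month) & (df['Amount'] != 0)
        counts.append(int(mask.sum()))
    return counts

df['CountPreviousMonth'] = count_previous(df)
print(df)
```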
Answer:
First replace Month with missing values where Amount is 0, then use a custom lambda function with Series.shift per Cond1 group and forward-fill the missing values:
# Hide Month where Amount is 0, then shift within each Cond1 group and forward-fill the gaps
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
df['count'] = df['Month'].mask(df['Amount'].eq(0)).groupby(df['Cond1']).transform(f)
print(df)
Cond1 Amount Month CountPreviousMonth count
0 a 10 1 0 0
1 a 20 2 1 1
2 a 15 3 2 2
3 b 10 1 0 0
4 b 0 2 1 1
5 b 15 3 1 1
6 c 5 1 0 0
7 c 25 2 1 1
8 c 15 3 2 2
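Strictly speaking, this approach returns the Month of the most recent earlier row with a non-zero Amount. That matches the count on the sample data, but not once a non-zero month follows a zero month and further months come after it. The sketch below adds a hypothetical fourth month for b to show the difference, and counts directly instead: a per-group running total of non-zero flags minus the current row's own flag. It assumes rows are ordered by Month within each Cond1.

```python
import pandas as pd

# Sample data plus one hypothetical extra row (b, month 4) to expose the difference
df = pd.DataFrame({
    'Cond1': list('aaabbbcccb'),
    'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15, 10],
    'Month': [1, 2, 3, 1, 2, 3, 1, 2, 3, 4],
})

# Approach above: Month of the previous non-zero row in the group
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
df['last_month'] = df['Month'].mask(df['Amount'].eq(0)).groupby(df['Cond1']).transform(f)

# Direct count: running total of non-zero flags per group, minus the current row's flag
nonzero = df['Amount'].ne(0).astype(int)
df['count'] = nonzero.groupby(df['Cond1']).cumsum() - nonzero
print(df)
```

For the extra row, `last_month` is 3 while `count` is 2 (only months 1 and 3 had a non-zero Amount). The subtraction avoids a Python-level lambda, so the whole count stays vectorized.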
Answer:
Shift Amount down by one row and check which values are not equal to 0:
arr = df.Amount.shift().ne(0)
Get a boolean mask of the rows where Month is 1:
repl = df.Month.eq(1)
Use repl to set arr to True for the first month of each group, because the shifted value there comes from the previous group:
arr[repl] = True
Group by Cond1, take the cumulative sum, and finally subtract 1 so that every group starts at 0:
df.assign(CountPreviousMonth = arr.groupby(df.Cond1).cumsum().sub(1))
Cond1 Amount Month CountPreviousMonth
0 a 10 1 0
1 a 20 2 1
2 a 15 3 2
3 b 10 1 0
4 b 0 2 1
5 b 15 3 1
6 c 5 1 0
7 c 25 2 1
8 c 15 3 2
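To see what each step produces, here is the same chain on the sample data with the intermediate Series collected side by side. It assumes, as the answer does, that rows are sorted by Month within each Cond1 and that every group starts at Month 1. The bool flags are cast to int before the cumulative sum to keep the dtype explicit.

```python
import pandas as pd

df = pd.DataFrame({
    'Cond1': list('aaabbbccc'),
    'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15],
    'Month': [1, 2, 3] * 3,
})

arr = df.Amount.shift().ne(0)   # previous row's Amount is non-zero (crosses group boundaries)
repl = df.Month.eq(1)           # first month of each group
fixed = arr.copy()
fixed[repl] = True              # group starts always count as True

steps = pd.DataFrame({
    'prev_nonzero': arr,
    'first_month': repl,
    'fixed': fixed,
    # running total per Cond1 group
    'cumsum': fixed.astype(int).groupby(df.Cond1).cumsum(),
})
steps['CountPreviousMonth'] = steps['cumsum'] - 1
print(pd.concat([df, steps], axis=1))
```

On this sample the guard changes nothing, since no group follows a row with a zero Amount. It matters when the last Amount of one group is 0: without it, the next group's first row would be False and the count would start at -1.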