Return number of score increases by account

Time:08-11

I wrote a piece of code that counts how many times the value increases from one row to the next in a particular column (in this example, the column is Anxiety).

Counting code:

len([b-a for a,b in zip(df['Anxiety'],df['Anxiety'][1:]) if b>a])

Setup code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Account':[123,123,123,123,123,123,123,123,123,123,456,456,456,456],
                   'Anxiety':[0,1,np.nan,2,3,0,2,np.nan,np.nan,0,0,1,np.nan,3]})
df

However, there are two problems. First, the code doesn't distinguish between accounts. Second, it doesn't count properly when there is a null value between two values.
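Both problems can be reproduced without pandas. The following is a minimal sketch, using a plain list that mirrors the Anxiety column from the setup code (the name `naive_count` is made up for illustration). Any comparison with NaN is False, so a rise that spans a missing value (1, NaN, 2) is never counted, and because accounts are ignored the result is a single number for the whole column:

```python
nan = float("nan")

# Same values as the Anxiety column in the setup code (accounts ignored).
anxiety = [0, 1, nan, 2, 3, 0, 2, nan, nan, 0, 0, 1, nan, 3]

# Same logic as the counting code: compare each value with its successor.
naive_count = len([b - a for a, b in zip(anxiety, anxiety[1:]) if b > a])

# Comparisons involving NaN are always False, so rises across a gap are missed.
print(nan > 1, 2 > nan)  # False False
print(naive_count)       # 4, for both accounts combined
```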

Expected output would be 4 for account 123, and 2 for account 456.
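These numbers follow if each value is compared with the most recent non-missing value in the same account. That rule is inferred from the expected output rather than stated explicitly, so the plain-Python sketch below (the function name `count_increases` is hypothetical) is only a way to check the counts by hand:

```python
import math
from collections import defaultdict

nan = float("nan")
accounts = [123] * 10 + [456] * 4
anxiety = [0, 1, nan, 2, 3, 0, 2, nan, nan, 0, 0, 1, nan, 3]

def count_increases(accounts, values):
    """Count rises per account, skipping missing values (assumed rule)."""
    last_seen = {}             # last non-missing value seen for each account
    counts = defaultdict(int)
    for acct, val in zip(accounts, values):
        counts[acct] += 0      # make sure every account appears in the result
        if math.isnan(val):
            continue           # a gap does not reset the comparison
        if acct in last_seen and val > last_seen[acct]:
            counts[acct] += 1
        last_seen[acct] = val
    return dict(counts)

print(count_increases(accounts, anxiety))  # {123: 4, 456: 2}
```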

CodePudding user response:

Here is one way to do it:

# Create a temporary column 'diff': drop the NaN rows, take the difference from
# the previous remaining row within each account, and flag positive differences.
# Then group by account and sum the flags. NaN rows get no flag, so the sums are floats.

df.assign(
    diff=(df[df['Anxiety'].notna()].groupby('Account')['Anxiety'].diff() > 0).astype(int)
).groupby('Account')['diff'].sum()

Account
123    4.0
456    2.0
Name: diff, dtype: float64

CodePudding user response:

Like:

out = df[df['Anxiety'].notna()].groupby('Account')['Anxiety'].apply(
    lambda x: x[x > x.shift()].size)

Output of print(out):

Account
123    4
456    2

CodePudding user response:

Try:

def n_incr(g):
    return (g.ffill().diff() > 0).sum()

>>> df.groupby('Account').agg(n_incr)
         Anxiety
Account         
123            4
456            2