Subtract with value in previous row to create a new column by subject-CodePudding

Using python and this data set https://raw.githubusercontent.com/yadatree/AL/main/AK4.csv I would like to create a new column for each subject, that starts with 0 (in the first row) and then subtracts the SCALE value from row 2 from row 1, then row 3 from row 2, row 4 from row 3, etc.

However, if this produces a negative value, then to give the output of 0.

Edit: Thank you for the response. That worked perfectly. The only remaining issue is that I'd like to start again with each subject (SUBJECT column). The number of values for each subject is not fixed thus something that checks the SUBJECT column and then starts again from 0 would be ideal. screenshot

CodePudding user response：

You can use .shift(1) create new column with values moved from previous rows - and then you will have both values in the same row and you can substract columns.

And later you can selecte all negative results and assign zero

import pandas as pd

data = {
    'A': [1, 3, 2, 5, 1], 
}

df = pd.DataFrame(data)

df['previous'] = df['A'].shift(1)
df['result'] = df['A'] - df['previous']
print(df)

#df['result'] = df['A'] - df['A'].shift(1)
#print(df)

df.loc[ df['result'] < 0 , 'result'] = 0

print(df)

Result:

   A  previous  result
0  1       NaN     NaN
1  3       1.0     2.0
2  2       3.0    -1.0
3  5       2.0     3.0
4  1       5.0    -4.0

   A  previous  result
0  1       NaN     NaN
1  3       1.0     2.0
2  2       3.0     0.0
3  5       2.0     3.0
4  1       5.0     0.0

EDIT:

If you use df['result'] = df['A'] - df['A'].shift(1) then you get column result without creating column previous.

And if you use .shift(1, fill_value=0) then it will put 0 instead of NaN in first row.

EDIT:

You can use groupy("SUBJECT") to group by subject and later in every group you can put 0 in first row.

import pandas as pd

data = {
    'S': ['A', 'A', 'A', 'B', 'B', 'B'],
    'A': [1, 3, 2, 1, 5, 1], 
}

df = pd.DataFrame(data)

df['result'] = df['A'] - df['A'].shift(1, fill_value=0)
print(df)

df.loc[ df['result'] < 0 , 'result'] = 0
print(df)

all_groups = df.groupby('S')
first_index = all_groups.apply(lambda grp: grp.index[0])
df.loc[first_index, 'result'] = 0

print(df)

Results:

   S  A  result
0  A  1       1
1  A  3       2
2  A  2      -1
3  B  1      -1
4  B  5       4
5  B  1      -4

   S  A  result
0  A  1       1
1  A  3       2
2  A  2       0
3  B  1       0
4  B  5       4
5  B  1       0

   S  A  result
0  A  1       0
1  A  3       2
2  A  2       0
3  B  1       0
4  B  5       4
5  B  1       0