I have a dataframe of ints:
mydf = pd.DataFrame([[0,0,0,1,0,2,2,5,2,4],
                     [0,1,0,0,2,2,4,5,3,3],
                     [1,1,1,1,2,2,0,4,4,4]])
I'd like to calculate something that resembles the gradient given by pd.Series.diff() for each row, but with one big change: my ints represent categorical data, so I'm only interested in detecting a change, not its magnitude. So the step from 0 to 1 should count the same as the step from 0 to 4.
Is there a way for pandas to interpret my data as categorical in the data frame, and then calculate a Series.diff() on that? Or could you "flatten" the output of Series.diff() to be only 0s and 1s?
CodePudding user response:
If I understand you correctly, this is what you are trying to achieve:
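import pandas as pd

mydf = pd.DataFrame([[0,0,0,1,0,2,2,5,2,4],
                     [0,1,0,0,2,2,4,5,3,3],
                     [1,1,1,1,2,2,0,4,4,4]])
# Convert every column to categorical dtype (each column gets its own categories)
mydf = mydf.astype("category")
# For each row: difference to the previous value, then flag nonzero differences as 1
diff_df = mydf.apply(lambda x: x.diff().ne(0), axis=1).astype(int)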
ne(0) returns a boolean array that marks where the difference between consecutive values is not zero. astype(int) then converts the boolean values to integers (0s and 1s). The result is a dataframe with the same shape as the original, holding binary values that indicate a change in the categorical value from one step to the next. Note that the first column is always 1: diff() has no previous value there and returns NaN, and NaN compares as not equal to 0.
   0  1  2  3  4  5  6  7  8  9
0  1  0  0  1  1  1  0  1  1  1
1  1  1  1  0  1  0  1  1  1  0
2  1  0  0  0  1  0  1  1  0  0
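As a side note, here is a minimal sketch of the same idea without the row-wise apply, assuming the values stay plain ints. Since only "changed or not" matters, the conversion to category is not needed for this. The names step, changed and changed_first_zero are my own and not part of the answer above. The second variant is one possible way to treat the first column as "no change", if that is what you want.

```python
import pandas as pd

mydf = pd.DataFrame([[0, 0, 0, 1, 0, 2, 2, 5, 2, 4],
                     [0, 1, 0, 0, 2, 2, 4, 5, 3, 3],
                     [1, 1, 1, 1, 2, 2, 0, 4, 4, 4]])

# Difference to the previous column within each row.
# The first column has no predecessor, so it becomes NaN.
step = mydf.diff(axis=1)

# Flag every nonzero difference. NaN is "not equal to 0",
# so the first column comes out as 1, as in the apply version.
changed = step.ne(0).astype(int)

# Variant (an assumption about the desired result): fill the NaN with 0
# first, so the first column is reported as "no change".
changed_first_zero = step.fillna(0).ne(0).astype(int)

print(changed)
print(changed_first_zero)
```

Both versions only look at whether the difference is zero, so a step from 0 to 1 and a step from 0 to 4 are flagged identically.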