I am looking for an efficient way to turn this pandas dataframe:
A B C
0 0 1 0
1 0 1 1
2 1 1 1
3 1 1 0
4 0 0 1
into
A B C
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
4 0 0 1
I only want a "1" in a cell where the value in the original dataframe jumps from "0" to "1". In the first row, I want a "1" if the starting value is "1". I have to run this operation often in my project, and on a large dataframe, so it should be as efficient as possible. Thanks in advance!
CodePudding user response:
You can use:
df.diff().clip(lower=0).fillna(df).astype(int)
output:
A B C
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
4 0 0 1
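To see why the chain works, it can be split into its individual steps. This is only a sketch, assuming `df` is the frame from the question; the intermediate names `delta`, `rises` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, 1, 0],
                   "B": [1, 1, 1, 1, 0],
                   "C": [0, 1, 1, 0, 1]})

# Row-wise difference: +1 on a 0 -> 1 jump, -1 on a 1 -> 0 drop,
# 0 when nothing changes, NaN in the first row (no predecessor)
delta = df.diff()

# Clip negative values to 0 so only the upward jumps remain
rises = delta.clip(lower=0)

# Fill the NaN first row from the original frame, then cast the
# float result (caused by the NaN) back to integers
result = rises.fillna(df).astype(int)
print(result)
```

The cast at the end matters only for the dtype: without it the values are the same but are shown as floats.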
CodePudding user response:
This code snippet should do exactly what you need:
import pandas as pd
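df = pd.DataFrame({'A':[0,0,1,1,0],'B':[1,1,1,1,0],'C':[0,1,1,0,1]})
# prepend a row of zeros so the first row has something to be compared against
df.loc[-1] = len(df.columns)*[0]
df.index = df.index + 1
df.sort_index(inplace=True)
# a difference of exactly 1 marks a jump from 0 to 1
df = (df.diff() == 1)
df = df.astype(int)
# drop the helper row again
df = df.iloc[1:]
print(df)
Output:
A B C
1 0 1 0
2 0 0 1
3 1 0 0
4 0 0 0
5 0 0 1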
I am not sure, however, if this is efficient enough for you.
CodePudding user response:
Simple and efficient: shift the dataframe and check for the change from 0 -> 1.
m1 = df == 1
m2 = df.shift(fill_value=0) == 0
(m1 & m2) * 1
A B C
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
4 0 0 1
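For repeated use, the two masks can be wrapped in a small function. The following is a self-contained sketch of the same shift-and-compare idea; the function name `rising_edges` is hypothetical, and the frame is assumed to contain only 0 and 1.

```python
import pandas as pd

def rising_edges(df: pd.DataFrame) -> pd.DataFrame:
    """Return 1 where a cell is 1 and the cell above it is 0 (or absent)."""
    is_one = df.eq(1)
    # shift moves every row down by one; the first row is filled with 0,
    # so a leading 1 counts as a jump
    prev_is_zero = df.shift(fill_value=0).eq(0)
    return (is_one & prev_is_zero).astype(int)

df = pd.DataFrame({"A": [0, 0, 1, 1, 0],
                   "B": [1, 1, 1, 1, 0],
                   "C": [0, 1, 1, 0, 1]})
edges = rising_edges(df)
print(edges)
```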
CodePudding user response:
Another possible solution:
df1 = df.shift(fill_value=0)
1 * (df1.ne(df) & df1.ne(1))
Output:
A B C
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
4 0 0 1
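Since the question asks about large dataframes, the shift-based comparison can also be done directly on the underlying NumPy array. This is a sketch and has not been benchmarked here; it assumes all columns share one integer dtype and contain only 0 and 1, and the function name `rising_edges_np` is made up for illustration.

```python
import numpy as np
import pandas as pd

def rising_edges_np(df: pd.DataFrame) -> pd.DataFrame:
    """NumPy variant: 1 where a cell is 1 and the cell above it is 0."""
    a = df.to_numpy()
    # build the "previous row" array by hand: first row is all zeros,
    # every other row is the row above it
    prev = np.zeros_like(a)
    prev[1:] = a[:-1]
    out = ((a == 1) & (prev == 0)).astype(np.int8)
    return pd.DataFrame(out, index=df.index, columns=df.columns)

df = pd.DataFrame({"A": [0, 0, 1, 1, 0],
                   "B": [1, 1, 1, 1, 0],
                   "C": [0, 1, 1, 0, 1]})
edges_np = rising_edges_np(df)
print(edges_np)
```

Whether this is actually faster depends on the frame's shape and dtypes, so it is worth timing against the pandas versions on real data (for example with `timeit`).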