I have a dataframe of dummy variables, and I want to find the rows in which a 1 has a 0 somewhere to its right within the same row. Here is an example dataframe, along with the rows I would like returned. I want to end up with the rows at index 1, 2, and 4, because in each of them a 0 comes after a 1. For the record, it is a big dataframe, so hardcoding the date columns would be too cumbersome:
test_df = pd.DataFrame({'2018-02': [0, 0, 0, 0, 1], '2018-03': [0, 1, 1, 0, 0], '2018-04': [1, 1, 1, 1, 0], '2018-05': [1, 0, 0, 1, 0], '2018-06': [1, 1, 1, 1, 0]})
result_df = pd.DataFrame({'2018-02': [0, 0, 1], '2018-03': [1, 1, 0], '2018-04': [1, 1, 0], '2018-05': [0, 0, 0], '2018-06': [1, 1, 0]})
attempted code:
pd.DataFrame({'2018-02': [0, 0, 0, 0, 1], '2018-03': [0, 1, 1, 0, 0], '2018-04': [1, 1, 1, 1, 0], '2018-05': [1, 0, 0, 1, 0], '2018-06': [1, 1, 1, 0, 0]})
test_df[[(test_df == 1) & (df_duration.shift(-1, axis=0) == 0)]].head()
CodePudding user response:
One way using pandas.DataFrame.shift:
res = test_df[(test_df.eq(0) & test_df.shift(axis=1).eq(1)).any(axis=1)]
Output:
2018-02 2018-03 2018-04 2018-05 2018-06
1 0 1 1 0 1
2 0 1 1 0 1
4 1 0 0 0 0
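It flags every cell that is 0 and whose left-hand neighbour (the value shifted in from the previous column) is 1, i.e., a (1, 0) pair when the row is read left to right. Any row with at least one such cell is kept.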
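For anyone who wants to run this end to end, here is a minimal sketch on the question's test_df. The intermediate names prev and mask are only illustrative, and it assumes the columns are already in chronological order.

```python
import pandas as pd

test_df = pd.DataFrame({
    '2018-02': [0, 0, 0, 0, 1],
    '2018-03': [0, 1, 1, 0, 0],
    '2018-04': [1, 1, 1, 1, 0],
    '2018-05': [1, 0, 0, 1, 0],
    '2018-06': [1, 1, 1, 1, 0],
})

# shift(axis=1) moves every value one column to the right, so each cell
# lines up with the value that was in the column to its left.
prev = test_df.shift(axis=1)

# True where the cell is 0 and the cell to its left was 1.
mask = test_df.eq(0) & prev.eq(1)

# Keep the rows that contain at least one such (1, 0) pair.
res = test_df[mask.any(axis=1)]
print(res.index.tolist())
```

Splitting the one-liner into prev and mask makes it easy to inspect which cells triggered the match.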
CodePudding user response:
Chris' answer is probably the best one, but here's a different approach that might be interesting:
If each row only contains 0s and 1s, then the only way for a 0 to NOT occur after a 1 is for the row to be a run of consecutive 0s followed by a run of consecutive 1s (either run may be empty). We can check for this by taking the row-wise difference. For [0,0,0,...,1,1,1] the difference is all 0s, with a single 1 where the change occurs.
So a row is fine when its row-wise difference contains only 0s and 1s. In any other situation, a 1 has to change back to 0 somewhere, which puts a -1 into the differences for that row.
So this expression will flag all rows that are consecutive 0s followed by consecutive 1s: test_df.diff(axis=1).dropna(axis=1).apply(lambda x: x.isin([0, 1]).all(), axis=1)
Then all we need to do is negate this and select the remaining rows using .loc:
test_df.loc[~test_df.diff(axis=1).dropna(axis=1).apply(lambda x: x.isin([0, 1]).all(), axis=1)]
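Here is a runnable sketch of this idea on the question's data. Note that it tests whether each row's differences contain only 0 and 1, rather than comparing the unique values against [0, 1] exactly. That is an adjustment I am assuming is wanted, so that all-0 or all-1 rows and the order of the unique values do not affect the result. The names diffs and never_drops are illustrative.

```python
import pandas as pd

test_df = pd.DataFrame({
    '2018-02': [0, 0, 0, 0, 1],
    '2018-03': [0, 1, 1, 0, 0],
    '2018-04': [1, 1, 1, 1, 0],
    '2018-05': [1, 0, 0, 1, 0],
    '2018-06': [1, 1, 1, 1, 0],
})

# Row-wise difference; the first column has no left neighbour and is
# all NaN, so dropna(axis=1) removes it.
diffs = test_df.diff(axis=1).dropna(axis=1)

# A row never goes from 1 back to 0 when its differences are only 0 or 1.
never_drops = diffs.isin([0, 1]).all(axis=1)

# Negate to get the rows where a 0 does come after a 1.
res = test_df.loc[~never_drops]
print(res.index.tolist())
```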
CodePudding user response:
You could take the row-wise diff against the next column (periods=-1), which subtracts the value to the right from each cell. A 1 followed by a 0 then has a diff of 1:
test_df[test_df.diff(-1, axis=1).eq(1).any(axis=1)]
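To see why this works, the sketch below runs the same expression on the question's test_df and keeps the intermediate frame around for inspection. The name next_diff is only illustrative.

```python
import pandas as pd

test_df = pd.DataFrame({
    '2018-02': [0, 0, 0, 0, 1],
    '2018-03': [0, 1, 1, 0, 0],
    '2018-04': [1, 1, 1, 1, 0],
    '2018-05': [1, 0, 0, 1, 0],
    '2018-06': [1, 1, 1, 1, 0],
})

# With periods=-1 each cell becomes (its value) - (value one column to
# the right). The last column has no right neighbour and becomes NaN.
next_diff = test_df.diff(-1, axis=1)

# 1 - 0 == 1 marks a 1 that is immediately followed by a 0.
res = test_df[next_diff.eq(1).any(axis=1)]
print(res.index.tolist())
```

Because the data is binary, a 0 anywhere to the right of a 1 implies that some adjacent pair is (1, 0), so checking neighbouring columns is enough.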