Trying to see if a column equals another column

I have a dataframe of dummy variables, and I want to find the rows in which a 1 is followed by a 0 somewhere to its right within the same row. Below is an example dataframe, along with the rows I want returned. The result should contain the rows with index 1, 2, and 4, because in each of them a 0 comes after a 1. For the record, the real dataframe is large, so hardcoding the date columns would be too cumbersome:

test_df = pd.DataFrame({'2018-02': [0, 0, 0, 0, 1], '2018-03': [0, 1, 1, 0, 0], '2018-04': [1, 1, 1, 1, 0], '2018-05': [1, 0, 0, 1, 0], '2018-06': [1, 1, 1, 1, 0]})

result_df = pd.DataFrame({'2018-02': [0, 0, 1], '2018-03': [1, 1, 0], '2018-04': [1, 1, 0], '2018-05': [0, 0, 0], '2018-06': [1, 1, 0]})

Attempted code:

pd.DataFrame({'2018-02': [0, 0, 0, 0, 1], '2018-03': [0, 1, 1, 0, 0], '2018-04': [1, 1, 1, 1, 0], '2018-05': [1, 0, 0, 1, 0], '2018-06': [1, 1, 1, 0, 0]})

test_df[[(test_df == 1) & (df_duration.shift(-1, axis=0) == 0)]].head()

CodePudding user response：

One way using pandas.DataFrame.shift:

res = test_df[(test_df.eq(0) & test_df.shift(axis=1).eq(1)).any(axis=1)]

Output:

   2018-02  2018-03  2018-04  2018-05  2018-06
1        0        1        1        0        1
2        0        1        1        0        1
4        1        0        0        0        0

It looks for cells that are 0 and whose left-hand neighbor is 1, which is what shifting the columns one step to the right exposes (i.e., a (1, 0) pair when a row is read left to right).
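To see why this works, it may help to look at the intermediate mask. The following is a minimal sketch using the test_df from the question; the names prev and pair_mask are only illustrative and do not come from the answer.

```python
import pandas as pd

test_df = pd.DataFrame({
    '2018-02': [0, 0, 0, 0, 1],
    '2018-03': [0, 1, 1, 0, 0],
    '2018-04': [1, 1, 1, 1, 0],
    '2018-05': [1, 0, 0, 1, 0],
    '2018-06': [1, 1, 1, 1, 0],
})

# Each cell now holds the value one column to its left;
# the first column has no left neighbor and becomes NaN.
prev = test_df.shift(axis=1)

# True where the cell is 0 and its left neighbor is 1, i.e. a (1, 0) pair.
pair_mask = test_df.eq(0) & prev.eq(1)

# Keep the rows that contain at least one such pair.
res = test_df[pair_mask.any(axis=1)]
print(res.index.tolist())  # [1, 2, 4]
```

Because the data only contains 0s and 1s, a 0 anywhere to the right of a 1 implies that some adjacent (1, 0) pair exists, so checking neighbors is enough.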

CodePudding user response：

Chris' answer is probably the best one, but here's a different approach that might be interesting:

If each row only contains 0s and 1s, then the only way for a 0 to NOT occur after a 1 is for the row to be a run of consecutive 0s followed by a run of consecutive 1s (either run may be empty). We can check for this by taking the row-wise difference. For [0,0,0,...,1,1,1] the difference is all 0s, with a single 1 where the change occurs.

We can check for this by seeing that the unique values of the row-wise difference contain nothing other than 0 and 1. In any other situation, a 1 has to change back to a 0, so the differences of such a row include a -1.

So this expression (with numpy imported as np) flags all rows that are consecutive 0s followed by consecutive 1s: test_df.diff(axis=1).dropna(axis=1).apply(lambda x: np.isin(x.unique(), [0, 1]).all(), axis=1)

Then all we need to do is negate this and select such indices using .loc:

test_df.loc[~test_df.diff(axis=1).dropna(axis=1).apply(lambda x: np.isin(x.unique(), [0, 1]).all(), axis=1)]
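The same idea can be written out step by step. This is only a sketch: the helper name rows_with_one_then_zero is made up for illustration, and it tests membership with np.isin rather than an exact comparison against [0,1], on the assumption that the order of the unique values and constant rows (all 0s or all 1s) should not affect the result.

```python
import numpy as np
import pandas as pd

def rows_with_one_then_zero(df):
    """Return the rows of a 0/1 frame in which some 1 is followed by a 0."""
    # Step between neighboring columns; the first column has no left
    # neighbor, so it is all NaN and gets dropped.
    steps = df.diff(axis=1).dropna(axis=1)
    # A "0s then 1s" row only ever steps by 0 or +1, never by -1.
    zeros_then_ones = steps.apply(
        lambda row: np.isin(row.unique(), [0, 1]).all(), axis=1
    )
    return df.loc[~zeros_then_ones]

test_df = pd.DataFrame({
    '2018-02': [0, 0, 0, 0, 1],
    '2018-03': [0, 1, 1, 0, 0],
    '2018-04': [1, 1, 1, 1, 0],
    '2018-05': [1, 0, 0, 1, 0],
    '2018-06': [1, 1, 1, 1, 0],
})
res = rows_with_one_then_zero(test_df)
print(res.index.tolist())  # [1, 2, 4]

# Constant rows contain no (1, 0) pair, so they should not be selected.
edge_df = pd.DataFrame([[0, 0, 0], [1, 1, 1], [0, 1, 1], [1, 0, 1]])
print(rows_with_one_then_zero(edge_df).index.tolist())  # [3]
```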

CodePudding user response：

You could take the row-wise diff against the next column. With periods=-1, each value has its right-hand neighbor subtracted from it, so a 1 followed by a 0 gives a diff of 1:

test_df[test_df.diff(-1, axis=1).eq(1).any(axis=1)]
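As a quick check of what diff(-1, axis=1) produces, here is a small sketch on the test_df from the question; the intermediate name drops is only illustrative.

```python
import pandas as pd

test_df = pd.DataFrame({
    '2018-02': [0, 0, 0, 0, 1],
    '2018-03': [0, 1, 1, 0, 0],
    '2018-04': [1, 1, 1, 1, 0],
    '2018-05': [1, 0, 0, 1, 0],
    '2018-06': [1, 1, 1, 1, 0],
})

# Current column minus the next column; the last column has no
# right neighbor and becomes NaN.
drops = test_df.diff(-1, axis=1)

# Row 1 is [0, 1, 1, 0, 1], so its diffs are [-1, 0, 1, -1, NaN]:
# the 1 sits at '2018-04', where a 1 is followed by a 0.
print(drops.loc[1].tolist())

# Keep the rows that contain at least one such drop.
res = test_df[drops.eq(1).any(axis=1)]
print(res.index.tolist())  # [1, 2, 4]
```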