I have a csv file with a column (CODE REF) which stores two type of values (1A and 1X); I am importing it in a Pandas Dataframe. The CODE REF may not always start with 1X but I can find the index of the first row starting with 1X and slice my Dataframe accordingly. I am struggling with the main task which is to filter only rows that follow the sequence 1X1A as shown in picture. I am starting off with the shift method.
Input:
CODE REF
0 1X
1 1A
2 1A
3 1X
4 1A
5 1A
6 1A
7 1X
8 1A
Expected output:
CODE REF
0 1X
1 1A
3 1X
4 1A
7 1X
8 1A
CodePudding user response:
You can create a mask to check if a given row is 1X and the next one is 1A, then slice using the mask or its shift:
m = df['CODE REF'].eq('1X')&df['CODE REF'].shift(-1).eq('1A')
df[m|m.shift()]
Output:
CODE REF
0 1X
1 1A
3 1X
4 1A
7 1X
8 1A