I have a dataframe which looks like this:
position parent dataType value
1 1 0 data1 7x13124
2 2 1 data2 x21312
3 3 2 data3 x312
4 4 2 data3 x321r
5 5 2 data3 x324
6 6 2 data3 xg4352
7 7 2 data3 x2312
8 8 2 data3 x2131
9 9 2 data3 x31231
10 10 2 data3 x3x3412
12 1 0 data1 432-x424
13 2 0 data2 x42342-0
14 3 2 data4 423
15 4 3 data3 x4234
I need to add an extra column that tracks data3. The first time data3 appears in a block of consecutive data3 values in the dataType column, the new column should have the value 'yes'; every other row should have 'no'. For example, if dataType is 'data3 data3 data2 data2 data3', the new column would be 'yes no no no yes'. The resulting dataframe, with the tracking column trackData3, should look like this:
position parent dataType value trackData3
1 1 0 data1 7x13124 no
2 2 1 data2 x21312 no
3 3 2 data3 x312 yes
4 4 2 data3 x321r no
5 5 2 data3 x324 no
6 6 2 data3 xg4352 no
7 7 2 data3 x2312 no
8 8 2 data3 x2131 no
9 9 2 data3 x31231 no
10 10 2 data3 x3x3412 no
12 1 0 data1 432-x424 no
13 2 0 data2 x42342-0 no
14 3 2 data4 423 no
15 4 3 data3 x4234 yes
Answer:
If you need 'yes' for the first value of each consecutive run of data3, use numpy.where with two chained masks: one that compares dataType to 'data3', and one that finds the first value of each run by comparing the column with its shifted copy:
import numpy as np

# True only where the row is data3 and the previous row has a different dataType
mask = df['dataType'].eq('data3') & df['dataType'].ne(df['dataType'].shift())
df['trackData3'] = np.where(mask, 'yes', 'no')
print(df)
position parent dataType value trackData3
1 1 0 data1 7x13124 no
2 2 1 data2 x21312 no
3 3 2 data3 x312 yes
4 4 2 data3 x321r no
5 5 2 data3 x324 no
6 6 2 data3 xg4352 no
7 7 2 data3 x2312 no
8 8 2 data3 x2131 no
9 9 2 data3 x31231 no
10 10 2 data3 x3x3412 no
12 1 0 data1 432-x424 no
13 2 0 data2 x42342-0 no
14 3 2 data4 423 no
15 4 3 data3 x4234 yes
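To check the approach against the short example from the question ('data3 data3 data2 data2 data3'), here is a minimal self-contained sketch. The helper name track_first_in_run and the sample frame are hypothetical and only serve the illustration; the mask logic is the same as above.

```python
import numpy as np
import pandas as pd


def track_first_in_run(types: pd.Series, target: str) -> np.ndarray:
    """Hypothetical helper: 'yes' on the first row of each run of `target`, else 'no'."""
    is_target = types.eq(target)            # row holds the tracked value
    starts_run = types.ne(types.shift())    # row differs from the row before it
    return np.where(is_target & starts_run, 'yes', 'no')


# Sample sequence taken from the question text
sample = pd.DataFrame({'dataType': ['data3', 'data3', 'data2', 'data2', 'data3']})
sample['trackData3'] = track_first_in_run(sample['dataType'], 'data3')
print(sample['trackData3'].tolist())
# ['yes', 'no', 'no', 'no', 'yes']
```

Wrapping the mask in a function makes it easy to track a different value (for example 'data2') without repeating the expression.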
How it works:
print(df.assign(data3=df['dataType'].eq('data3'),
                consecutive=df['dataType'].ne(df['dataType'].shift()),
                both=mask))
position parent dataType value trackData3 data3 consecutive both
1 1 0 data1 7x13124 no False True False
2 2 1 data2 x21312 no False True False
3 3 2 data3 x312 yes True True True
4 4 2 data3 x321r no True False False
5 5 2 data3 x324 no True False False
6 6 2 data3 xg4352 no True False False
7 7 2 data3 x2312 no True False False
8 8 2 data3 x2131 no True False False
9 9 2 data3 x31231 no True False False
10 10 2 data3 x3x3412 no True False False
12 1 0 data1 432-x424 no False True False
13 2 0 data2 x42342-0 no False True False
14 3 2 data4 423 no False True False
15 4 3 data3 x4234 yes True True True
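One detail the breakdown above does not exercise is a frame whose very first row is already data3. The sketch below uses a made-up four-element Series and assumes the default object dtype for strings. shift() puts NaN in the first position, and NaN compares unequal to any string, so the first row counts as the start of a run.

```python
import pandas as pd

# Made-up sequence that starts with the tracked value
types = pd.Series(['data3', 'data3', 'data1', 'data3'])

previous = types.shift()            # first element becomes NaN
starts_run = types.ne(previous)     # NaN != 'data3', so row 0 is True
mask = types.eq('data3') & starts_run

print(mask.tolist())
# [True, False, False, True]
```

So no special handling is needed for the first row: it is marked 'yes' whenever it holds data3.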