I have put together a conditional statement using the duplicate function to extract a row from the following data. That is extracted from a larger dataset
{'NID': {104565: '213003580',
104566: '213003580',
104567: '213003580',
104568: '213003580',
104569: '213003580',
104570: '213003580',
104571: '213003580',
104572: '213003580',
104573: '213003580',
104574: '213003580'},
'Fdat': {104565: Timestamp('2015-02-24 00:00:00'),
104566: Timestamp('2015-02-24 00:00:00'),
104567: Timestamp('2016-03-20 00:00:00'),
104568: Timestamp('2016-03-20 00:00:00'),
104569: Timestamp('2017-03-13 00:00:00'),
104570: Timestamp('2018-03-05 00:00:00'),
104571: Timestamp('2018-03-05 00:00:00'),
104572: Timestamp('2018-03-05 00:00:00'),
104573: Timestamp('2018-03-05 00:00:00'),
104574: Timestamp('2018-03-05 00:00:00')},
'BCode': {104565: 0,
104566: 1,
104567: 0,
104568: 1,
104569: 0,
104570: 0,
104571: 0,
104572: 0,
104573: 1,
104574: 1},
'Breddat': {104565: Timestamp('2015-06-09 00:00:00'),
104566: Timestamp('2015-08-19 00:00:00'),
104567: Timestamp('2016-06-05 00:00:00'),
104568: Timestamp('2016-08-23 00:00:00'),
104569: Timestamp('2017-05-31 00:00:00'),
104570: Timestamp('2018-06-06 00:00:00'),
104571: Timestamp('2018-06-27 00:00:00'),
104572: Timestamp('2018-11-22 00:00:00'),
104573: Timestamp('2019-02-14 00:00:00'),
104574: Timestamp('2019-06-18 00:00:00')},
'Code': {104565: 2,
104566: 10,
104567: 2,
104568: 10,
104569: 2,
104570: 2,
104571: 2,
104572: 2,
104573: 10,
104574: 10},
'Result': {104565: <NA>,
104566: 71,
104567: <NA>,
104568: 79,
104569: <NA>,
104570: <NA>,
104571: <NA>,
104572: <NA>,
104573: 84,
104574: 0},
'DIM': {104565: 105,
104566: 176,
104567: 77,
104568: 156,
104569: 79,
104570: 93,
104571: 114,
104572: 262,
104573: 346,
104574: 470},
'AbCode': {104565: 0,
104566: 0,
104567: 0,
104568: 0,
104569: 0,
104570: 0,
104571: 0,
104572: 0,
104573: 0,
104574: 0}}
Formatted the data looks like
NID Fdat BCode Breddat Code Result DIM Result AbCode
104565 213003580 2015-02-24 0 2015-06-09 2 <NA> 105 <NA> 0
104566 213003580 2015-02-24 1 2015-08-19 10 71 176 71 0
104567 213003580 2016-03-20 0 2016-06-05 2 <NA> 77 <NA> 0
104568 213003580 2016-03-20 1 2016-08-23 10 79 156 79 0
104569 213003580 2017-03-13 0 2017-05-31 2 <NA> 79 <NA> 0
104570 213003580 2018-03-05 0 2018-06-06 2 <NA> 93 <NA> 0
104571 213003580 2018-03-05 0 2018-06-27 2 <NA> 114 <NA> 0
104572 213003580 2018-03-05 0 2018-11-22 2 <NA> 262 <NA> 0
104573 213003580 2018-03-05 1 2019-02-14 10 84 346 84 0
104574 213003580 2018-03-05 1 2019-06-18 10 0 470 0 0
The code to extract the row I want is as follows
out1 = df[((df.duplicated(['NID','Fdat','BCode'], keep='last') & (df['BCode'].eq(1)) & (df['Result'].ge(0))) )].copy()
This returns the second last line in this example and all others I need in the complete dataset
104573 213003580 2018-03-05 1 2019-02-14 10 84 346 84 0
The significance of this line is that the cow aborted and I would like to toggle column AbCode from 0 to 1
I have tried the following unsuccessfully
df['AbCode'] = np.where([((df.duplicated(['NID','Fdat','BCode'], keep='last') & (df['BCode'].eq(1)) & (df['Result'].ge(0))) )], 1,0)
This gives an error
ValueError: Length of values (1) does not match length of index (121627)
Is there a simple way to toggle the value of AbCode according to these conditions?
CodePudding user response:
numpy.where() accepts an array_like boolean condition. Result of (df.duplicated(['NID','Fdat','BCode'], keep='last') & (df['BCode'].eq(1)) & (df['Result'].ge(0)))
is already a list, you don't need to wrap it into another list.
df['AbCode'] = np.where((df.duplicated(['NID','Fdat','BCode'], keep='last') & (df['BCode'].eq(1)) & (df['Result'].ge(0))) , 1, 0)