I have a DataFrame like:
value
0 yes
1 nan
2 no
3 nan
4 yes
5 no
6 yes
7 nan
8 nan
9 nan
There is no guarantee that the first non-NaN value (here, yes) is in the first row; it could just as well appear at a later index.
I need to check whether the first non-NaN string equals the last non-NaN string and, if so, set the last one to NaN.
Here, the value at index 6 equals the value at index 0, so index 6 must be set to NaN, resulting in:
value
0 yes
1 nan
2 no
3 nan
4 yes
5 no
6 nan  # set to NaN because it equals the first non-NaN value
7 nan
8 nan
9 nan
CodePudding user response:
Use Series.first_valid_index and Series.last_valid_index to get the indices of the first and last non-missing values, select the values with DataFrame.loc, then use an if-else expression to set the last value:
import numpy as np

first_idx = df['value'].first_valid_index()
last_idx = df['value'].last_valid_index()
first = df.loc[first_idx, 'value']
last = df.loc[last_idx, 'value']  # look up the last valid value via last_idx
df.loc[last_idx, 'value'] = np.nan if first == last else last
Or assign only if the two values are equal:
if first == last:
    df.loc[last_idx, 'value'] = np.nan
print(df)
value
0 yes
1 NaN
2 no
3 NaN
4 yes
5 no
6 NaN
7 NaN
8 NaN
9 NaN
If there is only one non-missing value and it should not be replaced, also test that the two indices differ:
print(df)
value
0 yes
1 NaN
2 NaN
first_idx = df['value'].first_valid_index()
last_idx = df['value'].last_valid_index()
first = df.loc[first_idx, 'value']
last = df.loc[last_idx, 'value']  # look up the last valid value via last_idx
df.loc[last_idx, 'value'] = np.nan if (first == last) and (first_idx != last_idx) else last
print(df)
value
0 yes
1 NaN
2 NaN
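The steps above could be wrapped in a small helper. This is only a sketch under a few assumptions: the function name clear_last_if_matches_first and its column parameter are hypothetical (they are not part of pandas), the index is assumed to be unique so that DataFrame.loc returns a scalar, and the DataFrame is modified in place. It also guards against a column with no valid values, in which case first_valid_index returns None.

```python
import numpy as np
import pandas as pd


def clear_last_if_matches_first(df, column="value"):
    """Set the last non-missing entry of `column` to NaN if it equals the first one.

    Hypothetical helper; assumes a unique index and modifies `df` in place.
    """
    first_idx = df[column].first_valid_index()
    last_idx = df[column].last_valid_index()

    # first_valid_index() returns None when the column holds no valid values;
    # equal indices mean there is only one non-missing value, which is kept.
    if first_idx is None or first_idx == last_idx:
        return df

    if df.loc[first_idx, column] == df.loc[last_idx, column]:
        df.loc[last_idx, column] = np.nan
    return df


# Sample data from the question
df = pd.DataFrame(
    {"value": ["yes", np.nan, "no", np.nan, "yes", "no", "yes", np.nan, np.nan, np.nan]}
)
clear_last_if_matches_first(df)
print(df)
```

With the sample data from the question, only the entry at index 6 should change to NaN; a column with a single non-missing value, or with none at all, is left untouched.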