Replace the last occurrence if it equals the first

Time:12-01

I have a df like this:

  value
0 yes
1 nan
2 no
3 nan
4 yes
5 no
6 yes
7 nan
8 nan
9 nan

I have no guarantee that the first non-NaN value, yes, will be in the first row. It could just as well start at a later index.

I need to check whether the first non-NaN string equals the last non-NaN string and, if so, set the last one to NaN.

Here, the value at index 6 equals the value at index 0, so index 6 needs to be set to NaN, resulting in:

  value
0 yes
1 nan
2 no
3 nan
4 yes
5 no
6 nan  # set to nan since it equals the first non-NaN value
7 nan
8 nan
9 nan
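
For anyone who wants to reproduce this, the frame can be built roughly as follows. This is only a sketch: it assumes the nan entries are real missing values (np.nan) and not the literal string 'nan', and that the index is the default 0..9 range.

```python
import numpy as np
import pandas as pd

# Sample column from the question; np.nan marks the missing rows.
df = pd.DataFrame(
    {"value": ["yes", np.nan, "no", np.nan, "yes", "no", "yes", np.nan, np.nan, np.nan]}
)

print(df)
```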

CodePudding user response:

Use Series.first_valid_index and Series.last_valid_index to get the indices of the first and last non-missing values, select the values with DataFrame.loc, then use an if-else expression to set the value by scalar:

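import numpy as np

first_idx = df['value'].first_valid_index()
last_idx = df['value'].last_valid_index()
first = df.loc[first_idx, 'value']
# value at the last non-missing position
last = df.loc[last_idx, 'value']

# set the last non-missing value to NaN only if it equals the first
df.loc[last_idx, 'value'] = np.nan if first == last else last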

Or assign only if the condition is True:

if first == last:
    df.loc[last_idx, 'value'] = np.nan 

print (df)
  value
0   yes
1   NaN
2    no
3   NaN
4   yes
5    no
6   NaN
7   NaN
8   NaN
9   NaN

If there is only one non-missing value (which should not be replaced), also test that the two indices differ:

print (df)
  value
0   yes
1   NaN
2   NaN


first_idx = df['value'].first_valid_index()
last_idx = df['value'].last_valid_index()
first = df.loc[first_idx, 'value']
# value at the last non-missing position
last = df.loc[last_idx, 'value']

# replace only if the values match and they sit at different positions
df.loc[last_idx, 'value'] = np.nan if (first == last) and (first_idx != last_idx) else last
print (df)
  value
0   yes
1   NaN
2   NaN