Keep only the 1st non-null value in each row (and replace others with NaN)

Time:07-29

I have the following dataframe:

     a     b
0  3.0  10.0
1  2.0   9.0
2  NaN   8.0

For each row, I need to replace every value except the first non-null one with NaN. This is the expected output:

     a    b
0  3.0  NaN
1  2.0  NaN
2  NaN  8.0

I know that the justify function lets me identify the first n non-null values, but I need to keep the same structure as the original dataframe.

CodePudding user response:

One way to go would be:

import pandas as pd

data = {'a': {0: 3.0, 1: 2.0, 2: None}, 'b': {0: 10.0, 1: 9.0, 2: 8.0}}

df = pd.DataFrame(data)

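def keep_first_valid(x):
    # x is one row: a Series indexed by the column labels
    first_valid = x.first_valid_index()  # label of the first non-null value, or None
    # set to NaN every position whose label is not the first valid one
    return x.mask(x.index != first_valid)

df = df.apply(keep_first_valid, axis=1)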
df

     a    b
0  3.0  NaN
1  2.0  NaN
2  NaN  8.0
  • The first x passed to the function is the row at index 0: pd.Series([3.0, 10.0], index=['a', 'b']).
  • Inside the function, first_valid = x.first_valid_index() stores 'a'; see Series.first_valid_index.
  • Finally, x.mask(x.index != first_valid) replaces every value whose label differs from first_valid with NaN, giving pd.Series([3.0, np.nan], index=['a', 'b']). apply then collects the returned rows into the new df.
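
The three steps above can be traced on a single row. This is only a minimal sketch: the names row and keep_out are made up for illustration and are not part of the answer's code.

```python
import pandas as pd

# The first row of the example frame, as apply(axis=1) would pass it in.
row = pd.Series([3.0, 10.0], index=['a', 'b'])

# Step 1: label of the first non-null entry in the row.
first_valid = row.first_valid_index()

# Step 2: boolean array that is True wherever the label is NOT the first valid one.
keep_out = row.index != first_valid

# Step 3: mask replaces the True positions with NaN and leaves the rest untouched.
result = row.mask(keep_out)
print(result)
```

If a row contains only NaN, first_valid_index() returns None, so every label differs from it and the row stays all NaN.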

CodePudding user response:

Try this:

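f = df.copy()
f[:] = f.columns  # every cell now holds its own column label
# label of the first non-null column per row, shaped (n_rows, 1) for broadcasting
fv_idx = df.apply(pd.Series.first_valid_index, axis=1).values[:, None]
# keep a value only where its column label equals that row's first valid label
res = df.where(f == fv_idx)
print(res)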
>>>
     a    b
0  3.0  NaN
1  2.0  NaN
2  NaN  8.0
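
The same idea can be sketched without the intermediate copy f, by comparing the column labels directly against the per-row first valid label using NumPy broadcasting. This is a variant under stated assumptions, not the answer's own code: the name keep is hypothetical, and the motivation is that writing string labels into a float frame (f[:] = f.columns) may trigger dtype warnings on newer pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3.0, 2.0, np.nan], 'b': [10.0, 9.0, 8.0]})

# Label of the first non-null column per row, as a column vector of shape (n_rows, 1).
fv_idx = df.apply(pd.Series.first_valid_index, axis=1).to_numpy()[:, None]

# Broadcasting (n_cols,) against (n_rows, 1) yields an (n_rows, n_cols) boolean mask
# that is True only where the column label equals that row's first valid label.
keep = df.columns.to_numpy() == fv_idx

# where keeps values at True positions and puts NaN everywhere else.
res = df.where(keep)
print(res)
```

Like the original, this still calls first_valid_index once per row through apply, so it is not fully vectorized; it only removes the label-filled copy.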