I have the following dataframe:
     a     b
0  3.0  10.0
1  2.0   9.0
2  NaN   8.0
For each row, I need to keep only the first non-null value and replace every other value with NaN. This is the expected output:
     a    b
0  3.0  NaN
1  2.0  NaN
2  NaN  8.0
I know that the justify function lets me identify the first n non-null values, but I need to keep the same structure as the original dataframe.
CodePudding user response:
One way to go would be:
import pandas as pd

data = {'a': {0: 3.0, 1: 2.0, 2: None}, 'b': {0: 10.0, 1: 9.0, 2: 8.0}}
df = pd.DataFrame(data)

def keep_first_valid(x):
    # x is a single row: a Series indexed by the column labels
    first_valid = x.first_valid_index()
    # set every entry to NaN except the one at the first valid label
    return x.mask(x.index != first_valid)

df = df.apply(keep_first_valid, axis=1)
df
     a    b
0  3.0  NaN
1  2.0  NaN
2  NaN  8.0
- The first x passed to the function is the first row, i.e. pd.Series([3.0, 10.0], index=['a', 'b']).
- Inside the function, first_valid = x.first_valid_index() stores 'a', the label of the first non-null entry; see the documentation for first_valid_index.
- Finally, x.mask replaces every entry whose label differs from first_valid with NaN, giving pd.Series([3.0, NaN], index=['a', 'b']). The rows returned by apply are assembled into the new df.
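The three steps above can be traced on a single row. This is only an illustrative sketch: the variable name to_hide is made up here, and the row is built by hand rather than taken from apply.

```python
import pandas as pd

# First row of the example frame, as apply(..., axis=1) would pass it in.
x = pd.Series([3.0, 10.0], index=["a", "b"])

# Label of the first non-null entry in the row.
first_valid = x.first_valid_index()

# Boolean array, True wherever the label is NOT the first valid one
# (to_hide is a hypothetical name used only for this walkthrough).
to_hide = x.index != first_valid

# mask() replaces the True positions with NaN and keeps the rest.
result = x.mask(to_hide)

print(first_valid)
print(to_hide)
print(result)
```

If a row contains only NaN, first_valid_index() returns None, every label compares as different, and the whole row stays NaN.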
CodePudding user response:
Try this:
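# frame of the same shape in which every cell holds its own column label
f = df.copy()
f[:] = f.columns
# label of the first non-null column in each row, as a column vector
fv_idx = df.apply(pd.Series.first_valid_index, axis=1).values[:, None]
# keep a cell only where its column label equals that row's first valid label
res = df.where(f == fv_idx)
print(res)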
>>>
     a    b
0  3.0  NaN
1  2.0  NaN
2  NaN  8.0
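To see why the comparison works, it may help to look at the boolean mask on its own. The sketch below is a variation rather than the exact code above: it assumes the column labels can be compared directly as a NumPy array, which avoids writing labels into a float frame, and the name keep is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3.0, 2.0, np.nan], "b": [10.0, 9.0, 8.0]})

# Label of the first non-null column per row, shaped (n_rows, 1).
fv_idx = df.apply(pd.Series.first_valid_index, axis=1).to_numpy()[:, None]

# Column labels have shape (n_cols,); broadcasting against (n_rows, 1)
# yields an (n_rows, n_cols) boolean array (keep is a hypothetical name).
keep = df.columns.to_numpy() == fv_idx

# where() keeps values at True positions and fills the rest with NaN.
res = df.where(keep)

print(keep)
print(res)
```

Each row of keep has a single True, located at the column whose label matches that row's first valid label, so df.where leaves exactly one value per row.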