I replicated a Pandas series with the following code:
data = np.array([1, 2, 3, 4, 5, np.nan, np.nan, np.nan, 9,10,11,12,13,14])
ser = pd.Series(data)
print(ser)
I would like to select only the columns before the NaN values so that I only get the values 1,2,3,4,5. How should I do that?
CodePudding user response:
Test missing values with Series.isna
and add Series.cummax
for repeat True
s after first match and last invert mask by ~
, filter in boolean indexing
:
a = ser[~ser.isna().cummax()]
print(a)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
Alternative solution with cumulative sum:
a = ser[ser.isna().cumsum().eq(0)]
print(a)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
Details:
print(ser.to_frame().assign(testna = ser.isna(),
cummax = ser.isna().cumsum(),
invert = ser.isna().cumsum().eq(0)))
0 testna cummax invert
0 1.0 False 0 True
1 2.0 False 0 True
2 3.0 False 0 True
3 4.0 False 0 True
4 5.0 False 0 True
5 NaN True 1 False
6 NaN True 2 False
7 NaN True 3 False
8 9.0 False 3 False
9 10.0 False 3 False
10 11.0 False 3 False
11 12.0 False 3 False
12 13.0 False 3 False
13 14.0 False 3 False
print(ser.to_frame().assign(testna = ser.isna(),
cummax = ser.isna().cummax(),
test0 = ~ser.isna().cummax()))
0 testna cummax test0
0 1.0 False False True
1 2.0 False False True
2 3.0 False False True
3 4.0 False False True
4 5.0 False False True
5 NaN True True False
6 NaN True True False
7 NaN True True False
8 9.0 False True False
9 10.0 False True False
10 11.0 False True False
11 12.0 False True False
12 13.0 False True False
13 14.0 False True False
CodePudding user response:
Use a boolean mask to slice the series.
You have two options, check is the value is not NA with notna
and extend the False
values after the first True
with
Series.cummin
.
ser[ser.notna().cummin()]
output:
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
Or, test if the values are NA with isna
, then extend the True
values after the first True
with Series.cummax
, then invert the mask with ~
:
ser[~ser.isna().cummax()]
visual representation of how it works:
data notna notna cummin isna isna cummax ~(isna cummax)
0 1.0 True True False False True
1 2.0 True True False False True
2 3.0 True True False False True
3 4.0 True True False False True
4 5.0 True True False False True
5 NaN False False True True False
6 NaN False False True True False
7 NaN False False True True False
8 9.0 True False False True False
9 10.0 True False False True False
10 11.0 True False False True False
11 12.0 True False False True False
12 13.0 True False False True False
13 14.0 True False False True False