Home > Enterprise >  How to delete rows in Pandas after a certain value?
How to delete rows in Pandas after a certain value?

Time:02-18

I replicated a Pandas series with the following code:

data = np.array([1, 2, 3, 4, 5, np.nan, np.nan, np.nan, 9,10,11,12,13,14])
  
ser = pd.Series(data)
print(ser)

I would like to select only the columns before the NaN values so that I only get the values 1,2,3,4,5. How should I do that?

CodePudding user response:

Test missing values with Series.isna and add Series.cummax for repeat Trues after first match and last invert mask by ~, filter in boolean indexing:

a = ser[~ser.isna().cummax()]
print(a)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Alternative solution with cumulative sum:

a = ser[ser.isna().cumsum().eq(0)]
print(a)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Details:

print(ser.to_frame().assign(testna = ser.isna(), 
                            cummax = ser.isna().cumsum(),
                            invert = ser.isna().cumsum().eq(0)))
       0  testna  cummax  invert
0    1.0   False       0    True
1    2.0   False       0    True
2    3.0   False       0    True
3    4.0   False       0    True
4    5.0   False       0    True
5    NaN    True       1   False
6    NaN    True       2   False
7    NaN    True       3   False
8    9.0   False       3   False
9   10.0   False       3   False
10  11.0   False       3   False
11  12.0   False       3   False
12  13.0   False       3   False
13  14.0   False       3   False

print(ser.to_frame().assign(testna = ser.isna(), 
                            cummax = ser.isna().cummax(),
                            test0 = ~ser.isna().cummax()))

       0  testna  cummax   test0
0    1.0   False   False    True
1    2.0   False   False    True
2    3.0   False   False    True
3    4.0   False   False    True
4    5.0   False   False    True
5    NaN    True    True   False
6    NaN    True    True   False
7    NaN    True    True   False
8    9.0   False    True   False
9   10.0   False    True   False
10  11.0   False    True   False
11  12.0   False    True   False
12  13.0   False    True   False
13  14.0   False    True   False

CodePudding user response:

Use a boolean mask to slice the series.

You have two options, check is the value is not NA with notna and extend the False values after the first True with Series.cummin.

ser[ser.notna().cummin()]

output:

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Or, test if the values are NA with isna, then extend the True values after the first True with Series.cummax, then invert the mask with ~:

ser[~ser.isna().cummax()]
visual representation of how it works:
    data  notna  notna cummin   isna  isna cummax  ~(isna cummax)
0    1.0   True          True  False        False            True
1    2.0   True          True  False        False            True
2    3.0   True          True  False        False            True
3    4.0   True          True  False        False            True
4    5.0   True          True  False        False            True
5    NaN  False         False   True         True           False
6    NaN  False         False   True         True           False
7    NaN  False         False   True         True           False
8    9.0   True         False  False         True           False
9   10.0   True         False  False         True           False
10  11.0   True         False  False         True           False
11  12.0   True         False  False         True           False
12  13.0   True         False  False         True           False
13  14.0   True         False  False         True           False
  • Related