Pandas array filter NaN and keep the first value in group

Time:07-15

I have the following pandas DataFrame. It contains many NaN values (I omitted most of them to keep the listing short).

0        NaN
...        
26       NaN
27     357.0
28     357.0
29     357.0
30       NaN
...
246      NaN
247    357.0
248    357.0
249    357.0
250      NaN
...
303      NaN
304     58.0
305     58.0
306     58.0
307     58.0
308     58.0
309     58.0
310     58.0
311     58.0
312     58.0
313     58.0
314     58.0
315     58.0
316      NaN
...
333      NaN
334    237.0

I would like to drop all the NaN values and keep only the first value of each run of non-NaN values (e.g. indices 27-29 hold three values; I would like to keep the value at index 27 and skip those at 28 and 29). The target array should be as follows:

27     357.0
247    357.0
304     58.0
334    237.0

I am not sure how to keep only the first value. Thanks in advance.

CodePudding user response:

Take only the values that aren't NaN but whose preceding value is NaN:

df = df[df.col1.notna() & df.col1.shift().isna()]

Output:

      col1
27   357.0
247  357.0
304   58.0
334  237.0
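
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The small frame and the name `is_run_start` are made up for illustration; only the `notna() & shift().isna()` mask comes from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: runs of values separated by NaN.
df = pd.DataFrame(
    {"col1": [np.nan, 357.0, 357.0, np.nan, 357.0, np.nan, 58.0, 58.0, np.nan, 237.0]}
)

# True where the row holds a value and the previous row is NaN.
# shift() puts NaN in front of the first row, so a run starting at row 0 would be kept too.
is_run_start = df["col1"].notna() & df["col1"].shift().isna()

first_values = df[is_run_start]
print(first_values)
```

Because the mask only looks at NaN boundaries, two different values that touch without a NaN between them would count as a single run.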

Assuming all values are greater than 0, we could also do:

df = df.fillna(0).diff()
df = df[df.col1.gt(0)]
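
A rough sketch of why the `diff` variant works, on made-up data. Here the difference is used only as a mask, so the original values are selected rather than the differences; the names `jump` and `out` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: positive values in runs separated by NaN.
df = pd.DataFrame({"col1": [np.nan, 357.0, 357.0, np.nan, 58.0, 58.0, np.nan, 237.0]})

# After fillna(0), the step from a gap into a run is +value,
# a step inside a run is 0, and the step from a run into a gap is -value.
jump = df["col1"].fillna(0).diff()

# Keep the rows where the step is positive, i.e. the first row of each run.
out = df[jump.gt(0)]
print(out)
```

Note that `diff()` leaves NaN in the first row, so a run that starts at the very first row would not be selected by this variant.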

CodePudding user response:

You can take the index of the non-NaN rows and use diff to find where each run of consecutive index labels starts:

m = (df['col'].dropna()
     .index.to_series()
     .diff().fillna(2).gt(1)
     .reindex(range(df.index.max() + 1))
     .fillna(False))

out = df[m]
print(out)

       col
27   357.0
247  357.0
304   58.0
334  237.0
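
The same idea, step by step, as a sketch on made-up data. It assumes an integer index with step 1, and it swaps the `range(...)` reindex for `reindex(df.index, fill_value=False)`, which is a variation on the answer rather than the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a default integer index (0, 1, 2, ...).
df = pd.DataFrame({"col": [np.nan, 357.0, 357.0, np.nan, 58.0, 58.0, 58.0, np.nan, 237.0]})

# Index labels of the rows that hold a value: 1, 2, 4, 5, 6, 8.
labels = df["col"].dropna().index.to_series()

# A gap larger than 1 to the previous label marks the start of a new run;
# fillna(2) makes the very first label count as a start.
starts = labels.diff().fillna(2).gt(1)

# Align the mask with the full index; rows that were NaN become False.
m = starts.reindex(df.index, fill_value=False)

out = df[m]
print(out)
```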