Home > front end >  pandas trim with NA value
pandas trim with NA value

Time:04-19

I have some data frame with the following structure.

timestamp             Col_A   Col_B
2021-04-19T16:12:00Z     NA      NA
2021-04-19T16:13:00Z     NA      NA 
2021-04-19T16:14:00Z      1      NA
2021-04-19T16:15:00Z      2      NA
2021-04-19T16:16:00Z      3       1
2021-04-19T16:17:00Z      4       2
2021-04-19T16:18:00Z      5       3
2021-04-19T16:19:00Z      6       4
2021-04-19T16:20:00Z     NA      NA

I would like to make a trim function where basically it would trim all contiguous NA columns from top and bottom i.e. in this case it would filter the frame as:

2021-04-19T16:14:00Z      1      NA
2021-04-19T16:15:00Z      2      NA
2021-04-19T16:16:00Z      3       1
2021-04-19T16:17:00Z      4       2
2021-04-19T16:18:00Z      5       3
2021-04-19T16:19:00Z      6       4

CodePudding user response:

Check with cumprod after all

s = df.isna().all(1)
out = df[s.cumprod()   s.iloc[::-1].cumprod() == 0]

CodePudding user response:

This is how you can drop the rows that have NA in the defined columns: df.dropna(subset=['Col_A', 'Col_B'], how='all')

CodePudding user response:

You can try first_valid_index and last_valid_index

out = df.loc[df[['Col_A', 'Col_B']].first_valid_index():df[['Col_A', 'Col_B']].last_valid_index(), :]
print(out)

              timestamp  Col_A  Col_B
2  2021-04-19T16:14:00Z    1.0    NaN
3  2021-04-19T16:15:00Z    2.0    NaN
4  2021-04-19T16:16:00Z    3.0    1.0
5  2021-04-19T16:17:00Z    4.0    2.0
6  2021-04-19T16:18:00Z    5.0    3.0
7  2021-04-19T16:19:00Z    6.0    4.0
  • Related