I have some data frame with the following structure.
timestamp Col_A Col_B
2021-04-19T16:12:00Z NA NA
2021-04-19T16:13:00Z NA NA
2021-04-19T16:14:00Z 1 NA
2021-04-19T16:15:00Z 2 NA
2021-04-19T16:16:00Z 3 1
2021-04-19T16:17:00Z 4 2
2021-04-19T16:18:00Z 5 3
2021-04-19T16:19:00Z 6 4
2021-04-19T16:20:00Z NA NA
I would like to make a trim
function where basically it would trim all contiguous NA columns from top and bottom i.e. in this case it would filter the frame as:
2021-04-19T16:14:00Z 1 NA
2021-04-19T16:15:00Z 2 NA
2021-04-19T16:16:00Z 3 1
2021-04-19T16:17:00Z 4 2
2021-04-19T16:18:00Z 5 3
2021-04-19T16:19:00Z 6 4
CodePudding user response:
Check with cumprod
after all
s = df.isna().all(1)
out = df[s.cumprod() s.iloc[::-1].cumprod() == 0]
CodePudding user response:
This is how you can drop the rows that have NA in the defined columns:
df.dropna(subset=['Col_A', 'Col_B'], how='all')
CodePudding user response:
You can try first_valid_index
and last_valid_index
out = df.loc[df[['Col_A', 'Col_B']].first_valid_index():df[['Col_A', 'Col_B']].last_valid_index(), :]
print(out)
timestamp Col_A Col_B
2 2021-04-19T16:14:00Z 1.0 NaN
3 2021-04-19T16:15:00Z 2.0 NaN
4 2021-04-19T16:16:00Z 3.0 1.0
5 2021-04-19T16:17:00Z 4.0 2.0
6 2021-04-19T16:18:00Z 5.0 3.0
7 2021-04-19T16:19:00Z 6.0 4.0