I have two pandas dataframes with the same indexes and lengths. However, they might contain nans at different places. See, example input below:
timestamp,y
2019-08-01 00:00:00, 772.00
2019-08-01 00:15:00, 648.00
2019-08-01 00:30:00, nan
timestamp,y
2019-08-01 00:00:00, 612.00
2019-08-01 00:15:00, nan
2019-08-01 00:30:00, 500.00
I want to do some calculations where I want to remove indexes in both the dataframes even if one of them has a NaN values. So, in those case only the first row is valid.
Is there a way to do this easily in pandas?
CodePudding user response:
You can merge
to get the common indices, then subset:
keep = df1.merge(df2, on='timestamp').dropna(how='any')['timestamp']
new_df1 = df1[df1['timestamp'].isin(keep)]
new_df2 = df2[df2['timestamp'].isin(keep)]
Alternative, you can use python sets to get the intersection:
keep = set(df1.dropna()['timestamp']).intersection(df2.dropna()['timestamp'])
output:
# new_df1
timestamp y
0 2019-08-01 00:00:00 772.0
# new_df2
timestamp y
0 2019-08-01 00:00:00 612.0