Compare columns with a tolerated error in pandas

Time:06-29

I have three different seismic catalogs with origin times calculated using different methods. Naturally, the calculated values aren't exactly the same; they differ by around 5 seconds.

Catalog_1

Index     Time
0  2022-05-01T08:16:55
1  2022-05-01T09:54:01
2  2022-05-01T10:25:49
3  2022-05-01T12:01:55
4  2022-05-01T18:17:23

Catalog_2

Index     Time
0  2022-05-01T08:16:58.444
1  2022-05-01T10:25:46.939
2  2022-05-01T20:37:17.491
3  2022-05-01T23:34:22.539

Catalog_3

Index     Time
0  2022-05-01T10:25:48
1  2022-05-01T23:34:20
2  2022-05-02T07:21:51

I want to combine these 3 dataframes into a single dataframe that automatically matches the origin times when they agree within the acceptable error.

Combined_catalog

Index     Time_1                Time_2           Time_3
0  2022-05-01T08:16:55  2022-05-01T08:16:58.444  N/A
1  2022-05-01T09:54:01  N/A                      N/A
2  2022-05-01T10:25:49  2022-05-01T10:25:46.939  2022-05-01T10:25:48
3  2022-05-01T12:01:55  N/A                      N/A
4  2022-05-01T18:17:23  N/A                      N/A
5  N/A                  2022-05-01T20:37:17.491  N/A
6  N/A                  2022-05-01T23:34:22.539  2022-05-01T23:34:20
7  N/A                  N/A                      2022-05-02T07:21:51

Is there a way to get a result similar to this without using loops and ifs?

Sometimes the catalogs contain up to 5 years of data, so it might be better to consider a different approach.

CodePudding user response:

The pandas dt.round() and DataFrame.compare() functions might be of help here: round each Time column to a common resolution, then compare the rounded values.
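As a rough sketch of the rounding idea (not necessarily what the answerer had in mind), the snippet below rounds each catalog's times to a 10-second grid and outer-merges the catalogs on that rounded key. The names cat1, cat2, cat3, with_key and the key column are made up for illustration, and the 10-second grid is an assumption chosen to suit the sample data. One caveat: rounding matches times that land in the same bin, which is not quite the same as a tolerance. For example, 10:25:44 and 10:25:46 are only 2 seconds apart but round to different bins.

```python
import pandas as pd

def make_catalog(times):
    # Build a one-column frame; force a common datetime resolution
    return pd.DataFrame({"Time": pd.to_datetime(times).astype("datetime64[ns]")})

# Sample data from the question
cat1 = make_catalog(["2022-05-01T08:16:55", "2022-05-01T09:54:01",
                     "2022-05-01T10:25:49", "2022-05-01T12:01:55",
                     "2022-05-01T18:17:23"])
cat2 = make_catalog(["2022-05-01T08:16:58.444", "2022-05-01T10:25:46.939",
                     "2022-05-01T20:37:17.491", "2022-05-01T23:34:22.539"])
cat3 = make_catalog(["2022-05-01T10:25:48", "2022-05-01T23:34:20",
                     "2022-05-02T07:21:51"])

def with_key(df, name, freq="10s"):
    # Rename Time -> Time_n and add the rounded time as a merge key
    out = df.rename(columns={"Time": name})
    out["key"] = out[name].dt.round(freq)
    return out

# Outer merges keep events that appear in only one catalog
combined = (
    with_key(cat1, "Time_1")
    .merge(with_key(cat2, "Time_2"), on="key", how="outer")
    .merge(with_key(cat3, "Time_3"), on="key", how="outer")
    .sort_values("key")
    .reset_index(drop=True)
)
print(combined[["Time_1", "Time_2", "Time_3"]])
```

Events missing from a catalog show up as NaT in that catalog's column.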


If you only need hour-level matching, use:

pd.to_datetime(Catalog_1['Time']).dt.floor('h')
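Flooring or rounding puts the times on a fixed grid, so two times that are close together can still fall into different bins. A different technique, which is not part of the answer above, works on the tolerance directly: stack all catalogs into one long table, sort by time, start a new event whenever the gap to the previous time exceeds 5 seconds, then pivot back to one column per catalog. This is only a sketch under some assumptions: the names cat1, cat2, cat3, source and event are illustrative, each catalog is assumed to have at most one entry per event (otherwise pivot raises an error), and gap-based grouping can chain together entries whose total spread exceeds the tolerance.

```python
import pandas as pd

def make_catalog(times):
    # Build a one-column frame; force a common datetime resolution
    return pd.DataFrame({"Time": pd.to_datetime(times).astype("datetime64[ns]")})

# Sample data from the question
cat1 = make_catalog(["2022-05-01T08:16:55", "2022-05-01T09:54:01",
                     "2022-05-01T10:25:49", "2022-05-01T12:01:55",
                     "2022-05-01T18:17:23"])
cat2 = make_catalog(["2022-05-01T08:16:58.444", "2022-05-01T10:25:46.939",
                     "2022-05-01T20:37:17.491", "2022-05-01T23:34:22.539"])
cat3 = make_catalog(["2022-05-01T10:25:48", "2022-05-01T23:34:20",
                     "2022-05-02T07:21:51"])

tolerance = pd.Timedelta("5s")  # acceptable error from the question

# Stack the catalogs, tagging each row with its source column name
long = pd.concat(
    [c.assign(source=f"Time_{i}") for i, c in enumerate([cat1, cat2, cat3], start=1)],
    ignore_index=True,
).sort_values("Time")

# A gap larger than the tolerance starts a new event; cumsum numbers the events
long["event"] = (long["Time"].diff() > tolerance).cumsum()

# One row per event, one column per catalog; missing entries become NaT
combined = long.pivot(index="event", columns="source", values="Time")
print(combined)
```

On the sample data this gives 8 events, laid out like the Combined_catalog in the question. The work is done by a sort, a diff and a pivot, with no row-by-row loop, so it should scale reasonably to multi-year catalogs.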