How can I write a function that combines data1, data2 and data3? The code has to keep only the columns that all three datasets have in common, so Low, which appears in data1 but not in data2 or data3, will be excluded. The data is sorted in chronological order and each row has to be unique: no two dates can be the same. data2 and data3 both contain the row with the date 2021-10-21 00:03:00, but only a single copy of that row should appear in the output because it is not a unique row. How would I be able to write this code?
import pandas as pd
import numpy as np
import functools
data1 = pd.read_csv('dataset1.csv', low_memory=False)
data2 = pd.read_csv('dataset2.csv', low_memory=False)
data3 = pd.read_csv('dataset3.csv', low_memory=False)
data1 csv:
Unix Timestamp date Symbol Open High Low
1444311600000 2015-10-08 13:40:00 BTCUSD 10384.54 10389.08 10340.2
1444311660000 2015-10-08 13:41:00 BTCUSD 10389.08 10389.08 10332.8
1444311720000 2015-10-08 13:42:00 BTCUSD 10387.15 10388.36 10385
data2 csv:
Unix Timestamp Date Symbol Open High
1634774460000 2021-10-21 00:01:00 BTCUSD 4939.95 4939.97
1634774520000 2021-10-21 00:02:00 BTCUSD 4959.18 4961.75
1634774580000 2021-10-21 00:03:00 BTCUSD 4964.33 4964.33
data3 csv:
Unix Timestamp Date Symbol Open High
1634774580000 2021-10-21 00:03:00 BTCUSD 4964.33 4964.33
1634774640000 2021-10-21 00:04:00 BTCUSD 4800.2 4867.47
Expected Output:
Unix Timestamp date Symbol Open High
1444311600000 2015-10-08 13:40:00 BTCUSD 10384.54 10389.08
1444311660000 2015-10-08 13:41:00 BTCUSD 10389.08 10389.08
1444311720000 2015-10-08 13:42:00 BTCUSD 10387.15 10388.36
1634774460000 2021-10-21 00:01:00 BTCUSD 4939.95 4939.97
1634774520000 2021-10-21 00:02:00 BTCUSD 4959.18 4961.75
1634774580000 2021-10-21 00:03:00 BTCUSD 4964.33 4964.33
1634774640000 2021-10-21 00:04:00 BTCUSD 4800.2 4867.47
CodePudding user response:
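Use pd.concat and then drop_duplicates (DataFrame.append and the positional axis argument to drop were both removed in pandas 2.0):
pd.concat([data1.drop(columns='Low'), data2, data3], ignore_index=True).drop_duplicates()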
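The one-liner above drops Low by name. If the goal is a function that works out the shared columns by itself, something along these lines might do. It is only a sketch under a few assumptions: the function name combine_common and its sort_by parameter are made up for illustration, header names that differ only in case (date in data1, Date in data2 and data3) are treated as the same column, and the sample frames are typed in by hand from the question instead of being read from the CSV files.

```python
import pandas as pd


def combine_common(frames, sort_by="Unix Timestamp"):
    """Stack frames on the columns they all share, drop duplicate rows, sort."""
    first = frames[0]
    # Assumption: 'date' and 'Date' are the same column, so every header is
    # mapped to the spelling used in the first frame.
    canonical = {str(c).lower(): c for c in first.columns}
    aligned = [
        f.rename(columns=lambda c: canonical.get(str(c).lower(), c))
        for f in frames
    ]
    # Keep only the columns present in every frame, in the first frame's order.
    common = [c for c in first.columns if all(c in f.columns for f in aligned)]
    stacked = pd.concat([f[common] for f in aligned], ignore_index=True)
    # Rows identical in every remaining column are collapsed to one copy.
    return stacked.drop_duplicates().sort_values(sort_by).reset_index(drop=True)


# Sample rows copied from the question.
data1 = pd.DataFrame({
    "Unix Timestamp": [1444311600000, 1444311660000, 1444311720000],
    "date": ["2015-10-08 13:40:00", "2015-10-08 13:41:00", "2015-10-08 13:42:00"],
    "Symbol": ["BTCUSD"] * 3,
    "Open": [10384.54, 10389.08, 10387.15],
    "High": [10389.08, 10389.08, 10388.36],
    "Low": [10340.2, 10332.8, 10385.0],
})
data2 = pd.DataFrame({
    "Unix Timestamp": [1634774460000, 1634774520000, 1634774580000],
    "Date": ["2021-10-21 00:01:00", "2021-10-21 00:02:00", "2021-10-21 00:03:00"],
    "Symbol": ["BTCUSD"] * 3,
    "Open": [4939.95, 4959.18, 4964.33],
    "High": [4939.97, 4961.75, 4964.33],
})
data3 = pd.DataFrame({
    "Unix Timestamp": [1634774580000, 1634774640000],
    "Date": ["2021-10-21 00:03:00", "2021-10-21 00:04:00"],
    "Symbol": ["BTCUSD"] * 2,
    "Open": [4964.33, 4800.2],
    "High": [4964.33, 4867.47],
})

combined = combine_common([data1, data2, data3])
print(combined)
```

drop_duplicates() here compares whole rows. If two rows could share a date but differ in another column, passing subset=['date'] to drop_duplicates would enforce unique dates instead, keeping the first occurrence by default.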