Combining datasets and formatting them Pandas Python

Time:10-31

How can I write a function that combines data1, data2 and data3? It should keep only the columns that all three datasets have in common. Low, for example, appears in data1 but not in data2 or data3, so it should be excluded. The output must be sorted in chronological order, and every row must be unique, meaning no two rows can have the same date. The row dated 2021-10-21 00:03:00 appears in both data2 and data3, but only one copy of it should appear in the output because it is a duplicate. How would I write this code?

import pandas as pd 
import numpy as np 
import functools

data1 = pd.read_csv('dataset1.csv', low_memory=False)
data2 = pd.read_csv('dataset2.csv', low_memory=False)
data3 = pd.read_csv('dataset3.csv', low_memory=False)

data1 csv:

Unix Timestamp  date                    Symbol    Open      High      Low 
1444311600000   2015-10-08 13:40:00     BTCUSD    10384.54  10389.08  10340.2
1444311660000   2015-10-08 13:41:00     BTCUSD    10389.08  10389.08  10332.8
1444311720000   2015-10-08 13:42:00     BTCUSD    10387.15  10388.36  10385

data2 csv:

Unix Timestamp  Date                    Symbol    Open       High 
1634774460000   2021-10-21 00:01:00     BTCUSD    4939.95    4939.97    
1634774520000   2021-10-21 00:02:00     BTCUSD    4959.18    4961.75
1634774580000   2021-10-21 00:03:00     BTCUSD    4964.33    4964.33

data3 csv:

Unix Timestamp  Date                    Symbol    Open       High
1634774580000   2021-10-21 00:03:00     BTCUSD    4964.33    4964.33
1634774640000   2021-10-21 00:04:00     BTCUSD    4800.2     4867.47

Expected Output:

Unix Timestamp  date                    Symbol    Open      High      
1444311600000   2015-10-08 13:40:00     BTCUSD    10384.54  10389.08  
1444311660000   2015-10-08 13:41:00     BTCUSD    10389.08  10389.08  
1444311720000   2015-10-08 13:42:00     BTCUSD    10387.15  10388.36
1634774460000   2021-10-21 00:01:00     BTCUSD    4939.95    4939.97    
1634774520000   2021-10-21 00:02:00     BTCUSD    4959.18    4961.75
1634774580000   2021-10-21 00:03:00     BTCUSD    4964.33    4964.33
1634774640000   2021-10-21 00:04:00     BTCUSD    4800.2     4867.47
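A quick way to check which columns the three files actually share is to intersect their headers. The sketch below uses `functools.reduce` (the question already imports `functools`) together with pandas' `Index.intersection`. `common_columns` is a hypothetical helper name, and the frames are header-only stand-ins for the CSVs above. Note that, as typed in the samples, data1 spells its column `date` while data2 and data3 use `Date`, so an exact-name intersection drops it along with `Low`.

```python
import functools

import pandas as pd


def common_columns(*frames: pd.DataFrame) -> pd.Index:
    """Return the column labels that appear in every frame."""
    return functools.reduce(lambda acc, cols: acc.intersection(cols),
                            (df.columns for df in frames))


# Header-only stand-ins for the three CSVs shown above (no data rows).
data1 = pd.DataFrame(columns=['Unix Timestamp', 'date', 'Symbol', 'Open', 'High', 'Low'])
data2 = pd.DataFrame(columns=['Unix Timestamp', 'Date', 'Symbol', 'Open', 'High'])
data3 = pd.DataFrame(columns=['Unix Timestamp', 'Date', 'Symbol', 'Open', 'High'])

# Both 'Low' and the differently cased 'date'/'Date' are left out.
print(list(common_columns(data1, data2, data3)))
```

If the date column should survive, the headers need to be normalised first, for example by renaming `Date` to `date` in data2 and data3.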

Answer:

Use concat and then drop_duplicates (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0):

pd.concat([data1.drop(columns='Low'), data2, data3], ignore_index=True).drop_duplicates()
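Wrapping the concat/drop_duplicates idea in a function that also enforces the "one row per date, sorted chronologically" requirement might look like the sketch below. It rests on several assumptions. `combine_datasets` is a hypothetical name. `join='inner'` keeps only the columns present in every frame, so `Low` does not have to be dropped by name. Duplicates are judged on `Unix Timestamp` alone (the first occurrence wins) rather than on whole rows. Finally, `Date` in data2/data3 is assumed to be the same field as `date` in data1, so it is renamed before stacking. The sample frames are built from the rows shown in the question.

```python
import pandas as pd


def combine_datasets(*frames: pd.DataFrame, key: str = 'Unix Timestamp') -> pd.DataFrame:
    """Stack frames on their shared columns, keep one row per `key`, sort by `key`."""
    # Assumption: 'Date' in data2/data3 is the same field as 'date' in data1.
    aligned = [df.rename(columns={'Date': 'date'}) for df in frames]
    # join='inner' keeps only the columns present in every frame (drops 'Low').
    combined = pd.concat(aligned, join='inner', ignore_index=True)
    # One row per timestamp; the first occurrence wins.
    deduped = combined.drop_duplicates(subset=key, keep='first')
    # Chronological order, with a fresh 0..n-1 index.
    return deduped.sort_values(key).reset_index(drop=True)


data1 = pd.DataFrame({
    'Unix Timestamp': [1444311600000, 1444311660000, 1444311720000],
    'date': ['2015-10-08 13:40:00', '2015-10-08 13:41:00', '2015-10-08 13:42:00'],
    'Symbol': ['BTCUSD'] * 3,
    'Open': [10384.54, 10389.08, 10387.15],
    'High': [10389.08, 10389.08, 10388.36],
    'Low': [10340.2, 10332.8, 10385],
})
data2 = pd.DataFrame({
    'Unix Timestamp': [1634774460000, 1634774520000, 1634774580000],
    'Date': ['2021-10-21 00:01:00', '2021-10-21 00:02:00', '2021-10-21 00:03:00'],
    'Symbol': ['BTCUSD'] * 3,
    'Open': [4939.95, 4959.18, 4964.33],
    'High': [4939.97, 4961.75, 4964.33],
})
data3 = pd.DataFrame({
    'Unix Timestamp': [1634774580000, 1634774640000],
    'Date': ['2021-10-21 00:03:00', '2021-10-21 00:04:00'],
    'Symbol': ['BTCUSD'] * 2,
    'Open': [4964.33, 4800.2],
    'High': [4964.33, 4867.47],
})

result = combine_datasets(data1, data2, data3)
print(result)
```

Deduplicating on the timestamp rather than on the whole row means two rows with the same time but different prices still collapse to one, which matches the "no two dates can be the same" rule; a plain `drop_duplicates()` would keep both.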