I have two different DataFrames that I would like to combine and I cannot get the answer I am looking for. A MWE is given below.
import numpy as np
import pandas as pd
index_tuple1 = [('car', 'green'), ('car', 'red'), ('boat', 'green')]
index1 = pd.MultiIndex.from_tuples(index_tuple1, names=['Vehicle', 'Color'])
data1 = np.array([[1,2], [3,4], [7,8]])
df1 = pd.DataFrame(data1, index=index1, columns=['2022-05-09', '2022-10-17'])
index_tuple2 = [('car', 'blue'), ('boat', 'blue')]
index2 = pd.MultiIndex.from_tuples(index_tuple2, names=['Vehicle', 'Color'])
data2 = np.array([[5,6], [9,0]])
df2 = pd.DataFrame(data2, index=index2, columns=['2022-05-09', '2022-10-17'])
print(df1)
print(df2)
Output:
2022-05-09 2022-10-17
Vehicle Color
car green 1 2
red 3 4
boat green 7 8
2022-05-09 2022-10-17
Vehicle Color
car blue 5 6
boat blue 9 0
What I would like to see as output is given below. I have tried concat, merge, and join, but I either get a DataFrame filled with NaNs or "ValueError: columns overlap but no suffix specified: Index(['2022-05-09', '2022-10-17']
2022-05-09 2022-10-17
Vehicle Color
car green 1 2
red 3 4
blue 5 6
boat green 7 8
blue 9 0
CodePudding user response:
You need same types of columns, e.g. strings or datetimes, then concat
working correct:
df1.columns = pd.to_datetime(df1.columns)
df2.columns = pd.to_datetime(df2.columns)
df = pd.concat([df1, df2]).sort_index()
print (df)
2022-05-09 2022-10-17
Vehicle Color
boat blue 9 0
green 7 8
car blue 5 6
green 1 2
red 3 4
If ordering is important use here is solution with ordered CategoricalIndex
:
df1.columns = pd.to_datetime(df1.columns)
df2.columns = pd.to_datetime(df2.columns)
lvl = df1.index.get_level_values(0).append(df1.index.get_level_values(0)).unique()
print (lvl)
Index(['car', 'boat'], dtype='object', name='Vehicle')
idx1 = pd.CategoricalIndex(df1.index.levels[0], ordered=True, categories=lvl)
idx2 = pd.CategoricalIndex(df2.index.levels[0], ordered=True, categories=lvl)
df1.index = df1.index.set_levels(idx1, level = 0)
df2.index = df2.index.set_levels(idx2, level = 0)
df = pd.concat([df1, df2]).sort_index(level=0, sort_remaining=False)
print (df)
2022-05-09 2022-10-17
Vehicle Color
car green 1 2
red 3 4
blue 5 6
boat green 7 8
blue 9 0