I'm trying to concatenate two DataFrames under these conditions:
- if a column header already exists, append the values to that column;
- otherwise, add a new column.
The code works, but the column names are lost in the second case. Why? I can't find this mentioned in the pandas documentation. Did I miss something?
How can I keep the column names?
The code:
# Testing
# Merge, join, concatenate
# Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
import pandas as pd
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    # index=[0, 1, 2, 3],
)
df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    },
    # index=[4, 5, 6, 7],
)
df3 = pd.DataFrame(
    {
        "E": ["E0", "E1", "E2", "E3", "E4", "E5"],
    },
    # index=[0, 1, 2, 3, 4, 5],
)
frames = [df1, df2]
result_1 = pd.concat(frames, ignore_index=True)
print(result_1)
frames = [result_1, df3]
if "E" in df3.columns:
    result_2 = pd.concat(frames, axis=1, ignore_index=True)
    print(result_2)
CodePudding user response:
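With ignore_index=True you asked pandas to discard the labels along the concatenation axis. Because you are concatenating on axis=1, those labels are the column names, so they are replaced by 0, 1, ..., n - 1. Remove ignore_index=True to keep them: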
frames = [result_1, df3]
if "E" in df3.columns:
    result_2 = pd.concat(frames, axis=1)
    print(result_2)
Output:
    A   B   C   D    E
0  A0  B0  C0  D0   E0
1  A1  B1  C1  D1   E1
2  A2  B2  C2  D2   E2
3  A3  B3  C3  D3   E3
4  A4  B4  C4  D4   E4
5  A5  B5  C5  D5   E5
6  A6  B6  C6  D6  NaN
7  A7  B7  C7  D7  NaN
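To see the effect in isolation, here is a minimal sketch comparing both calls on small frames. The names left, right and shifted are made up for illustration and are not part of the question. It also shows one possible way to handle inputs whose row labels do not line up: reset the row index of the input itself, rather than passing ignore_index=True to a column-wise concat.

```python
import pandas as pd

# Hypothetical frames, only for illustration.
left = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]})
right = pd.DataFrame({"E": ["E0", "E1"]})

# ignore_index=True relabels the concatenation axis.
# With axis=1 that axis is the columns, so the names are replaced by integers.
renumbered = pd.concat([left, right], axis=1, ignore_index=True)
print(list(renumbered.columns))  # [0, 1, 2]

# Without ignore_index the column names are kept,
# and rows are aligned on the row index.
named = pd.concat([left, right], axis=1)
print(list(named.columns))  # ['A', 'B', 'E']

# If the row labels differ, the rows do not line up ...
shifted = pd.DataFrame({"E": ["E0", "E1"]}, index=[10, 11])
misaligned = pd.concat([left, shifted], axis=1)

# ... so reset the row labels of that input before concatenating.
aligned = pd.concat([left, shifted.reset_index(drop=True)], axis=1)
```

In this sketch misaligned ends up with five rows (labels 0, 1, 2, 10, 11), while aligned has three rows with the E values placed next to the first two rows of left.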