Goal: concatenate two similar DataFrames and sort the result by the first column.
The problem is that new_df has some records "pushed" to the right, starting, I think, with a tab (\t).
This gives the DataFrame an inconsistent shape.
Code:
import pandas as pd
df_1 = ...
df_2 = ...
new_df = pd.concat([df_1, df_2])
new_df.sort_values(new_df.columns[0], ascending=True)
df_1:
1 2 3
0 Emissions 305-1~GHG emissions in metric tons of CO2e~Gro... Emissions for Gross direct (Scope 1) GHG emiss...
1 Emissions 305-1~GHG emissions in metric tons of CO2e~Bio... Emissions for Biogenic CO2 emissions was 14681...
2 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~CO2 Emissions for CO2 was 107973 tons in year 2014...
3 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~N20 Emissions for N20 was 91661 tons in year 2014;...
4 Emissions 305-1~Direct (Scope 1) GHG emissions by gas~HFCs Emissions for HFCs was 31744 tons in year 2014...
df_2:
0 1 2
0 Emissions 103-1~Explanation of the material topic and it... consumption rate fossil fuels coal oil emissio...
1 Emissions 103-2~The management approach and its components how evaluate companys environmental management...
2 Emissions 103-3~Evaluation of the management approach evaluation effectiveness companys environmenta...
3 Customer Health and Safety 103-1~Explanation of the material topic and it... health safety corporate policy needsthe americ...
4 Customer Health and Safety 103-2~The management approach and its components management approach employee customer wellbein...
new_df:
0 1 2 3
0 NaN Emissions 305-1~GHG emissions in metric tons of CO2e~Gro... Emissions for Gross direct (Scope 1) GHG emiss...
1 NaN Emissions 305-1~GHG emissions in metric tons of CO2e~Bio... Emissions for Biogenic CO2 emissions was 14681...
2 NaN Emissions 305-1~Direct (Scope 1) GHG emissions by gas~CO2 Emissions for CO2 was 107973 tons in year 2014...
3 NaN Emissions 305-1~Direct (Scope 1) GHG emissions by gas~N20 Emissions for N20 was 91661 tons in year 2014;...
4 NaN Emissions 305-1~Direct (Scope 1) GHG emissions by gas~HFCs Emissions for HFCs was 31744 tons in year 2014...
Please let me know if there is anything else I can add to the post.
CodePudding user response:
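The column labels of df_1 start at 1, while those of df_2 start at 0, so concat aligns them on different labels and fills the gaps with NaN. You need the column RangeIndex of df_1 to start from 0, as in df_2.columns: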
df_1.columns = range(len(df_1.columns))
Or:
df_1.columns -= 1
Another idea is to reset the column labels of both DataFrames:
df_1.columns = range(len(df_1.columns))
df_2.columns = range(len(df_2.columns))
And then join:
new_df = pd.concat([df_1, df_2])
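For illustration, here is a minimal sketch of the whole flow. The two small frames are hypothetical stand-ins for the real data (only the column labels 1, 2, 3 versus 0, 1, 2 are taken from the question), and ignore_index=True is an optional extra, not part of the original code, that renumbers the rows after concatenation:

```python
import pandas as pd

# Hypothetical stand-ins for the real data: same shape, different column labels.
df_1 = pd.DataFrame([["Emissions", "305-1", "text a"]], columns=[1, 2, 3])
df_2 = pd.DataFrame(
    [["Customer Health and Safety", "103-1", "text b"]], columns=[0, 1, 2]
)

# concat aligns on column labels, so the result has the union of labels
# (0, 1, 2, 3) and the missing cells are filled with NaN.
misaligned = pd.concat([df_1, df_2])
print(misaligned.shape)

# Relabel both frames positionally so the columns line up.
df_1.columns = range(len(df_1.columns))
df_2.columns = range(len(df_2.columns))

# Concatenate, then sort by the first column. sort_values returns a new
# DataFrame, so the result is assigned back.
new_df = pd.concat([df_1, df_2], ignore_index=True)
new_df = new_df.sort_values(new_df.columns[0], ascending=True)
print(new_df)
```

After relabeling, the result should have three columns and no NaN padding.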