pd.concat() adding NaN value column to beginning of rows

Goal: concatenate two similar DataFrames and sort the result by the first column.

The problem is that in new_df some records are "pushed" one column to the right, as if each row started with a tab (\t).

As a result, the rows no longer line up: new_df gets an extra leading column that is NaN for those records.

Code:

import pandas as pd

df_1 = ...
df_2 = ...

new_df = pd.concat([df_1, df_2])
new_df = new_df.sort_values(new_df.columns[0], ascending=True)

df_1:

           1                                                  2                                                  3
0  Emissions  305-1~GHG emissions in metric tons of CO2e~Gro...  Emissions for Gross direct (Scope 1) GHG emiss...
1  Emissions  305-1~GHG emissions in metric tons of CO2e~Bio...  Emissions for Biogenic CO2 emissions was 14681...
2  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~CO2  Emissions for CO2 was 107973 tons in year 2014...
3  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~N20  Emissions for N20 was 91661 tons in year 2014;...
4  Emissions   305-1~Direct (Scope 1) GHG emissions by gas~HFCs  Emissions for HFCs was 31744 tons in year 2014...

df_2:

                            0                                                  1                                                  2
0                   Emissions  103-1~Explanation of the material topic and it...  consumption rate fossil fuels coal oil emissio...
1                   Emissions   103-2~The management approach and its components  how evaluate companys environmental management...
2                   Emissions        103-3~Evaluation of the management approach  evaluation effectiveness companys environmenta...
3  Customer Health and Safety  103-1~Explanation of the material topic and it...  health safety corporate policy needsthe americ...
4  Customer Health and Safety   103-2~The management approach and its components  management approach employee customer wellbein...

new_df:

     0          1                                                  2                                                  3
0  NaN  Emissions  305-1~GHG emissions in metric tons of CO2e~Gro...  Emissions for Gross direct (Scope 1) GHG emiss...
1  NaN  Emissions  305-1~GHG emissions in metric tons of CO2e~Bio...  Emissions for Biogenic CO2 emissions was 14681...
2  NaN  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~CO2  Emissions for CO2 was 107973 tons in year 2014...
3  NaN  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~N20  Emissions for N20 was 91661 tons in year 2014;...
4  NaN  Emissions   305-1~Direct (Scope 1) GHG emissions by gas~HFCs  Emissions for HFCs was 31744 tons in year 2014...

Please let me know if there is anything else I can add to the post.

CodePudding user response:

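pd.concat aligns the frames on their column labels. df_1 is labeled 1, 2, 3 and df_2 is labeled 0, 1, 2, so the result has columns 0, 1, 2, 3 and the rows from df_1 get NaN in column 0. You need the column labels of df_1 to start from 0, like df_2.columns: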

df_1.columns = range(len(df_1.columns))

Or:

df_1.columns -= 1
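
To see why relabeling helps, here is a minimal sketch of the mismatch. The frames left and right are hypothetical one-row stand-ins for df_1 and df_2 (the real data is not shown in the question), and the cell values are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-ins: same shape of data, but the column labels differ by one.
left = pd.DataFrame([["Emissions", "305-1", "text a"]], columns=[1, 2, 3])
right = pd.DataFrame([["Emissions", "103-1", "text b"]], columns=[0, 1, 2])

# concat matches columns by label, not by position, so the result
# carries the union of both label sets: 0, 1, 2 and 3.
misaligned = pd.concat([left, right])
print(sorted(misaligned.columns))
print(misaligned[0].isna().tolist())  # the row from `left` has no column 0

# Shifting the labels of `left` down by one makes both frames use 0, 1, 2.
left.columns = left.columns - 1
aligned = pd.concat([left, right])
print(aligned.shape)
```

Note that the shift only works when the labels are integers that are offset by exactly one; assigning range(len(...)) does not depend on what the old labels were.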

Another idea is to reset the column labels of both DataFrames:

df_1.columns = range(len(df_1.columns))
df_2.columns = range(len(df_2.columns))

And then join:

new_df = pd.concat([df_1, df_2])
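
Putting it together, the whole flow might look like the sketch below. The sample rows are hypothetical (the question elides the real df_1 and df_2), and ignore_index=True and reset_index are optional additions not in the original answer; they just give the result a clean 0..n-1 row index:

```python
import pandas as pd

# Hypothetical sample data with the same label mismatch as in the question.
df_1 = pd.DataFrame(
    [["Emissions", "305-1", "a"], ["Emissions", "305-1", "b"]],
    columns=[1, 2, 3],
)
df_2 = pd.DataFrame(
    [["Customer Health and Safety", "103-1", "c"], ["Emissions", "103-2", "d"]],
    columns=[0, 1, 2],
)

# Give both frames the same positional labels 0, 1, 2.
df_1.columns = range(len(df_1.columns))
df_2.columns = range(len(df_2.columns))

# Stack the rows; ignore_index avoids duplicate row labels in the result.
new_df = pd.concat([df_1, df_2], ignore_index=True)

# sort_values returns a new DataFrame, so the result has to be assigned back.
new_df = new_df.sort_values(new_df.columns[0], ascending=True)
new_df = new_df.reset_index(drop=True)
print(new_df)
```

With matching labels the result keeps three columns and contains no NaN values.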