Vertically append dataframes with unequal columns

I have two dataframes like so,

>>>df1
    A   B
1   3   4
2   6   7

>>>df2
    C      D      E      F
1  20.0   30.0   61.2   29.1
2  40.0   50.0   33.8   36.4

What I want to do is append df2 vertically to the end of df1, header row included, so that the result looks like this:

       A      B
1      3      4
2      6      7
3      C      D      E      F
4    20.0   30.0   61.2   29.1
5    40.0   50.0   33.8   36.4

So far I have tried pd.concat([df1, df2]) with axis=0 and axis=1, as well as DataFrame.append(), to no avail. pd.concat appended df2 horizontally to df1, which doesn't fulfil my purpose.

pd.concat([df1, df2], axis=0, ignore_index=True) outputs this:

   A    B    C     D     E     F
1  3    4
2  6    7
3           20.0  30.0  61.2  29.1
4           40.0  50.0  33.8  36.4

and pd.concat([df1, df2], axis=1) outputs this:

   A    B    C     D     E     F
1  3    4   20.0  30.0  61.2  29.1
2  6    7   40.0  50.0  33.8  36.4

Any ideas or suggestions as to how I can do this? This is for Python 3.

CodePudding user response:

Option 1:

In the comments, you mention:

[I] am actually collecting specific important data from different csvs.

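If possible, I would pass header=None and index_col=[0] as parameters to pd.read_csv. With header=None the header line of each file is read as an ordinary data row, and index_col=[0] turns the first column into the index. This way, you can achieve the desired layout quite easily: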

import pandas as pd
from io import StringIO

# imitating the csv files here
file1 = StringIO("""\
,A,B
1,3,4
2,6,7
""")

file2 = StringIO("""\
,C,D,E,F
1,20.0,30.0,61.2,29.1
2,40.0,50.0,33.8,36.4
""")

list_files = [file1, file2]
list_dfs = list()

for file in list_files:
    list_dfs.append(pd.read_csv(file, sep=',', header=None, index_col=[0]))

df_new = pd.concat(list_dfs, axis=0, ignore_index=True)

print(df_new)

      1     2     3     4
0     A     B   NaN   NaN
1     3     4   NaN   NaN
2     6     7   NaN   NaN
3     C     D     E     F
4  20.0  30.0  61.2  29.1
5  40.0  50.0  33.8  36.4
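
One side effect worth knowing about: because the header line is read as data, each column mixes text and numbers, so the numbers come back as strings. The snippet below is only a minimal sketch of that, reusing the imitated file1 content from above; the variable names first_value and numbers are mine, and pd.to_numeric is just one possible way to get numbers back for the rows below the header.

```python
import pandas as pd
from io import StringIO

# same imitated csv content as file1 above
file1 = StringIO(",A,B\n1,3,4\n2,6,7\n")
df = pd.read_csv(file1, sep=',', header=None, index_col=[0])

# the header line is now an ordinary row, so the numbers were read as text
first_value = df.iloc[1, 0]
print(repr(first_value))

# convert the rows below the header row back to numbers, column by column
numbers = df.iloc[1:].apply(pd.to_numeric)
print(numbers.dtypes)
```

Whether this matters depends on what you do with the combined frame afterwards; if it is only written back out to a file, the string values are usually harmless.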

Now, at this point you could of course set df_new.columns to the values of df_new.iloc[0] (i.e. ['A', 'B', nan, nan]) and drop that first row, but that will leave you with duplicate NaN values as column names:

df_new.columns = df_new.iloc[0].values.tolist()
df_new = df_new.iloc[1:]

print(df_new)
      A     B   NaN   NaN
1     3     4   NaN   NaN
2     6     7   NaN   NaN
3     C     D     E     F
4  20.0  30.0  61.2  29.1
5  40.0  50.0  33.8  36.4

This is both highly impractical and also very likely to cause errors later on when you want to manipulate the data based on column (index) references.
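
To make the problem a bit more concrete, here is a small sketch using a hypothetical two-row frame with the same column labels as above (the frame itself is made up for illustration). It only checks whether the labels are unique and then switches to positional labels, which is one way to avoid the ambiguity:

```python
import numpy as np
import pandas as pd

# hypothetical frame with the same column labels as in the output above
df = pd.DataFrame(
    [[3, 4, np.nan, np.nan], ["C", "D", "E", "F"]],
    columns=["A", "B", np.nan, np.nan],
)

# NaN appears twice among the labels, so the labels are not unique
labels_unique = df.columns.is_unique
dup_mask = df.columns.duplicated().tolist()
print(labels_unique, dup_mask)

# positional labels (1..n) identify every column unambiguously
df.columns = list(range(1, df.shape[1] + 1))
print(df.columns.is_unique)
```

With non-unique labels there is no way to address the third or fourth column by name alone, which is why I would keep the numeric column labels from the first output.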

Option 2:

If the first option isn't feasible (e.g. no access to the original CSVs), you could achieve the same result as follows:

data1 = {'A': {1: 3, 2: 6}, 'B': {1: 4, 2: 7}}
df1 = pd.DataFrame(data1)

data2 = {'C': {1: 20.0, 2: 40.0},
         'D': {1: 30.0, 2: 50.0},
         'E': {1: 61.2, 2: 33.8},
         'F': {1: 29.1, 2: 36.4}}
df2 = pd.DataFrame(data2)

list_dfs = [df1,df2]

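for i, item in enumerate(list_dfs):
    item.loc[-1] = item.columns  # add the header as a row labelled -1 (this modifies df1/df2 in place)
    item.index = item.index + 1  # shift all labels by one, so the header row gets label 0
    item = item.sort_index()  # move the header row to the top
    item.columns = list(range(1, len(item.columns) + 1))  # or start at 0
    list_dfs[i] = item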

df_new = pd.concat(list_dfs, axis=0, ignore_index=True)

print(df_new)

      1     2     3     4
0     A     B   NaN   NaN
1     3     4   NaN   NaN
2     6     7   NaN   NaN
3     C     D     E     F
4  20.0  30.0  61.2  29.1
5  40.0  50.0  33.8  36.4
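
Note that the loop above also changes df1 and df2 themselves, since the header row is added in place. If you would rather leave the originals untouched, something along these lines should give the same result. Treat it as a sketch: header_to_row is a helper name I made up, not a pandas function.

```python
import pandas as pd


def header_to_row(df):
    """Return a new frame with df's column labels as its first row."""
    rows = [df.columns.tolist()] + df.values.tolist()
    # positional column labels 1..n, matching the output above
    return pd.DataFrame(rows, columns=list(range(1, df.shape[1] + 1)))


df1 = pd.DataFrame({'A': {1: 3, 2: 6}, 'B': {1: 4, 2: 7}})
df2 = pd.DataFrame({'C': {1: 20.0, 2: 40.0},
                    'D': {1: 30.0, 2: 50.0},
                    'E': {1: 61.2, 2: 33.8},
                    'F': {1: 29.1, 2: 36.4}})

# build the header-as-row copies, then stack them vertically
df_new = pd.concat([header_to_row(df) for df in (df1, df2)],
                   axis=0, ignore_index=True)

print(df_new)
```

Because the helper builds a fresh DataFrame from plain lists, df1 and df2 keep their original columns and rows.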