I have two dataframes like so,
>>>df1
A B
1 3 4
2 6 7
>>>df2
C D E F
1 20.0 30.0 61.2 29.1
2 40.0 50.0 33.8 36.4
Now what I want to do is append df2 vertically to the end of df1 so that it looks like this:-
A B
1 3 4
2 6 7
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
So far I have tried pd.concat([df1, df2]) with both axis = 0 and axis = 1, as well as pd.append(), to no avail. pd.concat appended df2 horizontally to df1, which doesn't fulfil my purpose.
pd.concat([df1, df2], axis = 0, ignore_index = True)
outputs this:-
A B C D E F
1 3 4
2 6 7
3 20.0 30.0 61.2 29.1
4 40.0 50.0 33.8 36.4
and pd.concat([df1, df2], axis = 1)
outputs this:-
A B C D E F
1 3 4 20.0 30.0 61.2 29.1
2 6 7 40.0 50.0 33.8 36.4
Any ideas or suggestions as to how I can do this? This is for Python3
Answer:
- Option 1:
In the comments, you mention:
[I] am actually collecting specific important data from different csvs.
If possible, I would take advantage of header=None and index_col=[0] as parameters to pd.read_csv. This way, you can achieve the following quite easily:
import pandas as pd
from io import StringIO
# imitating the csv files here
file1 = StringIO("""\
,A,B
1,3,4
2,6,7
""")
file2 = StringIO("""\
,C,D,E,F
1,20.0,30.0,61.2,29.1
2,40.0,50.0,33.8,36.4
""")
list_files = [file1, file2]
list_dfs = list()
for file in list_files:
    list_dfs.append(pd.read_csv(file, sep=',', header=None, index_col=[0]))
df_new = pd.concat(list_dfs, axis=0, ignore_index=True)
print(df_new)
1 2 3 4
0 A B NaN NaN
1 3 4 NaN NaN
2 6 7 NaN NaN
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
Now, at this point you could of course set df_new.columns to df_new.iloc[0] (i.e. ['A', 'B', nan, nan]), but that will leave you with duplicate NaN values as column names:
df_new.columns = df_new.iloc[0].values.tolist()
df_new = df_new.iloc[1:]
print(df_new)
A B NaN NaN
1 3 4 NaN NaN
2 6 7 NaN NaN
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
This is both highly impractical and very likely to cause errors later on, when you want to manipulate the data based on column (index) references.
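To make the duplicate-label problem concrete, here is a minimal sketch. The frame df_demo is a small stand-in for df_new, and the col_<position> placeholder naming is only an assumption for illustration, not something the original data prescribes. It checks whether the column labels are unique and then swaps each missing label for a placeholder:

```python
import numpy as np
import pandas as pd

# Small stand-in for df_new after promoting its first row to the header.
df_demo = pd.DataFrame(
    [[3, 4, np.nan, np.nan], ['C', 'D', 'E', 'F']],
    columns=['A', 'B', np.nan, np.nan],
)

# Two NaN labels mean the column index is no longer unique.
has_unique_labels = df_demo.columns.is_unique
n_missing_labels = int(df_demo.columns.isna().sum())

# Hypothetical fix: replace each missing label with a positional placeholder.
fixed_columns = [
    label if pd.notna(label) else f"col_{pos}"
    for pos, label in enumerate(df_demo.columns)
]
df_fixed = df_demo.copy()
df_fixed.columns = fixed_columns
```

Whether placeholder names are acceptable depends on what you do with the data afterwards; keeping the numeric column labels from the first output may be simpler.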
- Option 2:
If the first option isn't feasible (e.g. no access to the original CSVs), you could achieve the same result as follows:
data1 = {'A': {1: 3, 2: 6}, 'B': {1: 4, 2: 7}}
df1 = pd.DataFrame(data1)
data2 = {'C': {1: 20.0, 2: 40.0},
         'D': {1: 30.0, 2: 50.0},
         'E': {1: 61.2, 2: 33.8},
         'F': {1: 29.1, 2: 36.4}}
df2 = pd.DataFrame(data2)
list_dfs = [df1,df2]
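for i, item in enumerate(list_dfs):
    item.loc[-1] = item.columns   # add the header as a new row labelled -1
    item.index = item.index + 1   # shift labels so the header row becomes 0
    item = item.sort_index()      # move the header row to the top
    item.columns = list(range(1, len(item.columns) + 1))  # or start at 0
    list_dfs[i] = item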
df_new = pd.concat(list_dfs, axis=0, ignore_index=True)
print(df_new)
1 2 3 4
0 A B NaN NaN
1 3 4 NaN NaN
2 6 7 NaN NaN
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
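As a possible variation on Option 2, the header-to-row step can be wrapped in a small helper that builds a new frame instead of modifying the inputs. This is only a sketch: header_as_row is a hypothetical name, and the columns here start at 0 rather than 1.

```python
import pandas as pd

def header_as_row(df):
    """Return a new frame whose first row holds df's column labels."""
    rows = [df.columns.tolist()] + df.values.tolist()
    return pd.DataFrame(rows)  # columns default to 0, 1, 2, ...

df1 = pd.DataFrame({'A': [3, 6], 'B': [4, 7]}, index=[1, 2])
df2 = pd.DataFrame(
    {'C': [20.0, 40.0], 'D': [30.0, 50.0], 'E': [61.2, 33.8], 'F': [29.1, 36.4]},
    index=[1, 2],
)

# Stack the frames; positions missing from the narrower frame become NaN.
df_stacked = pd.concat(
    [header_as_row(df1), header_as_row(df2)], axis=0, ignore_index=True
)
print(df_stacked)
```

Because the helper returns a fresh DataFrame, df1 and df2 keep their original headers and index after the call.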