I have many long data frames with the same structure (same column names).
Example with three dataframes:
df1 = pd.DataFrame(np.array([['name1', 1, 2], ['name2', 3, 4], ['name3', 5, 6]]),
columns=['names', 'val1', 'val2'])
names val1 val2
0 name1 1 2
1 name2 3 4
2 name3 5 6
df2 = pd.DataFrame(np.array([['name1', 11, 22], ['name2', 33, 44], ['name3', 55, 66]]),
columns=['names', 'val1', 'val2'])
names val1 val2
0 name1 11 22
1 name2 33 44
2 name3 55 66
df3 = pd.DataFrame(np.array([['name1', 111, 222], ['name2', 333, 444], ['name3', 555, 666]]),
columns=['names', 'val1', 'val2'])
names val1 val2
0 name1 111 222
1 name2 333 444
2 name3 555 666
How can I combine the dataframes row by row, so that the result is:
names val1 val2
0 name1 1 2
1 name1 11 22
2 name1 111 222
3 name2 3 4
4 name2 33 44
5 name2 333 444
6 name3 5 6
7 name3 55 66
8 name3 555 666
I tried:
df=pd.concat([df1, df2, df3])
groups=df.groupby('names')
but it seems to be wrong.
CodePudding user response:
OP was not far from the desired goal. One can use pandas.concat to merge the dataframes, then sort_values to make sure that the rows with name1 appear first, and, finally, reset_index, as follows:
df_merge = pd.concat([df1, df2, df3], axis=0, ignore_index=True).sort_values('names').reset_index(drop=True)
[Out]:
names val1 val2
0 name1 1 2
1 name1 11 22
2 name1 111 222
3 name2 3 4
4 name2 33 44
5 name2 333 444
6 name3 5 6
7 name3 55 66
8 name3 555 666
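One caveat worth noting: sort_values uses kind='quicksort' by default, which is not guaranteed to be stable, so on long frames the rows within one name may not stay in df1, df2, df3 order. A minimal sketch that asks for a stable sort instead is shown below. The sample frames are built from dicts here (an assumption, not the OP's construction) so that val1 and val2 stay integers; building them through np.array with mixed values turns every entry into a string.

```python
import pandas as pd

# Illustrative sample data; dict construction keeps val1/val2 numeric.
df1 = pd.DataFrame({'names': ['name1', 'name2', 'name3'],
                    'val1': [1, 3, 5], 'val2': [2, 4, 6]})
df2 = pd.DataFrame({'names': ['name1', 'name2', 'name3'],
                    'val1': [11, 33, 55], 'val2': [22, 44, 66]})
df3 = pd.DataFrame({'names': ['name1', 'name2', 'name3'],
                    'val1': [111, 333, 555], 'val2': [222, 444, 666]})

frames = [df1, df2, df3]  # any number of frames with the same columns

df_merge = (
    pd.concat(frames, ignore_index=True)
    # mergesort is stable: rows with equal names keep their concat order
    .sort_values('names', kind='mergesort')
    .reset_index(drop=True)
)
print(df_merge)
```

With many dataframes, collecting them in a list such as frames keeps the call the same no matter how many there are.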
CodePudding user response:
Your concat is correct; you only have to sort by names instead of using groupby.
df=pd.concat([df1, df2, df3])
df = df.sort_values('names')
df.index = range(0,df.shape[0])
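Sorting by names puts the groups in alphabetical order. If the goal is instead to keep the row order of the original frames (first rows of every frame, then second rows, and so on), one possible alternative is to sort on the index that concat preserves. This is only a sketch: it assumes every frame has the same rows in the same order and a default 0..n-1 index, and the helper name interleave_rows and the small sample frames are made up for illustration.

```python
import pandas as pd

def interleave_rows(frames):
    """Stack frames, then bring rows with the same original position together."""
    stacked = pd.concat(frames)  # each frame keeps its own 0..n-1 index
    # A stable sort on the index keeps the frame order within each position.
    return stacked.sort_index(kind='mergesort').reset_index(drop=True)

# Illustrative frames whose names are deliberately not in alphabetical order.
frames = [
    pd.DataFrame({'names': ['b', 'a'], 'val1': [1, 3]}),
    pd.DataFrame({'names': ['b', 'a'], 'val1': [11, 33]}),
]

result = interleave_rows(frames)
print(result)
```

In this sketch the rows for b stay ahead of the rows for a, because the ordering comes from row position rather than from the values in names.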