Combine data frame tables row by row

Time:10-05

I have many (and long) data frames of same structure (same column names).

Example with three dataframes:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.array([['name1', 1, 2], ['name2', 3, 4], ['name3', 5, 6]]),
                   columns=['names', 'val1', 'val2'])

   names val1 val2
0  name1    1    2
1  name2    3    4
2  name3    5    6

df2 = pd.DataFrame(np.array([['name1', 11, 22], ['name2', 33, 44], ['name3', 55, 66]]),
                   columns=['names', 'val1', 'val2'])

   names val1 val2
0  name1   11   22
1  name2   33   44
2  name3   55   66

df3 = pd.DataFrame(np.array([['name1', 111, 222], ['name2', 333, 444], ['name3', 555, 666]]),
                   columns=['names', 'val1', 'val2'])

   names val1 val2
0  name1  111  222
1  name2  333  444
2  name3  555  666

How can I combine the dataframes row by row, so that the result is:

   names val1 val2
0  name1    1    2
1  name1   11   22
2  name1  111  222
3  name2    3    4
4  name2   33   44
5  name2  333  444
6  name3    5    6
7  name3   55   66
8  name3  555  666

I tried:

df=pd.concat([df1, df2, df3])
groups=df.groupby('names')

but it seems to be wrong.

CodePudding user response:

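The OP was not far from the desired result. Use pandas.concat to stack the dataframes, then sort_values on 'names' so that rows sharing a name end up next to each other, and finally reset_index to renumber the rows. Passing kind='stable' keeps the rows of each name in the order df1, df2, df3, which the default sort algorithm does not guarantee:

df_merge = pd.concat([df1, df2, df3], axis=0, ignore_index=True).sort_values('names', kind='stable').reset_index(drop=True)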


[Out]:

   names val1 val2
0  name1    1    2
1  name1   11   22
2  name1  111  222
3  name2    3    4
4  name2   33   44
5  name2  333  444
6  name3    5    6
7  name3   55   66
8  name3  555  666
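
For reference, here is a self-contained sketch of the concat-then-sort approach that can be run as is. The `frames` list name is an assumption made for illustration, and `kind='stable'` is passed so that rows with the same name keep the order df1, df2, df3 (the default sort algorithm does not guarantee that).

```python
import numpy as np
import pandas as pd

# Same construction as in the question. Note that np.array converts every
# value to a string, so val1 and val2 hold strings here, not integers.
df1 = pd.DataFrame(np.array([['name1', 1, 2], ['name2', 3, 4], ['name3', 5, 6]]),
                   columns=['names', 'val1', 'val2'])
df2 = pd.DataFrame(np.array([['name1', 11, 22], ['name2', 33, 44], ['name3', 55, 66]]),
                   columns=['names', 'val1', 'val2'])
df3 = pd.DataFrame(np.array([['name1', 111, 222], ['name2', 333, 444], ['name3', 555, 666]]),
                   columns=['names', 'val1', 'val2'])

# Hypothetical list name; any number of frames with identical columns works.
frames = [df1, df2, df3]

df_merge = (
    pd.concat(frames, ignore_index=True)      # stack all rows: df1, then df2, then df3
      .sort_values('names', kind='stable')    # bring equal names together, keep frame order
      .reset_index(drop=True)                 # renumber rows 0..n-1
)

print(df_merge)
```

One caveat: the sort on 'names' is lexicographic, so a label such as name10 would sort before name2. That does not matter for the three names in the question, but it may for longer lists.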

CodePudding user response:

Your concat is correct; you only have to sort by names instead of using groupby.

df = pd.concat([df1, df2, df3])
df = df.sort_values('names', kind='stable')  # stable sort keeps df1, df2, df3 order within each name
df = df.reset_index(drop=True)               # renumber rows 0..n-1
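
To illustrate why groupby alone appeared to be wrong, here is a small sketch. It uses two shortened, hypothetical frames in place of df1, df2 and df3 from the question, and the names `from_groups` and `from_sort` are made up for the comparison. It shows that groupby only returns a grouping object, and that concatenating the groups back together gives the same rows as sorting.

```python
import pandas as pd

# Shortened, hypothetical stand-ins for the frames in the question.
df1 = pd.DataFrame({'names': ['name1', 'name2'], 'val1': [1, 3]})
df2 = pd.DataFrame({'names': ['name1', 'name2'], 'val1': [11, 33]})

df = pd.concat([df1, df2], ignore_index=True)

# groupby on its own builds a grouping object; it does not reorder the rows.
groups = df.groupby('names')
print(type(groups).__name__)  # DataFrameGroupBy

# Concatenating the groups gives the rows ordered by name, because groupby
# sorts the keys by default and keeps the row order within each group.
from_groups = pd.concat([g for _, g in groups], ignore_index=True)

# The sort-based approach from the answer produces the same frame.
from_sort = df.sort_values('names', kind='stable').reset_index(drop=True)

print(from_groups)
print(from_groups.equals(from_sort))
```

Sorting is usually the simpler choice here, since it avoids splitting the frame into groups only to put them back together.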