Can I merge dataframes below each other and at the same time add an ID per dataset?

Time:06-08

I have the following dataframes: df1, df2, df3, ...

Now I would like to combine them into one dataframe, stacked below each other. I know how to do that, but at the same time I want to add an ID per original dataframe as a new column in the merged dataset. The column would look like (1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,...). I know how to make a column with a single constant value, but I would like to create the IDs in the same step as the merge.

Does anyone have an idea?

Greetings!

CodePudding user response:

I'd make a list of the dataframes, then use a list comprehension to add the index of the df in the list to each df, and then use pd.concat:

dfs = [df1, df2, df3]
new_df = pd.concat([df.assign(ID=i) for i, df in enumerate(dfs, start=1)])
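For reference, here is a minimal self-contained sketch of the same idea. The three small frames and the column name x are made up for illustration, and ignore_index=True is an addition that is not in the answer above; it is assumed here that a continuous row index is wanted instead of each frame's original index repeating.

```python
import pandas as pd

# Hypothetical sample frames, for illustration only
df1 = pd.DataFrame({"x": [10, 11]})
df2 = pd.DataFrame({"x": [20, 21, 22]})
df3 = pd.DataFrame({"x": [30]})

dfs = [df1, df2, df3]

# assign() returns a copy with the extra ID column, so df1..df3 stay unchanged.
# ignore_index=True (an assumption, not part of the answer) renumbers rows 0..n-1.
new_df = pd.concat(
    [df.assign(ID=i) for i, df in enumerate(dfs, start=1)],
    ignore_index=True,
)
print(new_df)
```

With this variant the ID column ends up as the last column, after the original ones.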

CodePudding user response:

You can use pd.concat as suggested by @richardec, but you can also use its keys and names parameters. The main advantage is that the ID column ends up in the first position:

dfs = [df1, df2, df3]
out = pd.concat(dfs, keys=range(1, len(dfs) + 1), names=['ID', None]).reset_index('ID')
print(out)

# Output
   ID         A         B         C         D
0   1  0.216093  0.527420  0.473905  0.645006
1   1  0.495148  0.030564  0.735505  0.704401
2   1  0.155198  0.679730  0.895915  0.387254
3   1  0.207006  0.622962  0.550490  0.972048
4   1  0.134301  0.214033  0.779889  0.527854
0   2  0.900729  0.928867  0.089406  0.814716
1   2  0.676181  0.246887  0.932252  0.931694
2   2  0.913082  0.986885  0.383471  0.628707
3   2  0.423171  0.454256  0.721458  0.032126
4   2  0.074285  0.583578  0.677186  0.818504
0   3  0.320201  0.009262  0.343529  0.789488
1   3  0.331445  0.215488  0.455243  0.311798
2   3  0.456113  0.422995  0.041669  0.542399
3   3  0.643180  0.938067  0.476010  0.586512
4   3  0.206706  0.571189  0.690014  0.317337

Setup:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
df3 = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
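If the IDs should be labels rather than 1, 2, 3, the same keys mechanism can be sketched with a dict, since pd.concat accepts a mapping and uses its keys as the outer index level. The labels "a", "b", "c" and the tiny frames below are hypothetical, chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical frames and labels, for illustration only
frames = {
    "a": pd.DataFrame({"A": [1.0, 2.0]}),
    "b": pd.DataFrame({"A": [3.0]}),
    "c": pd.DataFrame({"A": [4.0, 5.0]}),
}

# The dict keys become the outer index level, named 'ID';
# reset_index('ID') then moves that level into the first column.
out = pd.concat(frames, names=["ID", None]).reset_index("ID")
print(out)
```

As in the answer above, the original row index of each frame is kept, so index values repeat across IDs.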

CodePudding user response:

One way to do this is to build a dataframe holding the full-length ID column, and then concatenate it horizontally with the vertical concatenation of the input dataframes:

import pandas as pd

df1 = pd.DataFrame({'foo':[33]*3, 'bar':[34]*3})
df2 = pd.DataFrame({'foo':[66]*4, 'bar':[67]*4})
df3 = pd.DataFrame({'foo':[99]*5, 'bar':[100]*5})
dfs = [df1, df2, df3]
# Repeat each 1-based ID once per row of its dataframe, then place it next to the stacked data
df = pd.concat([pd.DataFrame({'ID': [i + 1 for i, df in enumerate(dfs) for j in range(len(df.index))]}),
    pd.concat(dfs).reset_index(drop=True)], axis=1)
print(df)

Input:


df1:
   foo  bar
0   33   34
1   33   34
2   33   34

df2:
   foo  bar
0   66   67
1   66   67
2   66   67
3   66   67

df3:
   foo  bar
0   99  100
1   99  100
2   99  100
3   99  100
4   99  100

Output:

vertically concat'ed with new ID column:
    ID  foo  bar
0    1   33   34
1    1   33   34
2    1   33   34
3    2   66   67
4    2   66   67
5    2   66   67
6    2   66   67
7    3   99  100
8    3   99  100
9    3   99  100
10   3   99  100
11   3   99  100
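A possibly shorter way to build the same ID column is sketched below, using np.repeat for the per-frame repetition and DataFrame.insert to put the column first. This is a swapped-in variant and not the code from the answer above; it reuses the same sample frames, and the names out and ids are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'foo': [33] * 3, 'bar': [34] * 3})
df2 = pd.DataFrame({'foo': [66] * 4, 'bar': [67] * 4})
df3 = pd.DataFrame({'foo': [99] * 5, 'bar': [100] * 5})
dfs = [df1, df2, df3]

# Stack the frames with a fresh 0..n-1 index
out = pd.concat(dfs, ignore_index=True)

# Repeat 1, 2, 3 as many times as each frame has rows
ids = np.repeat(np.arange(1, len(dfs) + 1), [len(d) for d in dfs])

# Insert the ID column at position 0 so it comes first
out.insert(0, 'ID', ids)
print(out)
```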