I have the following dataframes: df1, df2, df3, ...
Now I would like to combine them into one dataframe, with the dataframes stacked below each other. I know how to do that, but at the same time I want to add an ID for each original dataframe as a new column in the merged dataset. The column would look like (1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,...). I know how to make a column with a constant value, but I would like to do both in one step.
Does anyone have an idea?
Greetings!
Answer:
I'd make a list of the dataframes, then use a list comprehension with enumerate to add each df's position in the list as an ID column, and then combine them with pd.concat:
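import pandas as pd
dfs = [df1, df2, df3]
# assign() adds a constant ID column to a copy of each df; enumerate(..., start=1) numbers them 1, 2, 3, ...
new_df = pd.concat([df.assign(ID=i) for i, df in enumerate(dfs, start=1)])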
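A self-contained sketch of the same assign/concat idea, using small made-up frames of different lengths. It additionally passes ignore_index=True (not part of the answer above) so the rows are renumbered 0..n-1 instead of repeating each frame's original index.

```python
import pandas as pd

# Small example frames of different lengths (illustrative data only)
df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'A': [4, 5]})
df3 = pd.DataFrame({'A': [6]})
dfs = [df1, df2, df3]

# assign() returns a copy of each frame with a constant ID column;
# enumerate(..., start=1) gives the first frame ID 1.
# ignore_index=True renumbers the combined rows 0..n-1.
new_df = pd.concat(
    [df.assign(ID=i) for i, df in enumerate(dfs, start=1)],
    ignore_index=True,
)
print(new_df)
```

Note that assign() appends ID as the last column; reorder the columns afterwards if you want it first.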
Answer:
You can use pd.concat as suggested by @richardec, but with its keys and names parameters; the main advantage is that the ID column ends up in the first position:
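dfs = [df1, df2, df3]
# keys adds an outer index level labelling each frame 1..n; names calls that level 'ID'.
# reset_index('ID') then moves that level out of the index into the first column.
out = pd.concat(dfs, keys=range(1, len(dfs) + 1), names=['ID', None]).reset_index('ID')
print(out)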
# Output
ID A B C D
0 1 0.216093 0.527420 0.473905 0.645006
1 1 0.495148 0.030564 0.735505 0.704401
2 1 0.155198 0.679730 0.895915 0.387254
3 1 0.207006 0.622962 0.550490 0.972048
4 1 0.134301 0.214033 0.779889 0.527854
0 2 0.900729 0.928867 0.089406 0.814716
1 2 0.676181 0.246887 0.932252 0.931694
2 2 0.913082 0.986885 0.383471 0.628707
3 2 0.423171 0.454256 0.721458 0.032126
4 2 0.074285 0.583578 0.677186 0.818504
0 3 0.320201 0.009262 0.343529 0.789488
1 3 0.331445 0.215488 0.455243 0.311798
2 3 0.456113 0.422995 0.041669 0.542399
3 3 0.643180 0.938067 0.476010 0.586512
4 3 0.206706 0.571189 0.690014 0.317337
Setup:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
df3 = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
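Because the setup above uses random data, here is a deterministic, self-contained sketch of the same keys/names approach (the frames and values are made up for illustration). It shows the intermediate MultiIndex step and that reset_index('ID') leaves each frame's original row labels as the index.

```python
import pandas as pd

# Deterministic stand-ins for df1, df2, df3 (illustrative data only)
df1 = pd.DataFrame({'A': [10, 11], 'B': [20, 21]})
df2 = pd.DataFrame({'A': [12, 13, 14], 'B': [22, 23, 24]})
df3 = pd.DataFrame({'A': [15], 'B': [25]})
dfs = [df1, df2, df3]

# keys labels each frame's rows in a new outer index level (1, 2, 3);
# names gives that outer level the name 'ID' and leaves the inner level unnamed
combined = pd.concat(dfs, keys=range(1, len(dfs) + 1), names=['ID', None])

# Moving the 'ID' level out of the index makes it the first column
out = combined.reset_index('ID')
print(out)
```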
Answer:
One way to do this is to build a dataframe holding the full-length ID column and concatenate it horizontally (axis=1) with the vertical concatenation of the input dataframes:
import pandas as pd
df1 = pd.DataFrame({'foo':[33]*3, 'bar':[34]*3})
df2 = pd.DataFrame({'foo':[66]*4, 'bar':[67]*4})
df3 = pd.DataFrame({'foo':[99]*5, 'bar':[100]*5})
dfs = [df1, df2, df3]
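# One ID per row: i + 1 repeated len(df) times for each input frame.
# reset_index(drop=True) gives the stacked frames a 0..n-1 index so the axis=1 concat lines up row by row.
df = pd.concat([pd.DataFrame({'ID': [i + 1 for i, df in enumerate(dfs) for _ in range(len(df.index))]}),
                pd.concat(dfs).reset_index(drop=True)], axis=1)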
print(df)
Input:
df1:
foo bar
0 33 34
1 33 34
2 33 34
df2:
foo bar
0 66 67
1 66 67
2 66 67
3 66 67
df3:
foo bar
0 99 100
1 99 100
2 99 100
3 99 100
4 99 100
Output (vertically concatenated, with the new ID column):
ID foo bar
0 1 33 34
1 1 33 34
2 1 33 34
3 2 66 67
4 2 66 67
5 2 66 67
6 2 66 67
7 3 99 100
8 3 99 100
9 3 99 100
10 3 99 100
11 3 99 100
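The full-length ID column can also be built without the nested list comprehension. This sketch swaps in numpy.repeat to repeat each ID as many times as its frame has rows, then uses DataFrame.insert to place it first; it is an alternative technique, not the answerer's code, and reuses the example frames from the answer above.

```python
import numpy as np
import pandas as pd

# Same example input frames as in the answer above
df1 = pd.DataFrame({'foo': [33] * 3, 'bar': [34] * 3})
df2 = pd.DataFrame({'foo': [66] * 4, 'bar': [67] * 4})
df3 = pd.DataFrame({'foo': [99] * 5, 'bar': [100] * 5})
dfs = [df1, df2, df3]

# Repeat ID k + 1 exactly len(dfs[k]) times
ids = np.repeat(np.arange(1, len(dfs) + 1), [len(d) for d in dfs])

# Stack the frames with a fresh 0..n-1 index, then insert ID as the first column
df = pd.concat(dfs, ignore_index=True)
df.insert(0, 'ID', ids)
print(df)
```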