I have two dataframes as follows:
df1:
column1
0 aaa
1 aaa
2 aaa
3 aaa
df2:
column1
0 aaa
1 aaa
2 aaa
3 aaa
I want to concatenate them, but I also want to know which dataframe each value comes from.
So I want a dataframe like:
df1:
Index column1
0 0_df1 aaa
1 1_df1 aaa
2 2_df1 aaa
3 3_df1 aaa
4 4_df2 aaa
5 5_df2 aaa
6 6_df2 aaa
I know how to make the index column, but I cannot add the "identifier" to its values.
CodePudding user response:
The quickest solution I can think of:
import pandas as pd
df1 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})
df2 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})
# Tag each row with its original position and its source dataframe
df1["Index"] = df1.index.astype(str) + '_df1'
df2["Index"] = df2.index.astype(str) + '_df2'
pd.concat([df1, df2], axis=0, ignore_index=True)
The output:
column1 Index
0 aaa 0_df1
1 aaa 1_df1
2 aaa 2_df1
3 aaa 3_df1
4 aaa 0_df2
5 aaa 1_df2
6 aaa 2_df2
7 aaa 3_df2
You can also create multi-index keys with concat by:
pd.concat([df1,df2], keys=['df1','df2'])
which gives:
      column1
df1 0     aaa
    1     aaa
    2     aaa
    3     aaa
df2 0     aaa
    1     aaa
    2     aaa
    3     aaa
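If you prefer flat string labels over a MultiIndex, the (source, position) pairs produced by keys= can be joined into a single label. This is only a sketch, assuming the same df1 and df2 as above; the name combined is made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})
df2 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})

# keys= adds an outer index level holding the source label
combined = pd.concat([df1, df2], keys=['df1', 'df2'])

# Each index entry is a (source, position) tuple; join it into "position_source"
combined.index = [f"{pos}_{src}" for src, pos in combined.index]
print(combined)
```

Note that the positions restart at 0 for each dataframe here, because they come from the original indexes rather than from a running counter.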
CodePudding user response:
You can create a column for each dataframe with the name of the dataframe, then add the index to this column after concatenation:
import pandas as pd
df1 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})
df2 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})
df1['Index'] = 'df1'
df2['Index'] = 'df2'
cat = pd.concat([df1, df2], ignore_index=True)
cat['Index'] = cat.index.astype(str) + '_' + cat['Index'] # add the index
cat = cat[['Index', 'column1']] # reorder the columns
Index column1
0 0_df1 aaa
1 1_df1 aaa
2 2_df1 aaa
3 3_df1 aaa
4 4_df2 aaa
5 5_df2 aaa
6 6_df2 aaa
7 7_df2 aaa
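The same idea can be wrapped in a small function if you have more than two dataframes. This is a rough sketch, not a pandas feature: concat_with_source is a hypothetical helper name, and it assumes you pass a dict mapping each label to its dataframe and that no input already has a column called Index.

```python
import pandas as pd

def concat_with_source(frames):
    """Concatenate labelled dataframes and tag rows as '<position>_<label>'.

    frames: dict mapping a label (str) to a DataFrame (hypothetical helper).
    """
    # Add the source label as a column without modifying the inputs
    tagged = [df.assign(Index=name) for name, df in frames.items()]
    cat = pd.concat(tagged, ignore_index=True)
    # Prefix each label with the running position in the combined frame
    cat['Index'] = [f"{i}_{name}" for i, name in zip(cat.index, cat['Index'])]
    # Move the identifier column to the front
    return cat[['Index'] + [c for c in cat.columns if c != 'Index']]

df1 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})
df2 = pd.DataFrame({'column1': ['aaa', 'aaa', 'aaa', 'aaa']})
result = concat_with_source({'df1': df1, 'df2': df2})
print(result)
```

Using assign leaves df1 and df2 untouched, which may matter if you still need the originals afterwards.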