I have multiple dataframes and would like a single dataframe that lists every column name from those dataframes, together with the name of the dataframe each column comes from.
For example:
# Existing Dataframes
df1 =
   df1_colA  df1_colB  df1_colC
0         1         2         3
1         4         5         6
2         7         8         9
df2 =
   df2_colA  df2_colB  df3_colC
0        10        11        12
1        13        14        15
2        16        17        18
df3 =
   df3_colA  df3_colB  df3_colC
0        30        31        32
1        33        34        35
2        36        37        38
I would like to get a dataframe like this:
names =
df_names col_names
0 df1 df1_colA
1 df1 df1_colB
2 df1 df1_colC
3 df2 df2_colA
4 df2 df2_colB
5 df2 df2_colC
6 df3 df3_colA
7 df3 df3_colB
8 df3 df3_colC
Help would be very appreciated and thank you in advance!
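To make the answers reproducible, here is a minimal sketch that rebuilds the three example frames from the printed tables in the question. The values are copied from those tables, and `df2`'s third column keeps the `df3_colC` name exactly as printed.

```python
import pandas as pd

# Rebuild the example frames from the printed tables in the question.
df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   columns=['df1_colA', 'df1_colB', 'df1_colC'])
# df2's last column is named df3_colC, exactly as printed in the question.
df2 = pd.DataFrame([[10, 11, 12], [13, 14, 15], [16, 17, 18]],
                   columns=['df2_colA', 'df2_colB', 'df3_colC'])
df3 = pd.DataFrame([[30, 31, 32], [33, 34, 35], [36, 37, 38]],
                   columns=['df3_colA', 'df3_colB', 'df3_colC'])

print(df1)
```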
Answer:
If it is possible to extract the DataFrame names from the column names, use a list comprehension with concat, then add the new column in the first position with DataFrame.insert, taking the part of each column name before the _ with Series.str.extract:
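import pandas as pd

dfs = [df1, df2, df3]
df = pd.concat([d.columns.to_frame(name='col_names') for d in dfs], ignore_index=True)
# expand=False returns a Series (not a one-column DataFrame) for the single group
df.insert(0, 'df_names', df['col_names'].str.extract(r'^(.*)_', expand=False))
print(df)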
df_names col_names
0 df1 df1_colA
1 df1 df1_colB
2 df1 df1_colC
3 df2 df2_colA
4 df2 df2_colB
5 df3 df3_colC
6 df3 df3_colA
7 df3 df3_colB
8 df3 df3_colC
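The pattern `^(.*)_` is greedy, so it captures everything up to the *last* underscore. That is fine for names like `df1_colA`, but if a column name contains more than one underscore, a lazy `^(.*?)_` keeps only the part before the first underscore. A small sketch comparing the two, using a hypothetical name `df1_col_A` that is not in the question:

```python
import pandas as pd

# Hypothetical column names; the second one has two underscores.
s = pd.Series(['df1_colA', 'df1_col_A'])

greedy = s.str.extract(r'^(.*)_', expand=False)   # up to the LAST underscore
lazy = s.str.extract(r'^(.*?)_', expand=False)    # up to the FIRST underscore

print(greedy.tolist())  # ['df1', 'df1_col']
print(lazy.tolist())    # ['df1', 'df1']
```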
A similar idea with a flattened list comprehension:
import pandas as pd

dfs = [df1, df2, df3]
df = pd.DataFrame({'col_names': [x for d in dfs for x in d.columns]})
df.insert(0, 'df_names', df['col_names'].str.extract(r'^(.*)_', expand=False))
print(df)
df_names col_names
0 df1 df1_colA
1 df1 df1_colB
2 df1 df1_colC
3 df2 df2_colA
4 df2 df2_colB
5 df3 df3_colC
6 df3 df3_colA
7 df3 df3_colB
8 df3 df3_colC
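The nested comprehension `[x for d in dfs for x in d.columns]` reads left to right: the outer loop runs over the frames and the inner loop over each frame's columns. If you prefer, `itertools.chain.from_iterable` flattens in the same order; this is a swapped-in alternative, sketched here with two small hypothetical frames rather than the question's data:

```python
from itertools import chain

import pandas as pd

# Small stand-in frames (hypothetical data, not from the question).
dfs = [pd.DataFrame(columns=['df1_colA', 'df1_colB']),
       pd.DataFrame(columns=['df2_colA'])]

# Same order as [x for d in dfs for x in d.columns]
flat = list(chain.from_iterable(d.columns for d in dfs))
df = pd.DataFrame({'col_names': flat})
df.insert(0, 'df_names', df['col_names'].str.extract(r'^(.*)_', expand=False))
print(df)
```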
An alternative is to create a dictionary of DataFrames and pass it to concat through a dict comprehension; the dict keys become the first level of the MultiIndex, so it is not necessary to parse the column names:
import pandas as pd

dfs = {'df1': df1, 'df2': df2, 'df3': df3}
df = (pd.concat({k: v.columns.to_frame(name='col_names') for k, v in dfs.items()})
        .droplevel(1)
        .rename_axis('df_names')
        .reset_index())
print(df)
df_names col_names
0 df1 df1_colA
1 df1 df1_colB
2 df1 df1_colC
3 df2 df2_colA
4 df2 df2_colB
5 df2 df3_colC
6 df3 df3_colA
7 df3 df3_colB
8 df3 df3_colC
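Because the labels come from the dict keys, the column names do not need any prefix at all. A minimal sketch with hypothetical frame names (`sales`, `users`) and unprefixed columns, not the question's data, shows the same `concat` / `droplevel` / `rename_axis` / `reset_index` chain still works:

```python
import pandas as pd

# Hypothetical frames whose column names carry no prefix.
frames = {'sales': pd.DataFrame(columns=['id', 'amount']),
          'users': pd.DataFrame(columns=['id', 'name', 'email'])}

df = (pd.concat({k: v.columns.to_frame(name='col_names') for k, v in frames.items()})
        .droplevel(1)               # drop the original column-name level
        .rename_axis('df_names')    # the remaining level holds the dict keys
        .reset_index())
print(df)
```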
Answer:
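import pandas as pd

dfs = [df1, df2, df3]
# note: the columns of a row-wise concat are the union of all columns,
# so a name shared by several frames is listed only once
df = pd.DataFrame({'col_names': pd.concat(dfs).columns})
df['df_names'] = df['col_names'].str.split('_').str[0]
print(df)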
Output:
col_names df_names
0 df1_colA df1
1 df1_colB df1
2 df1_colC df1
3 df2_colA df2
4 df2_colB df2
5 df2_colC df2
6 df3_colA df3
7 df3_colB df3
8 df3_colC df3
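One caveat with `pd.concat(dfs).columns`: it concatenates all the data just to read the column labels, and the resulting columns are the union of the inputs, so a name shared by two frames appears only once (with the question's data as printed, `df2` and `df3` both have a `df3_colC` column). A small sketch with hypothetical frames contrasting it with a plain comprehension:

```python
import pandas as pd

# Hypothetical frames that share the column name 'key'.
a = pd.DataFrame({'a_x': [1], 'key': [2]})
b = pd.DataFrame({'b_y': [3], 'key': [4]})
dfs = [a, b]

union_cols = pd.concat(dfs).columns.tolist()     # shared 'key' appears once
all_cols = [c for d in dfs for c in d.columns]   # keeps every occurrence

print(union_cols)
print(all_cols)  # ['a_x', 'key', 'b_y', 'key']
```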
Answer:
You can try:
import pandas as pd

dfs = [df1, df2, df3]
df = (pd.DataFrame({'col_names': [d.columns.tolist() for d in dfs]})
      .explode('col_names', ignore_index=True)
      .assign(df_names=lambda d: d['col_names'].str.split('_').str[0]))
print(df)
col_names df_names
0 df1_colA df1
1 df1_colB df1
2 df1_colC df1
3 df2_colA df2
4 df2_colB df2
5 df3_colC df3
6 df3_colA df3
7 df3_colB df3
8 df3_colC df3
If the column order matters, move df_names to the front:
df.insert(0, 'df_names', df.pop('df_names'))
print(df)
df_names col_names
0 df1 df1_colA
1 df1 df1_colB
2 df1 df1_colC
3 df2 df2_colA
4 df2 df2_colB
5 df3 df3_colC
6 df3 df3_colA
7 df3 df3_colB
8 df3 df3_colC
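`explode` turns each list in a cell into its own row, repeating the original index label for every element; `ignore_index=True` renumbers the result 0..n-1 instead. A tiny sketch on a hypothetical column of lists:

```python
import pandas as pd

# Hypothetical column of lists, one list of column names per frame.
nested = pd.DataFrame({'col_names': [['df1_colA', 'df1_colB'], ['df2_colA']]})

without_reset = nested.explode('col_names')             # index: 0, 0, 1
flat = nested.explode('col_names', ignore_index=True)   # index: 0, 1, 2

print(without_reset.index.tolist())  # [0, 0, 1]
print(flat)
```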
Answer:
One option is to append the columns (they are Index objects) and repeat each dataframe's name as many times as that dataframe has columns, before creating the final dataframe:
import numpy as np
import pandas as pd

dfs = [df1, df2, df3]
col_names = dfs[0].columns.append([df.columns for df in dfs[1:]])
lengths = [len(df.columns) for df in dfs]  # or [df.shape[1] for df in dfs]
# only useful if you have lots of dataframes
# else, it is just easier to write ['df1', 'df2', 'df3']
df_names = [f"df{num}" for num, _ in enumerate(dfs, start=1)]
df_names = np.repeat(df_names, lengths)
data = {'df_names': df_names, 'col_names': col_names}
pd.DataFrame(data, copy=False)
df_names col_names
0 df1 df1_colA
1 df1 df1_colB
2 df1 df1_colC
3 df2 df2_colA
4 df2 df2_colB
5 df2 df3_colC
6 df3 df3_colA
7 df3 df3_colB
8 df3 df3_colC
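The repeat counts must be the number of *columns* in each frame. In the question every frame happens to be 3x3, so `len(df)`, which counts rows, gives the same numbers by coincidence. A sketch with hypothetical non-square frames shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: one 2-row x 3-column, one 4-row x 1-column.
dfs = [pd.DataFrame(np.zeros((2, 3)), columns=['df1_a', 'df1_b', 'df1_c']),
       pd.DataFrame(np.zeros((4, 1)), columns=['df2_a'])]

rows = [len(d) for d in dfs]            # [2, 4] -> wrong repeat counts
cols = [len(d.columns) for d in dfs]    # [3, 1] -> one label per column

col_names = dfs[0].columns.append([d.columns for d in dfs[1:]])
df_names = np.repeat([f"df{n}" for n in range(1, len(dfs) + 1)], cols)
out = pd.DataFrame({'df_names': df_names, 'col_names': col_names})
print(out)
```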