How to return a dataframe containing the column names of multiple dataframes

Time:05-09

I have multiple dataframes and would like a dataframe that contains all column names from said multiple dataframes.

For example :

# Existing Dataframes
df1 =
    df1_colA  df1_colB  df1_colC
0   1         2         3
1   4         5         6
2   7         8         9

df2 =
    df2_colA  df2_colB  df3_colC
0   10        11        12
1   13        14        15
2   16        17        18

df3 =
    df3_colA  df3_colB  df3_colC
0   30        31        32
1   33        34        35
2   36        37        38

I would like to get a dataframe like this :

names =
     df_names   col_names
0    df1        df1_colA
1    df1        df1_colB
2    df1        df1_colC
3    df2        df2_colA
4    df2        df2_colB
5    df2        df2_colC
6    df3        df3_colA
7    df3        df3_colB
8    df3        df3_colC

Help would be very appreciated and thank you in advance!

CodePudding user response:

If the DataFrame names can be extracted from the column names, use a list comprehension with concat to collect the columns, then add the new column in the first position with DataFrame.insert, using Series.str.extract to take the part of each column name before the _:

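import pandas as pd

dfs = [df1, df2, df3]
# Stack the column labels of every DataFrame into a single 'col_names' column
df = pd.concat([df.columns.to_frame(name='col_names') for df in dfs], ignore_index=True)
# The part before the last underscore becomes the DataFrame name (first column)
df.insert(0, 'df_names', df['col_names'].str.extract(r'^(.*)_', expand=False))
print(df)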
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
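
Note that the pattern '^(.*)_' is greedy, so it captures everything before the last underscore, while str.split('_').str[0] (used in other answers) keeps only the part before the first one. Both agree on the sample columns, but they differ once a column name contains more than one underscore. A small sketch, assuming a made-up column name df1_col_A that is not in the question:

```python
import pandas as pd

# Hypothetical column names; 'df1_col_A' contains two underscores
cols = pd.Series(['df1_colA', 'df1_col_A'])

# Greedy: everything before the LAST underscore
greedy = cols.str.extract(r'^(.*)_', expand=False)

# Non-greedy: everything before the FIRST underscore
lazy = cols.str.extract(r'^(.*?)_', expand=False)

# Split-based: also the part before the first underscore
split_first = cols.str.split('_').str[0]

print(greedy.tolist())       # ['df1', 'df1_col']
print(lazy.tolist())         # ['df1', 'df1']
print(split_first.tolist())  # ['df1', 'df1']
```

If your DataFrame names never contain an underscore, the non-greedy or split-based form is probably the safer choice.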

A similar idea with a flattened list comprehension:

dfs = [df1, df2, df3]
# Flatten the column labels of all DataFrames into one list
df = pd.DataFrame({'col_names': [x for df in dfs for x in df.columns]})
df.insert(0, 'df_names', df['col_names'].str.extract(r'^(.*)_', expand=False))
print(df)
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC

An alternative is to create a dictionary of DataFrames and pass a dict comprehension to concat. The dictionary keys become the first level of a MultiIndex, so there is no need to parse the column names:

dfs = {'df1':df1, 'df2':df2, 'df3':df3}
df = (pd.concat({k:v.columns.to_frame(name='col_names') for k, v in dfs.items()})
        .droplevel(1)
        .rename_axis('df_names')
        .reset_index())

print (df)
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df2  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
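
The same dictionary idea can also be written without concat, as a plain comprehension over the items. This is only a sketch: the helper name column_name_table and the two sample frames are made up for illustration and are not part of pandas or the question.

```python
import pandas as pd

def column_name_table(frames):
    """Return one row per (DataFrame label, column name) pair.

    `frames` maps a label to a DataFrame. The helper name and signature
    are illustrative, not a pandas API.
    """
    rows = [(label, col) for label, frame in frames.items() for col in frame.columns]
    return pd.DataFrame(rows, columns=['df_names', 'col_names'])

# Sample frames shaped like the question's data (the values do not matter here)
df1 = pd.DataFrame([[1, 2, 3]], columns=['df1_colA', 'df1_colB', 'df1_colC'])
df2 = pd.DataFrame([[10, 11, 12]], columns=['df2_colA', 'df2_colB', 'df2_colC'])

names = column_name_table({'df1': df1, 'df2': df2})
print(names)
```

Because the labels come from the dictionary keys, this works even when the column names carry no DataFrame prefix at all.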

CodePudding user response:

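import pandas as pd

dfs = [df1, df2, df3]
# Note: concat aligns on column names, so a name shared by several
# DataFrames appears only once in .columns
df = pd.DataFrame({'col_names': pd.concat(dfs).columns})
df['df_names'] = df['col_names'].str.split('_').str[0]
print(df)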

Output:

  col_names df_names
0  df1_colA      df1
1  df1_colB      df1
2  df1_colC      df1
3  df2_colA      df2
4  df2_colB      df2
5  df2_colC      df2
6  df3_colA      df3
7  df3_colB      df3
8  df3_colC      df3
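
One caveat with this approach: pd.concat aligns the frames on column names, so its .columns is the union of the names, and a name that occurs in more than one DataFrame shows up only once. It also concatenates all the data just to read the labels. A minimal sketch, assuming two made-up frames that share a column called shared:

```python
import pandas as pd

# Hypothetical frames with one column name in common
a = pd.DataFrame({'a_x': [1], 'shared': [2]})
b = pd.DataFrame({'b_x': [3], 'shared': [4]})

# Union of column names: 'shared' appears only once
union_cols = pd.concat([a, b]).columns.tolist()
print(sorted(union_cols))  # ['a_x', 'b_x', 'shared']

# Collecting the labels directly keeps one entry per frame
all_cols = [col for frame in (a, b) for col in frame.columns]
print(all_cols)            # ['a_x', 'shared', 'b_x', 'shared']
```

With uniquely prefixed names, as in the question, the two results have the same entries, so the caveat only matters when names can repeat.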

CodePudding user response:

You can try building a column of lists and exploding it:

dfs = [df1, df2, df3]

df = (pd.DataFrame({'col_names': [df.columns.tolist() for df in dfs]})
      .explode('col_names', ignore_index=True)
      .pipe(lambda df: df.assign(df_names=df['col_names'].str.split('_').str[0])))
print(df)

  col_names df_names
0  df1_colA      df1
1  df1_colB      df1
2  df1_colC      df1
3  df2_colA      df2
4  df2_colB      df2
5  df3_colC      df3
6  df3_colA      df3
7  df3_colB      df3
8  df3_colC      df3

If the column order matters:

df.insert(0, 'df_names', df.pop('df_names'))
print(df)

  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC

CodePudding user response:

One option is to append the column indexes (DataFrame.columns is an Index), repeat each DataFrame name once per column in that DataFrame, and then build the final DataFrame:

import numpy as np
import pandas as pd

dfs = [df1, df2, df3]

# Index.append accepts a list of Index objects and keeps duplicates
col_names = df1.columns.append([df.columns for df in dfs[1:]])

# number of columns per DataFrame (len(df) would count rows instead)
lengths = [len(df.columns) for df in dfs]

# only useful if you have lots of dataframes
# else, it is just easier to write ['df1', 'df2', 'df3']
df_names = [f"df{num + 1}" for num, _ in enumerate(dfs)]

df_names = np.repeat(df_names, lengths)

df = {'df_names': df_names, 'col_names': col_names}

pd.DataFrame(df, copy=False)


  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df2  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
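
np.repeat needs one count per DataFrame, equal to its number of columns. Keep in mind that len(df) counts rows, so len(df.columns) is the safe choice when the frames are not square. A small sketch with two made-up frames (small and wide are hypothetical, chosen so that row and column counts differ):

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 2 rows each, but 1 and 3 columns respectively
small = pd.DataFrame({'s_a': [0, 1]})
wide = pd.DataFrame({'w_a': [0, 1], 'w_b': [0, 1], 'w_c': [0, 1]})
dfs = [small, wide]

col_counts = [len(df.columns) for df in dfs]  # [1, 3]
row_counts = [len(df) for df in dfs]          # [2, 2], not what we need

# Repeat each label once per column of its DataFrame
labels = np.repeat(['small', 'wide'], col_counts)
col_names = [col for df in dfs for col in df.columns]

names = pd.DataFrame({'df_names': labels, 'col_names': col_names})
print(names)
```

Using row_counts here would produce four labels in the wrong proportion (two per frame), pairing w_a with small instead of wide.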
