Home > Back-end >  output data frame name and column names into a dataframe in pandas
output data frame name and column names into a dataframe in pandas

Time:04-07

I have a list of dataframes:

list_of_datasets = [master,
                    assessments,
                    vertical_list,
                    cleanedstids,
                    full_stack_eval,
                    sales]

they all have different sets of column names:

master = pd.DataFrame({'ID': [10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013],
'Type': ['Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue'],
'Year': [2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021,2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021],
'Score': [0,0,0,0,0,0,0,0,0,0,14,24,16,5,87,33,0,0,0,0,0,0,0,0,0,0,11,13,3,16,37,49]})

vertical_list= pd.DataFrame({'ID':[5863, 4506, 4514, 4543],
'Winner_ID': [13463, 20345, 21012, 20476]})

etc. etc.

I would like to get the same number of dataframes that have the name of the dataframe and the column names in 2 columns

master_names:

df_name     column_name

master      ID
master      Type
master      Year

vertical_list_names:

df_name     column_name

vertical_list      ID
vertical_list      Winner_ID

I did try:

column_names_list = list()

for i in list_of_datasets:
  column_names_list.append(i.columns.values)

but yeah this is a list and not a dataframe, and it's not many dataframes

CodePudding user response:

You can't directly access an object bound variable name in python.

You should use a dictionary as initial container:

dict_of_dfs = {'master': master, 'vertical_list': vertical_list}

Then you can loop over the items and extract your data:

col_list = [pd.DataFrame({'df_name': [k]*len(d.columns), 'column_name': list(d.columns)})
            for k,d in dict_of_dfs.items()]

output:

[  df_name column_name
 0  master          ID
 1  master        Type
 2  master        Year
 3  master       Score,
          df_name column_name
 0  vertical_list          ID
 1  vertical_list   Winner_ID]

Or, as dictionary:

col_dict = {k: pd.DataFrame({'df_name': [k]*len(d.columns), 'column_name': list(d.columns)})
            for k,d in dict_of_dfs.items()}

output:

{'master':   df_name column_name
 0  master          ID
 1  master        Type
 2  master        Year
 3  master       Score,
 'vertical_list':          df_name column_name
 0  vertical_list          ID
 1  vertical_list   Winner_ID}
output as csv
for k,d in dict_of_dfs.items():
    (pd.DataFrame({'df_name': [k]*len(d.columns),
                   'column_name': list(d.columns)})
       .to_csv(f'{k}.csv')
    )
  • Related