I have a list of dataframes:
list_of_datasets = [master,
assessments,
vertical_list,
cleanedstids,
full_stack_eval,
sales]
they all have different sets of column names:
master = pd.DataFrame({'ID': [10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013],
'Type': ['Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue'],
'Year': [2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021,2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021],
'Score': [0,0,0,0,0,0,0,0,0,0,14,24,16,5,87,33,0,0,0,0,0,0,0,0,0,0,11,13,3,16,37,49]})
vertical_list= pd.DataFrame({'ID':[5863, 4506, 4514, 4543],
'Winner_ID': [13463, 20345, 21012, 20476]})
etc. etc.
I would like to get the same number of dataframes that have the name of the dataframe and the column names in 2 columns
master_names:
df_name column_name
master ID
master Type
master Year
vertical_list_names:
df_name column_name
vertical_list ID
vertical_list Winner_ID
I did try:
column_names_list = list()
for i in list_of_datasets:
column_names_list.append(i.columns.values)
but yeah this is a list and not a dataframe, and it's not many dataframes
CodePudding user response:
You can't directly access an object bound variable name in python.
You should use a dictionary as initial container:
dict_of_dfs = {'master': master, 'vertical_list': vertical_list}
Then you can loop over the items and extract your data:
col_list = [pd.DataFrame({'df_name': [k]*len(d.columns), 'column_name': list(d.columns)})
for k,d in dict_of_dfs.items()]
output:
[ df_name column_name
0 master ID
1 master Type
2 master Year
3 master Score,
df_name column_name
0 vertical_list ID
1 vertical_list Winner_ID]
Or, as dictionary:
col_dict = {k: pd.DataFrame({'df_name': [k]*len(d.columns), 'column_name': list(d.columns)})
for k,d in dict_of_dfs.items()}
output:
{'master': df_name column_name
0 master ID
1 master Type
2 master Year
3 master Score,
'vertical_list': df_name column_name
0 vertical_list ID
1 vertical_list Winner_ID}
output as csv
for k,d in dict_of_dfs.items():
(pd.DataFrame({'df_name': [k]*len(d.columns),
'column_name': list(d.columns)})
.to_csv(f'{k}.csv')
)