I am new to coding, and I want to create an individual dataframe from each Excel tab. Searching this forum got me most of the way there (I found a sample that uses a dictionary), but I need one more step that I can't figure out.
This is the code I am using:
import pandas as pd
excel = 'sample.xlsx'
xls = pd.ExcelFile(excel)
d = {}
for sheet in xls.sheet_names:
    print(sheet)
    d[sheet] = pd.read_excel(xls, sheet_name=sheet)
Let's say I have 3 Excel tabs called 'alpha', 'beta' and 'charlie'.
The code above gives me 3 dataframes, and I can access them by typing d['alpha'], d['beta'] and d['charlie'].
What I want is to rename the dataframes so that instead of typing (for example) d['alpha'], I can just write alpha (without any extras).
CodePudding user response:
Don't rename them.
I can think of two scenarios here:
1. The sheets are fundamentally different
When people ask how to dynamically assign to variable names, the usual (and best) answer is "Use a dictionary". Here's one example.
Indeed, this is the reason Pandas does it this way!
In this case, my opinion is that your best move here is to do nothing, and just use the dictionary you have.
2. The sheets are roughly the same
If the sheets are all basically the same, and only differ by one attribute (e.g. they represent monthly sales and the names of the sheets are 'May', 'June', etc), then your best move is to merge them somehow, adding a column to reflect the sheet name (month, in my example).
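As a rough sketch of that second scenario, the sheets can be stacked with pd.concat after tagging each one with its sheet name. The small in-memory dictionary below is a hypothetical stand-in for the dictionary of sheets (the same shape that pd.read_excel returns when called with sheet_name=None), and the column names item, sales and month are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary of sheets; in practice this would
# come from the loop in the question or from pd.read_excel(..., sheet_name=None).
sheets = {
    "May": pd.DataFrame({"item": ["a", "b"], "sales": [10, 20]}),
    "June": pd.DataFrame({"item": ["a", "b"], "sales": [15, 25]}),
}

# Add the sheet name as a "month" column, then stack everything into one frame.
combined = pd.concat(
    [df.assign(month=name) for name, df in sheets.items()],
    ignore_index=True,
)
print(combined)
```

With everything in one dataframe, a single month can be selected with combined[combined["month"] == "May"] instead of a separate variable per sheet.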
Whatever you do, don't use exec or eval, no matter what anyone tells you. They are not options for beginner programmers.
CodePudding user response:
I think you are looking for the built-in exec function, which executes a string as Python code.
However, I do not recommend using exec; there is plenty of discussion about why it should be avoided, or at least used cautiously.
I do not have your data, so this is untested, but I think it is achievable using the following code:
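import pandas as pd
excel = 'sample.xlsx'
xls = pd.ExcelFile(excel)
for sheet in xls.sheet_names:
    print(sheet)
    # {sheet!r} inserts the name as a quoted string, e.g. sheet_name='alpha';
    # without the quotes, exec would look for a variable called alpha and fail.
    code_to_execute = f'{sheet} = pd.read_excel(xls, sheet_name={sheet!r})'
    exec(code_to_execute)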
But again, I want to stress that this is not the cleanest way to do it. Your approach is definitely cleaner; I would always use dicts for this kind of assignment. See here for more about exec.
In general, exec runs whatever Python source code you pass it as a string:
possible_string = 'a=10'
exec(possible_string)
print(a) # 10
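One concrete reason to be cautious with exec here is that Excel allows tab names that are not legal Python variable names. The sketch below uses only the standard library to check which names could be variables at all; the sheet names and the helper is_safe_variable_name are made up for illustration.

```python
import keyword

# Hypothetical sheet names; Excel allows spaces, dashes and leading digits.
sheet_names = ["alpha", "2023 sales", "class", "beta-2"]

def is_safe_variable_name(name):
    """True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

usable = [name for name in sheet_names if is_safe_variable_name(name)]
print(usable)  # ['alpha']
```

A dictionary has no such restriction, since any string works as a key.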
CodePudding user response:
You need to create variables which correspond to the three dataframes:
alpha = d['alpha']
beta = d['beta']
charlie = d['charlie']
or more succinctly (this assumes d holds exactly these three sheets, in this order):
alpha, beta, charlie = d.values()
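Unpacking d.values() raises a ValueError if the workbook gains a tab, and silently assigns the wrong sheets if the tab order changes. A small sketch of an alternative that picks the sheets by name, using operator.itemgetter from the standard library; the placeholder strings stand in for the dataframes.

```python
from operator import itemgetter

# Placeholder values stand in for the dataframes. The keys are deliberately
# out of order and include an extra sheet.
d = {
    "beta": "beta frame",
    "extra": "extra frame",
    "alpha": "alpha frame",
    "charlie": "charlie frame",
}

# Select by key, so neither the order nor the extra sheet matters.
alpha, beta, charlie = itemgetter("alpha", "beta", "charlie")(d)
print(alpha, beta, charlie)
```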
Edit:
Since you mentioned that the Excel file could have 50 tabs and could grow, you may prefer to do it inside your original loop. This can be done dynamically using exec:
import pandas as pd
excel = 'sample.xlsx'
xls = pd.ExcelFile(excel)
for sheet in xls.sheet_names:
    print(sheet)
    # Only works when the sheet name is a valid Python identifier.
    exec(f'{sheet} = pd.read_excel(xls, sheet_name=sheet)')
It might be better practice, however, to simply store your sheets in order and access them by index. A collection of 50 Excel sheets is probably better organized by appending each one to a list:
d = []
for sheet in xls.sheet_names:
    print(sheet)
    d.append(pd.read_excel(xls, sheet_name=sheet))
# d[0] is alpha, d[1] is beta, and so on...
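A list drops the sheet names, though. Since Python 3.7 dictionaries keep insertion order, so the original dictionary can offer both lookup by name and access by position. A minimal sketch, with small hypothetical dataframes standing in for the real sheets:

```python
import pandas as pd

# Hypothetical stand-in for the dictionary built in the question's loop.
d = {
    "alpha": pd.DataFrame({"x": [1, 2]}),
    "beta": pd.DataFrame({"x": [3]}),
    "charlie": pd.DataFrame({"x": [4, 5, 6]}),
}

# Loop over every sheet by name, however many there are.
row_counts = {name: len(df) for name, df in d.items()}
print(row_counts)

# Positional access still works, because dicts preserve insertion order.
names = list(d)
first_sheet = d[names[0]]
print(names[0], first_sheet.shape)
```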