I want to extract specific rows (assume for now that I already have the row numbers) from an .xlsx file into a list. In addition, I don't know if it is possible, but I would like to use the first column as the list's name.
For example, this is the table I want to extract data from:
                   12/31/2020    12/31/2019    12/31/2018    12/31/2017
Revenue          1.823500e+11  1.614020e+11  1.369580e+11  1.110240e+11
Revenue Growth   1.298000e-01  1.785000e-01  2.336000e-01  2.373000e-01
Cost of Revenue  8.473200e+10  7.189600e+10  5.954900e+10  4.558300e+10
Gross Profit     9.761800e+10  8.950600e+10  7.740900e+10  6.544100e+10
If possible, I want to get the data in this form: Revenue = ["1.8235E+11", "1.61402E+11", "1.36958E+11", "1.11024E+11"]
I have already tried using xlrd to do this, but I always get this error:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
Thanks in advance and thank you for your help!
CodePudding user response:
xlrd 2.0 and later only reads .xls files, which is why you get that error. Install openpyxl, then use pandas' read_excel:
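# Python env: pip install openpyxl
# Anaconda env: conda install openpyxl
import pandas as pd

# index_col=0 uses the first column (Revenue, Revenue Growth, ...) as row labels
df = pd.read_excel('data.xlsx', index_col=0, engine='openpyxl')
print(df)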
# Output:
                   12/31/2020    12/31/2019    12/31/2018    12/31/2017
Revenue          1.823500e+11  1.614020e+11  1.369580e+11  1.110240e+11
Revenue Growth   1.298000e-01  1.785000e-01  2.336000e-01  2.373000e-01
Cost of Revenue  8.473200e+10  7.189600e+10  5.954900e+10  4.558300e+10
Gross Profit     9.761800e+10  8.950600e+10  7.740900e+10  6.544100e+10
To extract the row Revenue, use:
Revenue = df.loc['Revenue']
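
Note that df.loc['Revenue'] gives you a pandas Series, not a list. If you need plain Python lists, something along these lines should work. This is only a sketch: the DataFrame is built in memory from the values in the question as a stand-in for the read_excel call (so it runs without data.xlsx), and the names revenue, revenue_by_position and rows are just illustrative. Since a label like "Revenue Growth" cannot be a Python variable name, a dict keyed by the first column is probably the closest you can get to "the first column as the list's name".

```python
import pandas as pd

# Stand-in for: pd.read_excel('data.xlsx', index_col=0, engine='openpyxl')
# (values copied from the question so the example runs without the file)
df = pd.DataFrame(
    {
        "12/31/2020": [1.8235e11, 0.1298, 8.4732e10, 9.7618e10],
        "12/31/2019": [1.61402e11, 0.1785, 7.1896e10, 8.9506e10],
        "12/31/2018": [1.36958e11, 0.2336, 5.9549e10, 7.7409e10],
        "12/31/2017": [1.11024e11, 0.2373, 4.5583e10, 6.5441e10],
    },
    index=["Revenue", "Revenue Growth", "Cost of Revenue", "Gross Profit"],
)

# One row, selected by its label, as a plain list of floats
revenue = df.loc["Revenue"].tolist()

# The same row, selected by row number (0-based position) instead of label
revenue_by_position = df.iloc[0].tolist()

# All rows at once: {label from the first column: list of that row's values}
rows = {label: values.tolist() for label, values in df.iterrows()}

print(revenue)
print(rows["Gross Profit"])
```

After that, rows["Cost of Revenue"] or rows["Revenue Growth"] gives you the list for any row by name. The values are floats; if you need strings, format them yourself when printing.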