I am trying to scrape data from the India Energy Dashboard (https://www.niti.gov.in/edm/#elecGeneration) using Python. When I click Download, the website returns the error NET::ERR_CERT_DATE_INVALID, which I assume is why I never get a 200 response. I also tried the TableauScraper library, but it fails with the error "NoneType has no attribute text".
Here is the code I am using:
#!pip install TableauScraper
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/app/profile/niti.energy.vertical/viz/ElectricityGeneration_0/Source"
ts = TS()
ts.loads(url)
Answer:
You need to inspect the Network tab in your browser's developer tools and find the correct URL for the data source; the profile page URL from the question is not the one that serves the data. Here is one way to obtain that data:
from tableauscraper import TableauScraper as TS
url = 'https://public.tableau.com/views/ElectricityGeneration_0/Source?:display_static_image=y&:bootstrapWhenNotified=true&:embed=true&:language=en-US&:embed=y&:showVizHome=n&:apiID=host0'
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
for t in workbook.worksheets:
    print(f"worksheet name : {t.name}")  # show worksheet name
    print(t.data)  # show dataframe for this worksheet
Result in terminal:
worksheet name : Generation Trend by Source
Year Name-value Year Name-alias Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-value Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-alias SUM(Generation TWh)-value SUM(Generation TWh)-alias SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-value SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-alias Energy Source-alias
0 FY06 FY06 FY06 FY06 697.06083 697.06083 0.22062 0.22062 WIND
1 FY07 FY07 FY07 FY07 751.53005 751.53005 0.21588 0.21588 WIND
2 FY08 FY08 FY08 FY08 809.263687 809.263687 11.065371 11.065371 WIND
3 FY09 FY09 FY09 FY09 838.682997 838.682997 13.19954 13.19954 WIND
4 FY10 FY10 FY10 FY10 898.527489 898.527489 15.171851 15.171851 WIND
... ... ... ... ... ... ... ... ... ...
150 0 0 FY16 FY16 0 0 16.680499 16.680499 BIOMASS-BAGASSE
151 0 0 FY17 FY17 0 0 14.15864 14.15864 BIOMASS-BAGASSE
152 0 0 FY18 FY18 0 0 15.2523 15.2523 BIOMASS-BAGASSE
153 0 0 FY19 FY19 0 0 16.326489 16.326489 BIOMASS-BAGASSE
154 0 0 FY20 FY20 0 0 13.742429 13.742429 BIOMASS-BAGASSE
155 rows × 9 columns
worksheet name : Generation by Source in
Energy Source-alias SUM(Generation TWh)-alias SUM(Generation GWh)-alias SUM(Generation GWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[pcto:sum:Generation_GWh:qk]-alias
0 NUCLEAR 46.47245 46,472MWh 0.028636
1 HYDRO 156.117158 156,117MWh 0.096197
2 COAL 1199.742768 1,199,743MWh 0.739262
3 BIOMASS-BAGASSE 13.742429 13,742MWh 0.008468
4 DIESEL 2.027548 2,028MWh 0.001249
5 NATURAL GAS 73.885792 73,886MWh 0.045527
6 RENEWABLES 0.365895 366MWh 0.000225
7 SMALL HYDRO 9.451229 9,451MWh 0.005824
8 SOLAR 51.938299 51,938MWh 0.032004
9 WIND 69.149642 69,150MWh 0.042609
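The column labels in the result above are long because several of them embed the internal data source id ([federated.0hpknup10wcqib1b9qd9s1xn749g]). A minimal sketch of one way to shorten them follows. The helper name shorten and the regular expression are my own assumptions, not part of TableauScraper, and the pattern only covers ids shaped like the one shown here.

```python
import re

# Assumption: the data source prefix always looks like "[federated.<lowercase id>]."
# as it does in the output above. Adjust the pattern if your labels differ.
_DATASOURCE_PREFIX = re.compile(r"\[federated\.[0-9a-z]+\]\.")


def shorten(label: str) -> str:
    """Drop the '[federated.<id>].' prefix from a column label (hypothetical helper)."""
    return _DATASOURCE_PREFIX.sub("", label)


# Labels copied from the printed result.
labels = [
    "Year Name-alias",
    "Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-alias",
    "SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-alias",
]
short = [shorten(label) for label in labels]
for name in short:
    print(name)
```

Since t.data is a pandas DataFrame, the same function can be passed to rename, for example t.data.rename(columns=shorten). Labels without the prefix are returned unchanged, so the shortened names stay distinct from one another.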
For documentation, please see https://github.com/bertrandmartel/tableau-scraping
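Comparing the two URLs in this thread, the working one uses the path /views/<workbook>/<sheet>, while the one from the question uses /app/profile/<author>/viz/<workbook>/<sheet>. The sketch below derives the first form from the second. The helper name to_views_url is hypothetical, and the assumption that this mapping holds for other Tableau Public dashboards is mine; checking the Network tab remains the reliable way to confirm the URL. The query parameters from the answer are left out here and may need to be appended.

```python
from urllib.parse import urlsplit


def to_views_url(profile_url: str) -> str:
    """Map a Tableau Public profile URL to its /views/ form (hypothetical helper)."""
    parts = urlsplit(profile_url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected path layout: app/profile/<author>/viz/<workbook>/<sheet>
    if len(segments) != 6 or segments[:2] != ["app", "profile"] or segments[3] != "viz":
        raise ValueError(f"not a Tableau Public profile URL: {profile_url}")
    workbook, sheet = segments[4], segments[5]
    return f"{parts.scheme}://{parts.netloc}/views/{workbook}/{sheet}"


profile_url = (
    "https://public.tableau.com/app/profile/niti.energy.vertical"
    "/viz/ElectricityGeneration_0/Source"
)
views_url = to_views_url(profile_url)
print(views_url)
# prints https://public.tableau.com/views/ElectricityGeneration_0/Source
```

The resulting string can then be passed to ts.loads() in place of the profile URL.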