Web scraping India Energy Dashboard data

Time:10-06

I am trying to scrape data from the India Energy Dashboard (https://www.niti.gov.in/edm/#elecGeneration) using Python. When I click Download on the website, the browser returns the error NET::ERR_CERT_DATE_INVALID, and I suspect this is why I am not getting a 200 response. I also tried the TableauScraper library, but I get the error 'NoneType' object has no attribute 'text'. Here is my code:

#!pip install TableauScraper

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/app/profile/niti.energy.vertical/viz/ElectricityGeneration_0/Source"

ts = TS()

ts.loads(url)
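For context on the certificate error: NET::ERR_CERT_DATE_INVALID means the browser rejected the server's TLS certificate because the current date falls outside its validity window (the certificate's notBefore and notAfter fields). A minimal sketch of that date check using only the standard library, assuming the dates are in the format that ssl's getpeercert() reports (for example "Jun 10 12:00:00 2021 GMT"); cert_date_valid is a hypothetical helper, not part of any library:

```python
import ssl
import time
from typing import Optional


def cert_date_valid(not_before: str, not_after: str, now: Optional[float] = None) -> bool:
    """Return True if `now` (seconds since the epoch) lies inside the
    certificate's validity window, i.e. the check behind NET::ERR_CERT_DATE_INVALID.

    Hypothetical helper: both dates use the format returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jun 10 12:00:00 2021 GMT".
    """
    if now is None:
        now = time.time()
    # cert_time_to_seconds parses the GMT timestamp into epoch seconds.
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end
```

If the page is fetched with a client that verifies certificates (requests does by default), an expired certificate aborts the connection with an SSL error before any HTTP status is received, so no 200 response can come back at all.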

Answer:

Open the Network tab in your browser's developer tools and find the correct URL for the data source (the embedded Tableau view). Here is one way to obtain that data:

from tableauscraper import TableauScraper as TS

# Embed URL of the Tableau view, found in the browser's Network tab
url = 'https://public.tableau.com/views/ElectricityGeneration_0/Source?:display_static_image=y&:bootstrapWhenNotified=true&:embed=true&:language=en-US&:embed=y&:showVizHome=n&:apiID=host0'

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

for t in workbook.worksheets:
    print(f"worksheet name : {t.name}")  # show the worksheet name
    print(t.data)  # show the DataFrame for this worksheet

Output in the terminal:

worksheet name : Generation Trend by Source 
Year Name-value Year Name-alias Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-value Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-alias SUM(Generation TWh)-value   SUM(Generation TWh)-alias   SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-value SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-alias Energy Source-alias
0   FY06    FY06    FY06    FY06    697.06083   697.06083   0.22062 0.22062 WIND
1   FY07    FY07    FY07    FY07    751.53005   751.53005   0.21588 0.21588 WIND
2   FY08    FY08    FY08    FY08    809.263687  809.263687  11.065371   11.065371   WIND
3   FY09    FY09    FY09    FY09    838.682997  838.682997  13.19954    13.19954    WIND
4   FY10    FY10    FY10    FY10    898.527489  898.527489  15.171851   15.171851   WIND
... ... ... ... ... ... ... ... ... ...
150 0   0   FY16    FY16    0   0   16.680499   16.680499   BIOMASS-BAGASSE
151 0   0   FY17    FY17    0   0   14.15864    14.15864    BIOMASS-BAGASSE
152 0   0   FY18    FY18    0   0   15.2523 15.2523 BIOMASS-BAGASSE
153 0   0   FY19    FY19    0   0   16.326489   16.326489   BIOMASS-BAGASSE
154 0   0   FY20    FY20    0   0   13.742429   13.742429   BIOMASS-BAGASSE
155 rows × 9 columns

worksheet name : Generation by Source in
Energy Source-alias SUM(Generation TWh)-alias   SUM(Generation GWh)-alias   SUM(Generation GWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[pcto:sum:Generation_GWh:qk]-alias
0   NUCLEAR 46.47245    46,472MWh   0.028636
1   HYDRO   156.117158  156,117MWh  0.096197
2   COAL    1199.742768 1,199,743MWh    0.739262
3   BIOMASS-BAGASSE 13.742429   13,742MWh   0.008468
4   DIESEL  2.027548    2,028MWh    0.001249
5   NATURAL GAS 73.885792   73,886MWh   0.045527
6   RENEWABLES  0.365895    366MWh  0.000225
7   SMALL HYDRO 9.451229    9,451MWh    0.005824
8   SOLAR   51.938299   51,938MWh   0.032004
9   WIND    69.149642   69,150MWh   0.042609
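The column names in this output are verbose, and in the first worksheet every field appears twice, as -value and -alias. A minimal sketch for shortening them, assuming you only want the -alias columns (in the rows printed above they match the -value columns) and that the [federated.<id>] datasource qualifier is safe to drop; short_alias_names is a hypothetical helper, not part of TableauScraper:

```python
import re

# Datasource qualifier embedded in some column names, e.g.
# "[federated.0hpknup10wcqib1b9qd9s1xn749g]." (pattern assumed from the output above).
FEDERATED = re.compile(r"\[federated\.[^\]]*\]\.")


def short_alias_names(columns):
    """Map each '-alias' column to a shorter, still-unique name.

    Hypothetical helper: removes the datasource qualifier and the '-alias'
    suffix; '-value' columns are skipped entirely.
    """
    mapping = {}
    for col in columns:
        if not col.endswith("-alias"):
            continue
        short = FEDERATED.sub("", col)[: -len("-alias")]
        if short in mapping.values():
            raise ValueError(f"column name collision: {short!r}")
        mapping[col] = short
    return mapping
```

Applied to a worksheet, something like mapping = short_alias_names(t.data.columns) followed by df = t.data[list(mapping)].rename(columns=mapping) would keep only the alias columns under the shorter names.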

For documentation, please see https://github.com/bertrandmartel/tableau-scraping
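The key change in this answer is the URL: the question passed the Tableau Public profile page, while the answer passes the /views/ URL of the same workbook. A hypothetical helper that performs this rewrite, assuming every profile link follows the /app/profile/<user>/viz/<workbook>/<sheet> pattern seen in the question; whether TableauScraper accepts this shortened query string (only :embed and :showVizHome) is also an assumption, so fall back to the full URL from the Network tab if it does not:

```python
from urllib.parse import urlencode, urlsplit


def views_url(profile_url: str) -> str:
    """Turn a Tableau Public profile link into a /views/ embed URL.

    Hypothetical helper: assumes the path has the shape
    /app/profile/<user>/viz/<workbook>/<sheet>, as in the question.
    """
    parts = urlsplit(profile_url).path.strip("/").split("/")
    if len(parts) != 6 or parts[:2] != ["app", "profile"] or parts[3] != "viz":
        raise ValueError(f"unexpected profile URL: {profile_url}")
    workbook, sheet = parts[4], parts[5]
    # Keep only two of the embed flags seen in the answer's URL; safe=":" leaves
    # the leading colons unescaped.
    query = urlencode({":embed": "y", ":showVizHome": "n"}, safe=":")
    return f"https://public.tableau.com/views/{workbook}/{sheet}?{query}"
```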
