Stuck with HTML scraping using BeautifulSoup (Python)

Time:07-19

I want to convert activities uploaded to Strava to a .gpx file.

To do this I need to scrape the Strava activity HTML page for the elevation, longitude, latitude, etc. This data is stored in the line that starts with <div data-react-class=. I have included an extract of the page source below. I only care about the information from {"activity":{"name" onwards.

       </li>
      </ul>
     </div>
    </nav>
   </header>
   <div data-react-class="ActivityPublic" data-react-props='{
  "activity": {
    "name": "Morning Ride",
    "date": "Today",
    "athlete": {
      "name": "James Whyard",
      "avatarUrl": "https://lh3.googleusercontent.com/a-/AOh14GiA8yxgfozOqSJEiwW9srS-VEZU_mV_UM2iHFZxjw=s96-c",
      "location": "",
      "followersCount": 3,
      "followAthleteUrl": "http://www.strava.com/register?activity_action=athlete\u0026activity_id=7487240518\u0026athlete_id=90220142\u0026content=90220142\u0026cta=follow\u0026element=button\u0026follow_athlete_after_login=true\u0026follow_athlete_after_registration=true\u0026follow_athlete_id=90220142\u0026source=activities_show",
      "totalDistance": "452",
      "distanceUnit": "miles",
      "totalActivities": 40
    },
    "type": "Ride",
    "detailedType": "Ride",
    "kudosCount": 0,
    "comments": [],
    "commentCount": 0,
    "achievementsCount": 11,
    "distance": "11.7 mi",
    "time": "49:38",
    "elevation": "246 ft",
    "calories": 526.0,
    "streams": {
      "altitude": [6.6, 6.6, 6.6, 6.7, 6.7, 6.7, 6.7, 6.7, 6.7, 6.9, 6.7, 6.6, 6.5, 6.4, 6.4, 6.4, 6.4, 6.2, 5.9, 6.0, 5.9, 5.8, 5.7, 5.6, 5.6, 5.6, 5.7, 5.9, 6.0, 6.0, 5.9, 5.9, 5.9, 6.0, 6.0, 6.0, 6.0, 6.0, 6.1, 6.2, 6.2, 6.4, 6.5, 6.5, 6.6, 6.9, 7.2, 7.2, 7.4

I have also attached my code. My plan was to find the "div" in which the data is stored, then slice the string and zip the data as necessary. However, after finding the "div" and converting it to a string, .index('altitude') raises ValueError: substring not found. I also think it would be more elegant to use BeautifulSoup alone to extract the data, but I am unsure how to go about this.

import requests
from bs4 import BeautifulSoup
import csv

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
strdata = str(divdata)

print(strdata.index('altitude'))

CodePudding user response:

You're very close with this!

What I would do is grab the div element as you are doing, then get the data-react-props attribute, which contains all the data you're looking for. It is clearly formatted as JSON, so we can parse it as such and get all the information we need from there.

import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

# Locate the div that carries the activity data
divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
# The attribute value is a JSON string; parse it into a dict
activity_data = divdata.get("data-react-props")
activity_dict = json.loads(activity_data)

print("My ride's elevation was:", activity_dict['activity']['elevation'])

Edit: @It_is_Chris suggested using the Strava API instead, https://developers.strava.com/docs/reference/. This seems like a better alternative.
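
If you stay with the scraped data, the parsed dict can be turned into GPX track points by zipping the stream lists together. The following is only a rough sketch under stated assumptions: the question's extract confirms an "altitude" list under "streams", but the "latlng" key name, its [lat, lon] layout, the sample coordinates, and the build_gpx helper are all hypothetical, and the altitude values are assumed to be in metres as GPX expects.

```python
import json
import xml.etree.ElementTree as ET

# Stand-in for the data-react-props value; the real one comes from divdata.get(...).
# Only "altitude" is confirmed by the question; "latlng" is an ASSUMED key name
# and the coordinates are made up for illustration.
sample_props = '''
{"activity": {"name": "Morning Ride",
              "streams": {"altitude": [6.6, 6.6, 6.7],
                          "latlng": [[51.50, -0.12], [51.51, -0.13], [51.52, -0.14]]}}}
'''

def build_gpx(activity):
    """Return a GPX 1.1 document (as a string) built from an activity dict's streams."""
    streams = activity["streams"]
    gpx = ET.Element("gpx", version="1.1", creator="strava-scrape-sketch",
                     xmlns="http://www.topografix.com/GPX/1/1")
    trk = ET.SubElement(gpx, "trk")
    ET.SubElement(trk, "name").text = activity["name"]
    seg = ET.SubElement(trk, "trkseg")
    # Pair each (lat, lon) sample with the altitude at the same index
    for (lat, lon), ele in zip(streams["latlng"], streams["altitude"]):
        pt = ET.SubElement(seg, "trkpt", lat=str(lat), lon=str(lon))
        ET.SubElement(pt, "ele").text = str(ele)
    return ET.tostring(gpx, encoding="unicode")

activity_dict = json.loads(sample_props)
gpx_text = build_gpx(activity_dict["activity"])
print(gpx_text)
```

Check the real key names with activity_dict['activity']['streams'].keys() before relying on this; zip stops at the shortest list, so mismatched stream lengths would silently drop points.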

CodePudding user response:

You can call .get on the element to read an attribute's value, that is:

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
strdata = divdata.get('data-react-props')
print(strdata)

An excerpt from the output is:

{"activity":{"name":"Morning Ride","date":"Today","athlete":{"name":"James Whyard","avatarUrl":"https://lh3.googleusercontent.com/a-/AOh14GiA8yxgfozOqSJEiwW9srS-VEZU_mV_UM2iHFZxjw=s96-c","location":"","followersCount":3,"followAthleteUrl":"http://www.strava.com/register?activity_action=athlete\u0026activity_id=7487240518\u0026athlete_id=90220142\u0026content=90220142\u0026cta=follow\u0026element=button\u0026follow_athlete_after_login=true\u0026follow_athlete_after_registration=true\u0026follow_athlete_id=90220142\u0026source=activities_show","totalDistance":"452","distanceUnit":"miles","totalActivities":40},"type":"Ride","detailedType":"Ride","kudosCount":0,"comments":[],"commentCount":0,"achievementsCount":11,"distance":"11.7 mi","time":"49:38","elevation":"246 ft","calories":526.0,"streams":