Python project that extracts and analyzes a data sample from the Strava API. The records are effort power values calculated for given segments of a ride.
The daily mood
Today is a bank holiday. I wasn't on duty, but I did "kind of" work for fun.
As a road bike enthusiast, I ride once a week during the summer. I usually record my activities with a Garmin watch, then track and share them with friends on Strava, the most popular social-media platform dedicated (mainly) to endurance sports. Strava offers a free account and a commercial subscription with additional features. I paid for premium for a couple of months, but personally didn't find it worth it.
The fascination of cycling
Cycling is addictive. First, it hooks you with the outdoor experience: you escape the daily routine, breathe fresh air and discover new places. Second, it treats your body well through careful and efficient biomechanics. It is (almost) injury-free, provided that you keep away from crowds and stunts. Third, it makes you feel good thanks to the release of happiness neurotransmitters and hormones like dopamine, adrenaline and endorphins. Fourth, it is an infinite quest for delivering the best possible performance under given road and weather conditions through technique, effort and endurance. There are actually a lot of tuning parameters to consider, which makes cycling especially interesting for bike manufacturing and setup, for training science and medicine, for competition and fame and... unfortunately for cheating :-(
Why power matters
If you want to perform in cycling, you need to monitor and manage your effort. Strava lets you store and visualize lots of different metrics, including time, distance, speed, cadence, heart rate, geolocation, elevation, gradient, temperature, wind, humidity / air pressure, VAM (explained below) and power. However, different sensors are required to collect this data. Speed is easy to measure but is not an absolute measure, since it depends heavily on the wind and gradient resisting the applied force. Heart rate indicates how your body copes with strain, but does not measure the performance itself. As of today, the most reliable performance metric for climbs is Michele Ferrari's average ascent speed (VAM, from the Italian "velocità ascensionale media"), expressed in vertical meters per hour. Unlike other measures, VAM doesn't "lie" about your competitive ranking while riding a pass, but it is useless on a flat time trial.
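VAM itself is simple arithmetic: vertical meters gained divided by the time taken, scaled to one hour. A minimal sketch (the function name is my own, not from any library):

```python
def vam_m_per_h(elevation_gain_m: float, duration_s: float) -> float:
    """Average ascent speed (VAM) in vertical meters per hour."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # meters gained per second, scaled to one hour
    return elevation_gain_m * 3600.0 / duration_s


# hypothetical climb: 600 m of elevation gained in 40 minutes
print(vam_m_per_h(600, 40 * 60))  # -> 900.0
```

Because VAM only needs altitude and time, a barometric altimeter or even GPS altitude is enough, which is exactly why it works without a power meter on climbs and says nothing on the flat.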
Going for power
There are a lot of different types of hardware (e.g. rear hubs, crank hubs, crank arms, pedals, chainrings/spiders, smart pods) and brands (e.g. SRM, Powertap, Stages) available on the power meter market. While they mainly differ in modularity and precision, most systems can send data to a bike computer via the ANT+ wireless protocol, or to a smartphone via Bluetooth. They have an acceptable weight and battery life, but cost several hundred dollars (excluding the head unit). Unfortunately, I do not own such a device yet. Power meters were still high-end (e.g. SRM) when I bought my road bike and computer 7 years ago, and it wouldn't be reasonable to upgrade now. As a result, I can't easily analyse my performance the way other riders do nowadays.
Measurement alternatives
I read about Strava's calculated power and the Velocomp PowerPod technology, which both derive power from metrics that are not dedicated to power. This led me to the following ideas:
Option 1
The idea is to compute power from the data points available in a ride record (GPX file or API). Like Strava, we can calculate the average power needed for a mass M (bike + rider) to move from geolocation Lat1/Lon1 at time T1 and altitude A1 to geolocation Lat2/Lon2 at time T2 and altitude A2.
Stack Overflow explains how to calculate the distance between two positions on the Earth. Strava also mentions common factors for taking elevation, air/rolling resistance and acceleration into account. I highly recommend Alex Simmons' blog post "Ascension rate and power to body mass ratios" for further details.
In the end, provided that I made no mistakes in the maths, that my GPS device's measurements were accurate, and that I didn't ride over a bump, take a curve or stop at a traffic light, I might eventually get some realistic metrics... amid lots of background noise.
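To make Option 1 more concrete, here is a rough sketch: the haversine formula for the distance between two GPS points, plus a simple energy balance (climbing, rolling resistance, air drag, change of kinetic energy) divided by the elapsed time. The coefficients (Crr = 0.005, CdA = 0.32 m², air density 1.225 kg/m³) and the (lat, lon, altitude, time) tuple layout are my own assumptions, not Strava's model; wind, drivetrain losses and the effect of slope on rolling resistance are ignored.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)
G = 9.81  # gravitational acceleration in m/s^2


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def estimate_power_w(p1, p2, mass_kg, v_start, v_end,
                     crr=0.005, cda=0.32, rho=1.225):
    """Average power (W) needed between two track points.

    p1, p2: hypothetical (lat_deg, lon_deg, altitude_m, time_s) tuples.
    v_start, v_end: speeds in m/s at p1 and p2.
    """
    lat1, lon1, alt1, t1 = p1
    lat2, lon2, alt2, t2 = p2
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("points must be in chronological order")
    dist = haversine_m(lat1, lon1, lat2, lon2)
    v_avg = dist / dt
    work_climb = mass_kg * G * (alt2 - alt1)                  # potential energy
    work_rolling = crr * mass_kg * G * dist                    # rolling resistance
    work_aero = 0.5 * rho * cda * v_avg ** 2 * dist           # air drag, no wind
    work_accel = 0.5 * mass_kg * (v_end ** 2 - v_start ** 2)  # kinetic energy
    # a rider cannot produce negative power: coasting or braking gives 0 W
    return max(0.0, (work_climb + work_rolling + work_aero + work_accel) / dt)
```

For example, `estimate_power_w((45.0, 7.0, 300.0, 0.0), (45.001, 7.0, 305.0, 20.0), mass_kg=80, v_start=5.5, v_end=5.5)` estimates the average power over a 20-second stretch. With points only a second or two apart, GPS and altitude errors quickly dominate the result.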
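One common way to damp that kind of noise, assuming you already have a series of per-point power estimates, is a centred rolling median, which ignores isolated spikes better than a rolling mean. A small pandas sketch; the window size is an arbitrary choice:

```python
import pandas as pd


def smooth_power(raw_watts, window=5):
    """Centred rolling median of per-point power estimates (W)."""
    series = pd.Series(raw_watts, dtype=float)
    # min_periods=1 keeps the first and last points instead of returning NaN
    return series.rolling(window, center=True, min_periods=1).median()


# a single spike (e.g. a GPS altitude jump) does not survive the median
print(smooth_power([180, 185, 950, 190, 182], window=3).tolist())
```

A larger window smooths more but also hides genuine short efforts such as a sprint, so the window should stay short compared to the efforts you care about.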
Option 2
The idea is to trust Strava's calculated power, available per activity and per segment, where a segment is an interesting portion of a track previously registered by users in a huge database. The data can easily be extracted from the Strava public API. At the time of writing, Strava limits API usage to 100 calls every 15 minutes and 1000 calls per UTC day, which should be sufficient for my purpose. Instead of implementing individual REST requests, Strava also recommends using a Swagger-generated SDK, which offers the same level of compatibility for many languages including Java and C#. I'll go the unofficial way instead and use stravalib, an open-source Python library. It is certainly not perfect, but it is even easier to use than a Swagger client, covers most of the scenarios I need and still allows a direct API call if absolutely needed (e.g. to refresh a token).
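Staying within those limits is easy to track on the client side. Below is a hypothetical sketch that counts my own requests against both windows. It assumes, as I understand Strava's documentation, that the 15-minute window resets at :00, :15, :30 and :45 past the hour and the daily window at midnight UTC. stravalib has its own rate-limiting machinery, so this is for illustration only.

```python
from datetime import datetime, timezone


class StravaRateBudget:
    """Hypothetical client-side counter for Strava's API limits.

    Pass timezone-aware datetimes; they are normalised to UTC.
    """

    def __init__(self, short_limit=100, daily_limit=1000):
        self.short_limit = short_limit  # requests per 15-minute window
        self.daily_limit = daily_limit  # requests per UTC day
        self.calls = []                 # UTC timestamps of past requests

    @staticmethod
    def _quarter_start(now):
        # assumption: windows reset at :00, :15, :30 and :45 past the hour
        return now.replace(minute=now.minute - now.minute % 15,
                           second=0, microsecond=0)

    def can_call(self, now):
        now = now.astimezone(timezone.utc)
        quarter = self._quarter_start(now)
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        in_quarter = sum(1 for t in self.calls if t >= quarter)
        in_day = sum(1 for t in self.calls if t >= midnight)
        return in_quarter < self.short_limit and in_day < self.daily_limit

    def record(self, now):
        self.calls.append(now.astimezone(timezone.utc))


budget = StravaRateBudget()
now = datetime.now(timezone.utc)
if budget.can_call(now):
    budget.record(now)
    # ... perform the actual API request here
```

Keeping every timestamp is fine for a few hundred calls a day; a long-running collector would prune entries older than the current UTC day.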
Setup
- Obviously, I already have a Strava account with my activities uploaded
- Register a new API application on Strava, which gives you a client ID and a client secret
- A Python project with the following requirements.txt (seaborn is needed for the plots further down)
chardet
matplotlib
pandas
requests
seaborn
stravalib
urllib3
- Create a Python virtual environment and install the requirements
virtualenv -p $(which python3) venv
source venv/bin/activate
pip3 install -r requirements.txt
Authn/Authz
Like any other modern app, Strava supports OAuth2 authorization flows. I found it convenient to have a dedicated Python script (ex. login.py) for authentication.
In the first part of the script, we define some input arguments:
# setup program usage and parse arguments
import os, sys, argparse
parser = argparse.ArgumentParser()
help_client_id="Client ID (default: $STRAVA_CLIENT: " + str(os.environ.get('STRAVA_CLIENT')) + ")"
parser.add_argument("--client_id", "-i", help=help_client_id)
help_client_secret="Client Secret (default: $STRAVA_SECRET: " + str(os.environ.get('STRAVA_SECRET')) + ")"
parser.add_argument("--client_secret", "-s", help=help_client_secret)
parser.add_argument("--access_code", "-c", help="Access Code (default: None)")
help_refresh_token="Refresh Token (default: $STRAVA_REFRESH_TOKEN: " + str(os.environ.get('STRAVA_REFRESH_TOKEN')) + ")"
parser.add_argument("--refresh_token", "-t", help=help_refresh_token)
args = parser.parse_args()
The arguments are initialised from environment variables when they are not passed on the command line:
# set default argument values from environment variables when available (see --help)
if args.client_id is None:
    args.client_id = os.environ.get('STRAVA_CLIENT')
    print("Set Strava client ID to %s" % args.client_id)
if args.client_secret is None:
    args.client_secret = os.environ.get('STRAVA_SECRET')
    print("Set Strava client Secret to %s" % args.client_secret)
if args.refresh_token is None:
    args.refresh_token = os.environ.get('STRAVA_REFRESH_TOKEN')
    print("Set Strava Refresh Token to %s" % args.refresh_token)
Validation currently only covers the mandatory arguments:
# check mandatory arguments (client_id, client_secret) and terminate as necessary
if args.client_id is None:
    sys.exit("Your input is invalid. Please specify a Client ID (--client_id) or read Usage (--help)")
if args.client_secret is None:
    sys.exit("Your input is invalid. Please specify a Client Secret (--client_secret) or read Usage (--help)")
In the second part, we handle the authorisation flow:
# get a temporary access_code or an access_token (depending on input parameters)
import requests, webbrowser
from stravalib import Client
client = Client()

# obtain a new access token if a refresh token is available
if args.refresh_token is not None:
    token_url = "https://www.strava.com/api/v3/oauth/token"
    token_opt = {'client_id': args.client_id,
                 'client_secret': args.client_secret,
                 'grant_type': 'refresh_token',
                 'refresh_token': args.refresh_token}
    x = requests.post(token_url, data=token_opt)
    x.raise_for_status()  # stop on HTTP errors such as an invalid refresh token
    print(x.text)
    print("You are ready to use app.py")
    sys.exit(0)  # success: exit with status 0 (sys.exit("...") would exit with status 1)

# have the user sign in to Strava and authorize
if args.access_code is None:
    url = client.authorization_url(client_id=args.client_id,
                                   redirect_uri='http://127.0.0.1:5000/authorization')
    # authorize (you might have to log in first)
    webbrowser.open(url, new=2)
    args.access_code = input("Copy your authorization code from the response URL here: ").strip()

# get access_token from access_code
if args.access_code is not None:
    access_dict = client.exchange_code_for_token(
        client_id=args.client_id,
        client_secret=args.client_secret,
        code=args.access_code
    )
    print(access_dict)
    # putenv only affects child processes, hence the subshell below
    os.putenv("STRAVA_ACCESS_TOKEN", access_dict['access_token'])
    os.putenv("STRAVA_REFRESH_TOKEN", access_dict['refresh_token'])
    print("You are ready to use app.py")
    os.system('bash')  # just a tweak: spawn a subshell that inherits the new variables
    sys.exit()
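Access tokens are short-lived: the JSON returned by the token endpoint (and the access_dict printed above) includes an expires_at value in seconds since the Unix epoch. A minimal sketch of a helper deciding whether it's time to take the refresh branch; the function name and the 5-minute safety margin are my own choices:

```python
import time
from typing import Optional


def needs_refresh(token_response: dict, now: Optional[float] = None,
                  margin_s: int = 300) -> bool:
    """True if the access token expires within margin_s seconds.

    token_response: dict from the token endpoint, with 'expires_at'
    given in seconds since the Unix epoch.
    """
    if now is None:
        now = time.time()
    return now + margin_s >= token_response["expires_at"]


# hypothetical token response that expired one hour ago
print(needs_refresh({"expires_at": time.time() - 3600}))  # -> True
```

Refreshing a little before the deadline avoids a request failing halfway through a long data extract.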
Data extract
Now that we have a token, we are allowed to query the Strava objects visible to the account. We do so in a separate script, app.py. As before, we check the command line arguments, especially that a token has been passed.
# setup program usage and parse arguments
import os, sys, argparse
parser = argparse.ArgumentParser()
parser.add_argument("--access_token", "-t",
                    help="Access Token (default: $STRAVA_ACCESS_TOKEN, as exported by login.py)",
                    default=os.environ.get('STRAVA_ACCESS_TOKEN'))
args = parser.parse_args()
# terminate if access token is not set
if args.access_token is None:
    sys.exit("Please specify an Access Token (--access_token) or read Usage (--help)")

# instantiate the Strava client
from stravalib import Client, exc
# https://pythonhosted.org/stravalib/api.html#stravalib.client.Client
client = Client(access_token=args.access_token)
# the token is only checked on the first request, so errors are caught there
try:
    athlete = client.get_athlete()
    print("Hello", athlete.firstname.strip(), athlete.lastname.strip(), "!")
except exc.AccessUnauthorized:
    sys.exit("Your token is not valid. Please run login.py to get one.")
except exc.RateLimitExceeded:
    sys.exit("API rate limit exceeded. Please retry in a bit.")
We'll store the data in a DataFrame object from pandas, the Python DAta ANalysiS library.
# prepare data collection
import pandas as pd
# start/end indexes might be useful later to locate each effort in the activity stream
df = pd.DataFrame(columns=['start_index', 'end_index', 'dist', 'avg_grade', 'date', 'avg_power', 'duration'])
# stop at API rate limit
try:
    # request the last activity only
    activities = client.get_activities(limit=1)
except exc.RateLimitExceeded:
    sys.exit("API rate limit exceeded. Please retry in a bit.")
Let's assume the last activity is a ride that we want to analyze. The Strava data model distinguishes Summary from Detailed objects for activities and efforts, where detailed representations are only viewable by the object owner.
# work on the last activity only
try:
    # https://developers.strava.com/docs/reference/#api-models-SummaryActivity
    act_summary = next(activities)  # the list request is sent lazily, on iteration
    # https://developers.strava.com/docs/reference/#api-models-DetailedActivity
    act_detail = client.get_activity(act_summary.id, include_all_efforts=True)
except StopIteration:
    sys.exit("No activity found.")
except exc.RateLimitExceeded:
    sys.exit("API rate limit exceeded. Please retry in a bit.")
Persist the activity metadata in a text file:
# https://developers.strava.com/docs/reference/#api-models-DetailedActivity
# store activity summary (split(' ')[0] drops the unit suffix of stringified stravalib quantities)
with open(str(act_summary.id) + ".txt", "w") as text_file:
    print("Name: {}".format(act_summary.name), file=text_file)
    print("Date: {}".format(act_summary.start_date_local), file=text_file)
    print("Dist: {} km".format(str(act_summary.distance / 1000).split(' ')[0]), file=text_file)
    print("Elevation: {}".format(act_summary.total_elevation_gain), file=text_file)
    print("Time: {}".format(act_summary.moving_time), file=text_file)
    print("Speed: {} km/h".format(str(act_summary.average_speed * 3.6).split(' ')[0]), file=text_file)
    print("Power: {} w".format(act_summary.average_watts), file=text_file)
Now that we have the list of segment effort summaries from the detailed activity, we want to access their details as well. Remember that we are limited to 100 calls every 15 minutes, so we simply keep collecting efforts until the rate limit is hit and persist whatever we have by then.
# get segment efforts
for seg_effort_summary in act_detail.segment_efforts:
    try:
        # https://developers.strava.com/docs/reference/#api-models-SummarySegmentEffort
        seg_effort_detail = client.get_segment_effort(seg_effort_summary.id)
        # https://developers.strava.com/docs/reference/#api-models-DetailedSegmentEffort
        # https://developers.strava.com/docs/reference/#api-models-SummarySegment
        seg_summary = client.get_segment(seg_effort_detail.segment.id)
        df.loc[len(df)] = [
            seg_effort_detail.start_index,
            seg_effort_detail.end_index,
            seg_effort_summary.distance,
            seg_summary.average_grade,
            seg_effort_detail.start_date_local,
            seg_effort_detail.average_watts,
            seg_effort_detail.elapsed_time
        ]
    except exc.RateLimitExceeded:
        print("API rate limit exceeded. Data collected until now will be persisted.")
        break
The activity summary text file written earlier contains:
$ cat *.txt
Name: Morning Ride
Date: 2020-05-31 10:27:14
Dist: 70.20 km
Elevation: 463.90 m
Time: 2:28:41
Speed: 28.33 km/h
Power: 140.5 w
# result overview and serialization
print(df)
df.to_pickle(str(act_summary.id) + '.pkl')
We were able to create a new virtual stream of power metrics; however, it only covers the parts of a ride that belong to one or more segments. As mentioned before, we are limited to 100 API calls per 15 minutes, which is equivalent to 48 segment efforts given the required requests (1 for the athlete, 1 for the activity list and 1 for the activity detail, then 2 per segment effort: the effort detail and its segment). The result table is printed to the program output and written to a local file for further analysis (i.e. data preparation and visualization).
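The figure of 48 is simple arithmetic. A sketch, with the call counts taken from app.py (3 fixed calls per run, 2 calls per segment effort); the function name is my own:

```python
def max_segment_efforts(call_limit: int = 100, fixed_calls: int = 3,
                        calls_per_effort: int = 2) -> int:
    """Number of segment efforts that fit into one rate-limit window.

    fixed_calls: get_athlete, get_activities, get_activity (once per run)
    calls_per_effort: get_segment_effort + get_segment (once per effort)
    """
    return (call_limit - fixed_calls) // calls_per_effort


print(max_segment_efforts())  # -> 48
```

Caching segment details (a segment's average grade does not change between rides) would bring the cost down to roughly 1 call per effort on repeated routes.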
API rate limit exceeded. Data collected until now will be persisted.
   start_index  end_index       dist  avg_grade                 date  avg_power  duration
0          344        440   828.30 m       -1.7  2020-05-31 10:35:27      173.9  00:01:36
1          565        894  2988.80 m        0.1  2020-05-31 10:39:08      167.0  00:05:29
2          701        888  1714.40 m        0.1  2020-05-31 10:41:24      174.4  00:03:07
3          705        753   455.10 m       -0.0  2020-05-31 10:41:28      196.4  00:00:48
4          708        756   456.20 m        0.0  2020-05-31 10:41:31      194.2  00:00:48
...
Data preview
We will start visualisation of our power data using Python MAThematical PLOTting LIBrary (matplotlib).
import pandas as pd
import matplotlib.pyplot as plt

# read and filter previously serialized data
# (app.py writes <activity id>.pkl; the file is assumed here to have been renamed efforts.pkl)
df = pd.read_pickle("./efforts.pkl")
df = df[(df.avg_grade > -3) & (df.avg_grade < 24)]
print(df.head())

# draw bar chart of power per elevation grade
plt.bar(df['avg_grade'], df['avg_power'], 0.2, alpha=0.5)
plt.xlabel('Grade (%)')
plt.ylabel('Power (W)')
plt.title('Activity efforts')
plt.show()
Now we'll use seaborn, a wrapper library built on matplotlib. It significantly simplifies advanced representations involving aggregation, faceting and classification.
import seaborn as sns
# cast to float: columns of a DataFrame created from a bare column list may have object dtype
sns.regplot(x='avg_grade', y='avg_power', data=df[['avg_grade', 'avg_power']].astype(float), fit_reg=True)
plt.show()
In this case we are "misusing" linear regression to draw a trend line of power. The individual efforts fall into two categories: those lying above and those lying below the fitted line. Given the first-degree equation y = ax + b, the slope a tells how much extra power is produced per percent of gradient, which I read as the overall training intensity, while the intercept b is the estimated power on a flat road (0 % grade), which I read as the overall performance level.
Note: some bars currently overlap, which means that multiple efforts were measured on the same or similar segments. We could keep the maximum only, or analyse further interesting dimensions such as segment distance and maximum grade, which definitely make a segment harder to ride.
Conclusion
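The slope a and intercept b can also be computed instead of read off the chart. A sketch using numpy.polyfit on the effort DataFrame (column names as in app.py; the float cast guards against object-dtype columns; the function name is mine):

```python
from typing import Tuple

import numpy as np
import pandas as pd


def power_trend(df: pd.DataFrame) -> Tuple[float, float]:
    """Least-squares fit of avg_power = a * avg_grade + b; returns (a, b)."""
    x = df["avg_grade"].astype(float).to_numpy()
    y = df["avg_power"].astype(float).to_numpy()
    # polyfit returns coefficients from the highest degree down: [a, b]
    a, b = np.polyfit(x, y, deg=1)
    return float(a), float(b)
```

Called as `a, b = power_trend(df)` on the filtered DataFrame from the preview step, it gives the same line seaborn draws, as numbers that can be compared from one ride to the next.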
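Keeping the maximum only could look like the sketch below. Equal average grade is a rough, hypothetical stand-in for "same or similar segment", since the DataFrame built by app.py does not store a segment ID; storing one would be the cleaner key.

```python
import pandas as pd


def keep_best_effort_per_grade(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the highest-power effort for each distinct avg_grade value."""
    # idxmax needs a numeric column, so cast avg_power explicitly
    df = df.assign(avg_power=df["avg_power"].astype(float))
    best_rows = df.groupby("avg_grade")["avg_power"].idxmax()
    return df.loc[best_rows].reset_index(drop=True)
```

In the sample output, efforts 1 and 2 both have a 0.1 % grade; this would keep effort 2 (174.4 W) and drop effort 1 (167.0 W).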
We have successfully retrieved some power data from Strava using Python. The data was taken out of the segment efforts produced during a ride activity. We were also able to represent the data in a graph.
Next
We are not quite done with this initiative. The functional use case is performance analysis, and we'll need more insight to achieve it. The technical goal is to get more familiar with Python, and there is plenty of room for improvement. So we will definitely follow up with a series of blog posts.