Strava's Revolutionary AI Leaderboard Integrity System: Eliminating Cheaters for Good

Strava's Machine Learning Approach to Leaderboard Integrity

In the ever-evolving world of fitness and technology, Strava is leading the charge with a groundbreaking approach to maintaining the integrity of its leaderboards. Imagine you've just finished a grueling ride or run, you're dripping with sweat, heart pounding, and you upload your activity to Strava. But here's the catch: how does Strava ensure that these records are pure, untainted by accidental car rides that might skew the competitive landscape? Enter the marvel of machine learning.

Strava's 'Cars on Segments' model acts as a digital gatekeeper, checking that every kilometer recorded was powered by human effort, not horsepower. The model scores each uploaded activity with a probability from 0 to 1 that some part of it was recorded in a vehicle. If the score exceeds a set threshold, the activity is flagged before it reaches any leaderboard, and the user is prompted to crop out the vehicle portion or make the activity private.

But how does this digital wizardry operate? It's simpler than you might think, yet brilliantly complex in execution. Once your upload completes, the model dives into the data and extracts 57 distinct features from your activity. These aren't just random numbers; they're carefully calculated metrics like velocity averages, acceleration variances, and even the intriguingly named "jerk" (the rate of change of acceleration).
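To make these feature types concrete, here is a minimal sketch of how speed, acceleration, and jerk statistics could be derived from a speed trace. It assumes evenly spaced samples in metres per second; the function names and the four features chosen are illustrative assumptions, not Strava's actual 57-feature set.

```python
from statistics import mean, pvariance

def finite_diff(values, dt):
    """First-order finite difference of evenly spaced samples."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def motion_features(speeds_mps, dt=1.0):
    """Summarise a speed trace (m/s, one sample every dt seconds).

    Hypothetical feature set for illustration only.
    """
    accel = finite_diff(speeds_mps, dt)  # acceleration, m/s^2
    jerk = finite_diff(accel, dt)        # rate of change of acceleration, m/s^3
    return {
        "mean_speed": mean(speeds_mps),
        "max_speed": max(speeds_mps),
        "accel_variance": pvariance(accel),
        "max_abs_jerk": max(abs(j) for j in jerk),
    }

# A short made-up trace: pull away from a stop, cruise, then ease off.
features = motion_features([0.0, 2.0, 5.0, 9.0, 9.0, 8.0])
```

A real GPS stream is noisy and unevenly sampled, so a production pipeline would likely smooth and resample before differencing.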

One particularly fascinating feature is the "Sendrix Coefficient," credited to one of Strava's fastest staff cyclists, Jimi Sendrix. It measures how quickly a cyclist can accelerate from a dead stop to 20 mph and how many times they can repeat the effort before fatigue sets in, a human limit that a car simply doesn't share.

The model's decisions are then explained using SHAP values, a method that estimates how much each feature pushes a prediction towards "bike" or "car." For instance, hitting a top speed of 80 mph would tip the scale heavily towards "car," since reaching that speed on two wheels under human power alone is highly implausible.
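The idea behind SHAP can be illustrated with exact Shapley values computed by brute force on a toy scoring function. Everything here is an assumption for illustration: `toy_vehicle_score`, its coefficients, and the baseline "typical ride" values are invented, and real SHAP tooling uses faster algorithms than enumerating every feature subset.

```python
from itertools import combinations
from math import factorial

def toy_vehicle_score(f):
    """Hypothetical stand-in for a trained model (not Strava's)."""
    score = 0.01 * f["max_speed_mph"] + 0.05 * f["accel_var"]
    if f["max_speed_mph"] > 60 and f["accel_var"] < 1.0:
        score += 0.2  # interaction: fast and smooth looks like a car
    return score

def shapley_values(model, instance, baseline):
    """Exact Shapley value of each feature, relative to a baseline input."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` come from the instance, the rest from the baseline.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi

instance = {"max_speed_mph": 80.0, "accel_var": 0.5}   # suspicious upload
baseline = {"max_speed_mph": 25.0, "accel_var": 2.0}   # assumed typical ride
phi = shapley_values(toy_vehicle_score, instance, baseline)
```

The contributions sum to the difference between the score for the upload and the score for the baseline, which is what lets each feature's share be read as a push towards "car" (positive) or "bike" (negative).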

The beauty of this system lies in its learning mechanism. Strava trained its model as a gradient boosted decision tree classifier with XGBoost, a widely used machine learning library. The training set comprised tens of thousands of activities, each labeled as either vehicle-inclusive or purely human-powered. This training allows the model to discern, with a reported accuracy of 81%, whether a ride or run has been tainted by mechanical aid.
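For readers unfamiliar with gradient boosting, the sketch below shows the core loop in plain Python: each round fits a one-split tree (a "stump") to the current errors and adds it to the ensemble. This is a deliberately simplified first-order version, not XGBoost itself, and the eight labeled rows, the two features, and the hyperparameters are all made up for illustration.

```python
import math

LEARNING_RATE = 0.5  # assumed value, chosen only for this toy example

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(rows, residuals):
    """Find the one feature/threshold split that best fits the residuals."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[j] <= t]
            right = [res for r, res in zip(rows, residuals) if r[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((x - lv) ** 2 for x in left) + sum((x - rv) ** 2 for x in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv

def fit_boosted(rows, labels, rounds=20):
    """Boost stumps on the gradient of the logistic loss (label - probability)."""
    stumps, scores = [], [0.0] * len(rows)
    for _ in range(rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        scores = [s + LEARNING_RATE * stump_predict(stump, r)
                  for s, r in zip(scores, rows)]
    return stumps

def predict_proba(stumps, row):
    """Probability that the activity contains a vehicle."""
    return sigmoid(sum(LEARNING_RATE * stump_predict(s, row) for s in stumps))

# Made-up rows: [max speed in mph, acceleration variance]; label 1 = vehicle.
rows = [[22, 2.1], [28, 1.8], [35, 2.5], [18, 1.5],
        [65, 0.6], [80, 0.4], [45, 0.9], [72, 0.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted(rows, labels)
```

Calling `predict_proba(model, [75, 0.5])` returns a high vehicle probability on this toy data, while `predict_proba(model, [20, 2.0])` returns a low one. XGBoost adds deeper trees, regularisation, and second-order gradient information on top of this basic idea.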

Looking ahead, Strava isn't stopping here. They plan to roll out additional models to further refine the accuracy of their leaderboards, including differentiating between e-bikes and conventional bikes and keeping bike rides mistakenly logged as runs off the run leaderboards.

This initiative by Strava isn't just about keeping a leaderboard accurate; it's about preserving the spirit of fair play in the global athletic community. It's a testament to how technology can be harnessed to enhance our sporting endeavors, ensuring that every drop of sweat counts and every record set is a true measure of human endurance and willpower.

In essence, Strava is setting a new standard for digital sportsmanship, where technology and honesty run hand in hand, inspiring us all to push our limits, fairly and squarely. So the next time you lace up your sneakers or mount your bike, take a moment to appreciate the invisible, high-tech ally that ensures your efforts are judged accurately, motivating you to keep striving for those personal bests.

What is the 'Cars on Segments' machine learning model used for?

The 'Cars on Segments' machine learning model is used to identify if any part of any activity uploaded to Strava was recorded in a vehicle, such as cars, motorcycles, trains, or planes. It helps ensure that only valid cycling and running activities appear on Strava leaderboards.

How does the model determine whether an activity was recorded in a vehicle?

The model calculates a series of 57 features from the activity data, such as average speed, acceleration, and other metrics, to differentiate between vehicles and bikes. It uses these features to generate a probability score indicating the likelihood that a vehicle was used in the activity.

What happens if the model identifies a vehicle in an uploaded activity?

If the model's probability that a vehicle is present exceeds a certain threshold, the activity is flagged before it reaches any leaderboards. The user is then prompted to crop out the vehicle portion or make the activity private.
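The flagging step described above can be sketched as a simple threshold check. The cutoff value, function name, and return structure below are assumptions for illustration; the article does not state the real threshold.

```python
VEHICLE_THRESHOLD = 0.5  # hypothetical cutoff; the real value is not published here

def review_upload(vehicle_probability, threshold=VEHICLE_THRESHOLD):
    """Decide what happens to an upload given the model's vehicle probability."""
    if not 0.0 <= vehicle_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if vehicle_probability > threshold:
        # Flagged activities are held back and the user chooses how to fix them.
        return {
            "flagged": True,
            "on_leaderboards": False,
            "user_options": ["crop vehicle portion", "make activity private"],
        }
    return {"flagged": False, "on_leaderboards": True, "user_options": []}

clean = review_upload(0.08)
suspect = review_upload(0.93)
```

Where the threshold sits is a trade-off: a lower value catches more car trips but flags more honest efforts, and a higher value does the opposite.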

What is the 'Sendrix Coefficient' and how is it used?

The 'Sendrix Coefficient' is a feature developed with one of Strava's fastest staff cyclists, Jimi Sendrix. It measures how fast a cyclist can accelerate from a dead stop to 20mph and how many times they can do so before exhausting themselves. This feature helps the model differentiate cars from bikes by considering the limitations of human performance.
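The exact formula for the coefficient is not given, so the following is only a sketch of the two raw quantities the description mentions: how long each acceleration from a stop to 20 mph takes, and how many such efforts occur. The function name, the 1 mph "stopped" cutoff, and the sample trace are assumptions.

```python
def sprints_from_stop(speeds_mph, dt=1.0, stop_mph=1.0, target_mph=20.0):
    """Return the duration in seconds of each acceleration from a stop to target_mph."""
    durations = []
    stop_index = None  # index of the most recent sample at or below stop_mph
    for i, v in enumerate(speeds_mph):
        if v <= stop_mph:
            stop_index = i
        elif v >= target_mph and stop_index is not None:
            durations.append((i - stop_index) * dt)
            stop_index = None  # require a fresh stop before counting again
    return durations

# Made-up trace sampled once per second: two sprints separated by a stop.
trace = [0, 8, 15, 21, 22, 10, 0, 0, 6, 12, 17, 20, 25]
sprints = sprints_from_stop(trace)
sprint_count = len(sprints)
fastest_sprint = min(sprints)
```

A rider's sprint times would be expected to lengthen as fatigue builds, whereas a car could repeat the same acceleration indefinitely, which is the contrast the feature is meant to capture.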

What are SHAP values and how do they contribute to the model's decisions?

SHAP values are used to explain the contribution of each feature to the model's final decision. They indicate whether a feature is more indicative of a vehicle or a bike, helping to provide transparency in how the model evaluates each activity.

How was the machine learning model trained?

The model was trained using a gradient boosted decision tree classifier with XGBoost, a widely used machine learning library. It was trained on tens of thousands of activities containing vehicles to accurately identify and flag such activities.

What are the future plans for enhancing Strava's leaderboard integrity?

Future plans include releasing models to prevent incorrect bike activities from disrupting run leaderboards and to differentiate between e-bikes and regular rides. Strava will also reprocess the top 10 results to ensure accuracy and fairness.

#MachineLearning #LeaderboardIntegrity

Source: https://stories.strava.com/articles/removing-cars-from-leaderboards
