How It Works
Problem Statement & Methodology
In this section we first describe how the rankings work at a high level, and then introduce the complexities of the model.
High Level Overview [No Mathematical Concepts]
The general concept behind the ranking methodology can easily be explained by using an example.
Imagine Team X clocks a time of 2:08 at the Center Island regatta. Team Y clocks a time of 2:10 at the Pickering regatta. Which team performed better?
Most people with some dragon boat race experience have noticed that times at different regattas are not directly comparable. There are a number of factors that can make one regatta generally faster than another. For example, variations in conditions such as slight differences in the race course length, water depth, or weather conditions will impact race times.
Now imagine there is a Team Z that competed at both Pickering and Center Island. Team Z clocked in at 2:07 at Center Island and 2:11 at Pickering. Team Z was faster than Team X at Center Island, and slower than Team Y at Pickering. Now relative to Team Z we can see there is some evidence that Team X’s performance was slower than Team Y’s performance.
A potential problem is that we are only using a single team, Team Z, as our benchmark. It is possible that Team Z just had a better race at Center Island but fell apart at Pickering. We can go a step further by looking at every team like Team Z that attended both Pickering and Center Island. Imagine a contrived scenario where 20 teams went to both Pickering and Center Island. All 20 teams beat Team X at Center Island, and all 20 teams lost to Team Y at Pickering. Now we have built a strong argument that Team Y’s 2:10 at Pickering was in fact better than Team X’s 2:08 at Center Island.
That is the basic idea behind the rankings methodology. We use teams in common to determine how times at two different regattas compare to each other.
Rankings: The Nitty-Gritty Details
More Detailed Algorithm Explanation
We first choose a base regatta (Center Island for 2009) and average each team’s times. For example the MOFOS raced 4 times at Center Island averaging 2:08.35 (2:09.3, 2:09.4, 2:07.3, 2:07.4) so their initial rank time is 2:08.35. A similar average is performed for all teams in attendance at Center Island.
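The averaging step can be sketched in a few lines of Python. This is only an illustration: the helper names to_seconds and to_clock are made up here and are not taken from the rankings program. The four race times are the MOFOS times quoted above.

```python
def to_seconds(race_time: str) -> float:
    """Convert a 'M:SS.s' race time such as '2:09.3' to seconds."""
    minutes, seconds = race_time.split(":")
    return int(minutes) * 60 + float(seconds)


def to_clock(total_seconds: float) -> str:
    """Format a number of seconds back into 'M:SS.ss'."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{int(minutes)}:{seconds:05.2f}"


# The four MOFOS races at Center Island, as listed in the text.
mofos_center_island = ["2:09.3", "2:09.4", "2:07.3", "2:07.4"]

times = [to_seconds(t) for t in mofos_center_island]
initial_rank_time = sum(times) / len(times)

print(to_clock(initial_rank_time))  # prints 2:08.35
```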
Next we do the same thing for another regatta (for example Pickering). At Pickering the Hammerheads averaged 2:08.26. The trick is that we can’t factor 2:08.26 into the Hammerheads ranking yet because the ranking is based on Center Island times and this 2:08.26 is from Pickering. We need to adjust this Pickering time to make it comparable to Center Island times.
To calculate the adjustment we look at all teams that attended both Center Island and Pickering. We look at each team's Pickering time and Center Island time, and calculate their implied adjustment (Center Island time / Pickering time). For example, for the Hammerheads we have:
Average Pickering time = 2:08.26
Average Center Island time = 2:07.2
Center Island / Pickering = 0.9917
In other words, if we only look at the Hammerheads then we determine that we need to multiply Pickering times by 0.9917 to make them comparable to Center Island times. (Hammerheads Pickering time * 0.9917 = Hammerheads Center Island time).
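As a quick check of the arithmetic, the Hammerheads figures above can be reproduced as follows. The variable names are illustrative only.

```python
# Hammerheads averages from the text, converted to seconds.
pickering_time = 2 * 60 + 8.26      # 2:08.26
center_island_time = 2 * 60 + 7.2   # 2:07.2

# Implied adjustment = Center Island time / Pickering time.
implied_adjustment = center_island_time / pickering_time
print(round(implied_adjustment, 4))  # prints 0.9917

# Multiplying the Pickering time by the adjustment recovers the Center Island time.
recovered = pickering_time * implied_adjustment
```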
The Hammerheads are only one of many teams that attended both Pickering and Center Island. We perform the same calculation for each of these teams. For each team when we divide (Center Island time / Pickering time) we will likely get a different value for the implied adjustment.
We can then graph our findings. On the Y-axis we can put the adjustment for each team, and on the X-axis we can put how fast the team was at Pickering. So the Hammerheads would produce a point at (128.26 seconds, 0.9917).
If all teams implied the same adjustment then the points would make a horizontal line. Instead, different teams improve at different rates and may be affected differently by conditions, so we get a scatter of points as shown above.
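A minimal sketch of how the points on such a graph could be assembled, assuming each regatta is stored as a mapping from team name to average time in seconds. Only the Hammerheads numbers come from the text; the other teams and times are invented for illustration.

```python
# Average times in seconds per team at each regatta.
# Only the Hammerheads figures are from the text; the rest are made up.
center_island = {"Hammerheads": 127.2, "Team A": 131.0, "Team B": 140.5, "Team C": 125.0}
pickering = {"Hammerheads": 128.26, "Team A": 133.1, "Team B": 141.0, "Team D": 150.0}

# Teams in common are the ones present in both regattas.
common = sorted(center_island.keys() & pickering.keys())

# One point per common team: (Pickering time, implied adjustment).
points = [(pickering[t], center_island[t] / pickering[t]) for t in common]

for team, (x, y) in zip(common, points):
    print(f"{team}: ({x:.2f} s, {y:.4f})")
```

Team C and Team D each attended only one of the two regattas, so they contribute no point.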
So out of all these implied adjustments, which one should we use in our model to adjust the Pickering times? Our initial, naïve approach was to use the average implied adjustment. This produced reasonable results, but we can do better. Picking a single value for the adjustment (like the average) implies that all teams will be adjusted by the same percentage. If a 2:00 team is adjusted by 5 seconds then a 4:00 team would be adjusted by 10 seconds.
Our model makes the adjustment itself a variable. The adjustment is a function of the team’s speed at Pickering. This allows us to adjust a slow team by a different percentage than a fast team. The adjustment we use is based on the data itself.
Revisiting the graph above, we can draw a “best fit” line through the data points:
The line captures the relationship between Pickering speed and adjustment. This is accomplished by performing an ordinary least squares regression. This gives us the equation of the line:
Adjustment = M*(Pickering Speed) + b
Where M is the slope of the line, and b is the Y-intercept. Now we have a formula for calculating the adjustment. For the Hammerheads we plug 2:08.26 (128.26 seconds) into our equation to get the result (or, visually, we slide across the X-axis to 128.26 and then move up until we intersect the regression line). This is the technique we used in version 2 of the rankings. The current rankings use a slight variation of this idea.
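The straight-line fit can be sketched with the closed-form ordinary least squares formulas. This is an illustration under stated assumptions, not the rankings program itself: fit_line is a hypothetical helper, and every data point except the Hammerheads one (128.26, 0.9917) is invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # Y-intercept
    return m, b


# (Pickering time in seconds, implied adjustment) for teams in common.
# Only the Hammerheads point is from the text; the others are hypothetical.
points = [(122.0, 0.9950), (128.26, 0.9917), (135.0, 0.9890), (148.0, 0.9820)]
m, b = fit_line([p[0] for p in points], [p[1] for p in points])

# Adjustment = M*(Pickering Speed) + b, evaluated for the Hammerheads.
hammerheads_adjustment = m * 128.26 + b
adjusted_time = 128.26 * hammerheads_adjustment
print(f"adjustment {hammerheads_adjustment:.4f}, adjusted time {adjusted_time:.2f} s")
```

Note that the adjustment read off the line will generally differ a little from the team's own implied adjustment, because the line summarizes all teams in common.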
After implementing the rankings we re-read Power Demon’s methodology document for the first time since he originally created the true rankings. It turned out he didn’t actually perform a linear regression as described above. Instead, he fit a 2nd order polynomial to the data. We added this capability to the rankings program so that the adjustment would be calculated as:
Adjustment = a + b*(Pickering Speed) + c*(Pickering Speed)^2
Practically speaking, this means that our model has more freedom in choosing the adjustment based on Pickering speed. For example, we could adjust slow and fast teams by a similar factor, but adjust medium teams by a different factor. This would only be done if the data itself suggests it (i.e. if the implied adjustments described above were similar for slow teams and fast teams, and different for medium speed teams). In our example the curve looks like:
Finally, we use this formula to adjust the Pickering times, and then average the adjusted Pickering time with the existing Center Island times to calculate the new ranking time. The entire procedure is then repeated for each regatta that we add into the rankings, but instead of using only Center Island as our base we use the combined 'true rank' time that was determined by the model using all previous regattas. Since all previous regattas have been adjusted to Center Island times, we can think of the Canadian Dragon Boat rank time as how the team would perform at Center Island.
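The quadratic fit and the update of the ranking time can be sketched as follows. Several things here are assumptions rather than details from the rankings program: fit_quadratic is a hypothetical helper that solves the least squares normal equations directly, the sample points other than the Hammerheads one are invented, and the new ranking time is taken as a plain average of one Center Island average and one adjusted Pickering average.

```python
def fit_quadratic(xs, ys):
    """Least squares fit of a 2nd order polynomial; returns a predict(x) function.

    x is centred on its mean before fitting to keep the arithmetic stable.
    Needs at least three distinct x values.
    """
    mean_x = sum(xs) / len(xs)
    us = [x - mean_x for x in xs]
    # Normal equations: sum_j (sum of u^(i+j)) * coef_j = sum of u^i * y.
    s = [sum(u ** k for u in us) for k in range(5)]
    rows = [
        [s[i + j] for j in range(3)] + [sum((u ** i) * y for u, y in zip(us, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    coef = [rows[i][3] / rows[i][i] for i in range(3)]

    def predict(x):
        u = x - mean_x
        return coef[0] + coef[1] * u + coef[2] * u * u

    return predict


# (Pickering time in seconds, implied adjustment); only the Hammerheads
# point (128.26, 0.9917) is from the text, the rest are hypothetical.
points = [(122.0, 0.9950), (128.26, 0.9917), (135.0, 0.9890),
          (148.0, 0.9820), (170.0, 0.9800)]
adjustment = fit_quadratic([p[0] for p in points], [p[1] for p in points])

# Adjust the Hammerheads' Pickering average, then average it with Center Island.
pickering_avg = 128.26
center_island_avg = 127.2
adjusted_pickering = pickering_avg * adjustment(pickering_avg)
new_rank_time = (center_island_avg + adjusted_pickering) / 2
print(f"{new_rank_time:.2f} s")
```

For the next regatta, new_rank_time would play the role that the Center Island average played here.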
The example above illustrates a somewhat common occurrence. We see almost all of the teams in common between Center Island and Pickering fall in the 2:00-2:30 range. Far off at 2:50 is an outlier. The model arguably gives too generous an adjustment to this team. In general we weren't terribly pleased with the rankings of the very slowest teams so we tweaked the model slightly. The adjustment used on distant outliers gets “clamped” so the curve ends up looking like this:
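The text does not say exactly how the clamping is done. One plausible reading, sketched below purely as an assumption, is to evaluate the curve at the nearest edge of the range where most teams in common fall, so that a distant outlier receives the same adjustment as a team at that edge. The curve coefficients and cutoffs here are invented.

```python
def clamped_adjustment(curve, time_s, lo, hi):
    """Evaluate curve at time_s, but never outside the interval [lo, hi]."""
    return curve(min(max(time_s, lo), hi))


def curve(t):
    """Hypothetical fitted adjustment curve (coefficients are made up)."""
    return 1.2 - 0.004 * t + 0.000015 * t ** 2


# Assumed range where most teams in common fall (2:00 to 2:30).
lo, hi = 120.0, 150.0

inside = clamped_adjustment(curve, 135.0, lo, hi)    # unaffected by the clamp
outlier = clamped_adjustment(curve, 170.0, lo, hi)   # a 2:50 team gets the 2:30 adjustment
```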
After releasing the rankings to public scrutiny we received feedback expressing dissatisfaction that certain small dragon boat races affected a team's ranking as much as major competitive regattas. Every regatta now has a competitive multiplier. A regatta that has very few competitive teams will affect a team's ranking less than a competitive regatta.
The competitive multiplier is determined as follows:
The attendance of any of the top 50 teams awards competitive points to a regatta. Team #1 gives 50^2 points, Team #2 gives 49^2 points, Team #3 gives 48^2 points, etc. The sum of competitive points for a regatta is then divided by the sum of competitive points for Center Island, so that Center Island has a multiplier of 1.0. This number is shown next to the regatta name in the rankings. The multiplier is clamped between 0.1 and 1.5.
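A sketch of the multiplier calculation, assuming the regatta's point total is divided by Center Island's point total so that Center Island itself comes out at 1.0. The function names and the attendance lists are hypothetical.

```python
def competitive_points(ranks_present, top_n=50):
    """Sum of (top_n + 1 - rank)^2 over attending teams ranked in the top top_n.

    Rank 1 contributes 50^2, rank 2 contributes 49^2, and so on down to rank 50.
    """
    return sum((top_n + 1 - r) ** 2 for r in ranks_present if 1 <= r <= top_n)


def competitive_multiplier(ranks_present, base_ranks, lo=0.1, hi=1.5):
    """Points relative to the base regatta, clamped to the range [lo, hi]."""
    ratio = competitive_points(ranks_present) / competitive_points(base_ranks)
    return min(max(ratio, lo), hi)


# Hypothetical attendance, given as the current rank of each attending team.
center_island_ranks = list(range(1, 41))   # assume teams ranked 1 to 40 attended
small_regatta_ranks = [12, 30, 47, 63]     # rank 63 is outside the top 50

print(competitive_multiplier(center_island_ranks, center_island_ranks))  # prints 1.0
print(competitive_multiplier(small_regatta_ranks, center_island_ranks))  # prints 0.1
```

The small regatta's raw ratio is below 0.1, so the clamp raises it to the minimum.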
So if a team attended Center Island, which has a competitive multiplier of 1.0, and Welland, which has a competitive multiplier of 0.1, then the ranking time will no longer be a straight average. Now it will be ((1/1.1 * CenterIsland) + (0.1/1.1 * Welland)).
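The same weighted average, generalized to any number of regattas, might look like this. The multipliers 1.0 and 0.1 are from the example above; the two adjusted times are invented for illustration.

```python
def weighted_rank_time(results):
    """results: list of (adjusted time in seconds, competitive multiplier)."""
    total_weight = sum(weight for _, weight in results)
    return sum(time * weight for time, weight in results) / total_weight


# Hypothetical adjusted times for one team at Center Island and Welland.
center_island = (128.0, 1.0)
welland = (135.0, 0.1)

rank_time = weighted_rank_time([center_island, welland])
straight_average = (center_island[0] + welland[0]) / 2
print(f"weighted {rank_time:.2f} s vs straight average {straight_average:.2f} s")
```

With these numbers the weighted time stays close to the Center Island result, since Welland carries only a tenth of the weight.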
The model described above produces fairly good results. There are some caveats that should be mentioned. The model takes into account all races including heats and specialty races. Sometimes teams do not go “all out” in such races. This will hurt the ranking of the team.
No explicit adjustment is made for getting a “bad” lane, wash riding, or any other reason. It is possible that a team will get an unlucky draw at a regatta. One of the nice features of this model is that the rankings get better as the season progresses and more races take place. Although some teams may be negatively impacted by lanes (or some other factor) at one regatta, this effect on the rankings should be neutralized over time as different teams get their share of “bad draws”. The more races that take place, the less effect these things will have on the rankings.
At the time this document was written the model only considered 500m races. We are currently debating whether we should have separate 200m and 2000m rankings or whether all distances should factor into a single ranking. A computer program has been written to automate the rankings calculation; however, there is still the potential for data entry errors. The most common problem is that teams will use slightly different names at different regattas (e.g. using 'DBC' vs 'Dragon Boat Club' in their moniker).
Please contact us at Rankings@Mofosdragonboat.com should you find any such duplicate entries in the rankings.
Improving your ranking time
We received some questions to the effect of, "I know my team got faster with practice but somehow on your rankings our results got slower over time. What happened?"
In order to improve your ranking time your team needs to improve at a faster rate than other teams. Imagine a simple season with two regattas and three teams. In the first regatta we have Team A 2:08, Team B 2:09, Team C 2:10. In the next regatta Team A and Team C both get 1 second faster: Team A 2:07, Team C 2:09. Since everyone else got faster, Team B needs to improve by 1 second just to maintain its ranking time. If Team B achieved 2:09 again then its adjusted time will actually be slower in the second regatta.
In other words just to maintain the same ranking time your team needs to improve. If your team improves by less than your peers then your ranking time will be slower. Since all teams are practicing hard throughout the season it is a challenge simply to maintain your ranking time.
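The three-team example can be worked through numerically. For simplicity this sketch uses the naïve average implied adjustment described earlier rather than the fitted curve, so the exact figures are illustrative only.

```python
# Times in seconds from the example: first regatta (the base) and second regatta.
first = {"Team A": 128.0, "Team B": 129.0, "Team C": 130.0}
second = {"Team A": 127.0, "Team B": 129.0, "Team C": 129.0}

# Implied adjustment per team = base time / new time.
ratios = [first[team] / second[team] for team in first]
average_adjustment = sum(ratios) / len(ratios)

# Team B raced 2:09 both times, yet its adjusted second time is slower.
team_b_adjusted = second["Team B"] * average_adjustment
print(f"adjustment {average_adjustment:.4f}, Team B adjusted {team_b_adjusted:.2f} s")
```

Because Teams A and C both improved, the second regatta looks "fast" to the model, so every time from it, including Team B's unchanged 2:09, is scaled up.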