By David Hager
Among my life-long hobbies is my favorite: Track and Field. I was never a participant, but I have followed the sport closely for over 40 years. I have served as a coach, an official, an announcer and as a statistician as well as a fan. As an observer, I had noted that number of performances in a number of events seemed to have dropped off in quality over the last 15-20 years. In particular, the great matches between Kirani James and LaShawn Merritt this year over 400 meters seemed to me to be reversing the trend I had observed. Well, it was time to put my Excel PowerPivot skills to work to see whether my observation was just my imagination. First, I located a source of data on the Internet for al-time 400 meter times.
This source is a record of all 400 meter race results with times less than 45.20 and updated regularly, ideal for my purpose. Unfortunately, Power Query does not recognize there is data on the web page. Excel’s heritage web query does recognize the data field, but I decided to not use it, mainly because the data does not have any headers. Instead, I simply highlighted the data and copy/pasted it into Excel (and added headers).
This list contains ALL performances, meaning that in a given year if someone ran ten 400 meters races all under 45.20, all of those results are in the list. That information will be quite useful to me for future analyses, but what I really need was a list that had the best time for each athlete per year. So, I pulled the data into PowerPivot and created several calculated columns.
The DAX formula in the primary calculated column (named UniqueMinTime) returns the fastest time for each runner per year (or 10000, if not the fastest time). This formula is shown below:
Note that this formula uses the EARLIER function. For large datasets, this would create performance problems, but it works fine for datasets with less than 10000 rows.
Now, a measure was created to harvest the values in the calculated column to find the Nth best performance (in this case, 4th best) for unique individuals.
The first part of the measure contains an OR expression designed to remove two undesirable outcomes.
COUNTROWS(TOPN(4,Main,Main[UniqueMinTime],1))<4 returns the number of rows returned for each year (as applied in a row field shown below). Since 4 is used here as the 1st argument of the TOPN function, the expectation would be that 4 rows would be returned in the TOPN “table”. However, that is the case. If there are less than 4 records in the data table for a specific year, TOPN will still return that table, opening the door to deriving a result that is less than the 4th best performance.
The other argument of the OR function:
prevents returning a result that was blocked by the UniqueMinTime calculated column when 10000 was inserted for preformances greater than the best performance.
The main part of the DAX measure:
uses nested TOPN functions to return one value which in this case is the 4th best performance.
Is it important to note here that the Nth value can be made dynamic in a couple of different ways (i.e. – disconnected slicers and one cell linked tables).
Now comes the important part of this article.
It is clear from the chart that the 4th best performance by a unique individual in the 400 meters has been trending up (worse) since the early 1990s, confirming my suspicion. The reason for this is currently, but it could be cleared up by some more data. That is a subject for another post.
Hope that you found this useful!