Calculating the Nth Best Performance (unique individuals) with Excel and PowerPivot

By David Hager

Among my life-long hobbies is my favorite: Track and Field. I was never a participant, but I have followed the sport closely for over 40 years. I have served as a coach, an official, an announcer and as a statistician as well as a fan. As an observer, I had noted that number of performances in a number of events seemed to have dropped off in quality over the last 15-20 years. In particular, the great matches between Kirani James and LaShawn Merritt this year over 400 meters seemed to me to be reversing the trend I had observed. Well, it was time to put my Excel PowerPivot skills to work to see whether my observation was just my imagination. First, I located a source of data on the Internet for al-time 400 meter times.

http://www.alltime-athletics.com/m_400ok.htm

This source is a record of all 400 meter race results with times less than 45.20 and updated regularly, ideal for my purpose. Unfortunately, Power Query does not recognize there is data on the web page. Excel’s heritage web query does recognize the data field, but I decided to not use it, mainly because the data does not have any headers. Instead, I simply highlighted the data and copy/pasted it into Excel (and added headers).

400M_Picture1

This list contains ALL performances, meaning that in a given year if someone ran ten 400 meters races all under 45.20, all of those results are in the list. That information will be quite useful to me for future analyses, but what I really need was a list that had the best time for each athlete per year. So, I pulled the data into PowerPivot and created several calculated columns.

The DAX formula in the primary calculated column (named UniqueMinTime) returns the fastest time for each runner per year (or 10000, if not the fastest time). This formula is shown below:

=IF(
Main[Time]=
CALCULATE(MIN(Main[Time]),FILTER(Main,Main[YEAR]=EARLIER(Main[YEAR])&&Main[Name]=EARLIER(Main[Name])))
,CALCULATE(MIN(Main[Time]),FILTER(Main,Main[YEAR]=EARLIER(Main[YEAR])&&Main[Name]=EARLIER(Main[Name])))
,10000)

Note that this formula uses the EARLIER function. For large datasets, this would create performance problems, but it works fine for datasets with less than 10000 rows.
Now, a measure was created to harvest the values in the calculated column to find the Nth best performance (in this case, 4th best) for unique individuals.

FullNthPerformance:=IF(
OR(COUNTROWS(TOPN(4,Main,Main[UniqueMinTime],1))<4
,MINX(TOPN(1,TOPN(4,Main,Main[UniqueMinTime],1),Main[UniqueMinTime]),Main[UniqueMinTime])=10000)
,BLANK()
,MINX(TOPN(1,TOPN(4,Main,Main[UniqueMinTime],1),Main[UniqueMinTime]),Main[UniqueMinTime]))

The first part of the measure contains an OR expression designed to remove two undesirable outcomes.

COUNTROWS(TOPN(4,Main,Main[UniqueMinTime],1))<4 returns the number of rows returned for each year (as applied in a row field shown below). Since 4 is used here as the 1st argument of the TOPN function, the expectation would be that 4 rows would be returned in the TOPN β€œtable”. However, that is the case. If there are less than 4 records in the data table for a specific year, TOPN will still return that table, opening the door to deriving a result that is less than the 4th best performance.

The other argument of the OR function:

MINX(TOPN(1,TOPN(4,Main,Main[UniqueMinTime],1),Main[UniqueMinTime]),Main[UniqueMinTime])=10000)

prevents returning a result that was blocked by the UniqueMinTime calculated column when 10000 was inserted for preformances greater than the best performance.
The main part of the DAX measure:

MINX(TOPN(1,TOPN(4,Main,Main[UniqueMinTime],1),Main[UniqueMinTime]),Main[UniqueMinTime]))

uses nested TOPN functions to return one value which in this case is the 4th best performance.
Is it important to note here that the Nth value can be made dynamic in a couple of different ways (i.e. – disconnected slicers and one cell linked tables).
Now comes the important part of this article.

400M_Picture2
It is clear from the chart that the 4th best performance by a unique individual in the 400 meters has been trending up (worse) since the early 1990s, confirming my suspicion. The reason for this is currently, but it could be cleared up by some more data. That is a subject for another post.
Hope that you found this useful!

Advertisements

5 thoughts on “Calculating the Nth Best Performance (unique individuals) with Excel and PowerPivot

  1. Pingback: Calculating Nth Best Performance with PowerPivot

  2. David Hager Post author

    Update: The men’s 400 meters in 2015 experienced a renaissance. I have not yet analyzed the data for this year, but this was perhaps the best year in terms of depth in history. Interestingly, no other track and field event experienced the same improvement. Guess my predictive analysis was correct πŸ™‚

    Reply
  3. Pingback: # Excel Super Links #70 – shared by David Hager | Excel For You

  4. Pingback: #Excel Super Links #89 – shared by David Hager | Excel For You

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s