Outliers and Contextualization in Prospect Evaluation
Written by Nuno Sousa
In this article, I’m going to go over some statistical and data science terms and concepts, such as outliers, biases, and contextualization, and try to show how we as a community could apply them to NBA scouting and the evaluation of prospects.
As a setup, let's go over some basic outlier concepts.
An outlier is an anomaly in a given dataset: a point that sits well apart from the rest. That dataset can be a single statistic, a distribution, or a group of people.
An outlier can also be of three different types: a statistical outlier, a contextual outlier, or a collective outlier.
These can be determined empirically, which requires a fair bit of knowledge of the area. Alternatively, they can be determined statistically, provided you have a large enough sample to run the required statistical tests, most of which are based on the distance between a point and the rest of its class’s distribution.
Imagine you have a group of fifty data points ranging from zero to one hundred. Now, you look at every single point in the dataset. Forty-nine of them are between sixty and ninety-five, while one of them is thirty. You found an outlier.
The numbers are arbitrary, as is the limit of distance from what is considered “normal”. You could find a way to consider numbers below sixty-five also to be outliers, but the statistical tests and the concept behind them remain the same.
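To make the fifty-point example concrete, here is a minimal sketch of two common distance-based checks, Tukey's IQR fences and a z-score cutoff. The data is made up to match the description above (forty-nine values between sixty and ninety-five, one value of thirty), and the function names and the default limits (`k=1.5`, `threshold=3.0`) are my own illustrative choices, not a prescribed method.

```python
import statistics

# Hypothetical data: 49 values spread from 60 to 95, plus one value of 30.
values = [60 + (i * 35) // 48 for i in range(49)] + [30]


def iqr_outliers(data, k=1.5):
    """Return points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


def zscore_outliers(data, threshold=3.0):
    """Return points more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]


print(iqr_outliers(values))     # [30]
print(zscore_outliers(values))  # [30]
```

Lowering `k` or `threshold` tightens the limits, so more points get flagged. The test stays the same; only the arbitrary line for what counts as "normal" moves.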
Now, onto the basketball evaluation side of it.
Collective outliers are a weird thing to define in basketball terms. A collective outlier is a group of points in a dataset that, looked at one by one, appear to be within normality. However, if you compare that group to the rest of the available data, you’ll see that the group as a whole has a distribution that would classify as an outlier. This has limitations in basketball, as most players change groups (they go from their HS team to their college team to, hopefully, their NBA team), so it’s hard to treat those groups as stable datasets.
One of the collective outliers most frequently discussed is the Syracuse University men’s basketball team. For years, Syracuse has produced outlier STL% and BLK% numbers due to its extensive use of a 2-3 zone defense, which anecdotally inflates those numbers. The same can be seen with the University of Washington. This is useful to keep in mind when assessing prospects from these programs, so that you don’t take the numbers at face value, as if they simply represented a statistical outlier.
This is what happened with Matisse Thybulle. While his numbers in these categories were absolutely brilliant, scouts understood that they were perhaps a misrepresentation, inflated by the context of Washington’s defensive schemes. While his immense defensive abilities have translated, you still see some of the defensive instincts (more leniency toward gambles on the perimeter and in isolation situations) that are typical of the zone defense that inflated those same numbers. He still manages to cover for it with his outlier (eh.) recovery ability, which is why his numbers in those statistics are still outlier good at the NBA level.
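The collective outlier idea can be sketched with a simple group-level check: score each player against a baseline, then score the group's mean against that same baseline, scaled by the standard error for a group of that size. This is only one simple way to do it, chosen by me for illustration. The STL% values below are invented, and the names `rest_of_players`, `zone_team`, and `THRESHOLD` are hypothetical.

```python
import math
import statistics

# Hypothetical STL% values, invented purely for illustration.
rest_of_players = [1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.0, 2.1,
                   2.2, 2.3, 2.4, 2.5, 2.6, 2.8, 3.0, 1.3, 1.9, 2.2]
zone_team = [2.6, 2.7, 2.8, 2.9, 2.7, 2.8]

THRESHOLD = 3.0  # arbitrary cutoff, in standard deviations

base_mean = statistics.mean(rest_of_players)
base_sd = statistics.stdev(rest_of_players)

# Each player on his own: distance from the baseline mean in standard deviations.
player_z = [(x - base_mean) / base_sd for x in zone_team]

# The group as a whole: distance of the group mean from the baseline mean,
# scaled by the standard error of a mean of that group size.
group_mean = statistics.mean(zone_team)
group_z = (group_mean - base_mean) / (base_sd / math.sqrt(len(zone_team)))

any_player_flagged = any(abs(z) > THRESHOLD for z in player_z)
group_flagged = abs(group_z) > THRESHOLD

print(any_player_flagged, group_flagged)  # False True
```

With these made-up numbers, no single player crosses the cutoff, but the team as a group does. That is the pattern you would want to notice before reading any one player's STL% from such a program as an individual outlier.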
Contextual outliers and statistical outliers often take a linked path in team sports. Basketball statistics are somewhat of a representation of a player's performance on the court (never truly representative, but for the sake of this article bear with me). That performance on the court is inherently affected by the context of what’s happening around the player, whether it’s their own team or opponents.
This contextualization of statistical and performance-based outliers can end up as an argument for or against a prospect. Two recent examples are Luka Doncic’s and Alperen Sengun’s historic youth dominance in Europe.
Luka Doncic, in his pre-draft year, had a dominating performance in what should have been a deterring context. He was a young player on a very high-level Real Madrid team, competing in the second-best domestic league in the world as well as the second-best overall competition in the world. Yet he did it. That contextualization of what Luka did was an important part of what made him an outlier in prospect evaluation.
Alperen Sengun was also a case of people under-contextualizing (or over-simplifying) his situation. Yes, Sengun was an eighteen-year-old in a professional league. Nonetheless, that professional league was nowhere near as competitive as the one Luka was in, with many teams suffering losses due to the pandemic and cutting back on investment. Furthermore, it was a league known to be thin at his position, and his impact when playing the high-level teams, either in Turkey or in the Eurocup (the second-tier European competition), was noticeably diminished. “He played against grown men” doesn't really contextualize his outlier production.
One of the basketball areas where outlier is a pretty poorly applied term is physical abilities. As mentioned before, outliers are points of data (players in this case) who distinctly separate themselves from the norm. If you change the norm, you change what is considered an outlier.
For example, Zion Williamson was an outlier at almost every competition level he played at. More than that, he was an outlier by a *wide* margin, with a combination of strength, speed, and agility that is historically rare. This made it easy to scale that into outlier physical abilities at the NBA level.
This isn't the norm. The NBA’s physical threshold is a lot higher than that of any other basketball scene. The guards you see sprint past everyone in a middling college conference won’t do it against NBA defenders. The big man you see outjumping everyone in that same conference will not outjump the Goberts, Aytons and ADs of the world. Contextualizing why someone looked like he physically didn't belong on a court often requires a better evaluation of his surroundings than of the player himself.
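The point that changing the norm changes the outlier can be shown with a few lines: the same measurement scored against two different reference groups. Everything here is hypothetical. The vertical-leap numbers are invented, and the "NBA" pool is simply assumed to have the same spread shifted up by four inches.

```python
import statistics


def z_score(value, population):
    """Distance of `value` from the population mean, in sample standard deviations."""
    return (value - statistics.mean(population)) / statistics.stdev(population)


# Hypothetical max-vertical measurements in inches, invented for illustration.
mid_major = [26, 27, 28, 28, 29, 29, 30, 30, 30, 31, 31, 32, 32, 33, 34]
# Assumption: the NBA pool has the same spread, shifted up by four inches.
nba = [x + 4 for x in mid_major]

prospect = 37
THRESHOLD = 3.0  # arbitrary cutoff

z_college = z_score(prospect, mid_major)
z_nba = z_score(prospect, nba)

print(round(z_college, 2), round(z_nba, 2))  # 3.13 1.34
print(z_college > THRESHOLD, z_nba > THRESHOLD)  # True False
```

The prospect did not change between the two lines. Only the group he is compared to did, and that alone is enough to move him from outlier to merely above average.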
Skill contextualization: which outlier skills are actually worth valuing?
The NBA and its general play systems are fluid: the positions change, what is requested of each prospect changes, and prospect evaluation should aim to keep pace with that evolution or, in an ideal world, get ahead of it. So how do we apply that to skill contextualization and to determining how to value skills?
Some outlier skills translate better than others, but in general, physical-based skills are the ones which translate the least into the pros.
Why? Because, to put it in blunt terms, how much of an outlier is your physical skill when you add three inches and forty pounds to everyone on the court? This applies to speed, jumping ability, and strength.
This has an impact on how we should contextualize some specific outlier basketball skills in prospects and how they would translate.
Take rebounding, for example. Putting aside the contextual value of the skill in the NBA (Drummond *cough cough*), there should be a recognition that the hypothetical 6’10” center who dominated the boards in a mid-major conference will most likely not dominate, or even be particularly good, on the boards in the NBA (the Dennis Rodmans of this world notwithstanding). The same could be applied to blocks and rim protection. Block numbers and BLK% don't tell you how they were obtained. They can be a product of unbelievable instincts and technique, or they can be a product of being bigger than everyone else and being allowed to stay under the rim constantly. These two outlier blocking scenarios translate differently, even if the rim protection numbers look the same. Bruno Fernando got himself an NBA contract off it.
For a less linear example, rim pressure is a harder skill to predict translation for since, as a whole, the rim numbers represent the end result of a group of skills. This is where I’d like to introduce the concept of the conjunction fallacy/bias (I'd also like to refer you to the work from Joseph Nation on this [here]). Broadly speaking, this concept tells us that humans inherently overrate the probability of X separate events all happening. I tend to agree that this happens with rim pressure and similar skills in prospect evaluation, albeit with a caveat.
Dumbing rim pressure down to the combination of a prospect’s handle, his athleticism, and his layup/floater package, I’ll pose this scenario. Prospect A has good grades for all three of these skills. Prospect B has an average or slightly above average handle and layup/floater package but has outlier elite athleticism. The conjunction/disjunction bias presented above tells us two things. We as evaluators likely overestimate the probability of Prospect A translating all three skills to a good level when jumping up a level of competition (HS -> College/Equivalent -> NBA), and therefore of his finishing and rim pressure being good at the next level. We are also likely underestimating the probability of the outlier athleticism translating enough as a single skill for it to “carry” the rim pressure of Prospect B to a “good” level in the next competition level, even if the other two skills fall off comparatively.
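A bit of arithmetic shows why the conjunction matters. The probabilities below are invented for illustration, and the sketch assumes the three skills translate independently, which is a strong simplification: if the skills are positively correlated, the joint probability would be higher than this product.

```python
import math

# Hypothetical probabilities that each skill translates to the next level.
prospect_a = {"handle": 0.7, "athleticism": 0.7, "layup_floater": 0.7}
# Hypothetical probability that Prospect B's outlier athleticism translates
# well enough to carry his rim pressure on its own.
prospect_b_athleticism = 0.5

# Assumption: independence, so P(all three translate) is the product.
p_a_all = math.prod(prospect_a.values())
p_b = prospect_b_athleticism

# Per-skill probability Prospect A would need for P(all three) to match B.
break_even = p_b ** (1 / 3)

print(round(p_a_all, 3), p_b, round(break_even, 3))  # 0.343 0.5 0.794
```

Three skills that each look like a 70% bet only come through together about a third of the time under these assumptions, which is below a single 50% bet. The numbers themselves mean nothing; the gap between how "three good grades" feels and how it multiplies out is the bias.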
I’m not gonna pretend I have a solution to basic human biases and evaluation problems which are prevalent in areas way older than NBA scouting. Outlier mining and contextual outliers have been a problem in data science since the dawn of time. Only now are we beginning to understand how to deal with them.
Nonetheless, I do think breaking down “complex” skills into what makes them happen the way they do is a step that scouts should take, as it makes it easier to see which parts will translate to the next level (taking into account previous occurrences and similar cases). Just as important is a general shift toward contextualizing skills and toward a more careful use of the term outlier in evaluating prospects. Having a clear notion of what outliers are and how they translate to the next level may lead to a better use of the term, which does have its place in basketball viewing and evaluation.