The K-S test is a statistical test that determines whether a sample comes from a population with a specific distribution (one-sample K-S test) or whether two samples come from the same distribution (two-sample K-S test). It works by comparing cumulative distribution functions (CDFs).
How it Works:
- For the one-sample test:
  - Take your sample data and create an empirical CDF.
  - Compare it to the theoretical CDF you're testing against.
  - Find the maximum vertical distance between these two curves; this is the KS statistic, D.
  - Null hypothesis, H0: the sample follows the theoretical distribution.
- For the two-sample test:
  - Create empirical CDFs for both samples.
  - Find the maximum vertical distance between them, which is D.
  - Null hypothesis, H0: the two samples are from the same distribution.
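Both variants are available in scipy. Here is a minimal sketch of how they could be run; the two samples are made up purely for illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample_a = rng.normal(loc=0, scale=1, size=500)    # made-up data for illustration
sample_b = rng.normal(loc=0.1, scale=1, size=500)

# One-sample test: compare sample_a against a standard normal CDF
d_one, p_one = stats.kstest(sample_a, "norm")

# Two-sample test: compare the empirical CDFs of the two samples
d_two, p_two = stats.ks_2samp(sample_a, sample_b)

print(f"one-sample: D = {d_one:.3f}, p = {p_one:.3f}")
print(f"two-sample: D = {d_two:.3f}, p = {p_two:.3f}")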
Interpreting the result
The test statistic, D, can then be compared with the Kolmogorov distribution to get a more interpretable p-value. The p-value represents how likely we are to observe a test statistic at least as extreme as D if the null hypothesis is true; in other words, how likely it is to see a distance this large purely by chance. Given a significance level alpha = 0.05, we get:
| | p > 0.05 | p < 0.05 |
|---|---|---|
| 1) One-sample test | We cannot reject H0, meaning it's not unlikely that the sample follows the distribution. | We reject H0, meaning the sample is unlikely to come from the given distribution. |
| 2) Two-sample test | We cannot reject H0, meaning it's not unlikely that the two samples are from the same distribution. | We reject H0, meaning the two samples are unlikely to come from the same distribution. |
(Note: not rejecting H0 is not the same as accepting H0, but that's for another discussion.)
The value of D lies in the range [0, 1], where 0 means there is no difference between the distributions and 1 that the difference is as large as possible (remember that all CDFs are bounded between 0 and 1).
The image above shows a random sample with a fitted normal distribution. To the right, the KS statistic, i.e. the maximum distance between the empirical and theoretical CDFs, is highlighted with a red line. The difference is rather small, and consequently the p-value is high, meaning it is likely that the sample follows a normal distribution.
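To make the geometric meaning concrete, here is a small sketch of how D could be computed directly from the two CDFs. This is an illustration assuming numpy and scipy, not the exact code behind the figure:

import numpy as np
from scipy import stats

sample = np.sort(np.random.normal(0, 1, 500))   # illustrative sample
n = len(sample)

# Theoretical CDF of a normal distribution fitted to the sample
tcdf = stats.norm.cdf(sample, loc=sample.mean(), scale=sample.std())

# The empirical CDF is a step function: just below x_i it equals (i - 1) / n
# and at x_i it jumps to i / n, so the largest gap can occur on either side of a step
d_plus = np.max(np.arange(1, n + 1) / n - tcdf)
d_minus = np.max(tcdf - np.arange(0, n) / n)
d_stat = max(d_plus, d_minus)
print(f"D = {d_stat:.3f}")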
Pros:
- Distribution-free: Unlike many other tests, it doesn't assume your data follows any particular distribution
- Sensitive to shape: It can detect differences in both location (mean) and shape (variance, skewness) of distributions
- Non-parametric: Works with ordinal data and doesn't require parameters to be specified
- Simple interpretation: The test statistic has a clear geometric meaning
Cons:
- Less powerful: Compared to specific tests (like t-test for normal distributions), it has less statistical power
- Not great for discrete data: The test is designed for continuous distributions
- Sample size limitations: Very sensitive to sample size; very large samples might reject even tiny, practically insignificant differences
- Only one-dimensional: Can't directly test multivariate distributions
- Limited for tail behaviour: Might miss important differences in the tails of distributions
Performance for different sample sizes
Let’s start by investigating how the KS test performs for different sample sizes.
We generate random values from a normal distribution, and then add some noise:
import numpy as np

def generate_sample_data(size: int, noise_factor: float = 2) -> np.ndarray:
    # Draw from a standard normal distribution and add uniform noise
    data = np.random.normal(0, 1, size)
    return data + np.random.uniform(-1 * noise_factor, noise_factor, size)
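Each sample is then tested against a normal distribution fitted to the data. The exact call is not shown here, but the step could look roughly like this (assuming scipy and the generate_sample_data function above):

from scipy import stats

sample = generate_sample_data(size=100)
# Fit a normal distribution to the sample and run the one-sample KS test against it
mu, sigma = sample.mean(), sample.std()
d_stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
print(f"D = {d_stat:.3f}, p = {p_value:.3f}")

(Strictly speaking, estimating the parameters from the same data makes the standard KS p-value only approximate, but that detail is orthogonal to the sample-size behaviour discussed below.)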
n = 100
For a small sample size of n = 100, the KS test performs well and identifies the distribution as normal.
n = 1,000
For a slightly larger sample size, n = 1,000, the KS statistic is smaller (i.e. the maximum distance between the CDFs has shrunk), but the p-value is actually lower, p = 0.185. The hypothesis that the data follows a normal distribution is still not rejected (given the choice of significance level).
n = 10,000
For a larger sample size, n = 10,000, the KS statistic is even smaller, but the p-value is now 0.001, so the hypothesis that the data follows a normal distribution is rejected.
Result
| Sample size (n) | Noise factor | KS statistic | p-value | H0 not rejected (alpha = 0.05) |
|---|---|---|---|---|
| 100 | 2 | 0.059 | 0.861 | Yes |
| 1,000 | 2 | 0.034 | 0.185 | Yes |
| 10,000 | 2 | 0.020 | 0.001 | No |
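A loop of this form can reproduce the experiment (assuming the imports and generate_sample_data from above); the exact numbers will vary between runs since the data and noise are random, so the table shows one particular realisation:

for n in (100, 1_000, 10_000):
    sample = generate_sample_data(size=n)
    mu, sigma = sample.mean(), sample.std()
    d_stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
    print(f"n = {n:>6}: D = {d_stat:.3f}, p = {p_value:.3f}")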
This shows how sensitive the KS test is to sample size. The reason lies in how the KS-statistic value, D, is judged: the larger the sample size, the smaller the deviation from the theoretical distribution that is tolerated before the test rejects (the critical value at alpha = 0.05 shrinks roughly as 1.36 / sqrt(n); see the check after the summary below). The consequences are:
For smaller sample sizes:
- With small n, larger fluctuations in the empirical CDF are tolerated.
- The KS test may fail to reject the null hypothesis even when there are meaningful differences.
- In other words, we can get high p-values even with fairly large differences.
For larger sample sizes:
- As n increases, the empirical CDF is expected to get closer to the true distribution, which raises the bar for an acceptable D value.
- This means that even tiny, practically insignificant differences become statistically significant, leading to rejection of the null hypothesis even when the deviation is negligible.
- In other words, even very low KS-statistic values yield low p-values.
In summary:
- For small samples → The test lacks power, and true differences may not be detected.
- For large samples → The test becomes overly sensitive, flagging even minor differences as significant.
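The numbers in the results table can be sanity-checked against the asymptotic critical value of the one-sample test, roughly 1.358 / sqrt(n) at alpha = 0.05 (strictly valid for a fully specified distribution, but good enough to show the trend):

import numpy as np

# Observed D values from the results table above
observed = {100: 0.059, 1_000: 0.034, 10_000: 0.020}
for n, d in observed.items():
    d_crit = 1.358 / np.sqrt(n)   # approximate critical value at alpha = 0.05
    verdict = "reject H0" if d > d_crit else "cannot reject H0"
    print(f"n = {n:>6}: D = {d:.3f}, D_crit = {d_crit:.3f} -> {verdict}")

The same D that sits far below the threshold at n = 100 would be well above it at n = 10,000, which is exactly the sensitivity described above.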
Conclusion
The Kolmogorov-Smirnov test is a powerful statistical tool, but it comes with some limitations. Its main benefits are that it is easy to implement, usable for any continuous distribution, and fairly simple to interpret. On the downside, it is very sensitive to sample size, with low power for small samples and excessive sensitivity for large datasets. Nonetheless, it is a useful tool in the toolbox, and as with all goodness-of-fit analyses it should be complemented with other analyses to get a comprehensive result.