The K-S test is a statistical test that determines whether a sample comes from a population with a specific distribution (one-sample K-S test) or whether two samples come from the same distribution (two-sample K-S test). It works by comparing cumulative distribution functions (CDFs).
How it Works:
- For the one-sample test:
  - Take your sample data and create an empirical CDF.
  - Compare it to the theoretical CDF you're testing against.
  - Find the maximum vertical distance between these two curves; this is the KS statistic, D.
  - Null hypothesis, H0: the sample follows the theoretical distribution.
- For the two-sample test:
  - Create empirical CDFs for both samples.
  - Find the maximum vertical distance between them, which is D.
  - Null hypothesis, H0: the two samples are from the same distribution.
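Both variants are available in scipy. Here is a minimal sketch of how they could be run; the two samples are made up purely for illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample_a = rng.normal(loc=0, scale=1, size=500)    # made-up data for illustration
sample_b = rng.normal(loc=0.1, scale=1, size=500)

# One-sample test: compare sample_a against a standard normal CDF
d_one, p_one = stats.kstest(sample_a, "norm")

# Two-sample test: compare the empirical CDFs of the two samples
d_two, p_two = stats.ks_2samp(sample_a, sample_b)

print(f"one-sample: D = {d_one:.3f}, p = {p_one:.3f}")
print(f"two-sample: D = {d_two:.3f}, p = {p_two:.3f}")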
Interpreting the result
The test statistic, D, can then be compared with the Kolmogorov distribution to get a more interpretable p-value. The p-value represents how likely we are to observe a test statistic at least as extreme as D if the null hypothesis is true; in other words, how likely it is to see a distance this large purely by chance. Given a significance level alpha = 0.05, we get:
| | p > 0.05 | p < 0.05 |
|---|---|---|
| 1) One-sample test | We cannot reject H0, meaning it's not unlikely that the sample follows the distribution. | We reject H0, meaning the sample is unlikely to come from the given distribution. |
| 2) Two-sample test | We cannot reject H0, meaning it's not unlikely that the two samples are from the same distribution. | We reject H0, meaning the two samples are unlikely to come from the same distribution. |
(Note: not rejecting H0 is not the same as accepting H0, but that's for another discussion.)
The value of D lies in the range [0, 1], where 0 means there is no difference between the distributions and 1 that the difference is as large as possible (remember that all CDFs are bounded between 0 and 1).
The image above shows a random sample with a fitted normal distribution. To the right, the KS statistic, i.e. the maximum distance between the empirical and theoretical CDFs, is highlighted with a red line. The difference is rather small, and consequently the p-value is high, meaning it is likely that the sample follows a normal distribution.
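To make the geometric meaning concrete, here is a small sketch of how D could be computed directly from the two CDFs. This is an illustration assuming numpy and scipy, not the exact code behind the figure:

import numpy as np
from scipy import stats

sample = np.sort(np.random.normal(0, 1, 500))   # illustrative sample
n = len(sample)

# Theoretical CDF of a normal distribution fitted to the sample
tcdf = stats.norm.cdf(sample, loc=sample.mean(), scale=sample.std())

# The empirical CDF is a step function: just below x_i it equals (i - 1) / n
# and at x_i it jumps to i / n, so the largest gap can occur on either side of a step
d_plus = np.max(np.arange(1, n + 1) / n - tcdf)
d_minus = np.max(tcdf - np.arange(0, n) / n)
d_stat = max(d_plus, d_minus)
print(f"D = {d_stat:.3f}")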
Pros:
- Distribution-free: Unlike many other tests, it doesn't assume your data follows any particular distribution
- Sensitive to shape: It can detect differences in both location (mean) and shape (variance, skewness) of distributions
- Non-parametric: Works with ordinal data and doesn't require parameters to be specified
- Simple interpretation: The test statistic has a clear geometric meaning
Cons:
- Less powerful: Compared to specific tests (like t-test for normal distributions), it has less statistical power
- Not great for discrete data: The test is designed for continuous distributions
- Sample size limitations: Very sensitive to sample size; very large samples might reject even tiny, practically insignificant differences
- Only one-dimensional: Can't directly test multivariate distributions
- Limited for tail behaviour: Might miss important differences in the tails of distributions
Performance for different sample sizes
Let’s start by investigating how the KS test performs for different sample sizes.
We generate random values from a normal distribution, and then add some noise:
import numpy as np

def generate_sample_data(size: int, noise_factor: float = 2) -> np.ndarray:
    # Draw from a standard normal distribution and add uniform noise
    data = np.random.normal(0, 1, size)
    return data + np.random.uniform(-1 * noise_factor, noise_factor, size)
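Each sample is then tested against a normal distribution fitted to the data. The exact call is not shown here, but the step could look roughly like this (assuming scipy and the generate_sample_data function above):

from scipy import stats

sample = generate_sample_data(size=100)
# Fit a normal distribution to the sample and run the one-sample KS test against it
mu, sigma = sample.mean(), sample.std()
d_stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
print(f"D = {d_stat:.3f}, p = {p_value:.3f}")

(Strictly speaking, estimating the parameters from the same data makes the standard KS p-value only approximate, but that detail is orthogonal to the sample-size behaviour discussed below.)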
n = 100
For a small sample size of n = 100, the KS test performs well and identifies the distribution as normal.
n = 1,000
For a slightly larger sample size, n = 1,000, the KS statistic is smaller (i.e. the maximum distance between the CDFs has shrunk), but the p-value is actually lower, p = 0.185. The hypothesis that the data follows a normal distribution is still not rejected (given the choice of significance level).
n = 10,000
For a larger sample size, n = 10,000, the KS statistic is even smaller, but the p-value is now 0.001, so the hypothesis that the data follows a normal distribution is rejected.
Result
| Sample size (n) | Noise factor | KS statistic | p-value | H0 not rejected (alpha = 0.05) |
|---|---|---|---|---|
| 100 | 2 | 0.059 | 0.861 | Yes |
| 1,000 | 2 | 0.034 | 0.185 | Yes |
| 10,000 | 2 | 0.020 | 0.001 | No |
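A loop of this form can reproduce the experiment (assuming the imports and generate_sample_data from above); the exact numbers will vary between runs since the data and noise are random, so the table shows one particular realisation:

for n in (100, 1_000, 10_000):
    sample = generate_sample_data(size=n)
    mu, sigma = sample.mean(), sample.std()
    d_stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
    print(f"n = {n:>6}: D = {d_stat:.3f}, p = {p_value:.3f}")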
This shows how sensitive the KS test is to sample size. The reason lies in how the KS-statistic value, D, is judged: the larger the sample size, the smaller the deviation from the theoretical distribution that is tolerated before the test rejects (the critical value at alpha = 0.05 shrinks roughly as 1.36 / sqrt(n); see the check after the summary below). The consequences are:
For smaller sample sizes:
- With small n, larger fluctuations in the empirical CDF are tolerated.
- The KS test may fail to reject the null hypothesis even when there are meaningful differences.
- In other words, we can get high p-values even with fairly large differences.
For larger sample sizes:
- As n increases, the empirical CDF is expected to get closer to the true distribution, which raises the bar for an acceptable D value.
- This means that even tiny, practically insignificant differences become statistically significant, leading to rejection of the null hypothesis even when the deviation is negligible.
- In other words, even very low KS-statistic values yield low p-values.
In summary:
- For small samples → The test lacks power, and true differences may not be detected.
- For large samples → The test becomes overly sensitive, flagging even minor differences as significant.
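The numbers in the results table can be sanity-checked against the asymptotic critical value of the one-sample test, roughly 1.358 / sqrt(n) at alpha = 0.05 (strictly valid for a fully specified distribution, but good enough to show the trend):

import numpy as np

# Observed D values from the results table above
observed = {100: 0.059, 1_000: 0.034, 10_000: 0.020}
for n, d in observed.items():
    d_crit = 1.358 / np.sqrt(n)   # approximate critical value at alpha = 0.05
    verdict = "reject H0" if d > d_crit else "cannot reject H0"
    print(f"n = {n:>6}: D = {d:.3f}, D_crit = {d_crit:.3f} -> {verdict}")

The same D that sits far below the threshold at n = 100 would be well above it at n = 10,000, which is exactly the sensitivity described above.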
Conclusion
The Kolmogorov-Smirnov test is a powerful statistical tool, but it comes with some limitations. Its main benefits are that it is easy to implement, usable for any continuous distribution, and fairly simple to interpret. On the downside, it is very sensitive to sample size, with low power for small samples and excessive sensitivity for large datasets. Nonetheless, it is a useful tool in the toolbox, and as with all goodness-of-fit analyses it should be complemented with other analyses to get a comprehensive result.