15 - 04 - 2022

The statistics under the popmon hood

In a previous article we explained what model drift, concept drift and data drift are. We then put this into practice in a notebook that shows how popmon can help you make sure your productionized models keep working as expected. Now it is time to complete the information you need to get started with monitoring model drift using popmon. If you haven’t read the previous articles, we strongly recommend reading them first. In this short article we briefly explain the most important things happening under the hood in popmon.

Reference data 

Popmon, short for population shift monitoring, is a Python package developed by ING, who have also written about it themselves. The package helps you track the stability of your model features and predictions over time, and uses traffic lights to alert you when something seems off. In popmon you can choose which reference period (the parameter references) your tracking is based on. The options are:

  1.     ref: the reference data, a fixed dataset defined by the user, 
  1.     roll: a rolling window (last x time slots), 
  1.     prev1: the preceding time slot, 
  1.     expanding: all preceding time slots. 

Using the chosen reference data, popmon checks whether the data in each new time slot comes from the same distribution as the reference data. If a deviation is found, this leads to either a yellow or a red traffic light, depending on the severity of the deviation.
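
To make the difference between the time-based options concrete, here is a minimal sketch in plain Python. It is not popmon's own code: the function `reference_slots` and its `window` argument are hypothetical names used only for illustration. It returns which earlier time slots a given slot would be compared with. The `ref` option is left out because its reference is a fixed dataset that stays the same for every time slot.

```python
def reference_slots(kind, t, window=3):
    """Indices of the earlier time slots that slot `t` is compared with.

    Illustrative only: `kind` mirrors the option names listed above and
    `window` is a hypothetical name for the size of the rolling window.
    """
    if kind == "prev1":        # only the preceding time slot
        start = t - 1
    elif kind == "roll":       # the last `window` time slots
        start = t - window
    elif kind == "expanding":  # every preceding time slot
        start = 0
    else:
        raise ValueError(f"unknown reference kind: {kind!r}")
    # Clip at 0 so the first slots simply get a shorter (or empty) reference.
    return list(range(max(start, 0), t))


# Reference slots for time slot 5 (slots are numbered from 0).
prev1_ref = reference_slots("prev1", 5)
roll_ref = reference_slots("roll", 5, window=3)
expanding_ref = reference_slots("expanding", 5)
```

Note that the very first time slot has no preceding slots, so in this sketch its reference is empty for all three options.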

Comparisons and deviations 

You are probably wondering how these deviations are established. This is done using two types of metrics:

  1. Profiles 
  1. Comparisons (statistical tests against the reference)

Below is a short description of each metric, to give you a sense of the calculations happening in the background when you use popmon.

Profiles 

  1. count: Number of entries (non-NaN and NaN), 
  1. distinct: Number of distinct entries, 
  1. filled: Number of non-missing entries (non-NaN), 
  1. nan: Number of missing entries (NaN), 
  1. overflow: Number of values larger than the maximum bin-edge of the histogram, 
  1. underflow: Number of values smaller than the minimum bin-edge of the histogram, 
  1. min: Minimum value, 
  1. max: Maximum value, 
  1. mean: Mean value, 
  1. most_probable_value: Most probable value, 
  1. std: Standard deviation, 
  1. phik: phi-k correlation between the two variables of the histogram, 
  1. phik_pvalue: p-value of the contingency test of the 2d histogram, 
  1. phik_zscore: Z-score of the contingency test of the 2d histogram. 
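
As a rough illustration of what most of these profile metrics mean, the sketch below computes them for the values of a single feature in a single time slot, using only the Python standard library. This is not how popmon implements them (popmon works on histograms), and the function name `profile`, the choice to ignore NaN when counting distinct values, and the use of the population standard deviation are all assumptions made for this example. The phi-k metrics are left out because they need a 2d histogram.

```python
import math
from collections import Counter
from statistics import fmean, pstdev


def profile(values, bin_edges):
    """Profile metrics for one feature in one time slot (illustrative)."""
    # Assumes at least one non-NaN value is present.
    filled = [v for v in values if not math.isnan(v)]
    lowest_edge, highest_edge = bin_edges[0], bin_edges[-1]
    return {
        "count": len(values),              # NaN and non-NaN entries
        "filled": len(filled),             # non-NaN entries
        "nan": len(values) - len(filled),  # missing entries
        "distinct": len(set(filled)),      # assumption: NaN is not counted
        "underflow": sum(v < lowest_edge for v in filled),
        "overflow": sum(v > highest_edge for v in filled),
        "min": min(filled),
        "max": max(filled),
        "mean": fmean(filled),
        "std": pstdev(filled),             # assumption: population std
        "most_probable_value": Counter(filled).most_common(1)[0][0],
    }


slot_values = [1.0, 2.0, 2.0, float("nan"), 5.0]
stats = profile(slot_values, bin_edges=[0.0, 1.0, 2.0, 3.0, 4.0])
```

In this example the value 5.0 lies above the last bin edge of 4.0, so it is counted as overflow.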

Comparisons 

  1. ks: Kolmogorov-Smirnov test statistic comparing each time slot to {ref}, 
  1. ks_zscore: Z-score of the Kolmogorov-Smirnov test, comparing each time slot with {ref}, 
  1. ks_pvalue: p-value of the Kolmogorov-Smirnov test, comparing each time slot with {ref}, 
  1. pearson: Pearson correlation between each time slot and {ref}, 
  1. chi2: Chi-squared test statistic, comparing each time slot with {ref}, 
  1. chi2_norm: Normalized chi-squared statistic, comparing each time slot with {ref}, 
  1. chi2_pvalue: p-value of the chi-squared statistic, comparing each time slot with {ref}, 
  1. chi2_zscore: Z-score of the chi-squared statistic, comparing each time slot with {ref}, 
  1. chi2_max_residual: The largest absolute normalized residual (|chi|) observed in all bin pairs (one histogram in a time slot and one in {ref}), 
  1. chi2_spike_count: The number of normalized residuals of all bin pairs (one histogram in a time slot and one in {ref}) with absolute value bigger than a given threshold (default: 7), 
  1. max_prob_diff: The largest absolute difference between all bin pairs of two normalized histograms (one histogram in a time slot and one in {ref}), 
  1. unknown_labels: Are categories observed in a given time slot that are not present in {ref}? 
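
A few of these comparisons are simple enough to sketch in plain Python. The example below compares the histogram of one time slot with a reference histogram that uses the same bins. It is a simplified illustration and not popmon's implementation: the function names are chosen to match the metric names above, the Kolmogorov-Smirnov statistic is approximated on binned data as the largest gap between the two cumulative distributions, and the bin counts are made up.

```python
from itertools import accumulate


def normalize(counts):
    """Turn bin counts into fractions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def max_prob_diff(slot_counts, ref_counts):
    """Largest absolute difference between matching normalized bins."""
    slot, ref = normalize(slot_counts), normalize(ref_counts)
    return max(abs(s - r) for s, r in zip(slot, ref))


def ks_statistic(slot_counts, ref_counts):
    """Largest gap between the two cumulative (binned) distributions."""
    slot_cdf = accumulate(normalize(slot_counts))
    ref_cdf = accumulate(normalize(ref_counts))
    return max(abs(s - r) for s, r in zip(slot_cdf, ref_cdf))


def unknown_labels(slot_labels, ref_labels):
    """True if the time slot contains categories absent from the reference."""
    return bool(set(slot_labels) - set(ref_labels))


ref_hist = [10, 20, 30, 40]   # made-up reference bin counts
slot_hist = [40, 30, 20, 10]  # made-up bin counts for one time slot

prob_diff = max_prob_diff(slot_hist, ref_hist)
ks = ks_statistic(slot_hist, ref_hist)
has_new_labels = unknown_labels({"a", "c"}, {"a", "b"})
```

Both histogram statistics are 0 when the two normalized histograms are identical and grow as the distributions move apart.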

Traffic lights and alerts 

The traffic lights are calculated from the pull of each of the metrics mentioned above. The pull is the normalized residual: how far a value lies from its reference, expressed in standard deviations. The traffic lights are then translated into alerts, which can be:

  • green: there is no reason to think anything is deviating 
  • yellow: there seem to be small deviations, so it might be good to check the columns involved 
  • red: there seem to be large deviations, so it is worth checking the columns involved 
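
The idea of a pull and its traffic light can be sketched as follows. This is a minimal illustration, not popmon's implementation: the helper names `pull` and `traffic_light` are hypothetical, the thresholds of 4 and 7 standard deviations are assumptions for this example, and the reference values are made up.

```python
from statistics import fmean, pstdev


def pull(value, ref_values):
    """Normalized residual: distance from the reference mean in std units."""
    return (value - fmean(ref_values)) / pstdev(ref_values)


def traffic_light(pull_value, yellow=4.0, red=7.0):
    """Map a pull to a colour; the thresholds are assumed, not prescribed."""
    size = abs(pull_value)
    if size >= red:
        return "red"
    if size >= yellow:
        return "yellow"
    return "green"


# Made-up values of one metric (for example the mean) in the reference slots.
reference_means = [10.0, 12.0, 11.0, 9.0, 13.0]

light_normal = traffic_light(pull(12.0, reference_means))
light_shifted = traffic_light(pull(18.0, reference_means))
light_far_off = traffic_light(pull(25.0, reference_means))
```

Because the pull is symmetric, a value far below the reference triggers the same colour as a value equally far above it.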

The final report 

If you want, popmon can create a profiling report, or you can retrieve all the calculations and outcomes in a dictionary. Both outputs have the same sections, which match what we covered in this article:

  1. Profiles 
  1. Comparisons 
  1. Traffic Lights 
  1. Alerts 

Final remarks 

We hope that this short but sweet overview gives you a bit more understanding of what is going on under the hood of popmon. Without reading our other two articles and playing around with popmon a little yourself, this article probably won’t help you much. So, do you feel lost after reading this? Don’t stay in the dark, but step into the popmon light and have some fun monitoring your productionized models!

This article is written by:
Jeanine Schoonemann
jeanine.schoonemann@cmotions.com
Jurriaan Nagelkerke
jurriaan.nagelkerke@cmotions.com