CGM Guide Series – Step 2

CGM Data Sufficiency – How Much Evidence Is Enough?

A device can pass all five framework criteria with a study of ten participants. That is not the same as passing with three hundred. This step sets the minimum threshold for how much data is required before a CGM earns its place in the guide.

Part of: CGM for T1D – The GNL Framework → Step 2 of 3

The loophole

The original DSNFUK CGM Comparison Chart scored devices on five criteria. A 5/5 score meant a device had addressed each of the five risk areas in its accuracy data. That was meaningful progress.

But a loophole emerged. A device could score 5/5 with ten people in the study. Another could score 5/5 with three hundred. Both scored identically, but the generalisability of the evidence – whether it represents how the device performs across a real population – was completely different.

The January 2026 framework update introduced a data sufficiency requirement to close that gap.

The sufficiency threshold

To meet data sufficiency, an accuracy study must satisfy at least one of the following:

Option A – Minimum participant count

At least 50 participants in the accuracy study. This is the clearest route: enough people that the accuracy figures can be considered reasonably generalisable to a broader population.

Option B – High data point density

Fewer than 50 participants, but with a very high number of paired CGM-to-reference data points per participant. Some intensive study designs achieve this. If the total matched pairs are sufficient to produce stable accuracy statistics, a study with 40 participants can meet the threshold.

Both options aim at the same thing: accuracy data that is stable enough to generalise. A small study with sparse data points produces accuracy numbers that can shift significantly with a handful of outliers. A study that meets either option produces figures that are less sensitive to individual variation.
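The two routes can be sketched as a simple check. This is an illustrative sketch, not framework code: the framework does not publish a single numeric matched-pair cutoff for Option B, so the `min_pairs` value below is a hypothetical placeholder.

```python
def meets_data_sufficiency(participants: int, matched_pairs: int,
                           min_participants: int = 50,
                           min_pairs: int = 5000) -> bool:
    """Check the two routes to data sufficiency.

    Option A: at least `min_participants` participants in the
    accuracy study.
    Option B: fewer participants, but enough total paired
    CGM-to-reference data points for stable accuracy statistics.
    The `min_pairs` threshold here is an assumption for
    illustration; the framework states no single number.
    """
    if participants >= min_participants:
        return True  # Option A: participant count alone suffices
    return matched_pairs >= min_pairs  # Option B: data point density


# Example: Dexcom G7's pivotal figures (n=316, 77,774 pairs)
# clear Option A outright.
print(meets_data_sufficiency(316, 77_774))   # True
# A 40-participant intensive design can still pass via Option B
# under the assumed pairs cutoff.
print(meets_data_sufficiency(40, 6_000))     # True
# A small, sparse study passes neither route.
print(meets_data_sufficiency(12, 800))       # False
```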

Why this matters clinically

CGM accuracy claims from small studies are real claims based on real measurements. But a MARD of 9% in a study of twelve people does not carry the same confidence as a MARD of 9% in a study of three hundred.

Small studies are prone to sampling effects: if the participants happened to have unusually stable glucose patterns, the device looks better than it would in a broader population; if their glucose was unusually variable, it looks worse. Larger studies smooth out these effects.
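The sampling effect can be made concrete with a toy simulation. The population figures below (mean per-participant MARD of 9%, spread of 2.5 percentage points) are invented for illustration and are not taken from any real device study; the point is only that the study-level estimate from twelve participants wanders far more than the estimate from three hundred.

```python
import random
import statistics


def mard_estimate_spread(n_participants: int, trials: int = 2000,
                         pop_mean: float = 9.0, pop_sd: float = 2.5,
                         seed: int = 42) -> float:
    """Spread (standard deviation) of study-level MARD estimates
    across many hypothetical studies of the same size.

    Each participant contributes one per-person MARD drawn from an
    assumed population; the study-level MARD is their mean. All
    distribution parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    estimates = [
        statistics.mean(rng.gauss(pop_mean, pop_sd)
                        for _ in range(n_participants))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


# A 12-person study's MARD bounces around far more between
# hypothetical repeats than a 300-person study's does.
spread_small = mard_estimate_spread(12)
spread_large = mard_estimate_spread(300)
print(f"n=12 spread:  {spread_small:.2f} percentage points")
print(f"n=300 spread: {spread_large:.2f} percentage points")
```

The spread shrinks roughly with the square root of the participant count, which is why a 9% MARD from twelve people and a 9% MARD from three hundred are not equivalent evidence.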

For a device used by hundreds of thousands of people to make insulin dosing decisions, the evidence base needs to be big enough that the numbers are meaningful for the population, not just the sample.

Current status of each device

| Device | 5/5 framework | Data sufficient | Key study |
| --- | --- | --- | --- |
| Dexcom G7 | Yes | Yes | Garg et al. (2022) – n=316, 619 sensors, 77,774 matched pairs |
| Abbott FreeStyle Libre 3 | Yes | Yes | Abbott pivotal (2022) – n=72, 6,845+ matched pairs |
| Roche Accu-Chek SmartGuide | Yes | Yes | ATTD 2024 accuracy study – n=48, 3 sensors per participant |
| Medtronic Simplera Sync | Yes | Yes | CIP330 pivotal trial – n=243, ages 2–80 years |
| CareSens Air | Framework met | Pending full publication | 15-day accuracy study (DTT, 2024) – threshold review pending |
| GlucoMen iCan | Pending | Pending | Sinocare pivotal (2024) – additional real-world data needed |

Table reflects published evidence as of April 2026. Updated when new studies are published.

What comes next

Data sufficiency answers whether there is enough evidence. The next step – accuracy – looks at what that evidence actually shows: what MARD means in practice, why the 40/40 agreement rate is the critical number, and why the 1% of readings that falls well outside any accuracy window deserves its own serious attention.