Clinical Evaluation Plans (CEPs) and Clinical Evaluation Reports (CERs) are critical components of medical device regulatory compliance. However, the reality is that meeting requirements isn’t always plain sailing.
Based on our experience with over 250 successful submissions, we’ve identified that overcoming clinical evaluation non-conformities is one of the major challenges manufacturers face during the device approval journey.
What is a Non-Conformity in Clinical Evaluation?
A non-conformity arises when a CEP or CER fails to meet MDR requirements. Non-conformities signal that the evaluation does not yet adequately demonstrate a medical device’s safety and performance or that full alignment with regulatory obligations has not been shown, requiring correction before approval.
The Impact of Non-Conformities on Manufacturers
Non-conformities significantly disrupt the approval process, leading to extended review timelines, increased costs, and mandatory revision cycles. Since notified bodies (NBs) cannot accept a CEP or CER with unresolved non-conformities, addressing them is critical for market access.
A helping hand
Our new video series applies our deep experience to help manufacturers proactively identify and fix these issues for smoother audits and faster certifications. This series breaks down the most common clinical evaluation non-conformities, explains why they keep happening, and most importantly, shows you how to fix them for good. Each episode provides step-by-step guidance to help you avoid these pitfalls in your CER submissions.
Episode 1: “Safety and Performance Objectives Lack Specific and Measurable Acceptance Criteria”
This frequently cited regulatory observation typically reflects not an actual safety or performance deficiency, but rather insufficiently defined evaluation parameters. The episode will address:
- Fundamental concepts of safety and performance objectives
- Deriving safety and performance objectives using weighted values
- Defining “specific” and “measurable” objectives
- Practical approaches to establishing and implementing acceptance criteria
Coming in Episode 2: We’ll tackle another frequent non-conformity, “Appraisal of literature articles has not been conducted appropriately”, showing you how to strengthen your literature review process. Sign up to our newsletter now so you don’t miss out.
Okay, hi everyone, it’s Paul here from Mantra Systems. Welcome to a brand new series that we’re running on how to fix clinical evaluation non-conformities.
The idea behind this series is to equip you with the capability to, first of all, avoid common non-conformities that are seen in medical device clinical evaluations. And secondly, if you have been unfortunate enough to receive one, by the end of the series you should have a powerful strategy for correcting them and moving on to full acceptance of your clinical evaluation.
So, just for this first episode, we’re going to cover a few basics, and it’s worth starting with the question: What is a clinical evaluation non-conformity?
Well, obviously, apart from Class I devices, clinical evaluation plans and clinical evaluation reports need to be submitted for review by a notified body. And a non-conformity — or NC for short — is when conformity with requirements has not been adequately demonstrated. A non-conformity requires correction and then resubmission of the CEP and CER. Resolution — certainly of major non-conformities — is necessary in order for the clinical evaluation to be accepted.
So, it is a really important topic, and non-conformities under MDR are quite common. But the problem with non-conformities is that they can be difficult to interpret. They often manifest as questions from a reviewer, and sometimes it’s not clear exactly what the reviewer means, or how to fully address them.
A failure to resolve non-conformities in full may lead to a further round of review following yet another resubmission, which is just multiplying costs, losing time, and it leads to stress and worry — because it is anxiety-inducing trying to get a clinical evaluation through approval.
Okay, so the aim of the series is to work through common non-conformities one at a time, and to dig deep into what they mean and how to solve or avoid them. The series and the principles within it link over to general principles for optimal conduct of clinical evaluation under the MDR.
So, let’s start with our first example non-conformity. This one is a really common one, and it states:
“Safety and performance objectives do not appear to have specific and measurable acceptance criteria.”
Breaking that down, we need to understand exactly what the question means and how to address it. It’s possible to break it down into the following bullet points:
- What actually are safety and performance objectives?
- What in this context does “specific and measurable” mean?
- What are acceptance criteria?
And importantly, of course, it needs to go beyond just knowing what they are but: How do we derive and analyse them?
So, let’s begin with safety and performance objectives. What are safety and performance objectives?
Well, these ultimately are benchmark values against which the device under evaluation will be compared. They are derived from the state-of-the-art literature review, which is an essential component of clinical evaluation.
Technically speaking, objectives have two components. They consist of a clinical outcome parameter — which is a qualitative concept or type of outcome — and, attached to that, a quantitative value that forms the actual objective.
As per the non-conformity, safety and performance objectives must be specific and measurable. Now, that covers a lot of ground, so let’s take an example and break this down.
An example might be of a completed safety — or in this case a performance — objective:
“An increase in walking distance in meters at six weeks post-procedure of 61.46 m.”
You can see within this the two components. The first — “an increase in walking distance in meters at six weeks” — is not a value. It contains a number, but only because we need to compare like with like; it is not the actual objective.
The clinical outcome parameter is a thing, a concept, something you might measure, something to which you might attach a value. And then the second part of it is the actual performance outcome, which is quantitative.
These two things together constitute a specific and measurable safety or performance objective.
So, how are these derived? Well, they’re derived during the analysis stage of a state-of-the-art literature review. We’re going to go on to a working example in a moment to show exactly how they are derived.
Remember, a clinical outcome parameter is a “thing,” and you decide upon those by looking for outcomes that are seen in a comparable form across multiple sources within the state-of-the-art literature review. If you have four, five, six sources all reporting the same type of outcome, it’s a strong candidate for a clinical outcome parameter.
In the state-of-the-art protocol, there should be a method for determining what constitutes a clinical outcome parameter. That, of course, would feed into the clinical evaluation plan.
Safety and performance objectives then are weighted mean values attached to those parameters. As we’ve seen before, you need both parts.
It’s worth at this point just reflecting on the principles of an effective state-of-the-art literature review. Because there’s no point producing state-of-the-art safety or performance outcomes that have not been derived properly.
We’ll go into this in another video, but a state-of-the-art literature review requires:
- a detailed protocol
- use of a validated method to define research questions and search terms (one such method is PICO)
- a well-documented literature extraction process with recording and justification of all excluded sources
- a structured appraisal — which is where we get very specific about the production of safety and performance objectives, and
- the analysis — which is where the actual objectives are produced.
But we cover that in a lot more detail in a separate video.
So, let’s look at a working example of how safety and performance objectives are derived.
Here is fictional data. Let’s say we’ve done a state-of-the-art literature review, and sticking with the same example, we found four publications that all reported a mean increase in walking distance at six weeks, and they all did it in a comparable way. Here are the mean values from each of those sources.
How do we produce a weighted mean?
You can see in the table that the way we’re going to weight it is by a representation of the quality of each of these studies, to ensure that greater weighting or prominence is given to results from higher quality studies. In order to do that, we need to produce an appraisal score.
Appraisal scores — there are lots of different ways to do this. On the next slide we’ve got, again, a simplified fictional example of how to calculate an appraisal score. You might look at different aspects of the study: you might consider study type, sample size, the use of statistical tests and whether they were appropriate, and the length of follow-up. You’d probably in reality consider other factors as well, but for the purpose of this, this will suffice.
You’ve perhaps seen this — usually these manifest as an alphanumeric code, and again, the meaning behind this code should be expressed in the literature search protocol, in the appraisal plan section. So this will all be mapped out.
Ultimately, this enables the calculation of a numerical appraisal score for each paper, with a higher number representing a higher quality study. Those values can then be entered back into the original table, with all the other values staying the same.
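As a rough sketch, this kind of additive appraisal scoring could look like the following. The categories and point weightings here are invented purely for illustration — a real scheme must be defined and justified in the appraisal plan section of the literature search protocol:

```python
# Illustrative appraisal scoring scheme. The study-type categories and point
# values below are invented for this sketch; a real scheme must be set out
# in the literature search protocol's appraisal plan.
STUDY_TYPE_POINTS = {"RCT": 8, "cohort": 5, "case series": 2}

def appraisal_score(study_type: str, sample_size: int,
                    stats_appropriate: bool, followup_weeks: int) -> int:
    """Return a numerical quality score; higher means higher quality."""
    score = STUDY_TYPE_POINTS.get(study_type, 1)          # study design
    score += 4 if sample_size >= 100 else 2 if sample_size >= 30 else 1
    score += 3 if stats_appropriate else 0                # statistical tests
    score += 5 if followup_weeks >= 26 else 3 if followup_weeks >= 6 else 1
    return score

print(appraisal_score("RCT", 150, True, 26))  # 20
```

In practice you would consider further factors (blinding, risk of bias, relevance to the device), but the principle is the same: each appraised aspect contributes to a single numerical score per paper.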
Now we have an appraisal score by which we can weight the results from each study. That’s done fairly simply just by multiplying the actual result by the appraisal score. We do that for all of them to produce a weighted value for each study.
But we’re not quite done there, because we need to calculate a weighted mean. That’s done by taking the sum of all the weighted values and dividing that by the sum of the appraisal scores.
What we’re doing there is making the sum of the appraisal scores the denominator — the scores are the weighting factor applied across all studies. This means we achieve our objective of pulling the weighted mean closer to the values of the highest quality studies.
In this case, that means: 3872 / 63, which gives a weighted mean of 61.46.
It’s always worth sanity checking these, but if you look at the plain mean values from each study and the appraisal scores, you’d expect it to land around the 60 mark — and that’s where it falls: 61.46.
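The whole calculation can be sketched in a few lines of Python. The per-study means below are illustrative (the slide data isn’t reproduced in the transcript); they were chosen so that the totals match the figures quoted above, 3872 and 63:

```python
def weighted_mean(values, weights):
    # Weighted mean: sum of (value * weight) divided by the sum of weights,
    # so results from higher-quality studies pull the mean toward them.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Illustrative per-study values chosen to reproduce the quoted totals.
means  = [65.8, 62.0, 58.0, 57.5]   # mean walking-distance gain (m) per study
scores = [20, 17, 14, 12]           # appraisal scores (higher = better quality)

print(round(weighted_mean(means, scores), 2))  # 61.46  (i.e. 3872 / 63)
```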
So, that’s a simplified example of how to calculate a weighted mean for the quantitative component of a safety and performance objective.
Don’t forget, the non-conformity required that we produce safety and performance objectives that are specific and measurable.
Let’s just reflect on what that means:
- Specific means unambiguous, clear, and would be consistently interpreted as meaning the same thing.
- Measurable means it contains a value — a quantitative value — against which another value from another device can be measured.
So we have both of those here. It’s clear that this objective is both specific and measurable, and that’s why we included the reference to six weeks: to ensure we’re comparing like with like.
That leaves us to consider the final aspect of the non-conformity, which was reference to acceptance criteria.
An acceptance criterion basically defines when the safety or performance of the device under evaluation is acceptable in comparison to the state of the art.
What we’re doing here is setting out a means for comparing the device-under-evaluation outcomes with the state-of-the-art objectives for specified clinical outcome parameters. By now, all of these terms should have a meaning attached to them.
So, the clinical evaluation plan needs to contain an analysis plan for how this comparison will be done. Suffice to say that the outcomes for the device under evaluation are calculated by weighted means using a very similar method that we used for the state-of-the-art objectives.
And acceptance — this is the key part — acceptance can be defined as showing (statistically) that the outcomes for the device under evaluation are non-inferior to objectives derived from the SOA. We don’t need to show superiority — we’re just showing non-inferiority. That’s a key distinction.
Let’s take a final example, then, working this through.
We can see:
- the increase in walking distance at six weeks, with its objective of 61.46 m,
- and in this example there were some other clinical outcome parameters as well:
- an improvement in pain (VAS, a pain score),
- and range of motion at six months.
We have values attached to all of those. Then we also have weighted mean values for the device under evaluation.
Remember, the job here is to determine whether the device under evaluation is non-inferior to SOA objectives.
For the top one, it’s very easy, because a plain number comparison shows that the device under evaluation did better than the SOA objective. We don’t need to do any fancy tests — it’s obviously non-inferior. That’s a straightforward conclusion.
But with the others, on a plain number comparison, the device under evaluation actually looks like it’s done less well than the state-of-the-art objectives. The important thing to understand here is whether that represents true inferiority, or whether these values are statistically non-different.
For that reason, conducting a statistical test is relevant. Often a t-test is an appropriate test because it’s a comparison of means, and it generates a p-value denoting significance or non-significance of the difference between these values.
Let’s say we conducted a t-test in this case, and these were the p-values that were derived. Generally, significance requires a p-value of less than 0.05. That’s not what we’re seeing here, so these differences were not statistically significant — and the device outcomes can therefore be considered non-inferior.
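As a minimal sketch of that comparison, the t statistic for two samples of outcomes can be computed directly; in practice you would typically use a statistics package such as scipy.stats.ttest_ind, which also returns the p-value. The sample data here are hypothetical:

```python
import math

def welch_t(sample_a, sample_b):
    # Welch's two-sample t statistic (does not assume equal variances).
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)

# Hypothetical VAS pain-improvement samples: device under evaluation vs SOA.
device = [2.8, 3.1, 2.9, 3.3, 3.0]
soa    = [3.0, 3.4, 3.1, 3.2, 3.3]

t = welch_t(device, soa)
print(round(t, 3))
# The t statistic and degrees of freedom are then converted to a p-value
# (e.g. via scipy.stats); p >= 0.05 means no significant difference,
# supporting a conclusion of non-inferiority under the acceptance criteria.
```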
According to our acceptance criteria, the device under evaluation has been shown to have appropriate performance in reference to state-of-the-art.
So let’s go back to the original non-conformity:
“Safety and performance objectives do not appear to have specific and measurable acceptance criteria.”
We covered:
- what safety and performance objectives are,
- how to derive them using weighted values following a state-of-the-art literature review,
- what “specific and measurable” means and how the outlined process generates specific and measurable objectives,
- and what acceptance criteria are and how to apply them — including through use of statistical testing.
If you need any further support in relation to clinical evaluation or working through non-conformities, Mantra Systems are clinical evaluation specialists. You’re free to contact our team at any time.
If you’ve just got some general questions, that’s absolutely fine too — we’re more than happy to speak at any time. So feel free to reach out to us if you need any additional support.
That concludes the first episode of the Clinical Evaluation Non-Conformity series. I’d like to thank you very much, and if you have any questions or comments, please let me know below the video.
Thank you very much.