
Thursday, March 30, 2017

Retention metrics, simplified

[Originally posted on First Patient In]

In my experience, most clinical trials do not suffer from significant retention issues. This is a testament to the collaborative good will of most patients who consent to participate, and to the patient-first attitude of most research coordinators.

However, in many trials – especially those that last more than a year – the question of whether there is a retention issue will come up at some point while the trial’s still going. This is often associated with a jump in early terminations, which can occur as the first cohort of enrollees has been in the trial for a while.

It’s a good question to ask midstream: are we on course to have as many patients fully complete the trial as we’d originally anticipated?

However, the way we go about answering the question is often flawed and confusing. Here’s an example: a sponsor came to us with what they thought was a higher rate of early terminations than expected. The main problem? They weren't actually sure.

Here’s their data. Can you tell?

Original retention graph.
If you can, please let me know how! While this chart is remarkably ... full of numbers, it provides no actual insight into when patients are dropping out, and no way that I can tell to project eventual total retention.

In addition, measuring the “retention rate” as a simple ratio of active to terminated patients will not provide an accurate benchmark until the trial is almost over. Here's why: patients tend to drop out later in a trial, so as long as you’re enrolling new patients, your retention rate will be artificially high. When enrollment ends, your retention rate will appear to drop rapidly – but this is only because of the artificial lift you had earlier.
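To see why the naive ratio misleads, consider a toy model (all numbers invented): one patient enrolls per month, and 30% of patients drop out at their 12-month study visit. A short Python sketch:

```python
# Illustrative sketch of why the naive "active / enrolled" retention
# ratio looks artificially high while enrollment is still ongoing.
# All numbers are hypothetical.

def naive_retention(enroll_months, obs_month, dropout_month=12, dropout_rate=0.30):
    """Fraction of enrolled patients still active at obs_month, assuming
    one patient enrolls per month and 30% of patients drop out once they
    reach their 12-month visit."""
    enrolled = active = 0.0
    for start in range(min(enroll_months, obs_month)):
        time_on_study = obs_month - start
        enrolled += 1
        # Only patients already past the dropout visit have had a chance to drop
        active += (1 - dropout_rate) if time_on_study >= dropout_month else 1.0
    return active / enrolled

# Mid-enrollment: most patients haven't reached month 12 yet
print(round(naive_retention(24, obs_month=15), 2))  # → 0.92
# Long after enrollment ends: every patient has passed month 12
print(round(naive_retention(24, obs_month=48), 2))  # → 0.7
```

Mid-enrollment, the ratio suggests 92% retention; once every patient has passed the 12-month visit, it settles at the true 70%. Nothing about the patients changed – only the mix of how long they had been on study.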

In fact, that was exactly the problem the sponsor had: when enrollment ended, the retention rate started dropping. It’s good to be concerned, but it’s also important to know how to answer the question.

Fortunately, there is a very simple way to get a clear answer in most cases – one that’s probably already in use by your biostats team around the corner: the Kaplan-Meier “survival” curve.

Here is the same study data, but patient retention is simply depicted as a K-M graph. The key difference is that instead of calendar dates, we used the relative measure of time in the trial for each patient. That way we can easily spot where the trends are.


In this case, we were able to establish quickly that patient drop-outs were increasing at a relatively small constant rate, with a higher percentage of drops coinciding with the one-year study visit. Most importantly, we were able to very accurately predict the eventual number of patients who would complete the trial. And it only took one graph!
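For readers who want to try this themselves, the core Kaplan-Meier calculation fits in a few lines of plain Python (the patient data below are invented; in practice your biostats team would use a standard survival-analysis package):

```python
# Minimal Kaplan-Meier sketch. Each patient contributes a time on study
# (in months) and a flag: True = dropped out (event), False = still
# active or completed (censored). Data are hypothetical.

def kaplan_meier(times, events):
    """Return (time, survival probability) pairs at each dropout time."""
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        # d: dropouts at time t; n: patients still at risk at time t
        d = sum(1 for ti, ev in zip(times, events) if ti == t and ev)
        n = sum(1 for ti in times if ti >= t)
        if d:
            surv *= 1 - d / n
            curve.append((t, surv))
    return curve

# Hypothetical patients: dropouts cluster around the 12-month visit
times  = [3, 6, 12, 12, 12, 18, 24, 24, 24, 24]
events = [True, False, True, True, False, False, False, False, False, False]
print(kaplan_meier(times, events))
```

The resulting step curve is exactly what you'd plot: the estimated probability that a patient is still retained after a given amount of time on study, regardless of when they enrolled.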




Saturday, March 18, 2017

The Streetlight Effect and 505(b)(2) approvals

It is a surprisingly common peril among analysts: we don’t have the data to answer the question we’re interested in, so we answer a related question where we do have data. Unfortunately, the new answer turns out to shed no light on the original interesting question.

This is sometimes referred to as the Streetlight Effect – a phenomenon aptly illustrated by Mutt and Jeff over half a century ago:


This is the situation that the Tufts Center for the Study of Drug Development seems to have gotten itself into in its latest "Impact Report".  It’s worth walking through the process of how an interesting question ends up in an uninteresting answer.

So, here’s an interesting question:
My company owns a drug that may be approvable through FDA’s 505(b)(2) pathway. What is the estimated time and cost difference between pursuing 505(b)(2) approval and conventional approval?
That’s "interesting", I suppose I should add, for a certain subset of folks working in drug development and commercialization. It’s only interesting to that peculiar niche, but for those people I suspect it’s extremely interesting – because it is a real situation that a drug company may find itself in, and there are concrete consequences to the decision.

Unfortunately, this is also a really difficult question to answer. As phrased, you'd almost need a randomized trial to answer it. Let’s create a version which is less interesting but easier to answer:
What are the overall development time and cost differences between drugs seeking approval via 505(b)(2) and conventional pathways?
This is much easier to answer, as pharmaceutical companies could look back on development times and costs of all their compounds and directly compare the different types. It is, however, a much less useful question. Many new drugs are simply not eligible for 505(b)(2) approval. If those drugs are substantially different in any way (riskier, more novel, etc.), then they will change the comparison in highly non-useful ways. In fact, in 2014, only 1 drug classified as a New Molecular Entity (NME) went through 505(b)(2) approval, versus 32 that went through conventional approval. And indeed, there are many qualities that set 505(b)(2) drugs apart.

Extreme qualitative differences of 505(b)(2) drugs.
Source: Thomson Reuters analysis via RAPS

So we’re likely to get a lot of confounding factors in our comparison, and it’s unclear how the answer would (or should) guide us if we were truly trying to decide which route to take for a particular new drug. It might help us if we were trying to evaluate a large-scale shift to prioritizing 505(b)(2) eligible drugs, however.

Unfortunately, even this question is apparently too difficult to answer. Instead, the Tufts CSDD chose to ask and answer yet another variant:
What is the difference in time that it takes the FDA for its internal review process between 505(b)(2) and conventionally-approved drugs?
This question has the supreme virtue of being answerable. In fact, I believe that all of the data you’d need is contained within the approval letter that the FDA publishes for each new approved drug.

But at the same time, it isn’t a particularly interesting question anymore. The promise of the 505(b)(2) pathway is that it should reduce total development time and cost, but on both those dimensions, the report appears to fall flat.
  • Cost: This analysis says nothing about reduced costs – those savings would mostly come in the form of fewer clinical trials, and this focuses entirely on the FDA review process.
  • Time: FDA review and approval is only a fraction of a drug’s journey from patent to market. In fact, it often takes up less than 10% of the time from initial IND to approval. So any differences in approval times will likely be overshadowed by differences in time spent in development.
But even more fundamentally, the problem here is that this study gives the appearance of providing an answer to our original question, but in fact is entirely uninformative in this regard. The accompanying press release states:
The 505(b)(2) approval pathway for new drug applications in the United States, aimed at avoiding unnecessary duplication of studies performed on a previously approved drug, has not led to shorter approval times.
This is more than a bit misleading. The 505(b)(2) statute does not in any way address approval timelines – that is not its intent. So showing that it hasn’t led to shorter approval times is less of an insight than it is a natural consequence of the law as written.

Most importantly, showing that 505(b)(2) drugs had a longer average approval time than conventionally-approved drugs in no way should be interpreted as adding any evidence to the idea that those drugs were slowed down by the 505(b)(2) process itself. Because 505(b)(2) drugs are qualitatively different from other new molecules, this study can’t claim that they would have been developed faster had their owners initially chosen to go the route of conventional approval. In fact, such a decision might have resulted in both increased time in trials and increased approval time.

This study simply is not designed to provide an answer to the truly interesting underlying question.

[Disclosure: the above review is based entirely on a CSDD press release and summary page. The actual report costs $125, which is well in excess of this blog’s expense limit. It is entirely possible that the report itself contains more-informative insights, and I’ll happily update this post if that should come to my attention.]

Tuesday, March 18, 2014

These Words Have (Temporarily) Relocated

Near the end of last year, I had the bright idea of starting a second blog, Placebo Lead-In, to capture a lot of smaller items that I found interesting but wasn't going to work up into a full-blown, 1000 word post.

According to Murphy’s Law, or the Law of Unintended Consequences, or the Law of Biting Off More Than You Can Chew, or some such similar iron rule of the universe, what happened next should have been predictable.

First, my team at CAHG Trials launched a new blog, First Patient In. FPI is dedicated to an open discussion of patient recruitment ideas, and I’m extremely proud of what we've published so far.

Next, I was invited to be a guest blogger for the upcoming Partnerships in Clinical Trials Conference.

Suddenly, I've gone from 1 blog to 4. And while my writing output appears to have increased, it definitely hasn't quadrupled. So this blog has been quiet for a bit too long as a result.

The good news is that the situation is temporary - Partnerships will actually happen at the end of this month. (If you’re going: drop me a line and let’s meet. If you’re not: you really should come and join us!) My contributions to FPI will settle into a monthly post, as I have a fascinating and clever team to handle most of the content.

In case you've missed it, then, here is a brief summary of my posts elsewhere over the past 2 months.

First Patient In


Partnerships in Clinical Trials



Please take a look, and I will see you back here soon.

[Photo credit: detour sign via Flickr user crossley]

Friday, June 21, 2013

Preview of Enrollment Analytics: Moving Beyond the Funnel (Shameless DIA Self-Promotion, Part 2)


Are we looking at our enrollment data in the right way?


I will be chairing a session on Tuesday on this topic, joined by a couple of great presenters (Diana Chung from Gilead and Gretchen Goller from PRA).

Here's a short preview of the session:



Hope to see you there. It should be a great discussion.

Session Details:

June 25, 1:45PM - 3:15PM

  • Session Number: 241
  • Room Number: 205B


1. Enrollment Analytics: Moving Beyond the Funnel
Paul Ivsin
VP, Consulting Director
CAHG Clinical Trials

2. Use of Analytics for Operational Planning
Diana Chung, MSc
Associate Director, Clinical Operations
Gilead

3. Using Enrollment Data to Communicate Effectively with Sites
Gretchen Goller, MA
Senior Director, Patient Access and Retention Services
PRA


Wednesday, February 27, 2013

It's Not Them, It's You

Are competing trials slowing yours down? Probably not.

If they don't like your trial, EVERYTHING ELSE IN THE WORLD is competition for their attention.

Rahlyn Gossen has a provocative new blog post up on her website entitled "The Patient Recruitment Secret". In it, she makes a strong case for considering site commitment to a trial – in the form of their investment of time, effort, and interest – to be the single largest driver of patient enrollment.

The reasoning behind this idea is clear and quite persuasive:
Every clinical trial that is not yours is a competing clinical trial. 
Clinical research sites have finite resources. And with research sites being asked to take on more and more duties, those resources are only getting more strained. Here’s what this reality means for patient enrollment. 
If research site staff are working on other clinical trials, they are not working on your clinical trial. Nor are they working on patient recruitment for your clinical trial. To excel at patient enrollment, you need to maximize the time and energy that sites spend recruiting patients for your clinical trial.
Much of this fits together very nicely with a point I raised in a post a few months ago, showing that improvements in site enrollment performance may often be made at the expense of other trials.

However, I would add a qualifier to these discussions: the number of active "competing" trials at a site is not a reliable predictor of enrollment performance. In other words, selecting sites who are not working on a lot of other trials will in no way improve enrollment in your trial.

This is an important point because, as Gossen points out, asking the number of other studies is a standard habit of sponsors and CROs on site feasibility questionnaires. In fact, many sponsors can get very hung up on competing trials – to the point of excluding potentially good sites that they feel are working on too many other things.

This came to a head recently when we were brought in to consult on a study experiencing significant enrollment difficulty. The sponsor was very concerned about competing trials at the sites – there was a belief that such competition was a big contributor to sluggish enrollment.

As part of our analysis, we collected updated information on competitive trials. Given the staggered nature of the trial's startup, we then calculated time-adjusted Net Patient Contributions for each site (for more information on that, see my write-up here).

We then cross-referenced competing trials to enrollment performance. The results were very surprising: the quantity of other trials had no effect on how the sites were doing.  Here's the data:

Each site's enrollment performance as it relates to the number of other trials it's running. Each site is a point: good enrollers (higher up) and poor enrollers (lower down) are virtually identical in terms of how many concurrent trials they were running. Competing trials do not appear to substantially impact rates of enrollment.

Since running into this result, I've looked at the relationship between the number of competing trials in CRO feasibility questionnaires and final site enrollment for many of the trials we've worked on. In each case, the "competing" trials did not serve as even a weak predictor of eventual site performance.
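That kind of check is straightforward to run yourself. Here's a sketch (all site numbers invented for illustration) correlating each site's count of concurrent "competing" trials with its enrollment rate:

```python
# Hypothetical check: does the number of concurrent trials at a site
# predict its enrollment rate? Data below are invented; a real analysis
# would pull these from feasibility questionnaires and enrollment logs.

from statistics import mean, pstdev

def pearson(xs, ys):
    """Population Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

competing_trials   = [2, 9, 4, 11, 6, 3, 8, 12, 5, 7]      # per site
patients_per_month = [1.1, 1.3, 0.4, 1.2, 0.5, 1.4, 0.6, 1.0, 1.2, 0.5]

r = pearson(competing_trials, patients_per_month)
print(round(r, 2))  # near zero: competing-trial count doesn't predict enrollment
```

With numbers like these – busy sites enrolling well and quiet sites enrolling poorly, and vice versa – the correlation is indistinguishable from zero, which matches what we saw in our own site data.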

I agree with Gossen's fundamental point that a site's interest and enthusiasm for your trial will help increase enrollment at that site. However, we need to do a better job of thinking about the best ways of measuring that interest to understand the magnitude of the effect that it truly has. And, even more importantly, we have to avoid reliance on substandard proxy measurements such as "number of competing trials", because those will steer us wrong in site selection. In fact, almost everything we tend to collect on feasibility questionnaires appears to be non-predictive and potentially misleading; but that's a post for another day.

[Image credit: research distractions courtesy of Flickr user ronocdh.]

Friday, February 8, 2013

The FDA’s Magic Meeting


Can you shed three years of pipeline flab with this one simple trick?

"There’s no trick to it ... it’s just a simple trick!" -Brad Goodman

Getting a drug to market is hard. It is hard in every way a thing can be hard: it takes a long time, it's expensive, it involves a process that is opaque and frustrating, and failure is a much more likely outcome than success. Boston pioneers pointing their wagons west in 1820 had far better prospects for seeing the Pacific Ocean than a new drug, freshly launched into human trials, will ever have for earning a single dollar in sales.

Exact numbers are hard to come by, but the semi-official industry estimates are: about 6-8 years, a couple billion dollars, and more than 80% chance of ultimate failure.

Is there a secret handshake? Should we bring doughnuts?
(We should probably bring doughnuts.)
Finding ways to reduce any of those numbers is one of the premier obsessions of the pharma R&D world. We explore new technologies and standards, consider moving our trials to sites in other countries, consider skipping the sites altogether and going straight to the patient, and hire patient recruitment firms* to speed up trial enrollment. We even invent words to describe our latest and awesomest attempts at making development faster, better, and cheaper.

But perhaps all we needed was another meeting.

A recent blog post from Anne Pariser, an Associate Director at FDA's Center for Drug Evaluation and Research, suggests that attending a pre-IND meeting can shave a whopping 3 years off your clinical development timeline:
For instance, for all new drugs approved between 2010 and 2012, the average clinical development time was more than 3 years faster when a pre-IND meeting was held than it was for drugs approved without a pre-IND meeting. 
For orphan drugs used to treat rare diseases, the development time for products with a pre-IND meeting was 6 years shorter on average or about half of what it was for those orphan drugs that did not have such a meeting.
That's it? A meeting? Cancel the massive CTMS integration – all we need are a couple tickets to DC?

Pariser's post appears to be an extension of an FDA presentation made at a joint NORD/DIA meeting last October. As far as I can tell, that presentation's not public, but it was covered by the Pink Sheet's Derrick Gingery on November 1.  That presentation covered just 2010 and 2011, and actually showed a 5 year benefit for drugs with pre-IND meetings (Pariser references 2010-2012).

Consider the fact that one VC-funded vendor** was recently spotted aggressively hyping the fact that its software reduced one trial’s timeline by 6 weeks. And here the FDA is telling us that a single sit-down saves an additional 150 weeks.

In addition, a second meeting – the End of Phase II meeting – saves another year, according to the NORD presentation.  Pariser does not include EOP2 data in her blog post.

So, time to charter a bus, load up the clinical and regulatory teams, and hit the road to Silver Spring?

Well, maybe. It probably couldn't hurt, and I'm sure it would be a great bonding experience, but there are some reasons to not take the numbers at face value.
  • We’re dealing with really small numbers here. The NORD presentation covers 54 drugs, and Pariser's appears to add 39 to that total. The fact that the time-savings data shifted so dramatically – from 5 years to 3 – tips us off to the fact that we probably have a lot of variance in the data. We also have no idea how many pre-IND meetings there were, so we don't know the relative sizes of the comparison groups.
  • It's a survivor-only data set. It doesn't include drugs that were terminated or rejected. FDA would never approve a clinical trial that only looked at patients who responded, then retroactively determined differences between them.  That approach is clearly susceptible to survivorship bias.
  • It reports means. This is especially a problem given the small numbers being studied. It's entirely plausible that just one or two drugs that took a really long time are badly skewing the results. Medians with quartile ranges would have been a lot more enlightening here.
All of the above make me question how big an impact this one meeting can really have. I'm sure it's a good thing, but it can't be quite this amazing, can it?

However, it would be great to see more of these metrics, produced in more detail, by the FDA. The agency does a pretty good job of reporting on its own performance – the PDUFA performance reports are a worthwhile read – but it doesn't publish much in the way of sponsor metrics. Given the constant clamor for new pathways and concessions from the FDA, it would be truly enlightening to see how well the industry is actually taking advantage of the tools it currently has.

As Gingery wrote in his article, "Data showing that the existing FDA processes, if used, can reduce development time is interesting given the strong effort by industry to create new methods to streamline the approval process." Gingery also notes that two new official sponsor-FDA meeting points have been added in the recently-passed FDASIA, so it would seem extremely worthwhile to have some ongoing, rigorous measurement of the usage of, and benefit from, these meetings.

Of course, even if these meetings are strongly associated with faster pipeline times, don’t be so sure that simply adding the meeting will cut your development so dramatically. Goodhart's Law tells us that performance metrics, when turned into targets, have a tendency to fail: in this case, whatever it was about the drug, or the drug company leadership, that prevented the meeting from happening in the first place may still prove to be the real factor in the delay.

I suppose the ultimate lesson here might be: If your drug doesn't have a pre-IND meeting because your executive management has the hubris to believe it doesn't need FDA input, then you probably need new executives more than you need a meeting.

[Image: Meeting pictured may not contain actual magic. Photo from FDA's Flickr stream.]

*  Disclosure: the author works for one of those.
** Under the theory that there is no such thing as bad publicity, no link will be provided.



Monday, January 14, 2013

Magical Thinking in Clinical Trial Enrollment


The many flavors of wish-based patient recruitment.

[Hopefully-obvious disclosure: I work in the field of clinical trial enrollment.]

When I'm discussing and recommending patient recruitment strategies with prospective clients, there is only one serious competitor I'm working against. I do not tailor my presentations in reaction to what other Patient Recruitment Organizations are saying, because they're not usually the thing that causes me the most problems. In almost all cases, when we lose out on a new study opportunity, we have lost to one opponent:

Magical thinking.

Need patients? Just add water!

Magical thinking comes in many forms, but in clinical trial enrollment it traditionally has two dominant flavors:

  • We won’t have any problems with enrollment because we have made it a priority within our organization.
    (This translates to: "we want it to happen, therefore it has to happen, therefore it will happen", but it doesn't sound quite as convincing that way, does it?)
  • We have selected sites that already have access to a large number of the patients we need.
    (I hear this pretty much 100% of the time. Even from people who understand that every trial is different and that past site performance is simply not a great predictor of future performance.)

A new form of magical thinking burst onto the scene a few years ago: the belief that the Internet will enable us to target and engage exactly the right patients. Specifically, some teams (aided by the, shall we say, less-than-completely-totally-true claims of "expert" vendors) began to believe that the web’s great capacity to narrowly target specific people – through Google search advertising, online patient communities, and general social media activities – would prove more than enough to deliver large numbers of trial participants. And deliver them fast and cheap to boot. Sadly, evidence has already started to emerge about the Internet’s failure to be a panacea for slow enrollment. As I and others have pointed out, online recruitment can certainly be cost effective, but cannot be relied on to generate a sizable response. As a sole source, it tends to underdeliver even for small trials.

I think we are now seeing the emergence of the newest flavor of magical thinking: Big Data. Take this quote from recent coverage of the JP Morgan Healthcare Conference:
For instance, Phase II, that ever-vexing rubber-road matchmaker for promising compounds that just might be worthless. Identifying the right patients for the right drug can make or break a Phase II trial, [John] Reynders said, and Big Data can come in handy as investigators distill mountains of imaging results, disease progression readings and genotypic traits to find their target participants. 
The prospect of widespread genetic mapping coupled with the power of Big Data could fundamentally change how biotech does R&D, [Alexis] Borisy said. "Imagine having 1 million cancer patients profiled with data sets available and accessible," he said. "Think how that very large data set might work--imagine its impact on what development looks like. You just look at the database and immediately enroll a trial of ideal patients."
Did you follow the logic of that last sentence? You immediately enroll ideal patients ... and all you had to do was look at a database! Problem solved!

Before you go rushing off to get your company some Big Data, please consider the fact that the overwhelming majority of Phase 2 trials do not have a neat, predefined set of genotypic traits they’re looking to enroll. In fact, narrowly-tailored phase 2 trials (such as recent registration trials of Xalkori and Zelboraf) actually enroll very quickly already, without the need for big databases. The reality for most drugs is exactly the opposite: they enter phase 2 actively looking for signals that will help identify subgroups that benefit from the treatment.

Also, it’s worth pointing out that having a million data points in a database does not mean that you have a million qualified, interested, and nearby patients just waiting to be enrolled in your trial. As recent work in medical record queries bears out, the yield from these databases promises to be low, and there are enormous logistic, regulatory, and personal challenges in identifying, engaging, and consenting the actual human beings represented by the data.

More, even fresher flavors of magical thinking are sure to emerge over time. Our urge to hope that our problems will just be washed away in a wave of cool new technology is just too powerful to resist.

However, when the trial is important, and the costs of delay are high, clinical teams need to set the wishful thinking aside and ask for a thoughtful plan based on hard evidence. Fortunately, that requires no magic bean purchase.

Magic Beans picture courtesy of Flickr user sleepyneko

Thursday, December 20, 2012

All Your Site Are Belong To Us


'Competitive enrollment' is exactly that.

This is a graph I tend to show frequently to my clients – it shows the relative enrollment rates for two groups of sites in a clinical trial we'd been working on. The blue line is the aggregate rate of the 60-odd sites that attended our enrollment workshop, while the green line tracks enrollment for the 30 sites that did not attend the workshop. As a whole, the attendees were better enrollers than the non-attendees, but the performance of both groups was declining.

Happily, the workshop produced an immediate and dramatic increase in the enrollment rate of the sites who participated in it – they not only rebounded, but they began enrolling at a better rate than ever before. Those sites that chose not to attend the workshop became our control group, and showed no change in their performance.

The other day, I wrote about ENACCT's pilot program to improve enrollment. Five oncology research sites participated in an intensive, highly customized program to identify and address the issues that stood in the way of enrolling more patients. The sites in general were highly enthused about the program, and felt it had a positive impact on their operations.

There was only one problem: enrollment didn't actually increase.

Here’s the data:

This raises an obvious question: how can we reconcile these disparate outcomes?

On the one hand, an intensive, multi-day, customized program showed no improvement in overall enrollment rates at the sites.

On the other, a one-day workshop with sixty sites (which addressed many of the same issues as the ENACCT pilot: communications, study awareness, site workflow, and patient relationships) resulted in an immediate and clear improvement in enrollment.

There are many possible answers to this question, but after a deeper dive into our own site data, I've become convinced that there is one primary driver at work: for all intents and purposes, site enrollment is a zero-sum game. Our workshop increased the accrual of patients into our study, but most of that increase came as a result of decreased enrollments in other studies at our sites.

Our workshop graph shows increased enrollment ... for one study. The ENACCT data is across all studies at each site. It stands to reason that if sites are already operating at or near their maximum capacity, then the only way to improve enrollment for your trial is to get the sites to care more about your trial than about other trials that they’re also participating in.

And that makes sense: many of the strategies and techniques that my team uses to increase enrollment are measurably effective, but there is no reason to believe that they result in permanent, structural changes to the sites we work with. We don’t redesign their internal processes; we simply work hard to make our sites like us and want to work with us, which results in higher enrollment. But only for our trials.

So the next time you see declining enrollment in one of your trials, your best bet is not that the patients have disappeared, but rather that your sites' attention has wandered elsewhere.


Tuesday, December 11, 2012

What (If Anything) Improves Site Enrollment Performance?

ENACCT has released its final report on the outcomes from the National Cancer Clinical Trials Pilot Breakthrough Collaborative (NCCTBC), a pilot program to systematically identify and implement better enrollment practices at five US clinical trial sites. Buried after the glowing testimonials and optimistic assessments is a grim bottom line: the pilot program didn't work.

Here are the monthly clinical trial accruals at each of the 5 sites. The dashed lines mark when the pilots were implemented:



4 of the 5 sites showed no discernible improvement. The one site that did show increasing enrollment appears to have been improving before any of the interventions kicked in.

This is a painful but important result for anyone involved in clinical research today, because the improvements put in place through the NCCTBC process were the product of an intensive, customized approach. Each site had 3 multi-day learning sessions to map out and test specific improvements to their internal communications and processes (a total of 52 hours of workshops). In addition, each site was provided tracking tools and assigned a coach to assist them with specific accrual issues.

That’s an extremely large investment of time and expertise for each site. If the results had been positive, it would have been difficult to project how NCCTBC could be scaled up to work at the thousands of research sites across the country. Unfortunately, we don’t even have that problem: the needle simply did not move.

While ENACCT plans a second round of pilot sites, I think we need to face a more sobering reality: we cannot squeeze more patients out of sites through training and process improvements. It is widely believed in the clinical research industry that sites are low-efficiency bottlenecks in the enrollment process. If we could just "fix" them, the thinking goes – streamline their workflow, improve their motivation – we could quickly improve the speed at which our trials complete. The data from the NCCTBC paints an entirely different picture, though. It shows us that even when we pour large amounts of time and effort into a tailored program of "evidence and practice-based changes", our enrollment ROI may be nonexistent.

I applaud the ENACCT team for this pilot, and especially for sharing the full monthly enrollment totals at each site. This data should cause clinical development teams everywhere to pause and reassess their beliefs about site enrollment performance and how to improve it.

Tuesday, October 2, 2012

Decluttering the Dashboard


It’s Gresham’s Law for clinical trial metrics: Bad data drives out good. Here are 4 steps you can take to fix it.

Many years ago, when I was working in the world of technology startups, one “serial entrepreneur” told me about a technique he had used when raising investor capital for his new database firm:  since his company marketed itself as having cutting-edge computing algorithms, he went out and purchased a bunch of small, flashing LED lights and simply glued them onto the company’s servers.  When the venture capital folks came out for due diligence meetings, they were provided a dramatic view into the darkened server room, brilliantly lit up by the servers’ energetic flashing. It was the highlight of the visit, and got everyone’s investment enthusiasm ratcheted up a notch.

The clinical trials dashboard is a candy store: bright, vivid,
attractive ... and devoid of nutritional value.
I was reminded of that story at a recent industry conference, when I naively walked into a seminar on “advanced analytics” only to find I was being treated to an extended product demo. In this case, a representative from one of the large CROs was showing off the dashboard for their clinical trials study management system.

And an impressive system it was, chock full of bubble charts and histograms and sliders.  For a moment, I felt like a kid in a candy store.  So much great stuff ... how to choose?

Then the presenter told a story: on a recent trial, a data manager in Italy, reviewing the analytics dashboard, alerted the study team to the fact that there was an enrollment imbalance in Japan, with one site enrolling all of the patients in that country.  This was presented as a success story for the system: it linked up disparate teams across the globe to improve study quality.

But to me, this was a small horror story: the dashboard had gotten so cluttered that key performance issues were being completely missed by the core operations team. The fact that a distant data manager had caught the issue was a lucky break, certainly, but one that should have set off alarm bells about how important signals were being overwhelmed by the noise of charts and dials and “advanced visualizations”.

Swamped with high-precision trivia
I do not need to single out any one system or vendor here: this is a pervasive problem. In our rush to provide “robust analytic solutions”, our industry has massively overengineered its reporting interfaces. Every dashboard I've had a chance to review – and I've seen a lot of them – contains numerous instances of vividly-colored charts crowding out one another, with little differentiation between the significant and the tangential.

It’s Gresham’s Law for clinical trial metrics: Bad data drives out good. Bad data – samples sliced so thin they’ve lost significance, histograms of marginal utility made “interesting” (and nearly unreadable) by 3-D rendering, performance grades that have never been properly validated. Bad data is plentiful and much, much easier to obtain than good data.

So what can we do? Here are 4 initial steps to decluttering the dashboard:

1. Abandon “Actionable Analytics”
Everybody today sells their analytics as “actionable” [including, to be fair, even one company’s website that the author himself may be guilty of drafting]. The problem, though, is that any piece of data – no matter how tenuous and insubstantial – can be made actionable. We can always think of some situation where an action might be influenced by it, so we decide to keep it. As a result, we end up swamped with high-precision trivia (Dr. Smith is enrolling at the 82nd percentile among UK sites!) that does not influence important decisions but competes for our attention. We need to stop reporting data simply because it’s there and we can report it.

2. Identify Key Decisions First
The above process (which seems pretty standard nowadays) is backwards. We look at the data we have, and ask ourselves whether it’s useful. Instead, we need to follow a more disciplined process of first asking ourselves what decisions we need to make, and when we need to make them. For example:

  • When is the earliest we will consider deactivating a site due to non-enrollment?
  • On what schedule, and for which reasons, will senior management contact individual sites?
  • At what threshold will imbalances in safety data trigger more thorough investigation?

Every trial will have different answers to these questions. Therefore, the data collected and displayed will also need to be different. It is important to invest time and effort to identify critical benchmarks and decision points, specific to the needs of the study at hand, before building out the dashboard.

3. Recognize and Respect Context
As some of the questions about make clear, many important decisions are time-dependent.  Often, determining when you need to know something is every bit as important as determining what you want to know. Too many dashboards keep data permanently anchored over the course of the entire trial even though it's only useful during a certain window. For example, a chart showing site activation progress compared to benchmarks should no longer be competing for attention on the front of a dashboard after all sites are up and running – it will still be important information for the next trial, but for managing this trial now, it should no longer be something the entire team reviews regularly.

In addition to changing over time, dashboards should be thoughtfully tailored to major audiences.  If the protocol manager, medical monitor, CRAs, data managers, and senior executives are all looking at the same dashboard, then it’s a dead certainty that many users are viewing information that is not critical to their job function. While it isn't always necessary to develop a unique topline view for every user, it is worthwhile to identify the 3 or 4 major user types, and provide them with their own dashboards (so the person responsible for tracking enrollment in Japan is in a position to immediately see an imbalance).

4. Give your Data Depth
Many people – myself included – are reluctant to part with any data. We want more information about study performance, not less. While this isn't a bad thing to want, it does contribute to the tendency to cram as much as possible into the dashboard.

The solution is not to get rid of useful data, but to bury it. Many reporting systems have the ability to drill down into multiple layers of information: this capability should be thoughtfully (but aggressively!) used to deprioritize all of your useful-but-not-critical data, moving it off the dashboard and into secondary pages.

Bottom Line
The good news is that access to operational data is becoming easier to aggregate and monitor every day. The bad news is that our current systems are not designed to handle the flood of new information, and instead have become choked with visually-appealing-but-insubstantial chart candy. If we want to have any hope of getting a decent return on our investment from these systems, we need to take a couple steps back and determine: what's our operational strategy, and who needs what data, when, in order to successfully execute against it?


[Photo credit: candy store from Flickr user msgolightly.]

Thursday, August 16, 2012

Clinical Trial Alerts: Nuisance or Annoyance?


Will physicians change their answers when tired of alerts?

I am an enormous fan of electronic medical records (EMRs).  Or rather, more precisely, I am an enormous fan of what EMRs will someday become – current versions tend to leave a lot to be desired. Reaction to these systems among physicians I’ve spoken with has generally ranged from "annoying" to "*$%#^ annoying", and my experience does not seem to be at all unique.

The (eventual) promise of EMRs in identifying eligible clinical trial participants is twofold:

First, we should be able to query existing patient data to identify a set of patients who closely match the inclusion and exclusion criteria for a given clinical trial. In reality, however, many EMRs are not easy to query, and the data inside them isn’t as well-structured as you might think. (The phenomenon of "shovelware" – masses of paper records scanned and dumped into the system as quickly and cheaply as possible – has been greatly exacerbated by governments providing financial incentives for the immediate adoption of EMRs.)

Second, we should be able to identify potential patients when they’re physically at the clinic for a visit, which is really the best possible moment. Hence the Clinical Trial Alert (CTA): a pop-up or other notification within the EMR that the patient may be eligible for a trial. The major issue with CTAs is the annoyance factor – physicians tend to feel that they disrupt their natural clinical routine, making each patient visit less efficient. Multiple alerts per patient can be especially frustrating, resulting in "alert overload".

A very intriguing recent study in the Journal of the American Medical Informatics Association looked to measure a related issue: alert fatigue, or the tendency for CTAs to lose their effectiveness over time.  The response rate to the alerts definitely decreased steadily over time, but the authors were mildly optimistic in their assessment, noting that the response rate was still respectable after 36 weeks – somewhere around 30%:


However, what really struck me here is that the referral rate – the rate at which the alert was triggered to bring in a research coordinator – dropped much more precipitously than the response rate:


This is remarkable considering that the alert consisted of only two yes/no questions. Answering either question was considered a "response", and answering "yes" to both questions was considered a "referral".

  • Did the patient have a stroke/TIA in the last 6 months?
  • Is the patient willing to undergo further screening with the research coordinator?

The only plausible explanation for referrals to drop faster than responses is that repeated exposure to the CTA led the physicians to more frequently mark patients as unwilling to participate. (This was not actual patient fatigue: the few patients who were the subject of multiple CTAs had their second alert removed from the analysis.)

So, it appears that some physicians remained nominally compliant with the system, but avoided the extra work involved in discussing a clinical trial option by simply marking the patient as uninterested. This has some interesting implications for how we track physician interaction with EMRs and CTAs, as basic compliance metrics may be undermined by users tending towards a path of least resistance.

ResearchBlogging.org Embi PJ, & Leonard AC (2012). Evaluating alert fatigue over time to EHR-based clinical trial alerts: findings from a randomized controlled study. Journal of the American Medical Informatics Association : JAMIA, 19 (e1) PMID: 22534081

Monday, August 13, 2012

Most* Clinical Trials Are Too** Small

* for some value of "most"
** for some value of "too"


[Note: this is a companion to a previous post, Clouding the Debate on Clinical Trials: Pediatric Edition.]

Are many current clinical trials underpowered? That is, will they not enroll enough patients to adequately answer the research question they were designed to answer? Are we wasting time and money – and even worse, the time and effort of researchers and patient-volunteers – by conducting research that is essentially doomed to produce clinically useless results?

That is the alarming upshot of the coverage on a recent study published in the Journal of the American Medical Association. This Duke Medicine News article was the most damning in its denunciation of the current state of clinical research:
Duke: Mega-Trial experts concerned
that not enough trials are mega-trials
Large-Scale Analysis Finds Majority of Clinical Trials Don't Provide Meaningful Evidence

The largest comprehensive analysis of ClinicalTrials.gov finds that clinical trials are falling short of producing high-quality evidence needed to guide medical decision-making.
The study was also covered in many industry publications, as well as the mainstream news. Those stories were less sweeping in their indictment of the "clinical trial enterprise", but carried the same main theme: that an "analysis" had determined that most current clinical trials were "too small".

I have only one quibble with this coverage: the study in question didn’t demonstrate any of these points. At all.

The study is a simple listing of gross characteristics of interventional trials registered over a 6-year period. It is entirely descriptive, limiting itself to data entered by the trial sponsor as part of registration on ClinicalTrials.gov. It contains no information on the quality of the trials themselves.

That last part can’t be emphasized enough: the study contains no quality benchmarks. No analysis of trial design. No benchmarking of the completeness or accuracy of the data collected. No assessment of the clinical utility of the evidence produced. Nothing like that at all.

So, the question that nags at me is: how did we get from A to B? How did this mildly-interesting-and-entirely-descriptive data listing transform into a wholesale (and entirely inaccurate) denunciation of clinical research?

For starters, the JAMA authors divide registered trials into 3 enrollment groups: 1-100, 101-1000, and >1000. I suppose this is fine, although it should be noted that it is entirely arbitrary – there is no particular reason to divide things up this way, except perhaps a fondness for neat round numbers.

Trials within the first group are then labeled "small". No effort is made to explain why 100 patients represents a clinically important break point, but the authors feel confident to conclude that clinical research is "dominated by small clinical trials", because 62% of registered trials fit into this newly-invented category. From there, all you need is a completely vague yet ominous quote from the lead author. As US News put it:
The new report says 62 percent of the trials from 2007-2010 were small, with 100 or fewer participants. Only 4 percent had more than 1,000 participants.

"There are 330 new clinical trials being registered every week, and a number of them are very small and probably not as high quality as they could be," [lead author Dr Robert] Califf said.
"Probably not as high quality as they could be", while just vague enough to be unfalsifiable, is also not at all a consequence of the data as reported. So, through a chain of arbitrary decisions and innuendo, "less than 100" becomes "small" becomes "too small" becomes "of low quality".

Califf’s institution, Duke, appears to be particularly guilty of driving this evidence-free overinterpretation of the data, as seen in the sensationalistic headline and lede quoted above. However, it’s clear that Califf himself is blurring the distinction between what his study showed and what it didn’t:
"Analysis of the entire portfolio will enable the many entities in the clinical trials enterprise to examine their practices in comparison with others," says Califf. "For example, 96 percent of clinical trials have ≤1000 participants, and 62 percent have ≤ 100. While there are many excellent small clinical trials, these studies will not be able to inform patients, doctors, and consumers about the choices they must make to prevent and treat disease."
Maybe he’s right that these small studies will not be able to inform patients and doctors, but his study has provided absolutely no support for that statement.

When we build a protocol, there are actually only 3 major factors that go into determining how many patients we want to enroll:
  1. How big a difference we estimate the intervention will have compared to a control (the effect size)
  2. How much risk we’ll accept that we’ll get a false-positive (alpha) or false-negative (beta) result
  3. Occasionally, whether we need to add participants to better characterize safety and tolerability (as is frequently, and quite reasonably, requested by FDA and other regulators)
Quantity is not quality: enrolling too many participants in an investigational trial is unethical and a waste of resources. If the numbers determine that we should randomize 80 patients, it would make absolutely no sense to randomize 21 more so that the trial is no longer "too small". Those 21 participants could be enrolled in another trial, to answer another worthwhile question.
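To make the arithmetic concrete, here is a minimal sketch of how factors 1 and 2 above determine enrollment, using the standard normal approximation for comparing two group means. The effect size, alpha, and power values below are illustrative assumptions only, not figures from any particular trial.

```python
import math
from statistics import NormalDist

def n_per_arm(effect_size: float, alpha: float, power: float) -> int:
    """Approximate patients needed per arm to detect a standardized
    effect size (Cohen's d) with false-positive rate alpha (two-sided)
    and false-negative protection power = 1 - beta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Assumed inputs: a moderate effect (d = 0.5), 5% alpha, 80% power.
print(n_per_arm(0.5, 0.05, 0.80))  # → 63 per arm
```

Note how sensitive the result is to the assumed effect size: halving d to 0.25 roughly quadruples the required enrollment, which is why trials answering different questions legitimately come in very different sizes.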

So the answer to "how big should a trial be?" is "exactly as big as it needs to be." Taking descriptive statistics and applying normative categories to them is unhelpful, and does not make for better research policy.


ResearchBlogging.org Califf RM, Zarin DA, Kramer JM, Sherman RE, Aberle LH, & Tasneem A (2012). Characteristics of clinical trials registered in ClinicalTrials.gov, 2007-2010. JAMA : the journal of the American Medical Association, 307 (17), 1838-47 PMID: 22550198

Tuesday, July 31, 2012

Clouding the Debate on Clinical Trials: Pediatric Edition

I would like to propose a rule for clinical trial benchmarks. This rule may appear so blindingly obvious that I run the risk of seeming simple-minded and naïve for even bringing it up.

The rule is this: if you’re going to introduce a benchmark for clinical trial design or conduct, explain its value.

That’s it.  Just a paragraph explaining the rationale of why you’ve chosen to measure what you’re measuring.  Extra credit if you compare it to other benchmarks you could have used, or consider the limitations of your new metric.

I would feel bad for bringing this up, were it not for two recent articles in major publications that completely fail to live up to this standard. I’ll cover one today and one tomorrow.

The first is a recent article in Pediatrics, Pediatric Versus Adult Drug Trials for Conditions With High Pediatric Disease Burden, which has received a fair bit of attention in the industry – mostly due to Reuters uncritically recycling the authors’ press release.

It’s worth noting that the claim made in the release title, "Drug safety and efficacy in children is rarely addressed in drug trials for major diseases", is not at all supported by any data in the study itself. However, I suppose I can live with misleading PR.  What is frustrating is the inadequacy of the measures the authors use in the actual study, and the complete lack of discussion about them.

To benchmark where pediatric drug research should be, they use the proportion of total "burden of disease" borne by children.   Using WHO estimates, they look at the ratio of burden (measured, essentially, in years of total disability) between children and adults.  This burden is further divided into high-income countries and low/middle-income countries.

This has some surface plausibility, but presents a host of issues.  Simply looking at the relative prevalence of a condition does not really give us any insights into what we need to study about treatment.  For example: number 2 on the list for middle/low income diseases is diarrheal illness, where WHO lists the burden of disease as 90% pediatric.  There is no question that diarrheal diseases take a terrible toll on children in developing countries.  We absolutely need to focus resources on improving prevention and treatment: what we do not particularly need is more clinical trials.  As the very first bullet on the WHO fact sheet points out, diarrheal diseases are preventable and treatable.  Prevention is mostly about improving the quality of water and food supplies – this is vitally important stuff, but it has nothing to do with pharmaceutical R&D.

In the US, the NIH’s National Institute of Child Health and Human Development (NICHD) has a rigorous process for identifying and prioritizing needs for pediatric drug development, as mandated by the BPCA (Best Pharmaceuticals for Children Act).  It is worth noting that only 2 of the top 5 diseases in the Pediatrics article make the cut among the 41 highest-priority areas in the NICHD’s list for 2011.

(I don’t think the numbers as calculated by the authors are convincing even on their own terms:  3 of the 5 "high burden" diseases in wealthy countries – bipolar disorder, depression, and schizophrenia – are extremely rare in very young children, and only make this list because of their increasing incidence in adolescence.  If our objective is to focus on how these drugs may work differently in developing children, then why wouldn’t we put greater emphasis on the youngest cohorts?)

Of course, just because a new benchmark is at odds with other benchmarks doesn’t necessarily mean that it’s wrong.  But it does mean that the benchmark requires some rigorous vetting before it’s used.  The authors make no attempt at explaining why we should use their metric, except to say it’s "apt". The only support provided is a pair of footnotes – one of those, ironically, is to this article from 1999 that contains a direct warning against their approach:
Our data demonstrate how policy makers could be misled by using a single measure of the burden of disease, because the ranking of diseases according to their burden varies with the different measures used.
If we’re going to make any progress in solving the problems in drug development – and I think we have a number of problems that need solving – we have got to start raising our standards for our own metrics.

Are we not putting enough resources into pediatric research, or have we over-incentivized risky experimentation on a vulnerable population? This is a critically important question in desperate need of more data and thoughtful analysis. Unfortunately, this study adds more noise than insight to the debate.

Tomorrow, I’ll cover the allegations about too many trials being too small. [Update: "tomorrow" took a little longer than expected. The follow-up post is here.]

[Note: the Pediatrics article also uses another metric, "Percentage of Trials that Are Pediatric", that is used as a proxy for amount of research effort being done.  For space reasons, I’m not going to go into that one, but it’s every bit as unhelpful as the pediatric burden metric.]

ResearchBlogging.org Bourgeois FT, Murthy S, Pinto C, Olson KL, Ioannidis JP, & Mandl KD (2012). Pediatric Versus Adult Drug Trials for Conditions With High Pediatric Disease Burden. Pediatrics PMID: 22826574

Tuesday, July 24, 2012

How Not to Report Clinical Trial Data: a Clear Example

I know it’s not even August yet, but I think we can close the nominations for "Worst Trial Metric of the Year".  The hands-down winner is Pharmalot, for the thoughtless publication of this article reviewing "Deaths During Clinical Trials" per year in India.  We’ll call it the Pharmalot Death Count, or PDC, and it’s easy to explain – it’s just the total number of patients who died while enrolled in any clinical trial, regardless of cause, reported as though it were an actual meaningful number.

(To make this even more execrable, Pharmalot actually calls this "Deaths attributed to clinical trials" in his opening sentence, although the actual data has exactly nothing to do with the attribution of the death.)

In fairness, Pharmalot is really only sharing the honors with a group of sensationalistic journalists in India who have jumped on these numbers.  But it has a much wider readership within the research community, and could have at least attempted to critically assess the data before repeating it (along with criticism from "experts").

The number of things wrong with this metric is a bit overwhelming.  I’m not even sure where to start.  Some of the obvious issues here:

1. No separation of trial-related versus non-trial-related deaths.  Some effort is made to explain that there may be difficulty in determining whether a particular death was related to the study drug or not.  However, that obscures the fact that the PDC lumps together all deaths, whether or not the patient ever took an experimental medication. That means the PDC includes:
  • Patients in control arms receiving standard of care and/or placebo, who died during the course of their trial.
  • Patients whose deaths were entirely unrelated to their illness (eg, automobile accident victims)
2. No base rates.  When a raw death total is presented, a number of obvious questions should come to mind:  how many patients were in the trials?  How many deaths were there among patients with similar diseases who were not in trials?  The PDC doesn’t care about that kind of context.

3. No sensitivity to trial design.  Many late-stage cancer clinical trials use Overall Survival (OS) as their primary endpoint – patients are literally in the trial until they die.  This isn’t considered unethical; it’s considered the gold standard of evidence in oncology.  If we ran shorter, less thorough trials, we could greatly reduce the PDC – would that be good for anyone?
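To see why point 2 matters, here is a toy illustration. Every number below is made up for the sake of the example (none are taken from the Pharmalot article or the Indian reports): the same raw death count can look alarming or unremarkable depending entirely on the denominator and the background mortality rate.

```python
# All figures are hypothetical, chosen only to illustrate the base-rate problem:
# a raw death count carries no information without a denominator and a baseline.
trial_deaths = 400
trial_patients = 120_000
background_annual_mortality = 0.004  # assumed rate in a comparable non-trial population

observed_rate = trial_deaths / trial_patients
expected_deaths = background_annual_mortality * trial_patients

print(f"observed death rate in trials: {observed_rate:.2%}")         # 0.33%
print(f"deaths expected at background rate: {expected_deaths:.0f}")  # 480
```

Under these made-up assumptions, the "alarming" 400 deaths is actually fewer than the background rate alone would predict – which is exactly the kind of context the PDC throws away.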

Case Study: Zelboraf
FDA: "Highly effective, more personalized therapy"
PDC: "199 deaths attributed to Zelboraf trial!"
There is a fair body of evidence that participants in clinical trials fare about the same as (or possibly a bit better than) similar patients receiving standard of care therapy.  However, much of that evidence was accumulated in western countries: it is a fair question to ask if patients in India and other countries receive a similar benefit.  The PDC, however, adds nothing to our ability to answer that question.

So, for publicizing a metric that has zero utility, and using it to cast aspersions on the ethics of researchers, we congratulate Pharmalot and the PDC.