
Tuesday, June 4, 2013

Can FDA's New Transparency Survive Avandia?

PDUFA V commitments signal a strong tolerance for open debate in the face of uncertainty.

I can admit to a rather powerful lack of enthusiasm when reading about interpersonal squabbles. It’s even worse in the scientific world: when I read about debates getting mired in personal attacks I tend to simply stop reading and move on to something else.

However, the really interesting part of this week’s meeting of an FDA joint Advisory Committee to discuss the controversial diabetes drug Avandia – at least in the sense of likely long-term impact – is not the scientific question under discussion, but the surfacing and handling of the raging interpersonal battle going on right now inside the Division of Cardiovascular and Renal Products. So I'll have to swallow my distaste and follow along with the drama.
Two words that make us mistrust Duke:
Anil Potti. Christian Laettner.

Not that the scientific question at hand – does Avandia pose significant heart risks? – isn't interesting. It is. But if there’s one thing that everyone seems to agree on, it’s that we don’t have good data on the topic. Despite the re-adjudication of RECORD, no one trusts its design (and, ironically, the one trial designed to rigorously answer the question was halted after intense pressure, despite an AdComm recommendation that it continue). And no one seems particularly enthused about changing the current status of Avandia: in all likelihood it will remain on the market under heavy restrictions. Rather than changing the future of diabetes, I suspect the committee will be content to let us slog along the same mucky trail.

The really interesting question, that will potentially impact CDER for years to come, is how it can function with frothing, open dissent among its staffers. As has been widely reported, FDA reviewer Tom Marciniak has written a rather wild and vitriolic assessment of the RECORD trial, excoriating most everyone involved. In a particularly stunning passage, Marciniak appears to claim that the entire output of anyone working at Duke University cannot be trusted because of the fraud committed by Duke cancer researcher Anil Potti:
I would have thought that the two words “Anil Potti” are sufficient for convincing anyone that Duke University is a poor choice for a contractor whose task it is to confirm the integrity of scientific research. 
(One wonders how far Marciniak is willing to take his guilt-by-association theme. Are the words “Cheng Yi Liang” sufficient to convince us that all FDA employees, including Marciniak, are poor choices for deciding matters relating to publicly-traded companies? Should I not comment on government activities because I’m a resident of Illinois (my two words: “Rod Blagojevich”)?)

Rather than censoring or reprimanding Marciniak, his supervisors have taken the extraordinary step of letting him publicly air his criticisms, and then they have in turn publicly criticized his methods and approach.

I have been unable to think of a similar situation at any regulatory agency. The tolerance for dissent being displayed by FDA is, I believe, completely unprecedented.

And that’s the cliffhanger for me: can the FDA’s commitment to transparency extend so far as to accommodate public disagreements about its own approval decisions? Can it do so even when the disagreements take an extremely nasty and inappropriate tone?

  • Rather than recognizing that open debate is a good thing, will journalists jump on the drama and portray agency leadership as weak and indecisive?
  • Will the usual suspects in Congress be able to exploit this disagreement for their own political gain? How many House subcommittees will be summoning Janet Woodcock in the coming weeks?

I think what Bob Temple and Norman Stockbridge are doing is a tremendous experiment in open government. If they can pull it off, it could force other agencies to radically rethink how they go about crafting and implementing regulations. However, I also worry that it is politically simply not a viable approach, and that the agency will ultimately be seriously hurt by attacks from the media and legislators.

Where is this coming from?

As part of its recent PDUFA V commitment, the FDA put out a fascinating draft document, Structured Approach to Benefit-Risk Assessment in Drug Regulatory Decision-Making. It didn't get a lot of attention when first published back in February (few FDA documents do). However, it lays out a rather bold vision for how the FDA can acknowledge the existence of uncertainty in its evaluation of new drugs. Its proposed structure even envisions an open and honest accounting of divergent interpretations of data:
When they're frothing at the mouth, even Atticus
doesn't let them publish a review
A framework for benefit-risk decision-making that summarizes the relevant facts, uncertainties, and key areas of judgment, and clearly explains how these factors influence a regulatory decision, can greatly inform and clarify the regulatory discussion. Such a framework can provide transparency regarding the basis of conflicting recommendations made by different parties using the same information.
(Emphasis mine.)

Of course, the structured framework here is designed to reflect rational disagreement. Marciniak’s scattershot insults are in many ways a terrible first case for trying out a new level of transparency.

The draft framework notes that safety issues, like Avandia, are some of the major areas of uncertainty in the regulatory process. Contrast this vision of coolly and systematically addressing uncertainties with the sad reality of Marciniak’s attack:
In contrast to the prospective and highly planned studies of effectiveness, safety findings emerge from a wide range of sources, including spontaneous adverse event reports, epidemiology studies, meta-analyses of controlled trials, or in some cases from randomized, controlled trials. However, even controlled trials, where the evidence of an effect is generally most persuasive, can sometimes provide contradictory and inconsistent findings on safety as the analyses are in many cases not planned and often reflect multiple testing. A systematic approach that specifies the sources of evidence, the strength of each piece of evidence, and draws conclusions that explain how the uncertainty weighed on the decision, can lead to more explicit communication of regulatory decisions. We anticipate that this work will continue beyond FY 2013.
I hope that work will continue beyond 2013. Thoughtful, open discussions of real uncertainties are one of the most worthwhile goals FDA can aspire to, even if it means having to learn how to do so without letting the Marciniaks of the world scuttle the whole endeavor.

[Update June 6: Further bolstering the idea that the AdCom is just as much about FDA's ability to transparently manage differences of expert opinion in the face of uncertain data, CDER Director Janet Woodcock posted this note on the FDA's blog. She's pretty explicit about the bigger picture:
There have been, and continue to be, differences of opinion and scientific disputes, which is not uncommon within the agency, stemming from varied conclusions about the existing data, not only with Avandia, but with other FDA-regulated products. 
At FDA, we actively encourage and welcome robust scientific debate on the complex matters we deal with — as such a transparent approach ensures the scientific input we need, enriches the discussions, and enhances our decision-making.
I agree, and hope she can pull it off.]

Tuesday, February 5, 2013

The World's Worst Coin Trick?


Ben Goldacre – whose Bad Pharma went on sale today – is fond of using a coin-toss-cheating analogy to describe the problem of "hidden" trials in pharmaceutical clinical research. He uses it in this TED talk:
If it's a coin-toss conspiracy, it's the worst
one in the history of conspiracies.
If I flipped a coin a hundred times, but then withheld the results from you from half of those tosses, I could make it look as if I had a coin that always came up heads. But that wouldn't mean that I had a two-headed coin; that would mean that I was a chancer, and you were an idiot for letting me get away with it. But this is exactly what we blindly tolerate in the whole of evidence-based medicine. 
and in this recent op-ed column in the New York Times:
If I toss a coin, but hide the result every time it comes up tails, it looks as if I always throw heads. You wouldn't tolerate that if we were choosing who should go first in a game of pocket billiards, but in medicine, it’s accepted as the norm. 
I can understand why he likes using this metaphor. It's a striking and concrete illustration of his claim that pharmaceutical companies are suppressing data from clinical trials in an effort to make ineffective drugs appear effective. It also dovetails elegantly, from a rhetorical standpoint, with his frequently-repeated claim that "half of all trials go unpublished" (the reader is left to make the connection, but presumably it's all the tail-flip trials, with negative results, that aren't published).
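Goldacre's metaphor is easy enough to make concrete. Here is a toy simulation of the "hidden tails" effect – purely illustrative, not based on any real trial data:

```python
import random

def published_heads_rate(n_tosses: int, seed: int = 42) -> float:
    """Toss a fair coin n_tosses times, but 'publish' only the heads.

    The true heads rate is about 0.5, but the published record always
    shows 100% heads -- which is the point of the metaphor.
    """
    random.seed(seed)
    tosses = [random.random() < 0.5 for _ in range(n_tosses)]
    published = [t for t in tosses if t]  # tails are quietly hidden
    return sum(published) / len(published)

print(published_heads_rate(100))  # 1.0: every published result is heads
```

The simulation shows why the metaphor is rhetorically powerful – selective reporting makes the published rate uninformative no matter how many tosses occur. Whether it describes the real regulatory record is, of course, the question at issue below.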

Like many great metaphors, however, this coin-scam metaphor has the distinct weakness of being completely disconnected from reality.

If we can cheat and hide bad results, why do we have so many public failures? Pharmaceutical headlines in the past year were dominated by a series of high-profile clinical trial failures. Even drugs that showed great promise in phase 2 failed in phase 3 and were discontinued. Less than 20% of drugs that enter human testing ever make it to market ... and by some accounts it may be less than 10%. Pfizer had a great run of approvals to end 2012, with 4 new drugs approved by the FDA (including Xalkori, the exciting targeted therapy for lung cancer). And yet during that same period, the company discontinued 8 compounds.

Now, this wasn't always the case. Mandatory public registration of all pharma trials didn't begin in the US until 2005, and mandatory public results reporting came later than that. Before then, companies certainly had more leeway to keep results to themselves, with one important exception: the FDA still had the data. If you ran 4 phase 3 trials on a drug, and only 2 of them were positive, you might be able to only publish those 2, but when it came time to bring the drug to market, the regulators who reviewed your NDA report would be looking at the totality of evidence – all 4 trials. And in all likelihood you were going to be rejected.

That was definitely not an ideal situation, but even then it wasn't half as dire as Goldacre's Coin Toss would lead you to believe. The cases of ineffective drugs reaching the US market are extremely rare: if anything, FDA has historically been criticized for being too risk-averse and preventing drugs with only modest efficacy from being approved.

Things are even better now. There are no hidden trials, the degree of rigor (in terms of randomization, blinding, and analysis) has ratcheted up consistently over the last two decades, lots more safety data gets collected along the way, and phase 4 trials are actually being executed and reported in a timely manner. In fact, it is safe to say that medical research has never been as thorough and rigorous as it is today.

That doesn't mean we can’t get better. We can. But the main reason we can is that we got on the path to getting better 20 years ago, and continue to make improvements.

Buying into Goldacre's analogy requires you to completely ignore a massive flood of public evidence to the contrary. That may work for the average TED audience, but it shouldn't be acceptable at the level of rational public discussion.

Of course, Goldacre knows that negative trials are publicized all the time. His point is about publication bias. However, when he makes his point so broadly as to mislead those who are not directly involved in the R&D process, he has clearly stepped out of the realm of thoughtful and valid criticism.

I got my pre-ordered copy of Bad Pharma this morning, and look forward to reading it. I will post some additional thoughts on the book as I get through it. In the meantime, those looking for more can find a good skeptical review of some of Goldacre's data on the Dianthus Medical blog here and here.

[Image: Bad Pharma's Bad Coin courtesy of Flickr user timparkinson.]

Friday, October 12, 2012

The "Scandal" of "Untested" Generics


I am in the process of writing up a review of this rather terrible Forbes piece on the FDA recall of one manufacturer's version of generic 300 mg bupropion XL. However, that's going to take a while, so I thought I'd quickly cover just one of the points brought up there, since it seems to be causing a lot of confusion.

Forbes is shocked, SHOCKED to learn that things
 are happening the same way they always have:
call Congress at once!
The FDA’s review of the recall notes that when the generic was approved, only the 150 mg version was tested for bioequivalence in humans. The 300 mg version was approved based upon the 150 mg data as well as detailed information about the manufacturing and composition of both versions.

A number of people expressed surprise about this – they seemed to genuinely not be aware that a drug approval could happen in this way. The Forbes article stated that this was entirely inappropriate and worthy of Congressional investigation.

In fact, many strengths of generic drugs do not undergo in vivo bioequivalence and bioavailability testing as part of their review and approval. This is true in both the US and Europe. Here is a brief rundown of when and why such testing is waived, and why such waivers are neither new, nor shocking, nor unethical.

Title 21, Part 320 of the US Code of Federal Regulations is the regulatory foundation regarding bioequivalence testing in drugs.  Section 22 deals specifically with conditions where human testing should be waived. It is important to note that these regulations aren't new, and the laws that they're based on aren't new either (in fact, the federal law is 20 years old, and was last updated 10 years ago).

By far the most common waiver is for lower dosage strengths. When a drug exists in many approved dosages, generally the highest dose is subject to human bioequivalence testing and the lower doses are approved based on the high-dose results supplemented by in vitro testing.

However, when higher doses carry risks of toxicity, the situation can be reversed, out of ethical concerns for the welfare of test subjects. So, for example, current FDA guidance for amiodarone – a powerful antiarrhythmic drug with lots of side effects – is that the maximum “safe” dose of 200 mg should be tested in humans, and that 100 mg, 300 mg, and 400 mg dosage formulations will be approved if the manufacturer also establishes “acceptable in-vitro dissolution testing of all strengths, and … proportional similarity of the formulations across all strengths”.

That last part is critically important: the generic manufacturer must submit additional evidence about how the doses work in vitro, as well as keep the proportions of inactive ingredients constant. It is this combination of in vivo bioequivalence, in vitro testing, and manufacturing controls that supports a sound scientific decision to approve the generic at various doses.
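In practice, that in vitro dissolution comparison is commonly quantified with the f2 similarity factor, with two dissolution profiles conventionally considered similar when f2 is 50 or above. A minimal sketch of the calculation – the profile numbers below are hypothetical, not from any actual submission:

```python
import math

def f2_similarity(reference: list[float], test: list[float]) -> float:
    """f2 similarity factor for two dissolution profiles (% dissolved
    at matching time points), per the usual formula:

        f2 = 50 * log10( 100 / sqrt(1 + mean squared difference) )

    Identical profiles give the maximum value of 100; f2 >= 50 is the
    conventional similarity threshold.
    """
    if len(reference) != len(test):
        raise ValueError("profiles must share the same time points")
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50 * math.log10(100 / math.sqrt(1 + msd))

ref = [15, 40, 70, 90]  # hypothetical % dissolved at 10/20/30/45 min
gen = [18, 44, 73, 88]
print(round(f2_similarity(ref, gen), 1))
```

With these made-up profiles the factor lands comfortably above 50, so the hypothetical generic would pass this particular check – which is only one component of the overall waiver package described above.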

In fact, certain drugs are so toxic – most chemotherapies, for example – that performing a bioequivalence test in healthy humans is patently unethical. In many of those cases, generic approval is granted on the basis of formulation chemistry alone. For example, generic paclitaxel is waived from human testing (here is a waiver from 2001 – again demonstrating that there’s nothing terribly shocking or new about this process).

In the case of bupropion, FDA had significant concerns about the risk of seizures at the 300 mg dose level. Similar to the amiodarone example above, they issued guidance providing for a waiver of the higher dosage, but only based upon the combination of in vivo data from the 150 mg dose, in vitro testing, and manufacturing controls.

You may not agree with the current system, and there may be room for improvement, but you cannot claim that it is new, unusual, or requiring congressional inquiry. It’s based on federal law, with significant scientific and ethical underpinnings.

Further reading: FDA Guidance for Industry: Bioavailability and Bioequivalence Studies for Orally Administered Drug Products — General Considerations

Thursday, October 11, 2012

TransCelerate and CDISC: The Relationship Explained


Updating my post from last month about the launch announcement for TransCelerate BioPharma, a nonprofit entity funded by 10 large pharmaceutical companies to “bring new medicines to patients faster”: one of the areas I had some concern about was in the new company's move into the “development of clinical data standards”.

How about we transcelerate
this website a bit?
Some much-needed clarification has come by way of Wayne Kubick, the CTO of CDISC. In an article in Applied Clinical Trials, he lays out the relationship in a bit more detail:
TransCelerate has been working closely with CDISC for several months to see how they can help us move more quickly in the development of therapeutic area data standards.  Specifically, they are working to provide CDISC with knowledgeable staff to help us plan for and develop data standards for more than 55 therapeutic areas over the next five years.
And then again:
But the important thing to realize is that TransCelerate intends to help CDISC achieve its mission to develop therapeutic area data standards more rapidly by giving us greater access to skilled volunteers to contribute to standards development projects.   
So we have clarification on at least one point: TransCelerate will donate some level of additional skilled manpower to CDISC-led initiatives.

That’s a good thing, I assume. Kubick doesn't mention it, but I would venture to guess that “more skilled volunteers” is at or near the top of CDISC's wish list.

But it raises the question: why TransCelerate? Couldn't the 10 member companies have contributed this employee time already? Did we really need a new entity to organize a group of fresh volunteers? And if we did somehow need a coordinating entity to make this happen, why not use an existing group – one with, say, a broader level of support across the industry, such as PhRMA?

The promise of a group like TransCelerate is intriguing. The executional challenges, however, are enormous: I think it will be under constant pressure to move away from meaningful but very difficult work towards supporting more symbolic and easy victories.

Tuesday, October 2, 2012

Decluttering the Dashboard


It’s Gresham’s Law for clinical trial metrics: Bad data drives out good. Here are 4 steps you can take to fix it.

Many years ago, when I was working in the world of technology startups, one “serial entrepreneur” told me about a technique he had used when raising investor capital for his new database firm:  since his company marketed itself as having cutting-edge computing algorithms, he went out and purchased a bunch of small, flashing LED lights and simply glued them onto the company’s servers.  When the venture capital folks came out for due diligence meetings, they were provided a dramatic view into the darkened server room, brilliantly lit up by the servers’ energetic flashing. It was the highlight of the visit, and got everyone’s investment enthusiasm ratcheted up a notch.

The clinical trials dashboard is a candy store: bright, vivid,
attractive ... and devoid of nutritional value.
I was reminded of that story at a recent industry conference, when I naively walked into a seminar on “advanced analytics” only to find I was being treated to an extended product demo. In this case, a representative from one of the large CROs was showing off the dashboard for their clinical trials study management system.

And an impressive system it was, chock full of bubble charts and histograms and sliders.  For a moment, I felt like a kid in a candy store.  So much great stuff ... how to choose?

Then the presenter told a story: on a recent trial, a data manager in Italy, reviewing the analytics dashboard, alerted the study team to the fact that there was an enrollment imbalance in Japan, with one site enrolling all of the patients in that country.  This was presented as a success story for the system: it linked up disparate teams across the globe to improve study quality.

But to me, this was a small horror story: the dashboard had gotten so cluttered that key performance issues were being completely missed by the core operations team. The fact that a distant data manager had caught the issue was a lucky break, certainly, but one that should have set off alarm bells about how important signals were being overwhelmed by the noise of charts and dials and “advanced visualizations”.
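For what it's worth, the signal in that story is trivial to compute directly rather than hoping someone spots it on a chart. A minimal sketch – the 80% concentration threshold is an illustrative assumption, not an industry standard:

```python
from collections import Counter

def flag_enrollment_concentration(enrollments: list[tuple[str, str]],
                                  threshold: float = 0.8) -> dict[str, float]:
    """Flag countries where a single site accounts for more than
    `threshold` of that country's enrolled patients.

    `enrollments` is one (country, site_id) pair per enrolled patient.
    Returns {country: top-site share} for each flagged country.
    """
    by_country: dict[str, Counter] = {}
    for country, site in enrollments:
        by_country.setdefault(country, Counter())[site] += 1
    flags = {}
    for country, sites in by_country.items():
        top_share = max(sites.values()) / sum(sites.values())
        if top_share > threshold:
            flags[country] = top_share
    return flags

# All five Japanese patients come from one site -> flagged; UK is balanced.
patients = [("JP", "JP-01")] * 5 + [("UK", "UK-01")] * 3 + [("UK", "UK-02")] * 4
print(flag_enrollment_concentration(patients))
```

A rule like this, surfaced as an alert for the operations team, is exactly the kind of signal that should never depend on a distant reviewer's sharp eyes.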

Swamped with high-precision trivia
I do not need to single out any one system or vendor here: this is a pervasive problem. In our rush to provide “robust analytic solutions”, our industry has massively overengineered its reporting interfaces. Every dashboard I've had a chance to review – and I've seen a lot of them – contains numerous instances of vividly-colored charts crowding out one another, with little to differentiate the significant from the tangential.

It’s Gresham’s Law for clinical trial metrics: Bad data drives out good. Bad data – samples sliced so thin they’ve lost significance, histograms of marginal utility made “interesting” (and nearly unreadable) by 3-D rendering, performance grades that have never been properly validated. Bad data is plentiful and much, much easier to obtain than good data.

So what can we do? Here are 4 initial steps to decluttering the dashboard:

1. Abandon “Actionable Analytics”
Everybody today sells their analytics as “actionable” [including, to be fair, even one company’s website that the author himself may be guilty of drafting]. The problem, though, is that any piece of data – no matter how tenuous and insubstantial – can be made actionable. We can always think of some situation where an action might be influenced by it, so we decide to keep it. As a result, we end up swamped with high-precision trivia (Dr. Smith is enrolling at the 82nd percentile among UK sites!) that do not influence important decisions but compete for our attention. We need to stop reporting data simply because it’s there and we can report it.

2. Identify Key Decisions First
The above process (which seems pretty standard nowadays) is backwards. We look at the data we have, and ask ourselves whether it’s useful. Instead, we need to follow a more disciplined process of first asking ourselves what decisions we need to make, and when we need to make them. For example:

  • When is the earliest we will consider deactivating a site due to non-enrollment?
  • On what schedule, and for which reasons, will senior management contact individual sites?
  • At what threshold will imbalances in safety data trigger more thorough investigation?

Every trial will have different answers to these questions. Therefore, the data collected and displayed will also need to be different. It is important to invest time and effort to identify critical benchmarks and decision points, specific to the needs of the study at hand, before building out the dashboard.
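As a sketch of what "decisions first" might look like in practice, the first question above could become an explicit, machine-checkable rule instead of a chart someone has to eyeball. The 90-day grace period here is purely illustrative; every trial would set its own:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class SiteEnrollmentRule:
    """Decision-first rule: a site with zero enrollment past its grace
    period should be reviewed for deactivation. The 90-day default is
    illustrative only; each study team would choose its own value."""
    grace_days: int = 90

    def should_review_for_deactivation(self, activated: date,
                                       patients_enrolled: int,
                                       today: date) -> bool:
        past_grace = today - activated > timedelta(days=self.grace_days)
        return past_grace and patients_enrolled == 0

rule = SiteEnrollmentRule()
# A site activated June 1 with no patients by October 2 trips the rule.
print(rule.should_review_for_deactivation(date(2012, 6, 1), 0, date(2012, 10, 2)))
```

The dashboard then only needs to show the sites that trip the rule, rather than an enrollment curve for every site in the study.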

3. Recognize and Respect Context
As some of the questions above make clear, many important decisions are time-dependent. Often, determining when you need to know something is every bit as important as determining what you want to know. Too many dashboards keep data permanently anchored over the course of the entire trial even though it's only useful during a certain window. For example, a chart showing site activation progress compared to benchmarks should no longer be competing for attention on the front of a dashboard after all sites are up and running – it will still be important information for the next trial, but for managing this trial now, it should no longer be something the entire team reviews regularly.

In addition to changing over time, dashboards should be thoughtfully tailored to major audiences.  If the protocol manager, medical monitor, CRAs, data managers, and senior executives are all looking at the same dashboard, then it’s a dead certainty that many users are viewing information that is not critical to their job function. While it isn't always necessary to develop a unique topline view for every user, it is worthwhile to identify the 3 or 4 major user types, and provide them with their own dashboards (so the person responsible for tracking enrollment in Japan is in a position to immediately see an imbalance).

4. Give your Data Depth
Many people – myself included – are reluctant to part with any data. We want more information about study performance, not less. While this isn't a bad thing to want, it does contribute to the tendency to cram as much as possible into the dashboard.

The solution is not to get rid of useful data, but to bury it. Many reporting systems have the ability to drill down into multiple layers of information: this capability should be thoughtfully (but aggressively!) used to deprioritize all of your useful-but-not-critical data, moving it off the dashboard and into secondary pages.

Bottom Line
The good news is that access to operational data is becoming easier to aggregate and monitor every day. The bad news is that our current systems are not designed to handle the flood of new information, and instead have become choked with visually-appealing-but-insubstantial chart candy. If we want to have any hope of getting a decent return on our investment from these systems, we need to take a couple steps back and determine: what's our operational strategy, and who needs what data, when, in order to successfully execute against it?


[Photo credit: candy store from Flickr user msgolightly.]

Tuesday, September 25, 2012

What We Can Anticipate from TransCelerate


TransCelerate: Pharma's great kumbaya moment?
Last week, 10 of the largest pharmaceutical companies caused quite a hullaballoo in the research world with their announcement that they were anteing up to form a new nonprofit entity “to identify and solve common drug development challenges with the end goals of improving the quality of clinical studies and bringing new medicines to patients faster”. The somewhat-awkwardly-named TransCelerate BioPharma immediately got an enthusiastic reception from industry watchers and participants, mainly due to the perception that it was well poised to attack some of the systemic causes of delays and cost overruns that plague clinical trials today.

I myself was caught up in the breathless excitement of the moment, immediately tweeting after reading the initial report:

Over the past few days, though, I've had time to re-read and think more about the launch announcement, and dial down my enthusiasm considerably. I still think it’s a worthwhile effort, but it’s probably not fair to expect anything that fundamentally changes much in the way of current trial execution.

Mostly, I’m surprised by the specific goals selected, which seem for the most part either tangential to the real issues in modern drug development or stepping into areas where an all-big-pharma committee isn’t the best tool for the job. I’m also very concerned that a consortium like this would launch without a clearly-articulated vision of how it fits in with, and adds to, the ongoing work of other key players – the press release is loaded with positive, but extremely vague, wording about how TransCelerate will work with, but be different from, groups such as the CTTI and CDISC. The new organization also appears to have no formal relationship with any CRO organizations.  Given the crucial and deeply embedded nature of CROs in today’s research, this is not a detail to be worked out later; it is a vital necessity if any worthwhile progress is to be made.

Regarding the group’s goals, here is what their PR had to say:
Five projects have been selected by the group for funding and development, including: development of a shared user interface for investigator site portals, mutual recognition of study site qualification and training, development of risk-based site monitoring approach and standards, development of clinical data standards, and establishment of a comparator drug supply model.
Let’s take these five projects one by one, to try to get a better picture of TransCelerate’s potential impact:

1. Development of a shared user interface for investigator site portals

Depending on how it’s implemented, the impact of this could range from “mildly useful” to “mildly irksome”. Sure, I hear investigators and coordinators complain frequently about all the different accounts they have to keep track of, so having a single front door to multiple sponsor sites would be a relief. However, I don’t think that the problem of too many usernames cracks anyone’s “top 20 things wrong with clinical trial execution” list – it’s a trivial detail. Aggravating, but trivial.

Worse, if you do it wrong and develop a clunky interface, you’ll get a lot more grumbling about making life harder at the research site. And I think there’s a high risk of that, given that this is in effect software development by committee – and the committee is a bunch of companies that do not actually specialize in software development.

In reality, the best answer to this is probably a lot simpler than we imagine: if we had a neutral, independent body (such as the ACRP) set up a single sign-on (SSO) registry for investigators and coordinators, then all sponsors, CROs, and IVRS/IWRS/CDMS can simply set themselves up as service providers. (This works in the same way that many people today can log into disparate websites using their existing Google or Facebook accounts.)  TransCelerate might do better sponsoring and promoting an external standard than trying to develop an entirely new platform of its own.
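The core of that SSO idea is small: a neutral registry vouches for the investigator's identity, and each sponsor portal verifies that voucher instead of maintaining its own accounts. Real deployments would use an established standard such as SAML or OpenID Connect; the toy HMAC sketch below (with invented names throughout) only illustrates the trust relationship:

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Illustrative shared secret between the registry and a sponsor portal.
# (A real SSO standard would use asymmetric signatures, not a shared key.)
REGISTRY_KEY = b"shared-secret-with-the-registry"

def issue_assertion(investigator_id: str) -> str:
    """The registry (identity provider) signs a claim about who logged in."""
    payload = base64.urlsafe_b64encode(
        json.dumps({"sub": investigator_id}).encode())
    sig = hmac.new(REGISTRY_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify_assertion(token: str) -> Optional[str]:
    """A sponsor portal (service provider) checks the registry's signature
    instead of keeping its own password database."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(REGISTRY_KEY, payload.encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged assertion
    return json.loads(base64.urlsafe_b64decode(payload))["sub"]

token = issue_assertion("inv-12345")
print(verify_assertion(token))
```

The point is that each portal only needs the verification half; the account management burden stays with the neutral registry, which is why an external standard would likely serve sites better than a committee-built platform.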

2. Mutual recognition of study site qualification and training

This is an excellent step forward. It’s also squarely in the realm of “ideas so obvious we could have done them 10 years ago”. Forcing site personnel to attend multiple iterations of the same training seminars simply to ensure that you’ve collected enough binders full of completion certificates is a sad CYA exercise with no practical benefit to anyone.

This will hopefully re-establish some goodwill with investigators. However, it’s important to note that it’s pretty much a symbolic act in terms of efficiency and cost savings. Nothing wrong with that – heaven knows we need some relationship wins with our increasingly-disillusioned sites – but let’s not go crazy thinking that this represents a real cause of wasted time or money. In fact, it’s pretty clear that one of the reasons we’ve lived with the current site-unfriendly system for so long is that it didn’t really cost us anything to do so.

(It’s also worth pointing out that more than a few biotechs have already figured out, usually with CRO help, how to ensure that site personnel are properly trained and qualified without subjecting them to additional rounds of training.)

3. Development of risk-based site monitoring approach and standards

The consensus belief and hope is that risk-based monitoring is the future of clinical trials. Ever since FDA’s draft guidance on the topic hit the street last year, it’s been front and center at every industry event. It will, unquestionably, lead to cost savings (although some of those savings will hopefully be reinvested into more extensive centralized monitoring).  It will not necessarily shave a significant amount of time off the trials, since in many trials getting monitors out to sites to do SDV is not a rate-limiting factor, but it should still at the very least result in better data at lower cost, and that’s clearly a good thing.

So, the big question for me is: if we’re all moving in this direction already, do we need a new, pharma-only consortium to develop an “approach” to risk-based monitoring?

 First and foremost, this is a senseless conversation to have without the active involvement and leadership of CROs: in many cases, they understand the front-line issues in data verification and management far better than their pharma clients.  The fact that TransCelerate launched without a clear relationship with CROs and database management vendors is a troubling sign that it isn’t poised to make a true contribution to this area.

In a worst-case scenario, TransCelerate may actually delay adoption of risk-based monitoring among its member companies, as they may decide to hold off on implementation until standards have been drafted, circulated, vetted, re-drafted, and (presumably, eventually) approved by all 10 companies. And it will probably turn out that the approaches used will need to vary by patient risk and therapeutic area anyway, making a common, generic approach less than useful.

Finally, the notion that monitoring approaches require some kind of industry-wide “standardization” is extremely debatable. Normally, we work to standardize processes when we run into a lot of practical interoperability issues – that’s why we all have the same electric outlets in our homes, but not necessarily the same AC adaptors for our small devices.  It would be nice if all cell phone manufacturers could agree on a common standard plug, but the total savings from that standard would be small compared to the costs of defining and implementing it.  It’s the same with monitoring: each sponsor and each CRO has a slightly different flavor of monitoring, but the costs of adapting to any one approach for any given trial are really quite small.

Risk-based monitoring is great. If TransCelerate gets some of the credit for its eventual adoption, that’s fine, but I think the adoption is happening anyway, and TransCelerate may not be much help in reality.

4. Development of clinical data standards

This is by far the most baffling inclusion in this list. What happened to CDISC? What is CDISC not doing right that TransCelerate could possibly improve?

In an interview with Matthew Herper at Forbes, TransCelerate’s Interim CEO expands a bit on this point:
“Why do some [companies] record that male is a 0 and female is a 1, and others use 1 and 0, and others use M and F. Where is there any competitive advantage to doing that?” says Neil. “We do 38% of the clinical trials but 70% of the [spending on them]. If we were to come together and try to define some of these standards it would be an enabler for efficiencies for everyone.”
It’s really worth noting that the first part of that quote has nothing to do with the second part. If I could wave a magic wand and instantly standardize all companies’ gender reporting, I would not have reduced clinical trial expenditures by 0.01%. Even if we extend this to lots of other data elements, we’re still not talking about a significant source of costs or time.

Here’s another way of looking at it: those companies that are conducting the other 62% of trials but are only responsible for 30% of the spending – how did they do it, since they certainly haven’t gotten together to agree on a standard format for gender coding?
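To underline how small this problem is: harmonizing a coded field across sponsors is nothing more than a per-source lookup table, a one-time exercise with near-zero cost. A sketch (the sponsor names and their codings are invented; note that the same raw value can legitimately mean different things at different sponsors, which is exactly the situation Neil describes):

```python
# Hypothetical per-sponsor code maps. Harmonizing "male/female" coding across
# sources is a trivial lookup, not a meaningful driver of trial cost.
SPONSOR_MAPS = {
    "sponsor_a": {"0": "M", "1": "F"},   # male = 0, female = 1
    "sponsor_b": {"1": "M", "0": "F"},   # male = 1, female = 0
    "sponsor_c": {"M": "M", "F": "F"},   # already uses letters
}


def normalize_gender(sponsor, raw):
    """Map a sponsor-specific gender code to a canonical 'M'/'F'."""
    return SPONSOR_MAPS[sponsor][str(raw)]
```

If the whole harmonization effort reduces to tables like this, it is hard to see where the promised "efficiencies for everyone" come from.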

But the main problem here is that TransCelerate is encroaching on the work of a respected, popular, and useful initiative – CDISC – without clearly explaining how it will complement and assist that initiative. Neil’s quote almost seems to suggest that he plans on supplanting CDISC altogether.  I don’t think that was the intent, but there’s no rational reason to expect TransCelerate to offer substantive improvement in this area, either.

5. Establishment of a comparator drug supply model

This is an area that I don’t have much direct experience in, so it’s difficult to estimate what impact TransCelerate will have. I can say, anecdotally, that over the past 10 years, exactly zero clinical trials I’ve been involved with have had significant issues with comparator drug supply. But, admittedly, that’s quite possibly a very unrepresentative sample of pharmaceutical clinical trials.

I would certainly be curious to hear some opinions about this project. I assume it’s a somewhat larger problem in Europe than in the US, given both their multiple jurisdictions and their stronger aversion to placebo control. I really can’t imagine that inefficiencies in acquiring comparator drugs (most of which are generic, and so not directly produced by TransCelerate’s members) represent a major opportunity to save time and money.

Conclusion

It’s important to note that everything above is based on very limited information at this point. The transcelerate.com website is still “under construction”, so I am only reacting to the press release and accompanying quotes. However, it is difficult to imagine at this point that TransCelerate’s current agenda will have more than an extremely modest impact on current clinical trials.  At best, it appears that it may identify some areas to cut some costs, though this is mostly through the adoption of risk-based monitoring, which should happen whether TransCelerate exists or not.

I’ll remain a fan of TransCelerate, and will follow its progress with great interest in the hopes that it outperforms my expectations. However, it would do us all well to recognize that TransCelerate probably isn’t going to change things very dramatically -- the many systemic problems that add to the time and cost of clinical trials today will still be with us, and we need to continue to work hard to find better paths forward.

[Update 10-Oct-2012: Wayne Kubick, the CTO of CDISC, has posted a response with some additional details on the cooperation between TransCelerate and CDISC regarding point 4 above.]

Mayday! Mayday! Photo credit: "Wheatley Maypole Dance 2008" from Flickr user net_efekt.

Monday, August 13, 2012

Most* Clinical Trials Are Too** Small

* for some value of "most"
** for some value of "too"


[Note: this is a companion to a previous post, Clouding the Debate on Clinical Trials: Pediatric Edition.]

Are many current clinical trials underpowered? That is, will they not enroll enough patients to adequately answer the research question they were designed to answer? Are we wasting time and money – and even worse, the time and effort of researchers and patient-volunteers – by conducting research that is essentially doomed to produce clinically useless results?

That is the alarming upshot of the coverage on a recent study published in the Journal of the American Medical Association. This Duke Medicine News article was the most damning in its denunciation of the current state of clinical research:
Duke: Mega-Trial experts concerned
that not enough trials are mega-trials
Large-Scale Analysis Finds Majority of Clinical Trials Don't Provide Meaningful Evidence

The largest comprehensive analysis of ClinicalTrials.gov finds that clinical trials are falling short of producing high-quality evidence needed to guide medical decision-making.
The study was also covered in many industry publications, as well as the mainstream news. Those stories were less sweeping in their indictment of the "clinical trial enterprise", but carried the same main theme: that an "analysis" had determined that most current clinical trials were "too small".

I have only one quibble with this coverage: the study in question didn’t demonstrate any of these points. At all.

The study is a simple listing of gross characteristics of interventional trials registered over a 6-year period. It is purely descriptive, limited to data entered by the trial sponsor as part of the registration on ClinicalTrials.gov. It contains no information on the quality of the trials themselves.

That last part can’t be emphasized enough: the study contains no quality benchmarks. No analysis of trial design. No benchmarking of the completeness or accuracy of the data collected. No assessment of the clinical utility of the evidence produced. Nothing like that at all.

So, the question that nags at me is: how did we get from A to B? How did this mildly-interesting-and-entirely-descriptive data listing transform into a wholesale (and entirely inaccurate) denunciation of clinical research?

For starters, the JAMA authors divide registered trials into 3 enrollment groups: 1-100, 101-1000, and >1000. I suppose this is fine, although it should be noted that it is entirely arbitrary – there is no particular reason to divide things up this way, except perhaps a fondness for neat round numbers.

Trials within the first group are then labeled "small". No effort is made to explain why 100 patients represents a clinically important break point, but the authors feel confident to conclude that clinical research is "dominated by small clinical trials", because 62% of registered trials fit into this newly-invented category. From there, all you need is a completely vague yet ominous quote from the lead author. As US News put it:
The new report says 62 percent of the trials from 2007-2010 were small, with 100 or fewer participants. Only 4 percent had more than 1,000 participants.

"There are 330 new clinical trials being registered every week, and a number of them are very small and probably not as high quality as they could be," [lead author Dr Robert] Califf said.
"Probably not as high quality as they could be", while just vague enough to be unfalsifiable, is also not at all a consequence of the data as reported. So, through a chain of arbitrary decisions and innuendo, "less than 100" becomes "small" becomes "too small" becomes "of low quality".

Califf’s institution, Duke, appears to be particularly guilty of driving this evidence-free overinterpretation of the data, as seen in the sensationalistic headline and lede quoted above. However, it’s clear that Califf himself is blurring the distinction between what his study showed and what it didn’t:
"Analysis of the entire portfolio will enable the many entities in the clinical trials enterprise to examine their practices in comparison with others," says Califf. "For example, 96 percent of clinical trials have ≤1000 participants, and 62 percent have ≤ 100. While there are many excellent small clinical trials, these studies will not be able to inform patients, doctors, and consumers about the choices they must make to prevent and treat disease."
Maybe he’s right that these small studies will not be able to inform patients and doctors, but his study has provided absolutely no support for that statement.

When we build a protocol, there are actually only 3 major factors that go into determining how many patients we want to enroll:
  1. How big a difference we estimate the intervention will have compared to a control (the effect size)
  2. How much risk we’ll accept that we’ll get a false-positive (alpha) or false-negative (beta) result
  3. Occasionally, whether we need to add participants to better characterize safety and tolerability (as is frequently, and quite reasonably, requested by FDA and other regulators)
Quantity is not quality: enrolling too many participants in an investigational trial is unethical and a waste of resources. If the numbers determine that we should randomize 80 patients, it would make absolutely no sense to randomize 21 more so that the trial is no longer "too small". Those 21 participants could be enrolled in another trial, to answer another worthwhile question.
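Those three factors map directly onto the standard sample-size calculation. As a sketch, here is the usual normal-approximation formula for a two-arm trial with a binary endpoint (responder yes/no); the specific response rates below are illustrative, not drawn from any real trial:

```python
# Sample size per arm for a two-arm trial with a binary endpoint, using the
# standard normal approximation. Inputs are exactly the three factors above:
# the effect size (p_control vs p_treatment), alpha, and power (1 - beta).
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Patients needed per arm to detect p_treatment vs p_control."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided false-positive risk
    z_beta = z.inv_cdf(power)           # false-negative risk
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance
                / (p_control - p_treatment) ** 2)
```

For example, detecting an improvement from 50% to 65% responders at the conventional alpha of 0.05 and 80% power needs about 167 patients per arm; a smaller expected effect pushes the number up quickly. Nothing in this arithmetic cares whether the answer lands above or below 100.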

So the answer to "how big should a trial be?" is "exactly as big as it needs to be." Taking descriptive statistics and applying normative categories to them is unhelpful, and does not make for better research policy.


ResearchBlogging.org Califf RM, Zarin DA, Kramer JM, Sherman RE, Aberle LH, & Tasneem A (2012). Characteristics of clinical trials registered in ClinicalTrials.gov, 2007-2010. JAMA: The Journal of the American Medical Association, 307(17), 1838-47. PMID: 22550198

Thursday, July 19, 2012

Measuring Quality: Probably Not Easy


I am a bit delayed getting my latest post up.  I am writing up some thoughts on this recent study put out by ARCO, which suggests that the level of quality in clinical trials does not vary significantly across global regions.

The study has gotten some attention through ARCO’s press release (an interesting range of reactions: the PharmaTimes headline declares “Developing countries up to scratch on trial data quality”, while Pharmalot’s headline, “What Problem With Emerging Markets Trial Data?”, betrays perhaps a touch more skepticism).


And it’s a very worthwhile topic: much of the difficulty, unfortunately, revolves around agreeing on what we consider adequate metrics for data quality.  The study only really looks at one metric (query rates), but does an admirable job of trying to view that metric in a number of different ways.  (I wrote about another metric – protocol deviations – in a previous post on the relation of quality to site enrollment performance.)

I have run into some issues parsing the study results, however, and have a question in to the lead author.  I’ll withhold further comment until I hear back and have had a chance to digest a bit more.

Sunday, July 15, 2012

Site Enrollment Performance: A Better View

Pretty much everyone involved in patient recruitment for clinical trials seems to agree that "metrics" are, in some general sense, really really important. The state of the industry, however, is a bit dismal, with very little evidence of effort to communicate data clearly and effectively. Today I’ll focus on the Site Enrollment histogram, a tried-but-not-very-true standby in every trial.

Consider this graphic, showing enrolled patients at each site. It came through on a weekly "Site Newsletter" for a trial I was working on:



I chose this histogram not because it’s particularly bad, but because it’s supremely typical. Don’t get me wrong ... it’s really bad, but the important thing here is that it looks pretty much exactly like every site enrollment histogram in every study I’ve ever worked on.

This is a wasted opportunity. Whether we look at per-site enrollment with internal teams to develop enrollment support plans, or share this data with our sites to inform and motivate them, a good chart is one of the best tools we have. To illustrate this, let’s look at a few examples of better ways to look at the data.

If you really must do a static site histogram, make it as clear and meaningful as possible. 

This chart improves on the standard histogram in a few important ways:


Stateful histo - click to enlarge

  1.  It looks better. This is not a minor point when part of our work is to engage sites and make them feel like they are part of something important. Actually, this graph is made clearer and more appealing mostly by the removal of useless attributes (extraneous whitespace, background colors, and unhelpful labels).
  2. It adds patient disposition information. Many graphs – like the one at the beginning of this post – are vague about who is being counted. Does "enrolled" include patients currently being screened, or just those randomized? Interpretations will vary from reader to reader. Instead, this chart makes patient status an explicit variable, without adding to the complexity of the presentation. It also provides a bit of information about recent performance, by showing patients who have been consented but not yet fully screened.
  3. It ranks sites by their total contribution to the study, not by the letters in the investigator’s name. And that is one of the main reasons we like to share this information with our sites in the first place.
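The data preparation behind points 2 and 3 is straightforward. A sketch (site names and counts are invented for illustration): track each site's patients by disposition, then rank by total contribution rather than alphabetically.

```python
# Hypothetical per-site disposition counts. Ranking by total contribution
# (rather than investigator name) is what makes the chart motivating to sites.
sites = {
    "Dr. Adams": {"randomized": 2, "screening": 1, "consented": 0},
    "Dr. Baker": {"randomized": 7, "screening": 0, "consented": 2},
    "Dr. Chen":  {"randomized": 4, "screening": 3, "consented": 1},
}


def ranked_sites(site_counts):
    """Site names ordered by total patients contributed, highest first."""
    return sorted(site_counts,
                  key=lambda s: sum(site_counts[s].values()),
                  reverse=True)
```

Feeding this ordering, with the disposition categories as stacked segments, into any charting tool reproduces the improved histogram above.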
Find Opportunities for Alternate Visualizations
 
There are many other ways in which essentially the same data can be re-sliced or restructured to underscore particular trends or messages. Here are two that I look at frequently, and often find worth sharing.

Then versus Now

Tornado chart - click to enlarge

This tornado chart is an excellent way of showing site-level enrollment trajectory, with each site’s prior (left) and subsequent (right) contributions separated out. This example spotlights activity over the past month, but for slower trials a larger timescale may be more appropriate. Also, how the data is sorted can be critical in the communication: this could have been ranked by total enrollment, but instead sorts first on most-recent screening, clearly showing who’s picked up, who’s dropped off, and who’s remained constant (both good and bad).

This is especially useful when looking at a major event (e.g., pre/post protocol amendment), or where enrollment is expected to have natural fluctuations (e.g., in seasonal conditions).

Net Patient Contribution

In many trials, site activation occurs in a more or less "rolling" fashion, with many sites not starting until later in the enrollment period. This makes simple enrollment histograms downright misleading, as they fail to differentiate sites by the length of time they’ve actually been able to enroll. Reporting enrollment rates (patients per site per month) is one straightforward way of compensating for this, but it has the unfortunate effect of showing extreme (and, most importantly, non-predictive) variance for sites that have not been enrolling for very long.

As a result, I prefer to measure each site in terms of its net contribution to enrollment, compared to what it was expected to do over the time it was open:
Net pt contribution - click to enlarge

To clarify this, consider an example: A study expects sites to screen 1 patient per month. Both Site A and Site B have failed to screen a single patient so far, but Site A has been active for 6 months, whereas Site B has only been active 1 month.

On an enrollment histogram, both sites would show up as tied at 0. However, Site A’s 0 is a lot more problematic – and predictive of future performance – than Site B’s 0. If I instead compare each site to the benchmark, I can show how many total screenings it is below the study’s expectation: Site A is at -6, and Site B is only at -1, a much clearer representation of current performance.
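The computation is as simple as it sounds; a sketch (function name and benchmark rate are illustrative):

```python
# Net contribution: actual screens minus what the study's benchmark rate
# says the site should have produced over the months it has been active.
def net_contribution(screened, months_active, expected_rate=1.0):
    """Screens above (positive) or below (negative) the study benchmark."""
    return screened - expected_rate * months_active
```

With the benchmark of 1 screen per month, Site A (0 screens over 6 months) scores -6 while Site B (0 screens over 1 month) scores -1, separating the two zeros that a plain histogram conflates.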

This graphic has the added advantage of showing how the study as a whole is doing. Comparing the total volume of positive to negative bars gives the viewer an immediate visceral sense of whether the study is above or below expectations.

The above are just 3 examples – there is a lot more that can be done with this data. What is most important is that we first stop and think about what we’re trying to communicate, and then design clear, informative, and attractive graphics to help us do that.

Friday, July 6, 2012

A placebo control is not a placebo effect

Following up on yesterday's post regarding a study of placebo-related information, it seems worthwhile to pause and expand on the difference between placebo controls and placebo effects.

The very first sentence of the study paper reflects a common, and rather muddled, belief about placebo-controlled trials:
Placebo groups are used in trials to control for placebo effects, i.e. those changes in a person's health status that result from the meaning and hope the person attributes to a procedure or event in a health care setting.
The best I can say about the above sentence is that in some (not all) trials, this accounts for some (not all) of the rationale for including a placebo group in the study design. 

There is no evidence that “meaning and hope” have any impact on HbA1C levels in patients with diabetes. The placebo effect only goes so far, and certainly doesn’t have much sway over most lab tests.  And yet we still conduct placebo-controlled trials in diabetes, and rightly so. 

To clarify, it may be helpful to break this into two parts:
  1. Most trials need a “No Treatment” arm. 
  2. Most “No Treatment” arms should be double-blind, which requires use of a placebo.
Let’s take these in order.

We need a “No Treatment” arm:
  • Where the natural progression of the disease is variable (e.g., many psychological disorders, such as depression, have ups and downs that are unrelated to treatment).  This is important if we want to measure the proportion of responders – for example, what percentage of diabetes patients got their HbA1C levels below 6.5% on a particular regimen.  We know that some patients will hit that target even without additional intervention, but we won’t know how many unless we include a control group.
  • Where the disease is self-limiting.  Given time, many conditions – the flu, allergies, etc. – tend to go away on their own.  Therefore, even an ineffective medication will look like it’s doing something if we simply test it on its own.  We need a control group to measure whether the investigational medication is actually speeding up the time to cure.
  • When we are testing the combination of an investigational medication with one or more existing therapies. We have a general sense of how well metformin will work in T2D patients, but the effect will vary from trial to trial.  So if I want to see how well my experimental therapy works when added to metformin, I’ll need a metformin-plus-placebo control arm to be able to measure the additional benefit, if any.

All of the above are especially important when the trial is selecting a group of patients with greater disease severity than average.  The process of “enriching” a trial by excluding patients with mild disease has the benefit of requiring many fewer enrolled patients to demonstrate a clinical effect.  However, it also will have a stronger tendency to exhibit “regression to the mean” for a number of patients, who will exhibit a greater than average improvement during the course of the trial.  A control group accurately measures this regression and helps us measure the true effect size.
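Regression to the mean under enrichment is easy to see in a quick simulation (all numbers here are invented for illustration): give each simulated patient a stable true severity plus visit-to-visit noise, enroll only those who look severe at baseline, and the enrolled group "improves" at follow-up with no treatment at all.

```python
# Sketch: regression to the mean in an "enriched" trial population.
# Higher score = more severe disease; no treatment is applied, yet the
# enrolled group's mean score drops at follow-up. Parameters are illustrative.
import random


def simulate_regression(n=10_000, cutoff=60.0):
    random.seed(42)  # reproducible illustration
    # Stable underlying severity, plus independent measurement noise per visit.
    true_scores = [random.gauss(50, 10) for _ in range(n)]
    baseline = [t + random.gauss(0, 10) for t in true_scores]
    followup = [t + random.gauss(0, 10) for t in true_scores]
    # Enrichment: enroll only patients who look severe at baseline.
    enrolled = [i for i in range(n) if baseline[i] >= cutoff]
    mean_base = sum(baseline[i] for i in enrolled) / len(enrolled)
    mean_follow = sum(followup[i] for i in enrolled) / len(enrolled)
    return mean_base, mean_follow
```

The follow-up mean lands well below the baseline mean purely because the noisy high scorers were selected, which is exactly the apparent "improvement" a control group lets us subtract out.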

So, why include a placebo?  Why not just have a control group of patients receiving no additional treatment?  There are compelling reasons:
  • To minimize bias in investigator assessments.  We most often think about placebo arms in relation to patient expectations, but often they are even more valuable in improving the accuracy of physician assessments.  Like all humans, physician investigators interpret evidence in light of their beliefs, and there is substantial evidence that unblinded assessments exaggerate treatment effects – we need the placebo to help maintain investigator blinding.
  • To improve patient compliance in the control arm.  If a patient is clearly not receiving an active treatment, it is often very difficult to keep him or her interested and engaged with the trial, especially if the trial requires frequent clinic visits and non-standard procedures (such as blood draws).  Retention in no-treatment trials can be much lower than in placebo-controlled trials, and if it drops low enough, the validity of any results can be thrown into question.
  • To accurately gauge adverse events.  Any problem(s) encountered are much more likely to be taken seriously – by both the patient and the investigator – if there is genuine uncertainty about whether the patient is on active treatment.  This leads to much more accurate and reliable reporting of adverse events.
In other words, even if the placebo effect didn’t exist, it would still be necessary and proper to conduct placebo-controlled trials.  The failure to separate “placebo control” from “placebo effect” yields some very muddled thinking (which was the ultimate point of my post yesterday).