Department of Anesthesiology and Critical Care Medicine, Hôpital Européen Georges Pompidou, Université Paris-Descartes Sorbonne Paris Cité, Paris, France

Department of Informatics and Biostatistics, Unité Inserm UMR S717, Hôpital Saint Louis, Université Paris 7 Diderot, Paris, France

Abstract

Background

As a result of reporting bias or fraud, false or misunderstood findings may represent the majority of published research claims. This article proposes simple methods that may help readers appraise the quality of the reporting of randomized, controlled trials (RCTs).

Methods

The evaluation roadmap proposed herein relies on four steps: evaluation of the distribution of the reported variables; evaluation of the reported baseline comparisons between randomized groups; analysis of the distribution of the reported p values; and explicit recomputation of the p values, complemented by a parametric bootstrap.

Results

Despite obvious nonnormal distributions, several variables are presented as if they were normally distributed. The set of 16 reported baseline p values departed from the uniform distribution expected under adequate randomization (p < 10^{-6}), and explicitly recomputed p values disagreed with several of those reported.

Conclusions

Such simple evaluation methods might offer warning signals to readers and reviewers appraising the quality of reported RCTs.

Background

In a world where medicine is supposed to be based on evidence, how the published evidence should be appraised and translated into clinical practice is of crucial importance.

In an ideal world, major scientific journals would ask researchers to provide their raw data to allow external verification of the results.

Methods

We first provide a concise description of the methods previously proposed.

Variable distribution

In many papers, data are reported as if they were normally distributed, whereas normality is usually only assumed by the authors rather than verified.

Hence, it is of interest to analyze the distribution of reported variables.
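A quick way to operationalize this check is to compute how much probability mass a normal distribution with the reported mean and SD would place below zero: for a variable that cannot be negative (a volume, a duration), a substantial mass below zero rules out normality. A minimal Python sketch, using the FFP-during-surgery summary from the illustrative figure (mean 60, SD 210):

```python
from statistics import NormalDist

def prob_negative(mean: float, sd: float) -> float:
    """Probability mass below zero if the variable were truly normal."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)

# FFP volume during surgery is reported with mean 60 and SD 210.
# A normal variable with these parameters would be negative roughly
# 39% of the time, which is impossible for a transfused volume,
# so the underlying distribution cannot be normal.
print(f"P(X < 0) = {prob_negative(60, 210):.3f}")
```

The same check applies to any strictly positive variable reported with a standard deviation much larger than its mean.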

Baseline covariate distribution between groups

Statistical testing should be avoided when evaluating covariate balance, because the usual tests are not designed to accept the null hypothesis. Nevertheless, most published manuscripts report such tests, and analyzing their results can help to detect poor-quality data.

P values distribution

If the randomization was adequately performed, the distribution of baseline characteristics should be balanced between the two randomized groups. Under such a null hypothesis (i.e., the two groups have similar baseline characteristics), the p values of the baseline comparison tests should follow a uniform distribution on [0, 1].
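One way to check this is to compare the empirical distribution of the reported baseline p values with the Uniform(0, 1) distribution, for example via a Kolmogorov-Smirnov distance. A minimal pure-Python sketch; the p values below are hypothetical, and the 5% critical value is the asymptotic approximation, only indicative for small sets:

```python
import math

def ks_uniform(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    p values and the CDF of the Uniform(0, 1) distribution."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps at each sorted point; check both sides.
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d

def ks_crit_05(n):
    """Asymptotic 5% critical value (only indicative for small n)."""
    return 1.358 / math.sqrt(n)

# Hypothetical set of 16 reported baseline p values, clustered near 1
# (suspiciously "too good" balance), unlike a Uniform(0, 1) sample.
reported = [0.91, 0.88, 0.95, 0.79, 0.97, 0.85, 0.93, 0.90,
            0.82, 0.96, 0.87, 0.94, 0.89, 0.92, 0.98, 0.84]
print(ks_uniform(reported) > ks_crit_05(len(reported)))  # True: reject uniformity
```

Note that departures in either direction are suspect: p values clustered near 1 suggest implausibly perfect balance, while clustering near 0 suggests failed randomization.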

Explicit p value computations

Based on the reported summary statistics (means and standard deviations, for instance), one can also explicitly recompute the p values and compare them with those reported.

**Formulas.**

Click here for file
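For a comparison of two means, such a recomputation can be sketched as follows. The summary figures below are hypothetical, and the p value uses the Welch statistic with a normal approximation to its reference distribution (adequate for group sizes of a few dozen or more):

```python
from math import sqrt
from statistics import NormalDist

def recomputed_p(m1, sd1, n1, m2, sd2, n2):
    """Two-sided p value for a difference in means, recomputed from the
    reported means and SDs (Welch statistic, normal approximation)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (m1 - m2) / se
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical reported summaries for one outcome variable:
p = recomputed_p(m1=5.2, sd1=1.1, n1=25, m2=4.4, sd2=1.0, n2=25)
print(f"recomputed p = {p:.4f}")
```

A recomputed p value that is orders of magnitude away from the reported one is a warning signal, not a proof of error: rounding of the reported summaries alone can shift small p values.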

Parametric bootstrap

The parametric bootstrap consists of simulating a large number of datasets from the reported summary statistics, recomputing the test of interest on each simulated dataset, and examining how often the simulated results agree with the reported ones.

It should be emphasized that such an approach does not aim to provide new inference but only to detect potential inconsistencies. Moreover, simulating variables from reported means and standard deviations relies on an assumption concerning the underlying distribution. Because some variables are obviously not normally distributed, it can be of interest to compare simulations under a normal distribution with simulations under alternate distributions. Finally, simulating 10,000 datasets is not the same as “redoing the same clinical trial 10,000 times with the same sample size”: in such simulations, the observed means and standard deviations are treated as the true population parameters, whereas they are in fact estimates computed from a random sample drawn from the underlying population.
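A minimal sketch of such a parametric bootstrap under the normality assumption, with hypothetical summary figures: the reported means and SDs are treated as the true parameters, both groups are redrawn many times, and we count how often the recomputed two-sided test is significant.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def significant_simulations(m1, sd1, n1, m2, sd2, n2,
                            n_sim=10_000, alpha=0.05, seed=42):
    """Count the simulated datasets whose recomputed test has p < alpha,
    treating the reported summaries as the true normal parameters."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        g1 = [rng.gauss(m1, sd1) for _ in range(n1)]
        g2 = [rng.gauss(m2, sd2) for _ in range(n2)]
        # Welch statistic with a normal approximation, as in the
        # explicit-computation step.
        se = sqrt(stdev(g1) ** 2 / n1 + stdev(g2) ** 2 / n2)
        z = (mean(g1) - mean(g2)) / se
        p = 2 * NormalDist().cdf(-abs(z))
        hits += p < alpha
    return hits

# Hypothetical summaries reported as significant: in what fraction of
# redraws from those very parameters would the significance survive?
print(significant_simulations(5.2, 1.1, 25, 4.4, 1.0, 25, n_sim=2000))
```

Counts far below n_sim for a comparison reported as significant (or far above for one reported as nonsignificant) are the kind of inconsistency signal sought in the Results.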

Illustrative example

To illustrate these methods, we selected the data from a randomized study published in 2009.

**Tables S1 and 4 from the illustrative paper.**

Click here for file

We identified 19 tabulated continuous variables supposed to be normally distributed and for which a mean and a standard deviation were reported.

Results

Variables distribution

First, variables such as the durations of anesthesia, cardiopulmonary bypass (CPB), cross-clamping, and intubation were presented using means and standard deviations, whereas such durations are usually not normally distributed (Additional file).

**Illustration of the checking procedure.** **A** Variable distribution. The variable FFP during surgery is described with a mean of 60 and an SD of 210. As shown in the left panel, if this variable were normally distributed, it would exhibit some negative values. Because negative values are impossible for such a variable, its distribution is necessarily asymmetric (right panel: example of a strictly positive variable characterized by a large SD). **B**, **C** Distribution of the simulated p values.

Under the null hypothesis of no intergroup difference after adequate randomization, the set of 16 reported baseline p values should be uniformly distributed on [0, 1]; their observed distribution clearly departed from uniformity.

Explicit computations

Based on the means and standard deviations reported for each randomized group in the Table, we explicitly recomputed several p values, for instance:

Urine output at 5 hours: p on the order of 10^{-6} (comparison reported as nonsignificant by the authors).

PRBC during surgery: the recomputed p value also disagreed with the result reported by the authors.

Parametric bootstrap

We limited the simulations to those variables for which (1) there seemed to be some clinical difference between the groups but the authors reported no statistical difference, (2) the authors reported a statistically significant difference that did not seem clinically relevant, or (3) the normal distribution assumption seemed to be violated. The results of the simulations are given in the Table below.

**Number of simulations (out of 10,000) with a significant p value, under normal and lognormal distributional assumptions**

| Variable | Normal distribution | Lognormal distribution |
| --- | --- | --- |
| **Colloids 5 hr after surgery** | 9,113/10,000 | 8,969/10,000 |
| **Urine output 5 hr after surgery** | 9,995/10,000 | 9,985/10,000 |
| **PRBC volume** |  |  |
| During surgery* | 3,920/10,000 | 5,371/10,000 |
| 5 hr after surgery* | 5,114/10,000 | 6,601/10,000 |
| Until first POD* | 5,533/10,000 | 7,133/10,000 |
| Until second POD* | 4,633/10,000 | 5,742/10,000 |
| **FFP** |  |  |
| During surgery | 879/10,000 | 2,133/10,000 |
| 5 hr after surgery* | 4,377/10,000 | 8,412/10,000 |
| Until first POD* | 5,880/10,000 | 9,446/10,000 |
| Until second POD* | 5,874/10,000 | 9,424/10,000 |

*Statistically significant comparison according to the authors.
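The lognormal simulations in the Table require lognormal parameters matched to each reported mean and SD. A short sketch of that moment-matching step, using the standard lognormal moment formulas (the mean-60/SD-210 pair is the FFP-during-surgery summary from the illustrative figure):

```python
import math
import random
from statistics import mean

def lognormal_params(m, s):
    """mu and sigma of a lognormal distribution whose mean is m and
    whose standard deviation is s (moment matching)."""
    sigma2 = math.log(1 + (s / m) ** 2)
    mu = math.log(m) - sigma2 / 2
    return mu, math.sqrt(sigma2)

# FFP during surgery: mean 60, SD 210. Unlike the normal fit, the
# matched lognormal is strictly positive and heavily right-skewed.
mu, sigma = lognormal_params(60, 210)
rng = random.Random(1)
sample = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
print(f"sample mean ~ {mean(sample):.1f}, all positive: {min(sample) > 0}")
```

Because the matched lognormal concentrates most of its mass well below the mean, redrawing groups from it changes how often simulated comparisons reach significance, which is why the two columns of the Table differ.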

Discussion

We proposed a critical appraisal of the results of randomized controlled trials based on a multistep procedure (Figure).

Conversely, the absence of such warning signals does not exclude the presence of bias. Moreover, a careful analysis might still fail to address other sources of bias, such as selective reporting (e.g., reporting only the observations that reached significance) and publication bias (e.g., selective, faster, and more prominent publication of research findings that produced unusual, very strong, highly significant results).

To illustrate our four-step procedure, we applied it to the data reported by Boldt et al. in an article that was recently retracted for fraud.

This series of analyses has some limitations. First, these analyses alone may not discriminate between low- and high-quality data, because a variety of sources of bias cannot be explored. However, in the context of medical diagnosis, we usually contrast screening with diagnostic tools: a diagnostic tool must be very specific, whereas a screening tool must above all be sensitive. The objective is to offer a strategy based on a sensitive screening step, to be followed by more specific confirmatory checks.

Conclusions

There is increasing concern that, in modern research, false findings may represent the majority of published research claims.

Key messages

– Poor-quality evidence and fraud are growing concerns in medical research

– Reporting guidelines should be strictly followed and imposed by medical journals

– Guidelines should be provided to reviewers to offer homogeneous evaluation of some reporting key points

– Simple screening tools such as those presented here should be available to readers.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

RP conceived and performed the analysis and wrote the manuscript; MRR and SC participated in writing the manuscript; DJ participated in the study design and in writing the manuscript. All authors read and approved the final manuscript.