J Gastrointest Surg. 2010 January; 14(1): 156–65.
Published online 2009 October 14. doi: 10.1007/s11605-009-1052-y.

Author keywords: quality of life, randomized controlled trial, surgery

MESH keywords: Digestive System Surgical Procedures, Health Status, Humans, Quality of Life, Randomized Controlled Trials as Topic, standards

This century we have witnessed significant progress in the diagnosis and treatment of disease. The effects of disease and its treatment on patients have traditionally been assessed in terms of pain scores, duration of hospital stay, and return to normal activities. These outcomes, however, depend heavily on external factors such as local habits and social security arrangements. Thus, quality-of-life instruments, which measure recovery in a patient-centered manner, have become more popular in recent times and are increasingly accepted as a solid primary outcome measure in scientific studies [1].

At present, there is no single definition of health-related quality of life (HRQOL). Nevertheless, there is broad consensus that it refers to the physical, psychological, and social functioning of patients and to the impact of disease and treatment on their abilities and daily functioning [2–5].

There are several valid measures of HRQOL that are suitable for use in surgical research. Generic measures (such as the Short Form health survey, SF-36 [6]) broadly assess physical, mental, and social health and can be used to compare conditions and treatments. Measures specific to illnesses (such as the Gastro-Intestinal Quality of Life Index, GIQLI [7]) can supplement generic measures or can be used independently [8].

Although these instruments are widely available, they must be applied carefully in clinical studies to produce reliable and clinically useful results. The requirements range from selecting the most appropriate instrument for the particular trial objective to handling missing data and interpreting outcomes accurately [9–11]. Unless standards for measuring HRQOL are adhered to in clinical trials, the data that are collected will be difficult to interpret and unlikely to make clinical sense [11].

Previous reviews of randomised controlled trials (RCTs) that included an HRQOL evaluation in oncology have shown a number of methodological shortcomings [12–19].

However, to date, no detailed systematic methodological review of the conduct and reporting of QOL results from RCTs in gastrointestinal surgery has appeared. Therefore, the aim of this study was to evaluate the methodological quality of HRQOL assessment in RCTs of gastrointestinal surgery and to determine how improvements can be made.

Because RCTs are considered the optimal study design for evaluating the effects of different surgical interventions [20], we limited our search to « randomized controlled trials » and to recent articles published between January 2006 and December 2007.

Twelve journals were chosen for review: eight English-language surgical journals (American Journal of Surgery, Annals of Surgery, Archives of Surgery, Journal of the American College of Surgeons, Surgery; British Journal of Surgery, European Journal of Surgical Oncology) and four English-language medical journals (New England Journal of Medicine, Lancet, British Medical Journal, and Journal of the American Medical Association).

To identify eligible articles, all issues of these journals were hand-searched.

Studies included for review had to be phase III randomised controlled trials in gastrointestinal surgery published between 01/01/2006 and 31/12/2007.

All RCTs comparing different treatments were eligible, regardless of the intervention type. No restrictions were placed on trial location, number of patients enrolled, treatment modalities, or trial sponsor.

The exclusion criteria were: (a) trials published as a letter, abstract, or short article; (b) randomised phase II trials; and (c) non-experimental (observational) studies.

The search was restricted to RCTs as they represent the gold standard by which health care professionals make decisions about treatment effectiveness [20].

Characteristics assessed
Two reviewers (V.B. and JJ.T.), who were not involved in any of the identified studies, analysed the identified RCTs independently. Any disagreement was resolved through discussion between the two reviewers.

As quality of life (QOL) was the main outcome measure sought, any study that assessed quality of life as an end point or drew a conclusion about quality of life was considered.

The standardised protocol was based on a checklist (available from the authors). The items to be included were: country of origin, industry funded (yes versus no), number of patients randomised, multicenter studies (yes versus no), informed consent reported (yes versus no), approval of a research ethics committee reported (yes versus no) and Health Related Quality of Life (HRQOL) difference between treatment arms (yes versus no). The latter was defined as any statistical difference between treatment arms at any given time point assessment during the trial (even if this only occurred in one HRQOL domain).

The selected articles were evaluated for trial quality and quality of reporting on HRQOL.

Trial quality was evaluated with the Jadad scale [21]. The maximum possible score was 13 points using an 11-item instrument. Trial quality was considered good when the score was more than 9 points and poor when the score was equal to or less than 9 points. The items of the Jadad scale related directly to the control of bias are:

  • Was the study designed as randomised?
  • Was the study designed as double blind?
  • Was there a description of withdrawals and drop outs?

Other markers not related directly to the control of bias:

  • Were the objectives of the study defined?
  • Were the outcome measures defined clearly?
  • Was there a clear description of the inclusion and exclusion criteria?
  • Was the sample size justified (for example, power calculation)?
  • Was there a clear description of the interventions?
  • Was there at least one control (comparison) group?
  • Was the method used to assess adverse effects described?
  • Were the methods of statistical analysis described?

Items are scored as follows:

  • Give a score of either 1 point for each « yes » or 0 points for each « no ». There are no in-between marks.
  • Give 1 additional point if, for question 1, the method to generate the sequence of randomisation was described and was appropriate (table of random numbers, computer generated, etc.) and/or if, for question 2, the method of double blinding was described and was appropriate (identical placebo, active placebo, dummy, etc.).
  • Deduct 1 point if, for question 1, the method to generate the sequence of randomisation was described and was inappropriate (patients were allocated alternately, or according to date of birth, hospital number, etc.) and/or if, for question 2, the study was described as double blind but the method of blinding was inappropriate (for example, comparison of tablet versus injection with no double dummy).
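The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tool used by the reviewers: the function and parameter names are hypothetical, and the sketch assumes that the bonus or deduction for questions 1 and 2 applies only when the corresponding question was itself answered « yes ».

```python
from typing import Optional, Sequence

N_ITEMS = 11  # three bias-control items followed by eight other markers


def jadad_score(
    answers: Sequence[bool],
    randomisation_method: Optional[str] = None,
    blinding_method: Optional[str] = None,
) -> int:
    """Score one trial on the 11-item scale described above.

    answers: yes/no for the 11 questions, in the order listed
             (index 0 = randomised, index 1 = double blind).
    randomisation_method / blinding_method: "appropriate",
             "inappropriate", or None when the method is not described.
    """
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")

    # 1 point for each "yes", 0 points for each "no".
    score = sum(1 for a in answers if a)

    # Assumption: the bonus or deduction only applies when the
    # corresponding question (1 or 2) was itself answered "yes".
    for answered_yes, method in (
        (answers[0], randomisation_method),
        (answers[1], blinding_method),
    ):
        if not answered_yes:
            continue
        if method == "appropriate":
            score += 1
        elif method == "inappropriate":
            score -= 1
    return score


def trial_quality(score: int) -> str:
    """Good when the score is more than 9 points, poor otherwise."""
    return "good" if score > 9 else "poor"
```

Under these assumptions, a trial answering « yes » to all 11 questions with appropriate randomisation and blinding methods reaches the maximum of 13 points, which is consistent with the maximum stated above.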

The criteria used to evaluate quality of reporting on HRQOL were based on those proposed by Efficace et al. [11] (table 1).

This 11-item checklist was developed on the basis of good practice in conducting an HRQOL evaluation, and it was specifically aimed at evaluating the reported quality of the HRQOL assessment methodology in a clinical trial setting. The checklist items were devised to have a dichotomous answer: each can be scored as « yes » (giving a score of 1) or « no » (giving a score of 0). The higher the score, the more robust the outcomes are considered to be. This checklist addresses the basic and essential issues that a given trial should report to have methodologically sound outcomes.

The original checklist also included whether the measure covered, at least, the main HRQOL dimensions relevant for a generic cancer population. This criterion has been built into this review automatically.

Studies scoring at least seven on this checklist, including three mandatory items (i.e. baseline compliance, missing data and psychometric properties reported), could be considered « probably robust ». Hence, all studies were classified as « probably robust » (as defined above), « limited » (scoring higher than three but either lower than seven or not including all three mandatory items), or « very limited » (all other studies, i.e. scoring three or lower on the checklist).
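The classification rule can be written out as a short decision function. The following is a minimal sketch, assuming each checklist item is recorded as a yes/no value; the item keys (`baseline_compliance`, `missing_data`, `psychometric_properties`) and the function name are hypothetical labels chosen for illustration, not identifiers from the original checklist.

```python
from typing import Mapping

# Hypothetical keys for the three mandatory checklist items.
MANDATORY = ("baseline_compliance", "missing_data", "psychometric_properties")


def classify_hrqol_reporting(items: Mapping[str, bool]) -> str:
    """Classify a trial from its yes/no checklist answers."""
    # Each "yes" scores 1 and each "no" scores 0.
    score = sum(1 for answered_yes in items.values() if answered_yes)
    has_all_mandatory = all(items.get(key, False) for key in MANDATORY)

    if score >= 7 and has_all_mandatory:
        return "probably robust"
    if score > 3:
        # Either fewer than seven points or a mandatory item is missing.
        return "limited"
    return "very limited"
```

Note that a trial scoring seven or more without all three mandatory items falls into the « limited » group, as the definition above requires.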

When an article provided explicit reference to a related paper reporting additional data, this was retrieved as well. When more than one paper reported HRQOL data of the same trial, information was pooled to be reported in the tables.

Ethical aspects
In accordance with French regulations, this study was exempted from IRB approval.

The appendix lists all articles reviewed.

According to the eligibility criteria, a total of 26 citations were identified that included HRQOL outcomes in 24 randomized clinical trials. Three other studies were also retrieved but excluded from the trial analyses (with the consensus of all authors). One of these met our criteria but did not report any details about the methodology used to assess HRQOL, and the remaining two were excluded because it was impossible to verify the HRQOL measure used.

The studies were conducted across a variety of countries: 16 (66.6%) in European countries, two (8.3%) in the USA, two (8.3%) in Asia, one (4.2%) in Australia, and one (4.2%) in Burkina Faso; two (8.3%) were conducted in an international setting.

Eight (33.3%) of the 24 studies were industry sponsored, as identified by author affiliation with a company or by a statement regarding commercial funding.

Half of the trials were multicenter studies.

The number of patients enrolled into the trials varied considerably, ranging from 27 to 700 patients, with a total of 3476 patients.

A total of 23 trial reports (95.8%) stated that a research ethics committee had approved the research and that informed consent had been requested from the participants.

HRQOL assessment
The 24 RCTs identified were classified according to the predefined checklist. One (4%) was considered very limited in terms of methodological design according to the previously defined criteria. Twelve trials (50%) were considered limited, while eleven (45.8%) were evaluated as probably robust. The overall level of reporting is provided in table 3.

Fourteen distinct QOL questionnaires (5 generic, 9 specific) were used in the 24 studies analyzed. The most frequently used instruments (alone or in conjunction with other tools) were the Short Form 36-item questionnaire, used in 13 trials, and the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire C30, used in four trials.

In two studies SF-36 was administered in a modified version, attempting to give a more comprehensive HRQOL assessment but altering the psychometric properties of the original tool.

Of the 24 reports, 11 (45%) described the questionnaires, i.e. the number of dimensions and their contents, such as the number of items per dimension and the minimal and maximal scores. Only the name of the questionnaire was given in the 13 others.

QOL was a primary endpoint in only six (25%) studies and a secondary endpoint in 75% (18/24) of the analyzed reports. In two of them, trial results were published in several different articles, one reporting the QOL findings and the other(s) the clinical outcomes. Only five (20.8%) studies reported the use of power calculations for HRQOL aspects.

Only ten studies (41.6%) reported a priori hypotheses and only five studies (20.8%) provided a rationale for selecting an HRQOL measure.

Information about the administration of the HRQOL questionnaire was mentioned in 13 (54.1%) reports.

Three RCTs (12.5%) did not provide the absolute number or the percentage of patients who completed the questionnaire before commencing the trial.

Methods of health-related quality of life analysis and results
The response rate for quality of life end points was given in 19 of the studies, with response rates ranging from 14.2% to 100%.

Nine RCTs (37.5%) did not provide any details about HRQOL missing data during the course of the trial. Furthermore, in those trials where an indication of HRQOL missing data was provided, only one trial undertook a detailed statistical exploration of the biases due to missing data. The remaining studies did not investigate this issue.

Reporting the level of missing data and the reasons why data are missing (i.e., at random or systematically) is critical to understanding any possible source of bias in determining HRQOL significance.

However, all studies provided details about the timing of HRQOL assessment, and 14 (58.3%) discussed the HRQOL outcomes to some extent in the paper.

All RCTs, with the exception of one, applied a statistical test for determining a HRQOL difference between treatment arms.

Of the 23 eligible studies, 11 (47.8%) found some significant difference on HRQOL scales between arms.

Obtaining a statistical difference in terms of HRQOL outcome does not necessarily imply a clinically meaningful difference from a patient’s perspective (24). However, only six studies (25%) discussed the related HRQOL outcomes in terms of clinical significance from a patient’s perspective. This issue is closely related to the difficulty in interpreting the HRQOL data from a given measure.

Inadequate reporting of randomised controlled trials is common and hampers the appraisal of the validity and generalisability of results [22, 23]. To overcome such problems, the Consolidated Standards of Reporting Trials (CONSORT) Group developed the CONSORT statement [24] in 1996, which was followed by a revised version in 2001 [25].

This may explain the high mean Jadad score found in our study.

The main objective of this article was to evaluate the methodological quality of RCTs with a HRQOL component in gastrointestinal surgery.

Using the stated selection and eligibility criteria, we found 24 RCTs with HRQOL assessment which included some 3476 patients. HRQOL was a secondary end point in most trials (75%).

The aim of our study was not to compare results with other medical or surgical specialities but to describe how digestive surgeons use HRQOL in their trials and to identify how this use can be improved. However, compared with HRQOL studies in other disease sites and treatments [12–19, 26, 27] (table 4), the overall quality of the reported trials is good, although there are a number of shortcomings with regard to the reporting of the HRQOL design and results.

Among the studies reviewed, there were generally few details about the rationale for selecting a specific measure and about instrument administration.

A justification for selecting the HRQOL measure was given in only five studies (20.8%). In other reviews, this justification was given in 9.7 to 90% of the trials (table 4). This point needs to be improved in surgery as well as in other specialities.

This is regarded as important because instrument selection is critical for obtaining reliable, valid, and reproducible results.

It is also important to standardize the instructions and completion procedures of assessment material administered to patients, particularly in RCTs, because often many researchers and institutions are involved in collecting HRQOL data. Thus, standard procedures help to ensure adequate data quality and minimize any possible bias in data collection. (For example, will patients answer the questionnaire at a clinic visit, by telephone, by mail? Is it to be done by the investigator, his colleagues, or a research nurse?)

Our findings were in accordance with previous studies (table 4), in which instrument administration was reported in 0 to 66.6% of trials.

A major methodological drawback was the lack of an a priori hypothesis about possible HRQOL changes before commencing the trial. Only ten RCTs (41.6%) explicitly stated an a priori hypothesis, a step that limits spurious HRQOL results due to multiple significance testing. This result was similar to, or even better than, that of most previous studies, in which an a priori hypothesis was stated in 13 to 72.7% of the trials (table 4).

A key consideration for future studies is the selection of a limited number of HRQOL indicators before commencing the trial, possibly basing this selection on previous related trials, or on a specific a priori research hypothesis about the impact of a given therapy.

One major issue is the reporting of compliance at baseline and the documentation of missing data. Three RCTs (12.5%) did not provide the absolute number or the percentage of patients who completed the questionnaire before commencing the trial, and nine (37.5%) did not provide any details about HRQOL missing data during the course of the trial.

This result was similar to previous studies, in which missing data were documented in only 48.4 to 74.8% of trials (table 4).

Although the majority of trials started with reasonable sample sizes, many were plagued with problems of patient drop-out. Such attrition often limits the general robustness of the results and reduces confidence in the HRQOL conclusions. Data are generally not missing at random and therefore bias can be introduced [17, 28–30]. The benefit of an intervention may be overestimated by comparison of group means, as only individuals who remain well enough to fill in questionnaires provide data. Failure to report details of missing data is a frequent problem in studies where HRQOL is measured [31], and previous works have already proposed procedures to address this issue [32, 33]. More attention to improving compliance and reporting in future studies would be valuable.

Of the 23 eligible studies, 11 (47.8%) found some significant difference on HRQOL scales between arms. This would indicate that the HRQOL measures are valuable in providing additional data.

However, although HRQOL differences were observed, it is necessary to remember that, whereas many subscales are often used and compared over treatments and time, not all subscales will show a significant difference.

This underlines the need to declare in advance the HRQOL hypotheses and the importance of careful interpretation of multiple repeated statistical analyses.
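To make the concern about multiple testing concrete, the sketch below computes how quickly the chance of at least one spurious « significant » result grows with the number of tests. It is an illustration only and not an analysis performed in this review: it assumes the tests are independent (HRQOL subscales are usually correlated, so this is a simplification), and the Bonferroni threshold is shown as one common correction chosen here for illustration. The function names and the example of 8 subscales at 3 time points are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise
    error rate at or below alpha (Bonferroni correction)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical example: 8 subscales compared at 3 time points = 24 tests.
n_tests = 8 * 3
print(f"Family-wise error rate: {familywise_error_rate(0.05, n_tests):.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(0.05, n_tests):.4f}")
```

With 24 independent tests at the 5% level, the chance of at least one false positive is roughly 71%, which is why a small number of prespecified HRQOL hypotheses is preferable to testing every subscale at every time point.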

A further trap in analysis of HRQOL data is the difference between statistical and clinical significance in changes of scores. It is acknowledged that, although analysis of large samples may reveal small changes that seem to be statistically significant, these changes may not be clinically meaningful to the patient and are, therefore, of limited value to the improvement of patient care.

An effort to determine if such small numerical differences have a clinical meaning from a patient’s perspective has been highlighted as an important aspect for determining the impact of a given treatment [28, 34].

Unfortunately, only six of these studies (26%) examined the clinical significance of apparent differences. It is highly desirable that future studies routinely consider clinical significance to help evaluate the value of HRQOL results.

Furthermore, in several RCTs, the HRQL results were not formally presented, but the main results were described in the text.

It is possible that this occurred because most trials used HRQOL as a secondary end point. In such trials, it is frequently observed that limited space is given to HRQOL data, with priority given to the primary clinical end point.

Two authors have overcome this difficulty by separately reporting clinical and HRQL trial outcomes.

This is an opportunity for adequate explanation and presentation of what may often be complex results.

However, the disadvantage of splitting the HRQL data from the main trial paper is that surgeons are unlikely to read the HRQL paper once the main clinical message of a particular trial has been published. If this occurs, then during the process of clinical decision making the HRQL impacts of treatment may be overlooked [19]. It is therefore recommended that clinical and HRQL outcomes are published together so that clinical decision making is based upon relevant patient-centered endpoints.

Although we identified the methodological limitations reported above, it was impressive that nearly all the studies used valid HRQOL measures and provided details on the timing of HRQOL assessment during the trial.

Eleven trials (45.8%) had a robust HRQL design, and statistically significant differences in HRQL were reported in six of these trials. Only one trial (4%), classified as « very limited », could be invalidated by its lack of rigor in presenting HRQOL data.

Importantly, a strict methodological approach to the assessment of patient-based QOL criteria in the evaluation of therapeutic strategies can help patients and their doctors in medical decision making.

Efficace’s checklist can be considered a minimum standard; however, HRQOL design also greatly depends on the context and the specific research question of the trial. Hence, good reports may have different emphases, and some issues might have different relevance according to the specific study questions [11].

If HRQL is considered to be a relevant outcome in a clinical trial and if HRQL is assessed robustly, then it will always contribute to clinical decision making regardless of the direction of the outcomes [11, 26]. Only if HRQL assessments are flawed (underpowered study, too many missing data items, invalid questionnaires) may they not contribute to clinical decision making.

Our study has some limitations. We selected randomised controlled trials (RCTs) from 12 journals. This restrictive choice was guided by the recognised quality of the four medical journals selected (leading journals that publish research reports in all fields and have a broad readership) and by the fact that the eight surgical journals comprised a good sample of surgery around the world. The purpose of this choice was to create a homogeneous group of publications and conditions that allowed standardised analysis. This arbitrary choice may have introduced a bias causing overestimation of the quality of the RCTs analysed. In addition, authors of original articles were not contacted for additional data or for clarification of points that were unclear about trial methodology.

We also recognise that our review is limited by its restriction to RCTs, but the checklist developed by Efficace et al. [11] was originally devised only for this type of design. It could be interesting to develop an applicable and useful checklist for non-randomised studies.

Despite these potential limitations, this paper suggests that surgeons are interested in HRQL outcomes and that HRQOL assessment in RCT settings has the potential to provide invaluable data for developing new treatments in gastrointestinal surgery.