Institut Curie, Ecole des Mines de Paris, INSERM U900, Paris, FRANCE

INSERM U717, Hôpital Saint-Louis, Paris, FRANCE

Abstract

Background

Log-linear association models have been extensively used to investigate the pattern of agreement between ordinal ratings. In 2007, log-linear non-uniform association models were introduced to estimate, from a cross-classification of two independent raters using an ordinal scale, varying degrees of distinguishability between distant and adjacent categories of the scale.

Methods

In this paper, a simple method based on simulations was proposed to estimate the power of non-uniform association models to detect heterogeneities across distinguishabilities between adjacent categories of an ordinal scale, illustrating some possible scale defects.

Results

Different scenarios of distinguishability patterns were investigated, as well as different scenarios of marginal heterogeneity within rater. For a sample size of N = 50, the probabilities of detecting heterogeneities within the tables were lower than .80, whatever the number of categories. In addition, even for large samples, marginal heterogeneities within raters led to a decrease in power estimates.

Conclusion

This paper provides some guidance on how many objects have to be classified by two independent observers (or by the same observer at two different times) in order to detect a given scale structure defect. Our results also highlight the importance of marginal homogeneity within raters to ensure optimal power when using non-uniform association models.

Background

Initially developed in psychometrics to assess the severity of behavioral troubles or disturbances

When designing a reproducibility study with two observers (or one observer at two different times) assessing the same objects on an ORS, two major questions have to be answered: How many objects have to be classified by the two observers to be able to detect a given heterogeneous pattern of distinguishability between adjacent categories? Is it important to select these objects in an attempt to approximate some marginal distributions? In this study, simulations were used to estimate the power of non-uniform association models to detect heterogeneities across distinguishabilities between adjacent categories as a function of typical distinguishability patterns and total number of objects classified, assuming homogeneous marginal distributions within reader and between readers. Then, for the same numbers of objects classified twice, the influence of different patterns of marginal heterogeneity within reader on power estimates was studied.

Methods

Log-linear non-uniform association models

Log-linear modelling and parameters interpretation

Classifications of _{ij }
_{ij }
_{
ij
}, where _{ij }
_{ij }

where

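When analyzing agreement in an ordered contingency table, we can usually expect an association between ratings due to the natural ordering of the scale. As described by several authors, this association can be measured for each pair of categories $i$ and $j$ by the odds ratio $\tau_{ij} = \frac{m_{ii}m_{jj}}{m_{ij}m_{ji}}$. From $\tau_{ij}$, Darroch and McCloud defined the degree of distinguishability (DD) between categories $i$ and $j$ as $\delta_{ij} = 1 - \tau_{ij}^{-1}$.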
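As a quick numerical check of how odds ratios relate to degrees of distinguishability, the following Python sketch computes the odds ratio $m_{ii}m_{jj}/(m_{ij}m_{ji})$ for a pair of categories and the corresponding degree of distinguishability, taken here as one minus the inverse odds ratio. The function names and the 2 × 2 expected counts are hypothetical and serve only as an illustration.

```python
import math


def odds_ratio(m, i, j):
    """Odds ratio m_ii * m_jj / (m_ij * m_ji) for categories i and j (0-based)."""
    return (m[i][i] * m[j][j]) / (m[i][j] * m[j][i])


def distinguishability(tau):
    """Degree of distinguishability, computed as 1 - 1/tau."""
    return 1.0 - 1.0 / tau


# Hypothetical expected counts for two categories (illustrative values only).
expected = [[30.0, 10.0],
            [10.0, 30.0]]

tau = odds_ratio(expected, 0, 1)   # (30 * 30) / (10 * 10)
dd = distinguishability(tau)       # 1 - 1/tau
log_tau = math.log(tau)            # association on the log scale

print(f"OR = {tau:.2f}, DD = {dd:.2f}, log(OR) = {log_tau:.2f}")
```

With this relation, an odds ratio of 1 gives a DD of 0 (the two categories cannot be told apart), and the DD approaches 1 as the odds ratio grows.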

Uniform Association (UA) and Non-Uniform Association (NUA) models

In order to take into account this association, Goodman introduced the uniform association (UA) model. In 2007, Valet

where

For this model, DDs are written as:

illustrating the possible variations of DDs between categories, even between adjacent ones. NUA models are a generalization of UA models. Indeed, the UA model is a particular case of a NUA model in which the association parameters $\beta_{k,k+1}$ are all equal (do not depend on $k$).

Power estimation of tests in NUA models

To investigate the ability of NUA models to detect heterogeneities within the DDs between adjacent categories, a simple method was proposed to simulate ordered contingency tables resulting from the use of ORS having different patterns of distinguishability between their adjacent categories. Hereafter, tests were defined for a null hypothesis $H_0$ corresponding to the UA model defined by equation (2), and alternative hypotheses $H_1$ corresponding to NUA models defined by equation (3). Different scenarios of DDs heterogeneity were proposed to illustrate different typical scale structures. In all situations, marginal homogeneity between readers was assumed, which can be expressed as:

Simulation of I × I contingency tables from the NUA models

The total sample size _{ij }
_{
ij
},_{
ij
}were defined, using equation (3), as a function of the parameters of the NUA model:

When _{
k, k+1 }(_{ij }
_{i }

The first set of equations of the system defined by (6) allows us to control the marginal probabilities distribution during simulations, i.e. to control marginal probabilities

Many different scenarios of distinguishability patterns can be simulated, using different sets of {_{
k,k+1}; _{
k, k+1 }are equal) and NUA models with all possible combinations of association parameters, i.e. to test all possible equalities between association parameters. For example, testing equality of exactly
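The simulation step can be illustrated with a small Python sketch. It is not the authors' system (6). As a stand-in, it builds a seed matrix of the form exp(u_i u_j), with score gaps chosen so that the adjacent odds ratios equal the exponential of the chosen association parameters. It then uses iterative proportional fitting, which leaves odds ratios unchanged, to impose the same marginal probabilities on both raters, and finally draws one table of N objects. The function names, the use of `betas` for the association parameters and the example values are all assumptions made for illustration.

```python
import math
import random


def seed_matrix(betas):
    """Seed exp(u_i * u_j) whose adjacent odds ratios equal exp(beta_k).

    Scores u are spaced by sqrt(beta_k), so every beta_k must be >= 0.
    """
    scores = [0.0]
    for b in betas:
        scores.append(scores[-1] + math.sqrt(b))
    return [[math.exp(u * v) for v in scores] for u in scores]


def fit_margins(seed, margins, iters=500):
    """Iterative proportional fitting to identical row and column margins.

    Row and column rescaling preserves all odds ratios of the seed.
    """
    p = [row[:] for row in seed]
    n = len(p)
    for _ in range(iters):
        for i in range(n):                      # rescale rows
            s = sum(p[i])
            p[i] = [x * margins[i] / s for x in p[i]]
        for j in range(n):                      # rescale columns
            s = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] *= margins[j] / s
    return p


def adjacent_odds_ratios(p):
    """Odds ratios p_kk * p_(k+1)(k+1) / (p_k(k+1) * p_(k+1)k)."""
    return [p[k][k] * p[k + 1][k + 1] / (p[k][k + 1] * p[k + 1][k])
            for k in range(len(p) - 1)]


def simulate_table(p, n_objects, rng):
    """Draw one contingency table of n_objects from cell probabilities p."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    weights = [p[i][j] for i, j in cells]
    table = [[0] * n for _ in range(n)]
    for i, j in rng.choices(cells, weights=weights, k=n_objects):
        table[i][j] += 1
    return table


# Illustrative null-type scenario: all adjacent odds ratios equal to 3,
# homogeneous margins of .20, and N = 100 objects.
betas = [math.log(3)] * 4
margins = [0.20] * 5
probs = fit_margins(seed_matrix(betas), margins)
table = simulate_table(probs, 100, random.Random(1))
```

Changing one entry of `betas` gives an alternative-type scenario with one heterogeneous adjacent pair, and changing `margins` gives a heterogeneous marginal distribution.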

Definition of alternative hypotheses

For simplicity, we will consider hereafter contingency tables resulting from the use of ORS with

Examples of association parameters and distinguishability patterns between adjacent categories from NUA models in a 5 × 5 contingency table

| Hypothesis | Association parameters | Distinguishability patterns* |
|---|---|---|
| $H_0$ | **All association parameters are equal** | |
| | $\beta_{1,2} = \beta_{2,3} = \beta_{3,4} = \beta_{4,5} = \beta$ | 1 ---- 2 ---- 3 ---- 4 ---- 5 |
| $H_1$ | **1 association parameter is different** | |
| | $\beta_{1,2} \neq \beta_{2,3} = \beta_{3,4} = \beta_{4,5} = \beta$ | 1 - 2 ---- 3 ---- 4 ---- 5; 1 -------- 2 - 3 - 4 ---- 5 |
| | $\beta_{2,3} \neq \beta_{1,2} = \beta_{3,4} = \beta_{4,5} = \beta$ | 1 ---- 2 - 3 ---- 4 ---- 5; 1 -- 2 -------- 3 - 4 -- 5 |
| | $\beta_{3,4} \neq \beta_{1,2} = \beta_{2,3} = \beta_{4,5} = \beta$ | 1 ------ 2 -- 3 - 4 -- 5; 1 -- 2 - 3 ------ 4 - 5 |
| | $\beta_{4,5} \neq \beta_{1,2} = \beta_{2,3} = \beta_{3,4} = \beta$ | 1 ---- 2 ---- 3 ---- 4 - 5; 1 - 2 -- 3 - 4 ---------- 5 |
| $H_1$ | **2 association parameters are different** | |
| | $\beta_{1,2} = \beta_{2,3} \neq \beta_{3,4} = \beta_{4,5} = \beta$ | 1 - 2 - 3 ------ 4 ------ 5; 1 ------ 2 ------ 3 - 4 - 5 |
| | $\beta_{1,2} = \beta_{4,5} \neq \beta_{2,3} = \beta_{3,4} = \beta$ | 1 -- 2 ---- 3 ---- 4 -- 5; 1 ---- 2 - 3 - 4 ---- 5 |
| $H_1$ | **All association parameters are different** | |
| | $\beta_{1,2} \neq \beta_{2,3} \neq \beta_{3,4} \neq \beta_{4,5}$ | 1 - 2 ---- 3 -- 4 ------ 5 |

*Distinguishability values between two categories are proportional to the number of dashes between these two categories.

Symmetric hypotheses in association parameters:

From the UA model where all association parameters are equal (_{0 }hypothesis), a different value just for one association parameter (

Distribution of marginal probabilities

In addition to the different sets of distinguishabilities values, i.e. different sets {_{
k,k+1};

Sets of marginal theoretical probabilities in a 5 × 5 contingency table used in our simulations

| Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Description |
|---|---|---|---|---|---|
| .20 | .20 | .20 | .20 | .20 | Homogeneous distribution |
| .05 | .24 | .24 | .24 | .23 | Few counts in first category |
| .24 | .05 | .24 | .24 | .23 | Few counts in intermediate category |
| .24 | .24 | .05 | .24 | .23 | Few counts in central category |
| .05 | .30 | .30 | .30 | .05 | Few counts in extreme categories |
| .05 | .05 | .30 | .30 | .30 | Few counts in the first two adjacent categories |
| .05 | .15 | .40 | .30 | .10 | Heterogeneous distribution |

Power and Type I error estimation

For each specific set of {_{
k, k+1}; _{i }
_{ij }
_{
k, k+1}; _{i}
_{1,2 }= _{2,3 }= _{3,4 }= _{4,5 }= log(3) was chosen, corresponding to similar association between adjacent ratings (_{1,2 }= _{2,3 }= _{3,4 }= _{4,5 }= 3) and hence similar DDs between all adjacent categories. To account for different null hypotheses, we also proposed a common value of _{1,2 }= _{2,3 }= _{3,4 }= _{4,5 }
_{1,2 }= _{2,3 }= _{3,4 }= _{4,5 }= _{0}) and NUA models defined by _{1 }were calculated. As proposed by several authors ^{2 }likelihood ratio-statistic was used to compare these two models. Indeed, we used the difference statistics _{UA }
_{NUA }
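The model comparison described above can be sketched in Python as follows. The sketch assumes that fitted expected counts under the UA and NUA models are already available from some log-linear fitting routine (not shown). It also assumes that the test is carried out at the 5% level, and that the difference between the two likelihood-ratio statistics is compared with a chi-square critical value whose degrees of freedom equal the number of extra association parameters in the NUA model. These choices, the function names and the example statistics are assumptions for illustration.

```python
import math

# Upper 5% critical values of the chi-square distribution (standard table values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def g2(observed, fitted):
    """Likelihood-ratio statistic 2 * sum n_ij * log(n_ij / m_ij).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    total = 0.0
    for obs_row, fit_row in zip(observed, fitted):
        for n, m in zip(obs_row, fit_row):
            if n > 0:
                total += n * math.log(n / m)
    return 2.0 * total


def rejects_ua(g2_ua, g2_nua, extra_params):
    """True when the difference statistic exceeds the 5% critical value."""
    return (g2_ua - g2_nua) > CHI2_CRIT_05[extra_params]


def estimated_power(stat_pairs, extra_params):
    """Proportion of simulated tables for which the UA model is rejected."""
    rejections = sum(rejects_ua(a, b, extra_params) for a, b in stat_pairs)
    return rejections / len(stat_pairs)


# Hypothetical (G2_UA, G2_NUA) pairs from four simulated tables.
pairs = [(6.2, 1.0), (2.1, 1.5), (9.0, 3.0), (4.0, 3.9)]
power = estimated_power(pairs, extra_params=1)
```

When the tables are simulated under the null hypothesis, the same proportion estimates the Type I error rather than the power.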

Results

All simulations and power estimations were performed using R software _{
k, k+1}; _{1,2 }≠ _{2,3 }= _{3,4 }= _{4,5},

**Figure. Power estimates of tests with alternative hypotheses given by $\beta_{1,2} \neq \beta_{2,3} = \beta_{3,4} = \beta_{4,5} = \beta$; $\beta_{1,2} = \beta_{2,3} \neq \beta_{3,4} = \beta_{4,5} = \beta$; and $\beta_{1,2} = \beta_{4,5} \neq \beta_{2,3} = \beta_{3,4} = \beta$.**

It is clear that the table does not cover every possible value of the association parameters, for example $\beta_{1,2} = \beta_{2,3} = 2.25$. From the power estimates corresponding to $\beta_{1,2} = \beta_{2,3} = 2.20$ (namely .32, .53, .69, .81 and .89) and those corresponding to $\beta_{1,2} = \beta_{2,3} = 2.30$ (namely .35, .57, .76, .87 and .92), we can interpolate those corresponding to 2.25 = 2.20 + (2.30 - 2.20)/2 as .32 + (.35 - .32)/2, ..., .89 + (.92 - .89)/2. The corresponding new values are then equal to .34, .55, .73, .84 and .91, respectively, for N = 50, 100, 150, 200 and 250.
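The interpolation described in the preceding paragraph can be reproduced with a few lines of Python. The sketch uses exact decimal arithmetic with half-up rounding, a choice made here so that midpoints such as .335 round to .34 as in the text. The helper name and variable names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def interpolate(x, x0, x1, y0, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Power estimates quoted in the text for parameter values 2.20 and 2.30.
power_at_220 = [Decimal(v) for v in ("0.32", "0.53", "0.69", "0.81", "0.89")]
power_at_230 = [Decimal(v) for v in ("0.35", "0.57", "0.76", "0.87", "0.92")]

interpolated = []
for low, high in zip(power_at_220, power_at_230):
    value = interpolate(Decimal("2.25"), Decimal("2.20"), Decimal("2.30"), low, high)
    # Round to two decimals, with exact halves rounded up.
    interpolated.append(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

print([str(v) for v in interpolated])
# prints ['0.34', '0.55', '0.73', '0.84', '0.91']
```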

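Power estimates of tests in a 5 × 5 table, as a function of N

| $\beta_{12}$ | OR | DD | a. N = 50 | a. N = 100 | a. N = 150 | a. N = 200 | a. N = 250 | d. N = 50 | d. N = 100 | d. N = 150 | d. N = 200 | d. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .34 | .57 | .74 | **.85** | **.92** | .21 | .30 | .43 | .54 | .63 |
| .69 | 2 | .50 | .10 | .12 | .16 | .19 | .21 | .10 | .10 | .10 | .11 | .13 |
| 1.10 | 3 | .67 | .07 | .06 | .06 | .05 | .05 | .09 | .10 | .07 | .06 | .06 |
| 1.39 | 4 | .75 | .08 | .08 | .10 | .11 | .12 | .11 | .13 | .11 | .10 | .10 |
| 1.61 | 5 | .80 | .11 | .14 | .17 | .21 | .26 | .14 | .17 | .16 | .16 | .18 |
| 1.79 | 6 | .83 | .15 | .20 | .27 | .35 | .42 | .15 | .21 | .21 | .23 | .25 |
| 1.95 | 7 | .86 | .18 | .27 | .38 | .46 | .55 | .17 | .25 | .26 | .29 | .33 |
| 2.08 | 8 | .87 | .22 | .34 | .45 | .56 | .67 | .18 | .29 | .31 | .34 | .40 |
| 2.20 | 9 | .88 | .24 | .39 | .54 | .66 | .76 | .20 | .30 | .35 | .40 | .48 |
| 2.30 | 10 | .90 | .28 | .43 | .60 | .73 | **.82** | .21 | .35 | .40 | .44 | .52 |
| 2.48 | 12 | .92 | .33 | .53 | .70 | **.83** | **.89** | .23 | .39 | .45 | .52 | .61 |
| 2.64 | 14 | .93 | .38 | .61 | .78 | **.89** | **.95** | .26 | .43 | .52 | .58 | .67 |
| 2.77 | 16 | .94 | .42 | .67 | **.83** | **.92** | **.97** | .26 | .47 | .56 | .63 | .73 |

| $\beta_{12}$, $\beta_{23}$ | OR | DD | b. N = 50 | b. N = 100 | b. N = 150 | b. N = 200 | b. N = 250 | e. N = 50 | e. N = 100 | e. N = 150 | e. N = 200 | e. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .73 | **.95** | **.99** | **1** | **1** | .59 | **.84** | **.96** | **.99** | **1** |
| .69 | 2 | .50 | .14 | .21 | .28 | .35 | .42 | .13 | .17 | .24 | .29 | .37 |
| 1.10 | 3 | .67 | .06 | .06 | .05 | .05 | .05 | .09 | .07 | .06 | .06 | .06 |
| 1.39 | 4 | .75 | .09 | .10 | .13 | .15 | .18 | .12 | .13 | .14 | .15 | .17 |
| 1.61 | 5 | .80 | .13 | .19 | .26 | .33 | .39 | .17 | .20 | .26 | .31 | .36 |
| 1.79 | 6 | .83 | .19 | .29 | .40 | .50 | .59 | .21 | .27 | .36 | .43 | .51 |
| 1.95 | 7 | .86 | .22 | .37 | .51 | .64 | .74 | .26 | .34 | .46 | .55 | .66 |
| 2.08 | 8 | .87 | .27 | .47 | .62 | .74 | **.82** | .28 | .39 | .54 | .64 | .74 |
| 2.20 | 9 | .88 | .32 | .53 | .69 | **.81** | **.89** | .32 | .45 | .61 | .71 | **.81** |
| 2.30 | 10 | .90 | .35 | .57 | .76 | **.87** | **.92** | .35 | .49 | .66 | .77 | **.85** |
| 2.48 | 12 | .92 | .41 | .67 | **.84** | **.92** | **.97** | .40 | .56 | .74 | **.84** | **.92** |
| 2.64 | 14 | .93 | .46 | .74 | **.89** | **.96** | **.99** | .44 | .61 | .79 | **.88** | **.95** |
| 2.77 | 16 | .94 | .50 | .79 | **.93** | **.97** | **.99** | .46 | .66 | **.84** | **.92** | **.97** |

| $\beta_{12}$, $\beta_{45}$ | OR | DD | c. N = 50 | c. N = 100 | c. N = 150 | c. N = 200 | c. N = 250 | f. N = 50 | f. N = 100 | f. N = 150 | f. N = 200 | f. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .37 | .64 | **.80** | **.90** | **.95** | .18 | .32 | .45 | .57 | .67 |
| .69 | 2 | .50 | .10 | .13 | .17 | .21 | .25 | .09 | .10 | .11 | .13 | .15 |
| 1.10 | 3 | .67 | .06 | .06 | .05 | .05 | .05 | .08 | .06 | .06 | .05 | .05 |
| 1.39 | 4 | .75 | .08 | .08 | .10 | .12 | .14 | .09 | .09 | .09 | .09 | .10 |
| 1.61 | 5 | .80 | .11 | .16 | .21 | .27 | .31 | .12 | .13 | .16 | .18 | .22 |
| 1.79 | 6 | .83 | .15 | .24 | .33 | .42 | .49 | .15 | .19 | .24 | .30 | .35 |
| 1.95 | 7 | .86 | .20 | .32 | .44 | .55 | .66 | .18 | .25 | .32 | .41 | .47 |
| 2.08 | 8 | .87 | .25 | .41 | .55 | .67 | .77 | .21 | .30 | .40 | .50 | .59 |
| 2.20 | 9 | .88 | .28 | .47 | .63 | .76 | **.84** | .23 | .35 | .47 | .58 | .67 |
| 2.30 | 10 | .90 | .32 | .53 | .71 | **.82** | **.90** | .27 | .40 | .53 | .65 | .74 |
| 2.48 | 12 | .92 | .38 | .64 | **.81** | **.90** | **.95** | .32 | .18 | .64 | .76 | **.84** |
| 2.64 | 14 | .93 | .45 | .71 | **.87** | **.95** | **.98** | .37 | .56 | .73 | **.84** | **.90** |
| 2.77 | 16 | .94 | .49 | .77 | **.91** | **.97** | **.99** | .40 | .61 | .78 | **.89** | **.94** |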
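To show how such a table can be read when planning a study, the short Python sketch below picks the smallest tabulated sample size whose estimated power reaches a target, set here to .80 to match the threshold used for bold entries. The three rows are copied from panels a, b and d of the table above for the first listed parameter value (.00). The helper name is hypothetical.

```python
SAMPLE_SIZES = (50, 100, 150, 200, 250)


def smallest_n(powers, target=0.80):
    """Smallest tabulated N whose power estimate reaches the target, else None."""
    for n, power in zip(SAMPLE_SIZES, powers):
        if power >= target:
            return n
    return None


panel_a = [0.34, 0.57, 0.74, 0.85, 0.92]
panel_b = [0.73, 0.95, 0.99, 1.00, 1.00]
panel_d = [0.21, 0.30, 0.43, 0.54, 0.63]

print(smallest_n(panel_a), smallest_n(panel_b), smallest_n(panel_d))
# prints 200 100 None
```

A result of `None` means that none of the tabulated sample sizes reaches the target for that row, so more than 250 objects would be needed.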

In a similar way, tables

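Power estimates of tests in a 5 × 5 table, as a function of N

| $\beta_{12}$ | OR | DD | a. N = 50 | a. N = 100 | a. N = 150 | a. N = 200 | a. N = 250 | d. N = 50 | d. N = 100 | d. N = 150 | d. N = 200 | d. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .22 | .38 | .51 | .59 | .72 | .14 | .16 | .21 | .26 | .32 |
| .69 | 2 | .50 | .06 | .05 | .05 | .05 | .06 | .07 | .07 | .07 | .06 | .06 |
| 1.10 | 3 | .67 | .11 | .13 | .17 | .22 | .26 | .10 | .10 | .10 | .10 | .14 |
| 1.39 | 4 | .75 | .16 | .25 | .41 | .48 | .57 | .13 | .15 | .21 | .25 | .28 |
| 1.61 | 5 | .80 | .26 | .41 | .56 | .67 | .79 | .17 | .22 | .29 | .36 | .43 |
| 1.79 | 6 | .83 | .33 | .52 | .70 | **.82** | **.88** | .22 | .30 | .39 | .47 | .55 |
| 1.95 | 7 | .86 | .38 | .63 | .78 | **.92** | **.94** | .25 | .34 | .46 | .57 | .66 |
| 2.08 | 8 | .87 | .43 | .72 | **.85** | **.94** | **.98** | .30 | .41 | .56 | .62 | .74 |
| 2.20 | 9 | .88 | .47 | .76 | **.90** | **.97** | **.99** | .34 | .43 | .58 | .70 | .78 |
| 2.30 | 10 | .90 | .52 | .79 | **.94** | **.98** | **.99** | .38 | .51 | .67 | .74 | **.86** |
| 2.48 | 12 | .92 | .58 | **.85** | **.96** | **.99** | **1** | .39 | .55 | .71 | **.81** | **.91** |
| 2.64 | 14 | .93 | .64 | **.90** | **.97** | **1** | **1** | .41 | .58 | .78 | **.86** | **.93** |
| 2.77 | 16 | .94 | .69 | **.95** | **.99** | **1** | **1** | .46 | .62 | **.84** | **.88** | **.97** |

| $\beta_{12}$, $\beta_{23}$ | OR | DD | b. N = 50 | b. N = 100 | b. N = 150 | b. N = 200 | b. N = 250 | e. N = 50 | e. N = 100 | e. N = 150 | e. N = 200 | e. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .43 | .74 | **.87** | **.96** | **.97** | .34 | .52 | .73 | **.86** | **.92** |
| .69 | 2 | .50 | .06 | .06 | .05 | .05 | .06 | .08 | .07 | .06 | .06 | .05 |
| 1.10 | 3 | .67 | .12 | .20 | .28 | .37 | .44 | .16 | .19 | .28 | .35 | .40 |
| 1.39 | 4 | .75 | .24 | .41 | .57 | .66 | .78 | .28 | .41 | .54 | .62 | .74 |
| 1.61 | 5 | .80 | .34 | .57 | .78 | **.86** | **.93** | .36 | .52 | .71 | **.81** | **.89** |
| 1.79 | 6 | .83 | .42 | .69 | **.86** | **.95** | **.97** | .40 | .62 | **.81** | **.89** | **.94** |
| 1.95 | 7 | .86 | .51 | .78 | **.90** | **.97** | **.98** | .48 | .71 | **.89** | **.94** | **.98** |
| 2.08 | 8 | .87 | .53 | **.85** | **.95** | **.98** | **1** | .54 | .76 | **.93** | **.96** | **.99** |
| 2.20 | 9 | .88 | .62 | **.88** | **.97** | **1** | **1** | .55 | **.80** | **.94** | **.97** | **.99** |
| 2.30 | 10 | .90 | .64 | **.90** | **.97** | **1** | **1** | .57 | **.85** | **.96** | **.98** | **1** |
| 2.48 | 12 | .92 | .69 | **.93** | **.99** | **1** | **1** | .65 | **.86** | **.98** | **.99** | **1** |
| 2.64 | 14 | .93 | .73 | **.96** | **.99** | **1** | **1** | .67 | **.87** | **.97** | **1** | **1** |
| 2.77 | 16 | .94 | .77 | **.98** | **.99** | **1** | **1** | .71 | **.93** | **.99** | **1** | **1** |

| $\beta_{12}$, $\beta_{45}$ | OR | DD | c. N = 50 | c. N = 100 | c. N = 150 | c. N = 200 | c. N = 250 | f. N = 50 | f. N = 100 | f. N = 150 | f. N = 200 | f. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .20 | .37 | .52 | .63 | .74 | .10 | .16 | .27 | .32 | .37 |
| .69 | 2 | .50 | .07 | .05 | .05 | .05 | .04 | .06 | .07 | .05 | .05 | .05 |
| 1.10 | 3 | .67 | .11 | .12 | .16 | .23 | .26 | .10 | .10 | .13 | .14 | .18 |
| 1.39 | 4 | .75 | .19 | .32 | .44 | .52 | .61 | .15 | .21 | .28 | .34 | .38 |
| 1.61 | 5 | .80 | .26 | .46 | .62 | .74 | **.82** | .21 | .32 | .42 | .52 | .60 |
| 1.79 | 6 | .83 | .35 | .58 | .75 | **.88** | **.94** | .25 | .42 | .54 | .71 | .77 |
| 1.95 | 7 | .86 | .45 | .68 | **.85** | **.93** | **.97** | .31 | .49 | .65 | .79 | **.84** |
| 2.08 | 8 | .87 | .44 | .74 | **.92** | **.96** | **1** | .36 | .55 | .73 | **.86** | **.91** |
| 2.20 | 9 | .88 | .55 | **.84** | **.94** | **.98** | **.99** | .41 | .61 | .78 | **.89** | **.94** |
| 2.30 | 10 | .90 | .58 | **.87** | **.96** | **.99** | **1** | .47 | .67 | **.85** | **.92** | **.96** |
| 2.48 | 12 | .92 | .66 | **.91** | **.99** | **.99** | **1** | .52 | .77 | **.91** | **.97** | **.98** |
| 2.64 | 14 | .93 | .70 | **.94** | **.98** | **1** | **1** | .53 | **.82** | **.93** | **.98** | **1** |
| 2.77 | 16 | .94 | .74 | **.95** | **.99** | **1** | **1** | .58 | **.85** | **.96** | **.98** | **1** |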

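Power estimates of tests in a 5 × 5 table, as a function of N

| $\beta_{12}$ | OR | DD | a. N = 50 | a. N = 100 | a. N = 150 | a. N = 200 | a. N = 250 | d. N = 50 | d. N = 100 | d. N = 150 | d. N = 200 | d. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .40 | .66 | **.82** | **.92** | **.97** | .21 | .31 | .43 | .50 | .65 |
| .69 | 2 | .50 | .16 | .22 | .31 | .35 | .45 | .11 | .12 | .16 | .18 | .24 |
| 1.10 | 3 | .67 | .07 | .09 | .08 | .12 | .12 | .09 | .08 | .07 | .08 | .08 |
| 1.39 | 4 | .75 | .06 | .05 | .05 | .04 | .06 | .08 | .07 | .07 | .06 | .05 |
| 1.61 | 5 | .80 | .08 | .06 | .06 | .08 | .08 | .08 | .08 | .08 | .07 | .06 |
| 1.79 | 6 | .83 | .09 | .10 | .12 | .14 | .16 | .07 | .09 | .10 | .10 | .11 |
| 1.95 | 7 | .86 | .11 | .11 | .17 | .22 | .25 | .12 | .12 | .12 | .13 | .15 |
| 2.08 | 8 | .87 | .13 | .17 | .24 | .31 | .36 | .11 | .14 | .15 | .18 | .23 |
| 2.20 | 9 | .88 | .15 | .21 | .30 | .37 | .45 | .13 | .17 | .20 | .23 | .27 |
| 2.30 | 10 | .90 | .18 | .24 | .34 | .43 | .52 | .14 | .19 | .25 | .24 | .30 |
| 2.48 | 12 | .92 | .21 | .35 | .45 | .58 | .67 | .19 | .21 | .31 | .32 | .40 |
| 2.64 | 14 | .93 | .24 | .38 | .53 | .66 | .77 | .19 | .26 | .33 | .42 | .49 |
| 2.77 | 16 | .94 | .29 | .46 | .61 | .76 | **.84** | .20 | .28 | .39 | .45 | .55 |

| $\beta_{12}$, $\beta_{23}$ | OR | DD | b. N = 50 | b. N = 100 | b. N = 150 | b. N = 200 | b. N = 250 | e. N = 50 | e. N = 100 | e. N = 150 | e. N = 200 | e. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | **.85** | **.99** | **1** | **1** | **1** | .71 | **.92** | **.99** | **1** | **1** |
| .69 | 2 | .50 | .23 | .43 | .60 | .67 | .76 | .22 | .34 | .50 | .61 | .72 |
| 1.10 | 3 | .67 | .09 | .12 | .13 | .15 | .18 | .11 | .10 | .11 | .11 | .15 |
| 1.39 | 4 | .75 | .05 | .05 | .05 | .06 | .05 | .07 | .06 | .08 | .06 | .05 |
| 1.61 | 5 | .80 | .07 | .08 | .11 | .10 | .10 | .11 | .08 | .10 | .10 | .11 |
| 1.79 | 6 | .83 | .11 | .11 | .15 | .21 | .22 | .16 | .13 | .17 | .17 | .20 |
| 1.95 | 7 | .86 | .14 | .18 | .25 | .28 | .35 | .16 | .19 | .21 | .29 | .30 |
| 2.08 | 8 | .87 | .14 | .22 | .31 | .40 | .45 | .18 | .23 | .27 | .30 | .42 |
| 2.20 | 9 | .88 | .17 | .29 | .41 | .50 | .57 | .23 | .26 | .36 | .39 | .49 |
| 2.30 | 10 | .90 | .23 | .33 | .49 | .56 | .69 | .23 | .30 | .40 | .46 | .53 |
| 2.48 | 12 | .92 | .25 | .41 | .59 | .73 | **.81** | .28 | .33 | .49 | .57 | .67 |
| 2.64 | 14 | .93 | .29 | .51 | .67 | .79 | **.86** | .32 | .42 | .55 | .65 | .76 |
| 2.77 | 16 | .94 | .30 | .56 | .75 | **.86** | **.92** | .35 | .47 | .60 | .71 | **.81** |

| $\beta_{12}$, $\beta_{45}$ | OR | DD | c. N = 50 | c. N = 100 | c. N = 150 | c. N = 200 | c. N = 250 | f. N = 50 | f. N = 100 | f. N = 150 | f. N = 200 | f. N = 250 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .00 | 1 | .00 | .45 | .77 | **.90** | **.97** | **.99** | .26 | .47 | .60 | .71 | **.82** |
| .69 | 2 | .50 | .14 | .26 | .36 | .44 | .50 | .11 | .15 | .21 | .25 | .32 |
| 1.10 | 3 | .67 | .08 | .09 | .10 | .13 | .12 | .08 | .09 | .08 | .09 | .10 |
| 1.39 | 4 | .75 | .05 | .06 | .05 | .05 | .06 | .08 | .06 | .06 | .06 | .06 |
| 1.61 | 5 | .80 | .06 | .07 | .08 | .07 | .09 | .10 | .08 | .09 | .08 | .07 |
| 1.79 | 6 | .83 | .08 | .12 | .14 | .16 | .21 | .12 | .11 | .12 | .14 | .14 |
| 1.95 | 7 | .86 | .12 | .15 | .20 | .24 | .33 | .14 | .14 | .17 | .19 | .23 |
| 2.08 | 8 | .87 | .13 | .22 | .28 | .38 | .44 | .14 | .17 | .22 | .28 | .33 |
| 2.20 | 9 | .88 | .20 | .25 | .36 | .47 | .56 | .17 | .20 | .27 | .35 | .41 |
| 2.30 | 10 | .90 | .20 | .31 | .46 | .55 | .66 | .17 | .22 | .33 | .41 | .49 |
| 2.48 | 12 | .92 | .26 | .38 | .54 | .67 | **.80** | .23 | .31 | .43 | .50 | .60 |
| 2.64 | 14 | .93 | .28 | .50 | .64 | .77 | **.87** | .26 | .42 | .51 | .63 | .73 |
| 2.77 | 16 | .94 | .35 | .55 | .75 | **.87** | **.90** | .29 | .45 | .61 | .70 | .79 |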

**Power estimates of tests in a 4 × 4 table, as a function of N, with three different alternative hypotheses.** Estimates greater than 80% are in bold. For 4 × 4 contingency tables, this table provides power estimates for three different alternative hypotheses, considering homogeneous (left column) and heterogeneous (right column) marginal theoretical distributions.


Discussion

Results given by Figure

In our simulations of contingency tables resulting from cross-classifications of the same objects twice on an ordinal rating scale, marginal homogeneity between readers was assumed, which can be seen as a limiting constraint. However, as described by the authors

For each simulations, the algorithm of Lacruz _{i}
_{i }

In this simulation study we presented three alternative hypotheses illustrating different patterns of distinguishability between adjacent categories. The first tested hypothesis

Conclusions

In this paper we proposed a new simple method based on simulations to estimate the power of tests in log-linear non-uniform association models. To this end, we first presented a method to simulate contingency tables resulting from cross-classifications of the same objects, using ordinal rating scales having different patterns of distinguishability between their adjacent categories. Then, taking typical situations of scale structures, we proposed a table summarizing the main effects of sample size, alternative hypotheses and marginal distributions on power estimates for the detection of DDs heterogeneities within the scale structure. Results were given for three typical alternative hypotheses, in the case of 5 × 5 contingency tables.

In health research, assessments of disease severity or patients' well-being are increasingly performed using ordinal rating scales. One of the major components of an ordinal scale is the distinguishability between its adjacent categories. Using a simple method based on simulations, this paper provides some guidance on how many objects have to be classified by two observers in order to detect a given scale structure defect, which may be of prime interest for improving ordinal scale quality and, in turn, other assessments made using this scale.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

FV and JYM developed the method, performed all statistical analyses and participated in writing the article. FV and JYM read and approved the final manuscript.

Acknowledgements

The authors would like to thank Pr. Sylvie Chevret for her great interest and support of this work.
