R. E. Millsap, Statistical Approaches to Measurement Invariance. Routledge, 2011.

B. D. Zumbo and E. Chan, Validity and Validation in Social, Behavioral, and Health Sciences, 2014.

Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims: draft guidance, Health Qual Life Outcomes, vol.4, PMID: 17034633, 2006.

L. B. Mokkink, C. B. Terwee, D. L. Patrick, J. Alonso, P. W. Stratford et al., The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes, J Clin Epidemiol, vol.63, PMID: 20494804, 2010.

J. M. Valderas, M. Ferrer, J. Mendívil, O. Garin, L. Rajmil et al., Development of EMPRO: A Tool for the Standardized Assessment of Patient-Reported Outcome Measures, Value Health, vol.11, PMID: 18194398, 2008.

Standards for Educational and Psychological Testing, 2014.

W. H. Angoff, Perspectives on differential item functioning methodology, in P. W. Holland and H. Wainer, editors, Differential Item Functioning, pp.3-23, 1993.

G. J. Mellenbergh, Item bias and item response theory, Int J Educ Res, vol.13, pp.127-143, 1989.

H. Swaminathan and H. J. Rogers, Detecting Differential Item Functioning Using Logistic Regression Procedures, J Educ Meas, vol.27, pp.361-370, 1990.

R. E. Millsap and H. T. Everson, Methodology Review: Statistical Approaches for Assessing Measurement Bias, Appl Psychol Meas, vol.17, pp.297-334, 1993.

P. W. Holland and D. T. Thayer, Differential Item Performance and the Mantel-Haenszel Procedure, 1986.

N. J. Dorans and P. W. Holland, DIF Detection and Description: Mantel-Haenszel and Standardization, ETS Res Rep Ser, vol.1992, 1992.

N. J. Dorans and E. Kulick, Demonstrating the Utility of the Standardization Approach to Assessing Unexpected Differential Item Performance on the Scholastic Aptitude Test, J Educ Meas, vol.23, pp.355-368, 1986.

D. Borsboom, When Does Measurement Invariance Matter?, Med Care Meas Multi-Ethn Soc, vol.44, PMID: 17060825, 2006.

K. B. Christensen, S. Kreiner, and M. Mesbah, editors, Rasch Models in Health, 2012.

S. E. Embretson and S. P. Reise, Item Response Theory for Psychologists. L. Erlbaum Associates, 2000.

G. Rasch, Probabilistic models for some intelligence and attainment tests, 1980.

J. Linacre, Sample Size and Item Calibration Stability, Rasch Meas Trans, vol.7, p.328, 1994.

A. Rouquette, J. Hardouin, and J. Coste, Differential Item Functioning (DIF) and Subsequent Bias in Group Comparisons using a Composite Measurement Scale: a Simulation Study, J Appl Meas, vol.17, PMID: 28027055, 2016.

P. K. Crane, L. E. Gibbons, L. Jolley, and G. van Belle, Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar, Med Care, vol.44, PMID: 17060818, 2006.

L. Gibbons, DIFDETECT: Stata module to detect and adjust for differential item functioning (DIF), Boston College Department of Economics, 2015.

S. W. Choi, lordif: Logistic Ordinal Regression Differential Item Functioning using IRT, 2016.

G. N. Masters, A Rasch model for partial credit scoring, Psychometrika, vol.47, pp.149-174, 1982.

G. R. Norman, J. A. Sloan, and K. W. Wyrwich, Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation, Med Care, vol.41, pp.582-592, 2003.

StataCorp LP, Stata Statistical Software: Release 12.1. College Station, TX: StataCorp LP, 2012.

J. Cohen, Statistical Power Analysis for the Behavioral Sciences. L. Erlbaum Associates, 1988.

M. Brouwers, M. Kho, G. Browman, F. Cluzeau, G. Feder et al., AGREE II: Advancing guideline development, reporting and evaluation in healthcare, Can Med Assoc J, vol.182, pp.839-842, 2010.

D. Andrich, B. S. Sheridan, and G. Luo, RUMM2030: Rasch Unidimensional Measurement Models. Perth, Western Australia, 2010.

J. Linacre, WINSTEPS Rasch measurement computer program. Chicago: Winsteps.com, 2006.

R. Nandakumar, Simultaneous DIF Amplification and Cancellation: Shealy-Stout's Test for DIF, J Educ Meas, vol.30, pp.293-311, 1993.

R. Shealy and W. Stout, A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF, Psychometrika, vol.58, pp.159-194, 1993.

J. A. Teresi, Different approaches to differential item functioning in health applications. Advantages, disadvantages and some neglected topics, Med Care, vol.44, PMID: 17060822, 2006.

A. E. Wyse, DIF Cancellation in the Rasch Model, J Appl Meas, vol.14, PMID: 23816591, 2013.

J. Hardouin, SIMIRT: Stata module to process data generated by IRT models. Boston College Department of Economics, 2005.

D. Rizopoulos, ltm: Latent Trait Models under IRT, version 1.0-0, 2015.

D. Andrich and C. Hagquist, Real and Artificial Differential Item Functioning, J Educ Behav Stat, vol.37, pp.387-416, 2012.

D. Andrich and C. Hagquist, Real and Artificial Differential Item Functioning in Polytomous Items, Educ Psychol Meas, vol.75, pp.185-207, 2015.

C. Hagquist and D. Andrich, Recent advances in analysis of differential item functioning in health research using the Rasch model, Health Qual Life Outcomes, vol.15, p.181, 2017.