| |
| Conf Proc IEEE Eng Med Biol Soc. 2008; 1: 394–397. doi: 10.1109/IEMBS.2008.4649173.
Multimodal medical case retrieval using the Dezert-Smarandache theory
Gwénolé Quellec,1,2 Mathieu Lamard,1* Guy Cazuguel,1,2 Christian Roux,1,2 and Béatrice Cochener1,3
1 Laboratoire de Traitement de l'Information Medicale - Latim, INSERM : U650, Université de Bretagne Occidentale - Brest, Hopital Morvan, 5 Avenue Foch, 29609 Brest Cedex, FR
2 TELECOM Bretagne, Institut TÉLÉCOM, UEB; Dpt ITI, Brest, F-29200, FR
3 Service d'ophtalmologie, CHU Brest, FR |
Most medical images are now digitized and stored with semantic information, leading to medical case databases. These databases may be used as a diagnosis aid, by retrieving cases similar to the one under examination. However, the information is often incomplete, uncertain and sometimes conflicting, and therefore difficult to use. In this paper, we present a Case-Based Reasoning (CBR) system for medical case retrieval, derived from the Dezert-Smarandache theory (DSmT), which is well suited to handling those problems. We introduce a case-retrieval-specific frame of discernment θ, which associates each element of θ with a case in the database; we take advantage of the flexibility offered by the DSmT’s hybrid models to finely model the database. The system is designed so that heterogeneous sources of information can be integrated: in particular images, indexed by their digital content, and symbolic information. The method is evaluated on two classified databases: one for diabetic retinopathy follow-up (DRD) and one for screening mammography (DDSM). On these databases, results are promising: the retrieval precision at five reaches 81.8% on DRD and 84.8% on DDSM.
Author keywords: Case based reasoning, Dezert-Smarandache theory, Diabetic Retinopathy, Image indexing, Mammography |
In medicine, the knowledge of experts is a mixture of textbook knowledge and experience gained through real-life clinical cases. Consequently, there is a growing interest in case-based reasoning (CBR), introduced in the early 1980s, for the development of medical decision support systems [1]. The underlying idea of CBR is the assumption that analogous problems have similar solutions, an idea backed up by physicians’ experience. In CBR, the basic process of interpreting a new situation revolves around the retrieval of relevant cases in a case database. The retrieved cases are then used to help interpret the new one. We propose in this article a CBR system for the retrieval of medical cases made up of a series of images with contextual information, a class of CBR problems that has hardly been addressed. The proposed system is applied to a Diabetic Retinopathy (DR) multimedia database built up in our laboratory; to diagnose DR, physicians analyze series of multimodal photographs together with contextual information such as the patient’s age, sex and medical history. To show that the method is generic, we also applied it to DDSM, a public-access database for screening mammography; to read screening mammograms, physicians analyze two views of each breast, with associated contextual information. When designing a CBR system to retrieve such cases, several problems arise. We have to aggregate heterogeneous sources of evidence (images, contextual information) and to manage missing information. These sources may be uncertain and conflicting. As a consequence, we applied the Dezert-Smarandache Theory (DSmT) of plausible and paradoxical reasoning, proposed in recent years [2], which is well suited to fusing uncertain, highly conflicting and imprecise sources of evidence. |
A. Databases
- Diabetic retinopathy database: the diabetic retinopathy (DR) database contains retinal images of diabetic patients, with associated anonymized information on the pathology. Diabetes is a metabolic disorder characterized by sustained, inappropriately high blood sugar levels. This progressively affects blood vessels in many organs, which may lead to serious renal, cardiovascular, cerebral and also retinal complications. Different lesions appear on the damaged vessels, which may lead to blindness. The database is made up of 63 patient files containing 1045 photographs altogether. Images have a definition of 1280 pixels/line for 1008 lines/image and are losslessly compressed. Patients have been recruited at Brest University Hospital since June 2003 and images were acquired by experts using a Topcon Retinal Digital Camera (TRC-50IA) connected to a computer. An example of an image series is given in figure 1.
The contextual information available is the patients’ age and sex and structured medical information (about the general clinical context, the diabetes context, eye symptoms and maculopathy). Thus, at most, patient records are made up of 10 images per eye (see figure 1) and of 13 contextual attributes; 12.1% of these images and 40.5% of these contextual attribute values are missing. The disease severity level, according to the ICDRS classification [3], was determined by experts for each patient. The distribution of the disease severity among the above-mentioned 63 patients is given in table I.
- Digital Database for Screening Mammography (DDSM): the DDSM project [4] has built a mammographic image database for research on breast cancer screening. It is made up of 2277 patient files. Each one includes two images of each breast, associated with some patient information (age at time of study, rating for abnormalities, American College of Radiology breast density rating and keyword description of abnormalities) and imaging information. The following contextual attributes are taken into account in the system:
  - the age at time of study
  - breast density rating
Images have a varying definition, of about 2000 pixels/line for 5000 lines/image. An example of an image sequence is given in figure 2. Each patient file has been graded by a physician. Patients are then classified into three groups: normal, benign and cancer. The distribution of grades among the patients is given in table I.
B. Including images in the retrieval system
To include images in the proposed retrieval system, we have to define a distance measure between images and to cluster images acquired with a given imaging modality into a finite number of groups. For this purpose, we follow the usual steps of Content-Based Image Retrieval (CBIR) [5]: 1) building a signature for each image (i.e. extracting a feature vector summarizing its numerical content), and 2) defining a distance measure between two signatures. Thus, measuring the distance between two images comes down to measuring the distance between two signatures. We can then cluster similar image signatures according to the defined distance measure. In previous studies, we proposed to compute a signature for images from their wavelet transform (WT) [6]. These signatures model the distribution of the WT coefficients in each subband of the decomposition. The associated distance measure d [6] computes the divergence between these distributions. We used this signature and distance measure to cluster similar images. Any clustering algorithm can be used, provided that the distance measure between feature vectors can be specified. We used FCM (Fuzzy C-Means) [7], one of the most common algorithms, and replaced the Euclidean distance by d.
C. Dezert-Smarandache Theory
The Dezert-Smarandache Theory (DSmT) allows combining any type of independent sources of information represented in terms of belief functions. It is more general than probabilistic fusion or Dempster-Shafer theory (DST). It is particularly well suited to fusing uncertain, highly conflicting and imprecise sources of evidence [2].
- The fusion model: let θ = {θ1, θ2, …} be a set of hypotheses under consideration for a fusion problem; θ is called the frame of discernment. In DST, these hypotheses are assumed to be incompatible (constrained model ℳ0(θ)), while in DSmT they are not (free model ℳf(θ)): if θ = {“blue”, “red”}, we may model objects that are both blue and red (magenta). Thus, in DSmT, a belief mass m(A) is assigned to each element A of the hyper-power set D(θ), i.e. the set of all composite propositions built from elements of θ with the ∩ and ∪ operators, such that m(∅) = 0 and ΣA∈D(θ) m(A) = 1; m is called a generalized basic belief assignment (gbba). Nevertheless, it is possible to introduce constraints in the model [2] (hybrid model ℳ(θ)): we can specify pairs of incompatible hypotheses (θa, θb), i.e. each subset A of θa ∩ θb must have a null mass, denoted A ∈ C(θ).
- The fusion operators: to fuse information in the DSmT framework, the user specifies a gbba mj for each source of evidence Sj, j = 1..N. Then, these gbba are fused into a global gbba mf, according to a given rule of combination. Several rules have been proposed to combine mass functions, including the hybrid rule of combination or the PCR (Proportional Conflict Redistribution) rules [2].
- The decision functions: once the fused gbba mf has been computed, a decision function (such as the credibility, the plausibility or the pignistic probability) is used to evaluate the probability of each hypothesis. The pignistic probability BetP, a compromise between the other two functions, is used since it provides the best system performance:
$$\mathrm{BetP}(A) = \sum_{B_i \in D(\theta)} \frac{C_{\mathcal{M}}(B_i \cap A)}{C_{\mathcal{M}}(B_i)} \, m_f(B_i)$$
where $C_{\mathcal{M}}(B_i)$ is the cardinality of Bi in the Venn diagram of ℳ(θ): in the examples of figure 3, $C_{\mathcal{M}^0}(\theta_1) = 1$, $C_{\mathcal{M}^f}(\theta_1) = 4$ and $C_{\mathcal{M}}(\theta_1) = 2$.
D. Dezert-Smarandache Theory Based Retrieval
To find τj, we define the following test Tj = “if DRj(ci, cq) > τj then Ci is true, otherwise Ci is false”. The sensitivity (resp. the specificity) of Tj represents the degree of confidence in a positive (resp. negative) answer to Tj. Tj is relevant if it is both sensitive and specific. As τj increases, fewer cases pass the test, so sensitivity decreases and specificity increases. So, we set τj at the intersection of the two curves “sensitivity according to τj” and “specificity according to τj”, using a dichotomic search. We set mj0 as the sensitivity of Tj.
 is assigned the degree of uncertainty of Tj. Note that a single setup ( τj, mj0) is used for each feature, whatever ci and cq. |
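The threshold selection described above can be sketched in Python. This is only an illustration under stated assumptions: the function names `sensitivity_specificity` and `find_threshold` are hypothetical, the scores standing in for DRj(ci, cq) are assumed to lie in [0, 1], and a relevance label is assumed to be available for each training pair. With the test “score > τ”, raising τ can only remove positive answers, so the bisection looks for the point where sensitivity stops exceeding specificity.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], relevant: Sequence[bool], tau: float
) -> Tuple[float, float]:
    """Evaluate the test 'score > tau => relevant' against ground truth.

    Assumes at least one relevant and one non-relevant example.
    """
    tp = fn = tn = fp = 0
    for score, is_relevant in zip(scores, relevant):
        positive = score > tau
        if is_relevant:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return tp / (tp + fn), tn / (tn + fp)


def find_threshold(
    scores: Sequence[float],
    relevant: Sequence[bool],
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-6,
) -> Tuple[float, float, float]:
    """Dichotomic search for the crossing of the two curves.

    Sensitivity is non-increasing and specificity non-decreasing in tau,
    so 'sensitivity > specificity' holds on the left of the crossing only.
    Returns (tau, sensitivity, specificity) at the upper end of the bracket.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        sens, spec = sensitivity_specificity(scores, relevant, mid)
        if sens > spec:
            lo = mid  # threshold still too permissive: move right
        else:
            hi = mid  # crossing is at or to the left of mid
    sens, spec = sensitivity_specificity(scores, relevant, hi)
    return hi, sens, spec


# Toy data (invented for illustration only)
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
relevant = [True, True, True, False, True, False, False, False]
tau, sens, spec = find_threshold(scores, relevant)
print(tau, sens, spec)
```

The returned sensitivity is the quantity the text assigns to mj0. On real data the two curves are step functions, so the search returns the first threshold at which sensitivity no longer exceeds specificity rather than an exact equality.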
The mean precision at five (mp5) of the system, i.e. the mean proportion of relevant cases among the top five results, is 78.6% for DRD and 82.1% for DDSM. As a comparison, the mp5 obtained by CBIR (when cases are made up of a single image), with the same image signatures, is 46.1% for DRD and 70.0% for DDSM [6]. To evaluate the contribution of the proposed system for the retrieval of heterogeneous and incomplete cases, it is compared to the linear combination of heterogeneous distance functions [8] that was used to build the model. This method achieved an mp5 of 52.3% for DRD and of 71.4% for DDSM. To assess the robustness of the method with respect to missing information, 1) we generated artificial cases from each case in the database by removing attributes, 2) we submitted each artificial case in turn as a query to the system and 3) we plotted in figure 6 the precision at five of these queries according to the number of available attributes. |
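The mp5 figures reported above can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: it assumes that a retrieved case counts as relevant when it belongs to the same class as the query, and the names `precision_at_k` and `mean_precision_at_k` are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def precision_at_k(
    ranked_labels: Sequence[Hashable], query_label: Hashable, k: int = 5
) -> float:
    """Fraction of the top-k retrieved cases sharing the query's class."""
    top = ranked_labels[:k]
    return sum(1 for label in top if label == query_label) / k


def mean_precision_at_k(
    results: Sequence[Tuple[Hashable, Sequence[Hashable]]], k: int = 5
) -> float:
    """Average precision-at-k over (query_label, ranked_labels) pairs."""
    return sum(precision_at_k(ranked, q, k) for q, ranked in results) / len(results)


# Toy retrieval results (invented for illustration only)
results = [
    ("cancer", ["cancer", "cancer", "benign", "cancer", "normal", "cancer"]),
    ("benign", ["benign", "normal", "normal", "cancer", "normal"]),
]
print(mean_precision_at_k(results))
```

Dividing by k rather than by the number of returned cases means a query that returns fewer than five results is penalized, which is one possible convention among several.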
DISCUSSION AND CONCLUSION
In this article, we introduce a method to include image series, with contextual information, in CBR systems. DSmT is used to fuse the output of several sensors (case features). We introduced a case-retrieval-specific frame of discernment θ, which associates each element of θ with a case in the database; we use the flexibility offered by the DSmT’s hybrid models ℳ(θ) to finely model the database. On DRD, the method largely outperforms our first CBIR algorithm (mp5 of 78.6% versus 46.1%). This stands to reason because experts generally need more than one image to correctly diagnose the patient’s disease severity level. The improvement is also notable on DDSM (82.1% versus 70.0%). This non-linear retrieval method is markedly more precise than a simple linear combination of heterogeneous distances on both databases (78.6% versus 52.3% on DRD, 82.1% versus 71.4% on DDSM). Moreover, if we use a Bayesian network to infer the missing values, prior to estimating the degree of relevance (described in section II-D.3), the mp5 becomes 81.8% on DRD and 84.8% on DDSM. The proposed system is a possible alternative to the decision tree based retrieval system we proposed previously [9] (which gave an mp5 of 79.5% on DRD). Finally, the proposed framework is also of interest because it is generic: any multimedia database may be processed, as long as a procedure to cluster cases is provided for each new modality (sound, video, etc.). |
1. Bichindaritz I, Marling C. Case-based reasoning in the health sciences: What’s next? Artificial Intelligence in Medicine. 2006 January;36(2):127–135.
2. Smarandache F, Dezert J. Advances and Applications of DSmT for Information Fusion II. American Research Press; Rehoboth: 2006.
3. Wilkinson C, Ferris FRK, et al. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology. 2003;110(9):1677–1682.
4. Heath M, Bowyer KDK, et al. Digital Mammography. Kluwer Academic Publishers; 1998. Current status of the digital database for screening mammography; pp. 457–460.
5. Smeulders A, Worring M, Santini S, Gupta A, Jain R. Content-based image retrieval at the end of the early years. IEEE Transactions on PAMI. 2000 December;22(12):1349–1380.
6. Lamard M, Quellec G, Bekri L, Cazuguel G, Cochener B, Roux C. Content based image retrieval based on wavelet transform coefficients distribution. Proceedings of the 29th annual international conference of the IEEE EMBS; August 2007;
7. Bezdek J. PhD dissertation. Applied Math. Center, Cornell University; Ithaca: 1973. Fuzzy mathematics in pattern classification.
9. Quellec G, Lamard M, Bekri L, Cazuguel G, Cochener B, Roux C. Multimedia medical case retrieval using decision trees. 29th annual international conference of the IEEE EMBS; August 2007; pp. 4536–4539. |
 | Fig. 1 Photograph series of a patient eye |
 | Fig. 2 Mammographic image sequence of the same patient. (a) and (b) are two views of the left breast, (c) and (d) are two views of the right one. |
 | Fig. 3 Considering the frame of discernment θ = {θ1, θ2, θ3}, the figure represents, from left to right, the Venn diagram of the constrained, the free and a hybrid model (in which θ3 is incompatible with the other two hypotheses). A non-empty intersection between ... |
 | Fig. 4 Building the model ℳ(θ) from the compatibility graph. An example of compatibility graph is shown on figure (a). Hypotheses associated with cases at different severity levels are represented with different colors. |
 | Fig. 6 Robustness with respect to missing values. Note that cases are returned at random when no attributes are provided (0 on the X axis). |
 | TABLE I Patient disease severity distribution |
|