Non-local robust detection of DTI white matter differences with small databases

Non-Local Means for the Comparison of Diffusion Data

Diffusion imaging, through the study of water diffusion, allows for the characterization of brain white matter, both at the population and individual level. In recent years, it has been employed to detect brain abnormalities in patients suffering from a disease, e.g. from multiple sclerosis (MS). State-of-the-art methods usually utilize a database of matched (age, sex, …) controls, registered onto a template, to test for differences in the patient white matter. Such approaches however suffer from two main drawbacks. First, registration algorithms are prone to local errors, thereby degrading the comparison results. Second, the database needs to be large enough to obtain reliable results. However, in medical imaging, such large databases are hardly available. In this paper, we propose a new method that addresses these two issues. It relies on the search for samples in a local neighborhood of each pixel to increase the size of the database. Then, we propose a new test based on these samples to perform a voxelwise comparison of a patient image with respect to a population of controls. We demonstrate on simulated and real MS patient data how such a framework allows for an improved detection power and a better robustness and reproducibility, even with a small database.

MeSH keywords: Brain, pathology, Databases, Factual, Diffusion Tensor Imaging, methods, Humans, Information Storage and Retrieval, methods, Multiple Sclerosis, pathology, Nerve Fibers, Myelinated, pathology, Pattern Recognition, Automated, methods, Radiology Information Systems, Reproducibility of Results, Sensitivity and Specificity

Non-Local Means for the Comparison of Diffusion Data

Diffusion weighted imaging is an MRI modality that provides information about water diffusion within tissues. It has therefore gained much interest for the study of brain white matter architecture. In particular, it may be utilized for the detection of structural differences related to a disease. Reported studies on diseases usually fall within two categories: (i) group comparisons between a population of healthy subjects and a group of patients suffering from the disease and (ii) comparison of one patient to a set of healthy controls. The former aims at characterizing the overall course of a disease while the latter focuses on detecting its early signs and, possibly, its future evolution.

Both approaches are of great interest to understand a disease. In this work, we are interested in diffusion imaging for multiple sclerosis (MS). MS is a demyelinating disease, causing both lesions visible on conventional MRI and diffuse damage to the brain white matter architecture that may be visible in diffusion imaging [1]. Having a robust detection of that diffuse damage for a specific patient is crucial as it could help to predict how the disease will evolve in time, and potentially allow to adapt the treatment.

Recent works on the comparison of diffusion images have first focused on scalar values extracted from the diffusion tensor, such as mean diffusivity (MD) or fractional anisotropy (FA). For example, Filippi et al. [2] presented a study on manually defined regions of interest demonstrating an MD increase and an FA decrease in specific regions for MS patients brains. However, utilizing only a scalar value may discard a large part of the tensor information and decrease the precision of the comparison. To overcome this problem, several groups have proposed methods to utilize the full tensor either for population comparison (Lepore et al. utilized the Hotelling’s T ² test on tensors for HIV patients [3], Whitcher et al. [4] the Cramers test on tensors), or for patient to group comparison (Commowick et al. [5]). These works have demonstrated that a test based on the full tensor information yields more precise comparisons. Finally, when high quality data is available (HARDI acquisitions), one may now consider higher order models such as orientation distribution functions (ODFs) to get improved sensitivity in crossing fibers regions where the diffusion tensor performs poorly [6].

Independently of their strengths and weaknesses, comparison methods usually rely on a parametric or permutation statistical test. Such approaches often require large databases either to ensure that the distribution of the test statistic matches the hypothesized one or to make the permutation test data independent. However, in medical imaging studies, databases are usually small due to the difficulty to recruit patients and volunteers, and they may be even smaller when parameters such as age or sex must match between the control database and the patients. In those cases, the chosen statistical test may become erroneous and generate either false positive or false negative detections.

In addition, all automatic approaches need a common reference frame that is often constructed from the healthy subjects by means of non linear registration (so called atlas construction [7]). However, such registration methods are not perfect and may be prone to errors due to noise and artifacts. Such errors may further corrupt the comparison performance.

To tackle these issues, we propose a new methodology for the robust detection of white matter differences at a patient level. It is based on ideas recently introduced for non-local means denoising [8] and segmentation [9], adapted for a patient-to-group comparison of diffusion models that can be represented as vectors in a vector space (e.g diffusion tensors or ODFs). We present in Section 2 the overall comparison method. We then apply this new method to simulated data and real data of multiple sclerosis patients demonstrating higher accuracy and reproducibility for differences detection over state-of-the-art methods.

Non-Local Means for the Comparison of Diffusion Data

Results

Conclusion

References

Non-Local Means for the Comparison of Diffusion Data

In the following, we assume that a database of M images I_m has been constituted, i.e. all these images have been non linearly registered to a common reference system, and that we are interested in comparing voxel by voxel an image T to the reference database. We propose an algorithm relying on the non-local means framework [10] optimized by Coupé et al. for medical image denoising [8] and segmentation [9]. For each point x of T, we define a patch B(x) (half size h) around it and follow these main steps:

For each image I_m , search for patches B_m (x_j ) similar to the patch B(x) of T in a neighborhood N(x) around x
Associate a weight w_mj to each patch B_m (x_j ), depending of the similarity between B(x) and B_m (x_j ) (Section 2.1)
Keep the center voxel D_mj of each patch and associate it to its weight w_mj
Utilize the set of weighted samples to perform the comparison between T and I_m, m ∈ {0, …, M} (Section 2.2)

This framework has several advantages: it may help to account for potential registration errors onto the common template for comparison, and it may significantly increase the number of samples to perform the voxelwise comparison even though the database consists of a limited number of images.

2.1. Similarity Weights between Patches

The selection of patches is a crucial point as it will define the relative importance of each patch in the final differences detection step. We consider that the model chosen to describe the diffusion of water molecules may be represented as a vector, e.g. tensors in the Log-Euclidean framework [11] or ODFs on a spherical harmonics basis [6]. Before comparing patches, a preselection is performed for speed reasons and to avoid the degeneracy of the patches weights w_mj :

Compute the Log-Euclidean distance between the covariance matrices Σ _B(x) and Σ _{B
_m(x
_j)}: if it exceeds the average distance between any two covariance matrices Σ _{B
_m(x)} of the database, discard B_m (x_j ), otherwise proceed to the next step;
Compute Hotelling’s T ² statistic [12] to test for mean differences between B(x) and B_m (x_j ) using the pooled covariance matrix: if it exceeds the average statistic computed from any two patches B_m (x), discard patch B_m (x_j ).

For the remaining patches, we then compute their weights. The weight w_mj between two patches B(x) and B_m (x_j ) is defined as a function of the sum of squared differences between the two patches:

(1)

where Δ_i = I_m (i + x_j − x) − T(i) are the differences between corresponding voxels of the patches, |B(x)| is the number of voxels in B(x), β a user-defined scale parameter and Σ̂(x) is the local noise covariance around x on T. These weights characterize the similarity between patches and vary between 0 and 1: 1 is reached when the two patches are equal, 0 corresponds to a total disagreement.

Since structures with different orientations may occur, the noise covariance Σ̂ is estimated locally. Computing it globally over the whole image could indeed lead to an over-estimation and therefore to biased weights. Coupé et al. [8] proposed a method to estimate such a local noise variance on scalar valued images. Here, we extend it to vector-valued images. Σ̂(x) is estimated from voxels in patch B(x):

For each voxel x_i in B(x), consider a small neighborhood of N voxels around it (e.g. the 26 neighbors of x_i ). Pseudo-residuals ε _{x
_i} are computed as:

The local image noise covariance is defined from these pseudo-residuals:

2.2. Comparison of Weighted Data Samples

We have constituted a list of weighted samples S = {S ₁, …, S_M } at each voxel x of T, where S_m = {(D_m ₀, w_m ₀), …, (D_mJ , w_mJ )}. We now utilize these samples to confront the patient image to the healthy subjects database. We compute the weighted mean μ_x and weighted covariance matrix Σ_x at each point x as:

(2)

These estimates are very interesting as they take into account the similarity of each patch in the estimation of the mean and covariance. We then test for voxelwise differences by computing the Mahalanobis distance at each point:

(3)

where D_x is the vector value of the patient image at point x (e.g. log-tensor or ODF value). Considering there are enough samples, this squared distance follows a χ ² distribution with d degrees of freedom, where d is the vector dimension, and a p-value is computed from Z ² as:

(4)

where

is the cumulative distribution function of a χ ² distribution with d degrees of freedom.

Non-Local Means for the Comparison of Diffusion Data

Our method has two main parameters: the patch size and the search neighborhood. The smaller the sizes, the closer the method gets to [5]. On the contrary, large sizes tend to increase the number of false negatives. We fixed the parameters on the basis of qualitative results on several patients: patch size of 3 × 3 × 3 (h = 1) and a neighborhood for patch search of 4 voxels in every direction. In addition, we have set β - Eq. (1) - to 1 as is suggested by Coupé et al. [8].

3.1. Experiments on Simulated Data

We first present a quantitative study on simulated images. Starting from a reference diffusion tensor image (Fig. 1.a), 90 images were simulated by adding Rician noise to the DWI. Then, a patient image was simulated by inserting lesions, i.e. tensors swollen in the two non principal directions. To illustrate the detection power of our method and its robustness to database size, we randomly selected from the 90 images subgroups of 15 to 90 images and used them as the reference database to compare to the simulated patient. Fig. 1 shows the average Dice score results of our method (M ₂) and the one proposed in [5] (M ₁).

This figure illustrates well the issues arising when using a small database for differences detection. As the sample size decreases, M ₁ performs worse, mainly due to a large number of false positives being detected (see Fig. 1.c). These errors mainly stem from the small size of the database that weakens the power of the test. Instead, M ₂ obtains much better and more steady scores, which demonstrate its robustness. M ₂ performs better as we are able from a small database to increase the number of samples used for the comparison.

3.2. Experiments on Multiple Sclerosis Data

We have utilized the LONI ICBM database of healthy control diffusion images ^1,. This database is composed of 160 control images: T1-weighted images (isotropic 1mm ³) and diffusion images acquired on a 3T MRI scanner (b-value of 1000 s/mm ², 30 directions with a resolution of 2x2x2 mm ³). This control subject database was compared to a database of 10 MS patient images acquired following a similar protocol with the same parameters. As a first step before processing, the diffusion tensor images are first registered to the T1-w images using a global affine transform [13] and a non linear free-form deformation [14] with few control points to recover EPI distortions. Then, a DTI atlas is computed from the control subjects DTI using Guimond’s et al. atlas construction method [7], combined to a non linear tensor-based registration algorithm.

Each DTI patient image is then registered onto the atlas and compared voxel by voxel to the database of controls either with the method proposed in [5] M ₁ or the proposed method M ₂. We present in Fig. 2 a representative qualitative comparison of the results obtained by the two methods utilizing only a subgroup of 40 images from the controls subjects database.

We can notice on this figure that M ₁ is affected by the small size of the database and the registration errors, resulting in a large number of false positive detections in Fig. 2.b. On the contrary, adding additional patches as it is done in M ₂ leads to many more patches being considered (see Fig. 2.d) and possibly more accurate ones if the registration errors were in the bounds of the local neighborhood. As a consequence, the detection results in Fig. 2.c reveal much less false positives while keeping the detection power on the MS lesions.

Finally, we present a quantitative evaluation of the reproducibility of the obtained score maps when the control subjects database changes. To do so, we have, for each patient, repeatedly selected N_Db images out of the 160 images of the database. We have then computed for each of these sub-databases a score map deriving either from M ₁ or M ₂. To evaluate the variability of the scores, we have chosen to utilize the average of the voxelwise standard deviation of these maps. We present in Table 1 the average over all images of these standard deviation values for N_Db = 20, 40 and 80 images.

This table shows that the obtained standard deviations are significantly lower for M ₂ (paired t-tests, p-value of 0.001). This indicates a better reproducibility of the results when considering our non-local approach. This confirms the robustness of the proposed method and the interest of utilizing neighboring patches, especially when performing a comparison against a very small database.