Med Image Comput Comput Assist Interv. 2012; 15(Pt 2): 163–70.
doi: 10.1007/978-3-642-33418-4_21.

In medical image analysis, one is often confronted with the problem of registering anatomical structures containing both hard, rigid (typically, bones) and soft, non-rigid (most other tissues) parts. Such problems arise, for instance, when following up spinal cord lesions in MRI for the diagnosis of multiple sclerosis [1], or when assessing cervical injuries using dynamic/kinematic MR imaging with positional changes [2]. Many methods have been developed for both fully global rigid registration and fully local non-rigid registration separately [3], but the literature on hybrid methods, allowing for adequate registration of the structures depending on the stiffness of their components, is still quite sparse.

The earliest work we know of is that of Little et al. [4], who showed how to incorporate rigid structures into a deformation field using radial basis functions; this was later improved by others to make the field invertible and even diffeomorphic [5-7]. However, these methods require the user to specify which structures are rigid, which led to the development of semi-automated methods in which rigidity can be locally favoured or enforced through a regularisation term in the criterion to be minimised [8]. This idea was later refined to allow this term to be adaptively tuned to the structures to register, through prior segmentation of the rigid parts or the design of a stiffness map (typically computed from the image intensities; e.g. bones have high intensities in CT) [9-11]. Instead of segmenting rigid parts, it was also proposed to define several anchors, to each of which an unknown polyaffine transformation is attached, which can then be estimated using a modified EM-ICP algorithm [12].

In this paper, we propose an iterative two-step method to compute a diffeomorphic non-rigid transformation between images of structures with rigid parts, without any user intervention or prior knowledge of the image intensities (to compute rigid parts or anchors). First, we compute spatially sparse, locally optimal rigid transformations between the two images by adopting a new (as opposed to classical, translation-based) block matching strategy, made possible by the use of an efficient numerical optimiser (BOBYQA) (Sec. 2.1). The rationale behind this strategy is to recover both large rotations and subvoxel displacements. Then we derive a dense, regularised velocity field from these local transformations using matrix logarithms and M-smoothing (Sec. 2.2). The floating image is then resampled and the two steps are iterated until convergence; the final diffeomorphic transformation is defined as the exponential of the accumulated velocity field. We finally compare our algorithm with the state-of-the-art log-domain diffeomorphic demons method [13] on dynamic cervical and multiple sclerosis MR images (Sec. 3).

To compute a diffeomorphism T between a reference image I and a floating image J, we iterate between two steps: computation of a sparse set of locally optimal rigid transformations using block matching between I and J ∘ Tl (Sec. 2.1), and computation of a dense velocity field δLTl from these locally estimated transformations (Sec. 2.2). The transformation T is initially set to the identity (T0 = Id) and the initial velocity field to LT0 = log T0 = 0; the velocity field is then updated as LTl+1 = LTl + δLTl. This two-step algorithm stops at the iteration l where δLTl is close to 0, and the final diffeomorphism is computed as T = Tl = exp(LTl). The complete algorithm is outlined in Sec. 2.3. For the sake of clarity, we detail the two steps using the simpler notations I, J and δLT (Sec. 2.1 and 2.2).
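To make this two-step loop concrete, here is a minimal Python sketch of the outer iteration (our illustration, not the authors' code); block_matching, m_smooth, exp_field and resample are hypothetical placeholders for the steps described in Sec. 2.1-2.3, passed in as arguments so that the skeleton stays self-contained.

```python
import numpy as np

def log_domain_register(I, J, block_matching, m_smooth, exp_field, resample,
                        max_iters=20, tol=1e-3):
    """Outer two-step loop (Sec. 2): accumulate a stationary velocity field LT
    and return the final diffeomorphism T = exp(LT). All helpers are
    hypothetical callables standing for the steps of Sec. 2.1-2.3."""
    LT = np.zeros(I.shape + (3,))                # velocity field LT_0 = 0 (one 3-vector per voxel)
    for _ in range(max_iters):
        J_warp = resample(J, exp_field(LT))      # J o T_l
        rigids, centers, weights = block_matching(I, J_warp)   # Sec. 2.1: sparse local rigid transforms
        dLT = m_smooth(rigids, centers, weights, I.shape)      # Sec. 2.2: dense velocity increment
        LT = LT + dLT                            # LT_{l+1} = LT_l + dLT_l
        if np.abs(dLT).max() < tol:              # stop when the increment is close to 0
            break
    return exp_field(LT)
```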

Classical block matching algorithm In this approach, which we do not follow, one first defines a set of blocks in each image, then matches each block of the reference image I with the most similar block of the floating image J. Similarity is typically computed using a measure on the voxel intensities, such as the sum of squared differences or the squared correlation coefficient in monomodal problems, or the mutual information or the correlation ratio in multimodal problems. The most common approach to optimise the similarity measure (at least in medical image analysis) is to perform an exhaustive search for the block with the highest similarity in J, within a given neighbourhood of each block of I.
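For illustration, here is a minimal NumPy sketch of this classical, exhaustive strategy for a monomodal 3-D problem; the block size, search radius and the use of the squared correlation coefficient are illustrative choices, not a reproduction of any particular implementation.

```python
import numpy as np

def squared_correlation(a, b):
    """Squared correlation coefficient between two blocks of identical shape."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float(((a * b).sum() / denom) ** 2)

def best_translation(I, J, corner, block=7, radius=2):
    """Exhaustive search for the integer translation of a block of I (given by its
    corner voxel) that best matches J, within a cubic neighbourhood of given radius."""
    z, y, x = corner
    ref = I[z:z + block, y:y + block, x:x + block]
    best_s, best_t = -1.0, (0, 0, 0)
    for dz in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                zz, yy, xx = z + dz, y + dy, x + dx
                if min(zz, yy, xx) < 0:                    # candidate block outside J
                    continue
                cand = J[zz:zz + block, yy:yy + block, xx:xx + block]
                if cand.shape != ref.shape:
                    continue
                s = squared_correlation(ref, cand)
                if s > best_s:
                    best_s, best_t = s, (dz, dy, dx)
    return best_t, best_s
```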

This strategy implicitly assumes that the local motion between the images can be well recovered by a discrete translation (i.e. one defined on the discrete grid of the image). Subvoxel displacements and large rotations are thus likely to be missed. This is all the more true when registering piecewise rigid structures, because in this case there is no single, global rigid motion that could be corrected for before non-rigid registration.

Modified block matching algorithm Recent advances in nonlinear optimisation allow another strategy to be tested. We first define a set of blocks in I (as in the standard strategy), and then directly compute the rigid transformation best superposing each of these blocks onto J, using a similarity measure on the voxel intensities. As opposed to the standard, discrete translation-based strategy, computing the similarity measure for a given block of I and a given tested transformation requires resampling J to build the corresponding block. Given that the solution space is no longer finite, this leads to a potentially much more computationally expensive algorithm.

We propose to use the recent BOBYQA algorithm [14] to implement this idea. In essence, BOBYQA is a derivative-free, trust-region method which uses successive approximations of the similarity measure by quadratic functions, whose maxima can be computed analytically. It is very similar to the classical NEWUOA algorithm, except that bounds must be specified on the variables. We thus end up with a set of blockwise-estimated optimal rigid transformations between I and J. In practice, however, we do not estimate a transformation for the blocks of I having a low intensity variance σ2. The set of estimated transformations (R1, …, Rm) is thus spatially sparse, due to these missing transformations and to the resolution of the grid of blocks in I, which is coarser than that of I. In addition, we weight each estimated transformation Ri with a weight wi set equal to the similarity measure; here we use the squared correlation coefficient, to be insensitive to local intensity changes, so that 0 ≤ wi ≤ 1.
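The following sketch illustrates this modified strategy for one block, assuming the nlopt Python bindings for BOBYQA and SciPy for trilinear resampling; the Euler-angle parameterisation, the rotation about the block centre and the bound values are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import nlopt                                   # assumed available; provides LN_BOBYQA
from scipy.ndimage import map_coordinates

def rotation_matrix(rx, ry, rz):
    """Rotation from three Euler angles in radians (illustrative Z.Y.X convention)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def match_block(I, J, corner, block=7, t_bound=2.0, r_bound=np.deg2rad(5.0)):
    """Estimate the rigid transformation best superposing one block of I onto J,
    maximising the squared correlation coefficient with BOBYQA (bounds on the
    3 rotation angles and 3 translations). Returns a 4x4 matrix and its weight."""
    ref = I[corner[0]:corner[0] + block,
            corner[1]:corner[1] + block,
            corner[2]:corner[2] + block].ravel()
    grid = np.stack(np.meshgrid(*[np.arange(c, c + block) for c in corner],
                                indexing='ij'), axis=0).reshape(3, -1).astype(float)
    centre = grid.mean(axis=1, keepdims=True)

    def similarity(p, grad):
        R = rotation_matrix(*p[:3])
        pts = R @ (grid - centre) + centre + p[3:, None]          # rotate about the block centre, then translate
        cand = map_coordinates(J, pts, order=1, mode='nearest')   # resample J (trilinear)
        s = np.corrcoef(ref, cand)[0, 1]
        return 0.0 if not np.isfinite(s) else float(s * s)

    opt = nlopt.opt(nlopt.LN_BOBYQA, 6)                    # derivative-free, bound-constrained
    opt.set_max_objective(similarity)
    opt.set_lower_bounds([-r_bound] * 3 + [-t_bound] * 3)
    opt.set_upper_bounds([r_bound] * 3 + [t_bound] * 3)
    opt.set_xtol_abs(1e-3)
    p = opt.optimize(np.zeros(6))
    w = opt.last_optimum_value()                           # weight w_i (squared correlation in [0, 1])
    M = np.eye(4)                                          # homogeneous rigid matrix R_i
    M[:3, :3] = rotation_matrix(*p[:3])
    M[:3, 3] = p[3:] + centre.ravel() - M[:3, :3] @ centre.ravel()
    return M, w
```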

2.2. Estimating a dense velocity field
The set of estimated transformations (R1, …, Rm) is spatially sparse, but it is also noisy and likely to contain outliers (due to the noise in the images to be registered and to potential errors in the local registrations). How can we estimate a dense (n = card(I)) and smooth velocity field δLT from (R1, …, Rm)? We propose to use the logarithms of these m transformations, defined in the space M4(ℝ) of 4 × 4 real matrices restricted to those whose last row contains only zeros, and to estimate n intermediate matrices in the same space (that we name log S1, …, log Sn by analogy) as the minimisers of a criterion C:

C(log S1, …, log Sn) = Σi=1..n Σj∈Vi wj ρ(||log Si − log Rj||2) d(|vi − vj|2)

where:
  • ρ: ℝ → ℝ+ is a robust error norm,
  • ||.|| is the Frobenius norm in M4(ℝ), and |.| is the Euclidean norm in ℝ3,
  • vj is the coordinate of the central voxel of the block where Rj was estimated,
  • vi is the coordinate of the voxel where Si is to be computed,
  • Vi is a neighbourhood around the position vi; note that the sum over j ∈ Vi must read: “the sum over all the points vj where a rigid transformation Rj was estimated, and which are inside Vi”,
  • wj is the weight defined in Sec. 2.1,
  • d(.) : ℝ3 → ℝ+ is a (spatial) error norm.

It must be clear that we do not estimate Si and then take its logarithm; we estimate log Si directly, and use this notation only as a convention for the sake of simplicity. Solving this minimisation problem is known as local M-smoothing [15], due to the use of a robust error norm and of spatial neighbourhoods in the design of C; C can be minimised using gradient descent, where each transformation can be estimated independently of the others. Using a particular adaptive, data-dependent step size leads to an easy-to-interpret update formula for each log Si [15]:

log Si ← [ Σj∈Vi wj ρ′(||log Si − log Rj||2) d(|vi − vj|2) log Rj ] / [ Σj∈Vi wj ρ′(||log Si − log Rj||2) d(|vi − vj|2) ]
It can be seen from this formula that ρ′ acts as a tonal kernel, while d acts as a spatial kernel. After convergence, each finally estimated log Si is a linear combination of the logarithms of the rigid transformations Rj; following Arsigny et al. [5], we define the final dense velocity field δLT as δLT(vi) = log(Si).vi, ∀i = 1, …, n. In practice, we define ρ as the Welsch function, which leads to ρ′(a2) = exp(−a2/2λ2), and we define d as d(b2) = exp(−b2/2θ2); this leads to two similar expressions for the two kernels (with different bandwidths). Vi is spherical with radius 2θ (to achieve an approximate 95% confidence interval for a Gaussian law). To initialise the gradient descent algorithm, each log Si is computed as the solution of the update formula with ρ′(a2) = 1 (i.e. no tonal kernel).
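A possible NumPy/SciPy sketch of this M-smoothing step is given below, assuming the blockwise transformations Rj are available as 4 × 4 homogeneous matrices; the fixed-point loop, the number of iterations and the use of scipy.linalg.logm are illustrative choices on our part.

```python
import numpy as np
from scipy.linalg import logm

def m_smooth_logs(rigids, centers, weights, query_pts, lam, theta, n_iter=10):
    """Local M-smoothing (Sec. 2.2): for each query voxel v_i, estimate log S_i as a
    robust, spatially weighted combination of the logarithms of the R_j lying in a
    spherical neighbourhood of radius 2*theta (Welsch tonal kernel, Gaussian spatial kernel)."""
    log_R = np.array([np.real(logm(R)) for R in rigids])      # 4x4 matrix logarithms
    centers = np.asarray(centers, float)
    weights = np.asarray(weights, float)
    out = np.zeros((len(query_pts), 4, 4))
    for i, vi in enumerate(np.asarray(query_pts, float)):
        d2 = ((centers - vi) ** 2).sum(axis=1)                # squared distances |v_i - v_j|^2
        nb = d2 < (2.0 * theta) ** 2                          # neighbourhood V_i
        if not nb.any():
            continue
        L, w = log_R[nb], weights[nb]
        spatial = np.exp(-d2[nb] / (2.0 * theta ** 2))        # d(.): spatial kernel
        coef = w * spatial                                     # initialisation: rho' = 1
        logS = (coef[:, None, None] * L).sum(0) / coef.sum()
        for _ in range(n_iter):                                # fixed-point iterations of the update formula
            r2 = ((L - logS) ** 2).sum(axis=(1, 2))           # Frobenius distances ||log S_i - log R_j||^2
            coef = w * spatial * np.exp(-r2 / (2.0 * lam ** 2))   # rho'(.): Welsch tonal kernel
            logS = (coef[:, None, None] * L).sum(0) / coef.sum()
        out[i] = logS
    return out

def velocity_at(logS, vi):
    """Velocity vector at voxel v_i: dLT(v_i) = log(S_i) . v_i (homogeneous coordinates)."""
    return (logS @ np.append(np.asarray(vi, float), 1.0))[:3]
```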

2.3. Complete algorithm
The final estimated transformation is a diffeomorphism [5]. We perform all the update calculations on the velocity field, whose exponential is required only once per iteration to resample the floating image. This is the same approximation as that done by Vercauteren et al., who showed experimentally that exp(LTl−1) ∘ exp(δLT) could be approximated by exp(LTl−1 +δLT) for a small enough velocity field δLT [13].
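As an illustration of this exponentiation step, here is a sketch of the usual scaling-and-squaring scheme for a stationary velocity field, in the spirit of [13]; the representation as a voxel-wise displacement field, the trilinear composition and the number of squaring steps are our assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def compose(phi, psi):
    """Compose two displacement fields (shape ZxYxXx3): the displacement of
    phi o psi at x is psi(x) + phi evaluated at x + psi(x)."""
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in phi.shape[:-1]],
                                indexing='ij'), axis=-1).astype(float)
    coords = (grid + psi).transpose(3, 0, 1, 2)               # warped sample positions
    warped = np.stack([map_coordinates(phi[..., k], coords, order=1, mode='nearest')
                       for k in range(3)], axis=-1)
    return warped + psi

def exp_field(v, n_steps=6):
    """Exponentiate a stationary velocity field v (shape ZxYxXx3) by scaling and
    squaring: exp(v) ~ (Id + v / 2**n) composed with itself n times, returned as
    a displacement field."""
    phi = v / (2.0 ** n_steps)                                 # scaling: small displacement
    for _ in range(n_steps):                                   # squaring: phi <- phi o phi
        phi = compose(phi, phi)
    return phi
```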
2.4. Implementation details
For the block matching: size of the blocks: 7 voxels; grid step size: 3 voxels; minimal intensity variance in the blocks: 1/4 of the maximum squared intensity; search radii within BOBYQA: 2 voxels (translation) and 5 degrees (rotation).
Algorithm.

1: Initialise T to the identity: T0 ← Id = exp(LT0), and the velocity field to 0: LT0 ← 0
2: for each pyramid level of the multiresolution scheme, do
3:  repeat
4:   Estimate local rigid transformations using block matching (Sec. 2.1):
   R = (R1, …, Rm) ← block-matching(I, J ∘ Tl−1)
5:   Interpolate a dense velocity field using M-smoothing (Sec. 2.2):
   δLT ← M-smooth(R)
6:   Increment the velocity field: LTl = LTl−1 + δLT
7:   Regularise (elastic-like) the velocity field: LTl ← Gν ∗ LTl
8:   Compute Tl = exp(LTl) to resample J
9:  until δLT is sufficiently small

For the M-smoothing: kernel bandwidths: λ2 = medj,h ||log Rj − log Rh||2 / 2 (tonal), θ = 4 voxels (spatial); ν = 4 voxels in the elastic-like regularisation. We use a 3-level multiresolution strategy, and the resampling of the floating image is done using trilinear interpolation. Run-time of the algorithm (dual-core Xeon 3.0 GHz PC): about 6 min (vs 2 min for the log-domain diffeomorphic demons [13]).

We propose to assess our algorithm quantitatively on ten patients with traumatic cervical cord injury who underwent dynamic cervical MRI (T2-w, size 384 × 384 × 14, voxel size 0.8 × 0.8 × 3 mm3) in two different positions each: either flexion/neutral or extension/neutral [2]. For each patient, we manually defined landmarks on the cervical/thoracic vertebrae C1-C3-C6-T1 (and T4 when visible), the pontomedullary junction, and the gnathion (lower border of the mandible) on each of the two MR images. We considered the neutral position as the reference image in the extension/neutral setting, and the flexion as the reference image in the flexion/neutral setting. For a given patient, the registration accuracy was evaluated as the root mean square error (RMSE) computed over the homologous landmarks after registration using four different methods: global rigid registration (M1) [16], log-domain diffeomorphic demons (M2) [13], our algorithm (M3), and M2 initialised using M3 (M4); both M2 and M3 are initialised using M1. We also assessed our algorithm visually on two patients with MS lesions in the spinal cord and two other patients with tumours in the spinal cord, with two time points each (T1-w, size 256 × 256 × 64, voxel size 1 × 1 × 1 mm3).
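For reference, the landmark-based RMSE used in this evaluation amounts to the following computation (a sketch only; the landmark format and the direction in which the estimated transform is applied are illustrative assumptions):

```python
import numpy as np

def landmark_rmse(ref_landmarks, flo_landmarks, transform):
    """RMSE over homologous landmarks after registration: the floating-image landmarks
    are mapped through the estimated transform and compared with their homologues
    in the reference image (coordinates in mm)."""
    ref = np.asarray(ref_landmarks, float)                       # (N, 3) reference landmarks
    mapped = np.array([transform(p) for p in np.asarray(flo_landmarks, float)])
    return float(np.sqrt(((mapped - ref) ** 2).sum(axis=1).mean()))
```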

The box-and-whisker plot in Fig. 1, computed from the ten patients, shows that our algorithm (M3) significantly outperforms both M1 (paired t-test: p = 3 × 10−4) and M2 (paired t-test: p = 2 × 10−3), with a much smaller error and a much smaller error dispersion. M3 is also slightly better than M4 (paired t-test: p = 5 × 10−2). This suggests that the log-domain diffeomorphic demons performs worse than our algorithm even when properly initialised (using M3 instead of M1). The results of M1, M2 and M3 on one of the ten patients are shown in Fig. 2, and those of M1 and M3 on one of the MS patients are shown in Fig. 3.