Université de Lyon, CREATIS; CNRS UMR5220; INSERM U1044; INSA-Lyon; Université Lyon 1; 7 av Jean Capelle, 69621 Villeurbanne, France

Electronics and Telecommunications Department, Università degli Studi di Firenze, Via Santa Marta 3, 50139 Firenze, Italy

Abstract

Acoustic simulation has always played an important role in the development of new ultrasound imaging techniques. In nonlinear ultrasound imaging in particular, the simulators are accurate but time-consuming because of the high derivative order of the propagation equation and the classic solution based on finite difference schemes. This article presents a fast 3D +

1. Introduction

The use of harmonic imaging in ultrasound has become popular because of the improvement it offers in terms of axial and lateral resolution with respect to standard B-mode imaging

In the last few years, the increased performance of graphics processing units (GPUs) has made them excellent candidates not only for display but also for intensive computation, and various applications have been ported from central processing units (CPUs) to GPUs. The increasing number of cores on a GPU can be exploited for high-level parallelism and intensive simulations. Recent works in ultrasound demonstrate the potential of the GPU in several applications such as Doppler imaging

In this article, a GPU implementation of the generalized ASM (GASM) in a 3D +

The next section reviews the GASM, which can compute the fundamental and second harmonic evolution separately. Section 3 is dedicated to the GPU implementation of the method and discusses the choices made to increase its performance. The results obtained with this implementation are presented in Section 4, where they are also compared with the results of a classic CPU implementation.

2. Angular spectrum method

In a lossless medium, the evolution of the ultrasound pressure in 4D (3D + t) can be described through its fundamental (p_{1}) and second harmonic (p_{2}) components, whose time and spatial evolution can be expressed as

respectively, with c_{0} the speed of sound, ρ_{0} the density,

In order to decrease the derivative order of (1) and (2), the Fourier transform (FT) of each equation must be computed. The FT (F) and the inverse FT (IFT, F^{-1}) are, respectively, defined as

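with the subscripts x, y, and t denoting the frequency variables associated with the x and y directions and with time, respectively.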

where

where _{j }
^{−1}. In the computation, only the real part of the _{1 }and _{2 }are

with P_{0} the FT of the source wave p_{0} at depth z_{0}. It should be noted that since the nonlinear coefficient

3. CPU/GPU implementation of the GASM

The solution of Equations 11 and 12 is particularly well suited to GPU programming. Indeed, the different calculations are separately performed in the

3.1 Computation of p_{1}

The evolution of the fundamental component is only linked to the initial wave source, p_{0}, and to the propagation distance, z. The P_{1} spectrum is obtained by applying the rotation kernel in the Fourier domain, and then the IFT is used to obtain the final solution. It must be noted that the fundamental wave component does not depend on the

3.2 Computation of p_{2}

The second harmonic wave component is solved in five steps. First, from the initial p_{1} image, the new term βp_{1}^{2} is computed. Second, the FT of the resulting image is computed. Third, the spectrum is rotated. Fourth, the spectrum is integrated. Finally, the integrated Fourier spectrum is rotated once more. All rotations use the same rotation kernel. To compute the p_{2} wave, the

3.3 Fourier transform

The FT library used in the CPU implementation is the FFTW library, which is considered the most efficient in the community. Treating p_{1} and p_{2} as 3D real images and P_{1} and P_{2} as complex images means that the dimension of the complex image can be halved, which also decreases the computation time of both the FT and the IFT.
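The halving mentioned above follows from the Hermitian symmetry of the FT of a real signal. The following is a minimal 1D sketch in pure Python, not the FFTW or GPU code used in the article; the `dft` helper and the sample values are hypothetical and chosen only for illustration.

```python
import cmath


def dft(samples):
    """Naive O(N^2) discrete Fourier transform, for illustration only."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


# Hypothetical real-valued signal standing in for one line of a real image.
signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.25, -0.5]
spectrum = dft(signal)
n = len(signal)

# For real input, the coefficient at index n - k is the complex conjugate
# of the coefficient at index k.
symmetric = all(
    abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9 for k in range(1, n)
)

# Consequently only the first n // 2 + 1 coefficients need to be stored.
stored = spectrum[: n // 2 + 1]
```

In a multidimensional real-to-complex transform, this saving typically applies along one dimension only, which is consistent with roughly halving the storage of the complex image.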

3.4 Kernel description

The kernels used in the GPU implementation are described below. These kernels are particularly suitable for the GPU because the mathematical operations used in the GASM only involve the voxel at a given position in the 3D images. No access to other memory areas is needed to compute the output images, which is very efficient in GPU programming.

3.4.1 Rotation kernel

To compute the fundamental and the second harmonic, a rotation kernel is needed. According to the Euler formula, the complex exponential is considered in its Cartesian form, and then a classic multiplication is computed to obtain the new complex number.
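As a minimal sketch of this operation for a single voxel, the function below multiplies a complex value by a unit-modulus exponential written in Cartesian form. The name `rotate` and the angle `theta` are hypothetical; the actual phase depends on the propagation equations and is not reproduced here.

```python
import cmath
import math


def rotate(re, im, theta):
    """Multiply (re + j*im) by exp(j*theta) using the Cartesian form."""
    c = math.cos(theta)  # real part of exp(j*theta), from the Euler formula
    s = math.sin(theta)  # imaginary part of exp(j*theta)
    # Classic complex multiplication: (re + j*im) * (c + j*s)
    return re * c - im * s, re * s + im * c


# Hypothetical spectrum sample and rotation angle.
re_out, im_out = rotate(1.0, 2.0, 0.3)

# Cross-check against Python's built-in complex arithmetic.
reference = complex(1.0, 2.0) * cmath.exp(0.3j)
```

Because each output value depends only on the input value at the same position, one such call could be assigned to each GPU thread without any communication between threads.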

3.4.2 Kernel to compute βp_{1}^{2}

Usually, in a biological medium, the nonlinear parameter

To compute p_{1}^{2}, since p_{1} is real, its value is simply multiplied by itself to obtain the square value. This operation is very efficient in GPU programming.
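The per-voxel nature of this kernel can be sketched as follows, assuming that the nonlinear parameter and p_{1} are stored as flattened images of equal length. The function name and all numerical values are hypothetical.

```python
def beta_p1_squared(beta, p1):
    """Return beta * p1^2 voxel by voxel for flattened images of equal length."""
    if len(beta) != len(p1):
        raise ValueError("beta and p1 must have the same number of voxels")
    # p1 is real, so its square is a plain multiplication of the value by itself.
    return [b * p * p for b, p in zip(beta, p1)]


# Hypothetical inhomogeneous nonlinear parameter map and pressure values.
beta = [3.5, 3.5, 6.0, 6.0]
p1 = [0.5, -1.0, 2.0, 0.0]
result = beta_p1_squared(beta, p1)
```

Storing the nonlinear parameter as an image of the same size as p_{1} is one simple way to handle an inhomogeneous medium, since each voxel then carries its own coefficient.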

3.4.3 Kernel to compute the integral

The integral computation is the most complex part. In order to compute it, a finite difference scheme was used. Contrary to the fundamental evolution computation, a

To compute the integral at the
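The exact finite difference scheme is not reproduced in the text above. As an illustrative sketch only, the following assumes a trapezoidal rule applied to complex spectrum samples taken at a regular step `dz` along the propagation axis; the function name, the step and the sample values are hypothetical.

```python
def cumulative_trapezoid(values, dz):
    """Running trapezoidal integral of samples taken every dz along z."""
    total = 0j
    out = [total]
    for previous, current in zip(values, values[1:]):
        # Area of the trapezoid between two consecutive z positions.
        total += 0.5 * dz * (previous + current)
        out.append(total)
    return out


# Hypothetical complex samples at z = 0, 1, 2, 3 for one frequency component.
samples = [complex(z, -z) for z in (0.0, 1.0, 2.0, 3.0)]
integral = cumulative_trapezoid(samples, dz=1.0)
```

With a running scheme of this kind, the value at a given depth only requires the accumulated result and the sample from the previous depth, so the per-voxel independence of the other kernels is preserved within each z step.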

3.5 Final algorithm

The final algorithm is described in Table

Illustration of the different steps of the GASM

p_{0} → P_{0} → [FT]

For each z point:

P_{0} → P_{1} → [rotation kernel]

P_{1} → p_{1}(z) → [IFT]

Compute βp_{1}^{2} → [βp_{1}^{2} kernel]

βp_{1}^{2} → F(βp_{1}^{2}) → [FT]

Rotate F(βp_{1}^{2}) → [rotation kernel]

Compute integral

P_{2} → [rotation kernel]

P_{2} → p_{2}(z) → [IFT]

The different FTs, IFTs, and kernels are represented in square brackets.

4. Results

4.1 Speed increment

Two different CPUs and GPUs were used to estimate the algorithm's performance and are described in Table

Description of the two CPUs and GPUs used

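| Component | Property | Machine 1 | Machine 2 |
|---|---|---|---|
| CPU | Processor name | Intel Core2 Duo T9400 | Intel Xeon E5220 |
| CPU | Speed | 2.53 GHz | 2.27 GHz |
| CPU | Memory | 3.48 GB | 5.9 GB |
| GPU | Name | Quadro NVS 160M | GTX 285 |
| GPU | Global memory | 256 MB | 1024 MB |
| GPU | Number of multiprocessors | 1 | 30 |
| GPU | Number of cores | 8 | 240 |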

The resulting calculation times are reduced by a factor of 3.5 ± 0.2 on the Quadro NVS 160M and 13.6 ± 2.1 on the GTX 285. The difference between these ratios is explained by the higher performance of the GTX GPU, which has more cores and a larger memory (see Table

Computation time on the CPU (dotted lines) and GPU (full lines) for the two different PCs

The curves with 'o' correspond to the laptop (machine 1) and the curves with '+' to a standard PC (machine 2). The total time accounts for calculating the complete 3D +

Regarding the computation time, it can be noted that an increase by a factor of 30 in the number of GPU cores leads to a relatively weak performance gain. However, the processing times on the Quadro NVS and on the GTX GPU are 360 and 47 ms, respectively, for a working 2D +
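The scaling remark can be checked with a short calculation that uses only the core counts from the hardware table and the two processing times quoted above. The variable names are illustrative.

```python
# Figures taken from the hardware table and from the text above.
cores_quadro, cores_gtx = 8, 240
time_quadro_ms, time_gtx_ms = 360.0, 47.0

# Ratio of the number of cores between the two GPUs.
core_ratio = cores_gtx / cores_quadro

# Ratio of the processing times (how much faster the GTX 285 is).
time_ratio = time_quadro_ms / time_gtx_ms

# Fraction of the ideal, core-proportional speed-up that is actually observed.
scaling_efficiency = time_ratio / core_ratio
```

The time ratio comes out at roughly 7.7 for a core ratio of 30, that is, about a quarter of what a purely core-proportional scaling would give.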

4.2 Resulting fields

One possible application of the GASM is to calculate the pressure evolution in a medium with an inhomogeneous nonlinear coefficient. In such cases, the second harmonic pressure is expected to sharply increase according to the nonlinear parameter. For example, Figure

Evolution of the pressure obtained in simulation for inhomogeneous nonlinear medium

Two planes (x = 0 and y = 0) are displayed for the fundamental (a) and the second harmonic (b) fields. The limit between the two regions with different nonlinear parameters corresponds to the probe axis of symmetry.

5. Discussion and conclusions

Currently available ultrasound simulators, such as FieldII

The use of GPUs for fast ultrasound simulation is indeed promising and paves the way for the investigation of new applications. For example, the parameter sweeps needed for optimization purposes, which have so far been prohibitively long, become feasible. Pasovic et al.

One known limitation of the GASM concerns the simulation bandwidth. For example, it is surely not adequate for the needs of cMUT transducers

The GPU implementation of the GASM offers a very promising opportunity to reduce simulation time in ultrasound. The GASM is the first method in ultrasound that has been tested on a GPU, and the results obtained open several opportunities for future simulation tools and applications.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

Special thanks are extended to ANR-07 TecSan-015-01 MONITHER for financial support. FV was financially supported by the Franco-Italian University with a VINCI and a Galilée grant and by the Rhône-Alpes region with an Explora'Doc grant.