
DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging

Matthieu Simeoni, Sepand Kashani, Paul Hurley and Martin Vetterli

In Neural Information Processing Systems (NeurIPS), 2019

Keywords: acoustic imaging

@article{simeoni2019deepwave,
  title={DeepWave: A recurrent neural-network for real-time acoustic imaging},
  author={Simeoni, Matthieu and Kashani, Sepand and Hurley, Paul and Vetterli, Martin},
  journal={Advances in Neural Information Processing Systems},
  year={2019},
  volume={32},
}

TL;DR: This paper proposes a lightweight RNN that reconstructs spherical acoustic maps in real time. The network is based on LISTA: its architecture unrolls proximal gradient descent iterations.

1. Introduction

Previous work: the Delay-And-Sum (DAS) beamformer [1, Chapter 5]. Idea: real-time reconstruction of spherical acoustic maps based on LISTA [2]. Limitations: DAS only achieves high-resolution maps with high-resolution (dense) microphone arrays.

2. Background

2.1 Steering Matrix

The steering matrix collects the steering vectors of the microphone array. A steering vector stores the phase shifts that a signal arriving from a given direction induces at each microphone in the array.
Signals can arrive at the microphones from different positions and angles. A direction can be parametrized by:

  • Azimuth \(\theta\): The horizontal angle of arrival.
  • Elevation \(\phi\): The vertical angle of arrival.
For a uniform linear array, the steering vector storing the phase shifts (or time delays) of the signal at each microphone is a complex vector of \( M \) elements (one per microphone), \( \mathbf{a}(\theta) \in \mathbb{C}^{M} \): \[ \mathbf{a}(\theta) = \begin{pmatrix} 1 \\ e^{-j k d \sin(\theta)} \\ e^{-j 2 k d \sin(\theta)} \\ \vdots \\ e^{-j (M-1) k d \sin(\theta)} \end{pmatrix} \label{eq:steering-vector} \tag{Eq. 1} \] where:
  • \( k = \frac{2 \pi}{\lambda} \) is the wavenumber.
  • \( d \) is the distance between the microphones.
  • \( \lambda \) is the wavelength.
When multiple sources arrive from multiple angles, the steering vectors generalize to a steering matrix. Supposing signals arrive from angles \( \theta_1, \cdots, \theta_N \), where \( N \) is the number of sources, the steering matrix \( \boldsymbol{A} \in \mathbb{C}^{M \times N} \) is defined as: \[ \boldsymbol{A} = (\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_N)) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ e^{-j k d \sin(\theta_1)} & e^{-j k d \sin(\theta_2)} & \cdots & e^{-j k d \sin(\theta_N)} \\ e^{-j 2 k d \sin(\theta_1)} & e^{-j 2 k d \sin(\theta_2)} & \cdots & e^{-j 2 k d \sin(\theta_N)} \\ \vdots & \vdots & \ddots & \vdots \\ e^{-j (M-1) k d \sin(\theta_1)} & e^{-j (M-1) k d \sin(\theta_2)} & \cdots & e^{-j (M-1) k d \sin(\theta_N)} \end{pmatrix} \label{eq:steering-matrix} \tag{Eq. 2} \]
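
As a concrete illustration, here is a minimal NumPy sketch (ours, not from the paper) that builds the steering matrix of Eq. 2 for a uniform linear array; the function name and parameters are our own:

```python
import numpy as np

def steering_matrix(angles_rad, M, d, wavelength):
    """Steering matrix A in C^{M x N} for a uniform linear array (Eq. 2).

    angles_rad : N arrival angles theta_n, in radians
    M          : number of microphones
    d          : inter-microphone spacing (same unit as wavelength)
    wavelength : wavelength lambda of the narrowband signal
    """
    k = 2 * np.pi / wavelength                 # wavenumber (Eq. 1)
    m = np.arange(M)[:, None]                  # microphone indices 0..M-1
    theta = np.asarray(angles_rad)[None, :]
    return np.exp(-1j * k * d * m * np.sin(theta))  # A[m, n] = e^{-j m k d sin(theta_n)}

# Example: 8 microphones, half-wavelength spacing, 3 sources
A = steering_matrix(np.deg2rad([-30.0, 0.0, 45.0]), M=8, d=0.5, wavelength=1.0)
print(A.shape)  # (8, 3)
```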

3. Proximal Gradient Descent

DeepWave reconstructs the images with a recurrent neural network derived from Proximal Gradient Descent (PGD) iterations of the optimization problem: \[ \hat{\mathbf{x}} = \arg \min_{\mathbf{x} \in \mathbb{R}^{N}_{+}} \frac{1}{2} \| \hat{\Sigma} - \mathbf{A} \, \text{diag}(\mathbf{x}) \, \mathbf{A}^H \|^2_F \quad + \quad \lambda \left[ \gamma \| \mathbf{x} \|_1 + (1 - \gamma) \| \mathbf{x}\|_2^2 \right], \label{eq:objective-function} \tag{Eq. 3} \] which, after vectorization, becomes: \[ \hat{\mathbf{x}} = \arg \min_{\mathbf{x} \in \mathbb{R}^{N}_{+}} \frac{1}{2} \| \text{vec}(\hat{\Sigma}) - (\overline{\mathbf{A}} \circ \mathbf{A}) \, \mathbf{x} \|^2_2 \quad + \quad \lambda \left[ \gamma \| \mathbf{x}\|_1 + (1 - \gamma) \| \mathbf{x}\|_2^2 \right], \label{eq:objective-function-vectorized} \tag{Eq. 4} \] where (a numerical sketch of this objective is given after the list):

  • \( \hat{\mathbf{x}} \) is the variable being optimized.
  • \( \hat{\Sigma} \) is the estimated covariance matrix.
  • \( \boldsymbol{A} \in \mathbb{C}^{M \times N} \) is the steering matrix.
  • \( \boldsymbol{A} ^ H \) is the Hermitian of the steering matrix.
  • \( \text{diag}(x) \) is a diagonal matrix with the elements of \( x \).
  • \( \lambda \) is the regularization parameter.
  • \( \gamma \in [0, 1] \) is the trade-off parameter between the \( \ell_1 \) and \( \ell_2 \) penalties.
  • \( \overline{\boldsymbol{A} } \) is the conjugate of the steering matrix.
  • \( \circ \) is the Khatri-Rao (column-wise Kronecker) product, which satisfies \( \text{vec}(\mathbf{A} \, \text{diag}(\mathbf{x}) \, \mathbf{A}^H) = (\overline{\mathbf{A}} \circ \mathbf{A}) \, \mathbf{x} \).
  • \( \| \cdot \|_F \) is the Frobenius norm, which measures the mismatch between the measured covariance matrix and its model.
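
For reference, a minimal NumPy sketch (our own, not the authors' code) that evaluates the objective of Eq. 3 for a given map \( \mathbf{x} \); all names are ours:

```python
import numpy as np

def objective(x, Sigma_hat, A, lam, gamma):
    """Eq. 3: covariance-fitting term plus elastic-net penalty on x >= 0."""
    R = Sigma_hat - A @ np.diag(x) @ A.conj().T          # residual covariance
    fidelity = 0.5 * np.linalg.norm(R, 'fro') ** 2       # Frobenius data term
    penalty = lam * (gamma * np.abs(x).sum() + (1 - gamma) * (x ** 2).sum())
    return fidelity + penalty
```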
Proximal Gradient Descent (PGD) solves the problem above with the following iterations: \[ \mathbf{x}^{k} = \text{ReLU} \left( \frac{\mathbf{x}^{k-1} - \alpha (\overline{\mathbf{A}} \circ \mathbf{A})^H \left[ (\overline{\mathbf{A}} \circ \mathbf{A}) \mathbf{x}^{k-1} - \text{vec}\left( \hat{\Sigma} \right) \right] - \lambda \alpha \gamma}{2 \lambda \alpha (1 - \gamma) + 1}\right) \quad k \geq 1, \label{eq:proximal-gradient-descent} \tag{Eq. 5} \] where \( \alpha > 0 \) is the step size.
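
The iteration of Eq. 5 maps directly to code. Below is a NumPy sketch (ours, not the authors' implementation); it builds the Khatri-Rao matrix \( \overline{\mathbf{A}} \circ \mathbf{A} \) explicitly and assumes \( \alpha \) is a valid PGD step size:

```python
import numpy as np

def pgd(Sigma_hat, A, lam, gamma, alpha, n_iter=100):
    """Run the PGD iterations of Eq. 5."""
    M, N = A.shape
    # Khatri-Rao product conj(A) ∘ A: column n is kron(conj(a_n), a_n)
    G = np.einsum('mn,pn->mpn', A.conj(), A).reshape(M * M, N)
    sigma = Sigma_hat.reshape(-1, order='F')             # vec(Sigma_hat), column-major
    x = np.zeros(N)
    for _ in range(n_iter):
        grad = (G.conj().T @ (G @ x - sigma)).real       # gradient of the fidelity term
        x = np.maximum(
            (x - alpha * grad - lam * alpha * gamma)
            / (2 * lam * alpha * (1 - gamma) + 1),
            0,                                           # ReLU enforces x >= 0
        )
    return x
```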

4. Methods

Given the estimated covariance matrix \( \hat{\Sigma} \in \mathbb{C}^{M \times M} \), where \( M \) is the number of microphones, the network reconstructs the spherical acoustic map (SAM) with 2 layers. \( \mathbf{x}^{l} \) denotes the neuron at layer \( l \) in Eq. 6 and Fig. 1.


Fig. 1. DeepWave's network.

The trainable parameters are \( \theta \) (through \( P_\theta (\mathbf{L}) \)), \( \mathbf{B} \), and \( \tau \). The network is trained with the objective function in Eq. 4. \[ \mathbf{x}^{l} = \sigma \left( P_\theta (\mathbf{L}) \, \mathbf{x}^{l-1} + (\overline{\mathbf{B}} \circ \mathbf{B})^H \, \mathrm{vec}(\hat{\Sigma}) - \tau \right), \quad l=1,\ldots,L \label{eq:neuron} \tag{Eq. 6} \] where (a minimal sketch of this forward pass follows the list):
  • \( \mathbf{x}^{l} \) is the neuron at layer \( l \).
  • \( \sigma \) is the ReLU activation function.
  • \( P_\theta \) is the deblurring operator and \( P_\theta (\mathbf{L}) \) the deblurring matrix.
  • \( \mathbf{L} \in \mathbb{R}^{N\times N} \) is the graph Laplacian.
  • \( \overline{\mathbf{B}} \) is the conjugate of \( \mathbf{B} \).
  • \( \mathbf{B} \) is the network's steering matrix, learned during training.
  • \( \tau \) is the threshold.
  • \( \hat{\Sigma} \) is the estimated covariance matrix.
  • \( \mathrm{vec}(\cdot) \) is the vectorization operation.
    Let \( \mathbf{A} \in \mathbb{C}^{M \times N} \) be a matrix with \( M \) rows and \( N \) columns. The vectorization operator \( \mathrm{vec}(\cdot) \) reshapes \( \mathbf{A} \) into a vector of dimension \( MN \times 1 \) by stacking its columns: \( \mathrm{vec}(\mathbf{A}) \in \mathbb{C}^{MN \times 1} \). The operation is defined as: \[ [\mathrm{vec}(\mathbf{A})]_{M(j-1)+i} = [\mathbf{A}]_{ij} \quad \mathrm{for} \quad i = 1, \ldots, M \quad \mathrm{and} \quad j = 1, \ldots, N \label{eq:vectorization} \tag{Eq. 7} \]
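
A minimal NumPy sketch of the forward pass of Eq. 6. We assume \( P_\theta(\mathbf{L}) \) is a polynomial in the graph Laplacian (our assumption about the parametrization); the trained weights \( \theta \), \( \mathbf{B} \), and \( \tau \) are treated as given inputs:

```python
import numpy as np

def deepwave_forward(Sigma_hat, B, theta, L_graph, tau, n_layers):
    """Forward pass of the recurrence in Eq. 6 (sketch)."""
    N = L_graph.shape[0]
    # Deblurring matrix P_theta(L) = sum_k theta_k * L^k (assumed parametrization)
    P = sum(t * np.linalg.matrix_power(L_graph, k) for k, t in enumerate(theta))
    M = B.shape[0]
    # Khatri-Rao product conj(B) ∘ B, as in Eq. 5
    G = np.einsum('mn,pn->mpn', B.conj(), B).reshape(M * M, N)
    y = (G.conj().T @ Sigma_hat.reshape(-1, order='F')).real  # back-projected data term
    x = np.zeros(N)
    for _ in range(n_layers):
        x = np.maximum(P @ x + y - tau, 0)                    # sigma = ReLU
    return x
```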
