Sequential Separation and Dereverberation: the Two-Stage Approach




8.1 Introduction

This chapter continues the discussion, begun in the previous chapter, of source extraction (or separation) and speech dereverberation with classical approaches. The same MIMO framework is used for the analysis. But instead of trying to determine a solution in one step, we present a two-stage approach for sequential separation and dereverberation. This will help the reader better comprehend the interaction between spatial and temporal processing in a microphone array system.

8.2 Signal Model and Problem Description

The problem of source separation and speech dereverberation was described in Section 7.2. To keep this chapter self-contained, we briefly repeat the signal model here. We consider an N-element microphone array in a reverberant acoustic environment in which there are M sound sources. This is an M × N MIMO system. As shown in Fig. 8.1, the nth microphone output is expressed as

\[
y_n(k) = \sum_{m=1}^{M} g_{nm} * s_m(k) + v_n(k), \quad n = 1, 2, \ldots, N. \tag{8.1}
\]
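The model in (8.1) can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes that ∗ denotes linear convolution, that each g_nm is a finite impulse response of length Lg, and that v_n(k) is white Gaussian noise. The function name `simulate_mimo`, the array shapes, and the random test signals are all hypothetical choices made for this example.

```python
import numpy as np

def simulate_mimo(sources, channels, noise_std=0.0, rng=None):
    """Microphone outputs per (8.1), under the assumptions stated above.

    sources:  array of shape (M, K), row m holds s_m(k)
    channels: array of shape (N, M, Lg), channels[n, m] holds g_nm
    returns:  array of shape (N, K), row n holds y_n(k)
    """
    rng = np.random.default_rng() if rng is None else rng
    M, K = sources.shape
    N, M_ch, _ = channels.shape
    assert M == M_ch, "channels must provide one response per source"
    y = np.zeros((N, K))
    for n in range(N):
        for m in range(M):
            # Linear convolution g_nm * s_m, truncated to the first K samples
            y[n] += np.convolve(channels[n, m], sources[m])[:K]
    # Additive noise v_n(k); white Gaussian noise is an assumption of this sketch
    y += noise_std * rng.standard_normal((N, K))
    return y

# Hypothetical sizes: M = 2 sources, N = 3 microphones, Lg = 8 taps, K = 64 samples
rng = np.random.default_rng(0)
M, N, Lg, K = 2, 3, 8, 64
s = rng.standard_normal((M, K))
g = rng.standard_normal((N, M, Lg))
y = simulate_mimo(s, g, noise_std=0.01, rng=rng)
```

Random impulse responses are used only to keep the sketch short; measured or simulated room responses would take their place in a realistic experiment.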

The objective of separation and dereverberation is to retrieve the source signals $s_m(k)$ ($m = 1, 2, \ldots, M$) by applying a set of filters $\mathbf{h}_{mn}$ ($m = 1, 2, \ldots, M$, $n = 1, 2, \ldots, N$) to the microphone outputs $y_n(k)$ ($n = 1, 2, \ldots, N$), as illustrated by Fig. 8.1. In the absence of additive noise, the resulting signal of separation and dereverberation is obtained as

\[
\mathbf{z}_a(k) = \mathbf{H} \mathbf{G} \mathbf{s}_{ML}(k), \tag{8.2}
\]


[Figure: block diagram of the M × N acoustic MIMO system. The sources s1(k), ..., sM(k) pass through the channels gnm; the noise vn(k) is added at microphone n to give yn(k); the filters Hmn(z) are applied to the microphone outputs and summed to produce z1(k), ..., zM(k).]

Fig. 8.1. Illustration of source separation and speech dereverberation.

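where

\[
\mathbf{z}_a(k) = \begin{bmatrix} z_1(k) & z_2(k) & \cdots & z_M(k) \end{bmatrix}^T,
\]

\[
\mathbf{H} = \begin{bmatrix}
\mathbf{h}_{11}^T & \mathbf{h}_{12}^T & \cdots & \mathbf{h}_{1N}^T \\
\mathbf{h}_{21}^T & \mathbf{h}_{22}^T & \cdots & \mathbf{h}_{2N}^T \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{h}_{M1}^T & \mathbf{h}_{M2}^T & \cdots & \mathbf{h}_{MN}^T
\end{bmatrix}_{M \times N L_h},
\]

\[
\mathbf{h}_{mn} = \begin{bmatrix} h_{mn,0} & h_{mn,1} & \cdots & h_{mn,L_h-1} \end{bmatrix}^T,
\quad m = 1, 2, \ldots, M, \quad n = 1, 2, \ldots, N,
\]

\[
\mathbf{G} = \begin{bmatrix}
\mathbf{G}_{11} & \mathbf{G}_{12} & \cdots & \mathbf{G}_{1M} \\
\mathbf{G}_{21} & \mathbf{G}_{22} & \cdots & \mathbf{G}_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{G}_{N1} & \mathbf{G}_{N2} & \cdots & \mathbf{G}_{NM}
\end{bmatrix}_{N L_h \times M L},
\]

\[
\mathbf{G}_{nm} = \begin{bmatrix}
g_{nm,0} & \cdots & g_{nm,L_g-1} & 0 & \cdots & 0 \\
0 & g_{nm,0} & \cdots & g_{nm,L_g-1} & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & g_{nm,0} & \cdots & g_{nm,L_g-1}
\end{bmatrix}_{L_h \times L},
\quad n = 1, 2, \ldots, N, \quad m = 1, 2, \ldots, M,
\]

\[
\mathbf{s}_{ML}(k) = \begin{bmatrix} \mathbf{s}_{L,1}^T(k) & \mathbf{s}_{L,2}^T(k) & \cdots & \mathbf{s}_{L,M}^T(k) \end{bmatrix}^T,
\]

\[
\mathbf{s}_{L,m}(k) = \begin{bmatrix} s_m(k) & s_m(k-1) & \cdots & s_m(k-L+1) \end{bmatrix}^T,
\quad m = 1, 2, \ldots, M,
\]

$L_g$ is the length of the longest channel impulse response in the acoustic MIMO system, $L_h$ is the length of the separation-and-dereverberation filters, and $L = L_g + L_h - 1$. Since we aim to make

\[
z_m(k) = s_m(k - \tau_m), \quad m = 1, 2, \ldots, M, \tag{8.3}
\]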

where τm is a constant delay, the conditions for separation and dereverberation are deduced as ⎡ T ⎤ u11 0TL×1 · · · 0TL×1