The Lost Honour of ℓ2-Based Regularization
Uri Ascher, Department of Computer Science, University of British Columbia
with Kees van den Doel & Eldad Haber
October 2012


Highly ill-conditioned large problems: Computed myography

CMG problem

Potential problem

−∇ · (σ∇v) = u in Ω,

subject to Neumann boundary conditions.

Inverse problem: recover the source u from measurements of v on the
boundary.
In previous notation, J = QA⁻¹, with A the discretization matrix of
the potential problem and Q the data projection operator.

Highly ill-posed: many different sources explain the data. Regularization
should select the solution that agrees with a priori information.

The model consists of a set of discrete tripoles: use L1 regularization?

Typical 3D calculation: 50,000 finite elements. L1 codes based on
gradient projection are hopeless!
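The forward map J = QA⁻¹ is never formed explicitly; it is applied matrix-free by solving the potential problem and sampling the field. A minimal sketch of this structure (our own illustrative code, not the authors': a small Dirichlet-style 2D Laplacian stands in for the discretized potential operator, and `meas` and `apply_J` are hypothetical names):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative sketch, not the authors' code: J = Q A^{-1} applied
# matrix-free. A small Dirichlet-style Laplacian replaces the talk's
# Neumann discretization on ~50,000 elements.
n = 16
L1d = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), L1d) + sp.kron(L1d, sp.eye(n))).tocsc()

meas = np.arange(n)                      # hypothetical measurement indices
Q = sp.eye(n * n, format="csr")[meas]    # data projection: picks measured entries

def apply_J(u):
    """Forward map u -> data: solve A v = u, then sample v at measurement points."""
    v = spla.spsolve(A, u)
    return Q @ v

u = np.zeros(n * n)
u[n * n // 2] = 1.0                      # a single point source
d = apply_J(u)                           # data vector of length len(meas)
```

Each application of J (or its adjoint) costs a full PDE solve, which is why gradient-projection L1 codes, needing many such applications per iteration, become hopeless at realistic sizes.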
Kees van den Doel & Eldad Haber (UBC) · ℓ1 and ℓ2 · October 2012 · 30/44

Highly ill-conditioned large problems: Inverse potential problem

Inverse potential problem

Study the simpler problem with σ ≡ 1 on a square (2D):
−∆v = u(x), x ∈ Ω.

Generate data using a 64 × 64 grid, polluted by 1% white noise. The ground
truth for the sought source u consists of various tripole distributions.
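The data-generation step can be sketched as follows (our own illustrative code, not the authors': Dirichlet boundary conditions and the tripole placement are assumptions made for simplicity):

```python
import numpy as np
from scipy.sparse import diags, eye, kron
from scipy.sparse.linalg import spsolve

# Illustrative sketch (not the authors' code): discretize -Δv = u on a
# 64 x 64 grid, place a tripole-like source whose charges sum to zero,
# solve for v, and pollute the result with 1% white noise.
n = 64
h = 1.0 / (n + 1)
L = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
A = (kron(eye(n), L) + kron(L, eye(n))).tocsc()   # Dirichlet BCs for simplicity

u = np.zeros((n, n))
u[30, 30], u[30, 34], u[34, 32] = 2.0, -1.0, -1.0  # hypothetical tripole location
v = spsolve(A, u.ravel())

rng = np.random.default_rng(0)
noise = rng.standard_normal(v.size)
b = v + 0.01 * np.linalg.norm(v) / np.linalg.norm(noise) * noise  # 1% noise
```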

Left: ground truth. Center: L2G. Right: L1G.


More example instances

Left: ground truth. Center: L2G. Right: L1G.
Next, consider a point charge pair for a source:

Left: ground truth. Center: L2G. Right: L1G.


L1 and L2 for a point charge pair

Left: L2. Center: L1. Right: weighted L1.

Indeed, the L1 methods give sparse solutions, but not very sparse and not
very good ones.


Simple analysis

min_u ½ ∥Ju − b∥² + β ∥Wu∥₁

Singular value decomposition: J = UΣVᵀ,
with Σ = diag(σi), σ1 ≥ ··· ≥ σm ≥ 0, and U, V orthogonal.
Special case: W = Vᵀ. Then for z = Wu, c = Uᵀb, consider

min_z ½ ∥Σz − c∥² + β ∥z∥₁

The representation of the true model, z* = Wu*, satisfies σi zi* = ci + εi.
Consider a sparse representation:

zi* = 1 if i ∈ T, zi* = 0 if i ∉ T.

Under what conditions can we expect to stably compute z with the
same sparsity (i.e., zi = 0 iff i ∉ T)?
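Since Σ is diagonal, the problem above decouples into scalar problems, each with the closed-form soft-thresholding solution zi = sign(ci) max(|ci| − β/σi, 0)/σi. A minimal sketch (the singular values and data are our own illustrative choices):

```python
import numpy as np

def solve_diagonal_l1(sigma, c, beta):
    """Closed-form minimizer of 0.5*(sigma_i*z_i - c_i)^2 + beta*|z_i|, entrywise."""
    return np.sign(c) * np.maximum(np.abs(c) - beta / sigma, 0.0) / sigma

# Data consistent with the true model z* = (1, 1, 1) and no noise: c_i = sigma_i.
sigma = np.array([1.0, 0.5, 0.1])
c = sigma.copy()
z = solve_diagonal_l1(sigma, c, beta=0.05)   # -> [0.95, 0.8, 0.0]
# The effective threshold beta/sigma_i blows up for small singular values:
# entries of z* sitting on tiny sigma_i are zeroed out even without noise,
# previewing the stability condition on the next slide.
```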

Simple analysis for L1

Assume noise with mean 0 and covariance ρ²I, and define

σ₊ = max_{i∉T} σi,   σ₋ = min_{i∈T} σi.

Theorem:
Using L1, the true and reconstructed models, z* and z, are expected
to have the same zero structure only if either σ₊ ≤ σ₋ or

ρ ≤ σ₋² / √(σ₊² − σ₋²).

(One cannot stably compute the sparse z for just any sparse z*.)

Further difficulties arise in the latter case in determining the
regularization parameter β.
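A quick numerical reading of the theorem's bound (the singular-value decay rate and the support T below are our own illustrative choices, not from the talk): with rapidly decaying σi and a support T that misses the leading indices, the admissible noise level is tiny.

```python
import numpy as np

# Illustrative check of the theorem's bound; decay and support are ours.
sigmas = 10.0 ** (-0.5 * np.arange(10))      # fast (ill-posed) decay
T = {4, 5}                                   # sparse support away from the top
notT = [i for i in range(10) if i not in T]

sig_minus = min(sigmas[i] for i in T)        # min over i in T
sig_plus = max(sigmas[i] for i in notT)      # max over i not in T

if sig_plus <= sig_minus:
    rho_max = np.inf                         # first alternative: no restriction
else:
    rho_max = sig_minus**2 / np.sqrt(sig_plus**2 - sig_minus**2)
# Here sig_plus = 1 and sig_minus = 10^{-2.5}, so rho_max is about 1e-5:
# only absurdly clean data would recover the correct zero structure.
```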


Highly ill-conditioned large problems: Nonlinearity, EIT and DC resistivity

Nonlinear inverse problems

Nonlinear objective function and ℓ1 constraints.

Note: when the constraint (solid red) is nonlinear, it does not
necessarily intersect the level set of ∥u∥₁ at a vertex, so the solution
is not necessarily sparse.

Still, L1G can be very useful due to less smearing across jumps!


EIT and DC resistivity

[Haber, Heldman & Ascher, 2007; van den Doel & Ascher, 2011; Haber, Chung & Herrmann, 2011; Roosta-Khorasani, van den Doel & Ascher, 2012]
Forward problem:

∇ · (σ∇vi) = qi, x ∈ Ω, i = 1, . . . , s,
∂vi/∂ν |∂Ω = 0.

Take Ω to be a square. Construct s data sets, choosing different
current patterns:

qi(x) = δ_{x, p_iL} − δ_{x, p_iR},

where p_iL and p_iR are located on the left and right boundaries, resp.
Place each at √s different, equidistant locations, 1 ≤ iL, iR ≤ √s.
Predict data by measuring the field at specified locations:

F(u) = (F1(u), . . . , Fs(u))ᵀ.
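The dipole current patterns above can be sketched as follows (our own grid layout and variable names; the talk gives no code):

```python
import numpy as np

# Illustrative construction of the s dipole sources q_i (not the authors'
# code): +delta at one of sqrt(s) equidistant left-boundary points,
# -delta at one of sqrt(s) right-boundary points.
n = 64                                        # grid points per side
s = 16                                        # number of experiments
m = int(np.sqrt(s))                           # sqrt(s) locations per side
rows = np.linspace(0, n - 1, m).astype(int)   # equidistant boundary rows

rhs = []
for iL in range(m):
    for iR in range(m):
        q = np.zeros((n, n))
        q[rows[iL], 0] = +1.0                 # +delta on the left boundary
        q[rows[iR], n - 1] = -1.0             # -delta on the right boundary
        rhs.append(q.ravel())
rhs = np.array(rhs)                            # one right-hand side per experiment
```

Every row sums to zero, as a Neumann problem requires of its source term.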


EIT and DC resistivity: incorporating bounds

Often we have a priori information that σmin ≤ σ(x) ≤ σmax.
Predict data by measuring the field at specified locations:

F(u) = (F1(u), . . . , Fs(u))ᵀ, where u(x) = ψ⁻¹(σ(x)), with transfer function
ψ(t) = 0.5 (σmax − σmin) tanh(t) + 0.5 (σmax + σmin).

In the following experiments use
σmin = 1, σmax = 10,

and calculate on a 64 × 64 grid.
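The transfer function and its inverse are direct to transcribe; a minimal sketch (the arctanh form of ψ⁻¹ is our own derivation from the slide's formula):

```python
import numpy as np

# Transcription of the slide's transfer function; psi_inv follows by
# inverting tanh.
sigma_min, sigma_max = 1.0, 10.0

def psi(t):
    """Map unconstrained u to a conductivity in (sigma_min, sigma_max)."""
    return 0.5 * (sigma_max - sigma_min) * np.tanh(t) + 0.5 * (sigma_max + sigma_min)

def psi_inv(sigma):
    """Inverse map: conductivity in (sigma_min, sigma_max) to unconstrained u."""
    return np.arctanh((2.0 * sigma - (sigma_max + sigma_min)) / (sigma_max - sigma_min))

u = psi_inv(5.0)   # optimize over u; the bounds on sigma then hold automatically
```

Optimizing over u instead of σ makes the bound constraints automatic, at the price of a nonlinear reparametrization.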


EIT and DC resistivity example: multiple right-hand sides

Note: having many experiments here helps, both theoretically and
practically! Thus, we may want s large, although this brings major
computational difficulties.
Use stochastic methods for model reduction when s is large.
Synthesize the experiments' data by using the “true” u(x), calculating
data on a twice-as-fine grid, and adding 3% Gaussian noise.
Generalized Tikhonov:

min_u ½ ∥F(u) − b∥² + βR(u)
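The stochastic model-reduction idea mentioned above can be sketched as random mixing of right-hand sides (our own illustrative code; Rademacher ±1 weights are one common choice and an assumption here, not stated in the talk):

```python
import numpy as np

# Illustrative sketch of random right-hand-side sampling (not the
# authors' code): replace the s sources by t << s random +/-1 mixtures.
# With the 1/sqrt(t) scaling, the mixed misfit is an unbiased estimate
# of the full misfit in expectation.
rng = np.random.default_rng(1)
s, n2, t = 64, 4096, 4
rhs_all = rng.standard_normal((s, n2))        # stand-in for the s sources

W = rng.choice([-1.0, 1.0], size=(s, t)) / np.sqrt(t)   # Rademacher mixing
rhs_mixed = W.T @ rhs_all                     # t mixed sources instead of s
```

Each outer iteration then requires only t forward solves instead of s, which is where the computational savings come from.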


L2G vs L1G: results

Left: true. Center: s = 4, L2G. Right: s = 4, L1G.

Left: true. Center: s = 64, L2G. Right: s = 64, L1G.


L2G vs L1G: results cont.


Decrease noise to 1%, increase s.

Left: s = 1024, L2G. Right: s = 1024, L1G.


L2G vs L1G results: observations

Indeed, better results for larger s.
For s = 64, the L1G results are somewhat better than L2G.
Upon increasing s further, using finer grids, and having less noise,
L1G becomes significantly better than L2G.
Expensive computations may be required to obtain more accurate
solutions, even more so in 3D. Adaptive grids (meshes) and random
sampling of right-hand-side combinations help decrease the pain
significantly!


Highly ill-conditioned large problems: Is ℓ1-based worthwhile?

ℓ1-based regularization for large, highly ill-conditioned problems?

Yes, L1G, a.k.a. total variation, is worth considering,
although it does not always deliver and it is often not cheap to work
with.

Bigger doubts linger regarding L1 (when relevant), not only because
of the possibly prohibitive cost:

For the inverse potential problem, one can show on physical grounds
that the γ-criterion of Juditsky & Nemirovski is violated.
Furthermore, our numerical results for the inverse potential problem do
not show positive evidence.
One can show that in some situations, using L1 can be highly restrictive
and ineffective, regardless of cost.


Conclusions

In many situations, ℓ1-based regularization is well-worth using. Such
techniques can provide exciting advances (e.g. in model reduction).

However, such techniques are not suitable for all problems, and it is
dangerous (and may consume many student-years) to apply them
blindly.

In practice, always consider ℓ2-based regularization first,
because such methods are simpler and more robust. Only upon deciding that
these are not sufficiently good for the given application should one
proceed to examine ℓ1-based alternatives (when this makes sense).

