DL [Course 1/5] Neural Networks and Deep Learning [Week 4/4] Deep Neural Networks

Key Concepts

Understand how to use a cache to pass information from forward propagation to back propagation.
Build and train a deep L-layer Neural Network
See deep neural networks as successive blocks put one after each other
Analyze matrix and vector dimensions to check neural network implementations.
Understand the role of hyperparameters in deep learning

[mathjax]
Course notes for Neural Networks and Deep Learning (deeplearning.ai)

Week4: Deep Neural Networks

Deep L-layer neural network

Number of layers
- Hard to predict in advance
- holdout
- cross validation
- dev set evaluation
Deep neural network notation

Forward propagation in a deep network

For L=4$$
\begin{align*}
x: z^{[1]}&=w^{[1]}\overbrace{X}^{a^{[0]}}+b^{[1]}\\
a^{[1]}&=g^{[1]}(z^{[1]})\\
z^{[2]}&=w^{[2]}a^{[1]}+b^{[2]}\\
a^{[2]}&=g^{[2]}(z^{[2]})\\
\dots\\
z^{[4]}&=w^{[4]}a^{[3]}+b^{[4]}\\
a^{[4]}&=g^{[4]}(z^{[4]})\\
&= \hat y
\end{align*}
$$
In general$$
\begin{align*}
z^{[l]}&=w^{[l]}a^{[l-1]}+b^{[l]}\\
a^{[l]}&=g^{[l]}(z^{[l]})\\
\end{align*}
$$
Vectorized$$
\begin{align*}
Z^{[l]}&=w^{[l]}A^{[l-1]}+b^{[l]}\\
A^{[l]}&=g^{[l]}(Z^{[l]})\\
\end{align*}
$$
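Iterating over the layers l=1..L seems to require an explicit for loop; there is no obvious way to vectorize it away.
For a bug-free implementation:
- Be systematic and careful about matrix dimensions
- Write them out on paper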
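The vectorized equations and the loop over layers can be sketched in NumPy as follows. This is a minimal sketch under assumptions the notes do not state: the names `forward_propagation`, `params`, `relu` and `sigmoid` are hypothetical, ReLU is assumed for the hidden layers and sigmoid for the output layer, and the 0.01 weight scale is only illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, params, L):
    """Run Z^[l] = W^[l] A^[l-1] + b^[l], A^[l] = g^[l](Z^[l]) for l = 1..L."""
    A = X  # A^[0] = X
    caches = []
    for l in range(1, L + 1):  # explicit loop over the layers
        W, b = params[f"W{l}"], params[f"b{l}"]
        Z = W @ A + b  # b has shape (n_l, 1) and is broadcast over the m columns
        caches.append((A, W, b, Z))  # kept for back propagation
        # Assumed activations: sigmoid on the last layer, ReLU elsewhere
        A = sigmoid(Z) if l == L else relu(Z)
    return A, caches

# Tiny example: n^[0]=3 inputs, two hidden layers, 1 output, m=5 examples
rng = np.random.default_rng(0)
layer_dims = [3, 4, 2, 1]
params = {}
for l in range(1, len(layer_dims)):
    params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))
X = rng.standard_normal((3, 5))
AL, caches = forward_propagation(X, params, L=3)
```

Here `AL` plays the role of $\hat y$ and has one column per example.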

Getting your matrix dimensions right

Parameters $w^{[l]}$ and $b^{[l]}$
- $ w^{[l]}: (n^{[l]}, n^{[l-1]})$
- $ b^{[l]}: (n^{[l]}, 1)$
- $ dw^{[l]}: (n^{[l]}, n^{[l-1]})$
- $ db^{[l]}: (n^{[l]}, 1)$
- $ z^{[l]},a^{[l]}: (n^{[l]}, 1)$
Vectorized implementation
- $ w^{[l]}: (n^{[l]}, n^{[l-1]})$
- $ b^{[l]}: (n^{[l]}, 1)$, expanded to $(n^{[l]}, m)$
  - rely on Python (NumPy) broadcasting instead of storing m copies
- $ Z^{[l]}, A^{[l]} : (n^{[l]}, m)$
- $ dZ^{[l]},dA^{[l]} : (n^{[l]}, m)$
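The shapes listed above can be turned into assertions that run before training. A minimal sketch, assuming parameters are stored in a dictionary keyed `W1`, `b1`, ... (a hypothetical layout, as is the helper name `check_dimensions`):

```python
import numpy as np

def check_dimensions(params, layer_dims, m):
    """Assert the shapes listed above for every layer; m is the number of examples."""
    A = np.zeros((layer_dims[0], m))  # stands in for A^[0] = X
    for l in range(1, len(layer_dims)):
        W, b = params[f"W{l}"], params[f"b{l}"]
        assert W.shape == (layer_dims[l], layer_dims[l - 1]), f"W{l} has shape {W.shape}"
        assert b.shape == (layer_dims[l], 1), f"b{l} has shape {b.shape}"
        Z = W @ A + b  # broadcasting stretches b from (n_l, 1) to (n_l, m)
        assert Z.shape == (layer_dims[l], m), f"Z{l} has shape {Z.shape}"
        A = Z  # activations are elementwise, so A^[l] has the same shape as Z^[l]
    return True

layer_dims = [3, 4, 2, 1]  # n^[0], n^[1], n^[2], n^[3]
params = {}
for l in range(1, len(layer_dims)):
    params[f"W{l}"] = np.zeros((layer_dims[l], layer_dims[l - 1]))
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))
ok = check_dimensions(params, layer_dims, m=5)
```

The gradients $dW^{[l]}$ and $db^{[l]}$ must match the shapes of $W^{[l]}$ and $b^{[l]}$, so the same kind of check applies to them.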

Why deep representations?

Intuition about deep representation
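- Build up from simple features to complex ones
- Convolutional networks
- Face recognition
  - From detecting edges, to facial parts, to whole faces
- Speech recognition
  - Rising and falling volume, white noise, frequency
  - Basic units of sound (phonemes, in linguistic terms)
  - Words within the audio
  - Sentences, phrases
- It is risky to assume that deep learning resembles the brain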
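Circuit theory and deep learning
- There are functions that a small L-layer deep neural network (narrow and deep, few hidden units) can compute
- for which a shallow network needs exponentially more hidden units
- XOR (parity) of n input bits x
  - Deep (tree of XOR gates): #layer: O(log n), Node: small
  - Shallow: #layer: 1, Node: O(2^n)
  - In other words, there exist mathematical functions that are much easier to compute with a deep network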
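The XOR example can be illustrated without any neural network at all. The sketch below uses hypothetical helper names: it reduces the bits pairwise, which takes about log2(n) levels, and compares that with the number of odd-parity input patterns, 2^(n-1), which a single hidden layer would need roughly one unit each to enumerate.

```python
def xor_tree(bits):
    """Pairwise XOR reduction; returns (parity, number of levels used)."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element is carried to the next level
        level = nxt
        depth += 1
    return level[0], depth

def odd_parity_patterns(n):
    """Number of n-bit inputs whose XOR is 1."""
    return 2 ** (n - 1)

parity, depth = xor_tree([1, 0, 1, 1, 0, 0, 1, 0])
```

For 8 bits the tree needs 3 levels, while there are 128 odd-parity patterns to enumerate.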
Branding: Deep Learning
- Until the rebranding, these were simply called "neural networks with a lot of hidden layers"

Building blocks of deep neural networks

Forward and backward functions
\[
\begin{array}{ccc}
a^{[l-1]}\rightarrow & \boxed{w^{[l]},b^{[l]}} & \rightarrow a^{[l]}\\
\ & \downarrow \text{cache } z^{[l]}& \ \\
da^{[l-1]}\leftarrow & \boxed{w^{[l]},b^{[l]},dz^{[l]}} & \leftarrow da^{[l]}\\
\ & \downarrow & \ \\
\ & dw^{[l]}, db^{[l]} & \ \\
\end{array}
\]

\[
\require{enclose}
\begin{array}{ccccccccc}

\begin{array}{l} a^{[0]}\\ \color{green}x \end{array} \rightarrow & \boxed{w^{[1]},b^{[1]}} & \rightarrow a^{[1]} \rightarrow & \boxed{w^{[2]},b^{[2]}} & \rightarrow a^{[2]} \rightarrow & \boxed{\dots} & \rightarrow a^{[l-1]} \rightarrow & \boxed{w^{[l]},b^{[l]}} & \rightarrow a^{[l]},\hat y \\

\ & z^{[1]} \downarrow \text{cache } \lbrace \begin{array}{l}w^{[1]} \\b^{[1]}\end{array}& \ & z^{[2]} \downarrow \text{cache } \lbrace \begin{array}{l}w^{[2]} \\b^{[2]}\end{array} & \ & \dots \downarrow \dots & \ & z^{[l]} \downarrow \text{cache } \lbrace \begin{array}{l}w^{[l]} \\b^{[l]}\end{array} & \color{green}\downarrow \\

\enclose{horizontalstrike}{da^{[0]}}\leftarrow & \boxed{w^{[1]},b^{[1]},dz^{[1]}} & \color{red} \leftarrow da^{[1]} \color{red}\leftarrow & \boxed{w^{[2]},b^{[2]},dz^{[2]}} & \color{red} \leftarrow da^{[2]} \color{red}\leftarrow & \boxed{\dots} & \color{red}\leftarrow da^{[l-1]} \color{red}\leftarrow & \boxed{w^{[l]},b^{[l]},dz^{[l]}} & \color{red}\leftarrow da^{[l]} \\

\ & \downarrow \color{green}\downarrow & \ & \downarrow \color{green}\downarrow & \ & \ \color{green}\downarrow & \ & \downarrow \color{green}\downarrow & \ \\

\ & dw^{[1]}, db^{[1]} & \ & dw^{[2]}, db^{[2]} & \ & \dots & \ & dw^{[l]}, db^{[l]} & \ \\

\end{array}
\]

Forward and Backward Propagation

Forward propagation for layer l
- Input $ a^{[l-1]}$
- Output $a^{[l]}$, cache ($z^{[l]}$)
Backward propagation for layer l
- Input $ da^{[l]}$
- Output $da^{[l-1]}, dW^{[l]}, db^{[l]}$
- \[
  \begin{align*}
  dZ^{[l]}&=\overbrace{W^{[l+1]T} dZ^{[l+1]}}^{dA^{[l]}} * g^{[l]\prime}(Z^{[l]})\\
  dW^{[l]}&=\frac{1}{m} dZ^{[l]}A^{[l-1]T}\\
  db^{[l]}&=\frac{1}{m} \text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True}) \\
  dA^{[l-1]}&=W^{[l]T}\cdot dZ^{[l]}
  \end{align*}
  \]
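The backward step for a single layer can be sketched in NumPy as below. It assumes the forward pass cached the tuple `(A_prev, W, b, Z)` for the layer and that the hidden activation is ReLU; the function names and the cache layout are hypothetical choices, not something the notes specify.

```python
import numpy as np

def relu_grad(Z):
    """Derivative of ReLU: 1 where Z > 0, else 0."""
    return (Z > 0).astype(float)

def layer_backward(dA, cache, activation_grad):
    """Given dA^[l] and the layer's cache, return dA^[l-1], dW^[l], db^[l]."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]  # number of examples
    dZ = dA * activation_grad(Z)  # elementwise product with g'(Z)
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

# Hand-checkable example: one unit, one input feature, m = 2 examples
A_prev = np.array([[1.0, 2.0]])
W = np.array([[1.0]])
b = np.array([[0.0]])
Z = W @ A_prev + b
dA = np.array([[1.0, 1.0]])
dA_prev, dW, db = layer_backward(dA, (A_prev, W, b, Z), relu_grad)
```

`keepdims=True` keeps `db` as a column of shape $(n^{[l]}, 1)$, so it matches `b` instead of collapsing to a rank-1 array.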

Parameters vs Hyperparameters

Parameters: $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, W^{[3]}, b^{[3]}\dots $
Hyperparameters:
- Learning rate: $ \alpha $
- #iterations
- #hidden layer: $ L $
- #hidden units: $ n^{[1]}, n^{[2]}\dots $
- choice of activation function
Later course: Momentum, Minibatch, Regularization…
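One way to keep the distinction concrete: hyperparameters are chosen by hand before training, while parameters are created from them and then changed by gradient descent. A minimal sketch with hypothetical names (`hyperparameters`, `initialize_parameters`, `update_parameters`) and purely illustrative values, including the 0.01 initialization scale:

```python
import numpy as np

# Hyperparameters: set by hand before training (illustrative values only)
hyperparameters = {
    "learning_rate": 0.01,       # alpha
    "num_iterations": 1000,
    "layer_dims": [3, 4, 2, 1],  # n^[0], n^[1], ..., n^[L]
}

def initialize_parameters(layer_dims, seed=0):
    """Parameters W^[l], b^[l]: their shapes follow from the hyperparameters."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

def update_parameters(params, grads, learning_rate):
    """One gradient descent step: theta := theta - alpha * d(theta)."""
    return {name: value - learning_rate * grads["d" + name]
            for name, value in params.items()}

params = initialize_parameters(hyperparameters["layer_dims"])
# Dummy gradients of all ones, just to show the update
grads = {"d" + name: np.ones_like(value) for name, value in params.items()}
new_params = update_parameters(params, grads, hyperparameters["learning_rate"])
```

Changing `layer_dims` or `learning_rate` changes what the learned `W` and `b` end up being, which is why these values have to be tuned empirically.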

Applied deep learning is a very empirical process

Repeat {
- Idea
- Code
- Experiment
}

What does this have to do with the brain?

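Deep learning has very little to do with the brain
- Because it is hard to understand, it gets likened to the brain
- The analogy helps capture the public imagination
A single unit loosely resembles a neuron performing a threshold computation
- It partly resembles logistic regression
- How the brain learns is not understood, including whether it uses anything like back propagation or gradient descent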