DL [Course 1/5] Neural Networks and Deep Learning [Week 1,2/4] Introduction to deep learning / Neural Networks Basics

Key Concepts(week1)

Be able to explain how deep learning is applied to supervised learning.
Understand what are the major categories of models (such as CNNs and RNNs), and when they should be applied.
Be able to recognize the basics of when deep learning will (or will not) work well.
Understand the major trends driving the rise of deep learning.

Key Concepts(week2)

Understand how to compute derivatives for logistic regression, using a backpropagation mindset.
Work with iPython Notebooks
Become familiar with Python and Numpy
Implement the main steps of an ML algorithm, including making predictions, derivative computation, and gradient descent.
Build a logistic regression model, structured as a shallow neural network
Be able to implement vectorization across multiple training examples
Implement computationally efficient, highly vectorized, versions of models.

[mathjax]

Course notes for Neural Networks and Deep Learning (deeplearning.ai)

About

Learn how to build and run a deep neural network. Cat classification with numpy.

Week1: Introduction to deep learning

Why DL is taking off
How DL is applied to supervised learning
Major models (CNN, RNN)
What DL is and is not suited for

Week2: Neural Networks Basics

Logistic Regression as a Neural Network

In NN implementations, we want to process the data without explicit for-loops
Forward propagation
Backpropagation
Logistic regression model
Binary classification and notation
- x: image pixel data (e.g. 64px*64px*3 (RGB))
- y: 1: cat, 0: non-cat
- n (= n_x): dimension of the input feature vector
- (x,y): $$x\in\Bbb R^{n_x}, y\in\{0,1\}$$
- m (= m train) training examples: $$\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$$
- m test: number of test examples
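- Matrix of all training examples, one example per column: $$X=\begin{bmatrix}
  \vert & \vert & & \vert \cr
  x^{(1)} & x^{(2)} & \dots & x^{(m)}\cr
  \vert & \vert & & \vert
  \end{bmatrix}$$
- Alternative: one training example per row (the transpose of X)
  - Not used here because it makes the implementation harder
- Python: $$X.shape =(n_x,m)$$
  - Confirms that X is an n_x * m matrix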
- Labels: $$Y=\begin{bmatrix}
  y^{(1)} & y^{(2)} & \dots & y^{(m)}
  \end{bmatrix}$$
- $$Y\in\Bbb R^{1\times m}$$
- Python: $$Y.shape=(1,m)$$
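The notation above can be sketched in numpy as follows. This is only an illustration: the images are random synthetic data, and the sizes (`m`, `num_px`) and the names `images` and `labels` are assumptions, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 tiny "images" of 8x8 pixels with 3 colour channels.
m, num_px = 4, 8
images = rng.integers(0, 256, size=(m, num_px, num_px, 3))
labels = np.array([1, 0, 0, 1])  # 1: cat, 0: non-cat

# Flatten each image into a vector and stack the vectors as columns.
X = images.reshape(m, -1).T   # shape (n_x, m)
Y = labels.reshape(1, m)      # shape (1, m)

n_x = num_px * num_px * 3
assert X.shape == (n_x, m)
assert Y.shape == (1, m)
```

Column i of `X` is the flattened i-th image, which matches the one-example-per-column convention.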
Logistic Regression
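- Prediction for y: $$\hat y=P(y=1|x)$$
  - $$0\leq\hat y\leq1$$
- Since $x\in\Bbb R^{n_x}$:
- Parameters: $$w\in\Bbb R^{n_x}$$
  $$b\in\Bbb R$$
- Naive output: $\hat y=w^{\mathrm{T}}x+b$
  - This is linear regression
  - But it can be negative or much larger than 1, so it makes no sense as a probability
  - So logistic regression applies the sigmoid: $\hat y=\sigma(w^{\mathrm{T}}x+b)$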
sigmoid
- $$\sigma(z)=\frac{1}{1+e^{-z}}$$
- z is large and positive: $$\sigma(z)\approx\frac{1}{1+0}=1$$
- z is large and negative: $$\sigma(z)=\frac{1}{1+\text{Bignum}}\approx 0$$
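A minimal numpy sketch of the sigmoid and its limiting behaviour; the function name `sigmoid` and the sample inputs are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, 0.0, 10.0])
a = sigmoid(z)
# a[0] is close to 0, a[1] is 0.5, a[2] is close to 1
```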
Logistic Regression cost function
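- y output: $$\hat y^{(i)}=\sigma(w^{\mathrm{T}}x^{(i)}+b), \text{ where } \sigma(z^{(i)})=\frac{1}{1+e^{-z^{(i)}}} \text{ and } z^{(i)}=w^{\mathrm{T}}x^{(i)}+b$$
- Given: $$\{(x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})\}, \text{ want } \hat y^{(i)}\approx y^{(i)}$$
- Loss (error) function: $$L(\hat y, y)$$
- Squared error is a poor fit for logistic regression because it makes the optimization problem non-convex: $$L(\hat y, y)=\frac{1}{2}(\hat y-y)^2$$
- A suitable loss function: $$L(\hat y, y)=-(y \log \hat y + (1-y)\log(1-\hat y))$$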
- If y=1: $$L(\hat y, y)=-\log \hat y $$ $$\gets \text{ Want } \log \hat y \text{ large, Want }\hat y\text{ large }$$
- If y=0: $$L(\hat y, y)=-\log(1-\hat y)$$ $$ \gets \text{ Want }\log(1-\hat y)\text{ large }\dots\text{ Want }\hat y\text{ small }$$
- Cost function: $$J(w,b)=\frac{1}{m}\sum_{i=1}^{m} L(\hat y^{(i)},y^{(i)})$$ $$=-\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}\log \hat y^{(i)} + (1-y^{(i)}) \log (1- \hat y^{(i)}))$$
- Loss function: applied to a single training example
- Cost function: the cost of the parameters (w, b), averaged over the entire training set
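The difference between the per-example loss and the cost can be sketched as below. The helper names `loss` and `cost` and the example predictions are assumptions made for illustration.

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for each example (element-wise)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(y_hat, y):
    """Cost J: the mean of the per-example losses."""
    return np.mean(loss(y_hat, y))

y = np.array([[1, 0, 1]])            # true labels, shape (1, m)
good = np.array([[0.9, 0.1, 0.8]])   # confident and correct predictions
bad = np.array([[0.1, 0.9, 0.2]])    # confident and wrong predictions

J_good = cost(good, y)
J_bad = cost(bad, y)
```

Predictions close to the labels give a small cost, and confident wrong predictions are penalized heavily.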
gradient descent
- $$J(w,b)$$
- $$w := w-\alpha\frac{dJ(w,b)}{dw}$$
- $$b := b-\alpha\frac{dJ(w,b)}{db}$$
- Strictly speaking these are partial derivatives, but the notes do not bother with the partial derivative symbol: $$\frac{\partial J(w,b)}{\partial w}$$
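The update rule can be tried on a one-parameter toy problem. The cost J(w) = (w - 3)^2 below is not from the course; it is an assumption chosen because its minimum (w = 3) is known, and the learning rate is an arbitrary illustrative value.

```python
# Toy cost J(w) = (w - 3)^2, chosen only for illustration; dJ/dw = 2 (w - 3).
def dJ_dw(w):
    return 2.0 * (w - 3.0)

alpha = 0.1   # learning rate (hypothetical value)
w = 0.0       # initial guess

for _ in range(100):
    w = w - alpha * dJ_dw(w)   # w := w - alpha * dJ/dw
```

Each step moves `w` against the slope, so it converges toward the minimizer of the toy cost.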
computation graph
- Stages
  - forward pass, or forward propagation step: computes the output
  - backward pass, or back propagation step: computes the derivatives
Logistic Regression Gradient Descent
- Definitions: \[
  z=w^\mathrm{T}x+b \\
  \hat y=a=\sigma(z) \\
  L(a,y)=-(y\log(a)+(1-y)\log(1-a))
  \]
Logistic regression derivatives
- Consider only two features, with weights w1 and w2:\[
  z=w_1x_1+w_2x_2+b\rightarrow a=\sigma(z)\rightarrow L(a,y)
  \]
- Backward pass:\[
  \begin{align*}
  \text{"da"}&=\frac{dL(a,y)}{da}=-\frac{y}{a}+\frac{1-y}{1-a} \\
  \text{"dz"}&=\frac{dL(a,y)}{dz}\\
  &=\frac{dL}{da}\frac{da}{dz}\\
  &=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)a(1-a)\\
  &=a-y
  \end{align*}
  \]
- Reference, derivative of the sigmoid:\[
  \sigma(z)=\frac{1}{1+e^{-z}}=(1+e^{-z})^{-1}\\
  \frac{d\sigma(z)}{dz}=(1-\sigma(z))\sigma(z)
  \]
  - https://mathtrain.jp/sigmoid
  - In DL this result is simply plugged into the chain rule, so the derivation can be skipped
- Therefore:\[
  \frac{\partial L}{\partial w_1}=\text{"}dw_1\text{"}=x_1dz\\
  \text{"}dw_2\text{"}=x_2dz\\
  \text{"db"}=dz
  \]
- update:\[
  w_1:=w_1-\alpha \cdot dw_1\\
  w_2:=w_2-\alpha \cdot dw_2\\
  b:=b-\alpha \cdot db
  \]
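The derivatives and the update above can be checked numerically for one training example. The input values, initial parameters, and learning rate below are hypothetical, and the finite-difference check is an addition for illustration rather than part of the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w1, w2, b, x1, x2, y):
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Hypothetical values for one training example and the parameters.
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.5, -0.3, 0.1

# Backward pass using the formulas above.
a = sigmoid(w1 * x1 + w2 * x2 + b)
dz = a - y
dw1, dw2, db = x1 * dz, x2 * dz, dz

# Centred finite difference as a sanity check on dw1.
eps = 1e-6
dw1_numeric = (loss(w1 + eps, w2, b, x1, x2, y)
               - loss(w1 - eps, w2, b, x1, x2, y)) / (2 * eps)

# One gradient descent step.
alpha = 0.1
w1_new, w2_new, b_new = w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db
```

The analytic `dw1` agrees with the numerical estimate, and the loss on this example is lower after the step.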
Logistic regression on m examples
- Vectorization means getting rid of explicit for-loops. A naive implementation needs two:
  - a loop over the m training examples
  - a loop over the components of w
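For reference, a non-vectorized sketch with both loops written out, so it is clear what vectorization removes. The data are random and the sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, m = 3, 5                       # hypothetical sizes
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m))
w = np.zeros((n_x, 1))
b = 0.0

J, dw, db = 0.0, np.zeros((n_x, 1)), 0.0
for i in range(m):                  # loop over training examples
    z = b
    for j in range(n_x):            # loop over components of w
        z += w[j, 0] * X[j, i]
    a = sigmoid(z)
    J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
    dz = a - Y[0, i]
    for j in range(n_x):            # loop over components of w again
        dw[j, 0] += X[j, i] * dz
    db += dz
J, dw, db = J / m, dw / m, db / m   # averages over the m examples
```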

Python and Vectorization

Vectorization
- Avoid for-loops
- Compute w^T x directly with np.dot
- CPU vs GPU
  - Both have SIMD (Single Instruction Multiple Data) instructions
  - CPUs are not that bad at it either
- Neural network programming guideline
  - Whenever possible, avoid explicit for-loops.
- Vectors and matrix valued functions
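A small sketch comparing an explicit loop with `np.dot` for computing w^T x. The vector length is arbitrary and the data are random; both versions produce the same value, but the vectorized one runs as a single call into optimized native code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                          # hypothetical vector length
w = rng.standard_normal((n, 1))
x = rng.standard_normal((n, 1))

# Explicit for-loop version.
z_loop = 0.0
for i in range(n):
    z_loop += w[i, 0] * x[i, 0]

# Vectorized version: w^T x has shape (1, 1), so extract the scalar.
z_vec = np.dot(w.T, x).item()
```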
Vectorizing Logistic Regression
- $$
  \begin{align*}
  Z&=\begin{bmatrix}z^{(1)} & z^{(2)} & \dots & z^{(m)}\end{bmatrix}\\
  &=w^\mathrm{T}X+\begin{bmatrix}b & b & \dots & b\end{bmatrix}\\
  &=\begin{bmatrix}
  w^\mathrm{T}x^{(1)}+b & w^\mathrm{T}x^{(2)}+b & \dots & w^\mathrm{T}x^{(m)}+b \end{bmatrix}
  \end{align*}$$
- In Python (the scalar b is automatically broadcast to a 1 x m vector): $$Z=np.dot(w.T,X)+b$$
Vectorizing Logistic Regression’s Gradient Output
- $$dz^{(i)}=a^{(i)}-y^{(i)}, \quad i=1,\dots,m$$
- $$dZ=\begin{bmatrix} dz^{(1)} & dz^{(2)} & \dots & dz^{(m)}\end{bmatrix}$$
- $$A=\begin{bmatrix} a^{(1)} & a^{(2)} & \dots & a^{(m)}\end{bmatrix}$$
- $$Y=\begin{bmatrix} y^{(1)} & y^{(2)} & \dots & y^{(m)}\end{bmatrix}$$
- $$db=\frac{1}{m}np.sum(dZ)$$
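- $$dw=\frac{1}{m}XdZ^\mathrm{T}$$
- One iteration of gradient descent with no for-loop over the examples (repeating it still needs an outer for-loop over iterations): \[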
  \begin{align*}
  Z&=w^\mathrm{T}X+b\\
  &=np.dot(w^\mathrm{T},X)+b\\
  A&=\sigma(Z)\\
  dZ&=A-Y\\
  dw&=\frac{1}{m}XdZ^\mathrm{T}\\
  db&=\frac{1}{m}np.sum(dZ)\\
  w&:=w-\alpha dw\\
  b&:=b-\alpha db
  \end{align*}
  \]
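The iteration above translates almost line by line into numpy. The sketch below runs it on synthetic data; the sizes, the labelling rule, the learning rate, and the number of iterations are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, m = 3, 200                          # hypothetical sizes
X = rng.standard_normal((n_x, m))
Y = (X[0:1, :] > 0).astype(float)        # toy labels: sign of feature 0
w, b, alpha = np.zeros((n_x, 1)), 0.0, 0.5

costs = []
for _ in range(100):                     # loop over iterations only
    Z = np.dot(w.T, X) + b               # (1, m); b is broadcast
    A = sigmoid(Z)
    costs.append(float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))))
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m             # (n_x, 1)
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db
```

With zero-initialized parameters the first cost is log 2, and it decreases as the iterations proceed.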
Broadcasting in Python
- (m,n) {+, -, *, /} (1,n) -> (m,n): the (1,n) operand is copied across the m rows
- Scalars are expanded too
- The Matlab equivalent is bsxfun
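A few broadcasting cases, sketched with small made-up arrays:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # shape (2, 3)
row = np.array([[10.0, 20.0, 30.0]])     # shape (1, 3)
col = np.array([[100.0], [200.0]])       # shape (2, 1)

B = A + row     # row is repeated for each of the 2 rows
C = A + col     # col is repeated for each of the 3 columns
D = A * 2.0     # the scalar is expanded to shape (2, 3)

# Column-wise percentages: divide each column by its sum, shape (1, 3).
P = 100 * A / A.sum(axis=0, keepdims=True)
```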
A note on python/numpy vectors
- Do not use rank-1 arrays, i.e. shapes such as (5,) or (n,)
  - a = np.random.randn(5)
  - a.shape: (5,) : rank-1 array, not good!
- a = np.random.randn(5,1): column vector, OK
- a = np.random.randn(1,5): row vector, OK
- assert(a.shape == (5,1)): use assertions freely to check shapes
- a.reshape((5,1)): reshape when the shape is in doubt
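A short sketch of why rank-1 arrays are easy to misuse, compared with explicit column vectors:

```python
import numpy as np

a = np.random.randn(5)             # rank-1 array, shape (5,)
# Transposing a rank-1 array does nothing, and np.dot(a, a.T) is a scalar.
assert a.T.shape == (5,)
assert np.ndim(np.dot(a, a.T)) == 0

col = np.random.randn(5, 1)        # column vector, shape (5, 1)
outer = np.dot(col, col.T)         # outer product, shape (5, 5)
inner = np.dot(col.T, col)         # inner product, shape (1, 1)

fixed = a.reshape((5, 1))          # turn the rank-1 array into a column vector
assert fixed.shape == (5, 1)
```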
Quick tour of Jupyter /iPython Notebooks
- How to use them
Explanation of logistic regression cost function(optional)
- Why the loss function L is a reasonable choice:\[
  \text{If } y=1: p(y|x) = \hat y\\
  \text{If } y=0: p(y|x) = 1 - \hat y\\
  p(y|x) = \hat y^y(1-\hat y)^{(1-y)}
  \]
- $\log$ is monotonically increasing, so maximizing $p(y|x)$ is the same as maximizing $\log p(y|x)$:\[
  \begin{align*}
  \log p(y|x) &= \log \hat y^y(1-\hat y)^{(1-y)}\\
  &=y\log \hat y + (1-y)\log(1-\hat y)\\
  &=-L(\hat y,y)
  \end{align*}
  \]
- Cost on m examples, assuming the examples are i.i.d.:\[
  \log p(\boxed{\text{labels in training set}}) = \log \prod_{i=1}^{m} p(y^{(i)}|x^{(i)})\\
  =\sum_{i=1}^{m}\log p(y^{(i)}|x^{(i)})\\
  =-\sum_{i=1}^{m}L(\hat y^{(i)},y^{(i)})
  \]
- Cost (to minimize):\[
  J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(\hat y^{(i)},y^{(i)})
  \]

Programming Assignments

Python basics with numpy(optional)

how to use numpy
- basic core DL functions such as softmax,sigmoid,dsigmoid
- vectorization
- broadcasting
Building basic functions with numpy. What you need to remember:
- np.exp(x) works for any np.array x and applies the exponential function to every coordinate
- the sigmoid function and its gradient
- image2vector is commonly used in deep learning
- np.reshape is widely used. In the future, you’ll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs.
- numpy has efficient built-in functions
- broadcasting is extremely useful
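The basic functions listed above might look like the sketch below. The exact signatures in the assignment are not given in these notes, so the ones here are assumptions; the row-wise softmax and its max-shift for numerical stability are also choices made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Gradient of the sigmoid: s * (1 - s), with s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1 - s)

def image2vector(image):
    """Flatten an array of shape (length, height, depth) into a column vector."""
    return image.reshape(-1, 1)

def softmax(x):
    """Row-wise softmax of a 2-D array; keepdims makes broadcasting line up."""
    x_exp = np.exp(x - np.max(x, axis=1, keepdims=True))  # shift for stability
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

ds = sigmoid_derivative(np.array([0.0, 2.0]))
v = image2vector(np.zeros((3, 3, 2)))
p = softmax(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
```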
Vectorization. What to remember:
- Vectorization is very important in deep learning. It provides computational efficiency and clarity.
- You have reviewed the L1 and L2 loss.
- You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc…
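As a reminder of the two losses mentioned above, here is a vectorized sketch. The function names and the sample vectors are assumptions made for illustration.

```python
import numpy as np

def l1_loss(y_hat, y):
    """Sum of absolute differences."""
    return np.sum(np.abs(y - y_hat))

def l2_loss(y_hat, y):
    """Sum of squared differences, written as a dot product."""
    diff = y - y_hat
    return np.dot(diff, diff)

y_hat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])   # predictions
y = np.array([1.0, 0.0, 0.0, 1.0, 1.0])       # true labels
l1 = l1_loss(y_hat, y)
l2 = l2_loss(y_hat, y)
```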

Logistic Regression with a Neural Network mindset(required)

Learn about:
- Work with logistic regression in a way that builds intuition relevant to neural networks.
- Learn how to minimize the cost function.
- Understand how derivatives of the cost are used to update parameters.
Common steps for pre-processing a new dataset are:
- Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, …)
- Reshape the datasets such that each example is now a vector of size (num_px * num_px * 3, 1)
- “Standardize” the data
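The three pre-processing steps can be sketched as follows. The array `train_set_x_orig` is a random stand-in for a loaded image dataset and its name is hypothetical. Dividing by 255 is one simple scaling choice for 8-bit images and is an assumption here, since the notes only say "standardize".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a loaded dataset: shape (m, num_px, num_px, 3).
m_train, num_px = 6, 4
train_set_x_orig = rng.integers(0, 256, size=(m_train, num_px, num_px, 3),
                                dtype=np.uint8)

# 1. Figure out the dimensions.
assert train_set_x_orig.shape == (m_train, num_px, num_px, 3)

# 2. Flatten each image into a column of length num_px * num_px * 3.
train_set_x_flatten = train_set_x_orig.reshape(m_train, -1).T

# 3. "Standardize": for 8-bit images, dividing by 255 scales pixels to [0, 1].
train_set_x = train_set_x_flatten / 255.0
```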
Key steps: In this exercise, you will carry out the following steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude

The main steps for building a Neural Network are:
1. Define the model structure (such as number of input features)
2. Initialize the model’s parameters
3. Loop:
  - Calculate current loss (forward propagation)
  - Calculate current gradient (backward propagation)
  - Update parameters (gradient descent)
4. You often build 1-3 separately and integrate them into one function we call model().
What to remember: You’ve implemented several functions that:
- Initialize (w,b)
- Optimize the loss iteratively to learn parameters (w,b):
  - computing the cost and its gradient
  - updating the parameters using gradient descent
- Use the learned (w,b) to predict the labels for a given set of examples
What to remember from this assignment:
- Preprocessing the dataset is important.
- You implemented each function separately: initialize(), propagate(), optimize(). Then you built a model().
- Tuning the learning rate (which is an example of a “hyperparameter”) can make a big difference to the algorithm. You will see more examples of this later in this course!
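Putting the pieces together, a compact sketch of the initialize / propagate / optimize / predict / model structure could look like this. The function names follow the list above, but the signatures, the 0.5 decision threshold, the default hyperparameters, and the toy dataset are assumptions, not the assignment's actual code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize(n_x):
    """Zero-initialize w (n_x, 1) and b."""
    return np.zeros((n_x, 1)), 0.0

def propagate(w, b, X, Y):
    """Forward pass (cost) and backward pass (gradients)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    dZ = A - Y
    return cost, np.dot(X, dZ.T) / m, np.sum(dZ) / m

def optimize(w, b, X, Y, num_iterations, learning_rate):
    """Plain gradient descent; returns the parameters and the cost history."""
    costs = []
    for _ in range(num_iterations):
        cost, dw, db = propagate(w, b, X, Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
        costs.append(cost)
    return w, b, costs

def predict(w, b, X):
    """Label an example 1 when the predicted probability exceeds 0.5."""
    return (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(int)

def model(X_train, Y_train, X_test, num_iterations=200, learning_rate=0.1):
    w, b = initialize(X_train.shape[0])
    w, b, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    return {"w": w, "b": b, "costs": costs,
            "Y_prediction_test": predict(w, b, X_test)}

# Toy data (not the cat dataset): the label is 1 when the features sum to > 0.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((2, 200))
Y_train = (X_train.sum(axis=0, keepdims=True) > 0).astype(int)
X_test = rng.standard_normal((2, 50))
Y_test = (X_test.sum(axis=0, keepdims=True) > 0).astype(int)

result = model(X_train, Y_train, X_test)
test_accuracy = np.mean(result["Y_prediction_test"] == Y_test)
```

Changing `learning_rate` in the call to `model()` is a quick way to see how much this hyperparameter affects the cost history.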
