DL [Course 1/5] Neural Networks and Deep Learning [Week 1,2/4] Introduction to deep learning / Neural Networks Basics

Key Concepts (week 1)

  • Be able to explain how deep learning is applied to supervised learning.
  • Understand the major categories of models (such as CNNs and RNNs), and when each should be applied.
  • Be able to recognize the basics of when deep learning will (or will not) work well.
  • Understand the major trends driving the rise of deep learning.

Key Concepts (week 2)

  • Understand how to compute derivatives for logistic regression, using a backpropagation mindset.
  • Work with iPython Notebooks
  • Become familiar with Python and Numpy
  • Implement the main steps of an ML algorithm, including making predictions, derivative computation, and gradient descent.
  • Build a logistic regression model, structured as a shallow neural network
  • Be able to implement vectorization across multiple training examples
  • Implement computationally efficient, highly vectorized, versions of models.

[mathjax]

Notes from taking Neural Networks and Deep Learning (deeplearning.ai)

About

You learn how to build and get deep neural networks working. Cat classification with numpy.

Week1: Introduction to deep learning

  • Why deep learning is taking off now
  • How deep learning is applied to supervised learning
  • The major model types (CNN, RNN)
  • What deep learning is and is not well suited for

Week2: Neural Networks Basics

Logistic Regression as a Neural Network

  • In a NN implementation we want to avoid explicit for-loops
  • Forward propagation
  • Backpropagation
  • Logistic regression model
  • Binary classification and notation
    • x: image pixel data (e.g., 64px * 64px * 3 (RGB))
    • y: 1 = cat, 0 = non-cat
    • n_x: input feature dimension
    • (x,y): $$x\in\Bbb R^{n_x},\ y\in\{0,1\}$$
    • m training examples: $$\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$$
    • m_test: number of test examples
    • Matrix of all training examples, one example per column: $$X=\begin{bmatrix}
      \vert & \vert & & \vert \cr
      x^{(1)} & x^{(2)} & \dots & x^{(m)}\cr
      \vert & \vert & & \vert \cr
      \end{bmatrix}$$
    • Putting one training example per row instead would give the transpose
      • Not used here because it makes the implementation harder
    • Python: $$X.shape =(n_x,m)$$
      • Check that X is an $$n_x \times m$$ matrix
    • Labels: $$Y=\begin{bmatrix}
      y^{(1)} & y^{(2)} & \dots & y^{(m)}
      \end{bmatrix}$$
    • $$Y\in\Bbb R^{1\times m}$$
    • Python: $$Y.shape=(1,m)$$
  • Logistic Regression
    • yの予想: $$\hat y=P(y=1|x)$$
      • $$0\leq\hat y\leq1$$
    • Since \(x\in\Bbb R^{n_x}\):
    • Parameters: $$w\in\Bbb R^{n_x}$$
      $$b\in\Bbb R$$
    • Output: \(\hat y=w^{\mathrm{T}}x+b\)
      • That would be linear regression
      • But it can be negative or greater than 1, so it makes no sense as a probability
      • So use logistic regression = pass it through a sigmoid: \(\hat y=\sigma(w^{\mathrm{T}}x+b)\)
  • sigmoid
    • $$\sigma(z)=\frac{1}{1+e^{-z}}$$
    • z large and positive: $$\sigma(z)\approx\frac{1}{1+0}=1$$
    • z a large negative number: $$\sigma(z)=\frac{1}{1+\text{(big number)}}\approx 0$$
  • Logistic Regression cost function
    • y output: $$\hat y^{(i)}=\sigma(w^{\mathrm{T}}x^{(i)}+b),\ \text{where } \sigma(z^{(i)})=\frac{1}{1+e^{-z^{(i)}}}$$
    • Given: $$\{(x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})\},\ \text{want } \hat y^{(i)}\approx y^{(i)}$$
    • Loss(error) function: $$L(\hat y, y)$$
    • The squared error $$\frac{1}{2}(\hat y-y)^2$$ is not suitable for logistic regression (it makes the optimization non-convex)
    • A loss function that does work: $$L(\hat y,y)=-(y \log \hat y + (1-y)\log(1-\hat y))$$
    • If y=1: $$L(\hat y, y)=-\log \hat y $$ $$\gets \text{ Want } \log \hat y \text{ large, Want }\hat y\text{ large }$$
    • If y=0: $$L(\hat y, y)=-\log(1-\hat y)$$ $$ \gets \text{ Want }\log(1-\hat y)\text{ large }\dots\text{ Want }\hat y\text{ small }$$
    • Cost function: $$J(w,b)=\frac{1}{m}\sum_{i=1}^{m} L(\hat y^{(i)},y^{(i)})$$ $$=-\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}\log \hat y^{(i)} + (1-y^{(i)}) \log (1- \hat y^{(i)}))$$
    • Loss function: applies to a single training example
    • Cost function: the cost of the parameters over the entire training set
  • gradient descent
    • $$J(w,b)$$
    • $$w := w-\alpha\frac{dJ(w,b)}{dw}$$
    • $$b := b-\alpha\frac{dJ(w,b)}{db}$$
    • Strictly these should be written with the partial-derivative symbol, but don't worry about it: $$\frac{\partial J(w,b)}{\partial w}$$
  • computation graph
    • Built in two passes
      • forward pass, or forward propagation step
      • backward pass, or backpropagation step
  • Logistic Regression Gradient Descent
    • Definitions: \[
      z=w^\mathrm{T}x+b \\
      \hat y=a=\sigma(z) \\
      L(a,y)=-(y\log(a)+(1-y)\log(1-a)) \\
      \]
  • Logistic regression derivatives
    • Considering only w1 and w2: \[
      z=w_1x_1+w_2x_2+b\rightarrow a=\sigma(z)\rightarrow L(a,y)
      \]
    • backward\[
      \begin{align*}
      “da”&=\frac{dL(a,y)}{da}=-\frac{y}{a}+\frac{1-y}{1-a} \\
      “dz”&=\frac{dL}{dz}=\frac{dL(a,y)}{dz}\\
      &=\frac{dL}{da}\frac{da}{dz}\\
      &=(-\frac{y}{a}+\frac{1-y}{1-a})(a(1-a))\\
      &=a-y
      \end{align*}
      \]
    • Reference, the sigmoid derivative:\[
      \sigma(z)=\frac{1}{1+e^{-z}}=(1+e^{-z})^{-1}\\
      \frac{d\sigma(z)}{dz}=(1-\sigma(z))\sigma(z)
      \]
    • Therefore:\[
      \frac{\partial L}{\partial w_1}=“dw_1”=x_1\,dz\\
      “dw_2”=x_2\,dz\\
      “db”=dz
      \]
    • update:\[
      w_1:=w_1-\alpha \cdot dw_1\\
      w_2:=w_2-\alpha \cdot dw_2\\
      b:=b-\alpha \cdot db
      \]
  • Logistic regression on m examples
    • vectorization is getting rid of for-loops (see the Python sketch after this list)
      • the loop over the m training examples
      • the loop over the components of w
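
As a reference for the loop-based version above, here is a minimal numpy sketch of one gradient-descent step for logistic regression written with the two explicit for-loops (over the m training examples and over the components of w). The function and variable names (gradient_step_with_loops, X, y, w, b, alpha) are illustrative, not taken from the course code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_step_with_loops(X, y, w, b, alpha):
    """One gradient-descent step for logistic regression, with explicit loops.
    X: (n_x, m) training examples, one per column; y: (m,) labels in {0, 1}."""
    n_x, m = X.shape
    J = 0.0
    dw = np.zeros(n_x)
    db = 0.0
    for i in range(m):                       # loop over the m training examples
        z_i = b
        for j in range(n_x):                 # loop over the components of w
            z_i += w[j] * X[j, i]
        a_i = sigmoid(z_i)
        J += -(y[i] * np.log(a_i) + (1 - y[i]) * np.log(1 - a_i))
        dz_i = a_i - y[i]                    # "dz" = a - y
        for j in range(n_x):
            dw[j] += X[j, i] * dz_i          # "dw_j" accumulates x_j * dz
        db += dz_i                           # "db" accumulates dz
    J, dw, db = J / m, dw / m, db / m        # average over the m examples
    w = w - alpha * dw                       # w := w - alpha * dw
    b = b - alpha * db                       # b := b - alpha * db
    return w, b, J
```

The nested loops are exactly what vectorization removes in the next section.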

Python and Vectorization

  • Vectorization
    • Don't use for-loops
    • Compute w^T x directly with np.dot
    • CPU vs GPU
      • Both have SIMD (Single Instruction, Multiple Data) instructions
      • The CPU is not that bad either
    • Neural network programming guideline
      • Whenever possible, avoid explicit for-loops.
    • Vectors and matrix valued functions
  • Vectorizing Logistic Regression
    • $$
      \begin{align*}
      Z&=\begin{bmatrix}z^{(1)} & z^{(2)} & \dots & z^{(m)}\end{bmatrix}\\
      &=w^\mathrm{T}X+\begin{bmatrix}b & b & \dots & b\end{bmatrix}\\
      &=\begin{bmatrix}
      w^\mathrm{T}x^{(1)}+b & w^\mathrm{T}x^{(2)}+b & \dots & w^\mathrm{T}x^{(m)}+b \end{bmatrix}
      \end{align*}$$
    • In Python (the scalar b is automatically broadcast to a 1×m vector): $$Z=np.dot(w^\mathrm{T},X)+b$$
  • Vectorizing Logistic Regression’s Gradient Output
    • $$dz^{(i)}=a^{(i)}-y^{(i)},\ \dots$$
    • $$dZ=\begin{bmatrix} dz^{(1)} & dz^{(2)} & \dots & dz^{(m)}\end{bmatrix}$$
    • $$A=\begin{bmatrix} a^{(1)} & a^{(2)} & \dots & a^{(m)}\end{bmatrix}$$
    • $$Y=\begin{bmatrix} y^{(1)} & y^{(2)} & \dots & y^{(m)}\end{bmatrix}$$
    • $$db=\frac{1}{m}np.sum(dZ)$$
    • $$dw=\frac{1}{m}X\,dZ^\mathrm{T}$$
    • One iteration of gradient descent, fully vectorized (an outer for-loop over iterations is still needed; see the sketch after this list): \[
      \begin{align*}
      Z&=w^\mathrm{T}X+b\\
      &=np.dot(w^\mathrm{T},X)+b\\
      A&=\sigma(Z)\\
      dZ&=A-Y\\
      dw&=\frac{1}{m}XdZ^\mathrm{T}\\
      db&=\frac{1}{m}np.sum(dZ)\\
      w&:=w-\alpha dw\\
      b&:=b-\alpha db
      \end{align*}
      \]
  • Broadcasting in Python
    • (m,n) {+ - * /} (1,n) -> (m,n)
    • Scalars are broadcast as well
    • Like bsxfun in Matlab
  • A note on python/numpy vectors
    • Avoid rank-1 arrays, i.e., structures with shape (5,) or (n,)
      • a = np.random.randn(5)
      • a.shape is (5,): a rank-1 array, don't use it
    • a = np.random.randn(5,1)  # column vector, OK
    • a = np.random.randn(1,5)  # row vector, OK
    • assert(a.shape == (5,1))  # assert the shape you expect
    • a.reshape((5,1))  # reshape rank-1 arrays into explicit column vectors
  • Quick tour of Jupyter / iPython Notebooks
    • How to use them
  • Explanation of logistic regression cost function (optional)
    • Why the loss function L is a sensible choice:\[
      \text{If } y=1: p(y|x) = \hat y\\
      \text{If } y=0: p(y|x) = 1 - \hat y\\
      p(y|x) = \hat y^y(1-\hat y)^{(1-y)}
      \]
    • Since \(\log\) is monotonically increasing:\[
      \begin{align*}
      \log p(y|x) &= \log \hat y^y(1-\hat y)^{(1-y)}\\
      &=y\log \hat y + (1-y)\log(1-\hat y)\\
      &=-L(\hat y,y)
      \end{align*}
      \]
    • Cost on m examples:\[
      \log p(\boxed{\text{labels in training set}}) = \log \prod_{i=1}^{m} p(y^{(i)}|x^{(i)})\\
      =\sum_{i=1}^{m}\log p(y^{(i)}|x^{(i)})\\
      =-\sum_{i=1}^{m}L(\hat y^{(i)},y^{(i)})
      \]
    • Cost to minimize (maximizing the likelihood corresponds to minimizing the sum of losses, scaled by 1/m):\[
      J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(\hat y^{(i)},y^{(i)})
      \]
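
To complement the vectorized equations above, here is a minimal numpy sketch of the fully vectorized training loop, with shape assertions guarding against rank-1 arrays as recommended. The function name train_logistic_regression and its arguments are illustrative, not the course notebook's API.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.01, num_iterations=1000):
    """Vectorized logistic regression training.
    X: (n_x, m) examples, one per column; Y: (1, m) labels in {0, 1}."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))               # explicit column vector, not shape (n_x,)
    b = 0.0
    assert w.shape == (n_x, 1)
    assert Y.shape == (1, m)
    for _ in range(num_iterations):      # only the loop over iterations remains
        Z = np.dot(w.T, X) + b           # (1, m); the scalar b is broadcast
        A = sigmoid(Z)                   # (1, m)
        dZ = A - Y                       # (1, m)
        dw = np.dot(X, dZ.T) / m         # (n_x, 1), i.e. dw = (1/m) X dZ^T
        db = np.sum(dZ) / m              # scalar
        w = w - alpha * dw               # gradient-descent update
        b = b - alpha * db
    return w, b
```

np.dot dispatches to optimized (SIMD) routines, which is where the speedup over the explicit loops comes from.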

Programming Assignments

Python basics with numpy (optional)

  • how to use numpy
    • basic core DL functions such as softmax, sigmoid, dsigmoid (sketched after this list)
    • vectorization
    • broadcasting
  • Building basic functions with numpy. What you need to remember:
    • np.exp(x) works for any np.array x and applies the exponential function to every coordinate
    • the sigmoid function and its gradient
    • image2vector is commonly used in deep learning
    • np.reshape is widely used. In the future, you’ll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs.
    • numpy has efficient built-in functions
    • broadcasting is extremely useful
  • Vectorization. What to remember:
    • Vectorization is very important in deep learning. It provides computational efficiency and clarity.
    • You have reviewed the L1 and L2 loss.
    • You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc…
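
The assignment's building blocks look roughly like the sketch below. The function names mirror the ones mentioned above (sigmoid, its derivative "dsigmoid", image2vector, softmax), but the bodies are simplified illustrations rather than the graded solutions.

```python
import numpy as np

def sigmoid(x):
    """Elementwise sigmoid; np.exp works on any np.array."""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Gradient of the sigmoid ("dsigmoid"): s * (1 - s)."""
    s = sigmoid(x)
    return s * (1 - s)

def image2vector(image):
    """Flatten an image of shape (length, height, depth) into a
    column vector of shape (length * height * depth, 1)."""
    return image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)

def softmax(x):
    """Row-wise softmax of an (n, m) array; the division uses broadcasting."""
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)   # shape (n, 1)
    return x_exp / x_sum                           # (n, m) / (n, 1) -> (n, m)
```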

Logistic Regression with a Neural Network mindset (required)

  • Learn about:
    • Work with logistic regression in a way that builds intuition relevant to neural networks.
    • Learn how to minimize the cost function.
    • Understand how derivatives of the cost are used to update parameters.
  • Common steps for pre-processing a new dataset are:
    • Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, …)
    • Reshape the datasets such that each example is now a vector of size (num_px * num_px * 3, 1)
    • “Standardize” the data
  • Key steps: In this exercise, you will carry out the following steps:
    • Initialize the parameters of the model
    • Learn the parameters for the model by minimizing the cost
    • Use the learned parameters to make predictions (on the test set)
    • Analyse the results and conclude
  • The main steps for building a Neural Network are:
    1. Define the model structure (such as number of input features)
    2. Initialize the model’s parameters
    3. Loop:
      • Calculate current loss (forward propagation)
      • Calculate current gradient (backward propagation)
      • Update parameters (gradient descent)
    4. You often build 1-3 separately and integrate them into one function we call model().
  • What to remember: You’ve implemented several functions that:
    • Initialize (w,b)
    • Optimize the loss iteratively to learn parameters (w,b):
      • computing the cost and its gradient
      • updating the parameters using gradient descent
    • Use the learned (w,b) to predict the labels for a given set of examples
  • What to remember from this assignment:
    • Preprocessing the dataset is important.
    • You implemented each function separately: initialize(), propagate(), optimize(). Then you built a model() that ties them together (a structural sketch follows this list).
    • Tuning the learning rate (which is an example of a “hyperparameter”) can make a big difference to the algorithm. You will see more examples of this later in this course!
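
For reference, here is a structural sketch of how those pieces fit together into model(), following steps 1-4 above. The helper names mirror the ones mentioned in this section (initialize(), propagate(), optimize(), plus an assumed predict()), but the bodies are simplified illustrations, not the graded assignment code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize(dim):
    # Step 2: initialize the model's parameters
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    # Forward propagation gives the current cost, backward propagation the gradient
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    grads = {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}
    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    # Step 3: loop of forward/backward propagation plus gradient-descent updates
    for _ in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
    return w, b

def predict(w, b, X):
    # Predict label 1 when the estimated probability exceeds 0.5
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(float)

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.005):
    # Step 1: the model structure (number of input features) is read from the data
    w, b = initialize(X_train.shape[0])
    w, b = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    train_acc = 100 - np.mean(np.abs(predict(w, b, X_train) - Y_train)) * 100
    test_acc = 100 - np.mean(np.abs(predict(w, b, X_test) - Y_test)) * 100
    return {"w": w, "b": b, "train_accuracy": train_acc, "test_accuracy": test_acc}
```

Changing learning_rate here is the hyperparameter tuning mentioned in the last bullet.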
