Key Concepts (Week 1)
- Be able to explain how deep learning is applied to supervised learning.
- Understand the major categories of models (such as CNNs and RNNs) and when they should be applied.
- Be able to recognize the basics of when deep learning will (or will not) work well.
- Understand the major trends driving the rise of deep learning.
Key Concepts (Week 2)
- Understand how to compute derivatives for logistic regression, using a backpropagation mindset.
- Work with iPython Notebooks
- Become familiar with Python and Numpy
- Implement the main steps of an ML algorithm, including making predictions, derivative computation, and gradient descent.
- Build a logistic regression model, structured as a shallow neural network
- Be able to implement vectorization across multiple training examples
- Implement computationally efficient, highly vectorized versions of models.
[mathjax]
Notes from taking Neural Networks and Deep Learning (deeplearning.ai)
About
Learn how to build and run deep neural networks. Cat classification with numpy.
Week 1: Introduction to deep learning
- Why deep learning is taking off
- How DL is applied to supervised learning
- The major model types (CNNs, RNNs)
- What DL is and is not well suited for
Week 2: Neural Networks Basics
Logistic Regression as a Neural Network
- In NN implementations, we want to avoid explicit for-loops
- Forward propagation
- Backpropagation
- Logistic regression model
- Binary classification and notation
- x: image pixel data (e.g. 64px * 64px * 3 (RGB))
- y: 1 = cat, 0 = non-cat
- n_x: dimension of the input x
- (x,y): $$x\in\Bbb R^{n_x},\ y\in\{0,1\}$$
- m training examples (m = m_train): $$\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$$
- m_test: number of test examples
- Matrix of all training examples (each example is one column): $$X=\begin{bmatrix}
\vert & \vert & & \vert \cr
x^{(1)} & x^{(2)} & \dots & x^{(m)} \cr
\vert & \vert & & \vert \cr
\end{bmatrix}$$
- Stacking one training example per row (the transpose) is also possible
- but it is harder to implement, so it is not used
- Python: $$X.shape =(n_x,m)$$
- i.e. check that X is an n_x × m matrix
- Labels: $$Y=\begin{bmatrix}
y^{(1)} & y^{(2)} & \dots & y^{(m)}
\end{bmatrix}$$ - $$Y\in\Bbb R^{1\times m}$$
- Python: $$Y.shape=(1,m)$$
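As a small sanity check, a sketch of these shape conventions on toy random data (n_x and m here are arbitrary):

```python
import numpy as np

n_x, m = 64 * 64 * 3, 5              # input dimension, number of examples (arbitrary)
X = np.random.rand(n_x, m)           # each column x^(i) is one training example
Y = np.random.randint(0, 2, (1, m))  # labels: 1 = cat, 0 = non-cat

assert X.shape == (n_x, m)
assert Y.shape == (1, m)
```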
- Logistic Regression
- Prediction of y: $$\hat y=P(y=1|x)$$
- $$0\leq\hat y\leq1$$
- since \(x\in\Bbb R^{n_x}\)
- Parameters: $$w\in\Bbb R^{n_x}$$
$$b\in\Bbb R$$ - Output: \(\hat y=w^{\mathrm{T}}x+b\)
- linear regression
- but this can produce negative or very large values, which make no sense as a probability
- so use logistic regression, i.e. apply a sigmoid
- sigmoid
- $$\sigma(z)=\frac{1}{1+e^{-z}}$$
- z is large: $$\sigma(z)\approx\frac{1}{1+0}=1$$
- z is a large negative number: $$\sigma(z)=\frac{1}{1+e^{-z}}\approx\frac{1}{1+\text{(huge number)}}\approx 0$$
- Logistic Regression cost function
- y output: $$\hat y^{(i)}=\sigma(w^{\mathrm{T}}x^{(i)}+b),\ \text{where } \sigma(z^{(i)})=\frac{1}{1+e^{-z^{(i)}}}$$
- Given: $$\{(x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})\},\ \text{want } \hat y^{(i)}\approx y^{(i)}$$
- Loss(error) function: $$L(\hat y, y)$$
- Squared error $$L(\hat y,y)=\frac{1}{2}(\hat y-y)^2$$ is not used for logistic regression (it makes the optimization non-convex)
- A suitable loss function: $$L(\hat y, y)=-(y \log \hat y + (1-y)\log(1-\hat y))$$
- If y=1: $$L(\hat y, y)=-\log \hat y $$ $$\gets \text{ Want } \log \hat y \text{ large, Want }\hat y\text{ large }$$
- If y=0: $$L(\hat y, y)=-\log(1-\hat y)$$ $$ \gets \text{ Want }\log(1-\hat y)\text{ large }\dots\text{ Want }\hat y\text{ small }$$
- Cost function: $$J(w,b)=\frac{1}{m}\sum_{i=1}^{m} L(\hat y^{(i)},y^{(i)})$$ $$=-\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}\log \hat y^{(i)} + (1-y^{(i)}) \log (1- \hat y^{(i)}))$$
- Loss function: applied to a single training example
- Cost function: the average of the loss over the entire training set, as a function of the parameters w and b
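A minimal numpy sketch of the loss and cost above, assuming the predictions A (= ŷ for all m examples) are already computed; the function name is mine:

```python
import numpy as np

def logistic_cost(A, Y):
    """Cross-entropy cost J averaged over m examples.
    A: predictions y_hat with shape (1, m); Y: labels with shape (1, m)."""
    m = Y.shape[1]
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # per-example loss L(y_hat, y)
    return np.sum(losses) / m
```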
- gradient descent
- $$J(w,b)$$
- $$w := w-\alpha\frac{dJ(w,b)}{dw}$$
- $$b := b-\alpha\frac{dJ(w,b)}{db}$$
- Strictly this should be written with the partial derivative symbol, but never mind: $$\frac{\partial J(w,b)}{\partial w}$$
- computation graph
- Two stages:
- forward pass, or forward propagation step
- backward pass, or backpropagation step
- Logistic Regression Gradient Descent
- Definitions: \[
z=w^\mathrm{T}x+b \\
\hat y=a=\sigma(z) \\
L(a,y)=-(y\log(a)+(1-y)\log(1-a)) \\
\]
- Logistic regression derivatives
- Consider just two features, w1 and w2: \[
z=w_1x_1+w_2x_2+b\rightarrow a=\sigma(z)\rightarrow L(a,y)
\] - Backward pass: \[
\begin{align*}
\text{"da"}&=\frac{dL(a,y)}{da}=-\frac{y}{a}+\frac{1-y}{1-a} \\
\text{"dz"}&=\frac{dL}{dz}=\frac{dL(a,y)}{dz}\\
&=\frac{dL}{da}\cdot\frac{da}{dz}\\
&=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)\cdot a(1-a)\\
&=a-y
\end{align*}
\] - Reference, the derivative of the sigmoid: \[
\sigma(z)=\frac{1}{1+e^{-z}}=(1+e^{-z})^{-1}\\
\frac{d\sigma(z)}{dz}=(1-\sigma(z))\sigma(z)
\] - https://mathtrain.jp/sigmoid
- In DL these derivations can be skipped over by applying the chain rule
- Therefore: \[
\frac{\partial L}{\partial w_1}=\text{"dw1"}=x_1\,dz\\
\text{"dw2"}=x_2\,dz\\
\text{"db"}=dz
\] - update:\[
w_1:=w_1-\alpha \cdot dw_1\\
w_2:=w_2-\alpha \cdot dw_2\\
b:=b-\alpha \cdot db
\]
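A single-example numpy sketch of exactly these formulas; the feature values, initial parameters, and learning rate below are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x1, x2, y = 1.0, 2.0, 1.0        # one training example with two features
w1, w2, b = 0.01, -0.02, 0.0     # initial parameters (arbitrary)
alpha = 0.01                     # learning rate

# forward
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# backward
dz = a - y        # dL/dz
dw1 = x1 * dz     # dL/dw1
dw2 = x2 * dz     # dL/dw2
db = dz           # dL/db

# update
w1 -= alpha * dw1
w2 -= alpha * dw2
b -= alpha * db
```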
- Logistic regression on m examples
- vectorization is getting rid of for-loops.
- a loop over the m training examples
- a loop over the components of w (both loops are sketched below)
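A sketch of the unvectorized version with both loops written out explicitly (the function name and layout are mine; X is (n_x, m), Y is (1, m)):

```python
import numpy as np

def propagate_with_loops(w, b, X, Y):
    """Cost and gradients over m examples, using the two explicit loops
    (over examples and over the components of w) that vectorization removes."""
    n_x, m = X.shape
    cost, db = 0.0, 0.0
    dw = np.zeros(n_x)
    for i in range(m):                      # loop over the m training examples
        z = b
        for j in range(n_x):                # loop over the components of w
            z += w[j] * X[j, i]
        a = 1 / (1 + np.exp(-z))
        cost += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
        dz = a - Y[0, i]
        for j in range(n_x):                # again over the components of w
            dw[j] += X[j, i] * dz
        db += dz
    return cost / m, dw / m, db / m
```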
Python and Vectorization
- Vectorization
- don't use for-loops
- compute $$w^\mathrm{T}x$$ directly with np.dot (see the timing sketch after this list)
- CPU vs GPU
- both have SIMD (Single Instruction Multiple Data) instructions
- the CPU is not that bad either
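Roughly the timing demo from the lecture, as I remember it (exact numbers will vary by machine):

```python
import time
import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a, b)                  # vectorized dot product (uses SIMD)
toc = time.time()
print("vectorized:", 1000 * (toc - tic), "ms")

tic = time.time()
c = 0.0
for i in range(1000000):          # explicit for-loop version
    c += a[i] * b[i]
toc = time.time()
print("for-loop:  ", 1000 * (toc - tic), "ms")
```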
- Neural network programming guideline
- Whenever possible, avoid explicit for-loops.
- Vectors and matrix valued functions
- Vectorizing Logistic Regression
- $$
\begin{align*}
Z&=\begin{bmatrix}z^{(1)} & z^{(2)} & \dots & z^{(m)}\end{bmatrix}\\
&=w^\mathrm{T}X+\begin{bmatrix}b & b & \dots & b\end{bmatrix}\\
&=\begin{bmatrix}
w^\mathrm{T}x^{(1)}+b & w^\mathrm{T}x^{(2)}+b & \dots & w^\mathrm{T}x^{(m)}+b \end{bmatrix}
\end{align*}$$ - in Python (the real number b is automatically broadcast to a 1×m vector): $$Z=np.dot(w^\mathrm{T},X)+b$$
- Vectorizing Logistic Regression’s Gradient Output
- $$dz^{(i)}=a^{(i)}-y^{(i)},\quad i=1,\dots,m$$
- $$dZ=\begin{bmatrix} dz^{(1)} & dz^{(2)} & \dots & dz^{(m)}\end{bmatrix}$$
- $$A=\begin{bmatrix} a^{(1)} & a^{(2)} & \dots & a^{(m)}\end{bmatrix}$$
- $$Y=\begin{bmatrix} y^{(1)} & y^{(2)} & \dots & y^{(m)}\end{bmatrix}$$
- $$db=\frac{1}{m}np.sum(dZ)$$
- $$dw=\frac{1}{m}X\,dZ^\mathrm{T}$$
- the remaining for-loop is only over gradient-descent iterations; one fully vectorized iteration: \[
\begin{align*}
Z&=w^\mathrm{T}X+b\\
&=np.dot(w^\mathrm{T},X)+b\\
A&=\sigma(Z)\\
dZ&=A-Y\\
dw&=\frac{1}{m}XdZ^\mathrm{T}\\
db&=\frac{1}{m}np.sum(dZ)\\
w&:=w-\alpha dw\\
b&:=b-\alpha db
\end{align*}
\]
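The block above as one numpy function (a sketch; the function name and signature are mine):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def gradient_descent_step(w, b, X, Y, alpha):
    """One fully vectorized iteration. w: (n_x, 1), X: (n_x, m), Y: (1, m)."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b        # (1, m); the scalar b is broadcast
    A = sigmoid(Z)                # (1, m)
    dZ = A - Y                    # (1, m)
    dw = np.dot(X, dZ.T) / m      # (n_x, 1)
    db = np.sum(dZ) / m           # scalar
    w = w - alpha * dw
    b = b - alpha * db
    return w, b
```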
- Broadcasting in Python
- (m,n) {+,-,*,/} (1,n) -> (m,n)
- real numbers (scalars) are broadcast as well
- bsxfun in Matlab
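A couple of broadcasting examples of the rules above:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # shape (2, 3)
v = np.array([[10.0, 20.0, 30.0]])     # shape (1, 3)

print(A + v)     # (2,3) + (1,3) -> (2,3): v is stretched down the rows
print(A * 100)   # a scalar is broadcast to every element
print(A / A.sum(axis=0, keepdims=True))  # (2,3) / (1,3): column-wise normalization
```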
- A note on python/numpy vectors
- Don't use rank-1 arrays, i.e. shapes like (5,) or (n,)
a = np.random.randn(5)       # a.shape == (5,): rank-1 array -- avoid!
a = np.random.randn(5, 1)    # column vector: OK
a = np.random.randn(1, 5)    # row vector: OK
assert(a.shape == (5, 1))    # use assertions to check shapes
a = a.reshape((5, 1))        # or reshape rank-1 arrays explicitly
- Quick tour of Jupyter / iPython Notebooks
- how to use them
- Explanation of logistic regression cost function (optional)
- Why the loss function L is a sensible choice: \[
\text{If } y=1: p(y|x) = \hat y\\
\text{If } y=0: p(y|x) = 1 - \hat y\\
p(y|x) = \hat y^y(1-\hat y)^{(1-y)}
\] - Since \(\log\) is monotonically increasing: \[
\begin{align*}
\log p(y|x) &= \log \hat y^y(1-\hat y)^{(1-y)}\\
&=y\log \hat y + (1-y)\log(1-\hat y)\\
&=-L(\hat y,y)
\end{align*}
\] - Cost on m examples: \[
\log p(\boxed{\text{labels in training set}}) = \log \prod_{i=1}^{m} p(y^{(i)}|x^{(i)})\\
=\sum_{i=1}^{m}\log p(y^{(i)}|x^{(i)})\\
=-\sum_{i=1}^{m}L(\hat y^{(i)},y^{(i)})
\] - Cost to minimize (maximizing the likelihood = minimizing the cost; scale by 1/m by convention): \[
J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(\hat y^{(i)},y^{(i)})
\]
Programming Assignments
Python basics with numpy (optional)
- how to use numpy
- basic core DL functions such as softmax, sigmoid, dsigmoid
- vectorization
- broadcasting
- Building basic functions with numpy. What you need to remember:
- np.exp(x) works for any np.array x and applies the exponential function to every coordinate
- the sigmoid function and its gradient
- image2vector is commonly used in deep learning
- np.reshape is widely used. In the future, you'll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs.
- numpy has efficient built-in functions
- broadcasting is extremely useful
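My recollection of those helpers, as a sketch (signatures simplified from the assignment's):

```python
import numpy as np

def sigmoid(x):
    """Works elementwise on any np.array x."""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)        # sigma'(x) = sigma(x) * (1 - sigma(x))

def image2vector(image):
    """Flatten an (h, w, c) image into an (h*w*c, 1) column vector."""
    return image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
```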
- Vectorization. What to remember:
- Vectorization is very important in deep learning. It provides computational efficiency and clarity.
- You have reviewed the L1 and L2 loss.
- You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc.
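The L1 and L2 losses from the exercise, roughly as implemented (assuming yhat and y are 1-D numpy arrays):

```python
import numpy as np

def L1(yhat, y):
    return np.sum(np.abs(y - yhat))

def L2(yhat, y):
    return np.sum((y - yhat) ** 2)   # equivalently np.dot(y - yhat, y - yhat)
```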
Logistic Regression with a Neural Network mindset (required)
- Learn about:
- Work with logistic regression in a way that builds intuition relevant to neural networks.
- Learn how to minimize the cost function.
- Understand how derivatives of the cost are used to update parameters.
- Common steps for pre-processing a new dataset are:
- Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, …)
- Reshape the datasets such that each example is now a vector of size (num_px * num_px * 3, 1)
- "Standardize" the data (a sketch of these steps follows below)
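A sketch of the reshape-and-standardize step on toy data (the dataset variable and shapes here are stand-ins for the assignment's):

```python
import numpy as np

m_train, num_px = 10, 64   # toy values
train_set_x_orig = np.random.randint(0, 256, (m_train, num_px, num_px, 3))

# each example becomes one column of length num_px * num_px * 3
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
assert train_set_x_flatten.shape == (num_px * num_px * 3, m_train)

# "standardize": for image data, just divide by the maximum pixel value
train_set_x = train_set_x_flatten / 255.0
```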
- Key steps: In this exercise, you will carry out the following steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude
- The main steps for building a Neural Network are:
- Define the model structure (such as number of input features)
- Initialize the model’s parameters
- Loop:
- Calculate current loss (forward propagation)
- Calculate current gradient (backward propagation)
- Update parameters (gradient descent)
- You often build 1-3 separately and integrate them into one function we call model().
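A rough sketch of how those pieces might fit into model() (function names and signatures are simplified here, not the assignment's exact API):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize(dim):
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                            # forward propagation
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m                              # backward propagation
    db = np.sum(A - Y) / m
    return dw, db, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w -= learning_rate * dw                                # gradient descent update
        b -= learning_rate * db
    return w, b

def predict(w, b, X):
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(float)

def model(X_train, Y_train, X_test, num_iterations=2000, learning_rate=0.005):
    w, b = initialize(X_train.shape[0])                        # define/initialize
    w, b = optimize(w, b, X_train, Y_train,                    # learn parameters
                    num_iterations, learning_rate)
    return predict(w, b, X_test)                               # predict on the test set
```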
- What to remember: You’ve implemented several functions that:
- Initialize (w,b)
- Optimize the loss iteratively to learn parameters (w,b):
- computing the cost and its gradient
- updating the parameters using gradient descent
- Use the learned (w,b) to predict the labels for a given set of examples
- What to remember from this assignment:
- Preprocessing the dataset is important.
- You implemented each function separately: initialize(), propagate(), optimize(). Then you built a model().
- Tuning the learning rate (which is an example of a "hyperparameter") can make a big difference to the algorithm. You will see more examples of this later in this course!