
DL [Course 3/5] Structuring Machine Learning Projects [Week 1/2]

Key Concepts

  • Apply satisficing and optimizing metrics to set up your goal for ML projects
  • Make correct ML strategic decisions based on observations of performance and the dataset
  • Understand why Machine Learning strategy is important
  • Choose a correct train/dev/test split of your dataset
  • Use human-level performance to define your key priorities in ML projects
  • Understand how to define human-level performance


Notes from the course Structuring Machine Learning Projects (deeplearning.ai).

Introduction to ML Strategy

Why ML Strategy

  • Motivating example
    • A cat classifier reaches 90% accuracy, and you want to improve it — which of the ideas below should you try?
  • Ideas
    • Collect more data
    • Collect more diverse training set
    • Train algorithm longer with gradient descent
    • Try Adam instead of gradient descent
    • Try bigger network
    • Try smaller network
    • Try dropout
    • Add L_2 regularization
    • Network architecture
      • Activation functions
      • #hidden units

Orthogonalization(直交化)

  • TV tuning example
    • In this context, orthogonalization means the TV designers built the knobs so that each knob does only one thing. This makes it much easier to tune the TV until the picture is centered where you want it.
  • Chain of assumptions in ML
    1. Fit training set well on cost function
      • bigger network
      • better optimization algorithm (e.g., Adam)
      • (early stopping: less orthogonalized)
    2. Fit dev set well on cost function
      • Regularization
      • Bigger train set
    3. Fit test set well on cost function
      • Bigger dev set
    4. Performs well in real world
      • Change the dev set or the cost function
  • A supervised learning system
    • tune 4 knobs (train, dev, test, real-world)
  • The goal is to diagnose exactly which of these is the bottleneck to your system’s performance, and to identify the specific set of knobs you can use to improve that aspect of it.
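As a rough illustration (not from the course), the chain of assumptions above can be read as a lookup from symptom to its orthogonal knobs; a minimal Python sketch:

```python
# Hypothetical mapping from the bottleneck stage to its (orthogonal) knobs.
knobs = {
    "train error too high": ["bigger network", "better optimizer (e.g., Adam)"],
    "dev error too high": ["regularization", "bigger training set"],
    "test error too high": ["bigger dev set"],
    "poor real-world performance": ["change the dev set or the cost function"],
}

for symptom, fixes in knobs.items():
    print(f"{symptom}: try {', '.join(fixes)}")
```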

Setting up your goal

Single number evaluation metric

| Classifier | Precision | Recall | F1 Score |
| --- | --- | --- | --- |
| A | 95% | 90% | 92.4% |
| B | 98% | 85% | 91.0% |
  • F1 score = the harmonic mean of precision and recall (not the arithmetic average):\[
    F_1=\frac{2}{\frac{1}{P}+\frac{1}{R}}
    \]
  • A single-number evaluation metric speeds up iteration.
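A quick sketch of the harmonic mean above, checked against classifiers A and B from the table:

```python
def f1_score(precision, recall):
    """F1 = harmonic mean of precision (P) and recall (R)."""
    return 2 / (1 / precision + 1 / recall)

print(f1_score(0.95, 0.90))  # classifier A -> ~0.924
print(f1_score(0.98, 0.85))  # classifier B -> ~0.910
```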

Satisficing and Optimizing metric

| Classifier | Accuracy | Running time |
| --- | --- | --- |
| A | 90% | 80 ms |
| B | 92% | 95 ms |
| C | 95% | 1,500 ms |
  • Rather than blending metrics into one cost (e.g., cost = accuracy − 0.5 × runningTime):
    • maximize accuracy
    • subject to runningTime ≤ 100 ms
  • Accuracy: optimizing metric
  • Running Time: satisficing metric
  • With N metrics: pick 1 optimizing metric, and treat the other N−1 as satisficing metrics (thresholds to meet)
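A minimal sketch of this rule applied to the table above (accuracy is optimized, running time must satisfy the 100 ms threshold):

```python
# (accuracy, running time in ms) for each classifier from the table
classifiers = {"A": (0.90, 80), "B": (0.92, 95), "C": (0.95, 1500)}

# Keep only classifiers meeting the satisficing constraint, then optimize accuracy.
feasible = {name: (acc, ms) for name, (acc, ms) in classifiers.items() if ms <= 100}
best = max(feasible, key=lambda name: feasible[name][0])
print(best)  # -> "B": the most accurate classifier within 100 ms
```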

Train/dev/test distributions

  • Cat classification dev/test set
  • Regions (randomly assigned):
    • US/UK/Other Europe/South America … dev set
    • India/China/Other Asia/Australia … test set
    • Even though the regions were assigned at random, the dev and test sets now come from different distributions
    • Bad idea
  • Instead, randomly shuffle all the data into the dev and test sets
  • Guideline
    • Choose a dev set and test set to reflect data you expect to get in the future and consider important to do well on.
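A minimal sketch of the guideline, with hypothetical per-region example lists standing in for real image data:

```python
import random

# Hypothetical examples per region; real data would be image/label pairs.
regions = {
    "US": ["us_0", "us_1"], "UK": ["uk_0"], "SouthAmerica": ["sa_0"],
    "India": ["in_0"], "China": ["cn_0"], "Australia": ["au_0"],
}

# Pool all regions and shuffle, so dev and test come from the same distribution.
pool = [x for examples in regions.values() for x in examples]
random.seed(0)
random.shuffle(pool)
half = len(pool) // 2
dev, test = pool[:half], pool[half:]
```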

Size of the dev and test sets

  • Old way of splitting data
    • train:test = 70:30
    • train:dev:test = 60:20:20
  • modern deep learning era (millions of examples)
    • train:dev:test = 98:1:1
  • Size of test set
    • set your test set to be big enough to give high confidence in the overall performance of your system.
    • often 10,000–100,000 examples are enough
  • Having no test set might be OK for some applications
    • but a test set gives an unbiased estimate of final quality before shipping
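For example, with 1,000,000 examples the 98:1:1 split still leaves dev and test sets of 10,000 each, in line with the sizes above:

```python
m = 1_000_000                    # total examples in the deep learning era
n_train = int(0.98 * m)          # 980,000 for training
n_dev = n_test = int(0.01 * m)   # 10,000 each: enough for high-confidence evaluation
```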

When to change dev/test sets and metrics

  • Cat dataset example
    • Metric: classification error
    • Algorithm A: 3% error
      • better on the metric
      • but lets through a lot of pornographic images
    • Algorithm B: 5% error
      • worse on the metric, but better for users
    • Change the evaluation metric so that misclassified porn images count 10× (see the sketch after this list): \[
      Error = \underbrace{\frac{1}{m_{dev}}}_{\color{red}{\frac{1}{\sum_i w^{(i)}}}} \sum_{i=1}^{m_{dev}} \color{red}{w^{(i)}}\, I\{ y_{predicted}^{(i)} \neq y^{(i)}\}\\
      \color{red}{w^{(i)}=\begin{cases}
      1 & \text{if } x^{(i)} \text{ is non-porn}\\
      10 & \text{if } x^{(i)} \text{ is porn}
      \end{cases}}
      \]
  • Orthogonalization for cat pictures: anti-porn
    1. First, define a metric to evaluate classifiers (place the target: one knob)
    2. Separately, worry about how to do well on this metric (aim and shoot at the target: a different knob)\[
      J=\underbrace{\frac{1}{m}}_{\color{red}{\frac{1}{\sum_i w^{(i)}}}} \sum_{i=1}^m \color{red}{w^{(i)}}\, L( \hat y^{(i)},y^{(i)})
      \]
  • Another example
    • Dev/test: high resolution picture
    • user images: low resolution picture
    • If doing well on your metric+dev/test set does not correspond to doing well on your application, change your metric and/or dev/test set.
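A minimal sketch of the weighted error metric above, assuming NumPy arrays of predictions, labels, and a boolean porn mask (hypothetical names):

```python
import numpy as np

def weighted_error(y_pred, y_true, is_porn, porn_weight=10.0):
    """Dev-set error where mistakes on porn images are weighted 10x."""
    w = np.where(is_porn, porn_weight, 1.0)      # w^(i): 10 if porn, else 1
    mistakes = (y_pred != y_true).astype(float)  # I{y_pred != y}
    return float(np.sum(w * mistakes) / np.sum(w))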

Comparing to human-level performance

Why human-level performance?

  • Comparing to human-level performance
    • plot of accuracy over time: progress is rapid until it approaches human level, then flattens out
    • performance asymptotes toward the Bayes optimal error (= best possible error), which human level often approximates
  • Progress often slows down after you surpass human-level performance. Why?
    1. Human-level performance is often not far from the Bayes optimal error, so little headroom remains
    2. The tools below work well while ML is below human level, but are hard to apply once it is surpassed
  • Humans are quite good at a lot of tasks. So long as ML is worse than humans, you can:
    • Get labeled data from humans.
    • Gain insight from manual error analysis: Why did a person get this right?
    • Better analysis of bias/variance

Avoidable bias

Cat classification example

|  | data A | data B |
| --- | --- | --- |
| Humans (\(\approx\) Bayes) | 1% | 7.5% |
| Training error | 8% | 8% |
| Dev error | 10% | 10% |
| Tactics: focus on | bias (underfitting) | variance (overfitting) |

  • data A: avoidable bias = 8% − 1% = 7%, larger than the variance → focus on bias
  • data B: variance = 10% − 8% = 2%, larger than the 0.5% avoidable bias → focus on variance

Understanding human-level performance

Human-level error as a proxy for Bayes error

  • Medical image classification example
    • Typical human 3% error
    • Typical doctor 1% error
    • Experienced doctor 0.7% error
    • Team of experienced doctors 0.5% error
      • Bayes error <= 0.5%
    • What should we use as human-level error here? 0.5% (the best achievable), as a proxy for Bayes error

Error analysis example

| Scenario | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Human (proxy for Bayes) | 1% / 0.7% / 0.5% | 1% / 0.7% / 0.5% | 0.5% |
| Training error | 5% | 1% | 0.7% |
| Dev error | 6% | 5% | 0.8% |
| Focus on | bias | variance | both |

Summary of bias/variance with human-level performance

Human-level error(proxy of bayes error)
↕ Avoidable bias
Training error
↕ Variance
Dev error
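A minimal sketch of this ladder as a diagnostic, using the error rates from the cat classification example above:

```python
def diagnose(human_error, train_error, dev_error):
    """Compare avoidable bias vs. variance to choose the next set of knobs."""
    avoidable_bias = train_error - human_error  # human-level as a proxy for Bayes
    variance = dev_error - train_error
    if avoidable_bias >= variance:
        return "bias: bigger model, train longer, architecture search"
    return "variance: more data, regularization, architecture search"

print(diagnose(0.010, 0.08, 0.10))  # data A -> bias (avoidable bias = 7%)
print(diagnose(0.075, 0.08, 0.10))  # data B -> variance (variance = 2%)
```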

Surpassing human-level performance

  • Problems where ML significantly surpasses human-level performance
    • Online advertising (estimating clicks)
    • Product recommendations
    • Logistics (predicting transit time)
    • Loan approvals
  • What these four examples have in common
    • learning from structured data, not computer vision
    • not natural perception tasks
    • lots of data available
  • ML has also surpassed humans in some perception tasks
    • Speech recognition
    • Some image recognition
    • Medical: ECG, skin cancer, some radiology tasks

Improving your model performance(guideline)

  • The two fundamental assumptions of supervised learning
    1. You can fit the training set pretty well
      • avoidable bias
    2. The training set performance generalizes pretty well to the dev/test set.
      • variance
  • Reducing(avoidable) bias and variance
    • Human-level
      • Avoidable bias
        • Train bigger model
        • Train longer/better optimization algorithms: momentum, RMSprop, Adam
        • NN architecture/hyperparameter search: RNN, CNN
    • Training error
      • Variance
        • More data
        • Regularization: L2, dropout, data augmentation
        • NN architecture/hyperparameter search
    • Dev error

Machine Learning flight simulator

  • Having three evaluation metrics makes it harder for you to quickly choose between two different algorithms, and will slow down the speed with which your team can iterate.
  • Sometimes we’ll need to train the model on the data that is available, and its distribution may not be the same as the data that will occur in production. Also, adding training data that differs from the dev set may still help the model improve performance on the dev set. What matters is that the dev and test sets have the same distribution.
  • You have only 1,000 images of the new species of bird. The city expects a better system from you within the next 3 months. Which of these should you do first?
    • Use the data you have to define a new evaluation metric taking into account the new species.
