Systematic Review
https://signate.jp/competitions/471
Notes from the prizewinners' presentations.
FBetaScore
\[
\text{FBetaScore}=\frac{(1+\beta^2) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}
\]
- TP: the target paper is correctly detected.
- FP: a paper that should have been excluded is incorrectly detected.
- FN: a target paper is accidentally excluded. (This is the error to avoid.)
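The counts above plug into the formula directly. A minimal sketch (the β value below is illustrative; the actual β used by the competition is not stated in these notes):

```python
def fbeta(tp, fp, fn, beta):
    """F-beta from confusion counts; beta > 1 weights recall over precision,
    matching the goal of avoiding FNs."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# A recall-heavy classifier (many FPs, few FNs) scores much better at high beta.
score_b1 = fbeta(30, 70, 5, beta=1)  # precision 0.30, recall ~0.86
score_b7 = fbeta(30, 70, 5, beta=7)
```

With a large β the score tracks recall almost entirely, which is why FNs dominate the strategy.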
Data overview
- Positive:negative = 1:42 (highly imbalanced data)
Preprocessing
- Simple: title + abstract
- Data augmentation did not work.
- In general it does not seem to work in NLP. (Why?)
Modeling
- BERT pre-trained model: PubMedBERT (Public LB: 0.9198)
- Structure: Title + Abstract → BERT → Dropout → Focal Loss
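A minimal sketch of that stack (Title + Abstract → BERT → Dropout → classifier head trained with focal loss). A tiny embedding encoder stands in for PubMedBERT so the sketch runs without downloading weights; all names and sizes are illustrative assumptions, not the winners' code:

```python
import torch
import torch.nn as nn

class PaperClassifier(nn.Module):
    """Encoder -> Dropout -> linear head producing one logit per paper."""
    def __init__(self, encoder, hidden_size, dropout=0.2):
        super().__init__()
        self.encoder = encoder              # PubMedBERT in the actual solution
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, input_ids):
        h = self.encoder(input_ids)         # [batch, hidden] pooled representation
        return self.classifier(self.dropout(h)).squeeze(-1)  # logits

class ToyEncoder(nn.Module):
    """Stand-in encoder: mean of token embeddings (the real solution uses
    BERT's pooled output over the tokenized title + abstract)."""
    def __init__(self, vocab=100, hidden=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)

    def forward(self, input_ids):
        return self.emb(input_ids).mean(dim=1)

model = PaperClassifier(ToyEncoder(), hidden_size=16)
logits = model(torch.randint(0, 100, (4, 12)))  # 4 title+abstract sequences
```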
Training
- 5-fold Stratified KFold
- AdaBelief optimizer
- Training parameters
- Evaluating only once per epoch misses the good models (check checkpoints more frequently).
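A pure-Python sketch of the stratified split (a minimal stand-in for sklearn's `StratifiedKFold`), showing that each fold preserves the 1:42 class ratio:

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Assign each sample index to one of k validation folds so that every
    fold keeps (roughly) the same class ratio as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)                   # randomize within each class
        for j, i in enumerate(idxs):
            folds[j % k].append(i)          # deal indices round-robin
    return folds                            # folds[f] = validation indices

# With the competition's 1:42 imbalance, every fold gets the same positive rate.
labels = [1] * 10 + [0] * 420
folds = stratified_kfold(labels, k=5)
```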
Choose best model
- Select the highest-scoring model from each fold and ensemble them.
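The per-fold best checkpoints can then be combined by averaging predicted probabilities. This is one common choice; the exact ensembling method is not detailed in these notes:

```python
def ensemble(fold_probs):
    """Mean-average predicted probabilities across fold models.
    fold_probs[f][i] = probability from fold model f for sample i."""
    n_samples = len(fold_probs[0])
    return [sum(probs[i] for probs in fold_probs) / len(fold_probs)
            for i in range(n_samples)]

# Three fold models, two samples: averages to [0.3, 0.7].
avg = ensemble([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
```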
Choose loss function
Focal loss mitigates extreme (overconfident) predictions.
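A sketch of binary focal loss (the Lin et al. formulation; γ = 2 is the usual default, and the value the winners used is not stated here):

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one predicted probability p and label y.
    The (1 - p_t)^gamma factor shrinks the loss on easy, confident
    examples, so training does not push outputs toward the 0/1 extremes
    the way plain cross-entropy does."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

bce = -math.log(0.9)        # plain cross-entropy on an easy positive
fl = focal_loss(0.9, 1)     # focal loss on the same example: ~100x smaller
```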
NG (approaches that did not work)
- Large models
- Pseudo labeling
- Back translation
- Layer-wise learning rate decay
- Learning Rate Scheduling
- Stochastic Weight Averaging
- Virtual Adversarial training
- Multi Sample Dropout
- Cosine Similarity Loss
QA
Q. When long texts are split, meaning is lost, and text that overflows is dropped, losing its information.
A. I decided not to include anything that overflowed; in my experience this is not a problem.
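The answer above (just drop the overflow) amounts to a simple truncation step. A sketch, where the function name and the 512-token limit are illustrative assumptions:

```python
def build_input(title_tokens, abstract_tokens, max_len=512):
    """Concatenate title + abstract, keeping the title intact and simply
    discarding any abstract tokens past the length limit."""
    budget = max(max_len - len(title_tokens), 0)
    return title_tokens + abstract_tokens[:budget]

# A 600-token abstract is cut so the pair fits in 512 tokens.
tokens = build_input(["deep", "learning"], ["tok"] * 600, max_len=512)
```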