Systematic Review
https://signate.jp/competitions/471
Notes from the prizewinners' presentations.
FBetaScore
\[
\text{FBetaScore}=\frac{(1+\beta^2) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}
\]
- TP: the target paper is correctly detected.
- FP: a paper that should have been excluded is incorrectly detected.
- FN: a target paper is accidentally excluded. (This is the error to avoid.)
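The counts above plug into the formula directly. A minimal sketch (the β value below is illustrative; the actual β used by the competition is not stated in these notes):

```python
def fbeta(tp, fp, fn, beta):
    """F-beta from confusion counts; beta > 1 weights recall over precision,
    matching the goal of avoiding FNs."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# A recall-heavy classifier (many FPs, few FNs) scores much better at high beta.
score_b1 = fbeta(30, 70, 5, beta=1)  # precision 0.30, recall ~0.86
score_b7 = fbeta(30, 70, 5, beta=7)
```

With a large β the score tracks recall almost entirely, which is why FNs dominate the strategy.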
Data overview
- Positive:negative = 1:42 (highly imbalanced data)
Preprocessing
- Simple: title + abstract
- Data augmentation did not work.
- In general it does not seem to work in NLP. (Why?)
Modeling
- BERT pre-trained model: PubMedBERT (Public LB: 0.9198)
- Structure: Title + Abstract → BERT → Dropout → Focal Loss
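A minimal sketch of that stack (Title + Abstract → BERT → Dropout → classifier head trained with focal loss). A tiny embedding encoder stands in for PubMedBERT so the sketch runs without downloading weights; all names and sizes are illustrative assumptions, not the winners' code:

```python
import torch
import torch.nn as nn

class PaperClassifier(nn.Module):
    """Encoder -> Dropout -> linear head producing one logit per paper."""
    def __init__(self, encoder, hidden_size, dropout=0.2):
        super().__init__()
        self.encoder = encoder              # PubMedBERT in the actual solution
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, input_ids):
        h = self.encoder(input_ids)         # [batch, hidden] pooled representation
        return self.classifier(self.dropout(h)).squeeze(-1)  # logits

class ToyEncoder(nn.Module):
    """Stand-in encoder: mean of token embeddings (the real solution uses
    BERT's pooled output over the tokenized title + abstract)."""
    def __init__(self, vocab=100, hidden=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)

    def forward(self, input_ids):
        return self.emb(input_ids).mean(dim=1)

model = PaperClassifier(ToyEncoder(), hidden_size=16)
logits = model(torch.randint(0, 100, (4, 12)))  # 4 title+abstract sequences
```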
Training
- 5-fold Stratified KFold
- AdaBelief optimizer
- Training parameters
- Evaluating only once per epoch misses the good models (check checkpoints more frequently).
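A pure-Python sketch of the stratified split (a minimal stand-in for sklearn's `StratifiedKFold`), showing that each fold preserves the 1:42 class ratio:

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Assign each sample index to one of k validation folds so that every
    fold keeps (roughly) the same class ratio as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)                   # randomize within each class
        for j, i in enumerate(idxs):
            folds[j % k].append(i)          # deal indices round-robin
    return folds                            # folds[f] = validation indices

# With the competition's 1:42 imbalance, every fold gets the same positive rate.
labels = [1] * 10 + [0] * 420
folds = stratified_kfold(labels, k=5)
```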
Choose best model
- Select the highest-scoring model from each fold and ensemble them.
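The per-fold best checkpoints can then be combined by averaging predicted probabilities. This is one common choice; the exact ensembling method is not detailed in these notes:

```python
def ensemble(fold_probs):
    """Mean-average predicted probabilities across fold models.
    fold_probs[f][i] = probability from fold model f for sample i."""
    n_samples = len(fold_probs[0])
    return [sum(probs[i] for probs in fold_probs) / len(fold_probs)
            for i in range(n_samples)]

# Three fold models, two samples: averages to [0.3, 0.7].
avg = ensemble([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
```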
Choose loss function
Focal loss mitigates extreme (overconfident) predictions.
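A sketch of binary focal loss (the Lin et al. formulation; γ = 2 is the usual default, and the value the winners used is not stated here):

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one predicted probability p and label y.
    The (1 - p_t)^gamma factor shrinks the loss on easy, confident
    examples, so training does not push outputs toward the 0/1 extremes
    the way plain cross-entropy does."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

bce = -math.log(0.9)        # plain cross-entropy on an easy positive
fl = focal_loss(0.9, 1)     # focal loss on the same example: ~100x smaller
```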
NG (approaches that did not work)
- Large models
- Pseudo labeling
- Back translation
- Layer-wise learning rate decay
- Learning Rate Scheduling
- Stochastic Weight Averaging
- Virtual Adversarial training
- Multi Sample Dropout
- Cosine Similarity Loss
QA
Q. When long texts are split, meaning is lost, and text that overflows is dropped, losing its information.
A. I decided not to include anything that overflowed; in my experience this is not a problem.
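The answer above (just drop the overflow) amounts to a simple truncation step. A sketch, where the function name and the 512-token limit are illustrative assumptions:

```python
def build_input(title_tokens, abstract_tokens, max_len=512):
    """Concatenate title + abstract, keeping the title intact and simply
    discarding any abstract tokens past the length limit."""
    budget = max(max_len - len(title_tokens), 0)
    return title_tokens + abstract_tokens[:budget]

# A 600-token abstract is cut so the pair fits in 512 tokens.
tokens = build_input(["deep", "learning"], ["tok"] * 600, max_len=512)
```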