|
# Model Details |
|
|
|
#### Model Name: NumericBERT |
|
|
|
#### Model Type: Transformer |
|
|
|
#### Architecture: BERT |
|
|
|
#### Training Method: Masked Language Modeling (MLM) |
|
|
|
#### Training Data: MIMIC-IV lab values |
|
|
|
#### Training Hyperparameters: |
|
|
|
- **Optimizer:** AdamW |
|
- **Learning Rate:** 5e-5 |
|
- **Masking Rate:** 20% |
|
- **Tokenization:** Custom numeric-to-text mapping using the TextEncoder class |
|
|
|
### Text Encoding Process |
|
|
|
**Overview:** Non-negative integers are converted into uppercase letter-based representations, allowing numerical values to be expressed as sequences of letters. |
|
|
|
**Normalization and Binning:** |
|
- **Method:** Values are log-normalized and then split into 10 bins. |
|
- **Representation:** Each bin is represented by a letter (A-J). |
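
The binning step can be sketched as follows. This is a minimal illustration, not the model's actual `TextEncoder` code: the function name `log_bin_letter`, the use of `log1p` as the log transform, min-max scaling over a per-lab range `[lo, hi]`, and equal-width bins are all assumptions.

```python
import math

BIN_LETTERS = "ABCDEFGHIJ"  # one letter per bin (A-J), as described above


def log_bin_letter(value, lo, hi):
    """Map a non-negative lab value to a bin letter.

    Assumptions (not confirmed by the source): log1p as the log transform,
    min-max scaling over a per-lab range [lo, hi], and equal-width bins.
    """
    if value < 0:
        raise ValueError("lab values are expected to be non-negative")
    n_bins = len(BIN_LETTERS)
    log_lo, log_hi = math.log1p(lo), math.log1p(hi)
    # Clamp so out-of-range values fall into the first or last bin.
    log_v = min(max(math.log1p(value), log_lo), log_hi)
    # Scale to [0, 1]; guard against a degenerate range.
    scaled = (log_v - log_lo) / (log_hi - log_lo) if log_hi > log_lo else 0.0
    # The top edge (scaled == 1.0) belongs to the last bin.
    index = min(int(scaled * n_bins), n_bins - 1)
    return BIN_LETTERS[index]
```

For example, with a hypothetical range of 0 to 100, `log_bin_letter(0, 0, 100)` falls in the first bin and `log_bin_letter(100, 0, 100)` in the last.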
|
|
|
### Token Construction |
|
|
|
- **Format:** `<<lab_id_token>><<lab_value_bin>>` |
|
- **Example:** A 'Bic' lab value whose normalized value falls in bin 'C' yields the token `BicC`. |
|
- **Columns Used:** 'Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc'. |
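
As a rough sketch of the token format, the snippet below joins each lab identifier with its bin letter. The helper name `build_tokens`, the dict-based input, and the choice to skip labs missing from a row are assumptions made for illustration only.

```python
LAB_COLUMNS = ["Bic", "Crt", "Pot", "Sod", "Ure", "Hgb", "Plt", "Wbc"]
BIN_LETTERS = "ABCDEFGHIJ"


def build_tokens(binned_row):
    """Build `<lab_id><bin_letter>` tokens from a dict of lab id -> bin letter.

    Labs missing from the row are skipped (an assumption of this sketch).
    """
    tokens = []
    for lab in LAB_COLUMNS:
        letter = binned_row.get(lab)
        if letter is None:
            continue
        if len(letter) != 1 or letter not in BIN_LETTERS:
            raise ValueError(f"invalid bin letter for {lab}: {letter!r}")
        tokens.append(lab + letter)
    return tokens
```

Calling `build_tokens({"Bic": "C", "Sod": "A"})` would produce the tokens `BicC` and `SodA`, in the column order listed above.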
|
|
|
### Training Data Preprocessing |
|
|
|
- **Column Selection:** Only the numeric values of the selected lab columns are used. |
|
- **Text Encoding:** Numeric values are encoded into text using the process described above. |
|
- **Masking:** 20% of the tokens are randomly masked during training. |
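
The masking step might look like the sketch below. It is a simplification under stated assumptions: the `[MASK]` string follows BERT's usual convention, every selected token is replaced by the mask (standard BERT pretraining also swaps in random or unchanged tokens for a fraction of selections), and the `mask_tokens` helper is hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT's conventional mask token; assumed here


def mask_tokens(tokens, rate=0.2, seed=None):
    """Randomly mask tokens at the given rate.

    Returns (masked, labels): `labels` holds the original token at masked
    positions and None elsewhere, so a loss can ignore unmasked positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

Because each token is masked independently, the fraction masked in a short sequence only approximates 20%.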
|
|
|
### Model Output |
|
|
|
- **Description:** For each masked position, the model predicts the original token. |
|
- **Format:** Predictions are encoded text tokens representing the predicted lab value bins. |
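
To interpret a predicted token, it can be split back into a lab identifier and a bin index. The sketch below assumes tokens follow the `<<lab_id_token>><<lab_value_bin>>` format with a single trailing bin letter; `decode_token` is a hypothetical helper, not part of the released model.

```python
LAB_COLUMNS = ["Bic", "Crt", "Pot", "Sod", "Ure", "Hgb", "Plt", "Wbc"]
BIN_LETTERS = "ABCDEFGHIJ"


def decode_token(token):
    """Split a token such as 'BicC' into (lab_id, zero-based bin index)."""
    if len(token) < 2:
        raise ValueError(f"token too short: {token!r}")
    lab_id, letter = token[:-1], token[-1]
    if lab_id not in LAB_COLUMNS:
        raise ValueError(f"unknown lab id: {lab_id!r}")
    if letter not in BIN_LETTERS:
        raise ValueError(f"unknown bin letter: {letter!r}")
    return lab_id, BIN_LETTERS.index(letter)
```

Note that the result is a bin, not an exact value: recovering a numeric range would require the bin edges used during encoding.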
|
|
|
### Limitations and Considerations |
|
|
|
- **Numeric Data Representation:** Binning each value into one of 10 letters discards within-bin precision, so the text representation may not capture fine-grained differences in the original numeric data. |
|
- **Training Data Source:** Performance may be influenced by the characteristics and biases inherent in the MIMIC-IV dataset. |
|
- **Generalizability:** The model's effectiveness outside the context of the training dataset is not guaranteed. |
|
|
|
### Contact Information |
|
|
|
- **Email:** davidres@mit.edu |
|
- **Name:** David Restrepo |
|
- **Affiliation:** MIT Critical Data - MIT |
|
|