dsrestrepo
/

BERT_Lab_Values_10B_lab_id_no_repetition


dsrestrepo committed on Jan 24

Commit

9c9f50c

•

1 Parent(s): 63935a7

Create README.md

Files changed (1)

README.md +55 -0

README.md ADDED

	@@ -0,0 +1,55 @@

+# Model Details
+#### Model Name: NumericBERT
+#### Model Type: Transformer
+#### Architecture: BERT
+#### Training Method: Masked Language Modeling (MLM)
+#### Training Data: MIMIC-IV lab values
+#### Training Hyperparameters:
+- **Optimizer:** AdamW
+- **Learning Rate:** 5e-5
+- **Masking Rate:** 20%
+- **Tokenization:** Custom numeric-to-text mapping using the TextEncoder class
+### Text Encoding Process
+**Overview:** Non-negative integers are converted into uppercase letter-based representations, allowing numerical values to be expressed as sequences of letters.
+**Normalization and Binning:**
+- **Method:** Values are log-normalized and then split into 10 bins.
+- **Representation:** Each bin is represented by a letter (A-J).
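
The binning step above can be sketched roughly as follows. This is a minimal illustration, not the TextEncoder implementation: the use of `log1p`, equal-width bins in log space, and the per-column bounds `lo` and `hi` are all assumptions, since the card does not specify them. The function name `log_bin` is hypothetical.

```python
import math
import string

# Ten bins, one uppercase letter each: "ABCDEFGHIJ".
BIN_LETTERS = string.ascii_uppercase[:10]


def log_bin(value, lo, hi):
    """Map a non-negative lab value to a bin letter (A-J).

    Assumption: `lo` and `hi` are the minimum and maximum observed for the
    column, and bins are equal-width in log1p space.
    """
    if value < 0:
        raise ValueError("expected a non-negative value")
    log_lo, log_hi = math.log1p(lo), math.log1p(hi)
    if log_hi <= log_lo:
        raise ValueError("hi must be greater than lo")
    # Scale to [0, 1] in log space, clamping values outside [lo, hi].
    scaled = (math.log1p(value) - log_lo) / (log_hi - log_lo)
    scaled = min(max(scaled, 0.0), 1.0)
    # Equal-width bins; the upper edge is folded into the last bin.
    index = min(int(scaled * len(BIN_LETTERS)), len(BIN_LETTERS) - 1)
    return BIN_LETTERS[index]
```

With these assumptions, the column minimum lands in bin `A` and the column maximum in bin `J`; values above `hi` are clamped into `J` rather than rejected.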
+### Token Construction
+- **Format:** `<<lab_id_token>><<lab_value_bin>>`
+- **Example:** For a lab value of type 'Bic' whose normalized value falls in bin 'C', the token is `BicC`.
+- **Columns Used:** 'Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc'.
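
A small sketch of how tokens in this format could be assembled from the column list and bin letters given above. The helper names `make_token` and `encode_row`, the space-separated sequence layout, and the fixed column order are assumptions for illustration; the card does not describe how tokens are joined into a sequence.

```python
# Lab columns and bin letters as listed in this model card.
LAB_IDS = ("Bic", "Crt", "Pot", "Sod", "Ure", "Hgb", "Plt", "Wbc")
BIN_LETTERS = "ABCDEFGHIJ"


def make_token(lab_id, bin_letter):
    """Concatenate a lab ID and a bin letter, e.g. ("Bic", "C") -> "BicC"."""
    if lab_id not in LAB_IDS:
        raise ValueError(f"unknown lab id: {lab_id!r}")
    if len(bin_letter) != 1 or bin_letter not in BIN_LETTERS:
        raise ValueError(f"unknown bin letter: {bin_letter!r}")
    return f"{lab_id}{bin_letter}"


def encode_row(binned):
    """Turn a {lab_id: bin_letter} mapping into a space-separated sequence.

    Assumption: tokens follow the LAB_IDS order and missing labs are skipped.
    """
    return " ".join(make_token(lab, binned[lab]) for lab in LAB_IDS if lab in binned)


# Every lab/bin combination: 8 labs x 10 bins = 80 candidate lab tokens.
VOCAB = [make_token(lab, letter) for lab in LAB_IDS for letter in BIN_LETTERS]
```

For example, `encode_row({"Sod": "E", "Bic": "C"})` gives `"BicC SodE"` under these assumptions. Special tokens such as `[MASK]` would be added to the vocabulary on top of the 80 lab tokens.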
+### Training Data Preprocessing
+- **Column Selection:** Numeric values are taken from the selected lab columns.
+- **Text Encoding:** Numeric values are encoded into text using the process described above.
+- **Masking:** 20% of tokens are randomly masked during training.
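
The masking step can be illustrated with the simplified sketch below. It is not the training code: it assumes each token is masked independently with probability 0.2 and always replaced by a `[MASK]` placeholder, whereas standard BERT masking also leaves some selected tokens unchanged or swaps them for random tokens. The function name `mask_tokens` is hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask placeholder, as in standard BERT vocabularies


def mask_tokens(tokens, mask_rate=0.2, rng=None):
    """Mask each token independently with probability `mask_rate`.

    Returns (masked_tokens, labels). A label is the original token at masked
    positions and None elsewhere, so only masked positions are scored.
    """
    if rng is None:
        rng = random.Random()
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_rate:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels
```

Passing a seeded generator, for example `mask_tokens(tokens, rng=random.Random(0))`, makes the masking reproducible across runs.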
+### Model Output
+- **Description:** The model predicts the tokens at masked positions during training.
+- **Format:** Predictions are encoded text tokens representing the predicted lab values.
+### Limitations and Considerations
+- **Numeric Data Representation:** Binning each value into one of 10 letters discards within-bin precision, so the text representation may not capture fine-grained differences in the original numeric data.
+- **Training Data Source:** Performance may be influenced by the characteristics and biases inherent in the MIMIC-IV dataset.
+- **Generalizability:** The model's effectiveness outside the context of the training dataset is not guaranteed.
+### Contact Information
+- **Email:** davidres@mit.edu
+- **Name:** David Restrepo
+- **Affiliation:** MIT Critical Data - MIT