aleynahukmet committed on
Commit
85d7f32
·
verified ·
1 Parent(s): ba982e0

Update README.md

  1. README.md +7 -2
README.md CHANGED
@@ -17,6 +17,11 @@ Women's health is a critical and often underserved topic, with limited accessibl
17
 
18
  ```python
19
 
20
  max_seq_length = 2048
21
  dtype = None
22
  load_in_4bit = False
@@ -64,12 +69,12 @@ if not use_streamer:
64
 
65
  ## Dataset
66
 
67
- The dataset was prepared through a comprehensive process to ensure quality and relevance. Health-related websites were scraped, open-source e-books and PDFs focusing on women's health were collected, and an instruction dataset was created from these sources. To generate high-quality questions, we utilized the gemini-flash model, ensuring the dataset’s alignment with the domain.
68
  The dataset will be made publicly available through its dedicated repository [altaidevorg/women-health-mini](https://huggingface.co/datasets/altaidevorg/women-health-mini). Additionally, the code used for dataset generation will be released in the future to promote transparency and enable reproducibility.
69
 
70
  ## Evaluation Notes
71
 
72
- During testing, we observed that the LoRA checkpoint performed better in evaluations compared to the version where the LoRA checkpoint was merged with the base model. Interestingly, the standalone LoRA checkpoint consistently delivered superior results, though we currently lack a concrete explanation for this phenomenon.
73
  We are actively investigating the underlying cause, with our best hypothesis being that the merging process may introduce some form of precision loss. Further research is underway to validate this theory and optimize the performance.
74
 
75
  ### Disclaimer
 
17
 
18
  ```python
19
 
20
+ from unsloth import FastLanguageModel
21
+ import torch
22
+ from transformers import TextStreamer, AutoTokenizer
23
+
24
+
25
  max_seq_length = 2048
26
  dtype = None
27
  load_in_4bit = False
 
69
 
70
  ## Dataset
71
 
72
+ The dataset was prepared through a comprehensive process based on our synthetic dataset generation method. Health-related websites were scraped, open-source e-books and PDFs focusing on women's health were collected, and an instruction dataset was created from these sources. To generate high-quality questions, we used a technique that combines role-playing between two LLMs with contextual augmentation via RAG.
73
  The dataset will be made publicly available through its dedicated repository [altaidevorg/women-health-mini](https://huggingface.co/datasets/altaidevorg/women-health-mini). Additionally, the code used for dataset generation will be released in the future to promote transparency and enable reproducibility.
74
 
75
  ## Evaluation Notes
76
 
77
+ During testing, the standalone LoRA checkpoint consistently outperformed the version in which the LoRA checkpoint was [merged with the base model](https://huggingface.co/altaidevorg/gemma-women-health-merged) in our evaluations. We currently lack a concrete explanation for this phenomenon.
78
  We are actively investigating the underlying cause, with our best hypothesis being that the merging process may introduce some form of precision loss. Further research is underway to validate this theory and optimize the performance.
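
The precision-loss hypothesis above can be illustrated with a toy calculation. This is only a sketch of one possible mechanism, not a confirmed explanation of the observed gap: the numbers `base_weight`, `lora_delta`, and `x` are hypothetical, and half precision (fp16) is assumed purely for illustration. It shows that a small adapter update can be rounded away when it is folded into a low-precision base weight, while it survives if the adapter path is kept separate.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


# Hypothetical values, chosen only to make the effect visible.
base_weight = 1.0   # one entry of a base-model weight matrix
lora_delta = 1e-4   # the matching entry of the low-rank update
x = 3.0             # one input activation

# Unmerged: the base path and the adapter path are stored separately,
# so the small delta keeps its own fp16 representation.
delta_fp16 = to_fp16(lora_delta)
unmerged = to_fp16(base_weight) * x + delta_fp16 * x

# Merged: the delta is added to the base weight and the sum is stored in fp16.
# The spacing between fp16 values near 1.0 is 2**-10 (about 0.00098),
# so an update of 1e-4 is rounded away.
merged_weight = to_fp16(base_weight + lora_delta)
merged = merged_weight * x

print(f"unmerged output: {unmerged:.6f}")
print(f"merged output:   {merged:.6f}")
print(f"update lost in merge: {merged_weight == base_weight}")
```

Whether this effect matters in practice depends on the dtype used for merging and on the magnitude of the adapter update relative to the base weights, neither of which is stated here.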
79
 
80
  ### Disclaimer