Commit 3e946ec by noahsantacruz (1 parent: 8a44bf8)

Update README.md

README.md CHANGED
@@ -41,6 +41,11 @@ It detects the following types of entities:
 | Group | Name of a group of people. E.g. nations (Egypt), schools (Bet Hillel, Tosafot) |
 | Citation | Citations to Torah texts. See notes below. |

+## Notes on normalization
+
+All text the model was trained on was first passed through the following normalizer: [link](https://github.com/Sefaria/Machine-Learning/blob/main/util/helper.py#L43).
+Results will be significantly worse if input text is not passed through the same normalizer.
+
 ## Notes on citation matches

 - The closing parenthesis is not included in the match. E.g. if the citation is `Genesis (1:1)`, the match will not include the final `)`. We found that the model got confused when the closing parenthesis was part of the entity. It is fairly simple to add it back via a deterministic check.
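
The deterministic check mentioned in the citation note could look roughly like the sketch below. It is not taken from the repository: the function name `extend_to_closing_paren` is hypothetical, and it assumes the model reports each match as character offsets (`start` inclusive, `end` exclusive) into the input string. The span is extended by one character only when it contains an unmatched `(` and the next character is `)`.

```python
def extend_to_closing_paren(text: str, start: int, end: int) -> int:
    """Return an end offset that includes a trailing ')' if the span needs one.

    Hypothetical helper: assumes [start, end) are character offsets into `text`.
    """
    span = text[start:end]
    # The span is "open" if it has more '(' than ')'.
    has_unmatched_open = span.count("(") > span.count(")")
    # Only extend when the very next character closes the parenthesis.
    if has_unmatched_open and end < len(text) and text[end] == ")":
        return end + 1
    return end


text = "As it says in Genesis (1:1), in the beginning."
start = text.index("Genesis")
end = text.index(")")  # simulated model output: match stops before the final ')'
new_end = extend_to_closing_paren(text, start, end)
print(text[start:new_end])  # prints "Genesis (1:1)"
```

Spans that have no opening parenthesis, or that are not immediately followed by `)`, are returned unchanged.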