Icelandic-lt/biaffine_parser

danielschnell committed on May 27, 2024

Commit e81ff46 (1 parent: 695687f)

Update submodule Tokenizer, adapt README.md

README.md:

- add "Prerequisites" section
- remove Icelandic text to minimize redundancies
- update links
- use correct form of cmdline text formatting

Submodule Tokenizer:

- use correct version of the Tokenizer submodule

Signed-off-by: Daniel Schnell <dschnell@grammatek.com>

Files changed (3)

.gitattributes +1 -0
README.md +13 -26
Tokenizer +1 -1

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+diaparser-is-combined-v211/diaparser.model filter=lfs diff=lfs merge=lfs -text
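
The added `.gitattributes` entry follows the usual `pattern attr...` layout, where `name=value` sets an attribute and `-name` unsets it. As a rough illustration only, the sketch below splits such a line into its pattern and attributes. The helper names `parse_gitattributes_line` and `is_lfs_tracked` are hypothetical and not part of this repository, and quoted patterns containing spaces are deliberately not handled.

```python
def parse_gitattributes_line(line):
    """Split one .gitattributes line into (pattern, attributes).

    Simplified sketch: assumes the pattern contains no spaces.
    Returns None for blank lines and comments.
    """
    parts = line.split()
    if not parts or parts[0].startswith("#"):
        return None
    pattern, attrs = parts[0], {}
    for token in parts[1:]:
        if token.startswith("-"):
            attrs[token[1:]] = False      # attribute explicitly unset
        elif token.startswith("!"):
            attrs[token[1:]] = None       # attribute left unspecified
        elif "=" in token:
            name, value = token.split("=", 1)
            attrs[name] = value           # attribute set to a value
        else:
            attrs[token] = True           # attribute set
    return pattern, attrs


def is_lfs_tracked(line):
    """True if the line routes matching files through the LFS filter."""
    parsed = parse_gitattributes_line(line)
    return parsed is not None and parsed[1].get("filter") == "lfs"


if __name__ == "__main__":
    entry = ("diaparser-is-combined-v211/diaparser.model "
             "filter=lfs diff=lfs merge=lfs -text")
    print(parse_gitattributes_line(entry))
    print(is_lfs_tracked(entry))
```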

README.md CHANGED

@@ -1,27 +1,10 @@
-## UD-þáttari sem nýtir sér upplýsingar úr Transformer-mállíkani
-Mælt er með því að þáttarinn sé keyrður með Python3.8 í sýndarumhverfi.
-Hægt er að nota conda fyrir sýndarumhverfi: https://conda.io/projects/conda/en/latest/user-guide/getting-started.html
-Einnig er hægt að nota venv fyrir sýndarumhverfi: https://docs.python.org/3/library/venv.html
-Til þess að keyra þáttarann þarf að setja upp nauðsynlega pakka, eftir að sýndarumhverfi hefur verið virkjað: python3 -m pip install -r requirements.txt
-Tokenizer-mappan er klónuð gagnahirsla [tókarans frá Miðeind](https://github.com/mideind/Tokenizer).
-Hægt er að keyra þáttarann svona: ~~~python3 parse_file.py --parser diaparser-is-combined-v211/diaparser.model --infile test_file.txt~~~
-~~~transformer_models/~~~ inniheldur forþjálfað transformer-líkan, electra-base-igc-is, sem tókarinn sækir samhengisháðar orðgreypingar og athygli í. Það var þjálfað af Jóni Friðriki Daðasyni.
-Skor:
-Metric     | Precision |    Recall |  F1 Score | AligndAcc
------------+-----------+-----------+-----------+-----------
-Tokens     |     99.70 |     99.77 |     99.73 |
-Sentences  |    100.00 |    100.00 |    100.00 |
-Words      |     99.62 |     99.61 |     99.61 |
-UAS        |     89.58 |     89.57 |     89.58 |     89.92
-LAS        |     86.46 |     86.45 |     86.46 |     86.79
-CLAS       |     82.30 |     81.81 |     82.05 |     82.24
+## Prerequisites
+After cloning the repository, first fetch submodule dependencies and run:
+```bash
+git submodule update --init --recursive
+```
 ## A Universal Dependency parser built on top of a Transformer language model
@@ -32,11 +15,15 @@ You can also use venv for a virtual environment: https://docs.python.org/3/libra
 To run this package, after having activated your virtual environment, you need to install the requirements: python3 -m pip install -r requirements.txt.
-The Tokenizer directory is a clone of [Miðeind's tokenizer](https://github.com/mideind/Tokenizer). It is included because one of Diaparser's modules is named tokenizer.
-The parser can be run as follows: ~~~python3 parse_file.py --parser diaparser-is-combined-v211/diaparser.model --infile test_file.txt~~~
-~~~transformer_models/~~~ contains a pretrained model, electra-base-igc-is, which supplies the parser with contextual embeddings and attention, trained by Jón Friðrik Daðason.
+The Tokenizer submodule is using [Miðeind's tokenizer](https://github.com/icelandic-lt/Tokenizer). It is included because one of Diaparser's modules is named tokenizer.
+The parser can be run as follows:
+```python
+python3 parse_file.py --parser diaparser-is-combined-v211/diaparser.model --infile test_file.txt
+```
+The directory `transformer_models/` contains a pretrained model, [electra-base-igc-is](https://huggingface.co/Icelandic-lt/electra-base-igc-is), which supplies the parser with contextual embeddings and attention, trained by Jón Friðrik Daðason.
 The parser scores as follows:
@@ -49,5 +36,5 @@ UAS        |     89.58 |     89.57 |     89.58 |     89.92
 LAS        |     86.46 |     86.45 |     86.46 |     86.79
 CLAS       |     82.30 |     81.81 |     82.05 |     82.24
-### License
+## License
 https://opensource.org/licenses/Apache-2.0
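
The updated README describes three steps in order: fetch the submodules, install the requirements, and run `parse_file.py`. The sketch below collects those commands as argument lists so they can be inspected or passed to `subprocess.run`. The function names `build_commands` and `run_all` are hypothetical helpers for illustration. Only the command lines themselves come from the README, and the sketch assumes it is run from the root of a cloned checkout with a virtual environment already active.

```python
import subprocess

# Model path as given in the README.
DEFAULT_MODEL = "diaparser-is-combined-v211/diaparser.model"


def build_commands(infile, model=DEFAULT_MODEL, python="python3"):
    """Return the README's setup and run commands, in order, as argv lists."""
    return [
        # 1. Fetch the Tokenizer submodule (Prerequisites section).
        ["git", "submodule", "update", "--init", "--recursive"],
        # 2. Install the Python requirements.
        [python, "-m", "pip", "install", "-r", "requirements.txt"],
        # 3. Parse the input file with the given model.
        [python, "parse_file.py", "--parser", model, "--infile", infile],
    ]


def run_all(commands, cwd="."):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Printing only; call run_all(...) inside a real checkout to execute.
    for argv in build_commands("test_file.txt"):
        print(" ".join(argv))
```

Building the commands as lists rather than a single shell string avoids quoting problems if the input path contains spaces.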

Tokenizer CHANGED

@@ -1 +1 @@
-Subproject commit be8ee4de465ecf0dbf008d986b99df43210f27bf
+Subproject commit 5ae4551ad3a3a99ad657bd0528dd4147f4f5f95f
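
After this change the superproject pins the Tokenizer submodule to commit `5ae4551ad3a3a99ad657bd0528dd4147f4f5f95f`. One way to confirm a local checkout matches is to inspect a line of `git submodule status` output, where a leading space means the checked-out commit equals the recorded one, `-` means uninitialized, `+` means a different commit is checked out, and `U` means merge conflicts. The sketch below parses one such line. The helper names are hypothetical, the sample lines in the usage section are made up for illustration, and submodule paths containing spaces are not handled.

```python
import re

# Commit the Tokenizer submodule points to after this change.
EXPECTED_SHA = "5ae4551ad3a3a99ad657bd0528dd4147f4f5f95f"

# One line of `git submodule status`: flag, 40-hex SHA, path, optional describe.
STATUS_LINE = re.compile(
    r"^(?P<flag>[ +\-U])(?P<sha>[0-9a-f]{40}) (?P<path>\S+)"
    r"(?: \((?P<describe>.*)\))?$"
)


def parse_status_line(line):
    """Parse one status line into a dict; raise ValueError if malformed."""
    match = STATUS_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized submodule status line: {line!r}")
    return match.groupdict()


def submodule_matches(line, expected_sha=EXPECTED_SHA):
    """True if the submodule is initialized, in sync, and at expected_sha."""
    info = parse_status_line(line)
    return info["flag"] == " " and info["sha"] == expected_sha


if __name__ == "__main__":
    # Illustrative sample lines, not captured from a real checkout.
    in_sync = " " + EXPECTED_SHA + " Tokenizer (heads/main)"
    uninitialized = "-" + EXPECTED_SHA + " Tokenizer"
    print(submodule_matches(in_sync))
    print(submodule_matches(uninitialized))
```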