	## Prerequisites

	After cloning the repository, fetch the submodule dependencies:

	```bash
	git submodule update --init --recursive
	```

	## A Universal Dependencies parser built on top of a Transformer language model

	Python 3.8 is recommended, as is a virtual environment.

	You can use conda for a virtual environment: https://conda.io/projects/conda/en/latest/user-guide/getting-started.html
	You can also use venv for a virtual environment: https://docs.python.org/3/library/venv.html
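A quick way to confirm that the interpreter and environment match the recommendations above is a small check script. This is a minimal sketch, not part of the repository: the function names are illustrative, treating 3.8 as a minimum version is an assumption (the text above only recommends it), and the environment detection is best-effort.

```python
import os
import sys

RECOMMENDED = (3, 8)  # Python version recommended in this README


def version_ok(version_info=sys.version_info, minimum=RECOMMENDED):
    """True if the interpreter is at least the recommended version.

    Assumption: 3.8 is treated as a lower bound; newer versions are not
    guaranteed to work with the pinned requirements.
    """
    return tuple(version_info[:2]) >= minimum


def in_virtual_environment():
    """Best-effort check for an active venv or conda environment."""
    in_venv = sys.prefix != sys.base_prefix   # differs inside a venv
    in_conda = "CONDA_PREFIX" in os.environ   # set by `conda activate`
    return in_venv or in_conda


print("Python version OK:", version_ok())
print("Virtual environment active:", in_virtual_environment())
```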

	To run this package, activate your virtual environment and install the requirements with `python3 -m pip install -r requirements.txt`.

	The Tokenizer submodule provides [Miðeind's tokenizer](https://github.com/icelandic-lt/Tokenizer). It is included as a submodule because one of DiaParser's modules is also named `tokenizer`.

	The parser can be run as follows:

	```bash
	python3 parse_file.py --parser diaparser-is-combined-v211/diaparser.model --infile test_file.txt
	```
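To parse several files in one go, the command above can be wrapped in a short script. The following is a minimal sketch rather than part of the repository: the helper names `build_parse_command` and `parse_files` are hypothetical, and it assumes only the two flags shown above (`--parser` and `--infile`) and that `parse_file.py` is in the current working directory.

```python
import subprocess
import sys
from pathlib import Path

# Model path taken from the command shown in this README.
DEFAULT_MODEL = "diaparser-is-combined-v211/diaparser.model"


def build_parse_command(infile, model=DEFAULT_MODEL):
    """Return the argument list for one parse_file.py invocation.

    Hypothetical helper: it uses only the --parser and --infile flags
    shown in this README. sys.executable is used instead of "python3"
    so the active virtual environment's interpreter runs the script.
    """
    return [
        sys.executable, "parse_file.py",
        "--parser", str(model),
        "--infile", str(infile),
    ]


def parse_files(infiles, model=DEFAULT_MODEL):
    """Run the parser on each input file, stopping at the first failure."""
    for infile in infiles:
        # check=True raises CalledProcessError if the parser exits non-zero.
        subprocess.run(build_parse_command(infile, model), check=True)


# Show the command that would be run for the example input file.
print(" ".join(build_parse_command(Path("test_file.txt"))))
```

Calling `parse_files(["a.txt", "b.txt"])` from the repository root would then run the parser once per file.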

	The directory `transformer_models/` contains a pretrained model, [electra-base-igc-is](https://huggingface.co/Icelandic-lt/electra-base-igc-is), trained by Jón Friðrik Daðason. It supplies the parser with contextual embeddings and attention.

	The parser scores as follows:

	```
	Metric     | Precision |    Recall |  F1 Score | AligndAcc
	-----------+-----------+-----------+-----------+-----------
	Tokens     |     99.70 |     99.77 |     99.73 |
	Sentences  |    100.00 |    100.00 |    100.00 |
	Words      |     99.62 |     99.61 |     99.61 |
	UAS        |     89.58 |     89.57 |     89.58 |     89.92
	LAS        |     86.46 |     86.45 |     86.46 |     86.79
	CLAS       |     82.30 |     81.81 |     82.05 |     82.24
	```
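Assuming the F1 column follows the standard definition (the harmonic mean of precision and recall), it can be roughly recomputed from the figures above. The sketch below is illustrative only and the function name `f1_score` is not from the repository. Because the table values are already rounded to two decimals, the recomputed score may differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "UAS": (89.58, 89.57, 89.58),
    "LAS": (86.46, 86.45, 86.46),
    "CLAS": (82.30, 81.81, 82.05),
}

for metric, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    # Inputs are rounded, so allow a difference of 0.01.
    assert abs(recomputed - f1) <= 0.01, metric
    print(f"{metric}: recomputed {recomputed:.2f}, reported {f1:.2f}")
```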

	## License
	This project is licensed under the Apache License 2.0: https://opensource.org/licenses/Apache-2.0