---
language:
- en
tags:
- pytorch
- causal-lm
license: bigscience-openrail-m
---

[GeoV](https://huggingface.co/docs/transformers/model_doc/geov)-9B is a 9-billion-parameter autoregressive language model.

The GeoV model was designed by Georges Harik and uses
[Rotary Positional Embeddings with Relative distances (RoPER)](http://research.labml.ai/RoPER.html)
by [Georges Harik](https://twitter.com/ghark) and [Varuna Jayasiri](https://twitter.com/vpj).

[RoPER](http://research.labml.ai/RoPER.html),
in addition to using relative positions in the attention-score calculation via RoPE embeddings,
adds relative positional information explicitly to the value embeddings.
Specifically, it incorporates the relative positions of the tokens that are attended to.
RoPER has shown better performance on some algorithmic tasks, and seems comparable to RoPE in language modeling.
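
As a rough sketch of the mechanism (notation ours, not taken from the write-up): let $R_p$ denote the RoPE rotation for position $p$, $a_{mn}$ the attention weight from query position $m$ to key position $n$, and $q$, $k$, $v$ the query, key, and value vectors. RoPE rotates queries and keys so the score depends only on the relative distance $n - m$; RoPER additionally rotates the values and counter-rotates the summed output:

```latex
% RoPE: rotating queries and keys makes the score relative,
% since R_m^T R_n = R_{n-m} for rotation matrices
a_{mn} \propto \langle R_m q_m, R_n k_n \rangle = \langle q_m, R_{n-m} k_n \rangle

% RoPER: rotate each value by its position and counter-rotate the output,
% so every value enters the sum with its relative distance n - m
o_m = R_{-m} \sum_n a_{mn} R_n v_n = \sum_n a_{mn} R_{n-m} v_n
```
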
## Model details

- Developed by: [Georges Harik](http://twitter.com/gharik)
- Model type: Transformer-based Language Model
- Language: English

<figure style="width:30em">

| Hyperparameter         | Value |
| ---------------------- | ----- |
| n<sub>parameters</sub> | 9B    |
| n<sub>layers</sub>     | 32    |
| d<sub>model</sub>      | 5120  |
| n<sub>heads</sub>      | 40    |
| d<sub>head</sub>       | 128   |
| n<sub>vocab</sub>      | 65500 |
| Sequence Length        | 2049  |

</figure>

## Generation

The `generate()` method can be used to generate text with the GeoV model.

```python
>>> from transformers import GeoVForCausalLM, GeoVTokenizer

>>> # Download the pretrained weights and tokenizer from the Hugging Face Hub
>>> model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b")
>>> tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b")

>>> prompt = "In mathematics, topology is the study of"

>>> # Encode the prompt into a batch of token IDs as PyTorch tensors
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

>>> # Sample a continuation of up to 100 tokens; temperature < 1.0
>>> # slightly sharpens the distribution compared to plain sampling
>>> gen_tokens = model.generate(
...     input_ids,
...     do_sample=True,
...     temperature=0.9,
...     max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
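
Since the checkpoint has 9 billion parameters, loading it in half precision roughly halves the memory needed for the weights. A minimal sketch using only stock Transformers/PyTorch arguments (`torch_dtype` and `.to("cuda")`), assuming a CUDA GPU with enough free memory is available:

```python
>>> import torch

>>> # float16 weights take roughly 2 bytes per parameter (~18 GB for 9B)
>>> model = GeoVForCausalLM.from_pretrained(
...     "GeoV/GeoV-9b", torch_dtype=torch.float16
... ).to("cuda")

>>> # Inputs must be on the same device as the model
>>> gen_tokens = model.generate(
...     input_ids.to("cuda"),
...     do_sample=True,
...     temperature=0.9,
...     max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
```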