Context Length Config
In the `README.md` it states that the model has a context length of 128k, yet `config.json` sets `"max_position_embeddings": 8192`. How come the maximum position embeddings aren't configured to ~128k?
This implementation is based on the Llama implementation, which materializes this huge buffer; that would not be feasible for a 128k context. The model itself does support 128k context with a better implementation:
```python
causal_mask = torch.full(
    (config.max_position_embeddings, config.max_position_embeddings),
    fill_value=True,
    dtype=torch.bool,
)
```
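For a sense of scale, here is a rough back-of-the-envelope sketch (my own numbers, not from the implementation) of what that mask buffer alone costs, assuming one byte per `torch.bool` element:

```python
# Rough sketch: memory taken by the square boolean causal-mask buffer
# for different values of max_position_embeddings (1 byte per bool).
def mask_buffer_bytes(seq_len: int) -> int:
    return seq_len * seq_len  # torch.bool stores one byte per element

for seq_len in (8_192, 131_072):
    gib = mask_buffer_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.2f} GiB for the causal mask alone")

# Output:
#    8192 tokens -> 0.06 GiB for the causal mask alone
#  131072 tokens -> 16.00 GiB for the causal mask alone
```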
Keeping the default context length low enough that end users don't hit OOM right out of the box is generally accepted as an unwritten rule in the HF community.
Command-R defaults to 128k.
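If you do want the full window without editing `config.json`, one option is to override the value at load time. This is only a sketch, not an official recommendation: the model id is a placeholder assumption, and it presumes a `transformers` version that no longer allocates the full (max_position_embeddings x max_position_embeddings) mask buffer at init.

```python
# Hypothetical sketch: raise the positional-embedding limit at load time.
# The model id below is an assumption; substitute the repo this thread is about.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-v01"  # assumed repo name

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 131072  # ~128k tokens

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # PyTorch scaled-dot-product attention kernel
)
```

Note that actually running inference at 128k still requires enough memory for the KV cache at that length, regardless of how the mask is handled.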