
Mixtral doesn't use sliding window attention. We explicitly set it to null, since the default in transformers is 4k.
I think the regression came from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/commit/858fdc292793fc3e671bf51fc5586c5cc10fbe3a . @ybelkada did you use a specific script to update the PR? If so, I think you need to update that script so that it writes "null" by default.
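For reference, a minimal sketch of what the explicit `null` changes on the transformers side (a hypothetical check, assuming the 4k `MixtralConfig` default mentioned above, not the actual fix):

```python
from transformers import AutoConfig, MixtralConfig

# With no value supplied, the library default applies (4096 at the time of this
# thread), which would wrongly enable sliding window attention for Mixtral.
default_cfg = MixtralConfig()
print(default_cfg.sliding_window)

# With "sliding_window": null in config.json, the value loads as None and the
# model uses full attention, which is what Mixtral actually does.
cfg = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
print(cfg.sliding_window)  # None once the config carries the explicit null
```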

The configuration file (and the conversion script, if needed) will be adjusted accordingly. Thanks for reporting!

@TimeRobber I already updated the conversion script here, based on your suggestion: https://github.com/huggingface/transformers/pull/28068

