Fixes the TypeError in the GPTNeoXLayer init, which now also takes the layer id

#6

Due to recent changes to support k/v caching, the attention class must now be initialized with a layer index.
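A minimal sketch of the failure and the fix, for illustration only. `SketchAttention`, `SketchLayer`, `build_layers`, and `build_layers_old_style` are hypothetical stand-ins, not the actual library classes or the code in this diff; the assumption is simply that the constructor gained a required `layer_idx` argument that callers must now pass.

```python
# Hypothetical stand-ins for the layer and attention classes; the real
# signatures may differ. Assumption: the constructor now requires layer_idx.


class SketchAttention:
    def __init__(self, config, layer_idx):
        self.config = config
        # With k/v caching, the index identifies which cache slot
        # belongs to this layer.
        self.layer_idx = layer_idx


class SketchLayer:
    def __init__(self, config, layer_idx):
        # Forward the index so the attention module knows its position.
        self.attention = SketchAttention(config, layer_idx)


def build_layers(config, num_layers):
    # Fixed call site: pass each layer its index.
    return [SketchLayer(config, layer_idx=i) for i in range(num_layers)]


def build_layers_old_style(config, num_layers):
    # Old call site: omits the index, so Python raises a TypeError
    # for the missing required argument.
    return [SketchLayer(config) for _ in range(num_layers)]
```

Calling `build_layers(config, n)` gives each layer a distinct index from 0 to n-1, while the old-style call fails at construction time with a `TypeError`.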

