Factorized the weight matrix in the GlobalAttentionPoolingHead, thus reducing the number of parameters in this layer by a factor of 48 a1e9f64 PeteBleackley committed on Mar 11, 2024
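A minimal sketch of the kind of low-rank factorization this commit describes; the class body, factor shapes, and pooling logic here are illustrative assumptions, not the repo's actual GlobalAttentionPoolingHead. Replacing a full hidden × hidden projection with two rank-r factors cuts this layer from hidden² to 2·hidden·r parameters; for hidden=768 and r=8 that is exactly the quoted 48× reduction (768² / (2·768·8) = 48).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedGlobalAttentionPoolingHead(nn.Module):
    """Hypothetical sketch: the head's (hidden x hidden) weight matrix W
    is replaced by a rank-r product A @ B, shrinking the parameter count
    from hidden**2 to 2 * hidden * r (48x smaller for hidden=768, r=8)."""

    def __init__(self, hidden_size: int = 768, rank: int = 8):
        super().__init__()
        # Two thin factors instead of one hidden_size x hidden_size matrix
        self.factor_a = nn.Parameter(torch.empty(hidden_size, rank))
        self.factor_b = nn.Parameter(torch.empty(rank, hidden_size))
        nn.init.xavier_uniform_(self.factor_a)
        nn.init.xavier_uniform_(self.factor_b)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq, hidden); attention_mask: (batch, seq)
        # Global summary vector: mean of the unmasked token states
        mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
        summary = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        # Project the summary through the low-rank factors A @ B
        query = summary @ self.factor_a @ self.factor_b  # (batch, hidden)
        # Attention over tokens, with padding positions masked out
        scores = torch.einsum("bsh,bh->bs", hidden_states, query)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1).unsqueeze(-1)
        return (weights * hidden_states).sum(dim=1)  # pooled (batch, hidden)
```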
Might be simpler to inherit from RobertaModel rather than PreTrainedModel f0ad7f1 PeteBleackley committed on Oct 9, 2023
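A hedged sketch of the design choice this commit message weighs: subclassing RobertaModel brings the encoder stack, config handling, and from_pretrained weight loading along automatically, instead of wiring an encoder submodule into a bare PreTrainedModel by hand. The class name here is hypothetical, not taken from the repo.

```python
from transformers import RobertaModel

class QARACEncoder(RobertaModel):  # hypothetical class name
    """Inherits the full RoBERTa encoder; from_pretrained, config
    handling and weight loading all come from RobertaModel itself."""

    def __init__(self, config):
        super().__init__(config)
        # a task-specific head (e.g. a pooling head) would be added here

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        outputs = super().forward(input_ids=input_ids,
                                  attention_mask=attention_mask, **kwargs)
        return outputs.last_hidden_state  # a head would consume this
```

With this inheritance, `QARACEncoder.from_pretrained("roberta-base")` loads the pretrained encoder weights with no extra plumbing.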
Removed a base model that was causing a loop in model initialisation 87535ff PeteBleackley committed on Oct 9, 2023
Further changes for compatibility with the HuggingFace PyTorch implementation 5b7a8ed PeteBleackley committed on Oct 9, 2023
PyTorch implementation of the HuggingFace PreTrainedModel class does not allow direct setting of base_model. Rejig constructors accordingly 519dfd1 PeteBleackley committed on Oct 9, 2023
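The constraint behind this commit: in the PyTorch transformers implementation, PreTrainedModel.base_model is a read-only property that resolves getattr(self, self.base_model_prefix, self), so a constructor must assign the encoder under the attribute named by base_model_prefix rather than assigning to base_model itself. A sketch under that assumption; the class name is hypothetical.

```python
from transformers import PreTrainedModel, RobertaConfig, RobertaModel

class QARACHeadModel(PreTrainedModel):  # hypothetical class name
    config_class = RobertaConfig
    base_model_prefix = "roberta"  # base_model resolves via this attribute

    def __init__(self, config):
        super().__init__(config)
        # self.base_model = RobertaModel(config)  # won't work: base_model
        # is a read-only property on PreTrainedModel, not a settable field
        self.roberta = RobertaModel(config)  # assign under the prefix instead

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        return self.roberta(input_ids=input_ids,
                            attention_mask=attention_mask, **kwargs)
```

Note that if no attribute matching base_model_prefix exists, the property falls back to returning self, which is one plausible way the initialisation loop removed in 87535ff could arise.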