This model is a weight-pruned large language model derived from Vicuna-13B. Language model pruning reduces a model's size and computational cost, making it more efficient to deploy without significantly sacrificing performance or accuracy.
This model uses structured pruning rather than unstructured pruning. Structured pruning removes entire units or channels (e.g., neurons, layers, or filter channels in the transformer). This approach tends to yield real computational gains because the resulting smaller, dense matrices align with how hardware processes data, though it can have a larger impact on model quality. Unstructured pruning, by contrast, removes individual weights anywhere in the model without regard to the network's structure. While it can shrink the model substantially, it may not translate into speed gains, since the resulting sparse matrices are not handled efficiently by all hardware.
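The distinction can be sketched with a small NumPy example (the 8×8 weight matrix and 50% pruning ratio below are purely illustrative, not this model's actual settings):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # hypothetical weight matrix of a linear layer

# Unstructured pruning: zero out the 50% of individual weights with the
# smallest magnitude, ignoring the network's structure. The matrix keeps
# its shape but becomes sparse.
threshold = np.quantile(np.abs(W), 0.5)
W_unstructured = np.where(np.abs(W) < threshold, 0.0, W)

# Structured pruning: drop the entire output rows (neurons) with the
# smallest L2 norm. The matrix actually shrinks, so ordinary dense
# hardware kernels benefit directly.
row_norms = np.linalg.norm(W, axis=1)
keep = np.sort(np.argsort(row_norms)[4:])  # keep the 4 strongest rows
W_structured = W[keep]

print(W_unstructured.shape)  # (8, 8): same shape, half the weights are zero
print(W_structured.shape)    # (4, 8): a smaller dense matrix
```

The unstructured result only pays off on hardware or kernels that exploit sparsity, whereas the structured result is a genuinely smaller dense layer.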