---
license: unknown
language:
- en
- zh
---
|
|
|
This model is a weight-pruned large language model derived from Vicuna-13B.
|
Language model pruning is a technique used to reduce the size and computational requirements of language models,

making them more efficient to deploy without significantly sacrificing performance or accuracy.
|
|
|
This model uses structured pruning instead of unstructured pruning. |
|
Structured pruning removes entire units or channels (e.g., neurons, layers, or filter channels in a transformer).

This approach can yield greater computational gains in practice since it aligns better with how hardware processes data,

but it may have a more significant impact on model performance.
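The row-removal idea behind structured pruning can be sketched in plain NumPy. This is an illustrative toy, not this model's actual pruning procedure: the matrix shape and the keep-half ratio are arbitrary choices for the example.

```python
import numpy as np

# Toy weight matrix: 8 output neurons, 16 inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))

# Structured pruning: rank whole rows (neurons) by L2 norm
# and keep only the 4 strongest rows, dropping the rest entirely.
row_norms = np.linalg.norm(W, axis=1)
keep = np.sort(np.argsort(row_norms)[-4:])
W_pruned = W[keep]

print(W.shape, "->", W_pruned.shape)  # (8, 16) -> (4, 16)
```

Because whole rows are removed, the result is a smaller *dense* matrix, which ordinary dense matrix-multiply kernels can exploit directly.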
|
Unstructured pruning, by contrast, removes individual weights across the model without regard to the structure of the network.

While it can lead to significant reductions in model size,

it may not always translate into speed gains, since the resulting sparse matrices might not be handled efficiently by all hardware.
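For contrast, a minimal sketch of magnitude-based unstructured pruning in NumPy (again a hypothetical toy, not this model's method): individual small-magnitude weights are zeroed, so the matrix shape is unchanged and the result is merely sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))

# Unstructured pruning: zero out roughly the 50% of individual
# entries with the smallest absolute value, wherever they sit.
threshold = np.median(np.abs(W))
W_sparse = np.where(np.abs(W) >= threshold, W, 0.0)

sparsity = (W_sparse == 0).mean()
print(W_sparse.shape, f"sparsity ~ {sparsity:.0%}")  # shape stays (8, 16)
```

Note that `W_sparse` is still stored as a full dense array here; realizing a speedup would require sparse kernels or hardware support, which is exactly the caveat described above.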