daedalus314 committed
Commit 3a2c5e7
Parent(s): 262f2c3
Update README.md

README.md CHANGED
@@ -16,7 +16,7 @@ This model is a quantized version of [Marx-3B-V2](https://huggingface.co/acrastt
 # Usage
 
 The model has been quantized as part of the project [GPTStonks](https://github.com/GPTStonks). It works with `transformers>=4.33.0` and it can run on a consumer GPU, with less than 3GB of GPU RAM. The libraries `optimum`, `auto-gptq`, `peft` and `accelerate` should also be installed.
 
-Here is a sample code to load the model and run inference with it using
+Here is a sample code to load the model and run inference with it using greedy decoding:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
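The changed line names greedy decoding, i.e. selecting the highest-scoring token at every generation step with no sampling (what `do_sample=False` requests in `transformers`' `model.generate`). As a minimal sketch of that selection rule alone, with hypothetical logit values (the function name and numbers are illustrative, not part of the model's API):

```python
def greedy_next_token(logits):
    """Greedy decoding step: return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits over a 4-token vocabulary
logits = [0.1, 2.5, -1.0, 0.3]
print(greedy_next_token(logits))  # index 1 holds the largest logit
```

In practice the quantized model applies this rule internally at each step of `generate` when sampling is disabled; the sketch only isolates the per-step choice.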