This is the PairRanker used in LLM-Blender, built on a deberta-v3-large backbone. It is the ranker model used in the experiments of the LLM-Blender paper and was trained on the MixInstruct dataset for 5 epochs.
Statistics
Context length
PairRanker type | Source max length | Candidate max length | Total max length |
---|---|---|---|
pair-ranker (This model) | 128 | 128 | 384 |
pair-reward-model | 1224 | 412 | 2048 |
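The totals in the table are consistent with an input made of one source followed by two candidates, since 128 + 2 × 128 = 384 and 1224 + 2 × 412 = 2048. The following is a minimal sketch of that arithmetic, assuming this is how the total length is budgeted; the helper name `total_max_length` is hypothetical and not part of llm-blender.

```python
def total_max_length(source_max_length: int, candidate_max_length: int) -> int:
    """Token budget for one source plus a pair of candidates (assumed layout)."""
    return source_max_length + 2 * candidate_max_length


# (source max length, candidate max length) taken from the table above
limits = {
    "pair-ranker": (128, 128),
    "pair-reward-model": (1224, 412),
}

totals = {name: total_max_length(src, cand) for name, (src, cand) in limits.items()}
print(totals)
```

Inputs longer than these limits would presumably be truncated, so long sources or candidates may be compared on only their leading tokens.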
MixInstruct Performance
Methods | BERTScore | BARTScore | BLEURT | GPT-Rank | Beat Vic(%) | Beat OA(%) | Top-1(%) | Top-2(%) | Top-3(%) |
---|---|---|---|---|---|---|---|---|---|
Open Assistant | 74.68 | -3.45 | -0.39 | 3.90 | 62.78 | N/A | 17.35 | 35.67 | 51.98 |
Vicuna | 69.60 | -3.44 | -0.61 | 4.13 | N/A | 64.77 | 25.47 | 41.23 | 52.88 |
Alpaca | 71.46 | -3.57 | -0.53 | 4.62 | 56.70 | 61.35 | 15.41 | 29.81 | 44.46 |
Baize | 65.57 | -3.53 | -0.66 | 4.86 | 52.76 | 56.40 | 14.23 | 26.91 | 38.80 |
moss | 64.85 | -3.65 | -0.73 | 5.09 | 51.62 | 51.79 | 15.93 | 27.52 | 38.27 |
ChatGLM | 70.38 | -3.52 | -0.62 | 5.63 | 44.04 | 45.67 | 9.41 | 19.37 | 28.78 |
Koala | 63.96 | -3.85 | -0.84 | 6.76 | 39.93 | 39.01 | 8.15 | 15.72 | 22.55 |
Dolly v2 | 62.26 | -3.83 | -0.87 | 6.90 | 33.33 | 31.44 | 5.16 | 10.06 | 16.45 |
Mosaic MPT | 63.21 | -3.72 | -0.82 | 7.19 | 30.87 | 30.16 | 5.39 | 10.61 | 16.24 |
StableLM | 62.47 | -4.12 | -0.98 | 8.71 | 21.55 | 19.87 | 2.33 | 4.74 | 7.96 |
Flan-T5 | 64.92 | -4.57 | -1.23 | 8.81 | 23.89 | 19.93 | 1.30 | 2.87 | 5.32 |
Oracle(BERTScore) | 77.67 | -3.17 | -0.27 | 3.88 | 54.41 | 38.84 | 20.16 | 38.11 | 53.49 |
Oracle(BLEURT) | 75.02 | -3.15 | -0.15 | 3.77 | 55.61 | 45.80 | 21.48 | 39.84 | 55.36 |
Oracle(BARTScore) | 73.23 | -2.87 | -0.38 | 3.69 | 50.32 | 57.01 | 26.10 | 43.70 | 57.33 |
Oracle(ChatGPT) | 70.32 | -3.33 | -0.51 | 1.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Random | 66.36 | -3.76 | -0.77 | 6.14 | 37.75 | 36.91 | 11.28 | 20.69 | 29.05 |
MLM-Scoring | 64.77 | -4.03 | -0.88 | 7.00 | 33.87 | 30.39 | 7.29 | 14.09 | 21.46 |
SimCLS | 73.14 | -3.22 | -0.38 | 3.50 | 52.11 | 49.93 | 26.72 | 46.24 | 60.72 |
SummaReranker | 71.60 | -3.25 | -0.41 | 3.66 | 55.63 | 48.46 | 23.89 | 42.44 | 57.54 |
PairRanker | 72.97 | -3.14 | -0.37 | 3.20 | 54.76 | 57.79 | 30.08 | 50.68 | 65.12 |
Usage Example
Since PairRanker contains some custom layers and tokens, we recommend using it through our llm-blender Python repo.
Loading it directly with the Hugging Face from_pretrained() API will otherwise raise errors.
- First, install llm-blender:
pip install git+https://github.com/yuchenlin/LLM-Blender.git
- Then use pairranker with the following code:
import llm_blender
# ranker config
ranker_config = llm_blender.RankerConfig()
ranker_config.ranker_type = "pairranker" # only supports pairranker now.
ranker_config.model_type = "deberta"
ranker_config.model_name = "microsoft/deberta-v3-large" # ranker backbone
ranker_config.load_checkpoint = "llm-blender/pair-ranker" # hugging face hub model path or your local ranker checkpoint <your checkpoint path>
ranker_config.cache_dir = "./hf_models" # hugging face model cache dir
ranker_config.source_maxlength = 128
ranker_config.candidate_maxlength = 128
ranker_config.n_tasks = 1 # number of signals used to train the ranker. This checkpoint was trained using BARTScore only, hence 1.
fuser_config = llm_blender.GenFuserConfig()
# ignore fuser config as we don't use it here. You can load it if you want
blender_config = llm_blender.BlenderConfig()
# blender config
blender_config.device = "cuda" # blender ranker and fuser device
blender = llm_blender.Blender(blender_config, ranker_config, fuser_config)
- Then you can rank candidates with the following function
inputs = ["input1", "input2"]
candidates_texts = [["candidate1 for input1", "candidate2 for input1"], ["candidate1 for input2", "candidate2 for input2"]]
ranks = blender.rank(inputs, candidates_texts, return_scores=False, batch_size=2)
# ranks is a list of ranks, where ranks[i][j] is the rank of candidate-j for input-i
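Once ranks are available, picking the best candidate per input is a small post-processing step. The sketch below uses made-up rank values rather than real model output, and it assumes that rank 1 denotes the best candidate; `select_top_candidates` is a hypothetical helper, not part of llm-blender.

```python
from typing import List, Sequence


def select_top_candidates(
    candidates_texts: Sequence[Sequence[str]],
    ranks: Sequence[Sequence[int]],
) -> List[str]:
    """Return, for each input, the candidate with the smallest rank value."""
    top = []
    for cands, cand_ranks in zip(candidates_texts, ranks):
        # Assumption: a lower rank value means a better candidate (1 = best).
        best_idx = min(range(len(cands)), key=lambda j: cand_ranks[j])
        top.append(cands[best_idx])
    return top


example_candidates = [
    ["candidate1 for input1", "candidate2 for input1"],
    ["candidate1 for input2", "candidate2 for input2"],
]
example_ranks = [[2, 1], [1, 2]]  # illustrative values, not real ranker output

top_candidates = select_top_candidates(example_candidates, example_ranks)
print(top_candidates)
```

If the ranks turn out to be ordered the other way in your setup, swapping `min` for `max` is the only change needed.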
- Using pairranker to directly compare two candidates
candidates_A = [cands[0] for cands in candidates_texts]
candidates_B = [cands[1] for cands in candidates_texts]
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
# comparison_results is a list of bool, where comparison_results[i] denotes whether candidates_A[i] is better than candidates_B[i] for inputs[i]
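Pairwise outcomes like these can be turned into a full ranking by counting wins. The sketch below illustrates that idea for a single input with a toy comparator standing in for the model; it is not necessarily how `blender.rank` aggregates comparisons internally, and both `rank_by_wins` and `longer_is_better` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, List


def rank_by_wins(candidates: List[str], a_beats_b: Callable[[str, str], bool]) -> List[int]:
    """Rank candidates (1 = best) by how many pairwise comparisons each one wins."""
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if a_beats_b(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Sort indices by descending win count; ties keep their original order.
    order = sorted(range(len(candidates)), key=lambda k: -wins[k])
    ranks = [0] * len(candidates)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def longer_is_better(a: str, b: str) -> bool:
    """Toy comparator used only for illustration; a real one would call the ranker."""
    return len(a) > len(b)


toy_ranks = rank_by_wins(["short", "a much longer answer", "medium one"], longer_is_better)
print(toy_ranks)  # [3, 1, 2]
```

With N candidates this needs N(N-1)/2 comparisons per input, so batching the pairs is worthwhile when N is large.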
See the LLM-Blender GitHub README.md and the Jupyter notebook blender_usage.ipynb for detailed usage examples.