---
model-index:
  - name: jiazhengli/deberta-v3-large-Rationale-to-Score
    results: []
language:
  - en
base_model: microsoft/deberta-v3-large
license: apache-2.0
widget:
  - text: >-
      The student's answer received a score of 0 according to the marking
      rubric, as it failed to describe any additional pieces of information
      necessary to accurately replicate the experiment. The key answer elements
      required specific details such as the amount and type of vinegar used, the
      materials tested, the size/surface area of materials, the rinsing and
      drying durations, the drying method, and the size/type of container. The
      student's response did not address any of these points; instead, it
      suggested changes to the experimental conditions like 'there can be a
      different amount of time', 'more containers', 'different placements', and
      'temperature differences', which do not provide the specific information
      needed for replication. 
    example_title: Example 1
  - text: >-
      The student's answer scored 3 points as per the marking rubric, which
      requires drawing a valid conclusion supported by data and describing two
      ways to improve the experimental design. The student correctly concluded
      that 'plastic type B, is the most effective type of plastic' matching the
      key answer element for a valid experimental conclusion. The student also
      suggested additional trials ('In the experiment, the group could have done
      1 more trial to really reinforce the results') and maintaining weight
      constant ('the group also should have measured the weight of the weights,
      and kept the weight constant for more valid results'), aligning with the
      key answer elements for improving experimental design. However, the answer
      did not suggest ensuring uniform initial measurements or sample thickness,
      which were other key answer elements. 
    example_title: Example 2
  - text: >-
      The student's answer scored 0 points according to the marking rubric,
      which awards 3 points for addressing three key elements, 2 points for two,
      1 point for one, and 0 points for none. The student's response failed to
      address any of the key answer elements accurately. Specifically, the
      student mentioned 'Equillibrium' and 'Diffusion' but did not explain these
      processes in the context of cell membrane transport mechanisms like
      selective permeability, passive transport, osmosis, facilitated diffusion,
      active transport, or the use of pumps and protein channels. Additionally,
      there was no mention of membrane-assisted processes such as exocytosis,
      endocytosis, phagocytosis, or pinocytosis. 
    example_title: Example 3
  - text: >-
      The student's answer scored 1 point based on the marking rubric that
      awards one point for addressing 'One or two key elements'. The student's
      response correctly addressed the process of mRNA traveling to the
      ribosomes as evidenced by 'First it travels along to the ribosomes',
      fulfilling one key answer element. However, the response failed to
      correctly describe the interactions and processes involving codons,
      anticodons, and the specific mechanism of tRNA in protein synthesis, as
      well as the initiation and elongation phases of protein synthesis, which
      are vital for higher scoring. The phrases 'the ribosomes give the mRNA its
      proper tRNA' and 'Then, it is sent off' show confusion about tRNA's role
      and incorrect understanding of mRNA's interaction with ribosomes. 
    example_title: Example 4
---

# Model Card for deberta-v3-large-Rationale-to-Score

This repository hosts a version of microsoft/deberta-v3-large that has been fine-tuned to assess text-based rationales and generate corresponding scores. As shown in the widget examples above, the model takes a free-text rationale as input and outputs a numerical score.
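
A minimal usage sketch is shown below. It assumes the checkpoint carries a standard sequence-classification head that `AutoModelForSequenceClassification` can load; the exact head type and label mapping are not documented here, so treat this as illustrative rather than the official inference recipe.

```python
# Minimal usage sketch (not the authors' official inference script).
# Assumes a standard sequence-classification head; adjust if the actual
# head or label mapping differs.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "jiazhengli/deberta-v3-large-Rationale-to-Score"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# A rationale adapted from the widget examples above.
rationale = (
    "The student's answer scored 1 point based on the marking rubric that "
    "awards one point for addressing 'One or two key elements'. ..."
)

inputs = tokenizer(rationale, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Single-output head: read the raw value as a regression score;
# multi-class head: take the argmax over score classes.
score = logits.squeeze().item() if logits.shape[-1] == 1 else logits.argmax(dim=-1).item()
print(score)
```

If the hosted inference widget runs through the default `text-classification` pipeline, `pipeline("text-classification", model=model_name)` should give an equivalent prediction.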

For a comprehensive description of the training process and methodology, please refer to our paper: [Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring](https://arxiv.org/abs/2406.19949).

If you utilize this model in your research, please acknowledge it by citing our work:

## Citation Information

```bibtex
@misc{li2024calibratingllmspreferenceoptimization,
      title={Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring},
      author={Jiazheng Li and Hainiu Xu and Zhaoyue Sun and Yuxiang Zhou and David West and Cesare Aloisi and Yulan He},
      year={2024},
      eprint={2406.19949},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.19949},
}
```