metadata
library_name: transformers
tags: []

Model Card for Model ID

This is the Llama-2-7b-chat tokenizer, modified to support tool use and function calling.

It differs from the original tokenizer in only two ways, both in its chat template:

  1. This chat template supports the "tool" role, which it formats exactly like an "assistant" turn. The original tokenizer only supported the "system", "assistant", and "user" roles.
  2. The original Llama template required turns to strictly alternate "user", "assistant", "user", "assistant", ... (after an optional leading "system" message) and raised an exception otherwise. This chat template has no such requirement.

The chat template of this tokenizer looks like this:

{% if messages[0]['role'] == 'system' %}
    {% set loop_messages = messages[1:] %}
    {% set system_message = '<<SYS>>\n' + messages[0]['content'].strip() + '\n<</SYS>>\n\n' %}
{% else %}
    {% set loop_messages = messages %}
    {% set system_message = '' %}
{% endif %}

{% for message in loop_messages %}

    {% if loop.index0 == 0 %}
        {% set content = system_message + message['content'] %}
    {% else %}
        {% set content = message['content'] %}
    {% endif %}

    {% if message['role'] == 'user' %}
        {{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}
    {% elif message['role'] == 'assistant' %}
        {{ ' '  + content.strip() + ' ' + eos_token }}
    {% elif message['role'] == 'tool' %}
        {{ ' '  + content.strip() + ' ' + eos_token }}
    {% endif %}
{% endfor %}
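To make the rendering rules concrete, here is a minimal pure-Python sketch that mirrors the template above string for string. It is not part of this tokenizer or of `transformers`. The function name `render_new_template`, the default `<s>`/`</s>` special tokens (Llama-2's BOS and EOS), and the example conversation (including the `get_weather` call) are illustrative assumptions. It also assumes the indentation and blank lines in the template are there for readability only and do not reach the rendered prompt.

```python
def render_new_template(messages, bos_token="<s>", eos_token="</s>"):
    """Hypothetical pure-Python mirror of the tool-aware chat template."""
    # An optional leading system message becomes a <<SYS>> block (content stripped).
    if messages and messages[0]["role"] == "system":
        loop_messages = messages[1:]
        system_message = "<<SYS>>\n" + messages[0]["content"].strip() + "\n<</SYS>>\n\n"
    else:
        loop_messages = messages
        system_message = ""

    out = []
    for i, message in enumerate(loop_messages):
        # The system block is prepended to whichever message comes first,
        # whatever its role.
        content = system_message + message["content"] if i == 0 else message["content"]
        role = message["role"]
        if role == "user":
            out.append(bos_token + "[INST] " + content.strip() + " [/INST]")
        elif role in ("assistant", "tool"):
            # Tool results are wrapped exactly like assistant replies.
            out.append(" " + content.strip() + " " + eos_token)
        # Any other role produces no output, as in the template.
    return "".join(out)


# Hypothetical tool-calling conversation; get_weather is an invented tool name.
messages = [
    {"role": "system", "content": "You can call tools."},
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "assistant", "content": 'get_weather("Paris")'},
    {"role": "tool", "content": '{"temp_c": 18}'},
    {"role": "assistant", "content": "It is 18 C in Paris."},
]
prompt = render_new_template(messages)
print(prompt)
```

The tool result is wrapped exactly like an assistant reply (leading space, trailing `</s>`), so nothing in the rendered text marks it as coming from a tool. Two consecutive user turns simply render as two `[INST]` blocks instead of raising.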

For comparison, the original Llama chat template, which this tokenizer no longer uses, was:

{% if messages[0]['role'] == 'system' %}
  {% set loop_messages = messages[1:] %}
  {% set system_message = messages[0]['content'] %}
{% else %}
  {% set loop_messages = messages %}
  {% set system_message = false %}
{% endif %}

{% for message in loop_messages %}

  {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
    {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}
  {% if loop.index0 == 0 and system_message != false %}
    {% set content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message['content'] %}
  {% else %}
    {% set content = message['content'] %}
  {% endif %}

  {% if message['role'] == 'user' %}
    {{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}
  {% elif message['role'] == 'assistant' %}
    {{ ' '  + content.strip() + ' ' + eos_token }}
  {% endif %}
{% endfor %}
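A similar pure-Python sketch of the original template shows how it handles tool use. As before, this is a hypothetical port, not library code. `render_old_template` is an invented name, `ValueError` stands in for the error raised by `raise_exception`, and `None` stands in for the template's `false`. Because the old template has no "tool" branch, a tool turn at an odd position passes the alternation check but is silently dropped, while any out-of-order turn raises.

```python
def render_old_template(messages, bos_token="<s>", eos_token="</s>"):
    """Hypothetical pure-Python mirror of the original Llama-2 chat template."""
    if messages and messages[0]["role"] == "system":
        loop_messages = messages[1:]
        system_message = messages[0]["content"]  # not stripped here
    else:
        loop_messages = messages
        system_message = None  # plays the role of the template's `false`

    out = []
    for i, message in enumerate(loop_messages):
        # "user" must sit at even positions and every other role at odd ones.
        if (message["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        if i == 0 and system_message is not None:
            content = "<<SYS>>\n" + system_message + "\n<</SYS>>\n\n" + message["content"]
        else:
            content = message["content"]
        if message["role"] == "user":
            out.append(bos_token + "[INST] " + content.strip() + " [/INST]")
        elif message["role"] == "assistant":
            out.append(" " + content.strip() + " " + eos_token)
        # No "tool" branch: such a turn renders nothing.
    return "".join(out)


# A tool result in an odd position passes the check but disappears.
dropped = render_old_template([
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "tool", "content": '{"temp_c": 18}'},
])
print(repr(dropped))  # '<s>[INST] Weather in Paris? [/INST]'

# Two consecutive user turns violate the alternation rule.
try:
    render_old_template([
        {"role": "user", "content": "Hi"},
        {"role": "user", "content": "Hello?"},
    ])
except ValueError as err:
    print(err)  # Conversation roles must alternate user/assistant/user/assistant/...
```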