AbrahamSanders/opt-2.7b-realtime-chat

Base model: facebook/opt-2.7b

Fine-tuned for causal language modeling of transcribed spoken dialogue from the TalkBank CABank collection. Training corpora include:

- CABNC: Spoken language segment of the British National Corpus
- CallFriend English (N): Phone calls
- CallFriend English (S): Phone calls
- CallHome English: Phone calls
- GCSAusE: Australian conversations
- ISL: Conversations recorded to test ASR methods for meetings
- MICASE: Michigan Corpus of Academic Spoken English
- SCoSE: The Saarbrücken Corpus of Spoken (American) English

(Corpus descriptions are from TalkBank)

Data input format: Each input sequence models a spoken dialogue between two or more participants:

The sequence begins with a <participant> entry for each participant, giving their name (a proper noun, a title/role, or unknown), age (a number or unknown), and sex (male, female, other, or unknown).
After a <dialog> marker, it lists every utterance in the conversation in order, each prefixed with its speaker's participant code (S1, S2, S3, etc.).
Utterances support a limited set of transcription notations from the CHAT and CHAT-CA formats:
- Pauses: (.) for a generic short pause, or (N.N) for a timed pause. For example, (3.4) is a 3.4-second pause.
- Non-verbal sounds: &=laughs, &=cough, &=breathes, &=click, etc. Any description of a speaker-produced non-verbal sound can follow the &= prefix.
- Comments about the speaker or setting: [% baby crying in background], [% smiling], [% phone clicking noise], [% imitating him], etc. Anything describing the state of the speaker or the environment can go in this block. A comment block can also describe speaker-produced sounds, although the &= prefix is more common for that.
- Unknown or unintelligible utterances: xxx
- Breathing: hhh
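The notations listed above are regular enough to pick out with a single regular expression, which can help when inspecting generated text or comparing it with plain transcripts. The following is a minimal sketch, assuming only the notations in this list; the pattern, the group names, and the helpers extract_notations and strip_notations are hypothetical and are not part of the model or of the CHAT tooling.

```python
import re

# Hypothetical pattern covering only the notations listed in this model card.
# Alternatives are tried left to right; each has exactly one named group.
NOTATION_PATTERN = re.compile(
    r"\((?P<timed>\d+\.\d+)\)"         # timed pause, e.g. (3.4)
    r"|(?P<short>\(\.\))"              # generic short pause: (.)
    r"|&=(?P<sound>\S+)"               # non-verbal sound, e.g. &=laughs
    r"|\[%\s*(?P<comment>[^\]]*)\]"    # comment block, e.g. [% smiling]
    r"|\b(?P<unintelligible>xxx)\b"    # unknown or unintelligible speech
    r"|\b(?P<breath>hhh)\b"            # breathing
)


def extract_notations(utterance: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for each notation, in order of appearance."""
    results = []
    for match in NOTATION_PATTERN.finditer(utterance):
        kind = match.lastgroup  # name of the alternative that matched
        results.append((kind, match.group(kind)))
    return results


def strip_notations(utterance: str) -> str:
    """Remove all notations and collapse whitespace, keeping only the words."""
    return " ".join(NOTATION_PATTERN.sub(" ", utterance).split())
```

For example, extract_notations("hhh [% smiling] well (.) okay (3.4) &=laughs") yields one entry per notation (breath, comment, short pause, timed pause 3.4, sound), and strip_notations on the same string leaves "well okay".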

Example:

<participant> S1 (name: Dave, age: 33, sex: male) <participant> S2 (name: unknown, age: unknown, sex: unknown) <dialog> S1: Hi! (2.3) are you there? S2: hhh hhh [% background noise] uh yeah (0.8) I can hear you. (1.2) &=cough can you hear me? S1: ...
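A prompt in this format can be assembled from structured data. The sketch below reproduces the layout of the example above; the Participant class, the build_prompt helper, and the optional next_speaker argument (which appends a bare "S2:"-style code so the model continues as that speaker) are hypothetical conveniences and an assumption about how one might prompt the model, not something this model card specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    code: str               # participant code, e.g. "S1"
    name: str = "unknown"   # proper noun, title/role, or "unknown"
    age: str = "unknown"    # a number such as "33", or "unknown"
    sex: str = "unknown"    # "male", "female", "other", or "unknown"


def build_prompt(
    participants: list[Participant],
    utterances: list[tuple[str, str]],
    next_speaker: Optional[str] = None,
) -> str:
    """Join participant headers and (code, text) utterances into one sequence."""
    parts = [
        f"<participant> {p.code} (name: {p.name}, age: {p.age}, sex: {p.sex})"
        for p in participants
    ]
    parts.append("<dialog>")
    parts.extend(f"{code}: {text}" for code, text in utterances)
    if next_speaker is not None:
        # Assumption: end with a bare speaker code so generation continues that turn.
        parts.append(f"{next_speaker}:")
    return " ".join(parts)
```

Calling build_prompt([Participant("S1", "Dave", "33", "male"), Participant("S2")], [("S1", "Hi! (2.3) are you there?")]) produces the opening of the example sequence, up to and including S1's first utterance.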

Usage Info:

Per the OPT documentation, the model was trained with the tokenizer setting use_fast=False, so load the tokenizer with the same setting.
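For simple offline experiments (not the real-time system linked below), the model can be loaded through the standard Hugging Face transformers Auto classes with use_fast=False. This is a minimal sketch: the sampling settings (do_sample, top_p=0.9, max_new_tokens=50) are illustrative assumptions, and the helper names load_model_and_tokenizer, generate_turn, and first_turn are hypothetical. Since the model continues the whole dialogue, first_turn keeps only the text before the next participant code.

```python
import re

MODEL_ID = "AbrahamSanders/opt-2.7b-realtime-chat"


def load_model_and_tokenizer(model_id: str = MODEL_ID):
    # Imported lazily so first_turn() can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # use_fast=False selects the slow tokenizer, matching the training setting.
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, tokenizer


def first_turn(continuation: str) -> str:
    """Keep only the text before the next participant code (S1:, S2:, ...)."""
    return re.split(r"\s*\bS\d+:", continuation, maxsplit=1)[0].strip()


def generate_turn(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    """Sample a continuation of the prompt and return the first speaker turn."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9
    )
    # Decode only the newly generated tokens, not the prompt itself.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    continuation = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return first_turn(continuation)
```

A prompt ending in a bare participant code such as "S2:" asks the model to speak as S2. Cutting at the next code is a simple heuristic; the realtime-chatbot project handles turn-taking and timing for live use.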

To use this model for real-time inference in a continuous duplex dialogue system, see: https://github.com/AbrahamSanders/realtime-chatbot.