nyrkr-joker-llama

New Yorker cartoon description and caption -> attempted explanation of the joke

Technical details:

Prompt options

The original paper (Figure 10) uses this format for joke explanations:

```
In this task, you will see a description of an uncanny situation. Then, you will see a joke that was written about the situation. Explain how the joke relates to the situation and why it is funny. ###

{few-shot examples separated by ###, newline after "explanation of the caption:"} This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller. caption: Can I interest you in opening an offshore account? explanation of the caption:
```
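As a rough illustration, the few-shot prompt above could be assembled like this. This is a minimal sketch; the `examples` structure, helper name, and exact whitespace handling are assumptions for illustration, not taken from the paper's code:

```python
# Sketch of assembling the Figure 10-style few-shot prompt.
# `examples` holds (scene_and_caption, explanation) pairs; `query` is the
# scene-and-caption string whose joke we want explained.
INSTRUCTION = (
    "In this task, you will see a description of an uncanny situation. "
    "Then, you will see a joke that was written about the situation. "
    "Explain how the joke relates to the situation and why it is funny."
)

def build_figure10_prompt(examples, query):
    shots = [
        f"{scene} explanation of the caption:\n{explanation}"
        for scene, explanation in examples
    ]
    # Instruction, few-shot examples, and the query are separated by "###".
    parts = [INSTRUCTION] + shots + [f"{query} explanation of the caption: "]
    return " ### ".join(parts)
```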

In training, I used just the individual example:

```
This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller. caption: Can I interest you in opening an offshore account? explanation of the caption:\n
```

For inference, I got somewhat better results with a more natural prompt (no newline or space at the end):

```
This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller. caption: Can I interest you in opening an offshore account? the caption is funny because
```
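A minimal inference sketch using `transformers` + `peft`, assuming you load the Llama-2-7B base in 4-bit and apply this repo as a LoRA adapter. The repo IDs, generation settings, and device handling below are illustrative assumptions, not the exact setup used for the card's examples:

```python
# Sketch: 4-bit base model + LoRA adapter inference (assumes transformers,
# peft, bitsandbytes, and access to the gated Llama-2 weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"          # base model (assumed repo ID)
adapter_id = "monsoon-nlp/nyrkr-joker-llama"  # this adapter

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = (
    "This scene takes place in the following location: a bank. "
    "Three people are standing in line at the bank. The bank teller is a traditional "
    "pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller. "
    "caption: Can I interest you in opening an offshore account? the caption is funny because"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
# Print only the newly generated continuation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```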

Training script

Trained on a V100. The commands below are from a Colab notebook (hence the `!` prefix and the `/content/` dataset path).

```bash
git clone https://github.com/artidoro/qlora
cd qlora

pip3 install -r requirements.txt --quiet
```

```bash
! cd qlora && python qlora.py \
    --model_name_or_path ../llama-2-7b-hf \
    --output_dir ../thatsthejoke \
    --logging_steps 20 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 80 \
    --save_total_limit 10 \
    --evaluation_strategy steps \
    --max_new_tokens 64 \
    --dataloader_num_workers 1 \
    --group_by_length \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset /content/nycaptions.jsonl \
    --dataset_format 'self-instruct' \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 250 \
    --eval_steps 187 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0
```
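The `self-instruct` dataset format in the qlora repo maps, as far as I can tell, a `prompt` field to the model input and a `completion` field to the training target, so each line of `nycaptions.jsonl` would look roughly like the line below. The explanation text here is invented for illustration and is not taken from the actual dataset:

```json
{"prompt": "This scene takes place in the following location: a bank. Three people are standing in line at the bank. The bank teller is a traditional pirate with a hook hand, eye patch, and a parrot. The scene includes: Piracy, Bank teller. caption: Can I interest you in opening an offshore account? explanation of the caption:\n", "completion": "The teller is a pirate, so \"offshore account\" puns on banking jargon and the buried-treasure image of piracy."}
```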
