Commits · inflaton-ai/logical-reasoning

ready for tuning qwen2.5-72b

de489ee

dh-mc committed on Sep 21, 2024

llama3.1-70b 30-shot

f06a1e9

inflaton committed on Sep 21, 2024

ready for qwen2.5-7b

37fb5b2

dh-mc committed on Sep 21, 2024

Update logical_reasoning_utils.py

0c7b7f6

dh-mc committed on Sep 21, 2024

ready for qwen2.5

d5ab5d2

dh-mc committed on Sep 21, 2024

qwen2.5-3b 0-shot

06dfa32

dh-mc committed on Sep 21, 2024

updated scripts

959d8cc

dh-mc committed on Sep 21, 2024

Qwen2.5 fine-tuned

f584ea4

inflaton committed on Sep 21, 2024

tune qwen2.5

88d67f6

dh-mc committed on Sep 21, 2024

o1-preview few-shots complete

6ff118c

inflaton committed on Sep 21, 2024

llama3.1-70b 20-shot

96f3c1e

inflaton committed on Sep 21, 2024

more o1 results

e7f34e0

inflaton committed on Sep 21, 2024

more

489600e

inflaton committed on Sep 21, 2024

o1-min 50-shot

3c4359a

inflaton committed on Sep 20, 2024

mistral 5-shot

5158692

inflaton committed on Sep 20, 2024

counted few-shot prompts for all models

a8683cf

dh-mc committed on Sep 20, 2024

o1-preview 20-shot

0baa6cc

inflaton committed on Sep 19, 2024

Update eval-mgtv-shots_4bit.sh

492d1d4

dh-mc committed on Sep 19, 2024

log

fe51ea8

inflaton committed on Sep 19, 2024

o1-preview 5-shot

f2a583b

inflaton committed on Sep 19, 2024

o1-mini 5/20 shots results

9042941

inflaton committed on Sep 19, 2024

try 5-shot for open source models

d2150e8

dh-mc committed on Sep 18, 2024

o1-preview 0-shot

545719f

inflaton committed on Sep 18, 2024

o1-mini 0-shot

16adfc9

inflaton committed on Sep 16, 2024

o1-preview 10-shot

6838eea

inflaton committed on Sep 16, 2024

ready to run 10-shots for 70/72B models

809e98c

dh-mc committed on Sep 16, 2024

10-shot results ready for 7/8 B models

3db2ae5

dh-mc committed on Sep 16, 2024

logs/internlm2_5-20b-chat_tune_and_few_shots.txt

d8cfffb

inflaton committed on Sep 16, 2024

10-shot results

6bc1181

inflaton committed on Sep 15, 2024

internlm 20b results

47d6ce1

inflaton committed on Sep 15, 2024

0-shot notebook

5b276b0

dh-mc committed on Sep 15, 2024

Create eval-mgtv-internlm-20b.sh

75c4663

dh-mc committed on Sep 15, 2024

mistral 10-shot

33cd694

dh-mc committed on Sep 15, 2024

rtx4090 0-shot

d028752

dh-mc committed on Sep 15, 2024

ready for few shots eval

cf912f1

dh-mc committed on Sep 14, 2024

claude 0-shot

397a2fa

inflaton committed on Sep 14, 2024

added original data from MGTV challenge

5f9686b

dh-mc committed on Sep 14, 2024

https://github.com/mazzzystar/TurtleBenchmark

444a581

dh-mc committed on Sep 14, 2024

compare o1 vs gpt-4o

4cd13da

dh-mc committed on Sep 14, 2024

o1-mini analyzed

f1b0a53

dh-mc committed on Sep 13, 2024

o1-mini results

fd14581

inflaton committed on Sep 13, 2024

LogiQA2.0 dataset

bf13772

dh-mc committed on Sep 13, 2024

openai batch

921fa92

dh-mc committed on Sep 13, 2024

Create 04e_OpenAI_comparison.ipynb

2bb5512

dh-mc committed on Sep 13, 2024

internlm_v2 results

83818dc

inflaton committed on Sep 13, 2024

internlm2_5-7b-chat fine-tune results

e4bce5e

inflaton committed on Sep 13, 2024

added scripts/eval-mgtv-internlm_v2.sh

71dcee7

inflaton committed on Sep 13, 2024

Update 04_Few-shot_Prompting_OpenAI.ipynb

8e678e8

dh-mc committed on Sep 12, 2024

ready for fine-tuning internlm2_5-20b-chat

62c2b84

dh-mc committed on Sep 12, 2024

saved best results/metrics

573f5d1

dh-mc committed on Sep 12, 2024

