Tokenizer is different from cohere - and chat template is ChatML - fully fine-tuned at 128K+ ~ 30M entries long, web crawl input, GPT-4-32k/3.5-16k output, synthetic dataset - 1 epoch

For another candidate version of 1 epoch - https://huggingface.co/CausalLM/35b-beta - somehow less overfitting?

No loras, no quants, no tricks.

This one is not "very 128k", use https://huggingface.co/CausalLM/35b-beta-long for long context. But better in general tasks, knowledge, coding and so on.

And, merge them if you want!

Downloads last month
12
Safetensors
Model size
35B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Datasets used to train CausalLM/35b-beta2ep

Collection including CausalLM/35b-beta2ep