arxiv:2312.15166

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Published on Dec 23, 2023
Submitted by akhaliq on Dec 27, 2023
#1 Paper of the day

Abstract

We introduce depth up-scaling (DUS), a simple yet effective technique for efficiently up-scaling base LLMs. In contrast to mixture-of-experts (MoE), DUS requires no complex changes to training or inference. Using DUS, we build SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters that demonstrates superior performance on various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities that surpasses Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
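
To make the mechanism concrete: depth up-scaling duplicates a base transformer, removes the final m layers from one copy and the first m layers from the other, and concatenates the remainders into a deeper model that is then continually pretrained. Below is a minimal illustrative sketch of that layer surgery, assuming a 32-layer Llama/Mistral-style decoder and m = 8; the checkpoint name and config edit are assumptions, not the authors' code.

```python
# Illustrative sketch of depth up-scaling (DUS): duplicate a 32-layer base
# model, drop m = 8 layers from the adjoining ends, and concatenate the two
# copies into a 48-layer model. The checkpoint choice here is an assumption.
import copy
import torch
from transformers import AutoModelForCausalLM

m = 8
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

layers = base.model.layers          # nn.ModuleList of 32 decoder blocks
n = len(layers)

# Copy A keeps layers [0, n - m); copy B keeps layers [m, n).
copy_a = [copy.deepcopy(layers[i]) for i in range(0, n - m)]
copy_b = [copy.deepcopy(layers[i]) for i in range(m, n)]

# Concatenate: 2 * (n - m) = 48 layers, roughly 10.7B parameters.
base.model.layers = torch.nn.ModuleList(copy_a + copy_b)
base.config.num_hidden_layers = len(base.model.layers)

# The up-scaled model is then continually pretrained to recover and improve
# performance before any instruction fine-tuning.
```

The embeddings, final norm, and output head are inherited once from the base model; the heavy lifting after the layer surgery happens in the continued pretraining stage the abstract alludes to.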

Community

can you upload to ollama.ai?

Already there. Just run: ollama run solar.

Hi! Coming from this discussion post: what data was used for continued pre-training, and was any of that data synthetically generated via OpenAI models (or any other source with similarly restrictive terms of use)?

@pszemraj Details of Data Sets and Training Techniques: Thank you for your interest! Unfortunately, due to the high level of competition in this field, we are unable to share detailed information about the training techniques and datasets used. We appreciate your understanding. However, we have released a list of fine-tuning datasets.

@hunkim thanks! Understood. I'm primarily interested in the upstage/SOLAR-10.7B-v1.0 checkpoint since it is Apache-2.0; based on your response it seems like you all have done your homework. I assume there is no issue using upstage/SOLAR-10.7B-v1.0 to the fullest extent of its Apache-2.0 license, including synthetic data generation, commercial use, etc. Please advise if my interpretation is incorrect, and thanks again (sorry for the duplicate response vs. the original thread).

What format should the fine-tuning data use?

Why does the paper compare itself so heavily against MoE without mentioning or comparing against prior NLP work that operates in a similar fashion, e.g. Progressively Stacking 2.0 [arXiv:2011.13635]? That approach also somewhat alleviates the problem of "we removed m = 8 layers from both ends of our base model, primarily due to hardware limitations" by freezing the original part of the copied network until the very end, when the whole model is trained on part of the dataset. (Their work also predates LoRA; for all we know, instead of training the complete network in stages on part of the dataset, targeting LoRA would work just as well while supporting a bigger model.)
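
For readers unfamiliar with that staged scheme, here is a rough sketch of the freezing idea described above: freeze the blocks inherited from the base model, train only the newly added copies, then unfreeze everything for a final pass on part of the data. The helper names and the Llama/Mistral-style model.model.layers layout are assumptions, not code from either paper.

```python
# Rough sketch of staged freezing: train only the newly added blocks first,
# then unfreeze the whole model for a final pass. Helper names and layer
# layout are illustrative assumptions.
def freeze_original_layers(model, n_original: int) -> None:
    """Keep the original blocks frozen; train only the newly added copies."""
    for i, block in enumerate(model.model.layers):
        trainable = i >= n_original      # new copies sit after the originals
        for p in block.parameters():
            p.requires_grad = trainable

def unfreeze_all(model) -> None:
    """Final stage: train the whole model on (part of) the dataset."""
    for p in model.parameters():
        p.requires_grad = True
```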

@Maykeye Thank you for your comments. We'll take them into account for our next paper revision.

Models citing this paper 110

Datasets citing this paper 1

Spaces citing this paper 69

Collections including this paper 30