An untrained precursor mixture-of-experts (MoE) model created from Cosmo using mergekit.

Gate routing was initialized using the prompt hidden-state method. Of the eight experts, five are based on the visualized topic clusters of the Cosmopedia data and three are task-oriented.
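As a rough illustration of what hidden-state gate initialization means, the sketch below averages the hidden states of each expert's prompts into one gate row, then routes a token to the expert whose row has the largest dot product with it. This is a toy under stated assumptions, not the actual mergekit code: the function names (`mean_vector`, `init_gate_rows`, `route`) and the three-dimensional vectors are hypothetical stand-ins for real model hidden states.

```python
# Illustrative sketch only: toy vectors stand in for real model hidden states.

def mean_vector(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def init_gate_rows(prompt_states_per_expert):
    """Build one gate row per expert from the hidden states of its prompts."""
    return [mean_vector(states) for states in prompt_states_per_expert]

def route(hidden_state, gate_rows, top_k=2):
    """Score each expert by dot product and return the top_k expert indices."""
    scores = [sum(h * w for h, w in zip(hidden_state, row)) for row in gate_rows]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

# Hypothetical hidden states for the prompts of two experts.
prompt_states = [
    [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],  # prompts describing expert 0
    [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],  # prompts describing expert 1
]
gate_rows = init_gate_rows(prompt_states)

# A token whose hidden state resembles expert 0's prompts.
chosen = route([0.9, 0.1, 0.0], gate_rows, top_k=1)
```

In this toy setup the token lands on expert 0, because its hidden state aligns with the mean of that expert's prompt states. In a real merge the gate has one such row per expert at every layer.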

Layers 0, 1, and 2 were degenerate. The expert gates for these layers were randomly initialized in the hope of mitigating this.
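The sketch below shows one way such a mitigation could look. It assumes, purely for illustration, that a layer counts as degenerate when all of its expert gate rows point in nearly the same direction, so the router cannot tell experts apart. That criterion, the threshold, the standard deviation, and the helper names (`is_degenerate`, `random_gate`) are all assumptions, not details taken from this model or from mergekit.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_degenerate(gate_rows, threshold=0.999):
    """Assumed criterion: every row is almost parallel to the first row."""
    return all(cosine(gate_rows[0], row) > threshold for row in gate_rows[1:])

def random_gate(num_experts, hidden_dim, rng, std=0.02):
    """Draw a fresh gate matrix from a zero-mean Gaussian."""
    return [[rng.gauss(0.0, std) for _ in range(hidden_dim)]
            for _ in range(num_experts)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical per-layer gates: layer 0 has near-identical rows, layer 1 does not.
gates = {
    0: [[1.0, 1.0], [1.0, 1.0001]],
    1: [[1.0, 0.0], [0.0, 1.0]],
}

reinitialized = []
for layer, rows in gates.items():
    if is_degenerate(rows):
        gates[layer] = random_gate(len(rows), len(rows[0]), rng)
        reinitialized.append(layer)
```

Here only layer 0 is replaced. Random rows carry no knowledge about the experts, so any useful routing at those layers would have to be learned in later training.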


Model size: 10.2B params (Safetensors, tensor type F32)


Model tree for Lambent/cosmoem-8x1B: 1 adapter, 1 quantization
