Edit model card

cosmosage

cosmosage is a natural-language cosmology assistant that can answer questions about cosmology.

cosmosage-v3 is the latest iteration in the cosmosage series. It was trained on top of the LLAMA-3-8B base model. We started with continued pretraining on thousands of papers and textbooks. The next step was fine-tuning on synthetically-generated question-answer pairs. In addition, the OpenHermes 2.5 dataset was used to improve instruction following and general conversational capability.

cosmosage-v3 is a full chat model, though it excels in Q&A mode, where the model gives a single answer in response to a single question.

The code used to generate cosmosage is available at https://github.com/tijmen/cosmosage

A presentation on cosmosage was given on 2024-01-21 at QUP, KEK and is available at https://www.youtube.com/watch?v=azwfG2UTNEY

A paper was prepared and a preprint is available at https://arxiv.org/abs/2407.04420

Usage

cosmosage-v3 uses the Llama-3 prompt template.

Comparison to cosmosage_v2

cosmosage-v3 was trained on the same data with nearly the same hyperparameters as cosmosage_v2. The primary difference is the use of the LLAMA-3-8B base model rather than the Mistral-v0.1 base model.

Training details

cosmosage-v3 was trained on 4xA100 (40 GB) at the Gadi supercomputer, part of NCI, Australia. A big thanks goes out to Yuan-Seng Ting for providing these resources.

Example output

User:

What is "stacking"?

Assistant:

Stacking is a technique used in astronomy to combine multiple images or data sets of the same object into a single image or dataset. This can be done by aligning and co-adding the individual images or data points, which helps to improve the signal-to-noise ratio and reveal faint features that may not be visible in a single image or dataset. Stacking is often used to study the properties of galaxies, galaxy clusters, and other astronomical objects.

User:

Explain the ISW effect.

Assistant:

The Integrated Sachs-Wolfe (ISW) effect is a phenomenon in modern cosmology that describes the gravitational interaction between cosmic microwave background (CMB) photons and large-scale structures in the universe. It occurs when CMB photons pass through regions of space where the gravitational potential has changed due to the expansion of the universe or the presence of matter overdensities.

When a photon moves into an area with a higher gravitational potential, it gains energy as it falls into the potential well. As the photon climbs out of this well on the other side, it loses energy. If the change in the gravitational potential is gradual over time, the photon's energy gain and loss will be integrated, leading to a net temperature shift in the CMB radiation. This effect can either cause a blueshift (increased energy) if the gravitational potential is decreasing, or a redshift (decreased energy) if the gravitational potential is increasing.

The ISW effect is particularly important for understanding the evolution of the universe and the role of dark energy. The detection of the ISW effect has provided evidence for the existence of dark energy and has helped constrain its properties. By analyzing the CMB temperature fluctuations and cross-correlating them with tracers of large-scale structure such as galaxy surveys, researchers can study the ISW effect and learn more about the dynamics of the universe.

User:

What is Digital Active Nulling?

Assistant:

Digital Active Nulling (DAN) is a technique used in the readout of Transition Edge Sensor (TES) bolometers. It works by dynamically nulling the current flowing through the SQUID input coil while digitally demodulating the carrier voltage and the current signal. This method helps to minimize non-linearity effects due to the nonlinear response of the SQUID, improving the overall performance of the system.

User:

Explain why the time of matter-radiation equality affects the CMB damping tail.

Assistant:

The time of matter-radiation equality is when the energy densities of matter and radiation were equal. This has an impact on the Cosmic Microwave Background (CMB) damping tail because it determines the amount of photon diffusion that occurs before decoupling. A later time of matter-radiation equality results in more photon diffusion, leading to a smaller amplitude in the damping tail.

Citation

cosmosage can be cited using the following BibTeX entry:

@article{de_haan_cosmosage_2024,
    title = {cosmosage: {A} {Natural}-{Language} {Assistant} for {Cosmologists}},
    shorttitle = {cosmosage},
    url = {http://arxiv.org/abs/2407.04420},
    abstract = {cosmosage is a natural-language assistant intended for a wide audience, from laypersons interested in cosmology to students, teachers, and professional cosmologists. cosmosage provides a novel way to access knowledge and reason about cosmology. Leveraging the power of advanced large language models (LLMs), cosmosage has learned from a vast corpus of open-access source texts, including textbooks and papers. cosmosage is found to be state-of-the-art on the narrow task of answering questions about cosmology, outperforming all general-purpose models. The model parameters and code are publicly available.},
    urldate = {2024-07-08},
    publisher = {arXiv},
    author = {de Haan, Tijmen},
    month = jul,
    year = {2024},
    note = {arXiv:2407.04420 [astro-ph]},
    keywords = {Astrophysics - Cosmology and Nongalactic Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics},
}

Built with Axolotl

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 16.80
IFEval (0-Shot) 44.82
BBH (3-Shot) 22.69
MATH Lvl 5 (4-Shot) 1.74
GPQA (0-shot) 4.36
MuSR (0-shot) 10.69
MMLU-PRO (5-shot) 16.51
Downloads last month
15
Safetensors
Model size
8.03B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Tijmen2/cosmosage-v3

Finetuned
(358)
this model
Quantizations
4 models

Dataset used to train Tijmen2/cosmosage-v3