from dataclasses import dataclass
from enum import Enum


@dataclass
class Task:
    benchmark: str
    metric: str
    col_name: str

# Select your tasks here
# ---------------------------------------------------
class Tasks(Enum):
    # task_key in the json file, metric_key in the json file, name to display in the leaderboard
    task0 = Task("mRNA", "RMSE", "mRNA (RMSE)")
    task1 = Task("SNMD", "AUC", "SNMD (AUC)")
    task2 = Task("SNMR", "F1", "SNMR (F1)")
    task3 = Task("ArchiveII", "F1", "ArchiveII (F1)")
    task4 = Task("bpRNA", "F1", "bpRNA (F1)")
    task5 = Task("RNAStralign", "F1", "RNAStralign (F1)")

NUM_FEWSHOT = 0  # Change to match your few-shot setting
# ---------------------------------------------------
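
# Illustrative sketch (not part of the original file): the Tasks enum above is
# typically consumed elsewhere in the leaderboard code to build the display columns
# and to pull per-benchmark scores out of a results JSON. The helper names below
# are assumptions for illustration only.
def get_display_columns() -> list[str]:
    """Return the leaderboard column names declared in Tasks."""
    return [task.value.col_name for task in Tasks]


def extract_scores(results: dict) -> dict[str, float]:
    """Map each display column to its score, given results shaped as
    {benchmark: {metric: score}}."""
    return {
        task.value.col_name: results[task.value.benchmark][task.value.metric]
        for task in Tasks
    }
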
# Your leaderboard name
TITLE = """<h1 align="center" id="space-title">OmniGenomeBench Leaderboard</h1>"""
LLM_BENCHMARKS_TEXT = f"""
## Why do we need this benchmark?
Large-scale foundation models for molecular biology represent a vital and rapidly developing shift in the computational biology and AI4Science landscape. Because key biological components, such as DNA and RNA sequences and secondary structures, strongly influence one another, incorporating this information into large-scale models allows foundation models to be adapted to multiple key tasks. However, this trend brings significant issues, the primary one being the difficulty of evaluating these models comprehensively and comparing them fairly. Specifically, there is a lack of real-world data reflecting the true performance of these models, beyond in-silico experiments alone. This forces repeated benchmark testing, with models trained and adapted for specific tasks that may offer no real-world benefit. Given the importance of this problem, we propose this genomic leaderboard on meticulously curated real-world datasets, to allow a fair and comprehensive benchmark on the most important genomic downstream tasks.

## Evaluation Datasets
TODO HERE

## Reported Scores and Ranking
TODO HERE

## How it works
Do we need this?

## Reproducibility
To reproduce our results, here are the commands you can run:
"""
EVALUATION_QUEUE_TEXT = """
## Some good practices before submitting a model

### 1) Make sure you can load your model and tokenizer using AutoClasses:
```python
from transformers import AutoConfig, AutoModel, AutoTokenizer
config = AutoConfig.from_pretrained("your model name", revision=revision)
model = AutoModel.from_pretrained("your model name", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
```
If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.

Note: make sure your model is public!
Note: if your model needs `trust_remote_code=True`, we do not support this option yet, but we are working on adding it, stay posted!

### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!

### 3) Make sure your model has an open license!
This is a leaderboard for open models, and we'd love for as many people as possible to know they can use your model 🤗

### 4) Fill up your model card
When we add extra information about models to the leaderboard, it will be automatically taken from the model card.

## In case of model failure
If your model is displayed in the `FAILED` category, its execution stopped. Make sure you have followed the above steps first. If everything is done, check that you can launch the EleutherAI Harness on your model locally, using the above command without modifications (you can add `--limit` to limit the number of examples per task).
"""
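
# Hedged sketch (an assumption, not part of the original template): step 2 above asks
# submitters for safetensors weights; one way to produce them is to re-save a checkpoint
# with safe_serialization=True via transformers' save_pretrained. Guarded so it never
# runs when this module is imported; the model id and output path are placeholders.
if __name__ == "__main__":
    from transformers import AutoModel

    demo_model = AutoModel.from_pretrained("your-username/your-model")  # placeholder id
    demo_model.save_pretrained("your-model-safetensors", safe_serialization=True)
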
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""@article{Yang2024,
  author  = {Yang, Heng and Li, Ke},
  title   = {Foundation Models Work},
  journal = {arXiv},
  year    = {2024},
  note    = {arXiv preprint arXiv:XXXX.XXXXX},
  url     = {https://arxiv.org/abs/XXXX.XXXXX}
}"""
# What does your leaderboard evaluate?
INTRODUCTION_TEXT = """
The deciphering of RNA and DNA genomes has been ongoing for decades, with the aim of advancing genome analysis, including understanding and synthesizing genomes. Recently, Genomic Foundation Models (GFMs) have emerged as powerful tools for genome analysis and manipulation, leveraging advances in natural language processing to model the "genomic language" encoded in genomes. However, GFMs face two significant challenges: the lack of benchmarking tools and the lack of open-source software covering diverse genomic tasks. This hinders progress on tasks such as RNA design and structure prediction.
We address these challenges by introducing a dedicated benchmarking toolkit, GFM-Bench. It integrates millions of genomic sequences across hundreds of tasks from four large-scale benchmarks, ensuring robust evaluation of GFMs under the FAIR principles. GFM-Bench tackles issues of data insufficiency, metric reliability, transfer benchmarking, and reproducibility—critical for identifying the limitations of GFMs.
Additionally, we present an open-source software designed to simplify and democratize the use of GFMs for various in-silico genomic tasks. This software offers easy-to-use interfaces, tutorials, and broad compatibility with GFMs and genomic tasks, promoting transparency and innovation in the field. It also includes a public leaderboard for existing GFMs to drive advancements in genome modeling.
"""