ABOUT_TEXT = """# Background
Model contamination is an obstacle that many model creators face, and it has become a growing issue among the top scorers on the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). This work is an implementation of [Detecting Pretraining Data from Large Language Models](https://huggingface.co/papers/2310.16789), following the template provided by [this GitHub repo](https://github.com/swj0419/detect-pretrain-code-contamination/tree/master).

I'm aware the Hugging Face team is working on its own implementation, collaborating directly with the authors of the paper mentioned above. Until that's ready, I hope this serves as a metric for evaluating model contamination in open-source LLMs.

# Disclaimer
This space should NOT be used to flag or accuse models of cheating / being contaminated. Instead, it should form part of a holistic assessment by the parties involved. The main goal of this space is to provide more transparency as to what the contents of the datasets used to train models are. Take whatever is shown in the evaluation tab with a grain of salt and draw your own conclusions from the data.

As a final note, I've outlined my main concerns with this implementation in a pinned discussion under the community tab. Any type of help would be greatly appreciated :)"""

SUBMISSION_TEXT = """