EU Training Data Transparency: A Proposal for a Sufficiently Detailed Summary • Jul 3 • 8
Training Data Transparency in AI: Tools, Trends, and Policy Recommendations • Dec 5, 2023 • 1
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model • Aug 22, 2023 • 27
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • 9 days ago • 94
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 10 items • Updated about 15 hours ago • 172
2024 Interconnects Artifacts Collection Models & datasets mentioned in the bottom section of posts! • 244 items • Updated about 7 hours ago • 3
FLAIR models: landcover semantic segmentation Collection The FLAIR models are a collection of semantic segmentation models initially developed to classify land cover on very high resolution aerial imagery. • 9 items • Updated Jun 19 • 10
Pangea Collection A Fully Open Multilingual Multimodal LLM for 39 Languages • 18 items • Updated 20 days ago • 17
Article Democratization of AI, Open Source, and AI Auditing: Thoughts from the DisinfoCon Panel in Berlin By frimelle • Oct 8 • 5
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 218
Article Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face By andreagagliano • Sep 6 • 16
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated Sep 18 • 157
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 118
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs Paper • 2408.13467 • Published Aug 24 • 24
Article Argilla 2.0: the data-centric tool for AI makers By dvilasuero • Jul 30 • 37
MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 54
Article Querying Datasets with the Datasets Explorer Chrome Extension By cfahlgren1 • Jul 19 • 6