I am delighted to share my recent project, GECKO, a bilingual large language model for Korean and English. This initiative was inspired by the lack of resources for Korean large language models.
@donggyukimc and I wrote the technical report to share our insights and experiences from developing the model. While it may not achieve state-of-the-art performance on all benchmarks, it shows solid results given the relatively small number of pretraining tokens.
I hope GECKO contributes to the open-source community, offering resources that can be built upon and improved. I believe that through collaboration and shared knowledge, we can advance the capabilities and accessibility of large language models for Korean and other low-resource languages.