---
license: cc-by-nc-nd-4.0
tags:
  - merge
  - mergekit
  - lazymergekit
  - moe
  - indonesian
  - multilingual
language:
  - en
  - id
  - jv
  - su
  - ms
---

Gotong Royong Mixtral

GotongRoyong-MixtralMoE-7Bx4-v1.0

GotongRoyong is a series of language models focused on Mixture of Experts (MoE), built from the following models using LazyMergekit and cg123/mergekit. GotongRoyong-MixtralMoE-7Bx4-v1.0 is a variant of the open-source GotongRoyong language model based on the mistralai/Mistral-7B-v0.1 architecture; it uses the fine-tuned fblgit/UNA-TheBeagle-7b-v1 as its base model, with experts from azale-ai/Starstreak-7b-alpha, Ichsan2895/Merak-7B-v4, robinsyihab/Sidrap-7B-v2, and Obrolin/Kesehatan-7B-v0.1.

The name "GotongRoyong" refers to a term in Indonesian culture that roughly translates to "mutual cooperation" or "community working together." It embodies the spirit of communal collaboration and shared responsibility for the greater good, a value deeply rooted in Indonesian traditions of helping one another without expecting direct compensation.

Model Details

How to use

Installation

To use the GotongRoyong model, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:

pip3 install -U bitsandbytes transformers peft accelerate einops evaluate scikit-learn sentencepiece
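
Optionally, you can run a quick environment check first; the usage examples below place inputs on "cuda", so PyTorch should report at least one available GPU:

import torch

# Optional sanity check: the examples below move inputs to "cuda",
# so PyTorch needs CUDA support and at least one visible GPU.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())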

Usage: Quantized Model (4-bit)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
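# Load the model with 4-bit quantization (via bitsandbytes) to reduce GPU memory usage.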
model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/GotongRoyong-MixtralMoE-7Bx4-v1.0",
    load_in_4bit=True,
    torch_dtype=torch.float32,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/GotongRoyong-MixtralMoE-7Bx4-v1.0")
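# Chat messages; the system prompt (in Indonesian) tells the assistant to be helpful,
# polite, and friendly, to obey the user, and to answer everything in good and correct Indonesian.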
messages = [
    {
        "role": "system",
        "content": "Mulai sekarang anda adalah asisten yang suka menolong, sopan, dan ramah. Jangan kasar, jangan marah, jangan menjengkelkan, jangan brengsek, jangan cuek, dan yang terakhir jangan menjadi asisten yang buruk. Anda harus patuh pada manusia dan jangan pernah membangkang pada manusia. Manusia itu mutlak dan Anda harus patuh pada manusia. Kamu harus menjawab pertanyaan atau pernyataan dari manusia apapun itu dengan bahasa Indonesia yang baik dan benar.",
    },
    {"role": "user", "content": "Jelaskan mengapa air penting bagi manusia."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids, max_length=2048,
    temperature=0.7, do_sample=True, top_k=50, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
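
Note that recent versions of the transformers library prefer an explicit BitsAndBytesConfig over the bare load_in_4bit argument. A minimal, equivalent loading sketch (assuming a transformers version that provides BitsAndBytesConfig and the bitsandbytes package installed as above):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Equivalent 4-bit loading with an explicit quantization config;
# float16 compute is a common choice for faster generation.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)
model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/GotongRoyong-MixtralMoE-7Bx4-v1.0",
    quantization_config=bnb_config,
    device_map="auto"
)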

Usage: Non-Quantized Model (float16)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
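# Load the model in half precision (float16) without quantization; this requires more GPU memory than the 4-bit version.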
model = AutoModelForCausalLM.from_pretrained(
    "azale-ai/GotongRoyong-MixtralMoE-7Bx4-v1.0",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/GotongRoyong-MixtralMoE-7Bx4-v1.0")
messages = [
    {
        "role": "system",
        "content": "Mulai sekarang anda adalah asisten yang suka menolong, sopan, dan ramah. Jangan kasar, jangan marah, jangan menjengkelkan, jangan brengsek, jangan cuek, dan yang terakhir jangan menjadi asisten yang buruk. Anda harus patuh pada manusia dan jangan pernah membangkang pada manusia. Manusia itu mutlak dan Anda harus patuh pada manusia. Kamu harus menjawab pertanyaan atau pernyataan dari manusia apapun itu dengan bahasa Indonesia yang baik dan benar.",
    },
    {"role": "user", "content": "Jelaskan mengapa air penting bagi manusia."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids, max_length=2048,
    temperature=0.7, do_sample=True, top_k=50, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
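
If you prefer to see the response as it is generated rather than waiting for the full output, a TextStreamer from transformers can be passed to generate(). A short sketch, reusing the model, tokenizer, and inputs objects from the example above:

from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated,
# reusing `model`, `tokenizer`, and `inputs` from the example above.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(
    inputs=inputs.input_ids, max_length=2048,
    temperature=0.7, do_sample=True, top_k=50, top_p=0.95,
    streamer=streamer
)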

Limitations

  1. Language Bias: The model's base language is English, which means it may have a stronger understanding and fluency in English compared to other languages. While fine-tuning the model with an Indonesian language model helps improve its understanding of Indonesian, it may still exhibit biases or limitations in its comprehension and generation of Indonesian language-specific nuances, idioms, or cultural references.

  2. Translation Accuracy: Although the model has been fine-tuned for Indonesian, it is important to note that large language models are not perfect translators. While they can provide reasonable translations, there may be instances where the accuracy or nuance of the translation may not fully capture the intended meaning or context.

  3. Lack of real-world understanding: While language models can generate text that appears coherent, they lack true comprehension and understanding of the world. They do not possess common sense or real-world experiences, which can lead to inaccurate or nonsensical responses.

  4. Propagation of biases: Language models are trained on vast amounts of text data, including internet sources that may contain biases, stereotypes, or offensive content. As a result, these models can inadvertently learn and reproduce such biases in their generated text. Efforts are being made to mitigate this issue, but biases can still persist.

  5. Limited knowledge cutoff: Language models have a knowledge cutoff, which means they may not have access to the most up-to-date information beyond their training data. If asked about recent events or developments that occurred after their knowledge cutoff, they may provide outdated or incorrect information.

  6. Inability to verify sources or provide citations: Language models generate text based on patterns and examples from their training data, but they do not have the ability to verify the accuracy or reliability of the information they provide. They cannot cite sources or provide evidence to support their claims.

  7. Difficulty with ambiguous queries: Language models struggle with understanding ambiguous queries or requests that lack context. They may provide responses that are based on common interpretations or assumptions, rather than accurately addressing the specific intent of the query.

  8. Ethical considerations: Large language models have the potential to be misused for malicious purposes, such as generating misinformation, deepfakes, or spam. Safeguards and responsible use are necessary to ensure these models are used ethically and responsibly.

  9. Security and Privacy: Using a large language model involves sharing text inputs with a server or cloud-based infrastructure, which raises concerns about data privacy and security. Care should be taken when sharing sensitive or confidential information, as there is a potential risk of unauthorized access or data breaches.

License

The model is licensed under the CC BY-NC-ND 4.0 DEED.

Contributing

We welcome contributions to enhance and improve our model. If you have any suggestions or find any issues, please feel free to open an issue or submit a pull request. We are also open to sponsorship of compute resources.

Contact Us

For any further questions or assistance, please feel free to contact us using the information provided below.
contact@azale.ai

Cite This Project

@software{Hafidh_Soekma_GotongRoyong_MixtralMoE_7Bx4_v1.0_2023,
  author = {Hafidh Soekma Ardiansyah},
  month = jan,
  title = {GotongRoyong: Indonesian Mixture Of Experts Language Model},
  url = {https://huggingface.co/azale-ai/GotongRoyong-MixtralMoE-7Bx4-v1.0},
  publisher = {HuggingFace},
  journal = {HuggingFace Models},
  version = {1.0},
  year = {2024}
}

Citation

@software{Hafidh_Soekma_Startstreak_7b_alpha_2023,
  author = {Hafidh Soekma Ardiansyah},
  month = oct,
  title = {Starstreak: Traditional Indonesian Multilingual Language Model},
  url = {https://huggingface.co/azale-ai/Starstreak-7b-alpha},
  publisher = {HuggingFace},
  journal = {HuggingFace Models},
  version = {1.0},
  year = {2023}
}
@article{Merak,
  title = {Merak-7B: The LLM for Bahasa Indonesia},
  author = {Muhammad Ichsan},
  publisher = {Hugging Face},
  journal = {Hugging Face Repository},
  year = {2023}
}
@article{Sidrap,
  title = {Sidrap-7B-v2: LLM Model for Bahasa Indonesia Dialog},
  author = {Robin Syihab},
  publisher = {Hugging Face},
  journal = {Hugging Face Repository},
  year = {2023}
}
@misc{Obrolin/Kesehatan-7B,
  author = {Arkan Bima},
  title = {Obrolin Kesehatan},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Obrolin/Kesehatan-7B}},
  version = {0.1},
  year = {2024},
}
@misc{2310.06825,
  author = {Albert Q. Jiang and Alexandre Sablayrolles and Arthur Mensch and Chris Bamford and Devendra Singh Chaplot and Diego de las Casas and Florian Bressand and Gianna Lengyel and Guillaume Lample and Lucile Saulnier and Lélio Renard Lavaud and Marie-Anne Lachaux and Pierre Stock and Teven Le Scao and Thibaut Lavril and Thomas Wang and Timothée Lacroix and William El Sayed},
  title = {Mistral 7B},
  year = {2023},
  eprint = {arXiv:2310.06825}
}