keitokei1994/Llama-3-Umievo-Shizuko-sqlcoder-2x8B


keitokei1994 committed on Jun 9, 2024

Commit a67fb89 · verified · 1 Parent(s): 4d02f86

Create README.md

Files changed (1)

README.md +55 -0

README.md ADDED

	@@ -0,0 +1,55 @@

+---
+license: llama3
+language:
+- ja
+- en
+tags:
+- code
+- sql
+---
+### モデルの説明(English explanation is below.)
+このモデルは、MergeKitツールを使用して作成されたMixture of Experts (MoE) 言語モデルです。
+gguf版(今後拡充予定)は [こちら](https://huggingface.co/keitokei1994/Llama-3-Umievo-Shizuko-sqlcoder-2x8B-gguf) 。
+umiyukiさんが公開している[Llama-3-Umievo-itr014-Shizuko-8b](https://huggingface.co/umiyuki/Llama-3-Umievo-itr014-Shizuko-8b) に、SQLデータセットでファインチューニングされた[defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)を合わせることで、日本語能力とSQL生成能力を両立させようとしたMoEモデルです。
+### モデルの詳細
+- **モデル名**: Llama-3-Umievo-Shizuko-sqlcoder-2x8B
+- **モデルアーキテクチャ**: Mixture of Experts (MoE)
+- **ベースモデル**: umiyuki/Llama-3-Umievo-itr014-Shizuko-8b, defog/llama-3-sqlcoder-8b
+- **マージツール**: MergeKit
+#### 要求スペック
+Q4_K_M量子化モデルであれば、RTX3060 12GBでフルロード可能です。
+筆者はWSL2やGoogle Colaboratory Proでの作成後、llama.cppとLM Studioにて動作確認を行なっています。
+- CPU: Ryzen 5 3600
+- GPU: GeForce RTX 3060 12GB
+- RAM: DDR4-3200 96GB
+- OS: Windows 10
+---
+### Model Description
+This model is a Mixture of Experts (MoE) language model created using the MergeKit tool.
+A GGUF version (more variants are planned) is available [here](https://huggingface.co/keitokei1994/Llama-3-Umievo-Shizuko-sqlcoder-2x8B-gguf).
+This MoE model aims to combine Japanese language ability with SQL generation capability by merging [Llama-3-Umievo-itr014-Shizuko-8b](https://huggingface.co/umiyuki/Llama-3-Umievo-itr014-Shizuko-8b), released by umiyuki, with [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b), which was fine-tuned on an SQL dataset.
+### Model Details
+- **Model Name**: Llama-3-Umievo-Shizuko-sqlcoder-2x8B
+- **Model Architecture**: Mixture of Experts (MoE)
+- **Base Models**: umiyuki/Llama-3-Umievo-itr014-Shizuko-8b, defog/llama-3-sqlcoder-8b
+- **Merge Tool**: MergeKit
+#### Required Specifications
+The Q4_K_M quantized model can be fully loaded on an RTX 3060 12GB.
+The author created the model using WSL2 and Google Colaboratory Pro, and tested it with llama.cpp and LM Studio on the following machine:
+- CPU: Ryzen 5 3600
+- GPU: GeForce RTX 3060 12GB
+- RAM: DDR4-3200 96GB
+- OS: Windows 10
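
As a rough sanity check on the "fits on an RTX 3060 12GB" statement in the README above, the following is a minimal sketch of the arithmetic, not part of the commit. It assumes both experts have the public Llama-3-8B layer shapes, that the merge shares attention and embeddings while duplicating only the MLP blocks (with one small router per layer, as in Mixtral-style models), and that Q4_K_M averages about 4.85 bits per weight. That last figure is an approximation chosen for illustration, and the function names are hypothetical.

```python
# Rough weight-size estimate for a 2-expert MoE built from Llama-3-8B-shaped models.
# Shape constants follow the public Llama-3-8B configuration.
HIDDEN = 4096          # model (embedding) dimension
INTERMEDIATE = 14336   # MLP inner dimension
LAYERS = 32            # transformer blocks
VOCAB = 128256         # vocabulary size
KV_DIM = 1024          # 8 KV heads x 128 head dim (grouped-query attention)


def moe_param_count(num_experts: int) -> int:
    """Count parameters when only the MLP blocks are duplicated per expert."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM   # q, o + k, v projections
    expert_mlp = 3 * HIDDEN * INTERMEDIATE                  # gate, up, down projections
    router = HIDDEN * num_experts if num_experts > 1 else 0  # assumed linear gate
    norms = 2 * HIDDEN                                      # two RMSNorm weights per layer
    per_layer = attention + num_experts * expert_mlp + router + norms
    embeddings = 2 * VOCAB * HIDDEN                         # input embeddings + output head
    return LAYERS * per_layer + embeddings + HIDDEN         # + final norm


def quantized_size_gb(params: int, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = moe_param_count(num_experts=2)
    # 4.85 bits/weight is an assumed average for Q4_K_M, not a measured value.
    size = quantized_size_gb(params, bits_per_weight=4.85)
    print(f"parameters: {params / 1e9:.2f}B, Q4_K_M weights: ~{size:.1f} GB")
```

Under these assumptions the weights come to well under 12 GB, which leaves some room for the KV cache and runtime buffers. Actual usage depends on context length and the inference backend.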