---
license: apache-2.0
datasets:
- HuggingFaceTB/cosmopedia
- databricks/databricks-dolly-15k
- Open-Orca/OpenOrca
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-generation
---

# WikiChat-v0.2

Training-in-progress conversational model. The uploaded GGUFs are full FP32 precision.

Trained on OpenOrca GPT-4 data, with Cosmopedia as supplementary data and dolly-15k for instruction following.

## Model Details:

- 83.59M parameters (83,591,800)
- 8 attention heads
- 40 layers
- 384 embedding size
- 4096/8192/16384 context (please use 2x/4x RoPE scaling for the longer contexts; a 16k fine-tuned version may be trained later)
- Batch size 16
- llama.cpp (train-text-from-scratch)

## Prompt Format (Alpaca):

```
Instruction: {system}
Input: {prompt}
Response: {response}
```

Please structure your prompts in an instruct format for maximum performance. A minimal prompt-building and inference sketch appears at the end of this card.

## Training Details:

- 1x RTX 3070 8GB (inference speed: 80 tok/s with full GPU offload)
- 1x Ryzen 7 3700X
- 96 GB RAM
- 10 iterations
- Loss target = 2.5 to 3.0
- Approx. 480 samples / 1M training tokens (>0.0001 epochs)
- Training data: refer to the OpenOrca page

## Notes:

The model isn't ready yet; this upload is a test of OpenOrca tokenization and of the balance between training speed and model size.

## Example output:

```
User: What is the square root of 4?
```

```
Assistant: The square root of 4 is 2.
```
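
## Usage sketches

A minimal sketch of assembling the prompt format described above. `build_prompt` is a hypothetical helper name used here for illustration, not part of this repository; the `Response:` line is left open for the model to complete.

```python
def build_prompt(system: str, prompt: str) -> str:
    """Assemble an Alpaca-style prompt (hypothetical helper for illustration)."""
    return (
        f"Instruction: {system}\n"
        f"Input: {prompt}\n"
        "Response:"
    )


if __name__ == "__main__":
    print(build_prompt(
        "You are a helpful assistant.",
        "What is the square root of 4?",
    ))
```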
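Since the GGUFs target llama.cpp, the sketch below uses the llama-cpp-python bindings. The model filename is a placeholder, and `rope_freq_scale=0.5` assumes linear RoPE scaling to reach 8192 context from the 4096 base (0.25 would target 16384); adjust `n_gpu_layers` to your hardware.

```python
from llama_cpp import Llama

# Placeholder filename; point this at the FP32 GGUF from this repo.
llm = Llama(
    model_path="wikichat-v0.2-f32.gguf",
    n_ctx=8192,            # extended context window
    rope_freq_scale=0.5,   # linear RoPE scaling: 0.5 => 2x the 4096 base context
    n_gpu_layers=-1,       # offload all layers to the GPU; lower this on small VRAM
)

prompt = (
    "Instruction: You are a helpful assistant.\n"
    "Input: What is the square root of 4?\n"
    "Response:"
)

out = llm(prompt, max_tokens=64, stop=["User:", "Instruction:"])
print(out["choices"][0]["text"].strip())
```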