New Config file
hiya
NICE WORK!
Can I ask a question?
How do you add the talk head to the config? By changing the architecture from Mistral to the new Quiet? It also has the remote code, which I have seen in some models. Is the remote code the new head, and how is it applied to the model? I also had a model which I merged from a YARN model that extended the context to 128k with RoPE, and it also had these remote code files. Are these new heads performing new extensions, given that the layer count is the same, 32 (7B)?
Also nice work, as I was also wondering how to add the thoughts. Was it just a fine-tuning dataset?
I'm currently working on fixing the modeling_code provided and adding easy inference examples. I will update on twitter when this is ready for inferencing/training.
There is an inference.py file that takes advantage of all that now. I need to fine-tune it - as it was pre-trained on a pure math dataset, and it wants to make everything a math equation.
Yes, I noticed you updated the train method to a more standard one with a smaller dataset, as it was causing issues, and created the Quiet version of the architecture etc. Very nice. I think I saw the RoPE inside also. Question again, lol.
The head that was created in the GitHub repo: is it inside the modeling file? And will this new train method populate the new tensors created by the new head, which you filled when you did your first train? I had problems using his method. I eventually got the model into memory but was unable to save it, as it said there were no values in the tensors yet, hence requiring a first train. Hence his customized training function was tokenizing the data and also pushing it through the new tensors. I think if I merged a model, maybe with TIES or SLERP, using yours as the base model, the new model would get the tensors from your model in the merge and still be able to merge with another LLM?
But it's better to weave the new architecture in, i.e. installing the head with a first train. (His dataset was too large: it kept loading forever in Colab, then the storage filled up and it crashed in the last stages!) Hence it would help if there was a working notebook to train a model.
But very good work. This project of adding the head is an important step, as later, by changing the tokenizer, we can ADD another form of input to the model (using existing tokenizers, e.g. CLIP), enabling input heads, and then later we can do output heads. Maybe?
Working on a new one from scratch that will be far more feasible for downstream tasks. But slerp might work.
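For anyone following along, here is a minimal sketch of what SLERP does to one pair of weight vectors. It is plain Python for illustration and is not the code of any merge tool; the function name and the fallback to linear interpolation are my own choices. Note that it needs two tensors of the same shape, so a tensor that exists in only one model (such as a new head) has no partner to interpolate with, and how that case is handled depends on the merge tool.

```python
import math


def slerp(a, b, t, eps=1e-8):
    """Spherical interpolation between two equal-length weight vectors.

    t=0 returns a, t=1 returns b. Falls back to plain linear
    interpolation when the vectors are (anti)parallel or one is zero.
    """
    if len(a) != len(b):
        raise ValueError("SLERP needs tensors of the same shape")

    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    linear = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a < eps or norm_b < eps:
        return linear

    # Angle between the two vectors, clamped against rounding error.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return linear

    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


halfway = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

In a real merge this would run once per matching tensor, with `t` possibly varying by layer.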
Yes, hence the LoRA is a repeatable approach, the easy way: then anybody can duplicate the repo and change the adapter config from the Mistral base instruct to their own Mistral clone. But there is also the training script and prompt template, as the new prompt template should handle the thoughts? Or will they exist in a separate response head to be enabled or disabled in the kwargs? I think that's how it was. Really it's like Worzel Gummidge, the scarecrow from English TV: he had heads for everything, i.e. he changed his head for thinking, as his basic head was dumbo! Now I get it, he must have been a robot!! (1960's)
safetensors/convert convert back to safe tensors
I have spent a whole day in the cloud trying to make this work, my friend.
The startup script has so many mini errors!
At one point I actually had the model attached to the LLM, but I could not save it because the tensors were empty!
After that I tried the training, only to find that it used two LLMs and loaded them individually, running the GPU out of RAM!
OK, I tried again, and maybe I was somewhere close, but the trainer would not work. So it would not let me save the LoRA or the merged version of the model!
So I get it now, a little bit.
The configuration .py files need to be with the model being loaded, with trust_remote_code set to true!
But it also needs to stay as the standard Mistral model, as it will be loaded by the code as the customized Mistral (the Quiet model). You should not change the signatures of the functions; they should believe they are the Mistral model, like the installation of YARN! This enables the inheritance to occur correctly and allows the model to be saved with save_pretrained, which is in mistral_modeling.py, as it inherits from the parent interface!
Once it is loaded, training is no problem.
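A rough sketch of how the remote code gets wired in, as far as I understand it: the repo's config.json carries an `auto_map` entry that points the `Auto*` classes at the .py files sitting next to the weights, and `trust_remote_code=True` is what lets transformers import them. The file names `configuration_quiet.py` and `modeling_quiet.py` and the example config below are assumptions for illustration; only the class name `QuietForCausalLM` comes from this thread.

```python
def remote_code_classes(config: dict) -> dict:
    """Return the auto_map entries, i.e. which Auto* class maps to which
    'module.ClassName' shipped inside the model repo. Empty if none."""
    return dict(config.get("auto_map", {}))


def load_with_remote_code(repo_id: str):
    """Load a model whose class lives in the repo's own .py files.

    transformers is imported lazily so the helper above stays usable
    without it. Only use trust_remote_code with repos you trust.
    """
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)


# Hypothetical config.json contents (file names are assumed):
example_config = {
    "model_type": "mistral",
    "architectures": ["QuietForCausalLM"],
    "auto_map": {
        "AutoConfig": "configuration_quiet.QuietConfig",
        "AutoModelForCausalLM": "modeling_quiet.QuietForCausalLM",
    },
}

custom_classes = remote_code_classes(example_config)
```

If `auto_map` is missing, the `Auto*` classes fall back to whatever built-in class matches `model_type`.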
For each dataset example you need to generate a thought. This can be pushed into the prompt completion: instruct, input, response, Thoughts (the generated thought, i.e. the reflection on or analysis of the current task).
The thought generation can be tailored for roleplay (emotive) or for reasoning (step-by-step thoughts), as these thoughts will also help to generate future thoughts.
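Something like the following is what I mean by pushing the thought into the prompt completion. It is only a sketch: the section headers, the field names, and the sample row are made up for illustration, and the field order simply follows the list above.

```python
# Hypothetical template; the header wording is an assumption.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}\n\n"
    "### Thoughts:\n{thought}"
)


def format_example(example: dict) -> str:
    """Render one dataset row, including its generated thought."""
    return TEMPLATE.format(
        instruction=example["instruction"],
        input=example.get("input", ""),  # input is optional in many datasets
        response=example["response"],
        thought=example["thought"],
    )


sample = {
    "instruction": "Add the numbers.",
    "input": "2 and 3",
    "response": "5",
    "thought": "This is simple addition: 2 + 3.",
}
text = format_example(sample)
```

A roleplay dataset would fill `thought` with an emotive reflection instead of a step-by-step one; the template stays the same.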
So the problem is that the model init script and mistral_modeling.py need to be aligned with the original: save_pretrained, loading with AutoModelForCausalLM, etc.
Personally I wish it was VB, or it would have been completed. I'm not a Python programmer, hence being stuck!
This very much is not ready. I am updating it constantly, and many of the updates don't work.
I will have a new completed model by the end of this week.
I updated the inference code so you can at least try out proper inference.
I developed LeroyDyer/Mixtral_AI_CyberBrain_3_0, embedding tensors within it during training, resulting in responsive thoughts. However, an issue arises with its dependency on wandb within mistral_modelling, obligating its use during training, as stipulated in the original file. The tokenizer was enhanced to encompass thought tokens, inevitably mandating uniqueness in the model. However, in mergekit, it was observed that the tensors no longer aligned with the standard set. Loading the model with part of the script revealed discrepancies in the arguments list, particularly in Colab. Although the headers were present upon initial upload, they vanished post-training, despite merging in Unsloth, which is suboptimal due to its reluctance toward custom configurations. While my model functions similarly to yours, it encountered obstacles in merging and saving (LoRA).
Moreover, I encountered challenges in running inference on the model in Unsloth due to issues in loading the config from his two Mistral Python files. Consequently, I had to patch the transformers base to ensure functionality. Similarly, I integrated your files into the source (models/mistral) before compilation, successfully loading them as well.
Regrettably, the training failed to populate the thinking heads even once. Although the model continuously outputted unknowns, I presumed it merely required further training. Incorporating these additional nodes into the network proves quite intricate. I amalgamated them into my latest iteration, LeroyDyer/Mixtral_AI_Cyber_MegaMind_2_0 24b, hoping to incorporate thoughts. However, it fails to load on my machine, much to my dismay. But it loads in the cloud!
Is it possible to use this with oobabooga?
Traceback (most recent call last):
  File "/home/weiss/text-generation-webui-dev/modules/callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/weiss/text-generation-webui-dev/modules/text_generation.py", line 389, in generate_with_callback
    shared.model.generate(**kwargs)
TypeError: QuietForCausalLM.generate() missing 1 required positional argument: 'input_ids'
Output generated in 0.42 seconds (0.00 tokens/s, 0 tokens, context 16, seed 1279181695)
For now I see this error.
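The traceback shows `generate(**kwargs)` being called without any `input_ids` entry, while `QuietForCausalLM.generate()` requires one. A possible stopgap, sketched below, is a small shim that renames the keyword before the call. This assumes the web UI passes the prompt tensor under another keyword such as `inputs`, which I have not verified; `StubQuietModel` is a stand-in class for illustration and not the real model.

```python
class StubQuietModel:
    """Stand-in with the same kind of signature the traceback implies:
    input_ids is required, everything else is optional."""

    def generate(self, input_ids, **kwargs):
        # Pretend to generate by appending one dummy token id.
        return list(input_ids) + [0]


def generate_compat(model, **kwargs):
    """Call model.generate(), mapping an `inputs` keyword (assumed name)
    onto the `input_ids` argument the model insists on."""
    if "input_ids" not in kwargs and "inputs" in kwargs:
        kwargs["input_ids"] = kwargs.pop("inputs")
    return model.generate(**kwargs)


model = StubQuietModel()

# Reproduce the failure: the prompt is passed, but under the wrong name.
try:
    model.generate(inputs=[1, 2, 3])
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# The shim makes the same call succeed.
output = generate_compat(model, inputs=[1, 2, 3], max_new_tokens=1)
```

The cleaner fix would be on the model side, by making `generate()` accept the same arguments as the stock transformers method.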
Soon, but it will be a new model within the Cognitive Computations repository. This is very much a "watch me build in public" repo atm. Will likely have the finished and bug-fixed base model released sometime tomorrow/Sunday morning.
I noticed that if you load without trust_remote_code (which is what invokes the files), you should be able to change the model (setting the model type to mistral) and the heads will still load, as they have settings in the config. This is how you can adjust the heads; they still respond. The only thing left is to mask the thought tokens, which you can handle in script. I have now figured out how to add special tokens to the tokenizer:
# Update Tokenizer
model.resize_token_embeddings(len(tokenizer))
model.tokenizer = tokenizer
I.e. the tokenizer now matches the LLM's input embeddings. In general most tokenizers load with a vocabulary of 32000 for Mistral, but for hybrid models there are often extra special tokens to add.
If you notice special tokens appearing in the output, you can add them and have the tokenizer strip them out, i.e. they are silent tokens.
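A slightly fuller sketch of the same idea, written against any Hugging Face style tokenizer and model passed in by the caller. The token strings `<|startthought|>` and `<|endthought|>` are assumed names for illustration; use whatever your model was actually trained with.

```python
# Assumed token names, not confirmed by this thread.
THOUGHT_TOKENS = ["<|startthought|>", "<|endthought|>"]


def add_thought_tokens(tokenizer, model, tokens=THOUGHT_TOKENS):
    """Register extra special tokens and grow the embedding matrix to match.

    Returns how many tokens were actually new to the vocabulary.
    """
    num_added = tokenizer.add_special_tokens(
        {"additional_special_tokens": list(tokens)}
    )
    if num_added > 0:
        # Embedding rows must cover every id the tokenizer can now emit.
        model.resize_token_embeddings(len(tokenizer))
    return num_added


def decode_silently(tokenizer, token_ids):
    """Decode while dropping special tokens, so the thought markers
    never show up in the visible output."""
    return tokenizer.decode(token_ids, skip_special_tokens=True)
```

The newly added embedding rows start out untrained, so the tokens only become meaningful after some fine-tuning.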
I also noticed that you can take your LLM for a ride!
There are some model types which you might like to switch to, so you can load them with your weights. I.e. for this model you can load your own model with the remote code, then save it with save_pretrained, then upload it to the hub! It will be adjusted, and the config will also be adjusted to accommodate the extra functionality, so when you go to the gym and train the model these items will be activated, as long as you create the correct datasets. E.g. you might want to train on pictures: create an image-to-text encoder/decoder model, using the image encoder as the encoder and your LLM as the decoder. Once you have made the encoder/decoder model, you can save it with save_pretrained and see the new functionality. Be careful, as it often overrides the model to be the new model, but you can fix it by returning it to a mistral. The new config will still be usable if you instantiate an image-to-text model!
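"Returning it to a mistral" can be done by hand in config.json. A minimal sketch, assuming the only thing that needs resetting is the `model_type` field (other fields, such as `architectures`, may need the same treatment). The `max_thoughts` key and the starting value below are made-up demo data.

```python
import json
import tempfile
from pathlib import Path


def set_model_type(model_dir, model_type="mistral"):
    """Rewrite model_type in <model_dir>/config.json, leaving every other
    setting untouched. Returns the previous value."""
    path = Path(model_dir) / "config.json"
    config = json.loads(path.read_text())
    previous = config.get("model_type")
    config["model_type"] = model_type
    path.write_text(json.dumps(config, indent=2))
    return previous


# Demo on a throwaway directory with a made-up config.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"model_type": "vision-encoder-decoder", "max_thoughts": 3}
    (Path(tmp) / "config.json").write_text(json.dumps(demo))

    previous = set_model_type(tmp, "mistral")
    updated = json.loads((Path(tmp) / "config.json").read_text())
```

Keep a backup of the original config.json before editing, since a wrong `model_type` can stop the model from loading at all.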
LeroyDyer/Mixtral_AI_Cyber_Q <<< This guy can be image/text to text (encoder/decoder), Mistral, or Quiet (if I load the remote files).
Did you add the YARN additions to the model to extend the context? (It will also need to be in the actual neural network itself.) Mine still has a few errors (I think the guy was referencing some files from the transformers git base, hence the errors). I'm not great in Python, but it's becoming familiar!!