Attempted Replication

#2
by ibivibiv - opened

I ran this exact mergekit recipe and tried loading the results into TextGen. Nothing but gibberish is produced. Was there anything other than just the mergekit yaml command in use here? Is the recipe missing something? Thanks in advance.

For completeness I did the following:

  • Used the YAML you have in this repo
  • Ran a straight "mergekit-yaml ./mergekit-config.yml ./output/"

The model was merged correctly, but it never seems to generate anything but odd characters, almost as if the tokenizer is wrong. Odd. I don't have the same problem with Venus directly, or with Goliath or Tess.

Are you running it unquantized using transformers? The mergekit config I supplied definitely works, since it's what I used. I used the Jupyter notebook for mergekit rather than the command line, though, so I can't say whether your command-line setup is correct.

I haven't set any quantization, and yes, I'm using transformers. I might have to try the Jupyter notebook to see what is different there. I am very sure what you have worked :D I am assuming you mean this notebook:

https://github.com/cg123/mergekit/blob/main/notebook.ipynb

I am betting something is wrong with the tokenizer given that behavior.

Just closing this out: it was absolutely something in the environment I had set up. I wiped it all clean and set up mergekit again. Using the notebook linked above, I fully replicated the build of the model and it is functioning. If anyone else gets gibberish after merging from the command line, I would advise checking your mergekit setup and/or validating that the tokenizer files were copied correctly. Thanks, nsfwthrowitaway69, for sharing your work and the feedback.
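
For anyone who wants to do the tokenizer check mentioned above, here is a minimal sketch that compares tokenizer files between a source model directory and the merge output by hashing them. It is not part of mergekit. The helper names (`file_digest`, `compare_tokenizer_files`), the example paths, and the list of file names are all assumptions for illustration; a given model may ship only some of these files, and a merge that intentionally changes the tokenizer would legitimately report differences.

```python
import hashlib
from pathlib import Path

# Assumed tokenizer file names; adjust to whatever your source model ships.
TOKENIZER_FILES = (
    "tokenizer.model",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def compare_tokenizer_files(source_dir, merged_dir, names=TOKENIZER_FILES):
    """Report, per file name, whether the merged copy matches the source copy."""
    report = {}
    for name in names:
        src = file_digest(Path(source_dir) / name)
        out = file_digest(Path(merged_dir) / name)
        if src is None and out is None:
            report[name] = "absent in both"
        elif out is None:
            report[name] = "missing in merged"
        elif src is None:
            report[name] = "missing in source"
        elif src == out:
            report[name] = "match"
        else:
            report[name] = "differs"
    return report


if __name__ == "__main__":
    # Hypothetical paths: point these at a source model and the merge output.
    for name, status in compare_tokenizer_files("./source-model", "./output").items():
        print(f"{name}: {status}")
```

Anything reported as "missing in merged" or "differs" would be the first place I'd look if the output is gibberish.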

ibivibiv changed discussion status to closed
