Link to dataset?
Is this a non-buggy, uncensored Vicuna dataset without the stopping issue?
https://huggingface.co/datasets/gozfarb/ShareGPT_Vicuna_unfiltered? I heard it has a stopping issue.
That's what I was told :/ I haven't tried it.
The W&B description says you're using ShareGPT_2023.05.04v0_Wasteland_Edition.json, which doesn't include this update referenced in the dataset README:
2023.05.08v0 - Removed ~7 instances of the default stopping token from a single conversation in Wasteland Edition. It did not appear in the NoUnicode version.
I doubt the 7 instances would be a cause of early stopping.
Also, the stopping token comes from the Vicuna training code. Since you're using ooba it may not be an issue, but I'm not sure how that's set up.
Yeah, I was already 7 hours in when that update was released. I didn't feel like restarting training just for 7 references out of a few hundred megabytes of training data.
Let me know if you guys get stopping issues ^_^
Love this LoRA so far. Massive respect for the continual training and updates. Been following this for a few weeks and updating my LoRA collection as new releases come out. 40680 is the latest as of this writing; just downloaded it. The previous LoRA proved highly effective. Can't wait to see the endgame release.
The successor to the Alpacino mixes might get a Vicuna variant thanks to this project, so many thanks.
Gozfarb has left Hugging Face, and his datasets are gone too :( @Neko-Institute-of-Science do you happen to have a copy you can re-upload?