Lajonbot (Lajonbot)

s3nh

posted an update 3 days ago

Post

Welcome back,

Small language model enthusiasts and GPU-poor OSS enjoyers, let's connect.
I just created an organization whose main goal is to have fun with smaller models that can be tuned on consumer-grade GPUs. Feel free to join and let's have some fun, much love ;3

https://huggingface.co/SmolTuners

3 replies

·

s3nh

posted an update 11 months ago

GPU Poor POV: Burnout

Sometimes we do not have the energy to post about AI and new methods.
And that's totally OK, I guess.
Remember to sleep well and drink a lot of water. Have a great day :D <3

2 replies

·

s3nh

posted an update 11 months ago

GPU Poor POV: Quantization

Today I want to share my plug-and-play notebook code,
which has helped me a lot on my quantization journey.
I hope you'll find it interesting; it could be a good starting point to
convert some of your awesome models to GGUF :)

Have a great day <3

https://s3nh.bearblog.dev/gpu-poor-pov-gguf-snippet/

6 replies

·

s3nh

posted an update 11 months ago

GPU Poor POV: Willingness of Customization

I love libraries that let you customize a lot of things. ChromaDB is my database of choice when it comes to storing embeddings. The cool feature is that you can define your own embedding function, which can be supplied whenever a ChromaDB collection is initialized or created. This is useful because sometimes we want to use different prompts or different models, and it can easily be written as a subclass of the EmbeddingFunction class.

Edit:

My CustomEmbeddingFunction can be found here:
https://gist.github.com/s3nh/cfbbf43f5e9e3cfe8c3e4e2f0d550b80

You can use it when initializing or getting the Chroma collection:

import os

import chromadb
from your_custom_fn import CustomEmbeddingFunction


class ChromaStorage:
    def __init__(self, config):
        self.config = config
        self.client = self.init_client()
        self.embedding_function = CustomEmbeddingFunction()

    def check_config(self):
        # Fail early if the configured storage path is missing.
        if not os.path.exists(self.config.path):
            raise ValueError('Provided path does not exist!')

    def init_client(self):
        # The persistent client stores the database on disk at config.path.
        return chromadb.PersistentClient(path=self.config.path)

    def init_collection(self, name: str):
        # The custom embedding function is attached to the collection here.
        return self.client.get_or_create_collection(
            name=name, embedding_function=self.embedding_function
        )
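
For readers who have not opened the gist, here is a minimal sketch of what a custom embedding function could look like. It is not the CustomEmbeddingFunction from the gist: the class name PromptedHashEmbedding, the prompt parameter and the hashing trick are all assumptions made for illustration, and a real version would subclass chromadb's EmbeddingFunction and call an actual embedding model. The only thing taken from Chroma is the calling shape, where `__call__` receives a list of documents as `input` and returns one vector per document.

```python
import hashlib
import math
from typing import List


class PromptedHashEmbedding:
    """Toy stand-in for a custom embedding function (hypothetical name).

    In a real project this would subclass chromadb's EmbeddingFunction and
    run a proper embedding model instead of hashing tokens.
    """

    def __init__(self, prompt: str = "", dim: int = 16):
        self.prompt = prompt  # text prepended to every document before embedding
        self.dim = dim        # size of the output vectors

    def _embed_one(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in (self.prompt + " " + text).lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            # Hash each token into one of `dim` buckets and count it.
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        # L2-normalize; fall back to 1.0 so empty input does not divide by zero.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def __call__(self, input: List[str]) -> List[List[float]]:
        # One vector per input document, in the same order.
        return [self._embed_one(doc) for doc in input]


embedder = PromptedHashEmbedding(prompt="passage:", dim=16)
vectors = embedder(["hello world", "gpu poor fine-tuning"])
print(len(vectors), len(vectors[0]))
```

Swapping the prompt or the model then means changing only this class, while the ChromaStorage wrapper stays the same.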

3 replies

·

s3nh

posted an update 11 months ago

GPU Poor POV: Don't be Afraid :D

Sometimes we don't want to do something because of low self-esteem.
I often hear 'it's too hard for me', 'I am not an expert', 'I do not know how to do it', etc. These words are never the truth. We should not be afraid; we should try to build something, because there is no added value without failure.

The same goes for LLMs: there are a lot of fancy words flying around, but what is more important is that there are also people who are constantly building so that others can build. Diving into fine-tuning LLMs is incredibly simple if we use the axolotl library and pretrained models stored on Hugging Face.

All we need is an idea, our GPU Poor desktop or Colab notebooks, and these steps:

git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl

pip3 install packaging
pip3 install -e '.[flash-attn,deepspeed]'

After the installation we can go to the examples and modify the configs to fit our own needs.
Let's jump into

axolotl/examples/llama-2/qlora.yml

and change

base_model: NousResearch/Llama-2-7b-hf

to

base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Choose a dataset from the huge number available at hf.co/datasets and tweak additional params like batch_size, the number of epochs, how often we want to save our model, and many more (which I won't focus on rn).
Then,

accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml

This starts the fine-tuning process with a setup defined strictly by you. After fine-tuning, the model will be saved in the path provided in the config, and you can check whether it performs better than the base one. You can even put it on the LLM Leaderboard to check whether we have a new SOTA :)
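
If you would rather script the base_model swap than edit the file by hand, a small helper along these lines could do it. This is only a sketch: the function name set_base_model is made up, the two-line sample config is illustrative and not the real contents of qlora.yml, and it uses plain text replacement instead of a YAML parser so that it needs nothing beyond the standard library.

```python
import re


def set_base_model(config_text: str, new_model: str) -> str:
    """Return config_text with its top-level base_model value replaced.

    Hypothetical helper: works on the raw text, so the rest of the file
    (comments, key order) is left untouched.
    """
    pattern = re.compile(r"^base_model:.*$", flags=re.MULTILINE)
    if not pattern.search(config_text):
        raise ValueError("no top-level base_model key found")
    # A function replacement avoids any escape handling in the model name.
    return pattern.sub(lambda _m: f"base_model: {new_model}", config_text, count=1)


# Illustrative fragment only, not the full example config.
sample = "base_model: NousResearch/Llama-2-7b-hf\nsequence_len: 4096\n"
print(set_base_model(sample, "TinyLlama/TinyLlama-1.1B-Chat-v1.0"))
```

To apply it to the real file, read examples/llama-2/qlora.yml as text, pass it through the helper, and write the result back before running the accelerate command.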
Have fun and have a great day <3

4 replies

·

s3nh

posted an update 11 months ago

GPU Poor POV: My storytelling choices of the week

It's the end of the week, so I decided to summarize my observations on community-built LLMs and mention a few models that are very interesting and capable of creating insightful stories despite their relatively lightweight form.

I personally have not used LLMs in my daily routine for tasks like function calling, parsing, or code assistance. What I have tried them for is storytelling, because it always amazes me how differently these models handle different tasks, how well they generalize stories, and sometimes how much creativity they carry.

BlueNipples/DaringLotus-v2-10.7b: its main target is to generate prose. Quoting the author: 'It shares it's good prose, and relatively decent coherency, being a little bit more on the side of prose, and a little bit less on the side of coherency. I like this model for generating great prose if I feel like regening a bit.'

https://huggingface.co/NeuralNovel/Aeryth-7B-v0.1
Great work by @NeuralNovel. I really like how flexible this model is; there is no strict focus on a certain role, so it is definitely worth a try. I would love to hear more about the dataset it was trained on; afaik it is private rn. It is best suited for Science Fiction, History & Romance genres due to the training data used.

And the last one for today is FPHam/Sydney_Pirate_Mistral_7b. @FPHam's work always amazes me with how well the models stick to the provided role. Awesome work as always, I'll for sure use this model to generate some interesting stories.

I know the hype train is moving fast, but from what I observe, people here on Hugging Face are creating really creative models that are definitely worth trying. Have a great day <3

7 replies

·

s3nh

posted an update 11 months ago

GPU Poor POV: Low Hanging Fruits

Sometimes we have to work with a language other than English (what a surprise!), and it can be problematic because, as you may know, many algorithms are developed mainly for English.
I was involved in building RAG for the Polish language. First, we needed proper embeddings for Polish to feed into a lightweight LLM.
Looking through possible solutions, I became aware that the available models were not accurate enough and worked much worse than their English equivalents.
The first thing that comes to mind is:

Let's become mad scientists, download all possible data, and train a model for months to get a proper one.

But there are a few cons to this:
- It's computationally heavy.
- You are not a full-time researcher.
- You have potential clients who want to use your solution, and they are really happy to use it (in an optimistic mood).
Here come the low-hanging fruits.
We developed an easier, workable solution. Instead of training a new SOTA model, we can use a translation model like this one:

Helsinki-NLP/opus-mt-pl-en
to translate your knowledge base to English, and then use a proper English embedding model that works accurately.
I converted the existing model using ctranslate2,

ct2-transformers-converter --model Helsinki-NLP/opus-mt-pl-en --output_dir opus-mt-pl-en

so inference is not heavy (we observed a 5x speedup compared to the original version).

And by indexing the knowledge base, we can return the answer to the LLM in any language. (The indexes of the context found in English are equal to the indexes in the native-language knowledge base.)
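
The index trick can be shown with a toy example. Everything here is a stand-in chosen for illustration: the lookup table TOY_TRANSLATIONS replaces the real opus-mt-pl-en model, the bag-of-words cosine similarity replaces a real English embedding model, and the sample sentences and function names are made up. What it does show is the alignment itself: documents are translated in order, search runs on the English copies, and the winning index is used to pull the original Polish text.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stand-in for a Polish -> English translation model.
TOY_TRANSLATIONS: Dict[str, str] = {
    "Kot lubi mleko.": "The cat likes milk.",
    "Termin faktury to 14 dni.": "The invoice due date is 14 days.",
    "Jaki jest termin faktury?": "What is the invoice due date?",
}

DOCS_PL: List[str] = ["Kot lubi mleko.", "Termin faktury to 14 dni."]


def translate_pl_en(text: str) -> str:
    return TOY_TRANSLATIONS[text]


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_native(query_pl: str, docs_pl: List[str]) -> str:
    # Translating in order keeps docs_en[i] aligned with docs_pl[i].
    docs_en = [translate_pl_en(doc) for doc in docs_pl]
    doc_vecs = [embed(doc) for doc in docs_en]
    query_vec = embed(translate_pl_en(query_pl))
    best = max(range(len(docs_en)), key=lambda i: cosine(query_vec, doc_vecs[i]))
    # Search happened in English; the answer comes back in the native language.
    return docs_pl[best]


print(retrieve_native("Jaki jest termin faktury?", DOCS_PL))
```

Because only the index crosses the language boundary, the English copies never have to be shown to the user.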

Of course, some tweaks are required: we have to validate the accuracy of the translation.

It was a nice episode: our work is done and there are people who can use it, so added value exists.
Have a great day and I wish you more effective deploys! <3

4 replies

·

s3nh

posted an update 11 months ago

GPU Poor POV: Building a RAG which solves a specific task.

Everyone loves benchmarks.
They are great because we get a standardized approach and a competitive feeling. But if you work in a specific area and are trying to implement some LLM/RAG use case, these benchmarks cannot exactly reflect the data you have to deal with.

I built a RAG system on a bunch of niche procedures, regulations, etc., which can finally be deployed as a virtual assistant to minimize the effort of searching through a lot of documentation manually.

I tested a lot of different methods, models, pretrains, and finetunes, and what's interesting is that the final solution, which was scored by human feedback, is based on relatively low-parameter models with multitask ability.
Something like:

BAAI/llm-embedder

LLMs help summarize the chunks of the knowledge base that were found. This does not require a model with a high number of params, because a tradeoff between inference time and accuracy has to be made. Some lightweight models are able to perform certain tasks based on instructions, so e.g. Qwen 7B or Mistral 7B (not the MoE one) handled the task really nicely. And what is more important is that, overall, we are able to deploy a RAG system for smaller tasks in a specific area. It can be used by people who need it, and it gives added value and positive feedback, which IMO is what the whole building process is about.
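
The summarization step can be sketched as packing the best retrieved chunks into a prompt for the lightweight LLM. This is an assumption about how such a step might look, not the deployed system: the function name build_summary_prompt, the prompt wording, and the top_k and max_chars limits are all hypothetical, and the similarity scores are expected to come from whatever embedding model does the retrieval.

```python
from typing import List, Tuple


def build_summary_prompt(
    question: str,
    scored_chunks: List[Tuple[float, str]],
    top_k: int = 3,
    max_chars: int = 1500,
) -> str:
    """Pack the highest-scoring chunks into a prompt (hypothetical helper).

    scored_chunks holds (similarity score, chunk text) pairs from retrieval.
    """
    best = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:top_k]
    parts: List[str] = []
    used = 0
    for _score, chunk in best:
        # Keep the context short: a small model and fast inference are the goal.
        if used + len(chunk) > max_chars:
            break
        parts.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(parts)
    return (
        "Summarize the context to answer the question.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_summary_prompt(
    "Who approves the request?",
    [(0.2, "ALPHA"), (0.9, "BETA"), (0.5, "GAMMA")],
    top_k=2,
)
print(prompt)
```

The character budget is the knob for the inference time versus accuracy tradeoff: a smaller budget means a faster answer built on less context.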

Have a great day and think about the problem your models have to solve <3