Stefano Fiorucci

anakin87

AI & ML interests

Contributing to Haystack, the LLM Framework ๐Ÿ—๏ธ. NLP / LLMs.

Organizations

Posts 4

view post
Post
357
๐ŸŒŒ Creating adventures with local LLMs

What if ๐Ÿค”... Homer Simpson met Spider-Man and they went on a quest for donuts? ๐Ÿฉ
Or if Fred Astaire and Corporal Hicks teamed up to fight xenomorphs? ๐Ÿ‘พ

In the words of Karpathy, LLMs are dream machines...
they seem specially made to simulate these wild scenarios!

๐„๐ฑ๐ฉ๐ž๐ซ๐ข๐ฆ๐ž๐ง๐ญ๐ข๐ง๐  ๐ฐ๐ข๐ญ๐ก ๐ญ๐ก๐ข๐ฌ ๐ข๐๐ž๐š ๐Ÿ‘‡
Nous Research / @teknium recently released NousResearch/CharacterCodex:
a massive dataset with information on 16k characters, both fictional and real.
I couldn't wait to play it...

After a few attempts, I found that combining the information in this dataset with a good model (like meta-llama/Meta-Llama-3-8B-Instruct) opens the doors to a myriad of chat adventures.

๐Ÿ› ๏ธ Stack:
๐Ÿ”นHaystack for orchestration ๐Ÿ—๏ธ
๐Ÿ”นllamafile ๐Ÿฆ™๐Ÿ—‚๏ธ to run our model locally.

๐Ÿ““ Check out the notebook: https://t.ly/y6jrZ
(includes a bonus ๐Ÿ•ต๏ธ Mystery Character Quiz)
view post
Post
853
๐Ÿงช RAG Evaluation with ๐Ÿ”ฅ Prometheus 2 + Haystack

๐Ÿ“ Blog post: https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2
๐Ÿ““ Notebook: https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/prometheus2_evaluation.ipynb

โ”€โ”€โ”€ โ‹†โ‹…โ˜†โ‹…โ‹† โ”€โ”€โ”€

When evaluating LLMs' responses, ๐ฉ๐ซ๐จ๐ฉ๐ซ๐ข๐ž๐ญ๐š๐ซ๐ฒ ๐ฆ๐จ๐๐ž๐ฅ๐ฌ like GPT-4 are commonly used due to their strong performance.
However, relying on closed models presents challenges related to data privacy ๐Ÿ”’, transparency, controllability, and cost ๐Ÿ’ธ.

On the other hand, ๐จ๐ฉ๐ž๐ง ๐ฆ๐จ๐๐ž๐ฅ๐ฌ typically do not correlate well with human judgments and lack flexibility.


๐Ÿ”ฅ Prometheus 2 is a new family of open-source models designed to address these gaps:
๐Ÿ”น two variants: prometheus-eval/prometheus-7b-v2.0; prometheus-eval/prometheus-8x7b-v2.0
๐Ÿ”น trained on open-source data
๐Ÿ”น high correlation with human evaluations and proprietary models
๐Ÿ”น highly flexible: capable of performing direct assessments and pairwise rankings, and allowing the definition of custom evaluation criteria.

See my experiments with RAG evaluation in the links above.