Equinox Elahin

EquinoxElahin

AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago
le-leadboard/OpenLLMFrenchLeaderboard

Organizations

None yet

EquinoxElahin's activity

reacted to nicolay-r's post with 👀 12 days ago
📢 If you're working in the relation extraction / character network domain, the following post should be relevant.
Excited to share the most recent milestone: the release of ARElight 0.25.0 🎊

Core library: https://github.com/nicolay-r/ARElight
Server: https://github.com/nicolay-r/ARElight-server

🔎 What is ARElight? It is a granular viewer of sentiments between entities in massively large documents and collections of texts.
In short, it extracts contexts that mention object pairs for the related prompting / classification.
In the slides below we illustrate the ARElight application to sentiment classification between object pairs in context.

In the demo we use DeepPavlov NER models + Google Translate + a BERT-based classifier. The bash script for launching the quick demo illustrates how these components are applied.

The new update provides a series of new features:
✅ SQLite support for storing all the extracted samples
✅ Support for an enhanced GUI for content investigation
✅ Switch to external no-string projects for NER and translation

Supplementary materials:
📜 Paper: https://link.springer.com/chapter/10.1007/978-3-031-56069-9_23
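
The post does not show ARElight's own API, so the sketch below is a minimal, library-agnostic illustration of the core idea described above: pairing co-mentioned objects with their surrounding context so they can later be prompted or classified. The `dummy_ner` helper is a stand-in for a real NER component such as DeepPavlov.

```python
# Illustrative sketch only (not ARElight's actual API): extract contexts with
# mentioned object pairs for downstream prompting / classification.
from itertools import combinations

def extract_pair_contexts(sentences, ner):
    """sentences: list of str; ner: callable mapping a sentence to its entity mentions."""
    samples = []
    for sent in sentences:
        for a, b in combinations(ner(sent), 2):
            # Keep the object pair plus its sentence as context, ready for
            # prompting or a BERT-style pair-in-context classifier.
            samples.append({"subject": a, "object": b, "context": sent})
    return samples

# Toy usage with a dummy NER stand-in (a real pipeline would use e.g. DeepPavlov NER).
dummy_ner = lambda s: [w.strip(".") for w in s.split() if w[:1].isupper()]
print(extract_pair_contexts(["Alice praised Bob in Paris."], dummy_ner))
```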
reacted to davidberenstein1957's post with 🚀 about 2 months ago
For anyone who struggles with NER or information extraction with LLMs:

We showed an efficient workflow for token classification, including zero-shot suggestions and model fine-tuning, with Argilla, GLiNER, the NuMind NuExtract LLM, and SpanMarker. @argilla

Video: https://youtu.be/JvLpaYgNd84?feature=shared
Notebooks and slides are included so you can try it yourself 🙂
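
As a quick illustration of the zero-shot suggestion step, here is a minimal GLiNER sketch; the checkpoint, label set, and threshold are example choices, not necessarily the ones used in the video.

```python
# Minimal GLiNER sketch for zero-shot entity suggestions; checkpoint, labels,
# and threshold below are illustrative assumptions, not the workflow's exact setup.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Argilla was founded in Madrid and later joined Hugging Face."
labels = ["person", "organization", "location"]

# Zero-shot prediction: the label set is supplied at inference time, no fine-tuning needed.
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(ent["label"], "->", ent["text"])
```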
reacted to AdinaY's post with 👀 about 2 months ago
reacted to nicolay-r's post with 👀 2 months ago
📢 If you aim to process complex dependencies in spreadsheet data with the LLM Chain-of-Thought technique, this update might be valuable for you 💎

The updated 📦 bulk-chain 0.24.1, aimed at iterative processing of CSV/JSONL data with no-string dependencies on third-party LLM frameworks, is out 🎉

📦: https://pypi.org/project/bulk-chain/0.24.1/
🌟: https://github.com/nicolay-r/bulk-chain
📘: https://github.com/nicolay-r/bulk-chain/issues/26

The key feature of bulk-chain is SQLite caching, which saves your time ⏰ and money 💵 by guaranteeing that no data is lost, which is important when using remote LLM providers such as OpenAI, ReplicateIO, OpenRouter, etc.

🔧 This release has the following updates:
✅ Improved stability for various header conditions and the related SQLite support
✅ Manual setup of the ID column / ID assignment
✅ Dynamic CSV-related setup, delegated to the underlying Python 📦 csv package

Quick start on Google Colab:
📙: https://colab.research.google.com/github/nicolay-r/bulk-chain/blob/master/bulk_chain_tutorial.ipynb

Below is an example of the three simple steps in pictures:
1. ⬇️ Package installation
2. ✍️ Declaring schema
3. 🚀 Launching inference for your data with Replicate and 🤖 meta-llama/Llama-3.1-405B
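
The caching behaviour described above is the interesting part, so here is a conceptual sketch of it in plain sqlite3. This is not bulk-chain's actual API; it only shows why per-record SQLite caching means an interrupted run never repeats paid remote calls.

```python
# Conceptual sketch of the SQLite-caching idea (NOT bulk-chain's actual API):
# cache each record's LLM answer so reruns after a crash cost nothing extra.
import sqlite3

def cached_infer(rows, infer_fn, db_path="cache.sqlite"):
    """rows: iterable of (row_id, prompt); infer_fn: callable prompt -> answer."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS cache (id TEXT PRIMARY KEY, answer TEXT)")
    for row_id, prompt in rows:
        hit = con.execute("SELECT answer FROM cache WHERE id = ?", (row_id,)).fetchone()
        if hit:                        # already processed: no repeated (paid) API call
            yield row_id, hit[0]
            continue
        answer = infer_fn(prompt)      # remote call (OpenAI, Replicate, OpenRouter, ...)
        con.execute("INSERT INTO cache VALUES (?, ?)", (row_id, answer))
        con.commit()                   # persisted immediately, so a crash loses nothing
        yield row_id, answer

# Toy usage with a fake model in place of a remote provider.
print(list(cached_infer([("1", "2+2?")], lambda p: "4")))
```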
reacted to singhsidhukuldeep's post with 🔥 3 months ago
Good folks at @nvidia have released exciting new research on normalized Transformers (nGPT) for faster and more efficient language modeling!

Here is what they are proposing:

1. Remove all normalization layers, like RMSNorm or LayerNorm, from the standard Transformer architecture.

2. Normalize all matrices along their embedding dimension after each training step. This includes input and output embeddings, attention matrices (Q, K, V), output projection matrices, and MLP matrices.

3. Replace the standard residual connections with normalized update equations using learnable eigen learning rates for the attention and MLP blocks.

4. Change the softmax scaling factor in the attention mechanism from 1/sqrt(d_k) to sqrt(d_k).

5. Implement rescaling and optional normalization of query (q) and key (k) vectors in the attention mechanism using learnable scaling factors.

6. Rescale the intermediate states of the MLP block using learnable scaling factors.

7. Implement rescaling of the output logits using learnable scaling factors.

8. Remove weight decay and learning rate warmup from the optimization process.

9. Initialize the eigen learning rates and scaling factors with appropriate values as specified in the paper.

10. During training, treat all vectors and matrices as residing on a unit hypersphere, interpreting matrix-vector multiplications as cosine similarities.

11. Implement the update equations for the hidden states using the normalized outputs from attention and MLP blocks, controlled by the eigen learning rates.

12. After each forward pass, normalize all parameter matrices to ensure they remain on the unit hypersphere.

13. Use the Adam optimizer without weight decay for training the model.

14. When computing loss, apply the learnable scaling factor to the logits before the softmax operation.

15. During inference, follow the same normalization and scaling procedures as in training.

Excited to see how it scales to larger models and datasets!
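
To make two of the steps above concrete, here is a minimal PyTorch-flavored sketch of (a) re-normalizing weight matrices along the embedding dimension after each optimizer step and (b) the normalized residual update with learnable eigen learning rates. Names, shapes, and the alpha initialization are illustrative assumptions, not the paper's reference code.

```python
# Minimal sketch of two nGPT ingredients (steps 2/12 and 3/11 above);
# illustrative only, not NVIDIA's reference implementation.
import torch
import torch.nn.functional as F

@torch.no_grad()
def renormalize_weights(model, dim=-1):
    # After each training step, project every 2-D parameter matrix back onto
    # the unit hypersphere along its embedding dimension.
    for p in model.parameters():
        if p.ndim == 2:
            p.copy_(F.normalize(p, dim=dim))

def normalized_update(h, block_out, alpha):
    # Replace the residual connection with
    #   h <- normalize(h + alpha * (normalize(block_out) - h)),
    # where alpha is a learnable per-dimension eigen learning rate.
    return F.normalize(h + alpha * (F.normalize(block_out, dim=-1) - h), dim=-1)

# Toy usage: fold one pretend attention-block output into the hidden state.
h = F.normalize(torch.randn(4, 64), dim=-1)   # hidden states on the hypersphere
attn_out = torch.randn(4, 64)                 # stand-in for an attention block output
alpha = torch.full((64,), 0.05)               # learnable parameter in a real model
h = normalized_update(h, attn_out, alpha)
```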
reacted to akhaliq's post with ❤️ 10 months ago
ChatMusician: Understanding and Generating Music Intrinsically with LLM (2402.16153)

While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B-token music-language corpora MusicPile, the collected MusicTheoryBench, code, model, and demo on GitHub.
New activity in BAAI/bge-m3 11 months ago
reacted to santiviquez's post with ❤️ 11 months ago
Eigenvalues to the rescue? 🛟🤔

I found out about this paper thanks to @gsarti's post from last week; I got curious, so I want to post my take on it. 🤗

The paper proposes a new metric called EigenScore to detect LLM hallucinations. 📄

Their idea is that given an input question, they generate K different answers, take their internal embedding states, calculate a covariance matrix with them, and use it to calculate an EigenScore.

We can think of the EigenScore as the mean of the eigenvalues of the covariance matrix of the embedding space of the K-generated answers.

❓ But why eigenvalues?

Well, if the K generations have similar semantics, the sentence embeddings will be highly correlated, and most eigenvalues will be close to 0.

On the other hand, if the LLM hallucinates, the K generations will have diverse semantics, and the eigenvalues will be significantly different from 0.

The idea is pretty neat and shows better results when compared to other methods like sequence probabilities, length-normalized entropy, and other uncertainty quantification-based methods.

💭 What I'm personally missing from the paper is a comparison with other methods like LLM-Eval and SelfCheckGPT. They do mention that EigenScore is much cheaper to implement than SelfCheckGPT, but that's all they say on the topic.

Paper: INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection (2402.03744)
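
A rough numpy sketch of the idea described above: embed the K sampled answers, form the covariance of their embeddings, and summarize its eigenvalues. The embedding function, the small regularizer, and the plain mean used as the summary follow the post's description; the paper's exact formulation may differ in details.

```python
# Rough sketch of the EigenScore idea (INSIDE, 2402.03744); the regularizer and
# the mean-of-eigenvalues summary follow the post's description, not necessarily
# the paper's exact formula.
import numpy as np

def eigen_score(answer_embeddings, eps=1e-3):
    """answer_embeddings: array of shape (K, d), one sentence embedding per answer."""
    emb = answer_embeddings - answer_embeddings.mean(axis=0, keepdims=True)
    cov = emb @ emb.T / emb.shape[1]              # K x K covariance across the K answers
    eigvals = np.linalg.eigvalsh(cov + eps * np.eye(cov.shape[0]))
    # Consistent answers -> eigenvalues close to 0 (low score);
    # semantically diverse (possibly hallucinated) answers -> larger eigenvalues.
    return float(np.mean(eigvals))

# Toy check: nearly identical embeddings score lower than random ones.
K, d = 5, 16
consistent = np.tile(np.random.randn(1, d), (K, 1)) + 0.01 * np.random.randn(K, d)
diverse = np.random.randn(K, d)
print(eigen_score(consistent), "<", eigen_score(diverse))
```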
reacted to lbourdois's post with ❤️ 11 months ago
The most widely used French NER models on HF (Jean-Baptiste/camembert-ner and cmarkea/distilcamembert-base-ner) are trained on a single dataset (WikiNER) which, on the one hand, contains leaks and therefore distorts these models' true results and, on the other hand, overspecializes them in a particular domain (texts from Wikipedia). They are also only available in a base version (110M parameters).

That's why I've trained new French NER models on more data (3x), in both base and large (336M) versions. They are available with 3 entities (PER, ORG, LOC) or 4 entities (PER, ORG, LOC, MISC):
- CATIE-AQ/NERmembert-base-4entities
- CATIE-AQ/NERmembert-large-4entities
- CATIE-AQ/NERmembert-base-3entities
- CATIE-AQ/NERmembert-large-3entities

Datasets without leaks are also available:
- CATIE-AQ/frenchNER_4entities
- CATIE-AQ/frenchNER_3entities
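
A quick way to try one of the models above is the standard transformers pipeline; the aggregation strategy and the example sentence below are illustrative choices.

```python
# Sketch: run one of the NERmembert checkpoints with the transformers pipeline.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="CATIE-AQ/NERmembert-base-4entities",
    aggregation_strategy="simple",   # merge sub-word tokens into whole entities
)

for entity in ner("Emmanuel Macron a visité Bordeaux avec l'OMS."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```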
New activity in EquinoxElahin/q-FrozenLake-v1-4x4-noSlippery over 1 year ago

Upload README.md

#1 opened over 1 year ago by flogau