Equinox Elahin

EquinoxElahin

AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago
le-leadboard/OpenLLMFrenchLeaderboard

Organizations

None yet

EquinoxElahin's activity

reacted to nicolay-r's post with 👀 12 days ago
📢 If you're working in the relation extraction / character network domain, the following post should be relevant.
Excited to share the most recent milestone: the release of ARElight 0.25.0 🎊

Core library: https://github.com/nicolay-r/ARElight
Server: https://github.com/nicolay-r/ARElight-server

🔎 What is ARElight? It is a granular viewer of sentiments between entities in massively large documents and collections of texts.
In short, it extracts contexts that mention object pairs for the related prompting / classification.
In the slides below we illustrate the ARElight application to sentiment classification between object pairs in context.

In the demo we use DeepPavlov NER models + Google Translate + a BERT-based classifier. The bash script for launching the quick demo illustrates how these components are applied.

The new update provides a series of new features:
✅ SQLite support for storing all the extracted samples
✅ Support for an enhanced GUI for content investigation
✅ Switch to external no-string projects for NER and translation

Supplementary materials:
📜 Paper: https://link.springer.com/chapter/10.1007/978-3-031-56069-9_23
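
The post does not show ARElight's own API, so the sketch below is a minimal, library-agnostic illustration of the core idea described above: pairing co-mentioned objects with their surrounding context so they can later be prompted or classified. The `dummy_ner` helper is a stand-in for a real NER component such as DeepPavlov.

```python
# Illustrative sketch only (not ARElight's actual API): extract contexts with
# mentioned object pairs for downstream prompting / classification.
from itertools import combinations

def extract_pair_contexts(sentences, ner):
    """sentences: list of str; ner: callable mapping a sentence to its entity mentions."""
    samples = []
    for sent in sentences:
        for a, b in combinations(ner(sent), 2):
            # Keep the object pair plus its sentence as context, ready for
            # prompting or a BERT-style pair-in-context classifier.
            samples.append({"subject": a, "object": b, "context": sent})
    return samples

# Toy usage with a dummy NER stand-in (a real pipeline would use e.g. DeepPavlov NER).
dummy_ner = lambda s: [w.strip(".") for w in s.split() if w[:1].isupper()]
print(extract_pair_contexts(["Alice praised Bob in Paris."], dummy_ner))
```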
reacted to davidberenstein1957's post with 🚀 about 2 months ago
For anyone who struggles with NER or information extraction with LLMs:

We showed an efficient workflow for token classification, including zero-shot suggestions and model fine-tuning, with Argilla, GLiNER, the NuMind NuExtract LLM, and SpanMarker. @argilla

Video: https://youtu.be/JvLpaYgNd84?feature=shared
Notebooks and slides are included so you can try it yourself 🙂
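
As a quick illustration of the zero-shot suggestion step, here is a minimal GLiNER sketch; the checkpoint, label set, and threshold are example choices, not necessarily the ones used in the video.

```python
# Minimal GLiNER sketch for zero-shot entity suggestions; checkpoint, labels,
# and threshold below are illustrative assumptions, not the workflow's exact setup.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Argilla was founded in Madrid and later joined Hugging Face."
labels = ["person", "organization", "location"]

# Zero-shot prediction: the label set is supplied at inference time, no fine-tuning needed.
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(ent["label"], "->", ent["text"])
```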
reacted to AdinaY's post with 👀 about 2 months ago
reacted to nicolay-r's post with 👀 2 months ago
📢 If you aim to process complex dependencies in spreadsheet data with the LLM Chain-of-Thought technique, this update might be valuable for you 💎

The updated 📦 bulk-chain 0.24.1, aimed at iterative processing of CSV/JSONL data with no-string dependencies on third-party LLM frameworks, is out 🎉

📦: https://pypi.org/project/bulk-chain/0.24.1/
🌟: https://github.com/nicolay-r/bulk-chain
📘: https://github.com/nicolay-r/bulk-chain/issues/26

The key feature of bulk-chain is SQLite caching, which saves your time ⏰ and money 💵 by guaranteeing that no data is lost, which is important when using remote LLM providers such as OpenAI, ReplicateIO, OpenRouter, etc.

🔧 This release has the following updates:
✅ Improved stability for various header conditions and the related SQLite support
✅ Manual setup of the ID column / ID assignment
✅ Dynamic CSV-related setup, delegated to the underlying Python 📦 csv package

Quick start on Google Colab:
📙: https://colab.research.google.com/github/nicolay-r/bulk-chain/blob/master/bulk_chain_tutorial.ipynb

Below is an example of the three simple steps in pictures:
1. ⬇️ Package installation
2. ✍️ Declaring schema
3. 🚀 Launching inference for your data with Replicate and 🤖 meta-llama/Llama-3.1-405B
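
The caching behaviour described above is the interesting part, so here is a conceptual sketch of it in plain sqlite3. This is not bulk-chain's actual API; it only shows why per-record SQLite caching means an interrupted run never repeats paid remote calls.

```python
# Conceptual sketch of the SQLite-caching idea (NOT bulk-chain's actual API):
# cache each record's LLM answer so reruns after a crash cost nothing extra.
import sqlite3

def cached_infer(rows, infer_fn, db_path="cache.sqlite"):
    """rows: iterable of (row_id, prompt); infer_fn: callable prompt -> answer."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS cache (id TEXT PRIMARY KEY, answer TEXT)")
    for row_id, prompt in rows:
        hit = con.execute("SELECT answer FROM cache WHERE id = ?", (row_id,)).fetchone()
        if hit:                        # already processed: no repeated (paid) API call
            yield row_id, hit[0]
            continue
        answer = infer_fn(prompt)      # remote call (OpenAI, Replicate, OpenRouter, ...)
        con.execute("INSERT INTO cache VALUES (?, ?)", (row_id, answer))
        con.commit()                   # persisted immediately, so a crash loses nothing
        yield row_id, answer

# Toy usage with a fake model in place of a remote provider.
print(list(cached_infer([("1", "2+2?")], lambda p: "4")))
```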
reacted to singhsidhukuldeep's post with 🔥 3 months ago
Good folks at @nvidia have released exciting new research on normalized Transformers (nGPT) for faster and more efficient language modeling!

Here is what they are proposing:

1. Remove all normalization layers, like RMSNorm or LayerNorm, from the standard Transformer architecture.

2. Normalize all matrices along their embedding dimension after each training step. This includes input and output embeddings, attention matrices (Q, K, V), output projection matrices, and MLP matrices.

3. Replace the standard residual connections with normalized update equations using learnable eigen learning rates for the attention and MLP blocks.

4. Change the softmax scaling factor in the attention mechanism from 1/sqrt(d_k) to sqrt(d_k).

5. Implement rescaling and optional normalization of query (q) and key (k) vectors in the attention mechanism using learnable scaling factors.

6. Rescale the intermediate states of the MLP block using learnable scaling factors.

7. Implement rescaling of the output logits using learnable scaling factors.

8. Remove weight decay and learning rate warmup from the optimization process.

9. Initialize the eigen learning rates and scaling factors with appropriate values as specified in the paper.

10. During training, treat all vectors and matrices as residing on a unit hypersphere, interpreting matrix-vector multiplications as cosine similarities.

11. Implement the update equations for the hidden states using the normalized outputs from attention and MLP blocks, controlled by the eigen learning rates.

12. After each forward pass, normalize all parameter matrices to ensure they remain on the unit hypersphere.

13. Use the Adam optimizer without weight decay for training the model.

14. When computing loss, apply the learnable scaling factor to the logits before the softmax operation.

15. During inference, follow the same normalization and scaling procedures as in training.

Excited to see how it scales to larger models and datasets!
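
To make two of the steps above concrete, here is a minimal PyTorch-flavored sketch of (a) re-normalizing weight matrices along the embedding dimension after each optimizer step and (b) the normalized residual update with learnable eigen learning rates. Names, shapes, and the alpha initialization are illustrative assumptions, not the paper's reference code.

```python
# Minimal sketch of two nGPT ingredients (steps 2/12 and 3/11 above);
# illustrative only, not NVIDIA's reference implementation.
import torch
import torch.nn.functional as F

@torch.no_grad()
def renormalize_weights(model, dim=-1):
    # After each training step, project every 2-D parameter matrix back onto
    # the unit hypersphere along its embedding dimension.
    for p in model.parameters():
        if p.ndim == 2:
            p.copy_(F.normalize(p, dim=dim))

def normalized_update(h, block_out, alpha):
    # Replace the residual connection with
    #   h <- normalize(h + alpha * (normalize(block_out) - h)),
    # where alpha is a learnable per-dimension eigen learning rate.
    return F.normalize(h + alpha * (F.normalize(block_out, dim=-1) - h), dim=-1)

# Toy usage: fold one pretend attention-block output into the hidden state.
h = F.normalize(torch.randn(4, 64), dim=-1)   # hidden states on the hypersphere
attn_out = torch.randn(4, 64)                 # stand-in for an attention block output
alpha = torch.full((64,), 0.05)               # learnable parameter in a real model
h = normalized_update(h, attn_out, alpha)
```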
reacted to akhaliq's post with ❤️ 10 months ago
ChatMusician: Understanding and Generating Music Intrinsically with LLM (2402.16153)

While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B-token music-language corpora MusicPile, the collected MusicTheoryBench, code, model, and demo on GitHub.
New activity in BAAI/bge-m3 11 months ago
reacted to santiviquez's post with ❤️ 11 months ago
Eigenvalues to the rescue? 🛟🤔

I found out about this paper thanks to @gsarti's post from last week; I got curious, so I want to post my take on it. 🤗

The paper proposes a new metric called EigenScore to detect LLM hallucinations. 📄

Their idea is that given an input question, they generate K different answers, take their internal embedding states, calculate a covariance matrix with them, and use it to calculate an EigenScore.

We can think of the EigenScore as the mean of the eigenvalues of the covariance matrix of the embedding space of the K-generated answers.

❓ But why eigenvalues?

Well, if the K generations have similar semantics, the sentence embeddings will be highly correlated, and most eigenvalues will be close to 0.

On the other hand, if the LLM hallucinates, the K generations will have diverse semantics, and the eigenvalues will be significantly different from 0.

The idea is pretty neat and shows better results when compared to other methods like sequence probabilities, length-normalized entropy, and other uncertainty quantification-based methods.

💭 What I'm personally missing from the paper is a comparison with other methods like LLM-Eval and SelfCheckGPT. They do mention that EigenScore is much cheaper to implement than SelfCheckGPT, but that's all they say on the topic.

Paper: INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection (2402.03744)
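
A rough numpy sketch of the idea described above: embed the K sampled answers, form the covariance of their embeddings, and summarize its eigenvalues. The embedding function, the small regularizer, and the plain mean used as the summary follow the post's description; the paper's exact formulation may differ in details.

```python
# Rough sketch of the EigenScore idea (INSIDE, 2402.03744); the regularizer and
# the mean-of-eigenvalues summary follow the post's description, not necessarily
# the paper's exact formula.
import numpy as np

def eigen_score(answer_embeddings, eps=1e-3):
    """answer_embeddings: array of shape (K, d), one sentence embedding per answer."""
    emb = answer_embeddings - answer_embeddings.mean(axis=0, keepdims=True)
    cov = emb @ emb.T / emb.shape[1]              # K x K covariance across the K answers
    eigvals = np.linalg.eigvalsh(cov + eps * np.eye(cov.shape[0]))
    # Consistent answers -> eigenvalues close to 0 (low score);
    # semantically diverse (possibly hallucinated) answers -> larger eigenvalues.
    return float(np.mean(eigvals))

# Toy check: nearly identical embeddings score lower than random ones.
K, d = 5, 16
consistent = np.tile(np.random.randn(1, d), (K, 1)) + 0.01 * np.random.randn(K, d)
diverse = np.random.randn(K, d)
print(eigen_score(consistent), "<", eigen_score(diverse))
```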
reacted to lbourdois's post with ❤️ 11 months ago
The most widely used French NER models on HF (Jean-Baptiste/camembert-ner and cmarkea/distilcamembert-base-ner) are trained on a single dataset (WikiNER) which, on the one hand, contains leaks and therefore distorts these models' true results and, on the other hand, overspecializes them in a particular domain (texts from Wikipedia). They are also only available in a base version (110M parameters).

That's why I've trained new French NER models on more data (3x), in both base and large (336M) versions. They are available with 3 entities (PER, ORG, LOC) or 4 entities (PER, ORG, LOC, MISC):
- CATIE-AQ/NERmembert-base-4entities
- CATIE-AQ/NERmembert-large-4entities
- CATIE-AQ/NERmembert-base-3entities
- CATIE-AQ/NERmembert-large-3entities

Datasets without leaks are also available:
- CATIE-AQ/frenchNER_4entities
- CATIE-AQ/frenchNER_3entities
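
A quick way to try one of the models above is the standard transformers pipeline; the aggregation strategy and the example sentence below are illustrative choices.

```python
# Sketch: run one of the NERmembert checkpoints with the transformers pipeline.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="CATIE-AQ/NERmembert-base-4entities",
    aggregation_strategy="simple",   # merge sub-word tokens into whole entities
)

for entity in ner("Emmanuel Macron a visité Bordeaux avec l'OMS."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```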
New activity in EquinoxElahin/q-FrozenLake-v1-4x4-noSlippery over 1 year ago

Upload README.md

#1 opened over 1 year ago by flogau