phoebeklett committed on
Commit
fb2e31e
·
verified ·
1 Parent(s): ad894e8

Update README.md

  1. README.md +53 -17
README.md CHANGED
@@ -1,30 +1,66 @@
1
- ---
2
- # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
3
- # Doc / guide: https://huggingface.co/docs/hub/model-cards
4
- {}
5
- ---
6
 
7
- # Model Card for Extended-Mind-MPT-7b-Chat
 
8
 
9
- <!-- Provide a quick summary of what the model is/does. -->
 
 
10
 
11
- Extended Mind MPT-7b-chat, as described in [Supersizing Transformers](https://blog.normalcomputing.ai/posts/2023-09-12-supersizing-transformers/supersizing-transformers.html).
12
 
13
- ### Model Description
14
 
15
- <!-- Provide a longer summary of what this model is. -->
16
 
17
- This model implements active externalism for MPT's 7b chat model. The model weights have not been edited. Original architecture and code by Mosaic ML.
18
 
19
- For more details on active externalism, check out our [blog](https://blog.normalcomputing.ai/posts/2023-09-12-supersizing-transformers/supersizing-transformers.html)!
 
 
 
20
 
 
21
 
22
- - **Developed by:** [Normal Computing](https://huggingface.co/normalcomputing), Adapted from [Mosaic ML](https://huggingface.co/mosaicml)
23
- - **License:** Apache 2.0
24
 
 
25
 
26
- ## Limitations
27
 
28
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
29
 
30
- This model is part of ongoing research at Normal Computing.
1
+ Model Card for Extended-Mind-MPT-7b-chat
 
 
 
 
2
 
3
+ * Github: https://github.com/normal-computing/extended-mind-transformers/
4
+ * ArXiv: https://arxiv.org/abs/2406.02332
5
 
6
+ Original architecture and code by Mosaic ML.
7
+ * Developed by: Normal Computing, Adapted from Mosaic ML
8
+ * License: Apache 2.0
9
 
10
+ This model is part of the Extended Mind Transformers collection and implements the methods described in our [paper](https://arxiv.org/abs/2406.02332). It retrieves and attends to an external cache of key-value pairs (memories) and has not been fine-tuned: the original model weights are unchanged.
11
 
12
+ ## Model Usage
13
 
14
+ ### External Memory
15
 
16
+ Passing external memories to the model is easy. Simply pass the token ids to the model during instantiation, as the following examples illustrate. Generating and caching the memories is handled internally, during the first `model.generate()` call. You can update the memories using the following sequence of commands:
17
 
18
+ ```python
19
+ model.clear_memories()
20
+ model.memory_ids = list_of_new_token_ids
21
+ ```
22
 
23
+ Set `trust_remote_code=True` to avoid warnings. Pass the memories to the model as a list of token ids.
24
 
25
+ ```python
26
+ from transformers import AutoModelForCausalLM, AutoTokenizer
27
 
28
+ ag_wiki_entry = """Alexander Grothendieck (/ˈɡroʊtəndiːk/; German pronunciation: [ˌalɛˈksandɐ ˈɡʁoːtn̩ˌdiːk] (listen); French: [ɡʁɔtɛndik]; 28 March 1928 – 13 November 2014) was a stateless (and then, since 1971, French) mathematician who became the leading figure in the creation of modern algebraic geometry.[7][8] His research extended the scope of the field and added elements of commutative algebra, homological algebra, sheaf theory, and category theory to its foundations, while his so-called "relative" perspective led to revolutionary advances in many areas of pure mathematics.[7][9] He is considered by many to be the greatest mathematician of the twentieth century.[10][11]"""
29
 
30
+ tokenizer_hf = AutoTokenizer.from_pretrained("normalcomputing/extended-mind-mpt-7b-chat")
31
+ memories = tokenizer_hf(ag_wiki_entry).input_ids
32
+
33
+ model_hf = AutoModelForCausalLM.from_pretrained("normalcomputing/extended-mind-mpt-7b-chat", external_memories=memories, trust_remote_code=True)
34
+ ```
35
+ After this, you can generate text with the model as usual. The model will automatically use the memories during generation. You can update any config parameters (we set `topk` below) by passing new values to the `model.generate()` method.
36
+
37
+ ```python
38
+ inputs = "When did Alexander Grothendieck become a French citizen?"
39
+ inputs = tokenizer_hf(inputs, return_tensors="pt").input_ids
40
+
41
+ outputs = model_hf.generate(inputs, max_length=40, topk=2)
42
+ tokenizer_hf.decode(outputs['sequences'][0], skip_special_tokens=True)
43
+ ```
44
+
45
+ ### Citations
46
 
47
+ By simply setting `output_retrieved_memory_idx=True` in the `model.generate()` method, you can retrieve the memory indices used during generation. We walk through an example in the [demo notebook]().
48
 
49
+
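As a rough illustration of how retrieved memory indices could be turned into citations, the sketch below maps indices back to the memory tokens around them. The exact structure returned by `output_retrieved_memory_idx=True` is not specified here, so this assumes a flat list of integer positions into the memory token sequence; the helper `cited_spans` and its `window` parameter are hypothetical and not part of the model's API.

```python
def cited_spans(memory_tokens, retrieved_idx, window=1):
    """Group retrieved memory positions into readable text spans.

    memory_tokens: list of decoded memory tokens (assumed format).
    retrieved_idx: list of integer positions into memory_tokens (assumed format).
    window: number of neighbouring tokens to include on each side.
    """
    keep = set()
    for idx in retrieved_idx:
        start = max(0, idx - window)
        stop = min(len(memory_tokens), idx + window + 1)
        keep.update(range(start, stop))

    # Merge consecutive positions into contiguous spans.
    spans, current = [], []
    for i in sorted(keep):
        if current and i != current[-1] + 1:
            spans.append(current)
            current = []
        current.append(i)
    if current:
        spans.append(current)

    return [" ".join(memory_tokens[i] for i in span) for span in spans]


tokens = "Grothendieck became a French citizen in 1971".split()
print(cited_spans(tokens, [0, 6]))
```

Positions that fall next to each other are merged into a single span, so a citation reads as a short passage rather than as isolated tokens.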
50
+ ### Additional configuration
51
+ The model has several other configuration parameters:
52
+ * `memory_type` (`string`, *optional*, defaults to `manual`):
53
+ Whether to store external memories manually or in a vector database.
54
+ * `mask_by_sim` (`bool`, *optional*, defaults to `True`):
55
+ Whether or not to mask retrieved memories by similarity.
56
+ * `sim_threshold` (`float`, *optional*, defaults to `0.25`):
57
+ Threshold for masking retrieved memories.
58
+ * `tokenizer_all_special_ids` (`list`, *optional*, defaults to `[0, 50278]`):
59
+ Ids for special tokens to remove from memories.
60
+ * `remove_special_tokens` (`bool`, *optional*, defaults to `True`):
61
+ Remove memories that correspond to tokenizer special ids.
62
+
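To make `mask_by_sim` and `sim_threshold` more concrete, here is a minimal sketch of top-k retrieval followed by similarity masking. It is plain Python, assumes cosine similarity as the score, and uses hypothetical helper names (`cosine`, `retrieve`); it illustrates the idea only and is not the model's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, memory_keys, topk=2, mask_by_sim=True, sim_threshold=0.25):
    """Return indices of the topk most similar memory keys.

    When mask_by_sim is True, retrieved memories whose similarity falls
    below sim_threshold are dropped (masked out).
    """
    scored = [(cosine(query, key), i) for i, key in enumerate(memory_keys)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:topk]
    if mask_by_sim:
        top = [(score, i) for score, i in top if score >= sim_threshold]
    return [i for _, i in top]


keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(retrieve([1.0, 0.0], keys, topk=3))
print(retrieve([1.0, 0.0], keys, topk=3, mask_by_sim=False))
```

With masking enabled, the orthogonal key is dropped even though `topk=3` would otherwise include it, which is the effect the threshold is meant to have.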
63
+ Additionally, the stride used to compute the memory representations can be set within the `generate_cache()` method. Smaller strides generate higher-quality representations, while larger strides require fewer computations.
64
+
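The stride trade-off can be sketched with a generic sliding-window scheme. This assumes memories are encoded in windows of at most `max_len` tokens, where each forward pass produces representations for `stride` new tokens and reuses earlier tokens as context. The function `strided_windows` and the `max_len` parameter are hypothetical, and `generate_cache()` may work differently internally.

```python
def strided_windows(token_ids, stride, max_len):
    """Plan forward passes over a long token sequence.

    Returns (context_start, new_start, end) tuples: tokens in
    [new_start, end) get fresh representations, computed with
    tokens in [context_start, new_start) as preceding context.
    """
    if not 1 <= stride <= max_len:
        raise ValueError("stride must be between 1 and max_len")
    windows = []
    for new_start in range(0, len(token_ids), stride):
        end = min(new_start + stride, len(token_ids))
        context_start = max(0, end - max_len)
        windows.append((context_start, new_start, end))
    return windows


ids = list(range(10))
small = strided_windows(ids, stride=2, max_len=4)
large = strided_windows(ids, stride=4, max_len=4)
print(len(small), len(large))
```

In this sketch the smaller stride needs more forward passes, but every new token after the first window sees preceding context; with `stride == max_len` each window starts with no context at all.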
65
+ ## Limitations
66
+ This model is part of ongoing research at Normal Computing.