sumedh committed
Commit 2dd7365
Parent: db44e0a

Update README.md

Files changed (1): README.md (+24, -5)
README.md CHANGED
@@ -1,22 +1,34 @@
---
library_name: keras
tags:
- translation
---

- ## Model description

- More information needed

## Intended uses & limitations

- More information needed

## Training and evaluation data
-
- More information needed

## Training procedure

### Training hyperparameters

@@ -26,6 +38,13 @@ The following hyperparameters were used during training:
|----|-------------|-----|---|--------|-------|--------|------------------|
|RMSprop|0.0010000000474974513|0.0|0.8999999761581421|0.0|1e-07|False|float32|

## Model Plot

<details>
 
---
library_name: keras
+ license: apache-2.0
tags:
+ - seq2seq
- translation
+ language:
+ - en
+ - fr
---

+ ## Keras Implementation of Character-level recurrent sequence-to-sequence model

+ This repo contains the model and the notebook for [this Keras example on a character-level recurrent sequence-to-sequence model](https://keras.io/examples/nlp/lstm_seq2seq/).
+
+ Full credits to: [fchollet](https://twitter.com/fchollet)
+ Model reproduced by: [Sumedh](https://huggingface.co/sumedh)
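
For quick experimentation, the saved Keras model can be pulled from the Hub with `huggingface_hub`; this is a minimal loading sketch, and the repo id below is an assumption rather than necessarily this repository's actual id.

```python
# Minimal loading sketch; the repo id is a placeholder assumption.
from huggingface_hub import from_pretrained_keras

model = from_pretrained_keras("sumedh/lstm-seq2seq")  # hypothetical repo id
model.summary()
```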

## Intended uses & limitations

+ This model implements a basic character-level recurrent sequence-to-sequence network for translating short English sentences into short French sentences, character by character. Note that character-level machine translation is fairly unusual, as word-level models are more common in this domain. The model works best on text of length <= 15 characters.

## Training and evaluation data
+ English to French translation data from
+ https://www.manythings.org/anki/
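
As a reference for how that data is prepared, here is a hedged sketch following the linked Keras example; the file name `fra.txt` and the `num_samples` cutoff are assumptions taken from that example.

```python
# Sketch of parsing the tab-separated English/French pairs from manythings.org/anki.
num_samples = 10000  # number of sentence pairs to keep, as in the example

input_texts, target_texts = [], []
with open("fra.txt", "r", encoding="utf-8") as f:
    lines = f.read().split("\n")

for line in lines[: min(num_samples, len(lines) - 1)]:
    parts = line.split("\t")  # English \t French (\t attribution in newer files)
    input_text, target_text = parts[0], parts[1]
    # "\t" marks the start of a target sequence and "\n" marks its end.
    input_texts.append(input_text)
    target_texts.append("\t" + target_text + "\n")
```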

## Training procedure
+ - We start with input sequences from a domain (e.g. English sentences) and corresponding target sequences from another domain (e.g. French sentences).
+ - An encoder LSTM turns input sequences into 2 state vectors (we keep the last LSTM state and discard the outputs).
+ - A decoder LSTM is trained to turn the target sequences into the same sequence but offset by one timestep in the future, a training process called "teacher forcing" in this context. It uses the state vectors from the encoder as its initial state. Effectively, the decoder learns to generate targets[t+1...] given targets[...t], conditioned on the input sequence (see the sketch after this list).
+ - In inference mode, when we want to decode unknown input sequences, we:
+   - encode the input sequence into state vectors,
+   - start with a target sequence of size 1 (just the start-of-sequence character),
+   - feed the state vectors and the 1-character target sequence to the decoder to produce predictions for the next character,
+   - sample the next character using these predictions (we simply use argmax),
+   - append the sampled character to the target sequence,
+   - repeat until we generate the end-of-sequence character or hit the character limit.
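
A minimal sketch of this encoder-decoder setup in Keras, following the linked example; `latent_dim` and the token counts come from the data preparation and are illustrative assumptions here.

```python
# Training-time model, following the linked Keras lstm_seq2seq example.
from tensorflow import keras

latent_dim = 256          # LSTM size
num_encoder_tokens = 71   # input character vocabulary size (assumption)
num_decoder_tokens = 93   # target character vocabulary size (assumption)

# Encoder: keep only the final LSTM states, discard the outputs.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = keras.layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: returns full sequences, initialized with the encoder states (teacher forcing).
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_outputs = keras.layers.Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

# Maps [encoder_input, decoder_input] -> decoder_target (targets shifted by one timestep).
model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
```

At inference time the example rebuilds a standalone encoder model (inputs to states) and a step-wise decoder model, then greedily samples one character per step as described above.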

### Training hyperparameters

|----|-------------|-----|---|--------|-------|--------|------------------|
|RMSprop|0.0010000000474974513|0.0|0.8999999761581421|0.0|1e-07|False|float32|

+ ```python
+ batch_size = 64  # Batch size for training.
+ epochs = 100  # Number of epochs to train for.
+ latent_dim = 256  # Latent dimensionality of the encoding space.
+ num_samples = 10000  # Number of samples to train on.
+ ```
+
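
The optimizer row in the table is read here as RMSprop with learning_rate=1e-3, decay=0.0, rho=0.9, momentum=0.0, epsilon=1e-07, centered=False, trained in float32; a hedged sketch of wiring those values up explicitly (the `*_input_data`/`*_target_data` arrays are assumed to come from the data preparation step):

```python
# Illustrative training call matching the table values above (interpretation of the
# unlabeled columns as standard Keras RMSprop hyperparameters).
from tensorflow import keras

optimizer = keras.optimizers.RMSprop(
    learning_rate=1e-3, rho=0.9, momentum=0.0, epsilon=1e-07, centered=False
)
model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(
    [encoder_input_data, decoder_input_data],  # assumed one-hot encoded arrays
    decoder_target_data,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.2,
)
```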

## Model Plot

<details>