Model description

An LSTM trained on the tiny_shakespeare dataset from Andrej Karpathy's blog post, The Unreasonable Effectiveness of Recurrent Neural Networks.

Built as an experiment with Hugging Face and Weights & Biases (W&B).

Intended uses & limitations

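The model predicts the next character from a variable-length input sequence. After 18 epochs of training, it generates text that is locally coherent but repetitive. The function below generates text greedily: at each step it appends the single most likely next character.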

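import tensorflow as tf

def generate_text(model, encoder, text, n):
    """Greedily append n predicted characters to text."""
    vocab = encoder.get_vocabulary()
    generated_text = text
    for _ in range(n):
        # Re-encode everything generated so far; the model accepts variable-length input.
        encoded = encoder([generated_text])
        pred = model.predict(encoded, verbose=0)
        # Take the index of the most likely next character.
        pred = tf.squeeze(tf.argmax(pred, axis=-1)).numpy()
        generated_text += vocab[pred]
    return generated_text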

sample = "M"
print(generate_text(model, encoder, sample, 100))

Output:

MQLUS:
I will be so that the street of the state,
And then the street of the street of the state,
And
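
The repeated phrases in the sample are typical of greedy decoding, which always takes the argmax and can settle into a loop. As a rough illustration, the sketch below contrasts the greedy choice with temperature sampling. Temperature sampling is not what `generate_text` does; it is a common alternative shown here only for comparison. The vocabulary and probabilities are made up, and `greedy_pick` and `sample_pick` are hypothetical helper names.

```python
import math
import random

def greedy_pick(probs):
    """Return the index of the largest probability (what argmax does)."""
    return max(range(len(probs)), key=probs.__getitem__)

def sample_pick(probs, temperature=1.0, rng=random):
    """Sample an index after rescaling log-probabilities by 1/temperature."""
    logits = [math.log(p + 1e-9) / temperature for p in probs]
    top = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(l - top) for l in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

# Hypothetical 5-entry vocabulary and next-character distribution.
vocab = ["", "[UNK]", "e", "t", "a"]
probs = [0.0, 0.0, 0.5, 0.3, 0.2]

print(vocab[greedy_pick(probs)])                    # e
print(vocab[sample_pick(probs, temperature=0.8)])   # varies from run to run
```

Lower temperatures approach the greedy choice, while higher ones give more varied (and less coherent) text.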

Training and evaluation data

https://example.com

Training procedure

The dataset consists of various works of William Shakespeare concatenated into a single file, in which individual speeches are separated by blank lines (\n\n).

The tokenizer is a Keras TextVectorization preprocessor that uses a simple character-based vocabulary.
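
A character-level `TextVectorization` layer could be set up roughly as follows. This is a minimal sketch, not the author's exact configuration: the `standardize=None` and `split="character"` settings are assumptions (chosen so that case, punctuation, and newlines survive), and the short `text` string stands in for the full corpus.

```python
import tensorflow as tf

# Tiny stand-in corpus; the real input would be the full tiny_shakespeare text.
text = "First Citizen:\nBefore we proceed any further, hear me speak.\n\n"

# Assumed settings: split="character" yields one token per character, and
# standardize=None keeps case, punctuation, and newlines intact.
encoder = tf.keras.layers.TextVectorization(standardize=None, split="character")
encoder.adapt(tf.constant([text]))

vocab = encoder.get_vocabulary()        # index 0 is padding (""), index 1 is "[UNK]"
ids = encoder(tf.constant(["First"]))   # shape (1, 5): one integer ID per character

# Map the IDs back to characters through the vocabulary.
decoded = "".join(vocab[i] for i in ids[0].numpy())
print(decoded)  # First
```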

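To construct the training set, a window of 100 characters is taken as the input and the character that immediately follows it is used as the target. The window then slides forward one character at a time across the whole text, which yields 1,115,294 training examples. The examples are shuffled before training.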
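
The windowing can be sketched in plain Python as below. This illustrates the idea and is not the author's pipeline: `make_examples` is a hypothetical helper, the toy string and the window length of 5 are for readability, and the corpus length of 1,115,394 characters is an assumption that happens to match the example count stated above.

```python
import random

def make_examples(text, seq_len=100):
    """Pair every seq_len-character window with the character that follows it."""
    return [
        (text[i:i + seq_len], text[i + seq_len])
        for i in range(len(text) - seq_len)
    ]

toy = "First Citizen:\nBefore we proceed"
examples = make_examples(toy, seq_len=5)
print(examples[0])                      # ('First', ' ')
print(len(examples) == len(toy) - 5)    # True

# Shuffle so that consecutive training examples are not adjacent in the text.
random.Random(0).shuffle(examples)

# The same arithmetic for the full corpus, assuming 1,115,394 characters.
corpus_chars = 1_115_394
print(corpus_chars - 100)               # 1115294
```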

TODO: upload encoder

Training hyperparameters

Hyperparameter   Value
epochs           18
batch_size       1024
optimizer        AdamW
weight_decay     0.001
learning_rate    0.00025
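
To put these values in context, the number of optimizer steps can be worked out from the batch size and the example count given earlier. The sketch below assumes that all 1,115,294 examples are used for training (no held-out split) and that the final partial batch is kept. Neither is stated in this card, and the `config` dictionary is only an illustrative container.

```python
import math

# Values copied from the table above.
config = {
    "epochs": 18,
    "batch_size": 1024,
    "optimizer": "AdamW",
    "weight_decay": 0.001,
    "learning_rate": 0.00025,
}

num_examples = 1_115_294  # assumes every example is used for training

# The last batch of each epoch is smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / config["batch_size"])
total_steps = steps_per_epoch * config["epochs"]
print(steps_per_epoch, total_steps)  # 1090 19620
```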

Model Plot

[Image: model plot]

Dataset used to train adamelliotfields/shakespeare