![No Maintenance Intended](https://img.shields.io/badge/No%20Maintenance%20Intended-%E2%9C%95-red.svg)
![TensorFlow Requirement: 1.x](https://img.shields.io/badge/TensorFlow%20Requirement-1.x-brightgreen)
![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)
# Namignizer
Use a variation of the [PTB](https://www.tensorflow.org/versions/r0.8/tutorials/recurrent/index.html#recurrent-neural-networks) model to recognize and generate names using the [Kaggle Baby Name Database](https://www.kaggle.com/kaggle/us-baby-names).
### API
Namignizer is implemented in TensorFlow r0.8 and uses the Python package `pandas` for some data processing.
#### How to use
Download the data from Kaggle and place it in your data directory (or use the small training data provided). The example data looks like this:
```
Id,Name,Year,Gender,Count
1,Mary,1880,F,7065
2,Anna,1880,F,2604
3,Emma,1880,F,2003
4,Elizabeth,1880,F,1939
5,Minnie,1880,F,1746
6,Margaret,1880,F,1578
7,Ida,1880,F,1472
8,Alice,1880,F,1414
9,Bertha,1880,F,1320
```
However, any data with the two columns `Name` and `Count` will work.
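As an illustration of that two-column requirement, here is a minimal sketch that reduces rows in the sample format to just `Name` and `Count`. The helper `keep_name_and_count` is hypothetical and not part of Namignizer, and the sketch uses Python's standard `csv` module rather than `pandas` only to stay dependency-free.

```python
import csv
import io

# Hypothetical helper (not part of Namignizer): keep only the two columns
# that the README says are required.
def keep_name_and_count(rows):
    """Reduce an iterable of CSV row dicts to the Name and Count columns."""
    reduced = []
    for row in rows:
        if "Name" not in row or "Count" not in row:
            raise ValueError("input needs both a Name and a Count column")
        reduced.append({"Name": row["Name"], "Count": int(row["Count"])})
    return reduced

# A few rows copied from the sample data above.
sample = """Id,Name,Year,Gender,Count
1,Mary,1880,F,7065
2,Anna,1880,F,2604
3,Emma,1880,F,2003
"""

rows = keep_name_and_count(csv.DictReader(io.StringIO(sample)))

# Write the reduced table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Name", "Count"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
minimal_csv = out.getvalue()
print(minimal_csv)
```

Whether the training code accepts extra columns silently, or expects a particular column order, is not stated here, so check a reduced file like this against your own setup.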
With the data, we can then train the model:
```python
train("data/SmallNames.txt", "model/namignizer", SmallConfig)
```
And you will get the output:
```
Reading Name data in data/SmallNames.txt
Epoch: 1 Learning rate: 1.000
0.090 perplexity: 18.539 speed: 282 lps
...
0.890 perplexity: 1.478 speed: 285 lps
0.990 perplexity: 1.477 speed: 284 lps
Epoch: 13 Train Perplexity: 1.477
```
As a side effect, training writes model checkpoints to the `model` directory. With these checkpoints you can compute the perplexity the model assigns to any set of names, like so:
```python
namignize(["mary", "ida", "gazorpazorp", "houyhnhnms", "bob"],
          tf.train.latest_checkpoint("model"), SmallConfig)
```
Provide the same config and the same checkpoint directory that you used for training, so that the model you just trained is the one loaded. You will then get a perplexity output for each name like so:
```
Name mary gives us a perplexity of 1.03105580807
Name ida gives us a perplexity of 1.07770049572
Name gazorpazorp gives us a perplexity of 175.940353394
Name houyhnhnms gives us a perplexity of 9.53870773315
Name bob gives us a perplexity of 6.03938627243
```
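To help read these numbers, here is a minimal sketch of the standard definition of perplexity: the exponential of the average negative log-probability the model assigns to each character. This is not Namignizer's own code. The function name and the probabilities are made up for illustration, and the exact quantity Namignizer averages over (for example, whether the end-of-name marker is counted) is an assumption.

```python
import math

# Hypothetical helper illustrating the textbook definition of perplexity.
def perplexity(char_probs):
    """Perplexity of a sequence, given the probability assigned to each actual character."""
    if not char_probs:
        raise ValueError("need at least one probability")
    avg_neg_log_likelihood = -sum(math.log(p) for p in char_probs) / len(char_probs)
    return math.exp(avg_neg_log_likelihood)

# Made-up per-character probabilities, for illustration only.
confident = [0.9, 0.95, 0.9, 0.95]    # model expected almost every character
surprised = [0.01, 0.02, 0.005, 0.01]  # model found every character unlikely

print(perplexity(confident))
print(perplexity(surprised))
```

Under this definition a perplexity close to 1 means the model predicted nearly every character with high confidence, which fits names such as `mary` that appear in the training data, while a large value such as the one for `gazorpazorp` means the characters were individually unlikely.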
Finally, you will also be able to generate names using the model like so:
```python
namignator(tf.train.latest_checkpoint("model"), SmallConfig)
```
Again, you will need to provide the same config and the same checkpoint directory so that the model you just trained is the one loaded. Each call gives you a single generated name. Examples of output that I got when using the provided data are:
```
['b', 'e', 'r', 't', 'h', 'a', '`']
['m', 'a', 'r', 'y', '`']
['a', 'n', 'n', 'a', '`']
['m', 'a', 'r', 'y', '`']
['b', 'e', 'r', 't', 'h', 'a', '`']
['a', 'n', 'n', 'a', '`']
['e', 'l', 'i', 'z', 'a', 'b', 'e', 't', 'h', '`']
```
Notice that each name ends with a backtick. This marks the end of the name.
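If you want a plain string rather than a list of characters, a small post-processing step is enough. The sketch below is not part of Namignizer. The helper `chars_to_name` is hypothetical and assumes the output is a Python list of single-character strings, as in the examples above.

```python
# The end-of-name marker shown in the generated output above.
END_OF_NAME = "`"

# Hypothetical helper (not part of Namignizer).
def chars_to_name(chars):
    """Join generated characters into a string, stopping at the end marker."""
    name = []
    for c in chars:
        if c == END_OF_NAME:
            break
        name.append(c)
    return "".join(name)

print(chars_to_name(['b', 'e', 'r', 't', 'h', 'a', '`']))  # prints "bertha"
print(chars_to_name(['m', 'a', 'r', 'y', '`']))            # prints "mary"
```

One possible reason for the choice of marker: the backtick is ASCII 96, immediately before `a` at 97, so it would fall at index 0 if letters are numbered from 1. That mapping is an assumption and is not stated in this README.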
### Contact Info
Feel free to reach out to me at knt(at google) or k.nathaniel.tucker(at gmail) | |