![No Maintenance Intended](https://img.shields.io/badge/No%20Maintenance%20Intended-%E2%9C%95-red.svg)
![TensorFlow Requirement: 1.x](https://img.shields.io/badge/TensorFlow%20Requirement-1.x-brightgreen)
![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)
# Namignizer
Use a variation of the [PTB](https://www.tensorflow.org/versions/r0.8/tutorials/recurrent/index.html#recurrent-neural-networks) model to recognize and generate names using the [Kaggle Baby Name Database](https://www.kaggle.com/kaggle/us-baby-names).
### API
Namignizer is implemented in TensorFlow r0.8 and uses the Python package `pandas` for some data processing.
#### How to use
Download the data from Kaggle and place it in your data directory (or use the small training data provided). The example data looks like this:
```
Id,Name,Year,Gender,Count
1,Mary,1880,F,7065
2,Anna,1880,F,2604
3,Emma,1880,F,2003
4,Elizabeth,1880,F,1939
5,Minnie,1880,F,1746
6,Margaret,1880,F,1578
7,Ida,1880,F,1472
8,Alice,1880,F,1414
9,Bertha,1880,F,1320
```
Any data that has the two columns `Name` and `Count` will work.
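If you bring your own data, it can help to confirm that both columns are present before training. The following is a minimal sketch using only the standard library `csv` module (not the `pandas` code the project itself uses); the helper name `read_name_counts` is hypothetical and is not part of Namignizer.

```python
import csv
import io

# The two columns that the data needs to contain.
REQUIRED_COLUMNS = {"Name", "Count"}


def read_name_counts(handle):
    """Return a list of (name, count) pairs from a CSV file-like object.

    Hypothetical helper for illustration only; extra columns such as
    Id, Year and Gender are ignored.
    """
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError("missing columns: " + ", ".join(sorted(missing)))
    return [(row["Name"], int(row["Count"])) for row in reader]


# Two rows taken from the example data above.
sample = io.StringIO(
    "Id,Name,Year,Gender,Count\n"
    "1,Mary,1880,F,7065\n"
    "2,Anna,1880,F,2604\n"
)
pairs = read_name_counts(sample)
print(pairs)  # [('Mary', 7065), ('Anna', 2604)]
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object.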
With the data, we can then train the model:
```python
train("data/SmallNames.txt", "model/namignizer", SmallConfig)
```
And you will get the output:
```
Reading Name data in data/SmallNames.txt
Epoch: 1 Learning rate: 1.000
0.090 perplexity: 18.539 speed: 282 lps
...
0.890 perplexity: 1.478 speed: 285 lps
0.990 perplexity: 1.477 speed: 284 lps
Epoch: 13 Train Perplexity: 1.477
```
As a side effect, training writes model checkpoints to the `model` directory. With a checkpoint you can compute the perplexity the model assigns to any set of names:
```python
namignize(["mary", "ida", "gazorpazorp", "houyhnhnms", "bob"],
          tf.train.latest_checkpoint("model"), SmallConfig)
```
Provide the same config and the same checkpoint directory that you used for training, so that the model you just trained is loaded. You will then get a perplexity for each name:
```
Name mary gives us a perplexity of 1.03105580807
Name ida gives us a perplexity of 1.07770049572
Name gazorpazorp gives us a perplexity of 175.940353394
Name houyhnhnms gives us a perplexity of 9.53870773315
Name bob gives us a perplexity of 6.03938627243
```
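To help read these numbers: under the usual definition, perplexity is the exponential of the average negative log-probability that the model assigns to each character, so names that resemble the training data score close to 1 and unfamiliar strings score much higher. The sketch below illustrates that definition only. It assumes Namignizer reports perplexity in this standard form, and the per-character probabilities are invented for illustration rather than taken from the model.

```python
import math


def perplexity(char_probs):
    """Perplexity of a sequence from the probability assigned to each character."""
    avg_neg_log = -sum(math.log(p) for p in char_probs) / len(char_probs)
    return math.exp(avg_neg_log)


# Invented probabilities: a confident prediction versus a surprised one.
likely = perplexity([0.9, 0.95, 0.99, 0.97])
unlikely = perplexity([0.01, 0.005, 0.02, 0.01])

print("likely name:   %.3f" % likely)
print("unlikely name: %.3f" % unlikely)

# A model that gives every character probability 0.5 has perplexity of about 2.
print("coin flip:     %.3f" % perplexity([0.5, 0.5]))
```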
Finally, you can also generate names using the model:
```python
namignator(tf.train.latest_checkpoint("model"), SmallConfig)
```
Again, provide the same config and the same checkpoint directory, so that the model you just trained is loaded. You will then get a single generated name. Examples of output that I got when using the provided data are:
```
['b', 'e', 'r', 't', 'h', 'a', '`']
['m', 'a', 'r', 'y', '`']
['a', 'n', 'n', 'a', '`']
['m', 'a', 'r', 'y', '`']
['b', 'e', 'r', 't', 'h', 'a', '`']
['a', 'n', 'n', 'a', '`']
['e', 'l', 'i', 'z', 'a', 'b', 'e', 't', 'h', '`']
```
Notice that each name ends with a backtick. This marks the end of the name.
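If you want a plain string rather than a list of characters, you can join the characters up to the end marker. This is a minimal sketch; the helper `chars_to_name` is hypothetical and not part of Namignizer, and it assumes the output is a list of single-character strings as printed above.

```python
END_OF_NAME = "`"  # the backtick that marks the end of a generated name


def chars_to_name(chars):
    """Join characters into a name, stopping at the first end-of-name marker."""
    name = []
    for ch in chars:
        if ch == END_OF_NAME:
            break
        name.append(ch)
    return "".join(name)


print(chars_to_name(['b', 'e', 'r', 't', 'h', 'a', '`']))  # bertha
print(chars_to_name(['m', 'a', 'r', 'y', '`']))  # mary
```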
### Contact Info
Feel free to reach out to me at knt(at google) or k.nathaniel.tucker(at gmail)