![No Maintenance Intended](https://img.shields.io/badge/No%20Maintenance%20Intended-%E2%9C%95-red.svg)
![TensorFlow Requirement: 1.x](https://img.shields.io/badge/TensorFlow%20Requirement-1.x-brightgreen)
![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)

# Namignizer

Namignizer uses a character-level variation of the [PTB](https://www.tensorflow.org/versions/r0.8/tutorials/recurrent/index.html#recurrent-neural-networks) language model to recognize and generate names, trained on the [Kaggle Baby Name Database](https://www.kaggle.com/kaggle/us-baby-names).

### API
Namignizer is implemented in TensorFlow r0.8 and uses the Python package `pandas` for some data processing.

#### How to use
Download the data from Kaggle and place it in your data directory (or use the small training data provided). The example data looks like this:

```
Id,Name,Year,Gender,Count
1,Mary,1880,F,7065
2,Anna,1880,F,2604
3,Emma,1880,F,2003
4,Elizabeth,1880,F,1939
5,Minnie,1880,F,1746
6,Margaret,1880,F,1578
7,Ida,1880,F,1472
8,Alice,1880,F,1414
9,Bertha,1880,F,1320
```

However, any data with the two columns `Name` and `Count` will work.
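Only `Name` and `Count` matter, and the sample has one row per name, year, and gender, so counts for the same name need to be combined. The project itself uses `pandas` for this step; the sketch below uses only the standard-library `csv` module instead, and the helpers `load_name_counts` and `name_frequencies` are hypothetical. Lowercasing names is an assumption based on the lowercase names used in the examples in this README.

```python
import csv
import io
from collections import Counter


def load_name_counts(lines):
    """Sum the Count column per lowercase name (hypothetical helper)."""
    counts = Counter()
    for row in csv.DictReader(lines):
        # Only Name and Count are used; Id, Year and Gender are ignored.
        counts[row["Name"].lower()] += int(row["Count"])
    return counts


def name_frequencies(counts):
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


sample = io.StringIO(
    "Id,Name,Year,Gender,Count\n"
    "1,Mary,1880,F,7065\n"
    "2,Anna,1880,F,2604\n"
    "3,Emma,1880,F,2003\n"
)
counts = load_name_counts(sample)
freqs = name_frequencies(counts)
print(counts.most_common(1))  # [('mary', 7065)]
```

Relative frequencies like these are one natural way to weight common names more heavily during training.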

With the data in place, train the model:

```python
from names import train, SmallConfig  # assumed: both are defined in names.py
train("data/SmallNames.txt", "model/namignizer", SmallConfig)
```
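A character-level model needs each name as a sequence of integer ids. The mapping below is an illustrative assumption, not necessarily what the project's data code does: it uses the 27 symbols from the backtick (ASCII 96) through `z` (ASCII 122), so the end-of-name marker gets id 0 and `a` through `z` get ids 1 through 26. The backtick terminator matches the marker that ends each generated name in this README. `encode`, `decode`, and `VOCAB_SIZE` are hypothetical names.

```python
END = "`"  # end-of-name marker; ord("`") == 96, one below ord("a") == 97
VOCAB_SIZE = ord("z") - ord(END) + 1  # 27 symbols: "`" plus a-z


def encode(name):
    """Map a name to ids, appending the end marker (hypothetical scheme)."""
    chars = name.lower() + END
    if any(not (END <= c <= "z") for c in chars):
        raise ValueError(f"unsupported character in {name!r}")
    return [ord(c) - ord(END) for c in chars]


def decode(ids):
    """Inverse of encode: ids back to characters, dropping the end marker."""
    return "".join(chr(i + ord(END)) for i in ids).rstrip(END)


print(encode("Ida"))          # [9, 4, 1, 0]
print(decode(encode("Ida")))  # ida
```

Names containing characters outside `a`-`z` (hyphens, accents) are rejected here; a real pipeline would need to filter or map them.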

You will get output like this:

```
Reading Name data in data/SmallNames.txt
Epoch: 1 Learning rate: 1.000
0.090 perplexity: 18.539 speed: 282 lps
...
0.890 perplexity: 1.478 speed: 285 lps
0.990 perplexity: 1.477 speed: 284 lps
Epoch: 13 Train Perplexity: 1.477
```

As a side effect, training writes model checkpoints to the `model` directory. You can then use them to compute the perplexity the model assigns to any set of names:

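```python
import tensorflow as tf
from names import namignize, SmallConfig  # assumed: both are defined in names.py
namignize(["mary", "ida", "gazorpazorp", "houyhnhnms", "bob"],
          tf.train.latest_checkpoint("model"), SmallConfig)
```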
Pass the same config and the same checkpoint directory so that the model you just trained is loaded. You will then get a perplexity for each name:

```
Name mary gives us a perplexity of 1.03105580807
Name ida gives us a perplexity of 1.07770049572
Name gazorpazorp gives us a perplexity of 175.940353394
Name houyhnhnms gives us a perplexity of 9.53870773315
Name bob gives us a perplexity of 6.03938627243
```
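Under the usual definition for language models (and the one the PTB tutorial uses), perplexity is the exponential of the average negative log-probability per predicted character, equivalently the geometric mean of `1/p`. A value near 1 means the model predicted almost every character with certainty, which is why `mary` scores close to 1 while `gazorpazorp` scores far higher. The sketch below computes it from per-character probabilities; the probabilities are made up for illustration, and whether the end marker counts as a predicted step in this project is an assumption.

```python
import math


def perplexity(char_probs):
    """exp of the mean negative log-probability over predicted characters."""
    if not char_probs:
        raise ValueError("need at least one probability")
    nll = -sum(math.log(p) for p in char_probs) / len(char_probs)
    return math.exp(nll)


# Made-up probabilities a model might assign to each next character.
confident = [0.99, 0.98, 0.97, 0.99, 0.95]  # a familiar name
surprised = [0.2, 0.05, 0.1, 0.01, 0.3]      # an unusual name
print(round(perplexity(confident), 3))
print(round(perplexity(surprised), 3))
```

A model that guessed uniformly over a 27-symbol alphabet would score a perplexity of 27 on every name, which gives a rough ceiling for comparison.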

Finally, you can also generate names with the model:

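```python
import tensorflow as tf
from names import namignator, SmallConfig  # assumed: both are defined in names.py
namignator(tf.train.latest_checkpoint("model"), SmallConfig)
```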

Again, provide the same config and the same checkpoint directory so that the model you just trained is used. Each call generates a single name. Examples of output I got when using the provided data:

```
['b', 'e', 'r', 't', 'h', 'a', '`']
['m', 'a', 'r', 'y', '`']
['a', 'n', 'n', 'a', '`']
['m', 'a', 'r', 'y', '`']
['b', 'e', 'r', 't', 'h', 'a', '`']
['a', 'n', 'n', 'a', '`']
['e', 'l', 'i', 'z', 'a', 'b', 'e', 't', 'h', '`']
```

Notice that each name ends with a backtick, which marks the end of the name.
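Generation of this kind works by repeatedly sampling the next character from the model's predicted distribution and stopping once the end marker appears. The sketch below shows that loop with a toy character-bigram table standing in for the trained network; the bigram model, using the backtick as a start symbol too, the `max_len` cap, and the helper names `fit_bigrams` and `sample_name` are all assumptions for illustration, not the project's implementation.

```python
import random
from collections import Counter, defaultdict

END = "`"  # end-of-name marker, also used here as the start symbol


def fit_bigrams(names):
    """Count character transitions, wrapping each name in backticks."""
    table = defaultdict(Counter)
    for name in names:
        seq = END + name.lower() + END
        for prev, nxt in zip(seq, seq[1:]):
            table[prev][nxt] += 1
    return table


def sample_name(table, rng, max_len=20):
    """Sample characters until the end marker appears (or max_len is hit)."""
    chars, prev = [], END
    while len(chars) < max_len:
        options = table[prev]
        # Draw the next character in proportion to its transition count.
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        chars.append(nxt)
        if nxt == END:
            break
        prev = nxt
    return chars


table = fit_bigrams(["mary", "anna", "emma", "ida", "bertha"])
rng = random.Random(0)
chars = sample_name(table, rng)
print(chars)                      # a character list in the same format as above
print("".join(chars).rstrip(END)) # the same name as a plain string
```

A trained RNN replaces the bigram lookup with a distribution conditioned on the whole prefix, but the stop-at-backtick loop is the same idea.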

### Contact Info

Feel free to reach out to me at knt(at google) or k.nathaniel.tucker(at gmail).