Spaces
johnsonhung committed · Commit 2a3a041
1 Parent(s): 2d2ef3c
init
Files changed:
- .gitignore +112 -0
- CODE_OF_CONDUCT.md +5 -0
- CONTRIBUTING.md +36 -0
- LICENSE.md +21 -0
- README.md +119 -12
- app.py +121 -0
- data/README.md +1 -0
- data/demo_imgs/1.jpg +0 -0
- data/demo_imgs/2.jpg +0 -0
- data/demo_imgs/3.jpg +0 -0
- data/demo_imgs/4.jpg +0 -0
- data/demo_imgs/5.jpg +0 -0
- data/demo_imgs/6.jpg +0 -0
- requirements.txt +11 -0
- src/args.py +168 -0
- src/build_vocab.py +409 -0
- src/data_loader.py +193 -0
- src/demo.ipynb +271 -0
- src/demo.py +133 -0
- src/model.py +236 -0
- src/model1_inf.py +43 -0
- src/modules/encoder.py +57 -0
- src/modules/multihead_attention.py +203 -0
- src/modules/transformer_decoder.py +502 -0
- src/modules/utils.py +387 -0
- src/read_pkl.py +7 -0
- src/sample.py +207 -0
- src/sim_ingr.py +197 -0
- src/train.py +398 -0
- src/utils/ims2file.py +94 -0
- src/utils/metrics.py +78 -0
- src/utils/output_ing.py +28 -0
- src/utils/output_utils.py +103 -0
- src/utils/tb_visualizer.py +66 -0
.gitignore
ADDED
@@ -0,0 +1,112 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
*.pkl
*.png
*.json
*.nfs*
*.tex
.idea/
*.bin
*.ckpt
CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,5 @@
# Code of Conduct

Facebook has adopted a Code of Conduct that we expect project participants to adhere to.
Please read the [full text](https://code.fb.com/codeofconduct/)
so that you can understand what actions will and will not be tolerated.
CONTRIBUTING.md
ADDED
@@ -0,0 +1,36 @@
# Contributing
We want to make contributing to this project as easy and transparent as
possible.

## Pull Requests
We actively welcome your pull requests.

1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## Coding Style
* 4 spaces for indentation rather than tabs
* 100 character line length
* PEP8 formatting

## License
By contributing to this project, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
LICENSE.md
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Facebook, Inc. and its affiliates.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
CHANGED
@@ -1,12 +1,119 @@
## Inverse Cooking: Recipe Generation from Food Images

Code supporting the paper:

*Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero.
[Inverse Cooking: Recipe Generation from Food Images](https://arxiv.org/abs/1812.06164).
CVPR 2019*

If you find this code useful in your research, please consider citing it using the
following BibTeX entry:

```
@InProceedings{Salvador2019inversecooking,
author = {Salvador, Amaia and Drozdzal, Michal and Giro-i-Nieto, Xavier and Romero, Adriana},
title = {Inverse Cooking: Recipe Generation From Food Images},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
```

### Installation

This code uses Python 3.6, PyTorch 0.4.1, and CUDA 9.0.

- Install PyTorch:
```bash
$ conda install pytorch=0.4.1 cuda90 -c pytorch
```

- Install dependencies:
```bash
$ pip install -r requirements.txt
```

### Pretrained model

- Download the ingredient and instruction vocabularies [here](https://dl.fbaipublicfiles.com/inversecooking/ingr_vocab.pkl) and [here](https://dl.fbaipublicfiles.com/inversecooking/instr_vocab.pkl), respectively.
- Download the pretrained model [here](https://dl.fbaipublicfiles.com/inversecooking/modelbest.ckpt).

### Demo

You can use our pretrained model to get recipes for your images.

Download the required files (listed above), place them under the ```data``` directory, and try our demo notebook ```src/demo.ipynb```.

Note: the demo will run on GPU if a device is found, otherwise it will use CPU.

### Data

- Download [Recipe1M](http://im2recipe.csail.mit.edu/dataset/download) (registration required).
- Extract the files somewhere (we refer to this path as ```path_to_dataset```).
- The contents of ```path_to_dataset``` should be the following:
```
det_ingrs.json
layer1.json
layer2.json
images/
images/train
images/val
images/test
```

*Note: all python calls below must be run from ```./src```.*

### Build vocabularies

```bash
$ python build_vocab.py --recipe1m_path path_to_dataset
```

### Images to LMDB (Optional, but recommended)

For fast loading during training:

```bash
$ python utils/ims2file.py --recipe1m_path path_to_dataset
```

If you decide not to create this file, use the flag ```--load_jpeg``` when training the model.

### Training

Create a directory to store checkpoints for all the models you train (e.g. ```../checkpoints```) and point ```--save_dir``` to it.

We train our model in two stages:

1. Ingredient prediction from images

```bash
python train.py --model_name im2ingr --batch_size 150 --finetune_after 0 --ingrs_only \
--es_metric iou_sample --loss_weight 0 1000.0 1.0 1.0 \
--learning_rate 1e-4 --scale_learning_rate_cnn 1.0 \
--save_dir ../checkpoints --recipe1m_dir path_to_dataset
```

2. Recipe generation from images and ingredients (loading from 1.)

```bash
python train.py --model_name model --batch_size 256 --recipe_only --transfer_from im2ingr \
--save_dir ../checkpoints --recipe1m_dir path_to_dataset
```

Check training progress with Tensorboard from ```../checkpoints```:

```bash
$ tensorboard --logdir='../tb_logs' --port=6006
```

### Evaluation

- Save generated recipes to disk with
```python sample.py --model_name model --save_dir ../checkpoints --recipe1m_dir path_to_dataset --greedy --eval_split test```.
- This script will also report ingredient metrics (F1 and IoU).

### License

inversecooking is released under the MIT license; see [LICENSE](LICENSE.md) for details.
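For quick reference, the following sketch condenses the model-loading steps that the demo notebook added in this commit (```src/demo.ipynb```) performs. It assumes the vocabularies and checkpoint listed above were placed under ```../data``` and that it is run from ```./src```, as the README requires; it is illustrative, not part of the repository.

```python
# Sketch: load the pretrained inverse-cooking model the same way src/demo.ipynb does.
# Assumes ingr_vocab.pkl, instr_vocab.pkl and modelbest.ckpt live in ../data.
import os
import pickle
import sys

import torch

from args import get_parser
from model import get_model

sys.argv = ['']   # get_parser() calls parse_args(); clear any notebook/CLI arguments first

data_dir = '../data'
ingrs_vocab = pickle.load(open(os.path.join(data_dir, 'ingr_vocab.pkl'), 'rb'))
instr_vocab = pickle.load(open(os.path.join(data_dir, 'instr_vocab.pkl'), 'rb'))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
map_loc = None if torch.cuda.is_available() else 'cpu'

args = get_parser()
args.maxseqlen = 15        # same overrides as in the demo notebook
args.ingrs_only = False

model = get_model(args, len(ingrs_vocab), len(instr_vocab))
model.load_state_dict(torch.load(os.path.join(data_dir, 'modelbest.ckpt'), map_location=map_loc))
model.to(device)
model.eval()
```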
app.py
ADDED
@@ -0,0 +1,121 @@
from PIL import Image
import requests
import pickle
from io import BytesIO
import gradio as gr
from src.args import get_parser
from src.model import get_model
import torch
import os
from src.model1_inf import im2ingr
import numpy as np

response = requests.get("https://i.imgur.com/DwR24EM.jpeg")
dog_img = Image.open(BytesIO(response.content))


def img2ingr(image):
    # img_file = '../data/demo_imgs/1.jpg'
    # image = Image.open(img_file).convert('RGB')
    img = Image.fromarray(np.uint8(image)).convert('RGB')
    ingr = im2ingr(img, ingrs_vocab, model)
    return ' '.join(ingr)


def img_ingr2recipe(image, ingr):
    # placeholder output until the recipe-generation model ("model2" below) is wired in
    print(image.shape, ingr)
    return dog_img, "A delicious meme dog \n--------\n1. Cook it!\n2. GL&HF"


def change_checkbox(predicted_ingr):
    return gr.update(label="Ingredient required", interactive=True,
                     choices=predicted_ingr.split(), value=predicted_ingr.split())


def add_ingr(new_ingr):
    print(new_ingr)
    return "hello"


def add_to_checkbox(old_ingr, new_ingr):
    # check if in dict or not
    return gr.update(label="Ingredient required", interactive=True,
                     choices=[*old_ingr, new_ingr], value=[*old_ingr, new_ingr])


""" load model1 """
args = get_parser()

# basic parameters
model_dir = './data'
data_dir = './data'
example_dir = './data/demo_imgs/'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
map_loc = None if torch.cuda.is_available() else 'cpu'

# load ingredients vocab
ingrs_vocab = pickle.load(open(os.path.join(model_dir, 'ingr_vocab.pkl'), 'rb'))
vocab = pickle.load(open(os.path.join(data_dir, 'instr_vocab.pkl'), 'rb'))

ingr_vocab_size = len(ingrs_vocab)
instrs_vocab_size = len(vocab)

# model setting and loading
args.maxseqlen = 15
args.ingrs_only = True
model = get_model(args, ingr_vocab_size, instrs_vocab_size)
model_path = os.path.join(model_dir, 'modelbest.ckpt')
model.load_state_dict(torch.load(model_path, map_location=map_loc))
model.to(device)
model.eval()
model.ingrs_only = True
model.recipe_only = False

""" load model2 """


""" gradio """
# input image -> list all required ingrs -> checkbox for selecting ingrs / input_box for input more ingrs user want -> output: recipe and its image
with gr.Blocks() as demo:
    gr.Markdown(
    """
    # Recipedia
    Start finding the yummy recipe ...
    """)
    with gr.Tabs():
        with gr.TabItem("User"):
            # input image
            # NOTE: type='filepath' passes a path string to the callbacks, while
            # img2ingr/img_ingr2recipe above treat the input as an image array
            image_input = gr.Image(label="Upload the image of your yummy food", type='filepath')
            gr.Examples(examples=[example_dir+"1.jpg", example_dir+"2.jpg", example_dir+"3.jpg",
                                  example_dir+"4.jpg", example_dir+"5.jpg", example_dir+"6.jpg"],
                        inputs=image_input)
            with gr.Row():
                # clear_img_btn = gr.Button("Clear")
                image_btn = gr.Button("Upload", variant="primary")
            # list all required ingrs -> checkbox for selecting ingrs / input_box for input more ingrs user want
            predicted_ingr = gr.Textbox(visible=False)

            with gr.Row():
                checkboxes = gr.CheckboxGroup(label="Ingredient required", interactive=True)
                new_ingr = gr.Textbox(label="Additional ingredients", max_lines=1)
            # with gr.Row():
            #     new_btn_clear = gr.Button("Clear")
            #     new_btn = gr.Button("Add", variant="primary")

            add_ingr = gr.Textbox(visible=False)

            with gr.Row():
                clear_ingr_btn = gr.Button("Reset")
                ingr_btn = gr.Button("Confirm", variant="primary")

            # output: recipe and its image
            with gr.Row():
                out_recipe = gr.Textbox(label="Your recipe", value="Spaghetti ---\n1. cook it!")
                out_image = gr.Image(label="Looks yummy ><")

        with gr.TabItem("Example"):
            image_button = gr.Button("Flip")

    image_btn.click(img2ingr, inputs=image_input, outputs=predicted_ingr)
    predicted_ingr.change(fn=change_checkbox, inputs=predicted_ingr, outputs=checkboxes)

    # new_btn.click(img2ingr, inputs=new_ingr, outputs=predicted_ingr)
    new_ingr.submit(fn=add_to_checkbox, inputs=[checkboxes, new_ingr], outputs=checkboxes)

    ingr_btn.click(img_ingr2recipe, inputs=[image_input, checkboxes], outputs=[out_image, out_recipe])


demo.launch(debug=True, share=True)
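The event wiring above chains three steps: the Upload button writes the predicted ingredient string into the hidden ```predicted_ingr``` textbox, its change event rebuilds the checkbox group, and the Confirm button sends the image plus the selected ingredients to ```img_ingr2recipe``` (still a placeholder, since the "load model2" section is empty). As a rough illustration, the callbacks can be exercised without launching the UI; this sketch assumes importing ```app``` succeeds, i.e. its module-level vocabulary and checkpoint loading finds the files under ```./data```.

```python
# Sketch: call app.py's Gradio callbacks directly, without launching the interface.
import numpy as np

import app  # executes the module-level model/vocab loading in app.py

dummy = np.zeros((256, 256, 3), dtype=np.uint8)   # stand-in RGB image array
ingredients = app.img2ingr(dummy)                 # space-separated ingredient names
print(ingredients)

# change_checkbox turns that string into a gr.update(...) payload for the CheckboxGroup
print(app.change_checkbox(ingredients))
```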
data/README.md
ADDED
@@ -0,0 +1 @@
# Vocabulary file will be saved here
data/demo_imgs/1.jpg
ADDED
data/demo_imgs/2.jpg
ADDED
data/demo_imgs/3.jpg
ADDED
data/demo_imgs/4.jpg
ADDED
data/demo_imgs/5.jpg
ADDED
data/demo_imgs/6.jpg
ADDED
requirements.txt
ADDED
@@ -0,0 +1,11 @@
numpy
scipy
matplotlib
# torch==0.4.1
# torchvision==0.2.1
nltk
Pillow
tqdm
lmdb
tensorflow
tensorboardX
src/args.py
ADDED
@@ -0,0 +1,168 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import argparse
import os


def get_parser():

    parser = argparse.ArgumentParser()

    parser.add_argument('--save_dir', type=str, default='path/to/save/models',
                        help='path where checkpoints will be saved')

    parser.add_argument('--project_name', type=str, default='inversecooking',
                        help='name of the directory where models will be saved within save_dir')

    parser.add_argument('--model_name', type=str, default='model',
                        help='save_dir/project_name/model_name will be the path where logs and checkpoints are stored')

    parser.add_argument('--transfer_from', type=str, default='',
                        help='specify model name to transfer from')

    parser.add_argument('--suff', type=str, default='',
                        help='the id of the dictionary to load for training')

    parser.add_argument('--image_model', type=str, default='resnet50',
                        choices=['resnet18', 'resnet50', 'resnet101', 'resnet152', 'inception_v3'])

    parser.add_argument('--recipe1m_dir', type=str, default='path/to/recipe1m',
                        help='directory where recipe1m dataset is extracted')

    parser.add_argument('--aux_data_dir', type=str, default='../data',
                        help='path to other necessary data files (eg. vocabularies)')

    parser.add_argument('--crop_size', type=int, default=224, help='size for randomly or center cropping images')

    parser.add_argument('--image_size', type=int, default=256, help='size to rescale images')

    parser.add_argument('--log_step', type=int, default=10, help='step size for printing log info')

    parser.add_argument('--learning_rate', type=float, default=0.001,
                        help='base learning rate')

    parser.add_argument('--scale_learning_rate_cnn', type=float, default=0.01,
                        help='lr multiplier for cnn weights')

    parser.add_argument('--lr_decay_rate', type=float, default=0.99,
                        help='learning rate decay factor')

    parser.add_argument('--lr_decay_every', type=int, default=1,
                        help='frequency of learning rate decay (default is every epoch)')

    parser.add_argument('--weight_decay', type=float, default=0.)

    parser.add_argument('--embed_size', type=int, default=512,
                        help='hidden size for all projections')

    parser.add_argument('--n_att', type=int, default=8,
                        help='number of attention heads in the instruction decoder')

    parser.add_argument('--n_att_ingrs', type=int, default=4,
                        help='number of attention heads in the ingredient decoder')

    parser.add_argument('--transf_layers', type=int, default=16,
                        help='number of transformer layers in the instruction decoder')

    parser.add_argument('--transf_layers_ingrs', type=int, default=4,
                        help='number of transformer layers in the ingredient decoder')

    parser.add_argument('--num_epochs', type=int, default=400,
                        help='maximum number of epochs')

    parser.add_argument('--batch_size', type=int, default=128)

    parser.add_argument('--num_workers', type=int, default=8)

    parser.add_argument('--dropout_encoder', type=float, default=0.3,
                        help='dropout ratio for the image and ingredient encoders')

    parser.add_argument('--dropout_decoder_r', type=float, default=0.3,
                        help='dropout ratio in the instruction decoder')

    parser.add_argument('--dropout_decoder_i', type=float, default=0.3,
                        help='dropout ratio in the ingredient decoder')

    parser.add_argument('--finetune_after', type=int, default=-1,
                        help='epoch to start training cnn. -1 is never, 0 is from the beginning')

    parser.add_argument('--loss_weight', nargs='+', type=float, default=[1.0, 0.0, 0.0, 0.0],
                        help='training loss weights. 1) instruction, 2) ingredient, 3) eos 4) cardinality')

    parser.add_argument('--max_eval', type=int, default=4096,
                        help='number of validation samples to evaluate during training')

    parser.add_argument('--label_smoothing_ingr', type=float, default=0.1,
                        help='label smoothing for bce loss for ingredients')

    parser.add_argument('--patience', type=int, default=50,
                        help='maximum number of epochs to allow before early stopping')

    parser.add_argument('--maxseqlen', type=int, default=15,
                        help='maximum length of each instruction')

    parser.add_argument('--maxnuminstrs', type=int, default=10,
                        help='maximum number of instructions')

    parser.add_argument('--maxnumims', type=int, default=5,
                        help='maximum number of images per sample')

    parser.add_argument('--maxnumlabels', type=int, default=20,
                        help='maximum number of ingredients per sample')

    parser.add_argument('--es_metric', type=str, default='loss', choices=['loss', 'iou_sample'],
                        help='early stopping metric to track')

    parser.add_argument('--eval_split', type=str, default='val')

    parser.add_argument('--numgens', type=int, default=3)

    parser.add_argument('--greedy', dest='greedy', action='store_true',
                        help='enables greedy sampling (inference only)')
    parser.set_defaults(greedy=False)

    parser.add_argument('--temperature', type=float, default=1.0,
                        help='sampling temperature (when greedy is False)')

    parser.add_argument('--beam', type=int, default=-1,
                        help='beam size. -1 means no beam search (either greedy or sampling)')

    parser.add_argument('--ingrs_only', dest='ingrs_only', action='store_true',
                        help='train or evaluate the model only for ingredient prediction')
    parser.set_defaults(ingrs_only=False)

    parser.add_argument('--recipe_only', dest='recipe_only', action='store_true',
                        help='train or evaluate the model only for instruction generation')
    parser.set_defaults(recipe_only=False)

    parser.add_argument('--log_term', dest='log_term', action='store_true',
                        help='if used, shows training log in stdout instead of saving it to a file.')
    parser.set_defaults(log_term=False)

    parser.add_argument('--notensorboard', dest='tensorboard', action='store_false',
                        help='if used, tensorboard logs will not be saved')
    parser.set_defaults(tensorboard=True)

    parser.add_argument('--resume', dest='resume', action='store_true',
                        help='resume training from the checkpoint in model_name')
    parser.set_defaults(resume=False)

    parser.add_argument('--nodecay_lr', dest='decay_lr', action='store_false',
                        help='disables learning rate decay')
    parser.set_defaults(decay_lr=True)

    parser.add_argument('--load_jpeg', dest='use_lmdb', action='store_false',
                        help='if used, images are loaded from jpg files instead of lmdb')
    parser.set_defaults(use_lmdb=True)

    parser.add_argument('--get_perplexity', dest='get_perplexity', action='store_true',
                        help='used to get perplexity in evaluation')
    parser.set_defaults(get_perplexity=False)

    parser.add_argument('--use_true_ingrs', dest='use_true_ingrs', action='store_true',
                        help='if used, true ingredients will be used as input to obtain the recipe in evaluation')
    parser.set_defaults(use_true_ingrs=False)

    args = parser.parse_args()

    return args
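Because ```get_parser()``` calls ```parse_args()``` directly, the notebook and app code in this commit obtain a default namespace by clearing ```sys.argv``` first and then overriding individual fields. A minimal sketch of that pattern (assumed to run from ```./src```):

```python
# Sketch: build a default args namespace programmatically, then override fields,
# mirroring the pattern used in src/demo.ipynb and app.py.
import sys
sys.argv = ['']            # parse_args() would otherwise consume the caller's CLI arguments

from args import get_parser

args = get_parser()
args.maxseqlen = 15        # maximum words per generated instruction
args.ingrs_only = False    # run both the ingredient and instruction decoders
print(args.batch_size, args.image_model)
```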
src/build_vocab.py
ADDED
@@ -0,0 +1,409 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import nltk
import pickle
import argparse
from collections import Counter
import json
import os
from tqdm import *
import numpy as np
import re


class Vocabulary(object):
    """Simple vocabulary wrapper."""
    def __init__(self):
        self.word2idx = {}
        self.idx2word = {}
        self.idx = 0

    def add_word(self, word, idx=None):
        if idx is None:
            if not word in self.word2idx:
                self.word2idx[word] = self.idx
                self.idx2word[self.idx] = word
                self.idx += 1
            return self.idx
        else:
            if not word in self.word2idx:
                self.word2idx[word] = idx
                if idx in self.idx2word.keys():
                    self.idx2word[idx].append(word)
                else:
                    self.idx2word[idx] = [word]

            return idx

    def __call__(self, word):
        if not word in self.word2idx:
            return self.word2idx['<pad>']
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


def get_ingredient(det_ingr, replace_dict):
    det_ingr_undrs = det_ingr['text'].lower()
    det_ingr_undrs = ''.join(i for i in det_ingr_undrs if not i.isdigit())

    for rep, char_list in replace_dict.items():
        for c_ in char_list:
            if c_ in det_ingr_undrs:
                det_ingr_undrs = det_ingr_undrs.replace(c_, rep)
    det_ingr_undrs = det_ingr_undrs.strip()
    det_ingr_undrs = det_ingr_undrs.replace(' ', '_')

    return det_ingr_undrs


def get_instruction(instruction, replace_dict, instruction_mode=True):
    instruction = instruction.lower()

    for rep, char_list in replace_dict.items():
        for c_ in char_list:
            if c_ in instruction:
                instruction = instruction.replace(c_, rep)
    instruction = instruction.strip()
    # remove sentences starting with "1.", "2.", ... from the targets
    if len(instruction) > 0 and instruction[0].isdigit() and instruction_mode:
        instruction = ''
    return instruction


def remove_plurals(counter_ingrs, ingr_clusters):
    del_ingrs = []

    for k, v in counter_ingrs.items():

        if len(k) == 0:
            del_ingrs.append(k)
            continue

        gotit = 0
        if k[-2:] == 'es':
            if k[:-2] in counter_ingrs.keys():
                counter_ingrs[k[:-2]] += v
                ingr_clusters[k[:-2]].extend(ingr_clusters[k])
                del_ingrs.append(k)
                gotit = 1

        if k[-1] == 's' and gotit == 0:
            if k[:-1] in counter_ingrs.keys():
                counter_ingrs[k[:-1]] += v
                ingr_clusters[k[:-1]].extend(ingr_clusters[k])
                del_ingrs.append(k)
    for item in del_ingrs:
        del counter_ingrs[item]
        del ingr_clusters[item]
    return counter_ingrs, ingr_clusters


def cluster_ingredients(counter_ingrs):
    mydict = dict()
    mydict_ingrs = dict()

    for k, v in counter_ingrs.items():

        w1 = k.split('_')[-1]
        w2 = k.split('_')[0]
        lw = [w1, w2]
        if len(k.split('_')) > 1:
            w3 = k.split('_')[0] + '_' + k.split('_')[1]
            w4 = k.split('_')[-2] + '_' + k.split('_')[-1]

            lw = [w1, w2, w4, w3]

        gotit = 0
        for w in lw:
            if w in counter_ingrs.keys():
                # check if its parts are
                parts = w.split('_')
                if len(parts) > 0:
                    if parts[0] in counter_ingrs.keys():
                        w = parts[0]
                    elif parts[1] in counter_ingrs.keys():
                        w = parts[1]
                if w in mydict.keys():
                    mydict[w] += v
                    mydict_ingrs[w].append(k)
                else:
                    mydict[w] = v
                    mydict_ingrs[w] = [k]
                gotit = 1
                break
        if gotit == 0:
            mydict[k] = v
            mydict_ingrs[k] = [k]

    return mydict, mydict_ingrs


def update_counter(list_, counter_toks, istrain=False):
    for sentence in list_:
        tokens = nltk.tokenize.word_tokenize(sentence)
        if istrain:
            counter_toks.update(tokens)


def build_vocab_recipe1m(args):
    print ("Loading data...")
    dets = json.load(open(os.path.join(args.recipe1m_path, 'det_ingrs.json'), 'r'))
    layer1 = json.load(open(os.path.join(args.recipe1m_path, 'layer1.json'), 'r'))
    layer2 = json.load(open(os.path.join(args.recipe1m_path, 'layer2.json'), 'r'))

    id2im = {}

    for i, entry in enumerate(layer2):
        id2im[entry['id']] = i

    print("Loaded data.")
    print("Found %d recipes in the dataset." % (len(layer1)))
    replace_dict_ingrs = {'and': ['&', "'n"], '': ['%', ',', '.', '#', '[', ']', '!', '?']}
    replace_dict_instrs = {'and': ['&', "'n"], '': ['#', '[', ']']}

    idx2ind = {}
    for i, entry in enumerate(dets):
        idx2ind[entry['id']] = i

    ingrs_file = args.save_path + 'allingrs_count.pkl'
    instrs_file = args.save_path + 'allwords_count.pkl'

    #####
    # 1. Count words in dataset and clean
    #####
    if os.path.exists(ingrs_file) and os.path.exists(instrs_file) and not args.forcegen:
        print ("loading pre-extracted word counters")
        counter_ingrs = pickle.load(open(args.save_path + 'allingrs_count.pkl', 'rb'))
        counter_toks = pickle.load(open(args.save_path + 'allwords_count.pkl', 'rb'))
    else:
        counter_toks = Counter()
        counter_ingrs = Counter()
        counter_ingrs_raw = Counter()

        for i, entry in tqdm(enumerate(layer1)):

            # get all instructions for this recipe
            instrs = entry['instructions']

            instrs_list = []
            ingrs_list = []

            # retrieve pre-detected ingredients for this entry
            det_ingrs = dets[idx2ind[entry['id']]]['ingredients']

            valid = dets[idx2ind[entry['id']]]['valid']
            det_ingrs_filtered = []

            for j, det_ingr in enumerate(det_ingrs):
                if len(det_ingr) > 0 and valid[j]:
                    det_ingr_undrs = get_ingredient(det_ingr, replace_dict_ingrs)
                    det_ingrs_filtered.append(det_ingr_undrs)
                    ingrs_list.append(det_ingr_undrs)

            # get raw text for instructions of this entry
            acc_len = 0
            for instr in instrs:
                instr = instr['text']
                instr = get_instruction(instr, replace_dict_instrs)
                if len(instr) > 0:
                    instrs_list.append(instr)
                    acc_len += len(instr)

            # discard recipes with too few or too many ingredients or instruction words
            if len(ingrs_list) < args.minnumingrs or len(instrs_list) < args.minnuminstrs \
                    or len(instrs_list) >= args.maxnuminstrs or len(ingrs_list) >= args.maxnumingrs \
                    or acc_len < args.minnumwords:
                continue

            # tokenize sentences and update counter
            update_counter(instrs_list, counter_toks, istrain=entry['partition'] == 'train')
            title = nltk.tokenize.word_tokenize(entry['title'].lower())
            if entry['partition'] == 'train':
                counter_toks.update(title)
            if entry['partition'] == 'train':
                counter_ingrs.update(ingrs_list)

        pickle.dump(counter_ingrs, open(args.save_path + 'allingrs_count.pkl', 'wb'))
        pickle.dump(counter_toks, open(args.save_path + 'allwords_count.pkl', 'wb'))
        pickle.dump(counter_ingrs_raw, open(args.save_path + 'allingrs_raw_count.pkl', 'wb'))

    # manually add missing entries for better clustering
    base_words = ['peppers', 'tomato', 'spinach_leaves', 'turkey_breast', 'lettuce_leaf',
                  'chicken_thighs', 'milk_powder', 'bread_crumbs', 'onion_flakes',
                  'red_pepper', 'pepper_flakes', 'juice_concentrate', 'cracker_crumbs', 'hot_chili',
                  'seasoning_mix', 'dill_weed', 'pepper_sauce', 'sprouts', 'cooking_spray', 'cheese_blend',
                  'basil_leaves', 'pineapple_chunks', 'marshmallow', 'chile_powder',
                  'cheese_blend', 'corn_kernels', 'tomato_sauce', 'chickens', 'cracker_crust',
                  'lemonade_concentrate', 'red_chili', 'mushroom_caps', 'mushroom_cap', 'breaded_chicken',
                  'frozen_pineapple', 'pineapple_chunks', 'seasoning_mix', 'seaweed', 'onion_flakes',
                  'bouillon_granules', 'lettuce_leaf', 'stuffing_mix', 'parsley_flakes', 'chicken_breast',
                  'basil_leaves', 'baguettes', 'green_tea', 'peanut_butter', 'green_onion', 'fresh_cilantro',
                  'breaded_chicken', 'hot_pepper', 'dried_lavender', 'white_chocolate',
                  'dill_weed', 'cake_mix', 'cheese_spread', 'turkey_breast', 'chucken_thighs', 'basil_leaves',
                  'mandarin_orange', 'laurel', 'cabbage_head', 'pistachio', 'cheese_dip',
                  'thyme_leave', 'boneless_pork', 'red_pepper', 'onion_dip', 'skinless_chicken', 'dark_chocolate',
                  'canned_corn', 'muffin', 'cracker_crust', 'bread_crumbs', 'frozen_broccoli',
                  'philadelphia', 'cracker_crust', 'chicken_breast']

    for base_word in base_words:

        if base_word not in counter_ingrs.keys():
            counter_ingrs[base_word] = 1

    counter_ingrs, cluster_ingrs = cluster_ingredients(counter_ingrs)
    counter_ingrs, cluster_ingrs = remove_plurals(counter_ingrs, cluster_ingrs)

    # If the word frequency is less than 'threshold', then the word is discarded.
    words = [word for word, cnt in counter_toks.items() if cnt >= args.threshold_words]
    ingrs = {word: cnt for word, cnt in counter_ingrs.items() if cnt >= args.threshold_ingrs}

    # Recipe vocab
    # Create a vocab wrapper and add some special tokens.
    vocab_toks = Vocabulary()
    vocab_toks.add_word('<start>')
    vocab_toks.add_word('<end>')
    vocab_toks.add_word('<eoi>')

    # Add the words to the vocabulary.
    for i, word in enumerate(words):
        vocab_toks.add_word(word)
    vocab_toks.add_word('<pad>')

    # Ingredient vocab
    # Create a vocab wrapper for ingredients
    vocab_ingrs = Vocabulary()
    idx = vocab_ingrs.add_word('<end>')
    # this returns the next idx to add words to
    # Add the ingredients to the vocabulary.
    for k, _ in ingrs.items():
        for ingr in cluster_ingrs[k]:
            idx = vocab_ingrs.add_word(ingr, idx)
        idx += 1
    _ = vocab_ingrs.add_word('<pad>', idx)

    print("Total ingr vocabulary size: {}".format(len(vocab_ingrs)))
    print("Total token vocabulary size: {}".format(len(vocab_toks)))

    dataset = {'train': [], 'val': [], 'test': []}

    ######
    # 2. Tokenize and build dataset based on vocabularies.
    ######
    for i, entry in tqdm(enumerate(layer1)):

        # get all instructions for this recipe
        instrs = entry['instructions']

        instrs_list = []
        ingrs_list = []
        images_list = []

        # retrieve pre-detected ingredients for this entry
        det_ingrs = dets[idx2ind[entry['id']]]['ingredients']
        valid = dets[idx2ind[entry['id']]]['valid']
        labels = []

        for j, det_ingr in enumerate(det_ingrs):
            if len(det_ingr) > 0 and valid[j]:
                det_ingr_undrs = get_ingredient(det_ingr, replace_dict_ingrs)
                ingrs_list.append(det_ingr_undrs)
                label_idx = vocab_ingrs(det_ingr_undrs)
                if label_idx is not vocab_ingrs('<pad>') and label_idx not in labels:
                    labels.append(label_idx)

        # get raw text for instructions of this entry
        acc_len = 0
        for instr in instrs:
            instr = instr['text']
            instr = get_instruction(instr, replace_dict_instrs)
            if len(instr) > 0:
                acc_len += len(instr)
                instrs_list.append(instr)

        # we discard recipes with too many or too few ingredients or instruction words
        if len(labels) < args.minnumingrs or len(instrs_list) < args.minnuminstrs \
                or len(instrs_list) >= args.maxnuminstrs or len(labels) >= args.maxnumingrs \
                or acc_len < args.minnumwords:
            continue

        if entry['id'] in id2im.keys():
            ims = layer2[id2im[entry['id']]]

            # copy image paths for this recipe
            for im in ims['images']:
                images_list.append(im['id'])

        # tokenize sentences
        toks = []

        for instr in instrs_list:
            tokens = nltk.tokenize.word_tokenize(instr)
            toks.append(tokens)

        title = nltk.tokenize.word_tokenize(entry['title'].lower())

        newentry = {'id': entry['id'], 'instructions': instrs_list, 'tokenized': toks,
                    'ingredients': ingrs_list, 'images': images_list, 'title': title}
        dataset[entry['partition']].append(newentry)

    print('Dataset size:')
    for split in dataset.keys():
        print(split, ':', len(dataset[split]))

    return vocab_ingrs, vocab_toks, dataset


def main(args):

    vocab_ingrs, vocab_toks, dataset = build_vocab_recipe1m(args)

    with open(os.path.join(args.save_path, args.suff+'recipe1m_vocab_ingrs.pkl'), 'wb') as f:
        pickle.dump(vocab_ingrs, f)
    with open(os.path.join(args.save_path, args.suff+'recipe1m_vocab_toks.pkl'), 'wb') as f:
        pickle.dump(vocab_toks, f)

    for split in dataset.keys():
        with open(os.path.join(args.save_path, args.suff+'recipe1m_' + split + '.pkl'), 'wb') as f:
            pickle.dump(dataset[split], f)


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--recipe1m_path', type=str,
                        default='path/to/recipe1m',
                        help='recipe1m path')

    parser.add_argument('--save_path', type=str, default='../data/',
                        help='path for saving vocabulary wrapper')

    parser.add_argument('--suff', type=str, default='')

    parser.add_argument('--threshold_ingrs', type=int, default=10,
                        help='minimum ingr count threshold')

    parser.add_argument('--threshold_words', type=int, default=10,
                        help='minimum word count threshold')

    parser.add_argument('--maxnuminstrs', type=int, default=20,
                        help='max number of instructions (sentences)')

    parser.add_argument('--maxnumingrs', type=int, default=20,
                        help='max number of ingredients')

    parser.add_argument('--minnuminstrs', type=int, default=2,
                        help='min number of instructions (sentences)')

    parser.add_argument('--minnumingrs', type=int, default=2,
                        help='min number of ingredients')

    parser.add_argument('--minnumwords', type=int, default=20,
                        help='minimum number of characters in recipe')

    parser.add_argument('--forcegen', dest='forcegen', action='store_true')
    parser.set_defaults(forcegen=False)

    args = parser.parse_args()
    main(args)
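The ```Vocabulary``` wrapper above maps words to integer ids, returns the ```<pad>``` id for out-of-vocabulary words, and (for the ingredient vocabulary) clusters several surface forms under one id via the ```idx``` argument of ```add_word```. A toy sketch of that behaviour, independent of Recipe1M:

```python
# Sketch: how a built Vocabulary behaves (toy data, not Recipe1M).
from build_vocab import Vocabulary

vocab = Vocabulary()
vocab.add_word('<end>')            # id 0; returns the next free id
vocab.add_word('tomato', idx=1)    # cluster several surface forms under id 1
vocab.add_word('tomatoes', idx=1)
vocab.add_word('<pad>', idx=2)

print(vocab('tomatoes'))           # -> 1
print(vocab('unseen_word'))        # -> id of '<pad>', i.e. 2
print(len(vocab))                  # -> 3 (number of ids, not of words)
```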
src/data_loader.py
ADDED
@@ -0,0 +1,193 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import torch
import torchvision.transforms as transforms
import torch.utils.data as data
import os
import pickle
import numpy as np
import nltk
from PIL import Image
from build_vocab import Vocabulary
import random
import json
import lmdb


class Recipe1MDataset(data.Dataset):

    def __init__(self, data_dir, aux_data_dir, split, maxseqlen, maxnuminstrs, maxnumlabels, maxnumims,
                 transform=None, max_num_samples=-1, use_lmdb=False, suff=''):

        self.ingrs_vocab = pickle.load(open(os.path.join(aux_data_dir, suff + 'recipe1m_vocab_ingrs.pkl'), 'rb'))
        self.instrs_vocab = pickle.load(open(os.path.join(aux_data_dir, suff + 'recipe1m_vocab_toks.pkl'), 'rb'))
        self.dataset = pickle.load(open(os.path.join(aux_data_dir, suff + 'recipe1m_'+split+'.pkl'), 'rb'))

        self.label2word = self.get_ingrs_vocab()

        self.use_lmdb = use_lmdb
        if use_lmdb:
            self.image_file = lmdb.open(os.path.join(aux_data_dir, 'lmdb_' + split), max_readers=1, readonly=True,
                                        lock=False, readahead=False, meminit=False)

        self.ids = []
        self.split = split
        for i, entry in enumerate(self.dataset):
            if len(entry['images']) == 0:
                continue
            self.ids.append(i)

        self.root = os.path.join(data_dir, 'images', split)
        self.transform = transform
        self.max_num_labels = maxnumlabels
        self.maxseqlen = maxseqlen
        self.max_num_instrs = maxnuminstrs
        self.maxseqlen = maxseqlen*maxnuminstrs
        self.maxnumims = maxnumims
        if max_num_samples != -1:
            random.shuffle(self.ids)
            self.ids = self.ids[:max_num_samples]

    def get_instrs_vocab(self):
        return self.instrs_vocab

    def get_instrs_vocab_size(self):
        return len(self.instrs_vocab)

    def get_ingrs_vocab(self):
        return [min(w, key=len) if not isinstance(w, str) else w for w in
                self.ingrs_vocab.idx2word.values()]  # includes 'pad' ingredient

    def get_ingrs_vocab_size(self):
        return len(self.ingrs_vocab)

    def __getitem__(self, index):
        """Returns one data pair (image and caption)."""

        sample = self.dataset[self.ids[index]]
        img_id = sample['id']
        captions = sample['tokenized']
        paths = sample['images'][0:self.maxnumims]

        idx = index

        labels = self.dataset[self.ids[idx]]['ingredients']
        title = sample['title']

        tokens = []
        tokens.extend(title)
        # add fake token to separate title from recipe
        tokens.append('<eoi>')
        for c in captions:
            tokens.extend(c)
            tokens.append('<eoi>')

        ilabels_gt = np.ones(self.max_num_labels) * self.ingrs_vocab('<pad>')
        pos = 0

        true_ingr_idxs = []
        for i in range(len(labels)):
            true_ingr_idxs.append(self.ingrs_vocab(labels[i]))

        for i in range(self.max_num_labels):
            if i >= len(labels):
                label = '<pad>'
            else:
                label = labels[i]
            label_idx = self.ingrs_vocab(label)
            if label_idx not in ilabels_gt:
                ilabels_gt[pos] = label_idx
                pos += 1

        ilabels_gt[pos] = self.ingrs_vocab('<end>')
        ingrs_gt = torch.from_numpy(ilabels_gt).long()

        if len(paths) == 0:
            path = None
            image_input = torch.zeros((3, 224, 224))
        else:
            if self.split == 'train':
                img_idx = np.random.randint(0, len(paths))
            else:
                img_idx = 0
            path = paths[img_idx]
            if self.use_lmdb:
                try:
                    with self.image_file.begin(write=False) as txn:
                        image = txn.get(path.encode())
                        image = np.fromstring(image, dtype=np.uint8)
                        image = np.reshape(image, (256, 256, 3))
                    image = Image.fromarray(image.astype('uint8'), 'RGB')
                except:
                    print ("Image id not found in lmdb. Loading jpeg file...")
                    image = Image.open(os.path.join(self.root, path[0], path[1],
                                                    path[2], path[3], path)).convert('RGB')
            else:
                image = Image.open(os.path.join(self.root, path[0], path[1], path[2], path[3], path)).convert('RGB')
            if self.transform is not None:
                image = self.transform(image)
            image_input = image

        # Convert caption (string) to word ids.
        caption = []

        caption = self.caption_to_idxs(tokens, caption)
        caption.append(self.instrs_vocab('<end>'))

        caption = caption[0:self.maxseqlen]
        target = torch.Tensor(caption)

        return image_input, target, ingrs_gt, img_id, path, self.instrs_vocab('<pad>')

    def __len__(self):
        return len(self.ids)

    def caption_to_idxs(self, tokens, caption):

        caption.append(self.instrs_vocab('<start>'))
        for token in tokens:
            caption.append(self.instrs_vocab(token))
        return caption


def collate_fn(data):

    # Sort a data list by caption length (descending order).
    # data.sort(key=lambda x: len(x[2]), reverse=True)
    image_input, captions, ingrs_gt, img_id, path, pad_value = zip(*data)

    # Merge images (from tuple of 3D tensor to 4D tensor).

    image_input = torch.stack(image_input, 0)
    ingrs_gt = torch.stack(ingrs_gt, 0)

    # Merge captions (from tuple of 1D tensor to 2D tensor).
    lengths = [len(cap) for cap in captions]
    targets = torch.ones(len(captions), max(lengths)).long()*pad_value[0]

    for i, cap in enumerate(captions):
        end = lengths[i]
        targets[i, :end] = cap[:end]

    return image_input, targets, ingrs_gt, img_id, path


def get_loader(data_dir, aux_data_dir, split, maxseqlen,
               maxnuminstrs, maxnumlabels, maxnumims, transform, batch_size,
               shuffle, num_workers, drop_last=False,
               max_num_samples=-1,
               use_lmdb=False,
               suff=''):

    dataset = Recipe1MDataset(data_dir=data_dir, aux_data_dir=aux_data_dir, split=split,
                              maxseqlen=maxseqlen, maxnumlabels=maxnumlabels, maxnuminstrs=maxnuminstrs,
                              maxnumims=maxnumims,
                              transform=transform,
                              max_num_samples=max_num_samples,
                              use_lmdb=use_lmdb,
                              suff=suff)

    data_loader = torch.utils.data.DataLoader(dataset=dataset,
                                              batch_size=batch_size, shuffle=shuffle, num_workers=num_workers,
                                              drop_last=drop_last, collate_fn=collate_fn, pin_memory=True)
    return data_loader, dataset
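A sketch of how a training loader might be built from ```get_loader``` above. The 256/224 resize-and-crop sizes and the normalization constants match the defaults in ```args.py``` and the transform used in ```demo.ipynb```; the random-crop choice and batch size are illustrative, and the paths assume Recipe1M was extracted to ```path_to_dataset``` with the vocabulary pickles written to ```../data``` by ```build_vocab.py```.

```python
# Sketch: build a Recipe1M training loader with get_loader (illustrative settings).
import torchvision.transforms as transforms

from data_loader import get_loader

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

loader, dataset = get_loader(
    data_dir='path_to_dataset', aux_data_dir='../data', split='train',
    maxseqlen=15, maxnuminstrs=10, maxnumlabels=20, maxnumims=5,
    transform=train_transform, batch_size=4, shuffle=True,
    num_workers=0, use_lmdb=False)

images, targets, ingrs_gt, ids, paths = next(iter(loader))
print(images.shape, targets.shape, ingrs_gt.shape)   # (4, 3, 224, 224), (4, T), (4, 20)
```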
src/demo.ipynb
ADDED
@@ -0,0 +1,271 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"metadata": {},
|
6 |
+
"source": [
|
7 |
+
"## Inverse Cooking: Recipe Generation from Food Images"
|
8 |
+
]
|
9 |
+
},
|
10 |
+
{
|
11 |
+
"cell_type": "code",
|
12 |
+
"execution_count": null,
|
13 |
+
"metadata": {},
|
14 |
+
"outputs": [],
|
15 |
+
"source": [
|
16 |
+
"import matplotlib.pyplot as plt\n",
|
17 |
+
"import torch\n",
|
18 |
+
"import torch.nn as nn\n",
|
19 |
+
"import numpy as np\n",
|
20 |
+
"import os\n",
|
21 |
+
"from args import get_parser\n",
|
22 |
+
"import pickle\n",
|
23 |
+
"from model import get_model\n",
|
24 |
+
"from torchvision import transforms\n",
|
25 |
+
"from utils.output_utils import prepare_output\n",
|
26 |
+
"from PIL import Image\n",
|
27 |
+
"import time"
|
28 |
+
]
|
29 |
+
},
|
30 |
+
{
|
31 |
+
"cell_type": "markdown",
|
32 |
+
"metadata": {},
|
33 |
+
"source": [
|
34 |
+
"Set ```data_dir``` to the path including vocabularies and model checkpoint"
|
35 |
+
]
|
36 |
+
},
|
37 |
+
{
|
38 |
+
"cell_type": "code",
|
39 |
+
"execution_count": null,
|
40 |
+
"metadata": {},
|
41 |
+
"outputs": [],
|
42 |
+
"source": [
|
43 |
+
"data_dir = '../data'"
|
44 |
+
]
|
45 |
+
},
|
46 |
+
{
|
47 |
+
"cell_type": "code",
|
48 |
+
"execution_count": null,
|
49 |
+
"metadata": {},
|
50 |
+
"outputs": [],
|
51 |
+
"source": [
|
52 |
+
"# code will run in gpu if available and if the flag is set to True, else it will run on cpu\n",
|
53 |
+
"use_gpu = False\n",
|
54 |
+
"device = torch.device('cuda' if torch.cuda.is_available() and use_gpu else 'cpu')\n",
|
55 |
+
"map_loc = None if torch.cuda.is_available() and use_gpu else 'cpu'"
|
56 |
+
]
|
57 |
+
},
|
58 |
+
{
|
59 |
+
"cell_type": "code",
|
60 |
+
"execution_count": null,
|
61 |
+
"metadata": {},
|
62 |
+
"outputs": [],
|
63 |
+
"source": [
|
64 |
+
"# code below was used to save vocab files so that they can be loaded without Vocabulary class\n",
|
65 |
+
"#ingrs_vocab = pickle.load(open(os.path.join(data_dir, 'final_recipe1m_vocab_ingrs.pkl'), 'rb'))\n",
|
66 |
+
"#ingrs_vocab = [min(w, key=len) if not isinstance(w, str) else w for w in ingrs_vocab.idx2word.values()]\n",
|
67 |
+
"#vocab = pickle.load(open(os.path.join(data_dir, 'final_recipe1m_vocab_toks.pkl'), 'rb')).idx2word\n",
|
68 |
+
"#pickle.dump(ingrs_vocab, open('../demo/ingr_vocab.pkl', 'wb'))\n",
|
69 |
+
"#pickle.dump(vocab, open('../demo/instr_vocab.pkl', 'wb'))\n",
|
70 |
+
"\n",
|
71 |
+
"ingrs_vocab = pickle.load(open(os.path.join(data_dir, 'ingr_vocab.pkl'), 'rb'))\n",
|
72 |
+
"vocab = pickle.load(open(os.path.join(data_dir, 'instr_vocab.pkl'), 'rb'))\n",
|
73 |
+
"\n",
|
74 |
+
"ingr_vocab_size = len(ingrs_vocab)\n",
|
75 |
+
"instrs_vocab_size = len(vocab)\n",
|
76 |
+
"output_dim = instrs_vocab_size"
|
77 |
+
]
|
78 |
+
},
|
79 |
+
{
|
80 |
+
"cell_type": "code",
|
81 |
+
"execution_count": null,
|
82 |
+
"metadata": {},
|
83 |
+
"outputs": [],
|
84 |
+
"source": [
|
85 |
+
"print (instrs_vocab_size, ingr_vocab_size)"
|
86 |
+
]
|
87 |
+
},
|
88 |
+
{
|
89 |
+
"cell_type": "code",
|
90 |
+
"execution_count": null,
|
91 |
+
"metadata": {},
|
92 |
+
"outputs": [],
|
93 |
+
"source": [
|
94 |
+
"t = time.time()\n",
|
95 |
+
"import sys; sys.argv=['']; del sys\n",
|
96 |
+
"args = get_parser()\n",
|
97 |
+
"args.maxseqlen = 15\n",
|
98 |
+
"args.ingrs_only=False\n",
|
99 |
+
"model = get_model(args, ingr_vocab_size, instrs_vocab_size)\n",
|
100 |
+
"# Load the trained model parameters\n",
|
101 |
+
"model_path = os.path.join(data_dir, 'modelbest.ckpt')\n",
|
102 |
+
"model.load_state_dict(torch.load(model_path, map_location=map_loc))\n",
|
103 |
+
"model.to(device)\n",
|
104 |
+
"model.eval()\n",
|
105 |
+
"model.ingrs_only = False\n",
|
106 |
+
"model.recipe_only = False\n",
|
107 |
+
"print ('loaded model')\n",
|
108 |
+
"print (\"Elapsed time:\", time.time() -t)\n"
|
109 |
+
]
|
110 |
+
},
|
111 |
+
{
|
112 |
+
"cell_type": "code",
|
113 |
+
"execution_count": null,
|
114 |
+
"metadata": {},
|
115 |
+
"outputs": [],
|
116 |
+
"source": [
|
117 |
+
"transf_list_batch = []\n",
|
118 |
+
"transf_list_batch.append(transforms.ToTensor())\n",
|
119 |
+
"transf_list_batch.append(transforms.Normalize((0.485, 0.456, 0.406), \n",
|
120 |
+
" (0.229, 0.224, 0.225)))\n",
|
121 |
+
"to_input_transf = transforms.Compose(transf_list_batch)"
|
122 |
+
]
|
123 |
+
},
|
124 |
+
{
|
125 |
+
"cell_type": "code",
|
126 |
+
"execution_count": null,
|
127 |
+
"metadata": {},
|
128 |
+
"outputs": [],
|
129 |
+
"source": [
|
130 |
+
"greedy = [True, False, False, False]\n",
|
131 |
+
"beam = [-1, -1, -1, -1]\n",
|
132 |
+
"temperature = 1.0\n",
|
133 |
+
"numgens = len(greedy)"
|
134 |
+
]
|
135 |
+
},
|
136 |
+
{
|
137 |
+
"cell_type": "markdown",
|
138 |
+
"metadata": {},
|
139 |
+
"source": [
|
140 |
+
"Set ```use_urls = True``` to get recipes for images in ```demo_urls```. \n",
|
141 |
+
"\n",
|
142 |
+
"You can also set ```use_urls = False``` to get recipes for the images in ```data_dir/demo_imgs```."
|
143 |
+
]
|
144 |
+
},
|
145 |
+
{
|
146 |
+
"cell_type": "code",
|
147 |
+
"execution_count": null,
|
148 |
+
"metadata": {
|
149 |
+
"scrolled": true
|
150 |
+
},
|
151 |
+
"outputs": [],
|
152 |
+
"source": [
|
153 |
+
"import requests\n",
|
154 |
+
"from io import BytesIO\n",
|
155 |
+
"import random\n",
|
156 |
+
"from collections import Counter\n",
|
157 |
+
"use_urls = False # set to True to load images from demo_urls instead of those in the demo_imgs folder\n",
|
158 |
+
"show_anyways = False #if True, it will show the recipe even if it's not valid\n",
|
159 |
+
"image_folder = os.path.join(data_dir, 'demo_imgs')\n",
|
160 |
+
"\n",
|
161 |
+
"if not use_urls:\n",
|
162 |
+
" demo_imgs = os.listdir(image_folder)\n",
|
163 |
+
" random.shuffle(demo_imgs)\n",
|
164 |
+
"\n",
|
165 |
+
"demo_urls = ['https://food.fnr.sndimg.com/content/dam/images/food/fullset/2013/12/9/0/FNK_Cheesecake_s4x3.jpg.rend.hgtvcom.826.620.suffix/1387411272847.jpeg',\n",
|
166 |
+
" 'https://www.196flavors.com/wp-content/uploads/2014/10/california-roll-3-FP.jpg']\n",
|
167 |
+
"\n",
|
168 |
+
"demo_files = demo_urls if use_urls else demo_imgs"
|
169 |
+
]
|
170 |
+
},
|
171 |
+
{
|
172 |
+
"cell_type": "code",
|
173 |
+
"execution_count": null,
|
174 |
+
"metadata": {},
|
175 |
+
"outputs": [],
|
176 |
+
"source": [
|
177 |
+
"for img_file in demo_files:\n",
|
178 |
+
" \n",
|
179 |
+
" if use_urls:\n",
|
180 |
+
" response = requests.get(img_file)\n",
|
181 |
+
" image = Image.open(BytesIO(response.content))\n",
|
182 |
+
" else:\n",
|
183 |
+
" image_path = os.path.join(image_folder, img_file)\n",
|
184 |
+
" image = Image.open(image_path).convert('RGB')\n",
|
185 |
+
" \n",
|
186 |
+
" transf_list = []\n",
|
187 |
+
" transf_list.append(transforms.Resize(256))\n",
|
188 |
+
" transf_list.append(transforms.CenterCrop(224))\n",
|
189 |
+
" transform = transforms.Compose(transf_list)\n",
|
190 |
+
" \n",
|
191 |
+
" image_transf = transform(image)\n",
|
192 |
+
" image_tensor = to_input_transf(image_transf).unsqueeze(0).to(device)\n",
|
193 |
+
" \n",
|
194 |
+
" plt.imshow(image_transf)\n",
|
195 |
+
" plt.axis('off')\n",
|
196 |
+
" plt.show()\n",
|
197 |
+
" plt.close()\n",
|
198 |
+
" \n",
|
199 |
+
" num_valid = 1\n",
|
200 |
+
" for i in range(numgens):\n",
|
201 |
+
" with torch.no_grad():\n",
|
202 |
+
" outputs = model.sample(image_tensor, greedy=greedy[i], \n",
|
203 |
+
" temperature=temperature, beam=beam[i], true_ingrs=None)\n",
|
204 |
+
" \n",
|
205 |
+
" ingr_ids = outputs['ingr_ids'].cpu().numpy()\n",
|
206 |
+
" recipe_ids = outputs['recipe_ids'].cpu().numpy()\n",
|
207 |
+
" \n",
|
208 |
+
" outs, valid = prepare_output(recipe_ids[0], ingr_ids[0], ingrs_vocab, vocab)\n",
|
209 |
+
" \n",
|
210 |
+
" if valid['is_valid'] or show_anyways:\n",
|
211 |
+
" \n",
|
212 |
+
" print ('RECIPE', num_valid)\n",
|
213 |
+
" num_valid+=1\n",
|
214 |
+
" #print (\"greedy:\", greedy[i], \"beam:\", beam[i])\n",
|
215 |
+
" \n",
|
216 |
+
" BOLD = '\\033[1m'\n",
|
217 |
+
" END = '\\033[0m'\n",
|
218 |
+
" print (BOLD + '\\nTitle:' + END,outs['title'])\n",
|
219 |
+
"\n",
|
220 |
+
" print (BOLD + '\\nIngredients:'+ END)\n",
|
221 |
+
" print (', '.join(outs['ingrs']))\n",
|
222 |
+
"\n",
|
223 |
+
" print (BOLD + '\\nInstructions:'+END)\n",
|
224 |
+
" print ('-'+'\\n-'.join(outs['recipe']))\n",
|
225 |
+
"\n",
|
226 |
+
" print ('='*20)\n",
|
227 |
+
"\n",
|
228 |
+
" else:\n",
|
229 |
+
" pass\n",
|
230 |
+
" print (\"Not a valid recipe!\")\n",
|
231 |
+
" print (\"Reason: \", valid['reason'])\n",
|
232 |
+
" "
|
233 |
+
]
|
234 |
+
},
|
235 |
+
{
|
236 |
+
"cell_type": "code",
|
237 |
+
"execution_count": null,
|
238 |
+
"metadata": {},
|
239 |
+
"outputs": [],
|
240 |
+
"source": []
|
241 |
+
},
|
242 |
+
{
|
243 |
+
"cell_type": "code",
|
244 |
+
"execution_count": null,
|
245 |
+
"metadata": {},
|
246 |
+
"outputs": [],
|
247 |
+
"source": []
|
248 |
+
}
|
249 |
+
],
|
250 |
+
"metadata": {
|
251 |
+
"kernelspec": {
|
252 |
+
"display_name": "Python 3",
|
253 |
+
"language": "python",
|
254 |
+
"name": "python3"
|
255 |
+
},
|
256 |
+
"language_info": {
|
257 |
+
"codemirror_mode": {
|
258 |
+
"name": "ipython",
|
259 |
+
"version": 3
|
260 |
+
},
|
261 |
+
"file_extension": ".py",
|
262 |
+
"mimetype": "text/x-python",
|
263 |
+
"name": "python",
|
264 |
+
"nbconvert_exporter": "python",
|
265 |
+
"pygments_lexer": "ipython3",
|
266 |
+
"version": "3.6.5"
|
267 |
+
}
|
268 |
+
},
|
269 |
+
"nbformat": 4,
|
270 |
+
"nbformat_minor": 2
|
271 |
+
}
|
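Note (not part of the committed notebook): the resize / center-crop / normalize steps from the cells above can be collected into a single helper. This is a minimal sketch; the mean/std values are the standard ImageNet statistics expected by the torchvision ResNet backbone, and `load_demo_image` is a hypothetical name.

```python
import torch
from PIL import Image
from torchvision import transforms

def load_demo_image(path: str) -> torch.Tensor:
    """Resize, center-crop and normalize one image for model.sample()."""
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        # ImageNet mean/std, as in the notebook cell above
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    ])
    image = Image.open(path).convert('RGB')
    return preprocess(image).unsqueeze(0)  # shape (1, 3, 224, 224)
```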
src/demo.py
ADDED
@@ -0,0 +1,133 @@
1 |
+
import torch
|
2 |
+
import torch.nn as nn
|
3 |
+
import numpy as np
|
4 |
+
import os
|
5 |
+
from args import get_parser
|
6 |
+
import pickle
|
7 |
+
from model import get_model
|
8 |
+
from torchvision import transforms
|
9 |
+
from utils.output_ing import prepare_output
|
10 |
+
from PIL import Image
|
11 |
+
from tqdm import tqdm
|
12 |
+
import time
|
13 |
+
import glob
|
14 |
+
|
15 |
+
|
16 |
+
# Set model_dir to the path containing the vocabularies and the model checkpoint
|
17 |
+
model_dir = '../data'
|
18 |
+
image_folder = '../data/demo_imgs'
|
19 |
+
output_file = "../data/predicted_ingr.pkl"
|
20 |
+
|
21 |
+
# code will run in gpu if available and if the flag is set to True, else it will run on cpu
|
22 |
+
use_gpu = False
|
23 |
+
device = torch.device('cuda' if torch.cuda.is_available() and use_gpu else 'cpu')
|
24 |
+
map_loc = None if torch.cuda.is_available() and use_gpu else 'cpu'
|
25 |
+
|
26 |
+
# code below was used to save vocab files so that they can be loaded without Vocabulary class
|
27 |
+
#ingrs_vocab = pickle.load(open(os.path.join(data_dir, 'final_recipe1m_vocab_ingrs.pkl'), 'rb'))
|
28 |
+
#ingrs_vocab = [min(w, key=len) if not isinstance(w, str) else w for w in ingrs_vocab.idx2word.values()]
|
29 |
+
#vocab = pickle.load(open(os.path.join(data_dir, 'final_recipe1m_vocab_toks.pkl'), 'rb')).idx2word
|
30 |
+
#pickle.dump(ingrs_vocab, open('../demo/ingr_vocab.pkl', 'wb'))
|
31 |
+
#pickle.dump(vocab, open('../demo/instr_vocab.pkl', 'wb'))
|
32 |
+
|
33 |
+
ingrs_vocab = pickle.load(open(os.path.join(model_dir, 'ingr_vocab.pkl'), 'rb'))
|
34 |
+
vocab = pickle.load(open(os.path.join(model_dir, 'instr_vocab.pkl'), 'rb'))
|
35 |
+
|
36 |
+
ingr_vocab_size = len(ingrs_vocab)
|
37 |
+
instrs_vocab_size = len(vocab)
|
38 |
+
output_dim = instrs_vocab_size
|
39 |
+
|
40 |
+
print (instrs_vocab_size, ingr_vocab_size)
|
41 |
+
|
42 |
+
t = time.time()
|
43 |
+
|
44 |
+
args = get_parser()
|
45 |
+
args.maxseqlen = 15
|
46 |
+
args.ingrs_only=True
|
47 |
+
model = get_model(args, ingr_vocab_size, instrs_vocab_size)
|
48 |
+
# Load the trained model parameters
|
49 |
+
model_path = os.path.join(model_dir, 'modelbest.ckpt')
|
50 |
+
model.load_state_dict(torch.load(model_path, map_location=map_loc))
|
51 |
+
model.to(device)
|
52 |
+
model.eval()
|
53 |
+
model.ingrs_only = True
|
54 |
+
model.recipe_only = False
|
55 |
+
print ('loaded model')
|
56 |
+
print ("Elapsed time:", time.time() -t)
|
57 |
+
|
58 |
+
transf_list_batch = []
|
59 |
+
transf_list_batch.append(transforms.ToTensor())
|
60 |
+
transf_list_batch.append(transforms.Normalize((0.485, 0.456, 0.406),
|
61 |
+
(0.229, 0.224, 0.225)))
|
62 |
+
to_input_transf = transforms.Compose(transf_list_batch)
|
63 |
+
|
64 |
+
|
65 |
+
greedy = True
|
66 |
+
beam = -1
|
67 |
+
temperature = 1.0
|
68 |
+
|
69 |
+
# import requests
|
70 |
+
# from io import BytesIO
|
71 |
+
# import random
|
72 |
+
# from collections import Counter
|
73 |
+
# use_urls = False # set to true to load images from demo_urls instead of those in test_imgs folder
|
74 |
+
# show_anyways = False #if True, it will show the recipe even if it's not valid
|
75 |
+
# image_folder = os.path.join(data_dir, 'demo_imgs')
|
76 |
+
|
77 |
+
# if not use_urls:
|
78 |
+
# demo_imgs = os.listdir(image_folder)
|
79 |
+
# random.shuffle(demo_imgs)
|
80 |
+
|
81 |
+
# demo_urls = ['https://food.fnr.sndimg.com/content/dam/images/food/fullset/2013/12/9/0/FNK_Cheesecake_s4x3.jpg.rend.hgtvcom.826.620.suffix/1387411272847.jpeg',
|
82 |
+
# 'https://www.196flavors.com/wp-content/uploads/2014/10/california-roll-3-FP.jpg']
|
83 |
+
|
84 |
+
files_path = glob.glob(f"{image_folder}/*/*/*.jpg")
|
85 |
+
print(f"total data: {len(files_path)}")
|
86 |
+
|
87 |
+
res = []
|
88 |
+
for idx, img_file in tqdm(enumerate(files_path)):
|
89 |
+
# if use_urls:
|
90 |
+
# response = requests.get(img_file)
|
91 |
+
# image = Image.open(BytesIO(response.content))
|
92 |
+
# else:
|
93 |
+
image = Image.open(img_file).convert('RGB')
|
94 |
+
|
95 |
+
transf_list = []
|
96 |
+
transf_list.append(transforms.Resize(256))
|
97 |
+
transf_list.append(transforms.CenterCrop(224))
|
98 |
+
transform = transforms.Compose(transf_list)
|
99 |
+
|
100 |
+
image_transf = transform(image)
|
101 |
+
image_tensor = to_input_transf(image_transf).unsqueeze(0).to(device)
|
102 |
+
|
103 |
+
# plt.imshow(image_transf)
|
104 |
+
# plt.axis('off')
|
105 |
+
# plt.show()
|
106 |
+
# plt.close()
|
107 |
+
|
108 |
+
with torch.no_grad():
|
109 |
+
outputs = model.sample(image_tensor, greedy=greedy,
|
110 |
+
temperature=temperature, beam=beam, true_ingrs=None)
|
111 |
+
|
112 |
+
ingr_ids = outputs['ingr_ids'].cpu().numpy()
|
113 |
+
print(ingr_ids)
|
114 |
+
|
115 |
+
outs = prepare_output(ingr_ids[0], ingrs_vocab)
|
116 |
+
# print(ingrs_vocab.idx2word)
|
117 |
+
|
118 |
+
print(outs)
|
119 |
+
|
120 |
+
# print ('Pic ' + str(idx+1) + ':')
|
121 |
+
|
122 |
+
# print ('\nIngredients:')
|
123 |
+
# print (', '.join(outs['ingrs']))
|
124 |
+
|
125 |
+
# print ('='*20)
|
126 |
+
|
127 |
+
res.append({
|
128 |
+
"id": img_file,
|
129 |
+
"ingredients": outs['ingrs']
|
130 |
+
})
|
131 |
+
|
132 |
+
with open(output_file, "wb") as fp: #Pickling
|
133 |
+
pickle.dump(res, fp)
|
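Note (not part of the committed script): demo.py serializes its predictions as a plain list of dicts, so reading the file back is straightforward. A minimal sketch, assuming the script has already written ../data/predicted_ingr.pkl:

```python
import pickle

# each entry follows the structure appended to `res` above:
# {"id": <image path>, "ingredients": [<ingredient strings>]}
with open("../data/predicted_ingr.pkl", "rb") as fp:
    predictions = pickle.load(fp)

for entry in predictions[:3]:
    print(entry["id"], "->", ", ".join(entry["ingredients"]))
```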
src/model.py
ADDED
@@ -0,0 +1,236 @@
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
|
3 |
+
import torch
|
4 |
+
import torch.nn as nn
|
5 |
+
import random
|
6 |
+
import numpy as np
|
7 |
+
from src.modules.encoder import EncoderCNN, EncoderLabels
|
8 |
+
from src.modules.transformer_decoder import DecoderTransformer
|
9 |
+
from src.modules.multihead_attention import MultiheadAttention
|
10 |
+
from src.utils.metrics import softIoU, MaskedCrossEntropyCriterion
|
11 |
+
import pickle
|
12 |
+
import os
|
13 |
+
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
|
14 |
+
|
15 |
+
|
16 |
+
def label2onehot(labels, pad_value):
|
17 |
+
|
18 |
+
# input labels to one hot vector
|
19 |
+
inp_ = torch.unsqueeze(labels, 2)
|
20 |
+
one_hot = torch.FloatTensor(labels.size(0), labels.size(1), pad_value + 1).zero_().to(device)
|
21 |
+
one_hot.scatter_(2, inp_, 1)
|
22 |
+
one_hot, _ = one_hot.max(dim=1)
|
23 |
+
# remove pad position
|
24 |
+
one_hot = one_hot[:, :-1]
|
25 |
+
# eos position is always 0
|
26 |
+
one_hot[:, 0] = 0
|
27 |
+
|
28 |
+
return one_hot
|
29 |
+
|
30 |
+
|
31 |
+
def mask_from_eos(ids, eos_value, mult_before=True):
|
32 |
+
mask = torch.ones(ids.size()).to(device).byte()
|
33 |
+
mask_aux = torch.ones(ids.size(0)).to(device).byte()
|
34 |
+
|
35 |
+
# find eos in ingredient prediction
|
36 |
+
for idx in range(ids.size(1)):
|
37 |
+
# force mask to have 1s in the first position to avoid division by 0 when predictions start with eos
|
38 |
+
if idx == 0:
|
39 |
+
continue
|
40 |
+
if mult_before:
|
41 |
+
mask[:, idx] = mask[:, idx] * mask_aux
|
42 |
+
mask_aux = mask_aux * (ids[:, idx] != eos_value)
|
43 |
+
else:
|
44 |
+
mask_aux = mask_aux * (ids[:, idx] != eos_value)
|
45 |
+
mask[:, idx] = mask[:, idx] * mask_aux
|
46 |
+
return mask
|
47 |
+
|
48 |
+
|
49 |
+
def get_model(args, ingr_vocab_size, instrs_vocab_size):
|
50 |
+
|
51 |
+
# build ingredients embedding
|
52 |
+
encoder_ingrs = EncoderLabels(args.embed_size, ingr_vocab_size,
|
53 |
+
args.dropout_encoder, scale_grad=False).to(device)
|
54 |
+
# build image model
|
55 |
+
encoder_image = EncoderCNN(args.embed_size, args.dropout_encoder, args.image_model)
|
56 |
+
|
57 |
+
decoder = DecoderTransformer(args.embed_size, instrs_vocab_size,
|
58 |
+
dropout=args.dropout_decoder_r, seq_length=args.maxseqlen,
|
59 |
+
num_instrs=args.maxnuminstrs,
|
60 |
+
attention_nheads=args.n_att, num_layers=args.transf_layers,
|
61 |
+
normalize_before=True,
|
62 |
+
normalize_inputs=False,
|
63 |
+
last_ln=False,
|
64 |
+
scale_embed_grad=False)
|
65 |
+
|
66 |
+
ingr_decoder = DecoderTransformer(args.embed_size, ingr_vocab_size, dropout=args.dropout_decoder_i,
|
67 |
+
seq_length=args.maxnumlabels,
|
68 |
+
num_instrs=1, attention_nheads=args.n_att_ingrs,
|
69 |
+
pos_embeddings=False,
|
70 |
+
num_layers=args.transf_layers_ingrs,
|
71 |
+
learned=False,
|
72 |
+
normalize_before=True,
|
73 |
+
normalize_inputs=True,
|
74 |
+
last_ln=True,
|
75 |
+
scale_embed_grad=False)
|
76 |
+
# recipe loss
|
77 |
+
criterion = MaskedCrossEntropyCriterion(ignore_index=[instrs_vocab_size-1], reduce=False)
|
78 |
+
|
79 |
+
# ingredients loss
|
80 |
+
label_loss = nn.BCELoss(reduce=False)
|
81 |
+
eos_loss = nn.BCELoss(reduce=False)
|
82 |
+
|
83 |
+
model = InverseCookingModel(encoder_ingrs, decoder, ingr_decoder, encoder_image,
|
84 |
+
crit=criterion, crit_ingr=label_loss, crit_eos=eos_loss,
|
85 |
+
pad_value=ingr_vocab_size-1,
|
86 |
+
ingrs_only=args.ingrs_only, recipe_only=args.recipe_only,
|
87 |
+
label_smoothing=args.label_smoothing_ingr)
|
88 |
+
|
89 |
+
return model
|
90 |
+
|
91 |
+
|
92 |
+
class InverseCookingModel(nn.Module):
|
93 |
+
def __init__(self, ingredient_encoder, recipe_decoder, ingr_decoder, image_encoder,
|
94 |
+
crit=None, crit_ingr=None, crit_eos=None,
|
95 |
+
pad_value=0, ingrs_only=True,
|
96 |
+
recipe_only=False, label_smoothing=0.0):
|
97 |
+
|
98 |
+
super(InverseCookingModel, self).__init__()
|
99 |
+
|
100 |
+
self.ingredient_encoder = ingredient_encoder
|
101 |
+
self.recipe_decoder = recipe_decoder
|
102 |
+
self.image_encoder = image_encoder
|
103 |
+
self.ingredient_decoder = ingr_decoder
|
104 |
+
self.crit = crit
|
105 |
+
self.crit_ingr = crit_ingr
|
106 |
+
self.pad_value = pad_value
|
107 |
+
self.ingrs_only = ingrs_only
|
108 |
+
self.recipe_only = recipe_only
|
109 |
+
self.crit_eos = crit_eos
|
110 |
+
self.label_smoothing = label_smoothing
|
111 |
+
|
112 |
+
def forward(self, img_inputs, captions, target_ingrs,
|
113 |
+
sample=False, keep_cnn_gradients=False):
|
114 |
+
|
115 |
+
if sample:
|
116 |
+
return self.sample(img_inputs, greedy=True)
|
117 |
+
|
118 |
+
targets = captions[:, 1:]
|
119 |
+
targets = targets.contiguous().view(-1)
|
120 |
+
|
121 |
+
img_features = self.image_encoder(img_inputs, keep_cnn_gradients)
|
122 |
+
|
123 |
+
losses = {}
|
124 |
+
target_one_hot = label2onehot(target_ingrs, self.pad_value)
|
125 |
+
target_one_hot_smooth = label2onehot(target_ingrs, self.pad_value)
|
126 |
+
|
127 |
+
# ingredient prediction
|
128 |
+
if not self.recipe_only:
|
129 |
+
target_one_hot_smooth[target_one_hot_smooth == 1] = (1-self.label_smoothing)
|
130 |
+
target_one_hot_smooth[target_one_hot_smooth == 0] = self.label_smoothing / target_one_hot_smooth.size(-1)
|
131 |
+
|
132 |
+
# decode ingredients with transformer
|
133 |
+
# autoregressive mode for ingredient decoder
|
134 |
+
ingr_ids, ingr_logits = self.ingredient_decoder.sample(None, None, greedy=True,
|
135 |
+
temperature=1.0, img_features=img_features,
|
136 |
+
first_token_value=0, replacement=False)
|
137 |
+
|
138 |
+
ingr_logits = torch.nn.functional.softmax(ingr_logits, dim=-1)
|
139 |
+
|
140 |
+
# find idxs for eos ingredient
|
141 |
+
# eos probability is the one assigned to the first position of the softmax
|
142 |
+
eos = ingr_logits[:, :, 0]
|
143 |
+
target_eos = ((target_ingrs == 0) ^ (target_ingrs == self.pad_value))
|
144 |
+
|
145 |
+
eos_pos = (target_ingrs == 0)
|
146 |
+
eos_head = ((target_ingrs != self.pad_value) & (target_ingrs != 0))
|
147 |
+
|
148 |
+
# select transformer steps to pool from
|
149 |
+
mask_perminv = mask_from_eos(target_ingrs, eos_value=0, mult_before=False)
|
150 |
+
ingr_probs = ingr_logits * mask_perminv.float().unsqueeze(-1)
|
151 |
+
|
152 |
+
ingr_probs, _ = torch.max(ingr_probs, dim=1)
|
153 |
+
|
154 |
+
# ignore predicted ingredients after eos in ground truth
|
155 |
+
ingr_ids[mask_perminv == 0] = self.pad_value
|
156 |
+
|
157 |
+
ingr_loss = self.crit_ingr(ingr_probs, target_one_hot_smooth)
|
158 |
+
ingr_loss = torch.mean(ingr_loss, dim=-1)
|
159 |
+
|
160 |
+
losses['ingr_loss'] = ingr_loss
|
161 |
+
|
162 |
+
# cardinality penalty
|
163 |
+
losses['card_penalty'] = torch.abs((ingr_probs*target_one_hot).sum(1) - target_one_hot.sum(1)) + \
|
164 |
+
torch.abs((ingr_probs*(1-target_one_hot)).sum(1))
|
165 |
+
|
166 |
+
eos_loss = self.crit_eos(eos, target_eos.float())
|
167 |
+
|
168 |
+
mult = 1/2
|
169 |
+
# eos loss is only computed for timesteps <= t_eos and equally penalizes 0s and 1s
|
170 |
+
losses['eos_loss'] = mult*(eos_loss * eos_pos.float()).sum(1) / (eos_pos.float().sum(1) + 1e-6) + \
|
171 |
+
mult*(eos_loss * eos_head.float()).sum(1) / (eos_head.float().sum(1) + 1e-6)
|
172 |
+
# iou
|
173 |
+
pred_one_hot = label2onehot(ingr_ids, self.pad_value)
|
174 |
+
# iou sample during training is computed using the true eos position
|
175 |
+
losses['iou'] = softIoU(pred_one_hot, target_one_hot)
|
176 |
+
|
177 |
+
if self.ingrs_only:
|
178 |
+
return losses
|
179 |
+
|
180 |
+
# encode ingredients
|
181 |
+
target_ingr_feats = self.ingredient_encoder(target_ingrs)
|
182 |
+
target_ingr_mask = mask_from_eos(target_ingrs, eos_value=0, mult_before=False)
|
183 |
+
|
184 |
+
target_ingr_mask = target_ingr_mask.float().unsqueeze(1)
|
185 |
+
|
186 |
+
outputs, ids = self.recipe_decoder(target_ingr_feats, target_ingr_mask, captions, img_features)
|
187 |
+
|
188 |
+
outputs = outputs[:, :-1, :].contiguous()
|
189 |
+
outputs = outputs.view(outputs.size(0) * outputs.size(1), -1)
|
190 |
+
|
191 |
+
loss = self.crit(outputs, targets)
|
192 |
+
|
193 |
+
losses['recipe_loss'] = loss
|
194 |
+
|
195 |
+
return losses
|
196 |
+
|
197 |
+
def sample(self, img_inputs, greedy=True, temperature=1.0, beam=-1, true_ingrs=None):
|
198 |
+
|
199 |
+
outputs = dict()
|
200 |
+
|
201 |
+
img_features = self.image_encoder(img_inputs)
|
202 |
+
|
203 |
+
if not self.recipe_only:
|
204 |
+
ingr_ids, ingr_probs = self.ingredient_decoder.sample(None, None, greedy=True, temperature=temperature,
|
205 |
+
beam=-1,
|
206 |
+
img_features=img_features, first_token_value=0,
|
207 |
+
replacement=False)
|
208 |
+
|
209 |
+
# mask ingredients after finding eos
|
210 |
+
sample_mask = mask_from_eos(ingr_ids, eos_value=0, mult_before=False)
|
211 |
+
ingr_ids[sample_mask == 0] = self.pad_value
|
212 |
+
|
213 |
+
outputs['ingr_ids'] = ingr_ids
|
214 |
+
outputs['ingr_probs'] = ingr_probs.data
|
215 |
+
|
216 |
+
mask = sample_mask
|
217 |
+
input_mask = mask.float().unsqueeze(1)
|
218 |
+
input_feats = self.ingredient_encoder(ingr_ids)
|
219 |
+
|
220 |
+
if self.ingrs_only:
|
221 |
+
return outputs
|
222 |
+
|
223 |
+
# option during sampling to use the real ingredients and not the predicted ones to infer the recipe
|
224 |
+
if true_ingrs is not None:
|
225 |
+
input_mask = mask_from_eos(true_ingrs, eos_value=0, mult_before=False)
|
226 |
+
true_ingrs[input_mask == 0] = self.pad_value
|
227 |
+
input_feats = self.ingredient_encoder(true_ingrs)
|
228 |
+
input_mask = input_mask.unsqueeze(1)
|
229 |
+
|
230 |
+
ids, probs = self.recipe_decoder.sample(input_feats, input_mask, greedy, temperature, beam, img_features, 0,
|
231 |
+
last_token_value=1)
|
232 |
+
|
233 |
+
outputs['recipe_probs'] = probs.data
|
234 |
+
outputs['recipe_ids'] = ids
|
235 |
+
|
236 |
+
return outputs
|
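Note (not part of the committed file): a small sketch of how label2onehot and mask_from_eos behave on a toy batch, assuming the same conventions as above (token 0 is the eos, the last vocabulary index is the pad value) and that the repository root is on PYTHONPATH.

```python
import torch
from src.model import label2onehot, mask_from_eos

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# batch of 2 samples, 4 ingredient slots; 0 = <eos>, 4 = <pad> (pad_value)
labels = torch.tensor([[2, 3, 0, 4],
                       [1, 0, 4, 4]]).to(device)

one_hot = label2onehot(labels, pad_value=4)
# shape (2, 4): the pad column is dropped and the eos column is forced to 0
print(one_hot)

mask = mask_from_eos(labels, eos_value=0, mult_before=False)
# 1 for slots before the first <eos> (position 0 always kept), 0 afterwards
print(mask)
```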
src/model1_inf.py
ADDED
@@ -0,0 +1,43 @@
1 |
+
import torch
|
2 |
+
import torch.nn as nn
|
3 |
+
import numpy as np
|
4 |
+
import os
|
5 |
+
from src.args import get_parser
|
6 |
+
import pickle
|
7 |
+
from src.model import get_model
|
8 |
+
from torchvision import transforms
|
9 |
+
from src.utils.output_ing import prepare_output
|
10 |
+
from PIL import Image
|
11 |
+
|
12 |
+
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
|
13 |
+
map_loc = None if torch.cuda.is_available() else 'cpu'
|
14 |
+
|
15 |
+
def im2ingr(image, ingrs_vocab, model):
|
16 |
+
transf_list_batch = []
|
17 |
+
transf_list_batch.append(transforms.ToTensor())
|
18 |
+
transf_list_batch.append(transforms.Normalize((0.485, 0.456, 0.406),
|
19 |
+
(0.229, 0.224, 0.225)))
|
20 |
+
to_input_transf = transforms.Compose(transf_list_batch)
|
21 |
+
|
22 |
+
greedy = True
|
23 |
+
beam = -1
|
24 |
+
temperature = 1.0
|
25 |
+
|
26 |
+
transf_list = []
|
27 |
+
transf_list.append(transforms.Resize(256))
|
28 |
+
transf_list.append(transforms.CenterCrop(224))
|
29 |
+
transform = transforms.Compose(transf_list)
|
30 |
+
|
31 |
+
image_transf = transform(image)
|
32 |
+
image_tensor = to_input_transf(image_transf).unsqueeze(0).to(device)
|
33 |
+
|
34 |
+
with torch.no_grad():
|
35 |
+
outputs = model.sample(image_tensor, greedy=greedy,
|
36 |
+
temperature=temperature, beam=beam, true_ingrs=None)
|
37 |
+
|
38 |
+
ingr_ids = outputs['ingr_ids'].cpu().numpy()
|
39 |
+
outs = prepare_output(ingr_ids[0], ingrs_vocab)
|
40 |
+
|
41 |
+
return outs['ingrs']
|
42 |
+
|
43 |
+
|
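Note (not part of the committed file): a hypothetical usage sketch for im2ingr, assuming the vocabularies and checkpoint live under data/ (as in app.py and demo.py) and that the repository root is on PYTHONPATH.

```python
import os
import pickle

import torch
from PIL import Image

from src.args import get_parser
from src.model import get_model
from src.model1_inf import im2ingr

data_dir = 'data'
ingrs_vocab = pickle.load(open(os.path.join(data_dir, 'ingr_vocab.pkl'), 'rb'))
instr_vocab = pickle.load(open(os.path.join(data_dir, 'instr_vocab.pkl'), 'rb'))

args = get_parser()
args.maxseqlen = 15
args.ingrs_only = True

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = get_model(args, len(ingrs_vocab), len(instr_vocab))
model.load_state_dict(torch.load(os.path.join(data_dir, 'modelbest.ckpt'),
                                 map_location='cpu'))
model.to(device)
model.eval()

image = Image.open(os.path.join(data_dir, 'demo_imgs', '1.jpg')).convert('RGB')
print(im2ingr(image, ingrs_vocab, model))  # list of predicted ingredient strings
```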
src/modules/encoder.py
ADDED
@@ -0,0 +1,57 @@
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
|
3 |
+
from torchvision.models import resnet18, resnet50, resnet101, resnet152, vgg16, vgg19, inception_v3
|
4 |
+
import torch
|
5 |
+
import torch.nn as nn
|
6 |
+
import random
|
7 |
+
import numpy as np
|
8 |
+
|
9 |
+
|
10 |
+
class EncoderCNN(nn.Module):
|
11 |
+
def __init__(self, embed_size, dropout=0.5, image_model='resnet101', pretrained=True):
|
12 |
+
"""Load a pretrained CNN backbone (resnet101 by default) and replace the top fc layer."""
|
13 |
+
super(EncoderCNN, self).__init__()
|
14 |
+
resnet = globals()[image_model](pretrained=pretrained)
|
15 |
+
modules = list(resnet.children())[:-2] # delete the last fc layer.
|
16 |
+
self.resnet = nn.Sequential(*modules)
|
17 |
+
|
18 |
+
self.linear = nn.Sequential(nn.Conv2d(resnet.fc.in_features, embed_size, kernel_size=1, padding=0),
|
19 |
+
nn.Dropout2d(dropout))
|
20 |
+
|
21 |
+
def forward(self, images, keep_cnn_gradients=False):
|
22 |
+
"""Extract feature vectors from input images."""
|
23 |
+
|
24 |
+
if keep_cnn_gradients:
|
25 |
+
raw_conv_feats = self.resnet(images)
|
26 |
+
else:
|
27 |
+
with torch.no_grad():
|
28 |
+
raw_conv_feats = self.resnet(images)
|
29 |
+
features = self.linear(raw_conv_feats)
|
30 |
+
features = features.view(features.size(0), features.size(1), -1)
|
31 |
+
|
32 |
+
return features
|
33 |
+
|
34 |
+
|
35 |
+
class EncoderLabels(nn.Module):
|
36 |
+
def __init__(self, embed_size, num_classes, dropout=0.5, embed_weights=None, scale_grad=False):
|
37 |
+
|
38 |
+
super(EncoderLabels, self).__init__()
|
39 |
+
embeddinglayer = nn.Embedding(num_classes, embed_size, padding_idx=num_classes-1, scale_grad_by_freq=scale_grad)
|
40 |
+
if embed_weights is not None:
|
41 |
+
embeddinglayer.weight.data.copy_(embed_weights)
|
42 |
+
self.pad_value = num_classes - 1
|
43 |
+
self.linear = embeddinglayer
|
44 |
+
self.dropout = dropout
|
45 |
+
self.embed_size = embed_size
|
46 |
+
|
47 |
+
def forward(self, x, onehot_flag=False):
|
48 |
+
|
49 |
+
if onehot_flag:
|
50 |
+
embeddings = torch.matmul(x, self.linear.weight)
|
51 |
+
else:
|
52 |
+
embeddings = self.linear(x)
|
53 |
+
|
54 |
+
embeddings = nn.functional.dropout(embeddings, p=self.dropout, training=self.training)
|
55 |
+
embeddings = embeddings.permute(0, 2, 1).contiguous()
|
56 |
+
|
57 |
+
return embeddings
|
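Note (not part of the committed file): a shape-check sketch for EncoderCNN. With a 224x224 input the backbone leaves a 7x7 spatial grid, so the encoder returns embed_size x 49 features per image, which is what the transformer decoder attends over. resnet18 with pretrained=False is used here only to keep the example light.

```python
import torch
from src.modules.encoder import EncoderCNN

encoder = EncoderCNN(embed_size=512, dropout=0.3, image_model='resnet18', pretrained=False)
encoder.eval()

images = torch.randn(2, 3, 224, 224)  # batch of 2 RGB images
with torch.no_grad():
    feats = encoder(images)

print(feats.shape)  # torch.Size([2, 512, 49])
```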
src/modules/multihead_attention.py
ADDED
@@ -0,0 +1,203 @@
1 |
+
# Copyright (c) 2017-present, Facebook, Inc.
|
2 |
+
# All rights reserved.
|
3 |
+
#
|
4 |
+
# This source code is licensed under the license found in the LICENSE file in
|
5 |
+
# https://github.com/pytorch/fairseq. An additional grant of patent rights
|
6 |
+
# can be found in the PATENTS file in the same directory.
|
7 |
+
|
8 |
+
|
9 |
+
import torch
|
10 |
+
from torch import nn
|
11 |
+
from torch.nn import Parameter
|
12 |
+
import torch.nn.functional as F
|
13 |
+
|
14 |
+
from src.modules.utils import fill_with_neg_inf, get_incremental_state, set_incremental_state
|
15 |
+
|
16 |
+
|
17 |
+
class MultiheadAttention(nn.Module):
|
18 |
+
"""Multi-headed attention.
|
19 |
+
See "Attention Is All You Need" for more details.
|
20 |
+
"""
|
21 |
+
def __init__(self, embed_dim, num_heads, dropout=0., bias=True):
|
22 |
+
super().__init__()
|
23 |
+
self.embed_dim = embed_dim
|
24 |
+
self.num_heads = num_heads
|
25 |
+
self.dropout = dropout
|
26 |
+
self.head_dim = embed_dim // num_heads
|
27 |
+
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
|
28 |
+
self.scaling = self.head_dim**-0.5
|
29 |
+
self._mask = None
|
30 |
+
|
31 |
+
self.in_proj_weight = Parameter(torch.Tensor(3*embed_dim, embed_dim))
|
32 |
+
if bias:
|
33 |
+
self.in_proj_bias = Parameter(torch.Tensor(3*embed_dim))
|
34 |
+
else:
|
35 |
+
self.register_parameter('in_proj_bias', None)
|
36 |
+
self.out_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
|
37 |
+
|
38 |
+
self.reset_parameters()
|
39 |
+
|
40 |
+
def reset_parameters(self):
|
41 |
+
nn.init.xavier_uniform_(self.in_proj_weight)
|
42 |
+
nn.init.xavier_uniform_(self.out_proj.weight)
|
43 |
+
if self.in_proj_bias is not None:
|
44 |
+
nn.init.constant_(self.in_proj_bias, 0.)
|
45 |
+
nn.init.constant_(self.out_proj.bias, 0.)
|
46 |
+
|
47 |
+
def forward(self, query, key, value, mask_future_timesteps=False,
|
48 |
+
key_padding_mask=None, incremental_state=None,
|
49 |
+
need_weights=True, static_kv=False):
|
50 |
+
"""Input shape: Time x Batch x Channel
|
51 |
+
Self-attention can be implemented by passing in the same arguments for
|
52 |
+
query, key and value. Future timesteps can be masked with the
|
53 |
+
`mask_future_timesteps` argument. Padding elements can be excluded from
|
54 |
+
the key by passing a binary ByteTensor (`key_padding_mask`) with shape:
|
55 |
+
batch x src_len, where padding elements are indicated by 1s.
|
56 |
+
"""
|
57 |
+
|
58 |
+
qkv_same = query.data_ptr() == key.data_ptr() == value.data_ptr()
|
59 |
+
kv_same = key.data_ptr() == value.data_ptr()
|
60 |
+
|
61 |
+
tgt_len, bsz, embed_dim = query.size()
|
62 |
+
assert embed_dim == self.embed_dim
|
63 |
+
assert list(query.size()) == [tgt_len, bsz, embed_dim]
|
64 |
+
assert key.size() == value.size()
|
65 |
+
|
66 |
+
if incremental_state is not None:
|
67 |
+
saved_state = self._get_input_buffer(incremental_state)
|
68 |
+
if 'prev_key' in saved_state:
|
69 |
+
# previous time steps are cached - no need to recompute
|
70 |
+
# key and value if they are static
|
71 |
+
if static_kv:
|
72 |
+
assert kv_same and not qkv_same
|
73 |
+
key = value = None
|
74 |
+
else:
|
75 |
+
saved_state = None
|
76 |
+
|
77 |
+
if qkv_same:
|
78 |
+
# self-attention
|
79 |
+
q, k, v = self.in_proj_qkv(query)
|
80 |
+
elif kv_same:
|
81 |
+
# encoder-decoder attention
|
82 |
+
q = self.in_proj_q(query)
|
83 |
+
if key is None:
|
84 |
+
assert value is None
|
85 |
+
# create an empty tensor here so that concatenating it with
|
86 |
+
# the cached previous value below just returns that value
|
87 |
+
k = v = q.new(0)
|
88 |
+
else:
|
89 |
+
k, v = self.in_proj_kv(key)
|
90 |
+
else:
|
91 |
+
q = self.in_proj_q(query)
|
92 |
+
k = self.in_proj_k(key)
|
93 |
+
v = self.in_proj_v(value)
|
94 |
+
q *= self.scaling
|
95 |
+
|
96 |
+
if saved_state is not None:
|
97 |
+
if 'prev_key' in saved_state:
|
98 |
+
k = torch.cat((saved_state['prev_key'], k), dim=0)
|
99 |
+
if 'prev_value' in saved_state:
|
100 |
+
v = torch.cat((saved_state['prev_value'], v), dim=0)
|
101 |
+
saved_state['prev_key'] = k
|
102 |
+
saved_state['prev_value'] = v
|
103 |
+
self._set_input_buffer(incremental_state, saved_state)
|
104 |
+
|
105 |
+
src_len = k.size(0)
|
106 |
+
|
107 |
+
if key_padding_mask is not None:
|
108 |
+
assert key_padding_mask.size(0) == bsz
|
109 |
+
assert key_padding_mask.size(1) == src_len
|
110 |
+
|
111 |
+
q = q.contiguous().view(tgt_len, bsz*self.num_heads, self.head_dim).transpose(0, 1)
|
112 |
+
k = k.contiguous().view(src_len, bsz*self.num_heads, self.head_dim).transpose(0, 1)
|
113 |
+
v = v.contiguous().view(src_len, bsz*self.num_heads, self.head_dim).transpose(0, 1)
|
114 |
+
|
115 |
+
attn_weights = torch.bmm(q, k.transpose(1, 2))
|
116 |
+
assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len]
|
117 |
+
|
118 |
+
# only apply masking at training time (when incremental state is None)
|
119 |
+
if mask_future_timesteps and incremental_state is None:
|
120 |
+
assert query.size() == key.size(), \
|
121 |
+
'mask_future_timesteps only applies to self-attention'
|
122 |
+
attn_weights += self.buffered_mask(attn_weights).unsqueeze(0)
|
123 |
+
if key_padding_mask is not None:
|
124 |
+
# don't attend to padding symbols
|
125 |
+
attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
|
126 |
+
attn_weights = attn_weights.float().masked_fill(
|
127 |
+
key_padding_mask.unsqueeze(1).unsqueeze(2),
|
128 |
+
float('-inf'),
|
129 |
+
).type_as(attn_weights) # FP16 support: cast to float and back
|
130 |
+
attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)
|
131 |
+
|
132 |
+
attn_weights = F.softmax(attn_weights.float(), dim=-1).type_as(attn_weights)
|
133 |
+
attn_weights = F.dropout(attn_weights, p=self.dropout, training=self.training)
|
134 |
+
|
135 |
+
attn = torch.bmm(attn_weights, v)
|
136 |
+
assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim]
|
137 |
+
attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim)
|
138 |
+
attn = self.out_proj(attn)
|
139 |
+
|
140 |
+
# average attention weights over heads
|
141 |
+
attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
|
142 |
+
attn_weights = attn_weights.sum(dim=1) / self.num_heads
|
143 |
+
|
144 |
+
return attn, attn_weights
|
145 |
+
|
146 |
+
def in_proj_qkv(self, query):
|
147 |
+
return self._in_proj(query).chunk(3, dim=-1)
|
148 |
+
|
149 |
+
def in_proj_kv(self, key):
|
150 |
+
return self._in_proj(key, start=self.embed_dim).chunk(2, dim=-1)
|
151 |
+
|
152 |
+
def in_proj_q(self, query):
|
153 |
+
return self._in_proj(query, end=self.embed_dim)
|
154 |
+
|
155 |
+
def in_proj_k(self, key):
|
156 |
+
return self._in_proj(key, start=self.embed_dim, end=2*self.embed_dim)
|
157 |
+
|
158 |
+
def in_proj_v(self, value):
|
159 |
+
return self._in_proj(value, start=2*self.embed_dim)
|
160 |
+
|
161 |
+
def _in_proj(self, input, start=None, end=None):
|
162 |
+
weight = self.in_proj_weight
|
163 |
+
bias = self.in_proj_bias
|
164 |
+
if end is not None:
|
165 |
+
weight = weight[:end, :]
|
166 |
+
if bias is not None:
|
167 |
+
bias = bias[:end]
|
168 |
+
if start is not None:
|
169 |
+
weight = weight[start:, :]
|
170 |
+
if bias is not None:
|
171 |
+
bias = bias[start:]
|
172 |
+
return F.linear(input, weight, bias)
|
173 |
+
|
174 |
+
def buffered_mask(self, tensor):
|
175 |
+
dim = tensor.size(-1)
|
176 |
+
if self._mask is None:
|
177 |
+
self._mask = torch.triu(fill_with_neg_inf(tensor.new(dim, dim)), 1)
|
178 |
+
if self._mask.size(0) < dim:
|
179 |
+
self._mask = torch.triu(fill_with_neg_inf(self._mask.resize_(dim, dim)), 1)
|
180 |
+
return self._mask[:dim, :dim]
|
181 |
+
|
182 |
+
def reorder_incremental_state(self, incremental_state, new_order):
|
183 |
+
"""Reorder buffered internal state (for incremental generation)."""
|
184 |
+
input_buffer = self._get_input_buffer(incremental_state)
|
185 |
+
if input_buffer is not None:
|
186 |
+
for k in input_buffer.keys():
|
187 |
+
input_buffer[k] = input_buffer[k].index_select(1, new_order)
|
188 |
+
self._set_input_buffer(incremental_state, input_buffer)
|
189 |
+
|
190 |
+
def _get_input_buffer(self, incremental_state):
|
191 |
+
return get_incremental_state(
|
192 |
+
self,
|
193 |
+
incremental_state,
|
194 |
+
'attn_state',
|
195 |
+
) or {}
|
196 |
+
|
197 |
+
def _set_input_buffer(self, incremental_state, buffer):
|
198 |
+
set_incremental_state(
|
199 |
+
self,
|
200 |
+
incremental_state,
|
201 |
+
'attn_state',
|
202 |
+
buffer,
|
203 |
+
)
|
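Note (not part of the committed file): a self-attention sketch for MultiheadAttention. Inputs are Time x Batch x Channel as the docstring states; passing the same tensor as query, key and value takes the fused qkv projection path, and mask_future_timesteps=True applies the causal mask used during training.

```python
import torch
from src.modules.multihead_attention import MultiheadAttention

attn = MultiheadAttention(embed_dim=64, num_heads=8, dropout=0.1)
attn.eval()

x = torch.randn(10, 2, 64)  # 10 timesteps, batch of 2, 64 channels
out, weights = attn(query=x, key=x, value=x, mask_future_timesteps=True)

print(out.shape)      # torch.Size([10, 2, 64])
print(weights.shape)  # torch.Size([2, 10, 10]) - head-averaged attention
```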
src/modules/transformer_decoder.py
ADDED
@@ -0,0 +1,502 @@
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
|
3 |
+
# Code adapted from https://github.com/pytorch/fairseq
|
4 |
+
# Copyright (c) 2017-present, Facebook, Inc.
|
5 |
+
# All rights reserved.
|
6 |
+
#
|
7 |
+
# This source code is licensed under the license found in the LICENSE file in
|
8 |
+
# https://github.com/pytorch/fairseq. An additional grant of patent rights
|
9 |
+
# can be found in the PATENTS file in the same directory.
|
10 |
+
|
11 |
+
import math
|
12 |
+
import torch
|
13 |
+
import torch.nn as nn
|
14 |
+
import torch.nn.functional as F
|
15 |
+
from torch.nn.modules.utils import _single
|
16 |
+
import src.modules.utils as utils
|
17 |
+
from src.modules.multihead_attention import MultiheadAttention
|
18 |
+
import numpy as np
|
19 |
+
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
|
20 |
+
import copy
|
21 |
+
|
22 |
+
|
23 |
+
def make_positions(tensor, padding_idx, left_pad):
|
24 |
+
"""Replace non-padding symbols with their position numbers.
|
25 |
+
Position numbers begin at padding_idx+1.
|
26 |
+
Padding symbols are ignored, but it is necessary to specify whether padding
|
27 |
+
is added on the left side (left_pad=True) or right side (left_pad=False).
|
28 |
+
"""
|
29 |
+
|
30 |
+
# creates tensor from scratch - to avoid multigpu issues
|
31 |
+
max_pos = padding_idx + 1 + tensor.size(1)
|
32 |
+
#if not hasattr(make_positions, 'range_buf'):
|
33 |
+
range_buf = tensor.new()
|
34 |
+
#make_positions.range_buf = make_positions.range_buf.type_as(tensor)
|
35 |
+
if range_buf.numel() < max_pos:
|
36 |
+
torch.arange(padding_idx + 1, max_pos, out=range_buf)
|
37 |
+
mask = tensor.ne(padding_idx)
|
38 |
+
positions = range_buf[:tensor.size(1)].expand_as(tensor)
|
39 |
+
if left_pad:
|
40 |
+
positions = positions - mask.size(1) + mask.long().sum(dim=1).unsqueeze(1)
|
41 |
+
|
42 |
+
out = tensor.clone()
|
43 |
+
out = out.masked_scatter_(mask,positions[mask])
|
44 |
+
return out
|
45 |
+
|
46 |
+
|
47 |
+
class LearnedPositionalEmbedding(nn.Embedding):
|
48 |
+
"""This module learns positional embeddings up to a fixed maximum size.
|
49 |
+
Padding symbols are ignored, but it is necessary to specify whether padding
|
50 |
+
is added on the left side (left_pad=True) or right side (left_pad=False).
|
51 |
+
"""
|
52 |
+
|
53 |
+
def __init__(self, num_embeddings, embedding_dim, padding_idx, left_pad):
|
54 |
+
super().__init__(num_embeddings, embedding_dim, padding_idx)
|
55 |
+
self.left_pad = left_pad
|
56 |
+
nn.init.normal_(self.weight, mean=0, std=embedding_dim ** -0.5)
|
57 |
+
|
58 |
+
def forward(self, input, incremental_state=None):
|
59 |
+
"""Input is expected to be of size [bsz x seqlen]."""
|
60 |
+
if incremental_state is not None:
|
61 |
+
# positions is the same for every token when decoding a single step
|
62 |
+
|
63 |
+
positions = input.data.new(1, 1).fill_(self.padding_idx + input.size(1))
|
64 |
+
else:
|
65 |
+
|
66 |
+
positions = make_positions(input.data, self.padding_idx, self.left_pad)
|
67 |
+
return super().forward(positions)
|
68 |
+
|
69 |
+
def max_positions(self):
|
70 |
+
"""Maximum number of supported positions."""
|
71 |
+
return self.num_embeddings - self.padding_idx - 1
|
72 |
+
|
73 |
+
class SinusoidalPositionalEmbedding(nn.Module):
|
74 |
+
"""This module produces sinusoidal positional embeddings of any length.
|
75 |
+
Padding symbols are ignored, but it is necessary to specify whether padding
|
76 |
+
is added on the left side (left_pad=True) or right side (left_pad=False).
|
77 |
+
"""
|
78 |
+
|
79 |
+
def __init__(self, embedding_dim, padding_idx, left_pad, init_size=1024):
|
80 |
+
super().__init__()
|
81 |
+
self.embedding_dim = embedding_dim
|
82 |
+
self.padding_idx = padding_idx
|
83 |
+
self.left_pad = left_pad
|
84 |
+
self.weights = SinusoidalPositionalEmbedding.get_embedding(
|
85 |
+
init_size,
|
86 |
+
embedding_dim,
|
87 |
+
padding_idx,
|
88 |
+
)
|
89 |
+
self.register_buffer('_float_tensor', torch.FloatTensor())
|
90 |
+
|
91 |
+
@staticmethod
|
92 |
+
def get_embedding(num_embeddings, embedding_dim, padding_idx=None):
|
93 |
+
"""Build sinusoidal embeddings.
|
94 |
+
This matches the implementation in tensor2tensor, but differs slightly
|
95 |
+
from the description in Section 3.5 of "Attention Is All You Need".
|
96 |
+
"""
|
97 |
+
half_dim = embedding_dim // 2
|
98 |
+
emb = math.log(10000) / (half_dim - 1)
|
99 |
+
emb = torch.exp(torch.arange(half_dim, dtype=torch.float) * -emb)
|
100 |
+
emb = torch.arange(num_embeddings, dtype=torch.float).unsqueeze(1) * emb.unsqueeze(0)
|
101 |
+
emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1).view(num_embeddings, -1)
|
102 |
+
if embedding_dim % 2 == 1:
|
103 |
+
# zero pad
|
104 |
+
emb = torch.cat([emb, torch.zeros(num_embeddings, 1)], dim=1)
|
105 |
+
if padding_idx is not None:
|
106 |
+
emb[padding_idx, :] = 0
|
107 |
+
return emb
|
108 |
+
|
109 |
+
def forward(self, input, incremental_state=None):
|
110 |
+
"""Input is expected to be of size [bsz x seqlen]."""
|
111 |
+
# recompute/expand embeddings if needed
|
112 |
+
bsz, seq_len = input.size()
|
113 |
+
max_pos = self.padding_idx + 1 + seq_len
|
114 |
+
if self.weights is None or max_pos > self.weights.size(0):
|
115 |
+
self.weights = SinusoidalPositionalEmbedding.get_embedding(
|
116 |
+
max_pos,
|
117 |
+
self.embedding_dim,
|
118 |
+
self.padding_idx,
|
119 |
+
)
|
120 |
+
self.weights = self.weights.type_as(self._float_tensor)
|
121 |
+
|
122 |
+
if incremental_state is not None:
|
123 |
+
# positions is the same for every token when decoding a single step
|
124 |
+
return self.weights[self.padding_idx + seq_len, :].expand(bsz, 1, -1)
|
125 |
+
|
126 |
+
positions = make_positions(input.data, self.padding_idx, self.left_pad)
|
127 |
+
return self.weights.index_select(0, positions.view(-1)).view(bsz, seq_len, -1).detach()
|
128 |
+
|
129 |
+
def max_positions(self):
|
130 |
+
"""Maximum number of supported positions."""
|
131 |
+
return int(1e5) # an arbitrary large number
|
132 |
+
|
133 |
+
class TransformerDecoderLayer(nn.Module):
|
134 |
+
"""Decoder layer block."""
|
135 |
+
|
136 |
+
def __init__(self, embed_dim, n_att, dropout=0.5, normalize_before=True, last_ln=False):
|
137 |
+
super().__init__()
|
138 |
+
|
139 |
+
self.embed_dim = embed_dim
|
140 |
+
self.dropout = dropout
|
141 |
+
self.relu_dropout = dropout
|
142 |
+
self.normalize_before = normalize_before
|
143 |
+
num_layer_norm = 3
|
144 |
+
|
145 |
+
# self-attention on generated recipe
|
146 |
+
self.self_attn = MultiheadAttention(
|
147 |
+
self.embed_dim, n_att,
|
148 |
+
dropout=dropout,
|
149 |
+
)
|
150 |
+
|
151 |
+
self.cond_att = MultiheadAttention(
|
152 |
+
self.embed_dim, n_att,
|
153 |
+
dropout=dropout,
|
154 |
+
)
|
155 |
+
|
156 |
+
self.fc1 = Linear(self.embed_dim, self.embed_dim)
|
157 |
+
self.fc2 = Linear(self.embed_dim, self.embed_dim)
|
158 |
+
self.layer_norms = nn.ModuleList([LayerNorm(self.embed_dim) for i in range(num_layer_norm)])
|
159 |
+
self.use_last_ln = last_ln
|
160 |
+
if self.use_last_ln:
|
161 |
+
self.last_ln = LayerNorm(self.embed_dim)
|
162 |
+
|
163 |
+
def forward(self, x, ingr_features, ingr_mask, incremental_state, img_features):
|
164 |
+
|
165 |
+
# self attention
|
166 |
+
residual = x
|
167 |
+
x = self.maybe_layer_norm(0, x, before=True)
|
168 |
+
x, _ = self.self_attn(
|
169 |
+
query=x,
|
170 |
+
key=x,
|
171 |
+
value=x,
|
172 |
+
mask_future_timesteps=True,
|
173 |
+
incremental_state=incremental_state,
|
174 |
+
need_weights=False,
|
175 |
+
)
|
176 |
+
x = F.dropout(x, p=self.dropout, training=self.training)
|
177 |
+
x = residual + x
|
178 |
+
x = self.maybe_layer_norm(0, x, after=True)
|
179 |
+
|
180 |
+
residual = x
|
181 |
+
x = self.maybe_layer_norm(1, x, before=True)
|
182 |
+
|
183 |
+
# attention
|
184 |
+
if ingr_features is None:
|
185 |
+
|
186 |
+
x, _ = self.cond_att(query=x,
|
187 |
+
key=img_features,
|
188 |
+
value=img_features,
|
189 |
+
key_padding_mask=None,
|
190 |
+
incremental_state=incremental_state,
|
191 |
+
static_kv=True,
|
192 |
+
)
|
193 |
+
elif img_features is None:
|
194 |
+
x, _ = self.cond_att(query=x,
|
195 |
+
key=ingr_features,
|
196 |
+
value=ingr_features,
|
197 |
+
key_padding_mask=ingr_mask,
|
198 |
+
incremental_state=incremental_state,
|
199 |
+
static_kv=True,
|
200 |
+
)
|
201 |
+
|
202 |
+
|
203 |
+
else:
|
204 |
+
# attention on concatenation of encoder_out and encoder_aux, query self attn (x)
|
205 |
+
kv = torch.cat((img_features, ingr_features), 0)
|
206 |
+
mask = torch.cat((torch.zeros(img_features.shape[1], img_features.shape[0], dtype=torch.uint8).to(device),
|
207 |
+
ingr_mask), 1)
|
208 |
+
x, _ = self.cond_att(query=x,
|
209 |
+
key=kv,
|
210 |
+
value=kv,
|
211 |
+
key_padding_mask=mask,
|
212 |
+
incremental_state=incremental_state,
|
213 |
+
static_kv=True,
|
214 |
+
)
|
215 |
+
x = F.dropout(x, p=self.dropout, training=self.training)
|
216 |
+
x = residual + x
|
217 |
+
x = self.maybe_layer_norm(1, x, after=True)
|
218 |
+
|
219 |
+
residual = x
|
220 |
+
x = self.maybe_layer_norm(-1, x, before=True)
|
221 |
+
x = F.relu(self.fc1(x))
|
222 |
+
x = F.dropout(x, p=self.relu_dropout, training=self.training)
|
223 |
+
x = self.fc2(x)
|
224 |
+
x = F.dropout(x, p=self.dropout, training=self.training)
|
225 |
+
x = residual + x
|
226 |
+
x = self.maybe_layer_norm(-1, x, after=True)
|
227 |
+
|
228 |
+
if self.use_last_ln:
|
229 |
+
x = self.last_ln(x)
|
230 |
+
|
231 |
+
return x
|
232 |
+
|
233 |
+
def maybe_layer_norm(self, i, x, before=False, after=False):
|
234 |
+
assert before ^ after
|
235 |
+
if after ^ self.normalize_before:
|
236 |
+
return self.layer_norms[i](x)
|
237 |
+
else:
|
238 |
+
return x
|
239 |
+
|
240 |
+
class DecoderTransformer(nn.Module):
|
241 |
+
"""Transformer decoder."""
|
242 |
+
|
243 |
+
def __init__(self, embed_size, vocab_size, dropout=0.5, seq_length=20, num_instrs=15,
|
244 |
+
attention_nheads=16, pos_embeddings=True, num_layers=8, learned=True, normalize_before=True,
|
245 |
+
normalize_inputs=False, last_ln=False, scale_embed_grad=False):
|
246 |
+
super(DecoderTransformer, self).__init__()
|
247 |
+
self.dropout = dropout
|
248 |
+
self.seq_length = seq_length * num_instrs
|
249 |
+
self.embed_tokens = nn.Embedding(vocab_size, embed_size, padding_idx=vocab_size-1,
|
250 |
+
scale_grad_by_freq=scale_embed_grad)
|
251 |
+
nn.init.normal_(self.embed_tokens.weight, mean=0, std=embed_size ** -0.5)
|
252 |
+
if pos_embeddings:
|
253 |
+
self.embed_positions = PositionalEmbedding(1024, embed_size, 0, left_pad=False, learned=learned)
|
254 |
+
else:
|
255 |
+
self.embed_positions = None
|
256 |
+
self.normalize_inputs = normalize_inputs
|
257 |
+
if self.normalize_inputs:
|
258 |
+
self.layer_norms_in = nn.ModuleList([LayerNorm(embed_size) for i in range(3)])
|
259 |
+
|
260 |
+
self.embed_scale = math.sqrt(embed_size)
|
261 |
+
self.layers = nn.ModuleList([])
|
262 |
+
self.layers.extend([
|
263 |
+
TransformerDecoderLayer(embed_size, attention_nheads, dropout=dropout, normalize_before=normalize_before,
|
264 |
+
last_ln=last_ln)
|
265 |
+
for i in range(num_layers)
|
266 |
+
])
|
267 |
+
|
268 |
+
self.linear = Linear(embed_size, vocab_size-1)
|
269 |
+
|
270 |
+
def forward(self, ingr_features, ingr_mask, captions, img_features, incremental_state=None):
|
271 |
+
|
272 |
+
if ingr_features is not None:
|
273 |
+
ingr_features = ingr_features.permute(0, 2, 1)
|
274 |
+
ingr_features = ingr_features.transpose(0, 1)
|
275 |
+
if self.normalize_inputs:
|
276 |
+
self.layer_norms_in[0](ingr_features)
|
277 |
+
|
278 |
+
if img_features is not None:
|
279 |
+
img_features = img_features.permute(0, 2, 1)
|
280 |
+
img_features = img_features.transpose(0, 1)
|
281 |
+
if self.normalize_inputs:
|
282 |
+
self.layer_norms_in[1](img_features)
|
283 |
+
|
284 |
+
if ingr_mask is not None:
|
285 |
+
ingr_mask = (1-ingr_mask.squeeze(1)).byte()
|
286 |
+
|
287 |
+
# embed positions
|
288 |
+
if self.embed_positions is not None:
|
289 |
+
positions = self.embed_positions(captions, incremental_state=incremental_state)
|
290 |
+
if incremental_state is not None:
|
291 |
+
if self.embed_positions is not None:
|
292 |
+
positions = positions[:, -1:]
|
293 |
+
captions = captions[:, -1:]
|
294 |
+
|
295 |
+
# embed tokens and positions
|
296 |
+
x = self.embed_scale * self.embed_tokens(captions)
|
297 |
+
|
298 |
+
if self.embed_positions is not None:
|
299 |
+
x += positions
|
300 |
+
|
301 |
+
if self.normalize_inputs:
|
302 |
+
x = self.layer_norms_in[2](x)
|
303 |
+
|
304 |
+
x = F.dropout(x, p=self.dropout, training=self.training)
|
305 |
+
|
306 |
+
# B x T x C -> T x B x C
|
307 |
+
x = x.transpose(0, 1)
|
308 |
+
|
309 |
+
for p, layer in enumerate(self.layers):
|
310 |
+
x = layer(
|
311 |
+
x,
|
312 |
+
ingr_features,
|
313 |
+
ingr_mask,
|
314 |
+
incremental_state,
|
315 |
+
img_features
|
316 |
+
)
|
317 |
+
|
318 |
+
# T x B x C -> B x T x C
|
319 |
+
x = x.transpose(0, 1)
|
320 |
+
|
321 |
+
x = self.linear(x)
|
322 |
+
_, predicted = x.max(dim=-1)
|
323 |
+
|
324 |
+
return x, predicted
|
325 |
+
|
326 |
+
def sample(self, ingr_features, ingr_mask, greedy=True, temperature=1.0, beam=-1,
|
327 |
+
img_features=None, first_token_value=0,
|
328 |
+
replacement=True, last_token_value=0):
|
329 |
+
|
330 |
+
incremental_state = {}
|
331 |
+
|
332 |
+
# create dummy previous word
|
333 |
+
if ingr_features is not None:
|
334 |
+
fs = ingr_features.size(0)
|
335 |
+
else:
|
336 |
+
fs = img_features.size(0)
|
337 |
+
|
338 |
+
if beam != -1:
|
339 |
+
if fs == 1:
|
340 |
+
return self.sample_beam(ingr_features, ingr_mask, beam, img_features, first_token_value,
|
341 |
+
replacement, last_token_value)
|
342 |
+
else:
|
343 |
+
print ("Beam Search can only be used with batch size of 1. Running greedy or temperature sampling...")
|
344 |
+
|
345 |
+
        first_word = torch.ones(fs)*first_token_value

        first_word = first_word.to(device).long()
        sampled_ids = [first_word]
        logits = []

        for i in range(self.seq_length):
            # forward
            outputs, _ = self.forward(ingr_features, ingr_mask, torch.stack(sampled_ids, 1),
                                      img_features, incremental_state)
            outputs = outputs.squeeze(1)
            if not replacement:
                # predicted mask
                if i == 0:
                    predicted_mask = torch.zeros(outputs.shape).float().to(device)
                else:
                    # ensure no repetitions in sampling if replacement==False
                    batch_ind = [j for j in range(fs) if sampled_ids[i][j] != 0]
                    sampled_ids_new = sampled_ids[i][batch_ind]
                    predicted_mask[batch_ind, sampled_ids_new] = float('-inf')

                # mask previously selected ids
                outputs += predicted_mask

            logits.append(outputs)
            if greedy:
                outputs_prob = torch.nn.functional.softmax(outputs, dim=-1)
                _, predicted = outputs_prob.max(1)
                predicted = predicted.detach()
            else:
                k = 10
                outputs_prob = torch.div(outputs.squeeze(1), temperature)
                outputs_prob = torch.nn.functional.softmax(outputs_prob, dim=-1).data

                # top k random sampling
                prob_prev_topk, indices = torch.topk(outputs_prob, k=k, dim=1)
                predicted = torch.multinomial(prob_prev_topk, 1).view(-1)
                predicted = torch.index_select(indices, dim=1, index=predicted)[:, 0].detach()

            sampled_ids.append(predicted)

        sampled_ids = torch.stack(sampled_ids[1:], 1)
        logits = torch.stack(logits, 1)

        return sampled_ids, logits

    def sample_beam(self, ingr_features, ingr_mask, beam=3, img_features=None, first_token_value=0,
                    replacement=True, last_token_value=0):
        k = beam
        alpha = 0.0
        # create dummy previous word
        if ingr_features is not None:
            fs = ingr_features.size(0)
        else:
            fs = img_features.size(0)
        first_word = torch.ones(fs)*first_token_value

        first_word = first_word.to(device).long()

        sequences = [[[first_word], 0, {}, False, 1]]
        finished = []

        for i in range(self.seq_length):
            # forward
            all_candidates = []
            for rem in range(len(sequences)):
                incremental = sequences[rem][2]
                outputs, _ = self.forward(ingr_features, ingr_mask, torch.stack(sequences[rem][0], 1),
                                          img_features, incremental)
                outputs = outputs.squeeze(1)
                if not replacement:
                    # predicted mask
                    if i == 0:
                        predicted_mask = torch.zeros(outputs.shape).float().to(device)
                    else:
                        # ensure no repetitions in sampling if replacement==False
                        batch_ind = [j for j in range(fs) if sequences[rem][0][i][j] != 0]
                        sampled_ids_new = sequences[rem][0][i][batch_ind]
                        predicted_mask[batch_ind, sampled_ids_new] = float('-inf')

                    # mask previously selected ids
                    outputs += predicted_mask

                outputs_prob = torch.nn.functional.log_softmax(outputs, dim=-1)
                probs, indices = torch.topk(outputs_prob, beam)
                # tokens is [batch x beam] and every element is a list
                # score is [batch x beam] and every element is a scalar
                # incremental is [batch x beam] and every element is a dict

                for bid in range(beam):
                    tokens = sequences[rem][0] + [indices[:, bid]]
                    score = sequences[rem][1] + probs[:, bid].squeeze().item()
                    if indices[:, bid].item() == last_token_value:
                        finished.append([tokens, score, None, True, sequences[rem][-1] + 1])
                    else:
                        all_candidates.append([tokens, score, incremental, False, sequences[rem][-1] + 1])

            # if all the top-k scoring beams have finished, we can return them
            ordered_all = sorted(all_candidates + finished,
                                 key=lambda tup: tup[1]/(np.power(tup[-1], alpha)), reverse=True)[:k]
            if all(el[-1] == True for el in ordered_all):
                all_candidates = []

            # order all candidates by score
            ordered = sorted(all_candidates, key=lambda tup: tup[1]/(np.power(tup[-1], alpha)), reverse=True)
            # select k best
            sequences = ordered[:k]
            finished = sorted(finished, key=lambda tup: tup[1]/(np.power(tup[-1], alpha)), reverse=True)[:k]

        if len(finished) != 0:
            sampled_ids = torch.stack(finished[0][0][1:], 1)
            logits = finished[0][1]
        else:
            sampled_ids = torch.stack(sequences[0][0][1:], 1)
            logits = sequences[0][1]
        return sampled_ids, logits

    def max_positions(self):
        """Maximum output length supported by the decoder."""
        return self.embed_positions.max_positions()

    def upgrade_state_dict(self, state_dict):
        if isinstance(self.embed_positions, SinusoidalPositionalEmbedding):
            if 'decoder.embed_positions.weights' in state_dict:
                del state_dict['decoder.embed_positions.weights']
            if 'decoder.embed_positions._float_tensor' not in state_dict:
                state_dict['decoder.embed_positions._float_tensor'] = torch.FloatTensor()
        return state_dict


def Embedding(num_embeddings, embedding_dim, padding_idx):
    m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx)
    nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5)
    return m


def LayerNorm(embedding_dim):
    m = nn.LayerNorm(embedding_dim)
    return m


def Linear(in_features, out_features, bias=True):
    m = nn.Linear(in_features, out_features, bias)
    nn.init.xavier_uniform_(m.weight)
    nn.init.constant_(m.bias, 0.)
    return m


def PositionalEmbedding(num_embeddings, embedding_dim, padding_idx, left_pad, learned=False):
    if learned:
        m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx, left_pad)
        nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5)
        nn.init.constant_(m.weight[padding_idx], 0)
    else:
        m = SinusoidalPositionalEmbedding(embedding_dim, padding_idx, left_pad, num_embeddings)
    return m
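Note on the sampling loop above: when greedy is False, the decoder rescales the logits by the temperature, keeps the top k=10 candidates and draws one of them at random. Below is a standalone sketch of that step, assuming dummy values for logits, temperature and k; it uses torch.gather to map the sampled column back to a vocabulary id, which matches the index_select used above for a single sample.

import torch

torch.manual_seed(0)
logits = torch.randn(2, 10)                                 # (batch, vocab) dummy scores
temperature, k = 1.0, 3

probs = torch.nn.functional.softmax(logits / temperature, dim=-1)
topk_probs, topk_ids = torch.topk(probs, k=k, dim=1)        # keep the k most likely tokens per sample
choice = torch.multinomial(topk_probs, 1)                   # sample one of the k columns
predicted = torch.gather(topk_ids, 1, choice).squeeze(1)    # map back to vocabulary ids
print(predicted)                                            # two sampled token ids, one per batch element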
src/modules/utils.py
ADDED
@@ -0,0 +1,387 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

# Code adapted from https://github.com/pytorch/fairseq
# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the LICENSE file in
# https://github.com/pytorch/fairseq. An additional grant of patent rights
# can be found in the PATENTS file in the same directory.

from collections import defaultdict, OrderedDict
import logging
import os
import re
import torch
import traceback

from torch.serialization import default_restore_location


def torch_persistent_save(*args, **kwargs):
    for i in range(3):
        try:
            return torch.save(*args, **kwargs)
        except Exception:
            if i == 2:
                logging.error(traceback.format_exc())


def convert_state_dict_type(state_dict, ttype=torch.FloatTensor):
    if isinstance(state_dict, dict):
        cpu_dict = OrderedDict()
        for k, v in state_dict.items():
            cpu_dict[k] = convert_state_dict_type(v)
        return cpu_dict
    elif isinstance(state_dict, list):
        return [convert_state_dict_type(v) for v in state_dict]
    elif torch.is_tensor(state_dict):
        return state_dict.type(ttype)
    else:
        return state_dict


def save_state(filename, args, model, criterion, optimizer, lr_scheduler,
               num_updates, optim_history=None, extra_state=None):
    if optim_history is None:
        optim_history = []
    if extra_state is None:
        extra_state = {}
    state_dict = {
        'args': args,
        'model': convert_state_dict_type(model.state_dict()),
        'optimizer_history': optim_history + [
            {
                'criterion_name': criterion.__class__.__name__,
                'optimizer_name': optimizer.__class__.__name__,
                'lr_scheduler_state': lr_scheduler.state_dict(),
                'num_updates': num_updates,
            }
        ],
        'last_optimizer_state': convert_state_dict_type(optimizer.state_dict()),
        'extra_state': extra_state,
    }
    torch_persistent_save(state_dict, filename)


def load_model_state(filename, model):
    if not os.path.exists(filename):
        return None, [], None
    state = torch.load(filename, map_location=lambda s, l: default_restore_location(s, 'cpu'))
    state = _upgrade_state_dict(state)
    model.upgrade_state_dict(state['model'])

    # load model parameters
    try:
        model.load_state_dict(state['model'], strict=True)
    except Exception:
        raise Exception('Cannot load model parameters from checkpoint, '
                        'please ensure that the architectures match')

    return state['extra_state'], state['optimizer_history'], state['last_optimizer_state']


def _upgrade_state_dict(state):
    """Helper for upgrading old model checkpoints."""
    # add optimizer_history
    if 'optimizer_history' not in state:
        state['optimizer_history'] = [
            {
                'criterion_name': 'CrossEntropyCriterion',
                'best_loss': state['best_loss'],
            },
        ]
        state['last_optimizer_state'] = state['optimizer']
        del state['optimizer']
        del state['best_loss']
    # move extra_state into sub-dictionary
    if 'epoch' in state and 'extra_state' not in state:
        state['extra_state'] = {
            'epoch': state['epoch'],
            'batch_offset': state['batch_offset'],
            'val_loss': state['val_loss'],
        }
        del state['epoch']
        del state['batch_offset']
        del state['val_loss']
    # reduce optimizer history's memory usage (only keep the last state)
    if 'optimizer' in state['optimizer_history'][-1]:
        state['last_optimizer_state'] = state['optimizer_history'][-1]['optimizer']
        for optim_hist in state['optimizer_history']:
            del optim_hist['optimizer']
    # record the optimizer class name
    if 'optimizer_name' not in state['optimizer_history'][-1]:
        state['optimizer_history'][-1]['optimizer_name'] = 'FairseqNAG'
    # move best_loss into lr_scheduler_state
    if 'lr_scheduler_state' not in state['optimizer_history'][-1]:
        state['optimizer_history'][-1]['lr_scheduler_state'] = {
            'best': state['optimizer_history'][-1]['best_loss'],
        }
        del state['optimizer_history'][-1]['best_loss']
    # keep track of number of updates
    if 'num_updates' not in state['optimizer_history'][-1]:
        state['optimizer_history'][-1]['num_updates'] = 0
    # old model checkpoints may not have separate source/target positions
    if hasattr(state['args'], 'max_positions') and not hasattr(state['args'], 'max_source_positions'):
        state['args'].max_source_positions = state['args'].max_positions
        state['args'].max_target_positions = state['args'].max_positions
    # use stateful training data iterator
    if 'train_iterator' not in state['extra_state']:
        state['extra_state']['train_iterator'] = {
            'epoch': state['extra_state']['epoch'],
            'iterations_in_epoch': 0,
        }
    return state


def load_ensemble_for_inference(filenames, task, model_arg_overrides=None):
    """Load an ensemble of models for inference.
    model_arg_overrides allows you to pass a dictionary model_arg_overrides --
    {'arg_name': arg} -- to override model args that were used during model
    training
    """
    # load model architectures and weights
    states = []
    for filename in filenames:
        if not os.path.exists(filename):
            raise IOError('Model file not found: {}'.format(filename))
        state = torch.load(filename, map_location=lambda s, l: default_restore_location(s, 'cpu'))
        state = _upgrade_state_dict(state)
        states.append(state)
    args = states[0]['args']
    if model_arg_overrides is not None:
        args = _override_model_args(args, model_arg_overrides)

    # build ensemble
    ensemble = []
    for state in states:
        model = task.build_model(args)
        model.upgrade_state_dict(state['model'])
        model.load_state_dict(state['model'], strict=True)
        ensemble.append(model)
    return ensemble, args


def _override_model_args(args, model_arg_overrides):
    # Uses model_arg_overrides {'arg_name': arg} to override model args
    for arg_name, arg_val in model_arg_overrides.items():
        setattr(args, arg_name, arg_val)
    return args


def move_to_cuda(sample):
    if len(sample) == 0:
        return {}

    def _move_to_cuda(maybe_tensor):
        if torch.is_tensor(maybe_tensor):
            return maybe_tensor.cuda()
        elif isinstance(maybe_tensor, dict):
            return {
                key: _move_to_cuda(value)
                for key, value in maybe_tensor.items()
            }
        elif isinstance(maybe_tensor, list):
            return [_move_to_cuda(x) for x in maybe_tensor]
        else:
            return maybe_tensor

    return _move_to_cuda(sample)


INCREMENTAL_STATE_INSTANCE_ID = defaultdict(lambda: 0)


def _get_full_incremental_state_key(module_instance, key):
    module_name = module_instance.__class__.__name__

    # assign a unique ID to each module instance, so that incremental state is
    # not shared across module instances
    if not hasattr(module_instance, '_fairseq_instance_id'):
        INCREMENTAL_STATE_INSTANCE_ID[module_name] += 1
        module_instance._fairseq_instance_id = INCREMENTAL_STATE_INSTANCE_ID[module_name]

    return '{}.{}.{}'.format(module_name, module_instance._fairseq_instance_id, key)


def get_incremental_state(module, incremental_state, key):
    """Helper for getting incremental state for an nn.Module."""
    full_key = _get_full_incremental_state_key(module, key)
    if incremental_state is None or full_key not in incremental_state:
        return None
    return incremental_state[full_key]


def set_incremental_state(module, incremental_state, key, value):
    """Helper for setting incremental state for an nn.Module."""
    if incremental_state is not None:
        full_key = _get_full_incremental_state_key(module, key)
        incremental_state[full_key] = value


def load_align_dict(replace_unk):
    if replace_unk is None:
        align_dict = None
    elif isinstance(replace_unk, str):
        # Load alignment dictionary for unknown word replacement if it was passed as an argument.
        align_dict = {}
        with open(replace_unk, 'r') as f:
            for line in f:
                cols = line.split()
                align_dict[cols[0]] = cols[1]
    else:
        # No alignment dictionary provided but we still want to perform unknown word replacement by copying the
        # original source word.
        align_dict = {}
    return align_dict


def print_embed_overlap(embed_dict, vocab_dict):
    embed_keys = set(embed_dict.keys())
    vocab_keys = set(vocab_dict.symbols)
    overlap = len(embed_keys & vocab_keys)
    print("| Found {}/{} types in embedding file.".format(overlap, len(vocab_dict)))


def parse_embedding(embed_path):
    """Parse embedding text file into a dictionary of word and embedding tensors.
    The first line can have vocabulary size and dimension. The following lines
    should contain word and embedding separated by spaces.
    Example:
        2 5
        the -0.0230 -0.0264 0.0287 0.0171 0.1403
        at -0.0395 -0.1286 0.0275 0.0254 -0.0932
    """
    embed_dict = {}
    with open(embed_path) as f_embed:
        next(f_embed)  # skip header
        for line in f_embed:
            pieces = line.rstrip().split(" ")
            embed_dict[pieces[0]] = torch.Tensor([float(weight) for weight in pieces[1:]])
    return embed_dict


def load_embedding(embed_dict, vocab, embedding):
    for idx in range(len(vocab)):
        token = vocab[idx]
        if token in embed_dict:
            embedding.weight.data[idx] = embed_dict[token]
    return embedding


def replace_unk(hypo_str, src_str, alignment, align_dict, unk):
    from fairseq import tokenizer
    # Tokens are strings here
    hypo_tokens = tokenizer.tokenize_line(hypo_str)
    # TODO: Very rare cases where the replacement is '<eos>' should be handled gracefully
    src_tokens = tokenizer.tokenize_line(src_str) + ['<eos>']
    for i, ht in enumerate(hypo_tokens):
        if ht == unk:
            src_token = src_tokens[alignment[i]]
            # Either take the corresponding value in the aligned dictionary or just copy the original value.
            hypo_tokens[i] = align_dict.get(src_token, src_token)
    return ' '.join(hypo_tokens)


def post_process_prediction(hypo_tokens, src_str, alignment, align_dict, tgt_dict, remove_bpe):
    from fairseq import tokenizer
    hypo_str = tgt_dict.string(hypo_tokens, remove_bpe)
    if align_dict is not None:
        hypo_str = replace_unk(hypo_str, src_str, alignment, align_dict, tgt_dict.unk_string())
    if align_dict is not None or remove_bpe is not None:
        # Convert back to tokens for evaluating with unk replacement or without BPE
        # Note that the dictionary can be modified inside the method.
        hypo_tokens = tokenizer.Tokenizer.tokenize(hypo_str, tgt_dict, add_if_not_exist=True)
    return hypo_tokens, hypo_str, alignment


def make_positions(tensor, padding_idx, left_pad):
    """Replace non-padding symbols with their position numbers.
    Position numbers begin at padding_idx+1.
    Padding symbols are ignored, but it is necessary to specify whether padding
    is added on the left side (left_pad=True) or right side (left_pad=False).
    """
    max_pos = padding_idx + 1 + tensor.size(1)
    if not hasattr(make_positions, 'range_buf'):
        make_positions.range_buf = tensor.new()
    make_positions.range_buf = make_positions.range_buf.type_as(tensor)
    if make_positions.range_buf.numel() < max_pos:
        torch.arange(padding_idx + 1, max_pos, out=make_positions.range_buf)
    mask = tensor.ne(padding_idx)
    positions = make_positions.range_buf[:tensor.size(1)].expand_as(tensor)
    if left_pad:
        positions = positions - mask.size(1) + mask.long().sum(dim=1).unsqueeze(1)
    return tensor.clone().masked_scatter_(mask, positions[mask])


def strip_pad(tensor, pad):
    return tensor[tensor.ne(pad)]


def buffered_arange(max):
    if not hasattr(buffered_arange, 'buf'):
        buffered_arange.buf = torch.LongTensor()
    if max > buffered_arange.buf.numel():
        torch.arange(max, out=buffered_arange.buf)
    return buffered_arange.buf[:max]


def convert_padding_direction(src_tokens, padding_idx, right_to_left=False, left_to_right=False):
    assert right_to_left ^ left_to_right
    pad_mask = src_tokens.eq(padding_idx)
    if not pad_mask.any():
        # no padding, return early
        return src_tokens
    if left_to_right and not pad_mask[:, 0].any():
        # already right padded
        return src_tokens
    if right_to_left and not pad_mask[:, -1].any():
        # already left padded
        return src_tokens
    max_len = src_tokens.size(1)
    range = buffered_arange(max_len).type_as(src_tokens).expand_as(src_tokens)
    num_pads = pad_mask.long().sum(dim=1, keepdim=True)
    if right_to_left:
        index = torch.remainder(range - num_pads, max_len)
    else:
        index = torch.remainder(range + num_pads, max_len)
    return src_tokens.gather(1, index)


def item(tensor):
    if hasattr(tensor, 'item'):
        return tensor.item()
    if hasattr(tensor, '__getitem__'):
        return tensor[0]
    return tensor


def clip_grad_norm_(tensor, max_norm):
    grad_norm = item(torch.norm(tensor))
    if grad_norm > max_norm > 0:
        clip_coef = max_norm / (grad_norm + 1e-6)
        tensor.mul_(clip_coef)
    return grad_norm


def fill_with_neg_inf(t):
    """FP16-compatible function that fills a tensor with -inf."""
    return t.float().fill_(float('-inf')).type_as(t)


def checkpoint_paths(path, pattern=r'checkpoint(\d+)\.pt'):
    """Retrieves all checkpoints found in `path` directory.
    Checkpoints are identified by matching filename to the specified pattern. If
    the pattern contains groups, the result will be sorted by the first group in
    descending order.
    """
    pt_regexp = re.compile(pattern)
    files = os.listdir(path)

    entries = []
    for i, f in enumerate(files):
        m = pt_regexp.fullmatch(f)
        if m is not None:
            idx = int(m.group(1)) if len(m.groups()) > 0 else i
            entries.append((idx, m.group(0)))
    return [os.path.join(path, x[1]) for x in sorted(entries, reverse=True)]
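The incremental-state helpers above key a shared cache dictionary by module class name plus a per-instance id, so each decoder layer keeps its own slot. Below is a minimal sketch of the intended usage; the DummyLayer module and the 'prev_steps' key are made up for illustration.

import torch
import torch.nn as nn

class DummyLayer(nn.Module):
    def forward(self, x, incremental_state=None):
        prev = get_incremental_state(self, incremental_state, 'prev_steps')
        out = x if prev is None else torch.cat([prev, x], dim=1)   # append the new step to the cached ones
        set_incremental_state(self, incremental_state, 'prev_steps', out)
        return out

layer = DummyLayer()
state = {}
layer(torch.ones(1, 1, 4), state)
layer(torch.ones(1, 1, 4), state)
print(list(state.keys()))   # e.g. ['DummyLayer.1.prev_steps']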
src/read_pkl.py
ADDED
@@ -0,0 +1,7 @@
import pickle

with open("../data/predicted_ingr.pkl", "rb") as fp:  # Unpickling
    b = pickle.load(fp)

print(b)
src/sample.py
ADDED
@@ -0,0 +1,207 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import torch
import numpy as np
from args import get_parser
import pickle
import os
from torchvision import transforms
from build_vocab import Vocabulary
from model import get_model
from tqdm import tqdm
from data_loader import get_loader
import json
import sys
from model import mask_from_eos
import random
from utils.metrics import softIoU, update_error_types, compute_metrics
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
map_loc = None if torch.cuda.is_available() else 'cpu'


def compute_score(sampled_ids):

    if 1 in sampled_ids:
        cut = np.where(sampled_ids == 1)[0][0]
    else:
        cut = -1
    sampled_ids = sampled_ids[0:cut]
    score = float(len(set(sampled_ids))) / float(len(sampled_ids))

    return score


def label2onehot(labels, pad_value):

    # input labels to one hot vector
    inp_ = torch.unsqueeze(labels, 2)
    one_hot = torch.FloatTensor(labels.size(0), labels.size(1), pad_value + 1).zero_().to(device)
    one_hot.scatter_(2, inp_, 1)
    one_hot, _ = one_hot.max(dim=1)
    # remove pad and eos position
    one_hot = one_hot[:, 1:-1]
    one_hot[:, 0] = 0

    return one_hot


def main(args):

    where_to_save = os.path.join(args.save_dir, args.project_name, args.model_name)
    checkpoints_dir = os.path.join(where_to_save, 'checkpoints')
    logs_dir = os.path.join(where_to_save, 'logs')

    if not args.log_term:
        print("Eval logs will be saved to:", os.path.join(logs_dir, 'eval.log'))
        sys.stdout = open(os.path.join(logs_dir, 'eval.log'), 'w')
        sys.stderr = open(os.path.join(logs_dir, 'eval.err'), 'w')

    vars_to_replace = ['greedy', 'recipe_only', 'ingrs_only', 'temperature', 'batch_size', 'maxseqlen',
                       'get_perplexity', 'use_true_ingrs', 'eval_split', 'save_dir', 'aux_data_dir',
                       'recipe1m_dir', 'project_name', 'use_lmdb', 'beam']
    store_dict = {}
    for var in vars_to_replace:
        store_dict[var] = getattr(args, var)
    args = pickle.load(open(os.path.join(checkpoints_dir, 'args.pkl'), 'rb'))
    for var in vars_to_replace:
        setattr(args, var, store_dict[var])
    print(args)

    transforms_list = []
    transforms_list.append(transforms.Resize((args.crop_size)))
    transforms_list.append(transforms.CenterCrop(args.crop_size))
    transforms_list.append(transforms.ToTensor())
    transforms_list.append(transforms.Normalize((0.485, 0.456, 0.406),
                                                (0.229, 0.224, 0.225)))
    # Image preprocessing
    transform = transforms.Compose(transforms_list)

    # data loader
    data_dir = args.recipe1m_dir
    data_loader, dataset = get_loader(data_dir, args.aux_data_dir, args.eval_split,
                                      args.maxseqlen, args.maxnuminstrs, args.maxnumlabels,
                                      args.maxnumims, transform, args.batch_size,
                                      shuffle=False, num_workers=args.num_workers,
                                      drop_last=False, max_num_samples=-1,
                                      use_lmdb=args.use_lmdb, suff=args.suff)

    ingr_vocab_size = dataset.get_ingrs_vocab_size()
    instrs_vocab_size = dataset.get_instrs_vocab_size()

    args.numgens = 1

    # Build the model
    model = get_model(args, ingr_vocab_size, instrs_vocab_size)
    model_path = os.path.join(args.save_dir, args.project_name, args.model_name, 'checkpoints', 'modelbest.ckpt')

    # overwrite flags for inference
    model.recipe_only = args.recipe_only
    model.ingrs_only = args.ingrs_only

    # Load the trained model parameters
    model.load_state_dict(torch.load(model_path, map_location=map_loc))

    model.eval()
    model = model.to(device)
    results_dict = {'recipes': {}, 'ingrs': {}, 'ingr_iou': {}}
    captions = {}
    iou = []
    error_types = {'tp_i': 0, 'fp_i': 0, 'fn_i': 0, 'tn_i': 0, 'tp_all': 0, 'fp_all': 0, 'fn_all': 0}
    perplexity_list = []
    n_rep, th = 0, 0.3

    for i, (img_inputs, true_caps_batch, ingr_gt, imgid, impath) in tqdm(enumerate(data_loader)):

        ingr_gt = ingr_gt.to(device)
        true_caps_batch = true_caps_batch.to(device)

        true_caps_shift = true_caps_batch.clone()[:, 1:].contiguous()
        img_inputs = img_inputs.to(device)

        true_ingrs = ingr_gt if args.use_true_ingrs else None
        for gens in range(args.numgens):
            with torch.no_grad():

                if args.get_perplexity:

                    losses = model(img_inputs, true_caps_batch, ingr_gt, keep_cnn_gradients=False)
                    recipe_loss = losses['recipe_loss']
                    recipe_loss = recipe_loss.view(true_caps_shift.size())
                    non_pad_mask = true_caps_shift.ne(instrs_vocab_size - 1).float()
                    recipe_loss = torch.sum(recipe_loss*non_pad_mask, dim=-1) / torch.sum(non_pad_mask, dim=-1)
                    perplexity = torch.exp(recipe_loss)

                    perplexity = perplexity.detach().cpu().numpy().tolist()
                    perplexity_list.extend(perplexity)

                else:

                    outputs = model.sample(img_inputs, args.greedy, args.temperature, args.beam, true_ingrs)

                    if not args.recipe_only:
                        fake_ingrs = outputs['ingr_ids']
                        pred_one_hot = label2onehot(fake_ingrs, ingr_vocab_size - 1)
                        target_one_hot = label2onehot(ingr_gt, ingr_vocab_size - 1)
                        iou_item = torch.mean(softIoU(pred_one_hot, target_one_hot)).item()
                        iou.append(iou_item)

                        update_error_types(error_types, pred_one_hot, target_one_hot)

                        fake_ingrs = fake_ingrs.detach().cpu().numpy()

                        for ingr_idx, fake_ingr in enumerate(fake_ingrs):

                            iou_item = softIoU(pred_one_hot[ingr_idx].unsqueeze(0),
                                               target_one_hot[ingr_idx].unsqueeze(0)).item()
                            results_dict['ingrs'][imgid[ingr_idx]] = []
                            results_dict['ingrs'][imgid[ingr_idx]].append(fake_ingr)
                            results_dict['ingr_iou'][imgid[ingr_idx]] = iou_item

                    if not args.ingrs_only:
                        sampled_ids_batch = outputs['recipe_ids']
                        sampled_ids_batch = sampled_ids_batch.cpu().detach().numpy()

                        for j, sampled_ids in enumerate(sampled_ids_batch):
                            score = compute_score(sampled_ids)
                            if score < th:
                                n_rep += 1
                            if imgid[j] not in captions.keys():
                                results_dict['recipes'][imgid[j]] = []
                            results_dict['recipes'][imgid[j]].append(sampled_ids)
    if args.get_perplexity:
        print(len(perplexity_list))
        print(np.mean(perplexity_list))
    else:

        if not args.recipe_only:
            ret_metrics = {'accuracy': [], 'f1': [], 'jaccard': [], 'f1_ingredients': []}
            compute_metrics(ret_metrics, error_types, ['accuracy', 'f1', 'jaccard', 'f1_ingredients'],
                            eps=1e-10,
                            weights=None)

            for k, v in ret_metrics.items():
                print(k, np.mean(v))

        if args.greedy:
            suff = 'greedy'
        else:
            if args.beam != -1:
                suff = 'beam_'+str(args.beam)
            else:
                suff = 'temp_' + str(args.temperature)

        results_file = os.path.join(args.save_dir, args.project_name, args.model_name, 'checkpoints',
                                    args.eval_split + '_' + suff + '_gencaps.pkl')
        print(results_file)
        pickle.dump(results_dict, open(results_file, 'wb'))

    print("Number of samples with excessive repetitions:", n_rep)


if __name__ == '__main__':
    args = get_parser()
    torch.manual_seed(1234)
    torch.cuda.manual_seed(1234)
    random.seed(1234)
    np.random.seed(1234)
    main(args)
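compute_score in sample.py above is a quick repetition check: it cuts the generated id sequence at the first end-of-recipe token (id 1) and returns the fraction of unique tokens before it, and main counts a sample as overly repetitive when that fraction drops below th = 0.3. A small worked example with made-up ids:

import numpy as np

sampled_ids = np.array([5, 7, 7, 7, 9, 1, 0, 0])   # id 1 marks the end of the sequence
# tokens before the end token: [5, 7, 7, 7, 9] -> 3 unique out of 5
print(compute_score(sampled_ids))                   # 0.6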
src/sim_ingr.py
ADDED
@@ -0,0 +1,197 @@
import nltk
import pickle
import argparse
from collections import Counter
import json
import os
from tqdm import *
import numpy as np
import re


def get_ingredient(det_ingr, replace_dict):
    det_ingr_undrs = det_ingr['text'].lower()
    det_ingr_undrs = ''.join(i for i in det_ingr_undrs if not i.isdigit())

    for rep, char_list in replace_dict.items():
        for c_ in char_list:
            if c_ in det_ingr_undrs:
                det_ingr_undrs = det_ingr_undrs.replace(c_, rep)
    det_ingr_undrs = det_ingr_undrs.strip()
    det_ingr_undrs = det_ingr_undrs.replace(' ', '_')

    return det_ingr_undrs


def remove_plurals(counter_ingrs, ingr_clusters):
    del_ingrs = []

    for k, v in counter_ingrs.items():

        if len(k) == 0:
            del_ingrs.append(k)
            continue

        gotit = 0
        if k[-2:] == 'es':
            if k[:-2] in counter_ingrs.keys():
                counter_ingrs[k[:-2]] += v
                ingr_clusters[k[:-2]].extend(ingr_clusters[k])
                del_ingrs.append(k)
                gotit = 1

        if k[-1] == 's' and gotit == 0:
            if k[:-1] in counter_ingrs.keys():
                counter_ingrs[k[:-1]] += v
                ingr_clusters[k[:-1]].extend(ingr_clusters[k])
                del_ingrs.append(k)
    for item in del_ingrs:
        del counter_ingrs[item]
        del ingr_clusters[item]
    return counter_ingrs, ingr_clusters


def cluster_ingredients(counter_ingrs):
    mydict = dict()
    mydict_ingrs = dict()

    for k, v in counter_ingrs.items():

        w1 = k.split('_')[-1]
        w2 = k.split('_')[0]
        lw = [w1, w2]
        if len(k.split('_')) > 1:
            w3 = k.split('_')[0] + '_' + k.split('_')[1]
            w4 = k.split('_')[-2] + '_' + k.split('_')[-1]

            lw = [w1, w2, w4, w3]

        gotit = 0
        for w in lw:
            if w in counter_ingrs.keys():
                # check if its parts are
                parts = w.split('_')
                if len(parts) > 0:
                    if parts[0] in counter_ingrs.keys():
                        w = parts[0]
                    elif parts[1] in counter_ingrs.keys():
                        w = parts[1]
                if w in mydict.keys():
                    mydict[w] += v
                    mydict_ingrs[w].append(k)
                else:
                    mydict[w] = v
                    mydict_ingrs[w] = [k]
                gotit = 1
                break
        if gotit == 0:
            mydict[k] = v
            mydict_ingrs[k] = [k]

    return mydict, mydict_ingrs


def update_counter(list_, counter_toks, istrain=False):
    for sentence in list_:
        tokens = nltk.tokenize.word_tokenize(sentence)
        if istrain:
            counter_toks.update(tokens)


def build_vocab_recipe1m(args):
    print("Loading data...")
    dets = json.load(open(os.path.join(args.recipe1m_path, 'det_ingrs.json'), 'r'))

    replace_dict_ingrs = {'and': ['&', "'n"], '': ['%', ',', '.', '#', '[', ']', '!', '?']}
    replace_dict_instrs = {'and': ['&', "'n"], '': ['#', '[', ']']}

    idx2ind = {}
    for i, entry in enumerate(dets):
        idx2ind[entry['id']] = i

    ingrs_file = args.save_path + 'allingrs_count.pkl'
    instrs_file = args.save_path + 'allwords_count.pkl'

    # manually add missing entries for better clustering
    base_words = ['peppers', 'tomato', 'spinach_leaves', 'turkey_breast', 'lettuce_leaf',
                  'chicken_thighs', 'milk_powder', 'bread_crumbs', 'onion_flakes',
                  'red_pepper', 'pepper_flakes', 'juice_concentrate', 'cracker_crumbs', 'hot_chili',
                  'seasoning_mix', 'dill_weed', 'pepper_sauce', 'sprouts', 'cooking_spray', 'cheese_blend',
                  'basil_leaves', 'pineapple_chunks', 'marshmallow', 'chile_powder',
                  'cheese_blend', 'corn_kernels', 'tomato_sauce', 'chickens', 'cracker_crust',
                  'lemonade_concentrate', 'red_chili', 'mushroom_caps', 'mushroom_cap', 'breaded_chicken',
                  'frozen_pineapple', 'pineapple_chunks', 'seasoning_mix', 'seaweed', 'onion_flakes',
                  'bouillon_granules', 'lettuce_leaf', 'stuffing_mix', 'parsley_flakes', 'chicken_breast',
                  'basil_leaves', 'baguettes', 'green_tea', 'peanut_butter', 'green_onion', 'fresh_cilantro',
                  'breaded_chicken', 'hot_pepper', 'dried_lavender', 'white_chocolate',
                  'dill_weed', 'cake_mix', 'cheese_spread', 'turkey_breast', 'chucken_thighs', 'basil_leaves',
                  'mandarin_orange', 'laurel', 'cabbage_head', 'pistachio', 'cheese_dip',
                  'thyme_leave', 'boneless_pork', 'red_pepper', 'onion_dip', 'skinless_chicken', 'dark_chocolate',
                  'canned_corn', 'muffin', 'cracker_crust', 'bread_crumbs', 'frozen_broccoli',
                  'philadelphia', 'cracker_crust', 'chicken_breast']

    for base_word in base_words:

        if base_word not in counter_ingrs.keys():
            counter_ingrs[base_word] = 1

    counter_ingrs, cluster_ingrs = cluster_ingredients(counter_ingrs)
    counter_ingrs, cluster_ingrs = remove_plurals(counter_ingrs, cluster_ingrs)

    # If the word frequency is less than 'threshold', then the word is discarded.
    words = [word for word, cnt in counter_toks.items() if cnt >= args.threshold_words]
    ingrs = {word: cnt for word, cnt in counter_ingrs.items() if cnt >= args.threshold_ingrs}


def main(args):

    vocab_ingrs, vocab_toks, dataset = build_vocab_recipe1m(args)

    with open(os.path.join(args.save_path, args.suff+'recipe1m_vocab_ingrs.pkl'), 'wb') as f:
        pickle.dump(vocab_ingrs, f)
    with open(os.path.join(args.save_path, args.suff+'recipe1m_vocab_toks.pkl'), 'wb') as f:
        pickle.dump(vocab_toks, f)

    for split in dataset.keys():
        with open(os.path.join(args.save_path, args.suff+'recipe1m_' + split + '.pkl'), 'wb') as f:
            pickle.dump(dataset[split], f)


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--recipe1m_path', type=str,
                        default='path/to/recipe1m',
                        help='recipe1m path')

    parser.add_argument('--save_path', type=str, default='../data/',
                        help='path for saving vocabulary wrapper')

    parser.add_argument('--suff', type=str, default='')

    parser.add_argument('--threshold_ingrs', type=int, default=10,
                        help='minimum ingr count threshold')

    parser.add_argument('--threshold_words', type=int, default=10,
                        help='minimum word count threshold')

    parser.add_argument('--maxnuminstrs', type=int, default=20,
                        help='max number of instructions (sentences)')

    parser.add_argument('--maxnumingrs', type=int, default=20,
                        help='max number of ingredients')

    parser.add_argument('--minnuminstrs', type=int, default=2,
                        help='min number of instructions (sentences)')

    parser.add_argument('--minnumingrs', type=int, default=2,
                        help='min number of ingredients')

    parser.add_argument('--minnumwords', type=int, default=20,
                        help='minimum number of characters in recipe')

    parser.add_argument('--forcegen', dest='forcegen', action='store_true')
    parser.set_defaults(forcegen=False)

    args = parser.parse_args()
    main(args)
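get_ingredient above normalizes one detected ingredient string: lower-case, drop digits, apply the character replacement table, then join the remaining words with underscores. A small example using the same replacement table as build_vocab_recipe1m (the input dict mimics a det_ingrs.json entry and is made up):

replace_dict_ingrs = {'and': ['&', "'n"], '': ['%', ',', '.', '#', '[', ']', '!', '?']}
print(get_ingredient({'text': '2 Red Bell Peppers, diced'}, replace_dict_ingrs))
# -> 'red_bell_peppers_diced'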
src/train.py
ADDED
@@ -0,0 +1,398 @@
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
|
3 |
+
from args import get_parser
|
4 |
+
import torch
|
5 |
+
import torch.nn as nn
|
6 |
+
import torch.autograd as autograd
|
7 |
+
import numpy as np
|
8 |
+
import os
|
9 |
+
import random
|
10 |
+
import pickle
|
11 |
+
from data_loader import get_loader
|
12 |
+
from build_vocab import Vocabulary
|
13 |
+
from model import get_model
|
14 |
+
from torchvision import transforms
|
15 |
+
import sys
|
16 |
+
import json
|
17 |
+
import time
|
18 |
+
import torch.backends.cudnn as cudnn
|
19 |
+
from utils.tb_visualizer import Visualizer
|
20 |
+
from model import mask_from_eos, label2onehot
|
21 |
+
from utils.metrics import softIoU, compute_metrics, update_error_types
|
22 |
+
import random
|
23 |
+
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
|
24 |
+
map_loc = None if torch.cuda.is_available() else 'cpu'
|
25 |
+
|
26 |
+
|
27 |
+
def merge_models(args, model, ingr_vocab_size, instrs_vocab_size):
|
28 |
+
load_args = pickle.load(open(os.path.join(args.save_dir, args.project_name,
|
29 |
+
args.transfer_from, 'checkpoints/args.pkl'), 'rb'))
|
30 |
+
|
31 |
+
model_ingrs = get_model(load_args, ingr_vocab_size, instrs_vocab_size)
|
32 |
+
model_path = os.path.join(args.save_dir, args.project_name, args.transfer_from, 'checkpoints', 'modelbest.ckpt')
|
33 |
+
|
34 |
+
# Load the trained model parameters
|
35 |
+
model_ingrs.load_state_dict(torch.load(model_path, map_location=map_loc))
|
36 |
+
model.ingredient_decoder = model_ingrs.ingredient_decoder
|
37 |
+
args.transf_layers_ingrs = load_args.transf_layers_ingrs
|
38 |
+
args.n_att_ingrs = load_args.n_att_ingrs
|
39 |
+
|
40 |
+
return args, model
|
41 |
+
|
42 |
+
|
43 |
+
def save_model(model, optimizer, checkpoints_dir, suff=''):
|
44 |
+
if torch.cuda.device_count() > 1:
|
45 |
+
torch.save(model.module.state_dict(), os.path.join(
|
46 |
+
checkpoints_dir, 'model' + suff + '.ckpt'))
|
47 |
+
|
48 |
+
else:
|
49 |
+
torch.save(model.state_dict(), os.path.join(
|
50 |
+
checkpoints_dir, 'model' + suff + '.ckpt'))
|
51 |
+
|
52 |
+
torch.save(optimizer.state_dict(), os.path.join(
|
53 |
+
checkpoints_dir, 'optim' + suff + '.ckpt'))
|
54 |
+
|
55 |
+
|
56 |
+
def count_parameters(model):
|
57 |
+
return sum(p.numel() for p in model.parameters() if p.requires_grad)
|
58 |
+
|
59 |
+
|
60 |
+
def set_lr(optimizer, decay_factor):
|
61 |
+
for group in optimizer.param_groups:
|
62 |
+
group['lr'] = group['lr']*decay_factor
|
63 |
+
|
64 |
+
|
65 |
+
def make_dir(d):
|
66 |
+
if not os.path.exists(d):
|
67 |
+
os.makedirs(d)
|
68 |
+
|
69 |
+
|
70 |
+
def main(args):
|
71 |
+
|
72 |
+
# Create model directory & other aux folders for logging
|
73 |
+
where_to_save = os.path.join(args.save_dir, args.project_name, args.model_name)
|
74 |
+
checkpoints_dir = os.path.join(where_to_save, 'checkpoints')
|
75 |
+
logs_dir = os.path.join(where_to_save, 'logs')
|
76 |
+
tb_logs = os.path.join(args.save_dir, args.project_name, 'tb_logs', args.model_name)
|
77 |
+
make_dir(where_to_save)
|
78 |
+
make_dir(logs_dir)
|
79 |
+
make_dir(checkpoints_dir)
|
80 |
+
make_dir(tb_logs)
|
81 |
+
if args.tensorboard:
|
82 |
+
logger = Visualizer(tb_logs, name='visual_results')
|
83 |
+
|
84 |
+
# check if we want to resume from last checkpoint of current model
|
85 |
+
if args.resume:
|
86 |
+
args = pickle.load(open(os.path.join(checkpoints_dir, 'args.pkl'), 'rb'))
|
87 |
+
args.resume = True
|
88 |
+
|
89 |
+
# logs to disk
|
90 |
+
if not args.log_term:
|
91 |
+
print ("Training logs will be saved to:", os.path.join(logs_dir, 'train.log'))
|
92 |
+
sys.stdout = open(os.path.join(logs_dir, 'train.log'), 'w')
|
93 |
+
sys.stderr = open(os.path.join(logs_dir, 'train.err'), 'w')
|
94 |
+
|
95 |
+
print(args)
|
96 |
+
pickle.dump(args, open(os.path.join(checkpoints_dir, 'args.pkl'), 'wb'))
|
97 |
+
|
98 |
+
# patience init
|
99 |
+
curr_pat = 0
|
100 |
+
|
101 |
+
# Build data loader
|
102 |
+
data_loaders = {}
|
103 |
+
datasets = {}
|
104 |
+
|
105 |
+
data_dir = args.recipe1m_dir
|
106 |
+
for split in ['train', 'val']:
|
107 |
+
|
108 |
+
transforms_list = [transforms.Resize((args.image_size))]
|
109 |
+
|
110 |
+
if split == 'train':
|
111 |
+
# Image preprocessing, normalization for the pretrained resnet
|
112 |
+
transforms_list.append(transforms.RandomHorizontalFlip())
|
113 |
+
transforms_list.append(transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)))
|
114 |
+
transforms_list.append(transforms.RandomCrop(args.crop_size))
|
115 |
+
|
116 |
+
else:
|
117 |
+
transforms_list.append(transforms.CenterCrop(args.crop_size))
|
118 |
+
transforms_list.append(transforms.ToTensor())
|
119 |
+
transforms_list.append(transforms.Normalize((0.485, 0.456, 0.406),
|
120 |
+
(0.229, 0.224, 0.225)))
|
121 |
+
|
122 |
+
transform = transforms.Compose(transforms_list)
|
123 |
+
max_num_samples = max(args.max_eval, args.batch_size) if split == 'val' else -1
|
124 |
+
data_loaders[split], datasets[split] = get_loader(data_dir, args.aux_data_dir, split,
|
125 |
+
args.maxseqlen,
|
126 |
+
args.maxnuminstrs,
|
127 |
+
args.maxnumlabels,
|
128 |
+
args.maxnumims,
|
129 |
+
transform, args.batch_size,
|
130 |
+
shuffle=split == 'train', num_workers=args.num_workers,
|
131 |
+
drop_last=True,
|
132 |
+
max_num_samples=max_num_samples,
|
133 |
+
use_lmdb=args.use_lmdb,
|
134 |
+
suff=args.suff)
|
135 |
+
|
136 |
+
ingr_vocab_size = datasets[split].get_ingrs_vocab_size()
|
137 |
+
instrs_vocab_size = datasets[split].get_instrs_vocab_size()
|
138 |
+
|
139 |
+
# Build the model
|
140 |
+
model = get_model(args, ingr_vocab_size, instrs_vocab_size)
|
141 |
+
keep_cnn_gradients = False
|
142 |
+
|
143 |
+
decay_factor = 1.0
|
144 |
+
|
145 |
+
# add model parameters
|
146 |
+
if args.ingrs_only:
|
147 |
+
params = list(model.ingredient_decoder.parameters())
|
148 |
+
elif args.recipe_only:
|
149 |
+
params = list(model.recipe_decoder.parameters()) + list(model.ingredient_encoder.parameters())
|
150 |
+
else:
|
151 |
+
params = list(model.recipe_decoder.parameters()) + list(model.ingredient_decoder.parameters()) \
|
152 |
+
+ list(model.ingredient_encoder.parameters())
|
153 |
+
|
154 |
+
# only train the linear layer in the encoder if we are not transfering from another model
|
155 |
+
if args.transfer_from == '':
|
156 |
+
params += list(model.image_encoder.linear.parameters())
|
157 |
+
params_cnn = list(model.image_encoder.resnet.parameters())
|
158 |
+
|
159 |
+
print ("CNN params:", sum(p.numel() for p in params_cnn if p.requires_grad))
|
160 |
+
print ("decoder params:", sum(p.numel() for p in params if p.requires_grad))
|
161 |
+
# start optimizing cnn from the beginning
|
162 |
+
if params_cnn is not None and args.finetune_after == 0:
|
163 |
+
optimizer = torch.optim.Adam([{'params': params}, {'params': params_cnn,
|
164 |
+
'lr': args.learning_rate*args.scale_learning_rate_cnn}],
|
165 |
+
lr=args.learning_rate, weight_decay=args.weight_decay)
|
166 |
+
keep_cnn_gradients = True
|
167 |
+
print ("Fine tuning resnet")
|
168 |
+
else:
|
169 |
+
optimizer = torch.optim.Adam(params, lr=args.learning_rate)
|
170 |
+
|
171 |
+
if args.resume:
|
172 |
+
model_path = os.path.join(args.save_dir, args.project_name, args.model_name, 'checkpoints', 'model.ckpt')
|
173 |
+
optim_path = os.path.join(args.save_dir, args.project_name, args.model_name, 'checkpoints', 'optim.ckpt')
|
174 |
+
optimizer.load_state_dict(torch.load(optim_path, map_location=map_loc))
|
175 |
+
for state in optimizer.state.values():
|
176 |
+
for k, v in state.items():
|
177 |
+
if isinstance(v, torch.Tensor):
|
178 |
+
state[k] = v.to(device)
|
179 |
+
model.load_state_dict(torch.load(model_path, map_location=map_loc))
|
180 |
+
|
181 |
+
if args.transfer_from != '':
|
182 |
+
# loads CNN encoder from transfer_from model
|
183 |
+
model_path = os.path.join(args.save_dir, args.project_name, args.transfer_from, 'checkpoints', 'modelbest.ckpt')
|
184 |
+
pretrained_dict = torch.load(model_path, map_location=map_loc)
|
185 |
+
pretrained_dict = {k: v for k, v in pretrained_dict.items() if 'encoder' in k}
|
186 |
+
model.load_state_dict(pretrained_dict, strict=False)
|
187 |
+
args, model = merge_models(args, model, ingr_vocab_size, instrs_vocab_size)
|
188 |
+
|
189 |
+
if device != 'cpu' and torch.cuda.device_count() > 1:
|
190 |
+
model = nn.DataParallel(model)
|
191 |
+
|
192 |
+
model = model.to(device)
|
193 |
+
cudnn.benchmark = True
|
194 |
+
|
195 |
+
if not hasattr(args, 'current_epoch'):
|
196 |
+
args.current_epoch = 0
|
197 |
+
|
198 |
+
es_best = 10000 if args.es_metric == 'loss' else 0
|
199 |
+
# Train the model
|
200 |
+
start = args.current_epoch
|
201 |
+
for epoch in range(start, args.num_epochs):
|
202 |
+
|
203 |
+
# save current epoch for resuming
|
204 |
+
if args.tensorboard:
|
205 |
+
logger.reset()
|
206 |
+
|
207 |
+
args.current_epoch = epoch
|
208 |
+
# increase / decrase values for moving params
|
209 |
+
if args.decay_lr:
|
210 |
+
frac = epoch // args.lr_decay_every
|
211 |
+
decay_factor = args.lr_decay_rate ** frac
|
212 |
+
new_lr = args.learning_rate*decay_factor
|
213 |
+
print ('Epoch %d. lr: %.5f'%(epoch, new_lr))
|
214 |
+
set_lr(optimizer, decay_factor)
|
215 |
+
|
216 |
+
if args.finetune_after != -1 and args.finetune_after < epoch \
|
217 |
+
and not keep_cnn_gradients and params_cnn is not None:
|
218 |
+
|
219 |
+
print("Starting to fine tune CNN")
|
220 |
+
# start with learning rates as they were (if decayed during training)
|
221 |
+
optimizer = torch.optim.Adam([{'params': params},
|
222 |
+
{'params': params_cnn,
|
223 |
+
'lr': decay_factor*args.learning_rate*args.scale_learning_rate_cnn}],
|
224 |
+
lr=decay_factor*args.learning_rate)
|
225 |
+
keep_cnn_gradients = True
|
226 |
+
|
227 |
+
for split in ['train', 'val']:
|
228 |
+
|
229 |
+
if split == 'train':
|
230 |
+
model.train()
|
231 |
+
else:
|
232 |
+
model.eval()
|
233 |
+
total_step = len(data_loaders[split])
|
234 |
+
loader = iter(data_loaders[split])
|
235 |
+
|
236 |
+
total_loss_dict = {'recipe_loss': [], 'ingr_loss': [],
|
237 |
+
'eos_loss': [], 'loss': [],
|
238 |
+
'iou': [], 'perplexity': [], 'iou_sample': [],
|
239 |
+
'f1': [],
|
240 |
+
'card_penalty': []}
|
241 |
+
|
242 |
+
error_types = {'tp_i': 0, 'fp_i': 0, 'fn_i': 0, 'tn_i': 0,
|
243 |
+
'tp_all': 0, 'fp_all': 0, 'fn_all': 0}
|
244 |
+
|
245 |
+
torch.cuda.synchronize()
|
246 |
+
start = time.time()
|
247 |
+
|
248 |
+
for i in range(total_step):
|
249 |
+
|
250 |
+
img_inputs, captions, ingr_gt, img_ids, paths = loader.next()
|
251 |
+
|
252 |
+
ingr_gt = ingr_gt.to(device)
|
253 |
+
img_inputs = img_inputs.to(device)
|
254 |
+
captions = captions.to(device)
|
255 |
+
true_caps_batch = captions.clone()[:, 1:].contiguous()
|
256 |
+
loss_dict = {}
|
257 |
+
|
258 |
+
if split == 'val':
|
259 |
+
with torch.no_grad():
|
260 |
+
losses = model(img_inputs, captions, ingr_gt)
|
261 |
+
|
262 |
+
if not args.recipe_only:
|
263 |
+
outputs = model(img_inputs, captions, ingr_gt, sample=True)
|
264 |
+
|
265 |
+
ingr_ids_greedy = outputs['ingr_ids']
|
266 |
+
|
267 |
+
mask = mask_from_eos(ingr_ids_greedy, eos_value=0, mult_before=False)
|
268 |
+
ingr_ids_greedy[mask == 0] = ingr_vocab_size-1
|
269 |
+
pred_one_hot = label2onehot(ingr_ids_greedy, ingr_vocab_size-1)
|
270 |
+
target_one_hot = label2onehot(ingr_gt, ingr_vocab_size-1)
|
271 |
+
iou_sample = softIoU(pred_one_hot, target_one_hot)
|
272 |
+
iou_sample = iou_sample.sum() / (torch.nonzero(iou_sample.data).size(0) + 1e-6)
|
273 |
+
loss_dict['iou_sample'] = iou_sample.item()
|
274 |
+
|
275 |
+
update_error_types(error_types, pred_one_hot, target_one_hot)
|
276 |
+
|
277 |
+
del outputs, pred_one_hot, target_one_hot, iou_sample
|
278 |
+
|
279 |
+
else:
|
280 |
+
losses = model(img_inputs, captions, ingr_gt,
|
281 |
+
keep_cnn_gradients=keep_cnn_gradients)
|
282 |
+
|
283 |
+
if not args.ingrs_only:
|
284 |
+
recipe_loss = losses['recipe_loss']
|
285 |
+
|
286 |
+
recipe_loss = recipe_loss.view(true_caps_batch.size())
|
287 |
+
non_pad_mask = true_caps_batch.ne(instrs_vocab_size - 1).float()
|
288 |
+
|
289 |
+
recipe_loss = torch.sum(recipe_loss*non_pad_mask, dim=-1) / torch.sum(non_pad_mask, dim=-1)
|
290 |
+
perplexity = torch.exp(recipe_loss)
|
291 |
+
|
292 |
+
recipe_loss = recipe_loss.mean()
|
293 |
+
perplexity = perplexity.mean()
|
294 |
+
|
295 |
+
loss_dict['recipe_loss'] = recipe_loss.item()
|
                    loss_dict['perplexity'] = perplexity.item()
                else:
                    recipe_loss = 0

                if not args.recipe_only:

                    ingr_loss = losses['ingr_loss']
                    ingr_loss = ingr_loss.mean()
                    loss_dict['ingr_loss'] = ingr_loss.item()

                    eos_loss = losses['eos_loss']
                    eos_loss = eos_loss.mean()
                    loss_dict['eos_loss'] = eos_loss.item()

                    iou_seq = losses['iou']
                    iou_seq = iou_seq.mean()
                    loss_dict['iou'] = iou_seq.item()

                    card_penalty = losses['card_penalty'].mean()
                    loss_dict['card_penalty'] = card_penalty.item()
                else:
                    ingr_loss, eos_loss, card_penalty = 0, 0, 0

                loss = args.loss_weight[0] * recipe_loss + args.loss_weight[1] * ingr_loss \
                       + args.loss_weight[2]*eos_loss + args.loss_weight[3]*card_penalty

                loss_dict['loss'] = loss.item()

                for key in loss_dict.keys():
                    total_loss_dict[key].append(loss_dict[key])

                if split == 'train':
                    model.zero_grad()
                    loss.backward()
                    optimizer.step()

                # Print log info
                if args.log_step != -1 and i % args.log_step == 0:
                    elapsed_time = time.time()-start
                    lossesstr = ""
                    for k in total_loss_dict.keys():
                        if len(total_loss_dict[k]) == 0:
                            continue
                        this_one = "%s: %.4f" % (k, np.mean(total_loss_dict[k][-args.log_step:]))
                        lossesstr += this_one + ', '
                    # this only displays nll loss on captions, the rest of losses will be in tensorboard logs
                    strtoprint = 'Split: %s, Epoch [%d/%d], Step [%d/%d], Losses: %sTime: %.4f' % (split, epoch,
                                                                                                   args.num_epochs, i,
                                                                                                   total_step,
                                                                                                   lossesstr,
                                                                                                   elapsed_time)
                    print(strtoprint)

                    if args.tensorboard:
                        # logger.histo_summary(model=model, step=total_step * epoch + i)
                        logger.scalar_summary(mode=split+'_iter', epoch=total_step*epoch+i,
                                              **{k: np.mean(v[-args.log_step:]) for k, v in total_loss_dict.items() if v})

                torch.cuda.synchronize()
                start = time.time()
                del loss, losses, captions, img_inputs

            if split == 'val' and not args.recipe_only:
                ret_metrics = {'accuracy': [], 'f1': [], 'jaccard': [], 'f1_ingredients': [], 'dice': []}
                compute_metrics(ret_metrics, error_types,
                                ['accuracy', 'f1', 'jaccard', 'f1_ingredients', 'dice'], eps=1e-10,
                                weights=None)

                total_loss_dict['f1'] = ret_metrics['f1']
            if args.tensorboard:
                # 1. Log scalar values (scalar summary)
                logger.scalar_summary(mode=split,
                                      epoch=epoch,
                                      **{k: np.mean(v) for k, v in total_loss_dict.items() if v})

        # Save the model's best checkpoint if performance was improved
        es_value = np.mean(total_loss_dict[args.es_metric])

        # save current model as well
        save_model(model, optimizer, checkpoints_dir, suff='')
        if (args.es_metric == 'loss' and es_value < es_best) or (args.es_metric == 'iou_sample' and es_value > es_best):
            es_best = es_value
            save_model(model, optimizer, checkpoints_dir, suff='best')
            pickle.dump(args, open(os.path.join(checkpoints_dir, 'args.pkl'), 'wb'))
            curr_pat = 0
            print('Saved checkpoint.')
        else:
            curr_pat += 1

        if curr_pat > args.patience:
            break

    if args.tensorboard:
        logger.close()


if __name__ == '__main__':
    args = get_parser()
    torch.manual_seed(1234)
    torch.cuda.manual_seed(1234)
    random.seed(1234)
    np.random.seed(1234)
    main(args)
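For reference, the objective assembled above is a plain weighted sum of the four loss terms. A minimal sketch of that arithmetic with made-up numbers (the actual weights come from args.loss_weight, presumably set up in src/args.py; the values below are not the project defaults):

# Sketch only: hypothetical weights and per-batch loss values, mirroring the combination in train.py.
loss_weight = [1.0, 1000.0, 1.0, 1.0]
recipe_loss, ingr_loss, eos_loss, card_penalty = 2.30, 0.05, 0.10, 0.02
loss = loss_weight[0] * recipe_loss + loss_weight[1] * ingr_loss \
       + loss_weight[2] * eos_loss + loss_weight[3] * card_penalty
print(loss)  # 52.42 with the numbers above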
src/utils/ims2file.py
ADDED
@@ -0,0 +1,94 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import pickle
from tqdm import tqdm
import os
import numpy as np
from PIL import Image
import argparse
import lmdb
from torchvision import transforms


MAX_SIZE = 1e12


def load_and_resize(root, path, imscale):

    transf_list = []
    transf_list.append(transforms.Resize(imscale))
    transf_list.append(transforms.CenterCrop(imscale))
    transform = transforms.Compose(transf_list)

    img = Image.open(os.path.join(root, path[0], path[1], path[2], path[3], path)).convert('RGB')
    img = transform(img)

    return img


def main(args):

    parts = {}
    datasets = {}
    imname2pos = {'train': {}, 'val': {}, 'test': {}}
    for split in ['train', 'val', 'test']:
        datasets[split] = pickle.load(open(os.path.join(args.save_dir, args.suff + 'recipe1m_' + split + '.pkl'), 'rb'))

        parts[split] = lmdb.open(os.path.join(args.save_dir, 'lmdb_'+split), map_size=int(MAX_SIZE))
        with parts[split].begin() as txn:
            present_entries = [key for key, _ in txn.cursor()]
        j = 0
        for i, entry in tqdm(enumerate(datasets[split])):
            impaths = entry['images'][0:5]

            for n, p in enumerate(impaths):
                if n == args.maxnumims:
                    break
                if p.encode() not in present_entries:
                    im = load_and_resize(os.path.join(args.root, 'images', split), p, args.imscale)
                    im = np.array(im).astype(np.uint8)
                    with parts[split].begin(write=True) as txn:
                        txn.put(p.encode(), im)
                imname2pos[split][p] = j
                j += 1
    pickle.dump(imname2pos, open(os.path.join(args.save_dir, 'imname2pos.pkl'), 'wb'))


def test(args):

    imname2pos = pickle.load(open(os.path.join(args.save_dir, 'imname2pos.pkl'), 'rb'))
    paths = imname2pos['val']

    for k, v in paths.items():
        path = k
        break
    image_file = lmdb.open(os.path.join(args.save_dir, 'lmdb_' + 'val'), max_readers=1, readonly=True,
                           lock=False, readahead=False, meminit=False)
    with image_file.begin(write=False) as txn:
        image = txn.get(path.encode())
        image = np.fromstring(image, dtype=np.uint8)
        image = np.reshape(image, (args.imscale, args.imscale, 3))
    image = Image.fromarray(image.astype('uint8'), 'RGB')
    print(np.shape(image))


if __name__ == "__main__":

    parser = argparse.ArgumentParser()
    parser.add_argument('--root', type=str, default='path/to/recipe1m',
                        help='path to the recipe1m dataset')
    parser.add_argument('--save_dir', type=str, default='../data',
                        help='path where the lmdbs will be saved')
    parser.add_argument('--imscale', type=int, default=256,
                        help='size of images (will be rescaled and center cropped)')
    parser.add_argument('--maxnumims', type=int, default=5,
                        help='maximum number of images to allow for each sample')
    parser.add_argument('--suff', type=str, default='',
                        help='id of the vocabulary to use')
    parser.add_argument('--test_only', dest='test_only', action='store_true')
    parser.set_defaults(test_only=False)
    args = parser.parse_args()

    if not args.test_only:
        main(args)
    test(args)
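As a usage sketch (not part of the commit), the LMDBs written by this script can also be read back outside of test(); save_dir, imscale and the split below are placeholders and must match the arguments used when building them:

# Sketch: fetch one preprocessed image from the val LMDB produced by ims2file.py.
import os, pickle
import lmdb
import numpy as np

save_dir, imscale = '../data', 256              # assumed to match --save_dir / --imscale
imname2pos = pickle.load(open(os.path.join(save_dir, 'imname2pos.pkl'), 'rb'))
key = next(iter(imname2pos['val']))             # any stored image filename
env = lmdb.open(os.path.join(save_dir, 'lmdb_val'), readonly=True, lock=False)
with env.begin(write=False) as txn:
    buf = txn.get(key.encode())
img = np.frombuffer(buf, dtype=np.uint8).reshape(imscale, imscale, 3)
print(img.shape)                                # (256, 256, 3)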
src/utils/metrics.py
ADDED
@@ -0,0 +1,78 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import sys
import time
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.modules.loss import _WeightedLoss
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
map_loc = None if torch.cuda.is_available() else 'cpu'


class MaskedCrossEntropyCriterion(_WeightedLoss):

    def __init__(self, ignore_index=[-100], reduce=None):
        super(MaskedCrossEntropyCriterion, self).__init__()
        self.padding_idx = ignore_index
        self.reduce = reduce

    def forward(self, outputs, targets):
        lprobs = nn.functional.log_softmax(outputs, dim=-1)
        lprobs = lprobs.view(-1, lprobs.size(-1))

        for idx in self.padding_idx:
            # remove padding idx from targets to allow gathering without error (padded entries will be suppressed later)
            targets[targets == idx] = 0

        nll_loss = -lprobs.gather(dim=-1, index=targets.unsqueeze(1))
        if self.reduce:
            nll_loss = nll_loss.sum()

        return nll_loss.squeeze()


def softIoU(out, target, e=1e-6, sum_axis=1):

    num = (out*target).sum(sum_axis, True)
    den = (out+target-out*target).sum(sum_axis, True) + e
    iou = num / den

    return iou


def update_error_types(error_types, y_pred, y_true):

    error_types['tp_i'] += (y_pred * y_true).sum(0).cpu().data.numpy()
    error_types['fp_i'] += (y_pred * (1-y_true)).sum(0).cpu().data.numpy()
    error_types['fn_i'] += ((1-y_pred) * y_true).sum(0).cpu().data.numpy()
    error_types['tn_i'] += ((1-y_pred) * (1-y_true)).sum(0).cpu().data.numpy()

    error_types['tp_all'] += (y_pred * y_true).sum().item()
    error_types['fp_all'] += (y_pred * (1-y_true)).sum().item()
    error_types['fn_all'] += ((1-y_pred) * y_true).sum().item()


def compute_metrics(ret_metrics, error_types, metric_names, eps=1e-10, weights=None):

    if 'accuracy' in metric_names:
        ret_metrics['accuracy'].append(np.mean((error_types['tp_i'] + error_types['tn_i']) / (error_types['tp_i'] + error_types['fp_i'] + error_types['fn_i'] + error_types['tn_i'])))
    if 'jaccard' in metric_names:
        ret_metrics['jaccard'].append(error_types['tp_all'] / (error_types['tp_all'] + error_types['fp_all'] + error_types['fn_all'] + eps))
    if 'dice' in metric_names:
        ret_metrics['dice'].append(2*error_types['tp_all'] / (2*(error_types['tp_all'] + error_types['fp_all'] + error_types['fn_all']) + eps))
    if 'f1' in metric_names:
        pre = error_types['tp_i'] / (error_types['tp_i'] + error_types['fp_i'] + eps)
        rec = error_types['tp_i'] / (error_types['tp_i'] + error_types['fn_i'] + eps)
        f1_perclass = 2*(pre * rec) / (pre + rec + eps)
        if 'f1_ingredients' not in ret_metrics.keys():
            ret_metrics['f1_ingredients'] = [np.average(f1_perclass, weights=weights)]
        else:
            ret_metrics['f1_ingredients'].append(np.average(f1_perclass, weights=weights))

        pre = error_types['tp_all'] / (error_types['tp_all'] + error_types['fp_all'] + eps)
        rec = error_types['tp_all'] / (error_types['tp_all'] + error_types['fn_all'] + eps)
        f1 = 2*(pre * rec) / (pre + rec + eps)
        ret_metrics['f1'].append(f1)
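A minimal sketch of how these helpers fit together, mirroring their use in train.py: accumulate per-batch error counts with update_error_types, then derive the metrics once at the end of an epoch. The random tensors, the zero-initialised error_types dict and the import path below are assumptions for illustration only:

# Sketch: binary ingredient predictions vs. ground truth for a couple of batches.
import numpy as np
import torch
from utils.metrics import update_error_types, compute_metrics   # adjust the import path to your working directory

num_classes = 5
error_types = {'tp_i': 0, 'fp_i': 0, 'fn_i': 0, 'tn_i': 0,
               'tp_all': 0, 'fp_all': 0, 'fn_all': 0}

for _ in range(2):                                               # stand-ins for validation batches
    y_pred = torch.randint(0, 2, (4, num_classes)).float()
    y_true = torch.randint(0, 2, (4, num_classes)).float()
    update_error_types(error_types, y_pred, y_true)

ret_metrics = {'accuracy': [], 'f1': [], 'jaccard': [], 'f1_ingredients': [], 'dice': []}
compute_metrics(ret_metrics, error_types, ['accuracy', 'f1', 'jaccard', 'dice'], eps=1e-10)
print({k: v for k, v in ret_metrics.items() if v})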
src/utils/output_ing.py
ADDED
@@ -0,0 +1,28 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

replace_dict = {' .': '.',
                ' ,': ',',
                ' ;': ';',
                ' :': ':',
                '( ': '(',
                ' )': ')',
                " '": "'"}

def get_ingrs(ids, ingr_vocab_list):
    gen_ingrs = []
    for ingr_idx in ids:
        ingr_name = ingr_vocab_list[ingr_idx]
        if ingr_name == '<pad>':
            break
        gen_ingrs.append(ingr_name)
    return gen_ingrs


def prepare_output(gen_ingrs, ingr_vocab_list):

    if gen_ingrs is not None:
        gen_ingrs = get_ingrs(gen_ingrs, ingr_vocab_list)

    outs = {'ingrs': gen_ingrs}

    return outs
src/utils/output_utils.py
ADDED
@@ -0,0 +1,103 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

replace_dict = {' .': '.',
                ' ,': ',',
                ' ;': ';',
                ' :': ':',
                '( ': '(',
                ' )': ')',
                " '": "'"}


def get_recipe(ids, vocab):
    toks = []
    for id_ in ids:
        toks.append(vocab[id_])
    return toks


def get_ingrs(ids, ingr_vocab_list):
    gen_ingrs = []
    for ingr_idx in ids:
        ingr_name = ingr_vocab_list[ingr_idx]
        if ingr_name == '<pad>':
            break
        gen_ingrs.append(ingr_name)
    return gen_ingrs


def prettify(toks, replace_dict):
    toks = ' '.join(toks)
    toks = toks.split('<end>')[0]
    sentences = toks.split('<eoi>')

    pretty_sentences = []
    for sentence in sentences:
        sentence = sentence.strip()
        sentence = sentence.capitalize()
        for k, v in replace_dict.items():
            sentence = sentence.replace(k, v)
        if sentence != '':
            pretty_sentences.append(sentence)
    return pretty_sentences


def colorized_list(ingrs, ingrs_gt, colorize=False):
    if colorize:
        colorized_list = []
        for word in ingrs:
            if word in ingrs_gt:
                word = '\033[1;30;42m ' + word + ' \x1b[0m'
            else:
                word = '\033[1;30;41m ' + word + ' \x1b[0m'
            colorized_list.append(word)
        return colorized_list
    else:
        return ingrs


def prepare_output(ids, gen_ingrs, ingr_vocab_list, vocab):

    toks = get_recipe(ids, vocab)
    is_valid = True
    reason = 'All ok.'
    try:
        cut = toks.index('<end>')
        toks_trunc = toks[0:cut]
    except:
        toks_trunc = toks
        is_valid = False
        reason = 'no eos found'

    # repetition score
    score = float(len(set(toks_trunc))) / float(len(toks_trunc))

    prev_word = ''
    found_repeat = False
    for word in toks_trunc:
        if prev_word == word and prev_word != '<eoi>':
            found_repeat = True
            break
        prev_word = word

    toks = prettify(toks, replace_dict)
    title = toks[0]
    toks = toks[1:]

    if gen_ingrs is not None:
        gen_ingrs = get_ingrs(gen_ingrs, ingr_vocab_list)

    if score <= 0.3:
        reason = 'Diversity score.'
        is_valid = False
    elif len(toks) != len(set(toks)):
        reason = 'Repeated instructions.'
        is_valid = False
    elif found_repeat:
        reason = 'Found word repeat.'
        is_valid = False

    valid = {'is_valid': is_valid, 'reason': reason, 'score': score}
    outs = {'title': title, 'recipe': toks, 'ingrs': gen_ingrs}

    return outs, valid
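A small usage sketch for prepare_output; the ids and vocabularies below are placeholders (in the demo they come from the model's sampled outputs and the dataset vocabularies), so this only illustrates the returned structure:

# Sketch: turning sampled token ids into a readable recipe and a validity verdict.
# recipe_ids / ingr_ids / vocab / ingr_vocab_list stand in for the real model outputs.
outs, valid = prepare_output(recipe_ids, ingr_ids, ingr_vocab_list, vocab)
if valid['is_valid']:
    print(outs['title'])
    print('\n'.join(outs['recipe']))
    print(', '.join(outs['ingrs']))
else:
    print('Rejected sample:', valid['reason'])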
src/utils/tb_visualizer.py
ADDED
@@ -0,0 +1,66 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.

import numpy as np
import os
import ntpath
import time
import glob
from scipy.misc import imresize
import torchvision.utils as vutils
from operator import itemgetter
from tensorboardX import SummaryWriter


class Visualizer():
    def __init__(self, checkpoints_dir, name):
        self.win_size = 256
        self.name = name
        self.saved = False
        self.checkpoints_dir = checkpoints_dir
        self.ncols = 4

        # remove existing
        for filename in glob.glob(self.checkpoints_dir+"/events*"):
            os.remove(filename)
        self.writer = SummaryWriter(checkpoints_dir)

    def reset(self):
        self.saved = False

    # images: (b, c, 0, 1) array of images
    def image_summary(self, mode, epoch, images):
        images = vutils.make_grid(images, normalize=True, scale_each=True)
        self.writer.add_image('{}/Image'.format(mode), images, epoch)

    # text: type: ingredients/recipe
    def text_summary(self, mode, epoch, type, text, vocabulary, gt=True, max_length=20):
        for i, el in enumerate(text):  # text_list
            if not gt:  # we are printing a sample
                idx = el.nonzero().squeeze() + 1
            else:
                idx = el  # we are printing the ground truth

            words_list = itemgetter(*idx)(vocabulary)

            if len(words_list) <= max_length:
                self.writer.add_text('{}/{}_{}_{}'.format(mode, type, i, 'gt' if gt else 'prediction'),
                                     ', '.join(filter(lambda x: x != '<pad>', words_list)), epoch)
            else:
                self.writer.add_text('{}/{}_{}_{}'.format(mode, type, i, 'gt' if gt else 'prediction'),
                                     'Number of sampled ingredients is too big: {}'.format(len(words_list)), epoch)

    # losses: dictionary of error labels and values
    def scalar_summary(self, mode, epoch, **args):
        for k, v in args.items():
            self.writer.add_scalar('{}/{}'.format(mode, k), v, epoch)

        self.writer.export_scalars_to_json("{}/tensorboard_all_scalars.json".format(self.checkpoints_dir))

    def histo_summary(self, model, step):
        """Log a histogram of the tensor of values."""

        for name, param in model.named_parameters():
            self.writer.add_histogram(name, param, step)

    def close(self):
        self.writer.close()
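A brief usage sketch for the Visualizer wrapper, mirroring how train.py passes losses to scalar_summary as keyword arguments; the directory, run name and values are placeholders:

# Sketch: logging a few scalars for one epoch with tensorboardX.
logger = Visualizer(checkpoints_dir='checkpoints/experiment', name='demo_run')
logger.scalar_summary(mode='train', epoch=0, loss=1.234, ingr_loss=0.56)
logger.scalar_summary(mode='val', epoch=0, loss=1.301)
logger.close()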