# Image-to-Poem Generator

This project generates poems from input images. It uses the Hugging Face Transformers library with a fine-tuned image-to-text model to produce poetic descriptions of visual content.

## Table of Contents

1. [Installation](#installation)
2. [Usage](#usage)
3. [Model Information](#model-information)
4. [Function Description](#function-description)
5. [Example](#example)
6. [Requirements](#requirements)
7. [License](#license)

## Installation

Install the required libraries with pip, for example `pip install transformers Pillow`. The usage examples also request PyTorch tensors (`return_tensors="pt"`), so PyTorch (`torch`) must be installed as well.
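
Before running the examples, it can help to confirm that the libraries are importable. The following is a minimal sketch, not part of the original project: `REQUIRED` and `missing_packages` are hypothetical names, and listing `torch` is an assumption based on the examples requesting PyTorch tensors.

```python
import importlib.util

# Import names used by the examples, mapped to their pip package names.
# "torch" is an assumption: the examples pass return_tensors="pt".
REQUIRED = {"transformers": "transformers", "PIL": "Pillow", "torch": "torch"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages. Try: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks for top-level modules; it does not verify versions.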

## Usage

1. First, import the necessary modules and load the pre-trained model:

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

processor = AutoProcessor.from_pretrained("Sourabh2/git-base-poem")
model = AutoModelForCausalLM.from_pretrained("Sourabh2/git-base-poem")
```

2. Define the `generate_caption` function:

```python
def generate_caption(image_path):
    # Open the image; convert to RGB so grayscale or RGBA files are handled consistently
    image = Image.open(image_path).convert("RGB")
    # Preprocess the image into PyTorch tensors
    inputs = processor(images=image, return_tensors="pt")
    pixel_values = inputs.pixel_values
    # Generate token IDs, limited to a total length of 50 tokens
    generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
    # Decode the token IDs to text, dropping special tokens
    generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return generated_caption
```
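
Greedy decoding tends to return the same text for the same image, which may not always suit poetry. The following is a minimal sketch, not part of the original project, of a variant that takes the processor and model as arguments and forwards extra keyword arguments to `model.generate`. The name `generate_poem` and the default sampling values are illustrative assumptions, not settings tuned for this model; `do_sample`, `top_p`, and `temperature` are standard Transformers generation parameters.

```python
def generate_poem(image, processor, model, **generate_kwargs):
    """Generate text for an already-opened PIL image.

    Hypothetical helper: the defaults below are illustrative only.
    """
    # Default to nucleus sampling; callers can override any option.
    options = {"do_sample": True, "top_p": 0.9, "temperature": 0.8, "max_length": 50}
    options.update(generate_kwargs)

    # Preprocess the image into PyTorch tensors.
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    generated_ids = model.generate(pixel_values=inputs.pixel_values, **options)
    # Decode the first (and only) sequence and trim surrounding whitespace.
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return text.strip()


# Usage, assuming `processor` and `model` were loaded as in step 1:
# from PIL import Image
# print(generate_poem(Image.open("/path/to/your/image.jpg"), processor, model))
```

Passing `do_sample=False` falls back to deterministic decoding; in that case `top_p` and `temperature` have no effect and Transformers may warn about them.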

3. Use the function to generate a poem from an image:

```python
image_path = "/path/to/your/image.jpg"
output = generate_caption(image_path)
print(output)
```

## Model Information

This project uses the "Sourabh2/git-base-poem" model, which is a fine-tuned version of the GIT (Generative Image-to-text Transformer) model. It has been specifically trained to generate poetic descriptions of images.

## Function Description

The `generate_caption` function takes an image file path as input and returns the generated poem as a string. It works in four steps:

1. Opens the image file using PIL (Python Imaging Library).
2. Processes the image with the pre-trained processor, which returns the pixel values as PyTorch tensors.
3. Generates token IDs with the pre-trained model via `model.generate`, limited to `max_length=50`.
4. Decodes the generated token IDs, skipping special tokens, and returns the first result as a string.

## Example

```python
image_path = "/content/12330616_72ed8075fa.jpg"
output = generate_caption(image_path)
print(output)
```

This prints the poem generated from the content of the image at the specified path.
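
To generate poems for several images at once, the single-image call above can be wrapped in a loop. This is a minimal sketch, not part of the original project: `poems_for_folder`, `IMAGE_EXTENSIONS`, and the folder path in the usage comment are hypothetical, and `caption_fn` stands in for `generate_caption`.

```python
from pathlib import Path

# File extensions treated as images (an illustrative choice).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def poems_for_folder(folder, caption_fn):
    """Map each image file name in `folder` to the text from `caption_fn`."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and files that do not look like images.
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        try:
            results[path.name] = caption_fn(str(path))
        except OSError as err:
            # Unreadable or corrupt files are recorded instead of aborting the run.
            results[path.name] = f"<skipped: {err}>"
    return results


# Usage, assuming `generate_caption` is defined as in the Usage section:
# for name, poem in poems_for_folder("/path/to/your/images", generate_caption).items():
#     print(name, "->", poem)
```

Passing the captioning function as an argument keeps the loop independent of how the model is loaded.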

## Requirements

- Python 3.6+
- transformers library
- Pillow (PIL) library
- PyTorch (`torch`), needed for the `return_tensors="pt"` call in `generate_caption`