File size: 3,166 Bytes
947aa12
8ce4d25
 
 
 
 
947aa12
8ce4d25
 
947aa12
 
8ce4d25
 
 
 
 
 
947aa12
 
8ce4d25
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
be59b6e
8ce4d25
 
 
 
be59b6e
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
---
title: ColPali 🀝 Vespa - Visual Retrieval
short_description: Visual Retrieval with ColPali and Vespa
emoji: πŸ‘€
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: main.py
pinned: false
license: apache-2.0
models:
  - vidore/colpaligemma-3b-pt-448-base
  - vidore/colpali-v1.2
preload_from_hub:
  - vidore/colpaligemma-3b-pt-448-base config.json,model-00001-of-00002.safetensors,model-00002-of-00002.safetensors,model.safetensors.index.json,preprocessor_config.json,special_tokens_map.json,tokenizer.json,tokenizer_config.json 12c59eb7e23bc4c26876f7be7c17760d5d3a1ffa
  - vidore/colpali-v1.2 adapter_config.json,adapter_model.safetensors,preprocessor_config.json,special_tokens_map.json,tokenizer.json,tokenizer_config.json 9912ce6f8a462d8cf2269f5606eabbd2784e764f
---

<!-- Copyright Vespa.ai. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://assets.vespa.ai/logos/Vespa-logo-green-RGB.svg">
  <source media="(prefers-color-scheme: light)" srcset="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg">
  <img alt="#Vespa" width="200" src="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg" style="margin-bottom: 25px;">
</picture>

# Visual Retrieval ColPali


# Developing

First, install `uv`:
  
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then, in this directory, run:

```bash
uv sync --extra dev
```

This will generate a virtual environment with the required dependencies at `.venv`.

To activate the virtual environment, run:

```bash
source .venv/bin/activate
```

And run development server:

```bash
python hello.py
```

## Preparation

First, set up your `.env` file by renaming `.env.example` to `.env` and filling in the required values.
(Token can be shared with 1password, `HF_TOKEN` is personal and must be created at huggingface)
If you are just connecting to a deployed Vespa app, you can skip to [Connecting to the Vespa app](#connecting-to-the-vespa-app-and-querying).

### Deploying the Vespa app

To deploy the Vespa app, run:

```bash
python deploy_vespa_app.py --tenant_name mytenant --vespa_application_name myapp --token_id_write mytokenid_write --token_id_read mytokenid_read
```

You should get an output like:

```bash
Found token endpoint: https://abcde.z.vespa-app.cloud
````

### Feeding the data

#### Dependencies

In addition to the python dependencies, you also need `poppler`
On Mac:

```bash
brew install poppler
```

First, you need to create a huggingface token, after you have accepted the term to use the model at https://huggingface.co/google/paligemma-3b-mix-448.
Add the token to your environment variables as `HF_TOKEN`:

```bash
export HF_TOKEN=yourtoken
```

To feed the data, run:

```bash
python feed_vespa.py --vespa_app_url https://myapp.z.vespa-app.cloud --vespa_cloud_secret_token mysecrettoken
```

### Connecting to the Vespa app and querying

As a first step, you can run the `query_vespa.py` script to run some sample queries against the Vespa app:

```bash
python query_vespa.py
```

### Starting the front-end

```bash
python main.py
```