import html
from urllib.parse import parse_qs, urlparse

import gradio as gr
from transformers import pipeline

# Seq2seq DSI model that maps a query directly to an identifier string.
model_pipeline = pipeline("text2text-generation", model="tribler/dsi-search-on-toy-dataset")

BITCOIN_LOGO_URL = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Bitcoin.svg/800px-Bitcoin.svg.png"


def process_query(query):
    # Generate an identifier and escape it before embedding it in HTML.
    result_text = model_pipeline(query, max_length=60)[0]["generated_text"].strip()
    safe_text = html.escape(result_text)
    if result_text.startswith("http"):
        # YouTube URL: embed the video named by the `v` query parameter.
        video_ids = parse_qs(urlparse(result_text).query).get("v")
        youtube_id = html.escape(video_ids[0] if video_ids else result_text.split("watch?v=")[-1])
        return (
            f'<iframe width="560" height="315" src="https://www.youtube.com/embed/{youtube_id}" frameborder="0" '
            'allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" '
            "allowfullscreen></iframe>"
        )
    if result_text.startswith("magnet"):
        # Magnet link: render it as a clickable link.
        return f'<a href="{safe_text}" target="_blank">{safe_text}</a>'
    # Otherwise the identifier is a Bitcoin address: show it as HTML next to the logo.
    return (
        '<div style="display:flex;align-items:center;">'
        f'<img src="{BITCOIN_LOGO_URL}" alt="Bitcoin Logo" style="width:20px;height:20px;margin-right:5px;">'
        f"<span>{safe_text}</span></div>"
    )

interface = gr.Interface(fn=process_query,
                          inputs=gr.Textbox(label="Query"),
                          outputs="html",
                          title="Search Interface",
                          submit_btn="Find",
                          description="""
                          ### Search for movie trailers, music torrents, and Bitcoin wallet addresses!

                          This toy example exactly recalls roughly 500 identifiers after only a few hours of training on a single GPU ([view dataset](https://huggingface.co/tribler/dsi-search-on-toy-dataset/blob/main/dataset.csv)).
                          """,
                          article="""
# De-DSI

Generative AI is increasingly shaping content discovery, relevance ranking, and financial transactions, and it has the potential to disrupt many industries.
Novel end-to-end generative architectures could pave the way for fully decentralized alternatives in social media, the movie industry, search engines, and finance, matching the level of decentralization achieved by Bitcoin and BitTorrent.
This shift could significantly empower ordinary Internet users.
Explore the scientific foundation of this transformation in our paper presented at EuroMLSys 2024, available [here](https://huggingface.co/papers/2404.12237).

We invite you to contribute to and engage with our community at the International Workshop on Distributed Infrastructure for Common Good (DICG).


## Demo

For this demo, we trained an end-to-end generative Transformer on a small dataset (526 records) that comprises YouTube URLs, magnet links, and Bitcoin wallet addresses.
Each identifier is annotated with a title: the YouTube URLs point to movie trailers, the magnet links to CC-licensed music, and the Bitcoin (BTC) addresses belong to independent artists.
With this demo, we present a proof of concept showing that a Differentiable Search Index (DSI) can retrieve arbitrary identifiers (URLs/hashes) in response to natural-language user queries.
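
The model returns each identifier as plain text, so the demo decides how to display it by its prefix: `http` for YouTube URLs, `magnet` for magnet links, and anything else as a Bitcoin address. The following is a minimal sketch of that dispatch in Python. `classify_identifier` is a hypothetical helper written for illustration, not part of the model or of the `transformers` API, and it assumes YouTube URLs carry the video ID in a `v` query parameter; the sample identifiers are placeholders, not entries from the dataset.

```python
from urllib.parse import parse_qs, urlparse


def classify_identifier(identifier):
    # Hypothetical helper: mirrors the prefix-based dispatch used by this demo.
    identifier = identifier.strip()
    if identifier.startswith("http"):
        # YouTube URL: the video ID lives in the `v` query parameter (assumption).
        video_ids = parse_qs(urlparse(identifier).query).get("v", [])
        return ("youtube", video_ids[0] if video_ids else "")
    if identifier.startswith("magnet"):
        # Magnet link: the whole URI identifies the torrent.
        return ("magnet", identifier)
    # Anything else is treated as a Bitcoin wallet address.
    return ("bitcoin", identifier)


# Placeholder identifiers, one per kind.
for generated in [
    "https://www.youtube.com/watch?v=abc123&t=42",
    "magnet:?xt=urn:btih:0123456789abcdef",
    "bc1q-placeholder-address",
]:
    print(classify_identifier(generated))
```

Matching on prefixes keeps the front end independent of the model: the Transformer only has to emit the identifier string, and rendering is decided afterwards.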

The model is available under a permissive license and can be accessed [here](https://huggingface.co/tribler/dsi-search-on-toy-dataset).

### Please Note

This project is both a groundbreaking advance and a preliminary exploration of decentralized systems.
The model is a toy example and does not yet demonstrate the full capabilities of the approach.
It serves as a proof of concept that invites further development and imagination.
                          """,
                          examples=[["spider man"], ["oceans 13"], ["sister starlight"], ["bitcoin address of xileno"]],
                          concurrency_limit=50)

if __name__ == "__main__":
    interface.launch()