import html

import gradio as gr
from transformers import pipeline

# End-to-end generative (seq2seq) model that maps a query to a stored identifier.
model_pipeline = pipeline("text2text-generation", model="tribler/dsi-search-on-toy-dataset")


def process_query(query):
    results = model_pipeline(query, max_length=60)
    result_text = results[0]["generated_text"].strip()

    # The output component is "html", so every branch returns a plain HTML string.
    if result_text.startswith("http"):
        # YouTube URL: embed the video; drop any extra parameters after the video ID.
        youtube_id = html.escape(result_text.split("watch?v=")[-1].split("&")[0])
        iframe = f'<iframe width="560" height="315" src="https://www.youtube.com/embed/{youtube_id}" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>'
        return iframe
    elif result_text.startswith("magnet"):
        # Magnet link: escape the model output before placing it in HTML.
        link = html.escape(result_text)
        return f'<a href="{link}" target="_blank">{link}</a>'
    else:
        # Anything else is treated as a Bitcoin wallet address.
        bitcoin_logo_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Bitcoin.svg/800px-Bitcoin.svg.png"
        return f'<div style="display:flex;align-items:center;"><img src="{bitcoin_logo_url}" alt="Bitcoin Logo" style="width:20px;height:20px;margin-right:5px;"><span>{html.escape(result_text)}</span></div>'


interface = gr.Interface(fn=process_query,
                         inputs=gr.Textbox(label="Query"),
                         outputs="html",
                         title="Search Interface",
                         submit_btn="Find",
                         description="""
### Search for movie trailers, music torrents, and Bitcoin wallet addresses!

This toy example knows about 500 URLs after only a few hours of training on a single GPU.
([View the dataset](https://huggingface.co/tribler/dsi-search-on-toy-dataset/blob/main/dataset.csv), read the [scientific article](https://arxiv.org/pdf/2404.12237.pdf) from EuroMLSys, or explore the [model](https://huggingface.co/tribler/dsi-search-on-toy-dataset) and [all code](https://github.com/Tribler/De-DSI).)
""",
                         article="""
## De-DSI

De-DSI is a proof-of-principle for fully decentralised search engines.
We show a possible approach to connect millions or even billions of devices into a decentralised search engine. We hope this is a step towards a "[global brain](https://dl.acm.org/doi/pdf/10.1145/2160718.2160731)" for humanity.

Generative AI is increasingly influencing fields such as content discovery, relevance ranking, and financial transactions, showing its potential to disrupt various industries.
Novel end-to-end generative architectures could pave the way for fully decentralised alternatives in social media, the movie industry, search engines, and the financial sector, mirroring the level of decentralisation of Bitcoin and BitTorrent.
This shift could significantly empower ordinary Internet users.
Explore the scientific foundation of this transformation in our paper presented at EuroMLSys 2024.
The paper is available [here](https://huggingface.co/papers/2404.12237).
We invite you to contribute to and engage with our community at the International Workshop on [Distributed Infrastructure for Common Good](https://dicg-workshop.github.io/) (DICG).

### Demo

For this demo, we trained an end-to-end generative Transformer on a small dataset (526 records) of YouTube URLs, magnet links, and Bitcoin wallet addresses.
Each identifier is annotated with a title; the identifiers link to movie trailers, CC-licensed music, and BTC addresses of independent artists.
With this, we present a proof of concept of the ability of a Differentiable Search Index (DSI) to retrieve arbitrary identifiers (URLs/hashes) in response to natural user queries.
The model is available under a permissive license and can be accessed [here](https://huggingface.co/tribler/dsi-search-on-toy-dataset).
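
The app decides how to render a generated identifier by its prefix alone: `http` is treated as a YouTube URL, `magnet` as a torrent, and anything else as a Bitcoin address. Below is a minimal sketch of that dispatch without Gradio, assuming YouTube URLs of the form `https://www.youtube.com/watch?v=ID`; the function name `classify_identifier` and the sample identifiers are hypothetical placeholders, not part of the demo:

```python
from urllib.parse import parse_qs, urlparse


def classify_identifier(identifier):
    # Same prefix rules as the demo: "http" -> YouTube URL, "magnet" -> torrent,
    # anything else is assumed to be a Bitcoin address.
    identifier = identifier.strip()
    if identifier.startswith("http"):
        # Read the "v" query parameter of a .../watch?v=ID URL (assumed URL format).
        video_id = parse_qs(urlparse(identifier).query).get("v", [""])[0]
        return "youtube", video_id
    if identifier.startswith("magnet"):
        return "magnet", identifier
    return "bitcoin", identifier


# Hypothetical sample identifiers, not entries from the demo dataset.
print(classify_identifier("https://www.youtube.com/watch?v=abc123&t=5"))  # ('youtube', 'abc123')
print(classify_identifier("magnet:?xt=urn:btih:0123abcd"))  # ('magnet', 'magnet:?xt=urn:btih:0123abcd')
print(classify_identifier("bc1qexampleaddress"))  # ('bitcoin', 'bc1qexampleaddress')
```

Parsing the URL with `urllib.parse` instead of splitting on `watch?v=` keeps the video ID clean when extra parameters such as `&t=5` follow it.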

### Decentralisation background

Why is decentralisation of AI a milestone? The Internet itself was conceived in December 1960 with the report ["is decentralized communication possible?"](https://doi.org/10.7249/RM2632). A fully decentralised form of money, Bitcoin, disrupted the highly regulated financial industry. BitTorrent disrupted the monopolies around broadcasting by making distribution fully decentralised.

What has enabled humanity to shape the world is not strength or speed, but intelligence, money, and collaboration.
Our Tribler lab is focussed on advancing these topics and ensuring they benefit ordinary citizens.
Our [entire research portfolio](https://scholar.google.com/citations?hl=en&user=pprQKjUAAAAJ&view_op=list_works&sortby=pubdate) is driven by idealism. We aim to remove power from companies, governments, and AI in order to shift all this power to self-sovereign citizens.
For instance, our "[unstoppable DAO](https://dl.acm.org/doi/pdf/10.1145/3565383.3566112)" technology creates a limited form of collective money with democratic control. We pioneered [decentralised trust](https://arxiv.org/pdf/2207.09950) and its [deployment](https://research.tudelft.nl/files/89353583/1_s2.0_S1389128621001705_main.pdf). Our master's programme teaches students to engineer [collective decision](https://github.com/Tribler/tribler/issues/7691) mechanisms. The [goal of the Tribler lab](https://github.com/Tribler/tribler/issues/7064) is to prototype the first global brain by 2040.
Before 2000 we worked with [visionary collaborators](http://web.archive.org/web/20020618081554/http://www.freeamp.org/pipermail/mm/2000-December/000003.html) on our first [deployments](http://www.usenix.org/publications/library/proceedings/usenix2000/freenix/full_papers/pouwelse/pouwelse.pdf) and on communities with democratic control of information (in the pre-Wikipedia era).

### Tribler

![GitHub closed issues](https://img.shields.io/github/issues-closed/tribler/tribler.svg?style=flat)

Tribler is the name of our peer-to-peer BitTorrent search engine and download client. The "Tribler Lab" is the research team at Delft University of Technology that has been developing this open-source client since April 2005.
Over the years, Tribler has received 2.3 million unique downloads. Over 100 master's students and [267 software developers](https://github.com/orgs/Tribler/people) have contributed code to Tribler.
We also started supporting mobile-to-mobile networking on Android and real-time machine learning using K-means:
![Decentralised AI superapp YouTube search](https://huggingface.co/spaces/tribler/de-dsi/resolve/1a8c77245f4905b7594cd6bddbf2e06bd77902f8/Decentralised_AI__superapp_Youtube_search.gif)
The [demo APK](https://github.com/Tribler/tribler/issues/7254#issuecomment-2074490469) is available and can play YouTube videos.

### Disclaimer: demo of work in progress

This project represents both a groundbreaking advance and a preliminary exploration of decentralised systems.
Fuzzy search or trivial lookup does not need the super-heavy LLM approach; we are painfully aware of that. Support for non-trivial queries is still lacking.
As a preliminary model, the project showcases a toy example rather than the full potential of the approach.
It serves as a proof of concept that invites further development and imagination. Its only claim: AI can be as decentralised as Bitcoin and BitTorrent.
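
To illustrate the point about trivial lookups: plain fuzzy string matching over titles already handles simple queries without any neural model. Below is a minimal sketch using Python's standard-library `difflib`; the titles, the placeholder identifiers, and the `fuzzy_lookup` helper are hypothetical and not taken from the demo dataset:

```python
import difflib

# Hypothetical title -> identifier records (placeholders, not the demo dataset).
INDEX = {
    "spider man trailer": "https://www.youtube.com/watch?v=PLACEHOLDER1",
    "oceans thirteen trailer": "https://www.youtube.com/watch?v=PLACEHOLDER2",
    "bitcoin address of some artist": "bc1qplaceholderaddress",
}


def fuzzy_lookup(query, cutoff=0.4):
    # Pick the title most similar to the query (difflib's SequenceMatcher ratio);
    # return None when no title reaches the cutoff.
    matches = difflib.get_close_matches(query.lower(), list(INDEX), n=1, cutoff=cutoff)
    return INDEX[matches[0]] if matches else None


print(fuzzy_lookup("spider man"))  # https://www.youtube.com/watch?v=PLACEHOLDER1
print(fuzzy_lookup("zzzz"))  # None
```

This baseline only compares surface spelling, so it cannot answer queries that share no words with any title.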
""",
                         examples=[["spider man"], ["oceans 13"], ["sister starlight"], ["bitcoin address of xileno"]],
                         concurrency_limit=50)


if __name__ == "__main__":
    interface.launch()