@VictorSanh on Hugging Face: "When Greg Brockman demo-ed GPT4 by hand-sketching a joke website on a piece of…"


When Greg Brockman demoed GPT-4 by hand-sketching a joke website on a piece of paper and asking the system to convert it into an HTML webpage, it blew my mind.

Can you build your own Screenshot-to-HTML system with far fewer resources?

With this new resource, most likely yes! Current vision-language models can learn this task with the right data (and the right tricks).

We have iterated on WebSight-v0.1 and are releasing v0.2.
WebSight is an open dataset of synthetically generated webpages, each paired with its rendered screenshot.

A few noticeable improvements:
- 💨From traditional CSS to Tailwind CSS. Tailwind styles are written directly in the HTML `class` attribute, which makes the code much more compact
- 🚛2M pairs of synthetic HTML webpages and their rendered screenshots, along with the LLM-generated prompt used to create each webpage
- 🖼️Much more visually appealing pages with the integration of real images
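
To make the compactness point concrete, here is a small sketch comparing the same card written with a traditional stylesheet and with Tailwind utility classes. Both snippets are made up for illustration and are not taken from WebSight; `char_savings` is a hypothetical helper, and character count is only a rough proxy for the token count a model would actually see.

```python
# The same card styled two ways. Both snippets are illustrative examples,
# not samples from the WebSight dataset.
TRADITIONAL = """<style>
.card { padding: 1rem; background-color: #ffffff; border-radius: 0.5rem; }
.title { font-size: 1.25rem; font-weight: 700; }
</style>
<div class="card">
  <h2 class="title">Hello</h2>
</div>"""

# Tailwind equivalents: p-4 (1rem padding), bg-white, rounded-lg (0.5rem radius),
# text-xl (1.25rem font size; Tailwind also sets a line-height), font-bold (700).
TAILWIND = """<div class="p-4 bg-white rounded-lg">
  <h2 class="text-xl font-bold">Hello</h2>
</div>"""


def char_savings(traditional: str, tailwind: str) -> float:
    """Return the fraction of characters saved by the Tailwind version."""
    return 1 - len(tailwind) / len(traditional)


print(f"traditional: {len(TRADITIONAL)} chars")
print(f"tailwind:    {len(TAILWIND)} chars")
print(f"saved:       {char_savings(TRADITIONAL, TAILWIND):.0%}")
```

Shorter targets mean fewer tokens to generate per page, which is likely part of why the switch helps a model learn the task.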

👀Blog: https://huggingface.co/blog/websight
💽Dataset: HuggingFaceM4/WebSight
📜Technical report: Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset (2403.09029)
🎮Want to create your own synthetic data pipelines? A starting point: https://colab.research.google.com/drive/1LdamGKR2oacrDk-kYwz_Wfc1-RBUdzcO?usp=sharing
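
If you want to peek at the data without downloading all 2M pairs, streaming with the `datasets` library is one option. This is a minimal sketch under assumptions: the config name `"v0.2"` and the column names `image` and `text` are my guesses at the layout, so check the dataset card before relying on them. `to_training_pair` and `stream_websight` are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, Tuple


def to_training_pair(record: Dict[str, Any]) -> Tuple[Any, str]:
    """Map one record to (screenshot, target HTML).

    Assumes the columns are named "image" and "text"; verify on the dataset card.
    """
    return record["image"], record["text"]


def stream_websight(limit: int = 3) -> Iterator[Tuple[Any, str]]:
    """Yield up to `limit` (screenshot, HTML) pairs without a full download."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset  # pip install datasets

    # Assumption: "v0.2" is the config name for this release.
    ds = load_dataset(
        "HuggingFaceM4/WebSight", "v0.2", split="train", streaming=True
    )
    for i, record in enumerate(ds):
        if i >= limit:
            break
        yield to_training_pair(record)
```

Calling `next(stream_websight(limit=1))` should then give you one screenshot and its HTML source, provided the assumed names match the actual dataset.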

Built with @HugoLaurencon & @Leyo
