Update README.md
README.md
CHANGED
@@ -3,11 +3,11 @@ license: apache-2.0
 ---
 
 # Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
-
+[![arXiv](https://img.shields.io/badge/arXiv-2404.14396-b31b1b.svg)](https://arxiv.org/abs/2412.04432)
 [![Static Badge](https://img.shields.io/badge/Github-black)](https://github.com/TencentARC/Divot)
 
 
->We introduce Divot, a **Di**ffusion-Powered **V**ide**o** **T**okenizer, which leverages the diffusion process for self-supervised video representation learning. We posit that if a video diffusion model can effectively de-noise video clips by taking the features of a video tokenizer as the condition, then the tokenizer has successfully captured robust spatial and temporal information. Additionally, the video diffusion model inherently functions as a de-tokenizer, decoding videos from their representations.
+>We introduce [Divot](https://arxiv.org/abs/2412.04432), a **Di**ffusion-Powered **V**ide**o** **T**okenizer, which leverages the diffusion process for self-supervised video representation learning. We posit that if a video diffusion model can effectively de-noise video clips by taking the features of a video tokenizer as the condition, then the tokenizer has successfully captured robust spatial and temporal information. Additionally, the video diffusion model inherently functions as a de-tokenizer, decoding videos from their representations.
 Building upon the Divot tokenizer, we present **Divot-LLM** through video-to-text autoregression and text-to-video generation by modeling the distributions of continuous-valued Divot features with a Gaussian Mixture Model.
 
 All models, training code and inference code are released!