pipeline_stats / app.py

patrickvonplaten

add test

e0b42cf about 1 year ago


	from huggingface_hub import HfApi, ModelFilter
	from collections import defaultdict
	import pandas as pd
	import gradio as gr

	api = HfApi()

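	# Restrict the listing to repos tagged with the `diffusers` library.
	# (Named `model_filter` to avoid shadowing the built-in `filter`.)
	model_filter = ModelFilter(library="diffusers")

	models = api.list_models(filter=model_filter)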

	downloads = defaultdict(int)

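	for model in models:
	    is_counted = False
	    for tag in model.tags:
	        # Tags of the form "diffusers:<ClassName>" carry the pipeline/model class.
	        if tag.startswith("diffusers:"):
	            is_counted = True
	            downloads[tag[len("diffusers:"):]] += model.downloads

	    # Repos without a "diffusers:" tag are grouped under "other".
	    if not is_counted:
	        downloads["other"] += model.downloads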

	# Remove 0 downloads
	downloads = {k: v for k, v in downloads.items() if v > 0}
	# Sort by download count, highest first
	sorted_dict = dict(sorted(downloads.items(), key=lambda item: item[1], reverse=True))

	# Convert the sorted dictionary to a DataFrame
	df = pd.DataFrame(list(sorted_dict.items()), columns=['Pipeline class', 'Downloads'])

	NOTE = """
	This table shows the total number of downloads per pipeline class of `diffusers`.
	The pipeline classes are retrieved from the `_class_name` attribute of `model_index.json` or
	`config.json` depending on whether the diffusers repo is a pipeline repo or a model repo.

	Note: It's important to understand exactly how downloads are measured here. Use this table
	to figure out whether a "type" of pipeline is used, not whether a specific pipeline class is used. More specifically, we
	know from this table that `stable-diffusion` checkpoints are heavily used, but we don't know exactly which Stable Diffusion
	pipeline class is heavily used.

	=> So what conclusions can we draw from this table?
	- 1.) `stable-diffusion` checkpoints are heavily used and account for most downloads.
	- 2.) All `stable-diffusion` checkpoints are compatible with `StableDiffusionPipeline`, `StableDiffusionImg2ImgPipeline`, `StableDiffusionInpaintPipeline`, `StableDiffusionControlNetPipeline`, `StableDiffusionControlNetImg2ImgPipeline` or `StableDiffusionControlNetInpaintPipeline`, but all downloads are attributed only to `StableDiffusionPipeline` here. This means we don't really know which pipeline class is used when the checkpoints are downloaded.
	- 3.) ControlNet is used a lot: it accounts for > 10% of all downloads.
	- 4.) If neither a pipeline class nor any compatible pipeline class shows up prominently in the table, we know that the pipeline class is not used a lot. For example:
	  - `VersatileDiffusionPipeline` can only be used with "VersatileDiffusion" checkpoints, so here we know that all VersatileDiffusion classes combined have fewer than 2k monthly downloads.
	  - `ConsistencyModelPipeline` checkpoints can only be used with exactly this pipeline, and here we're at < 100 monthly downloads.
	- 5.) All LoRA and Textual Inversion downloads are grouped together in "other" for now.
	"""

	with gr.Blocks() as demo:
	    gr.Markdown(NOTE)
	    gr.DataFrame(df)

	demo.launch()
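
The aggregation in `app.py` can be tried without calling the Hub. The sketch below is not part of the Space: it mirrors the tag-prefix counting on made-up sample data and adds a share-of-total step, which is one way a figure like "ControlNet accounts for > 10% of all downloads" might be derived. The helper names `aggregate_downloads` and `download_shares`, the `(tags, downloads)` tuple shape, and all numbers in `sample` are assumptions made for illustration.

```python
from collections import defaultdict

PREFIX = "diffusers:"


def aggregate_downloads(models):
    """Sum downloads per class name taken from "diffusers:<ClassName>" tags.

    `models` is an iterable of (tags, downloads) pairs, a hypothetical
    stand-in for the model objects returned by the Hub listing.
    """
    totals = defaultdict(int)
    for tags, downloads in models:
        class_tags = [t for t in tags if t.startswith(PREFIX)]
        if not class_tags:
            # No class tag at all: grouped under "other", as in app.py.
            totals["other"] += downloads
        for tag in class_tags:
            # A repo with several class tags is counted once per tag.
            totals[tag[len(PREFIX):]] += downloads
    return dict(totals)


def download_shares(totals):
    """Return each entry's fraction of the summed totals."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: count / grand_total for name, count in totals.items()}


# Made-up sample data, for illustration only.
sample = [
    (["diffusers", "diffusers:StableDiffusionPipeline"], 800),
    (["diffusers", "diffusers:StableDiffusionPipeline"], 50),
    (["diffusers", "diffusers:ControlNetModel"], 100),
    (["diffusers", "lora"], 50),
]

totals = aggregate_downloads(sample)
shares = download_shares(totals)
```

Because a repo carrying more than one `diffusers:` tag contributes to several entries, the shares are fractions of the summed per-class totals rather than of unique downloads.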