jsulz HF staff committed on
Commit cf047e8 · 1 Parent(s): cf4323e
Files changed (2)
  1. README.md +85 -0
  2. app.py +9 -1
README.md ADDED
@@ -0,0 +1,85 @@
+ # Spaces Ship
+
+ This is a spaceship through Spaces.
+
+ I started this mostly as a way to see more Spaces that I was interested in. Since there aren't any search/filtering options beyond full-text search and searching by Space title, I wanted more ways to look around and get inspired.
+
+ It expanded as I saw what information you can get from the APIs in the `huggingface_hub` client.
+
+ Short-term, I'm running a lot of this locally, but long-term my goal is to run [this script](https://github.com/jsulz/hf-spaces-stats-builder/blob/main/src/pipeline.py) every 2 weeks, which:
+
+ - Calls `list_spaces` to get all Spaces and some high-level metadata
+ - Calls `space_info` to get the next level of detail for each Space
+ - Stores this in a Dataset on the Hub - [jsulz/space-stats](https://huggingface.co/datasets/jsulz/space-stats)
+   - Inspiration for this came from [cfahlgren1/hub-stats](cfahlgren1/hub-stats), but I wanted one additional level of information (only available by making a lot of API calls)
+
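A minimal sketch of the two-step crawl described in the list above, assuming an `HfApi`-like object is passed in. The `SpaceRecord` fields and the `build_records` helper are hypothetical names chosen for illustration; they are not taken from the linked pipeline script, and the step that stores the result in a Dataset is left out.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SpaceRecord:
    """One output row. The field choice is illustrative, not the real schema."""

    id: str
    likes: int
    sdk: Optional[str]
    models: List[str]
    datasets: List[str]


def build_records(api: Any, limit: Optional[int] = None) -> List[SpaceRecord]:
    """Two-step crawl: one cheap listing, then one detail call per Space."""
    records = []
    for summary in api.list_spaces(limit=limit):
        # The listing only carries high-level metadata, so each Space
        # needs its own `space_info` call. This is the slow part.
        detail = api.space_info(summary.id)
        records.append(
            SpaceRecord(
                id=detail.id,
                likes=getattr(detail, "likes", 0) or 0,
                sdk=getattr(detail, "sdk", None),
                # These attributes may be missing or None, so normalize to lists.
                models=list(getattr(detail, "models", None) or []),
                datasets=list(getattr(detail, "datasets", None) or []),
            )
        )
    return records
```

With `huggingface_hub` installed and network access, something like `build_records(HfApi(), limit=5)` should exercise this against the real Hub; passing the client in as an argument also makes it easy to swap in a fake for testing.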
+ I want this to be on a semi-regular cadence, but I also have to respect that a full run takes in the realm of 12-15 hours (with some potential speedup from running in parallel).
+
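One way the parallel speedup mentioned above could look, as a sketch only: the per-Space detail calls are I/O-bound, so a thread pool is a reasonable first thing to try. `fetch_all` and its `fetch_one` parameter are hypothetical names, and how much concurrency the Hub API tolerates before rate limiting is an open question this sketch does not answer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_all(
    ids: Iterable[str],
    fetch_one: Callable[[str], T],
    max_workers: int = 8,
) -> List[T]:
    """Run `fetch_one` for every id on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # `map` yields results in the order of `ids`, not completion order.
        # An exception raised by any call is re-raised here.
        return list(pool.map(fetch_one, ids))
```

In the crawl, `fetch_one` would be something like the client's `space_info` method. A production run would likely also want retries and backoff, which are omitted here.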
+ This Space consumes that dataset in a Gradio app that has two tabs:
+
+ - Spaces Overview
+ - Spaces Search
+
+ The rest of this document breaks down what's in each tab of the Space, along with my thoughts about them after doing some digging.
+
+ # General
+
+ All of this context needs to live in the app in some form alongside the components. Avoiding that for the moment.
+
+ All of the labels and words that do exist need cleanup. Not worried about that for the moment.
+
+ # Spaces Overview
+
+ Charts exist for the following (commentary for each in sub-bullets):
+
+ - Growth of Spaces over Time
+   - This is a line chart that shows the number of Spaces created over time. Shows all Spaces, regardless of status.
+ - Distribution of Spaces by SDK
+   - This is a pie chart that shows the distribution of Spaces by SDK. The SDK is one of gradio, streamlit, docker, or static.
+ - Distribution of Spaces by Emoji
+   - This is a pie chart that shows the distribution of Spaces by emoji. This is a bit silly, but it could be fun to work on this more to make it visually funny/appealing.
+ - Relationship between Number of Spaces Created and Number of Likes
+   - This is a scatter plot that shows the relationship between the number of Spaces created by an author and the number of likes. Not very interesting except for the outliers.
+ - Relationship between Space Emoji and Number of Likes
+   - This is a scatter plot that shows the relationship between the emoji used in a Space and the number of likes. Similar take as with the other scatter plot.
+ - Hardware in Use
+   - This is a log-scale bar chart of hardware in use. More interesting stuff here.
+ - Most Popular Model Authors
+   - Bar chart of the most popular model authors whose models are used in Spaces.
+ - Most Used Models
+   - Bar chart of the most popular models used in Spaces.
+ - Most Popular Dataset Authors
+   - Bar chart of the most popular dataset authors whose datasets are used in Spaces.
+ - Most Used Datasets
+   - Bar chart of the most popular datasets used in Spaces.
+ - Number of Duplicates by Space
+   - Table showing the most duplicated Spaces.
+ - Number of Likes by Space
+   - Table showing the most liked Spaces.
+ - Number of Spaces by Author
+   - Table showing the most prolific Spaces authors.
+ - Number of Likes by Author
+   - Table showing the authors with the most cumulative likes across all Spaces.
+
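The two per-author tables at the end of the list above boil down to a group-by. A standard-library sketch follows, assuming each Space is a dict with `author` and `likes` keys. The `author` key is an assumption, the helper names are hypothetical, and the app itself presumably does this with a dataframe rather than plain dicts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def author_totals(spaces: Iterable[dict]) -> Dict[str, Tuple[int, int]]:
    """Return {author: (number_of_spaces, cumulative_likes)}."""
    counts = defaultdict(int)
    likes = defaultdict(int)
    for space in spaces:
        counts[space["author"]] += 1
        likes[space["author"]] += space["likes"]
    return {author: (counts[author], likes[author]) for author in counts}


def top_authors_by_likes(spaces: Iterable[dict], n: int = 10) -> List[Tuple[str, Tuple[int, int]]]:
    """Authors sorted by cumulative likes, highest first, truncated to n."""
    totals = author_totals(spaces)
    return sorted(totals.items(), key=lambda item: item[1][1], reverse=True)[:n]
```

The same `(number_of_spaces, cumulative_likes)` pairs are also the points of the "Number of Spaces Created vs. Number of Likes" scatter plot, one per author.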
+ # Spaces Search
+
+ Filter options exist for the following (commentary for each in sub-bullets):
+
+ - Emojis
+   - Fun, not very useful.
+ - Likes
+   - Easy and helpful to see popular stuff.
+ - Authors
+   - Kinda fun, but so many authors with so little context.
+ - SDK/Tags
+   - Too many tags - lots of one-offs. Would maybe limit this to the top 10ish.
+ - Hardware
+   - More useful than I thought it would be.
+ - License
+   - Meh.
+ - Models
+   - Very cool, but lots of one-offs and not highly used. Would maybe limit this to the top 10ish.
+ - Datasets
+   - Same as models.
+ - Dev Mode
+   - The interesting thing about this is how little it's used.
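
The "limit this to the top 10ish" idea for tags, models, and datasets could be sketched as below, assuming each Space carries a list of values (for example its tags). `top_values` is a hypothetical helper, not something in `app.py`.

```python
from collections import Counter
from typing import Iterable, List


def top_values(rows: Iterable[Iterable[str]], n: int = 10) -> List[str]:
    """Count every value across all rows and keep the n most common."""
    counts = Counter(value for row in rows for value in row)
    return [value for value, _ in counts.most_common(n)]
```

The returned list could then be used as the choices of the corresponding dropdown, so one-off tags never show up as filter options.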
app.py CHANGED
@@ -72,6 +72,7 @@ def filtered_df(
  filtered_models,
  filtered_datasets,
  space_licenses,
+ filtered_devmode,
  ):
  """
  Filter the dataframe based on the given criteria.
@@ -143,6 +144,10 @@ def filtered_df(
  "r_licenses": "Licenses",
  }
  )
+ if filtered_devmode:
+ _df = _df[
+ _df["devMode"] == filtered_devmode
+ ]
 
  return _df[["URL", "Likes", "Models", "Datasets", "Licenses"]]
 
@@ -238,7 +243,7 @@ with gr.Blocks(fill_width=True) as demo:
  emoji_likes,
  x="id",
  y="likes",
- title="Relationship between Emoji and Number of Likes",
+ title="Relationship between Space Emoji and Number of Likes",
  labels={"id": "Number of Spaces Created", "likes": "Number of Likes"},
  hover_data={"emoji": True},
  template="plotly_dark",
@@ -399,6 +404,7 @@ with gr.Blocks(fill_width=True) as demo:
  multiselect=True,
  )
 
+ devmode = gr.Checkbox(label="Show Dev Mode Spaces")
  clear = gr.ClearButton(components=[
  emoji,
  author,
@@ -426,6 +432,7 @@ with gr.Blocks(fill_width=True) as demo:
  "r_models",
  "r_datasets",
  "r_licenses",
+ 'devMode'
  ]
  ]
  )
@@ -440,6 +447,7 @@ with gr.Blocks(fill_width=True) as demo:
  models,
  datasets,
  space_license,
+ devmode,
  ],
  datatype="html",
  wrap=True,
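
A standalone sketch of the checkbox semantics that the `filtered_devmode` branch added to `filtered_df` appears to implement: unchecked applies no filter at all, and checked keeps only rows where `devMode` is true. The `filter_dev_mode` helper and the sample frame are hypothetical, and `pandas` is assumed to be installed.

```python
import pandas as pd


def filter_dev_mode(df: pd.DataFrame, only_dev_mode: bool) -> pd.DataFrame:
    """Keep only dev-mode rows when the box is checked, else return df as-is."""
    if not only_dev_mode:
        # Unchecked means "no filter", not "hide dev-mode Spaces".
        return df
    # `.eq(True)` treats missing values as False, so Spaces without a
    # devMode entry are dropped along with the ones where it is False.
    return df[df["devMode"].eq(True)]
```

Because unchecked returns everything, a label such as "Only show Dev Mode Spaces" might describe the behavior more accurately than "Show Dev Mode Spaces".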