import streamlit as st


def about():
    _, centercol, _ = st.columns([1, 3, 1])
    with centercol:
        st.markdown(
            """
            ## Testing Semantic Importance via Betting

            We briefly present here the main ideas and contributions.
            """
        )
st.markdown("""### 1. Setup""") | |
st.image( | |
"./assets/about/setup.jpg", | |
caption="Figure 1: Pictorial representation of the setup.", | |
use_column_width=True, | |
) | |
        st.markdown(
            """
            We consider classification problems with:
            * **Input image** $X \in \mathcal{X}$.
            * **Feature encoder** $f:~\mathcal{X} \\to \mathbb{R}^d$ that maps input
            images to dense embeddings $H = f(X) \in \mathbb{R}^d$.
            * **Classifier** $g:~\mathbb{R}^d \\to [0,1]^k$ that separates embeddings
            into one of $k$ classes. We do not assume $g$ has a particular form; it
            can be any fixed, potentially nonlinear function.
            * **Concept bank** $c = [c_1, \dots, c_m] \in \mathbb{R}^{d \\times m}$ such
            that $c_j \in \mathbb{R}^d$ is the representation of the $j^{\\text{th}}$ concept.
            We assume that $c$ is user-defined and that $m$ is small ($m \\approx 20$).
            * **Semantics** $Z = [Z_1, \dots, Z_m] = c^{\\top} H$, where $Z_j \in [-1, 1]$ represents the
            amount of concept $j$ present in the dense embedding of input image $X$.

            For example:
            * $f$ is the image encoder of a vision-language model (e.g., CLIP$^1$, OpenCLIP$^2$).
            * $g$ is the zero-shot classifier obtained by encoding *"A photo of a <CLASS_NAME>"* with the
            text encoder of the same vision-language model.
            * $c$ is obtained similarly by encoding the user-defined concepts.
            """
        )
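        st.markdown(
            """
            As a minimal sketch of this setup (assuming CLIP's public API, a hypothetical
            input image, and a hypothetical concept list), computing the semantics $Z$ is a
            single matrix product between the concept bank and the image embedding:
            ```python
            import clip
            import torch
            from PIL import Image

            model, preprocess = clip.load("ViT-B/32")
            image = Image.open("example.jpg")  # hypothetical input image

            concepts = ["stripes", "whiskers", "wings", "metallic"]  # hypothetical concept bank
            with torch.no_grad():
                c = model.encode_text(clip.tokenize(concepts))          # (m, d) concept bank
                h = model.encode_image(preprocess(image).unsqueeze(0))  # (1, d) dense embedding
            c = c / c.norm(dim=-1, keepdim=True)  # normalize so that Z_j lies in [-1, 1]
            h = h / h.norm(dim=-1, keepdim=True)
            z = h @ c.T  # semantics Z = c^T H, shape (1, m)
            ```
            """
        )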
        st.markdown(
            """
            ### 2. Defining Semantic Importance

            Our goal is to test the statistical importance of the concepts in $c$ for the
            predictions of the given classifier on a particular image $x$ (capital letters denote random
            variables, and lowercase letters their realizations).
            We do not train a surrogate, interpretable model and instead consider the original, potentially
            nonlinear classifier $g$. This is because we want to study the semantic importance of
            the model that would be deployed in real-world settings, not a surrogate that
            might decrease performance.
            We define importance from the perspective of conditional independence testing because
            it allows for rigorous statistical testing with false positive rate control
            (i.e., Type I error control). That is, the probability of falsely deeming a concept
            important is below a user-defined level $\\alpha \in (0,1)$.

            For an image $x$, a concept $j$, and a subset $S \subseteq [m] \setminus \{j\}$ (i.e., any
            subset that does not contain $j$), we define the null hypothesis
            $$
            H_0:~\hat{Y}_{S \cup \{j\}} \overset{d}{=} \hat{Y}_S,
            $$
            where $\overset{d}{=}$ denotes equality in distribution and, $\\forall C \subseteq [m]$,
            $\hat{Y}_C = g(\widetilde{H}_C)$ with $\widetilde{H}_C \sim P_{H \mid Z_C = z_C}$, the conditional distribution of the dense
            embeddings given the observed concepts in $z_C$, i.e., the semantics of $x$.
            Rejecting $H_0$ then means that concept $j$ affects the distribution of the response of
            the model, and hence that it is important.
            """
        )
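        st.markdown(
            """
            Concretely, testing $H_0$ amounts to a two-sample problem between the responses of
            the model under the two conditional distributions. A minimal sketch, where `g`, `z`,
            `S`, `j`, and the `sample_conditional` routine (see the next section) are all
            hypothetical stand-ins:
            ```python
            def response_samples(g, sample_conditional, z, subset, n=256):
                # Draw H ~ P_{H | Z_C = z_C} and push the samples through the classifier.
                h_tilde = sample_conditional(z, subset, n)  # (n, d) conditional embeddings
                return g(h_tilde)                           # (n, k) class probabilities

            y_null = response_samples(g, sample_conditional, z, S)        # samples of Y_S
            y_test = response_samples(g, sample_conditional, z, S + [j])  # samples of Y_{S ∪ {j}}
            # H0 holds if and only if y_test and y_null are equal in distribution.
            ```
            """
        )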
        st.markdown(
            """
            ### 3. Sampling Conditional Embeddings
            """
        )
        st.image(
            "./assets/about/local_dist.jpg",
            caption=(
                "Figure 2: Example test (i.e., with concept) and null (i.e., without"
                " concept) distributions for a class-specific concept and a"
                " non-class-specific one on three images from the Imagenette dataset"
                " as a function of the size of S."
            ),
            use_column_width=True,
        )
        st.markdown(
            """
            To test the $H_0$ defined above, we need to sample from the conditional distribution
            of the dense embeddings given certain concepts. Since $Z = c^{\\top} H$, this can be seen
            as solving a linear inverse problem stochastically. In this work, given that $m$ is small,
            we use nonparametric kernel density estimation (KDE) methods to approximate the target
            distribution. Intuitively, given a dataset $\{(h^{(i)}, z^{(i)})\}_{i=1}^n$ of dense
            embeddings with their semantics, we:
            1. Use a weighted KDE to sample $\widetilde{Z} \sim P_{Z \mid Z_C = z_C}$, and then
            2. Retrieve the embedding $H^{(i')}$ whose concept representation $Z^{(i')}$ is the
            nearest neighbor of $\widetilde{Z}$ in the dataset.

            Details on the weighted KDE and the sampling procedure are included in the paper. Figure 2
            shows example test (i.e., $\hat{Y}_{S \cup \{j\}}$) and
            null (i.e., $\hat{Y}_{S}$) distributions for a class-specific concept and a
            non-class-specific one on three images from the Imagenette$^3$ dataset. The test
            distributions of class-specific concepts are skewed to the right, i.e., including the
            observed class-specific concept increases the output of the predictor. Furthermore, the
            shift decreases as more concepts are included in $S$: if $S$ is larger and contains more
            information, the marginal contribution of adding one concept is smaller.
            On the other hand, including a non-class-specific concept does not change the distribution
            of the response of the model, no matter the size of $S$.
            """
        )
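        st.markdown(
            """
            A simplified sketch of this two-step sampler, assuming Gaussian kernels and
            hypothetical arrays `Zs` (semantics) and `Hs` (embeddings) of a reference
            dataset (the weighted KDE in the paper is more careful):
            ```python
            import numpy as np

            def sample_conditional_embedding(Zs, Hs, z_obs, C, bw=0.1, rng=None):
                # Zs: (n, m) semantics, Hs: (n, d) embeddings, z_obs: (m,) observed
                # semantics of the test image, C: indices of the conditioned concepts.
                rng = rng or np.random.default_rng()
                # 1. Weight each datapoint by how close its conditioned concepts are to z_C.
                sq_dist = ((Zs[:, C] - z_obs[C]) ** 2).sum(axis=1)
                w = np.exp(-sq_dist / (2 * bw**2))
                w /= w.sum()
                # 2. Draw from the weighted Gaussian KDE: pick a component, then perturb it.
                i = rng.choice(len(Zs), p=w)
                z_tilde = Zs[i] + bw * rng.standard_normal(Zs.shape[1])
                # 3. Retrieve the embedding whose semantics are nearest to the draw.
                nn = ((Zs - z_tilde) ** 2).sum(axis=1).argmin()
                return Hs[nn]
            ```
            """
        )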
        st.markdown(
            """
            ### 4. Testing by Betting

            Instead of classical hypothesis testing techniques based on $p$-values, we propose to
            test for the importance of concepts by *betting*.$^4$ This choice is motivated by two important
            properties of sequential tests:
            1. They are **adaptive** to the hardness of the problem. That is, the easier it is to reject
            a null hypothesis, the earlier the test will stop. This induces a natural ranking of importance
            across concepts: if concept $j$ rejects faster than $j'$, then $j$ is more important than $j'$.
            2. They are **efficient** because they only use as much data as needed to reject, rather than
            all of the available data, as traditional offline tests do.

            Sequential tests instantiate a game between a *bettor* and *nature*. At every turn of the game,
            the bettor places a wager against the null hypothesis, and nature reveals the truth. If
            the bettor wins, they accumulate wealth; otherwise, they lose some. More formally, the
            *wealth process* $\{K_t\}_{t \in \mathbb{N}_0}$ is defined as
            $$
            K_0 = 1, \\quad K_{t+1} = K_t \cdot (1 + v_t\kappa_t),
            $$
            where $v_t \in [-1,1]$ is a betting fraction and $\kappa_t \in [-1,1]$ is the payoff of the bet.
            Under certain conditions, the wealth process describes a *fair game*, and for $\\alpha \in (0,1)$
            it holds that
            $$
            \mathbb{P}_{H_0}[\exists t:~K_t \geq 1/\\alpha] \leq \\alpha.
            $$
            That is, the wealth process can be used to reject the null hypothesis $H_0$ with
            Type I error control at level $\\alpha$.

            Briefly, we use ideas from sequential kernelized independence testing (SKIT)$^5$ and define
            the payoff as
            $$
            \kappa_t \coloneqq \\tanh\left(\\rho_t(\hat{Y}_{S \cup \{j\}}) - \\rho_t(\hat{Y}_S)\\right),
            $$
            where $\\rho_t$ is the witness function of the plug-in estimator of the maximum mean
            discrepancy (MMD)$^6$ between the test and null distributions at time $t$. Furthermore, we
            use the online Newton step (ONS)$^7$ method to choose the betting fraction $v_t$ and ensure
            exponential growth of the wealth.
            """
        )
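        st.markdown(
            """
            For intuition, a minimal sketch of the betting loop, with toy stand-ins for the
            witness function and for the sampling of fresh test/null responses (the actual
            test follows SKIT$^5$):
            ```python
            import numpy as np

            rng = np.random.default_rng(0)
            witness = lambda y: y  # stand-in for the MMD witness function rho_t
            draw_pair = lambda: (rng.normal(0.6, 0.1), rng.normal(0.4, 0.1))  # toy samples

            alpha, wealth, v, a = 0.05, 1.0, 0.0, 1.0
            for t in range(10_000):
                y_test, y_null = draw_pair()  # fresh samples of Y_{S ∪ {j}} and Y_S
                kappa = np.tanh(witness(y_test) - witness(y_null))  # payoff in [-1, 1]
                wealth *= 1 + v * kappa  # wealth update K_{t+1} = K_t * (1 + v_t * kappa_t)
                if wealth >= 1 / alpha:  # reject H0 with Type I error control at level alpha
                    break
                # Online Newton step (ONS) update of the betting fraction.
                grad = -kappa / (1 + v * kappa)  # gradient of the loss -log(1 + v * kappa)
                a += grad**2
                v = np.clip(v - 2 / (2 - np.log(3)) * grad / a, -0.5, 0.5)
            ```
            """
        )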
        st.markdown(
            """
            ---
            **References**

            [1] CLIP is available at https://github.com/openai/CLIP.

            [2] OpenCLIP is available at https://github.com/mlfoundations/open_clip.

            [3] The Imagenette dataset is available at https://github.com/fastai/imagenette.

            [4] Glenn Shafer. Testing by betting: A strategy for statistical and scientific communication.
            Journal of the Royal Statistical Society Series A: Statistics in Society, 184(2):407-431, 2021.

            [5] Aleksandr Podkopaev et al. Sequential kernelized independence testing. In International
            Conference on Machine Learning, pages 27957-27993. PMLR, 2023.

            [6] Arthur Gretton et al. A kernel two-sample test. The Journal of Machine Learning Research,
            13(1):723-773, 2012.

            [7] Ashok Cutkosky and Francesco Orabona. Black-box reductions for parameter-free online
            learning in Banach spaces. In Conference On Learning Theory, pages 1493-1529. PMLR, 2018.
            """
        )