Stanford AI

university

https://www.ai.stanford.edu

Recent Activity

yixinli authored a paper about 2 months ago

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

Kameshr authored a paper 3 months ago

Think Beyond Size: Adaptive Prompting for More Effective Reasoning

francisengelmann authored a paper 5 months ago

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Articles

SmolVLM2: Bringing Video Understanding to Every Device

Stanford's activity

nicholswang authored a paper 2 days ago

Video Action Differencing

Paper • 2503.07860 • Published 3 days ago • 28

Muennighoff authored a paper about 1 month ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 111

yixinli authored a paper about 2 months ago

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

Paper • 2411.19458 • Published Nov 29, 2024 • 6

nicholswang authored 2 papers about 2 months ago

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23 • 22

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published Jan 13 • 50

nicholswang authored 10 papers 3 months ago

Action Sensitivity Learning for Temporal Action Localization

Paper • 2305.15701 • Published May 25, 2023

Whitening-based Contrastive Learning of Sentence Embeddings

Paper • 2305.17746 • Published May 28, 2023

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

Paper • 2305.18010 • Published May 29, 2023

Describing Differences in Image Sets with Natural Language

Paper • 2312.02974 • Published Dec 5, 2023 • 16

Clustering based Point Cloud Representation Learning for 3D Analysis

Paper • 2307.14605 • Published Jul 27, 2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

Paper • 2307.16377 • Published Jul 31, 2023

Bird's-Eye-View Scene Graph for Vision-Language Navigation

Paper • 2308.04758 • Published Aug 9, 2023

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 35

Why are Visually-Grounded Language Models Bad at Image Classification?

Paper • 2405.18415 • Published May 28, 2024

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140

brando authored 3 papers 5 months ago

Are Emergent Abilities of Large Language Models a Mirage?

Paper • 2304.15004 • Published Apr 28, 2023 • 6

ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment

Paper • 2410.18194 • Published Oct 23, 2024 • 6

Pantograph: A Machine-to-Machine Interaction Interface for Advanced Theorem Proving, High Level Reasoning, and Data Extraction in Lean 4

Paper • 2410.16429 • Published Oct 21, 2024 • 5

Muennighoff authored 2 papers 6 months ago

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3, 2024 • 78