Last Week in Medical AI: Top Research Papers/Models 🔥 🏅 (December 7 – December 14, 2024)
Medical LLM & Other Models
- PediaBench: Chinese Pediatric LLM
  - Comprehensive pediatric dataset
  - Advanced benchmarking platform
  - Chinese healthcare innovation
- BiMediX: Bilingual Medical LLM
  - Multilingual medical expertise
  - Diverse medical knowledge integration
  - Cross-cultural healthcare insights
- MMedPO: Vision-Language Medical LLM
  - Clinical multimodal optimization
  - Advanced medical image understanding
  - Precision healthcare modeling
Frameworks and Methodologies
- TOP-Training: Medical Q&A Framework
- Hybrid RAG: Secure Medical Data Management
- Zero-Shot ATC Clinical Coding
- Chest X-Ray Diagnosis Architecture
- Medical Imaging AI Democratization
Benchmarks & Evaluations
- KorMedMCQA: Korean Healthcare Licensing Benchmark
- Large Language Model Medical Tasks
- Clinical T5 Model Performance Study
- Radiology Report Quality Assessment
- Genomic Analysis Benchmarking
Medical LLM Applications
- BRAD: Digital Biology Language Model
- TCM-FTP: Herbal Prescription Prediction
- LLaSA: Activity Analysis via Sensors
- Emergency Department Visit Predictions
- Neurodegenerative Disease AI Diagnosis
- Kidney Disease Explainable AI Model
Ethical AI & Privacy
- Privacy-Preserving LLM Mechanisms
- AI-Driven Digital Organism Modeling
- Biomedical Research Automation
- Multimodality in Medical Practice
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think".
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.
🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension of verifier-guided tree search that we developed. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets (a minimal sketch of verifier-guided search follows after this list).
🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
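For a feel of what "step-wise reward models + tree search" means in practice, here is a minimal, hedged sketch: a beam search over reasoning steps where a process reward model (PRM) scores each partial solution. The names `Beam`, `generate_candidates`, and `score_step` are illustrative placeholders, not the Search and Learn API; in the real recipe the candidate steps come from a policy LLM served with vLLM and the scores from a trained PRM.

```python
# Minimal sketch of PRM-guided beam search at test time (illustrative, not the actual recipe).
# `generate_candidates` and `score_step` are stand-ins for a policy LLM and a process reward model.
import random
from dataclasses import dataclass, field


@dataclass
class Beam:
    steps: list[str] = field(default_factory=list)  # partial solution, one reasoning step per entry
    score: float = 0.0                               # aggregated PRM score for the steps so far


def generate_candidates(problem: str, partial: list[str], n: int) -> list[str]:
    """Placeholder: sample `n` next reasoning steps from the policy LLM."""
    return [f"step {len(partial) + 1} (candidate {i})" for i in range(n)]


def score_step(problem: str, steps: list[str]) -> float:
    """Placeholder: a process reward model scoring the latest step in context."""
    return random.random()


def prm_guided_beam_search(problem: str, beam_width: int = 4,
                           samples_per_beam: int = 4, max_steps: int = 6) -> Beam:
    """Keep the `beam_width` partial solutions with the best step-wise reward at each depth."""
    beams = [Beam()]
    for _ in range(max_steps):
        expansions = []
        for beam in beams:
            for step in generate_candidates(problem, beam.steps, samples_per_beam):
                new_steps = beam.steps + [step]
                reward = score_step(problem, new_steps)
                # Sum rewards here for simplicity; real setups may use last-step or min aggregation.
                expansions.append(Beam(new_steps, beam.score + reward))
        beams = sorted(expansions, key=lambda b: b.score, reverse=True)[:beam_width]
    return beams[0]


if __name__ == "__main__":
    best = prm_guided_beam_search("Solve: 12 * 7 + 5 = ?")
    print(best.score, best.steps)
```

Roughly speaking, DVTS builds on this kind of verifier-guided search by expanding several independent subtrees instead of one shared beam, which keeps candidate solutions more diverse at large test-time compute budgets; see the blog post for the exact formulation.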
🇪🇺 Policy Thoughts on the EU AI Act Implementation 🇪🇺
There is a lot to like in the first draft of the EU GPAI Code of Practice, especially regarding transparency requirements. The Systemic Risks section, on the other hand, is concerning for both smaller developers and external stakeholders.
I wrote more on this topic ahead of the next draft. TL;DR: more attention to immediate large-scale risks and to collaborative, evidence-backed solutions can help everyone, as long as developers disclose sufficient information about their design choices and deployment contexts.