Display images in chat responses
4M: Massively Multimodal Masked Modeling
Extract real estate details from text