My general research interests lie in two directions: (1) understanding and harnessing the synergy between generative and understanding modeling objectives, and (2) aligning images and text across modalities, especially when text (and other arbitrary, non-natural structures such as graphs and flowcharts) appears in the visual representation.