Abstract: Despite encouraging progress in 3D scene understanding, it remains challenging to develop an effective Large Multimodal Model (LMM) that is capable of understanding and reasoning in complex ...