Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Carbune, Victor; Mansoor, Hassan; Liu, Fangyu; Aralikatte, Rahul; Baechler, Gilles; Chen, Jindong; Sharma, Abhanshu

arXiv:2403.12596 (cs)

[Submitted on 19 Mar 2024]

Abstract: Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, reasoning capabilities remain limited, particularly for smaller VLMs, while those of large language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA, our method obtains state-of-the-art performance when applied to the PaLI3-5B VLM (Chen et al., 2023), while also enabling much better performance on PlotQA and FigureQA.
We first improve the chart representation by continuing the pre-training stage using an improved version of the chart-to-table translation task of Liu et al. (2023). We then propose constructing a dataset 20x larger than the original training set. To improve general reasoning capabilities and numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, our model is fine-tuned using the multitask loss introduced by Hsieh et al. (2023).
Our variant ChartPaLI-5B outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant compared to the PaLI3-5B baseline. When rationales are further refined with a simple program-of-thought prompt (Chen et al., 2023), our model outperforms the recently introduced Gemini Ultra and GPT-4V.
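
As a rough illustration of the multitask fine-tuning objective mentioned in the abstract, the sketch below combines an answer-prediction loss with a weighted rationale-generation loss, in the form L = L_answer + lambda * L_rationale. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `rationale_weight` parameter, and the example probabilities are all hypothetical, and the abstract does not state the weighting actually used.

```python
import math


def token_cross_entropy(token_probs):
    """Mean negative log-likelihood over the gold tokens.

    token_probs: probabilities the model assigned to each gold token
    (hypothetical inputs; a real model would produce these from logits).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(answer_probs, rationale_probs, rationale_weight=1.0):
    """Answer loss plus weighted rationale loss (assumed form)."""
    answer_loss = token_cross_entropy(answer_probs)
    rationale_loss = token_cross_entropy(rationale_probs)
    return answer_loss + rationale_weight * rationale_loss


# Made-up probabilities for a two-token answer and a three-token rationale.
loss = multitask_loss([0.9, 0.8], [0.5, 0.6, 0.7], rationale_weight=0.5)
print(round(loss, 4))
```

Setting `rationale_weight` to 0 in this sketch reduces it to plain answer-only fine-tuning, which makes the contribution of the rationale task easy to isolate.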

Comments:	Findings of NAACL 2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.12596 [cs.CL]
	(or arXiv:2403.12596v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.12596

Submission history

From: Rahul Aralikatte
[v1] Tue, 19 Mar 2024 10:03:07 UTC (1,419 KB)
