PaliGemma Inference and Fine-Tuning

Large Language Model

Multi-Modality

Fine-Tuning

Built a PaliGemma model from scratch in PyTorch, loaded the 3B (224x224) model, fine-tuned it with LoRA for specific tasks, and developed a Gradio app to showcase its capabilities.

Author

Yuyang Zhang

Keywords

PaliGemma, PyTorch, LoRA, Gradio

Figure 1 shows the full PaliGemma process.

Figure 1: PaliGemma is a multi-modal large language model that integrates vision and language tasks. The full process includes data collection, model training, and fine-tuning for specific applications.
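
As a rough illustration of why LoRA makes fine-tuning a 3B model practical, the sketch below counts the trainable parameters of a single linear layer with and without a low-rank adapter. The helper name `lora_param_counts`, the layer size (2048 x 2048), and the rank (8) are illustrative assumptions, not values taken from this project.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one bias-free linear layer.

    LoRA freezes the original weight W (d_out x d_in) and trains two small
    matrices, A (rank x d_in) and B (d_out x rank). The adapted weight is
    W + B @ A, usually multiplied by a constant scaling factor.
    """
    full = d_in * d_out           # every entry of W is trainable
    lora = rank * (d_in + d_out)  # only A and B are trainable
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_in=2048, d_out=2048, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA adapter:     {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.2%}")
```

Because the adapter size grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, the trainable fraction stays small for wide layers, which is what keeps memory use manageable during fine-tuning.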