PaliGemma Inference and Fine-Tuning

Large Language Model
Multi-Modality
Fine-Tuning
Built a PaliGemma model from scratch in PyTorch, loaded the 3B (224x224) model, fine-tuned it with LoRA for specific tasks, and developed a Gradio app to showcase its capabilities.
Author

Yuyang Zhang

Keywords

PaliGemma, PyTorch, LoRA, Gradio


Figure 1 shows the full PaliGemma process.

Figure 1: PaliGemma is a multi-modal large language model that integrates vision and language tasks. The full process includes data collection, model training, and fine-tuning for specific applications.