LLM Part 3: Alignment
In Parts 1 and 2 of the LLM series, we covered the architecture and inference techniques for LLMs. In this third part, we focus on alignment techniques, which are crucial for ensuring that LLMs behave in ways consistent with human values and intentions. We will start with the simple Supervised Fine-Tuning (SFT) approach, which fine-tunes LLMs on curated datasets that reflect human values and preferences. Then we will turn to reinforcement learning from human feedback (RLHF), which trains LLMs using feedback from human evaluators to better align them with human intentions, and walk through algorithms ranging from PPO and DPO to GRPO, along with their implications for how these models are developed and deployed.
1 Supervised Fine-Tuning (SFT)
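To make the idea concrete, here is a minimal sketch of a single SFT update, assuming the Hugging Face `transformers` API, a small causal LM checkpoint (`gpt2` as a stand-in), and one made-up prompt/response pair. A real SFT pipeline would iterate over a curated instruction dataset with batching, padding, and prompt masking; this only shows the core loss computation.

```python
# A minimal SFT sketch (assumption: "gpt2" as a stand-in checkpoint and a
# single toy prompt/response pair; real SFT uses a curated dataset,
# a DataLoader, and usually masks prompt tokens with label -100).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One curated (prompt, response) example; the SFT objective is ordinary
# next-token cross-entropy over the concatenated sequence.
prompt = "Explain why the sky is blue.\n"
response = "Sunlight scatters off air molecules, and blue light scatters most."
batch = tokenizer(prompt + response, return_tensors="pt")

model.train()
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])  # labels are shifted internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"SFT loss: {outputs.loss.item():.4f}")
```

In practice the prompt tokens are often excluded from the loss so the model is only trained to produce the response, but the update itself is just supervised next-token prediction.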
2 Review of Reinforcement Learning
In this section, we review the basics of reinforcement learning, including key concepts such as rewards, policies, loss functions, and actor-critic methods. This provides a solid foundation for the RLHF techniques we explore later.
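To see how these pieces fit together, below is a minimal policy-gradient (REINFORCE) sketch on a made-up two-armed bandit, assuming a tiny softmax policy and a toy noisy reward function. It shows a reward signal, a parameterized policy, and the policy-gradient loss in isolation; actor-critic methods would additionally learn a value baseline to reduce variance.

```python
# A minimal REINFORCE sketch on a toy 2-armed bandit (assumption: the
# reward values and policy parameterization here are invented for
# illustration, not part of any RLHF recipe).
import torch

torch.manual_seed(0)
logits = torch.zeros(2, requires_grad=True)        # policy parameters
optimizer = torch.optim.Adam([logits], lr=0.1)
true_rewards = torch.tensor([1.0, 2.0])            # arm 1 pays more on average

for step in range(200):
    probs = torch.softmax(logits, dim=-1)          # policy pi(a)
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()                         # sample an action
    reward = true_rewards[action] + 0.1 * torch.randn(())  # noisy reward
    # Policy-gradient loss: -log pi(a) * reward (no baseline/critic here)
    loss = -dist.log_prob(action) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=-1))  # probability mass shifts toward arm 1
```

The same ingredients reappear in RLHF: the LLM is the policy, generated tokens are the actions, and a reward model (plus a KL penalty) supplies the reward signal.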