LLM Part 2: Inference

In this blog post, we will go through the inference process of LLMs, including how to use them effectively for various tasks. We will explore techniques such as prompt engineering, few-shot learning, and more. By the end, you will have a solid understanding of how to leverage LLMs in your own applications.
Author: Yuyang Zhang

Published: 2025-09-24

Last modified: 2025-09-21

In Part 1 of the LLM series, we covered the architecture of LLMs, including key components such as position encoding and attention mechanisms. We also explored various normalization techniques and the training process for these models. In Part 2, we will explore different inference techniques, which are necessary for effectively utilizing LLMs in real-world applications, along with several practical examples and use cases that illustrate these techniques in action.

Resources:

- Inference optimization overview: https://lilianweng.github.io/posts/2023-01-10-inference-optimization/

Different architectures:

- Mixture of Recursion: https://arxiv.org/abs/2507.10524
- Diffusion Text:
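As a preview of the few-shot learning technique mentioned above, here is a minimal sketch of few-shot prompting: a handful of labeled examples are prepended to the query so the model can infer the task format from context alone, without any fine-tuning. The task (sentiment classification), template, and example texts below are illustrative assumptions, not taken from this series.

```python
def build_few_shot_prompt(examples, query):
    """Format (text, label) demonstration pairs followed by the new query.

    The model is expected to continue the pattern by filling in the
    label after the final "Sentiment:".
    """
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    # Leave the label slot for the query empty so the model completes it.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("The movie was fantastic!", "positive"),
    ("I wasted two hours of my life.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A tense, well-acted thriller.")
print(prompt)
```

The resulting string would be sent as-is to a completion-style LLM endpoint; the number and ordering of demonstrations are themselves tunable knobs of prompt engineering.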

