LlamaFactory: Unified Fine-Tuning for 100+ LLMs
Used by Amazon, NVIDIA, Aliyun, etc.

Supporter: Warp, the agentic terminal for developers. Available for macOS, Linux, & Windows.

Easily fine-tune 100+ large language models with zero-code CLI and Web UI.
👋 Join our WeChat, NPU, Lab4AI, or LLaMA Factory Online user groups.
[ English | 中文 ]
Fine-tuning a large language model can be as easy as...
https://github.com/user-attachments/assets/3991a3a8-4276-4d30-9cab-4cb0c4b9b99e
Start local training:
- Please refer to usage
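As a rough illustration, a local LoRA fine-tuning run can be launched from the CLI as sketched below (the config paths are assumptions taken from the examples directory; the usage docs are the authoritative reference):

```bash
# Hedged quick-start sketch; pick config files that actually exist in your checkout.
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml   # LoRA SFT run
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml     # chat with the trained adapter
llamafactory-cli webui                                            # zero-code Web UI
```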
Start cloud training:
- Colab (free): https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
- PAI-DSW (free trial): https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory
- LLaMA Factory Online: https://www.llamafactory.com.cn/?utm_source=LLaMA-Factory
- Alaya NeW (cloud GPU deal): https://docs.alayanew.com/docs/documents/useGuide/LLaMAFactory/mutiple/?utm_source=LLaMA-Factory
Read technical notes:
- Documentation (WIP): https://llamafactory.readthedocs.io/en/latest/
- Documentation (AMD GPU): https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/fine_tune/llama_factory_llama3.html
- Official Blog: https://blog.llamafactory.net/en/
- Official Course: https://www.lab4ai.cn/course/detail?id=7c13e60f6137474eb40f6fd3983c0f46&utm_source=LLaMA-Factory
[!NOTE] Except for the links above, all other websites are unauthorized third-party websites. Please use them with caution.
Features
- Various models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen3, Qwen3-VL, DeepSeek, Gemma, GLM, Phi, etc.
Day-N Support for Fine-Tuning Cutting-Edge Models
| Support Date | Model Name |
|---|---|
| Day 0 | Qwen3 / Qwen2.5-VL / Gemma 3 / GLM-4.1V / InternLM 3 / MiniCPM-o-2.6 |
| Day 1 | Llama 3 / GLM-4 / Mistral Small / PaliGemma2 / Llama 4 |
Blogs
[!TIP] Now we have a dedicated blog for LLaMA Factory!
Website: https://blog.llamafactory.net/en/
- 💡 KTransformers Fine-Tuning × LLaMA Factory: Fine-tuning 1000B models with 2 RTX 4090 GPUs + CPU (English)
- 💡 Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge (English)
- Fine-tune a mental health LLM using LLaMA-Factory (Chinese)
- Fine-tune GPT-OSS for Role-Playing using LLaMA-Factory (Chinese)
- A One-Stop Code-Free Model Reinforcement Learning and Deployment Platform based on LLaMA-Factory and EasyR1 (Chinese)
- How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod (English)
All Blogs
- Fine-tune Llama3.1-70B for Medical Diagnosis using LLaMA-Factory (Chinese)
- Fine-tune Qwen2.5-VL for Autonomous Driving using LLaMA-Factory (Chinese)
- LLaMA Factory: Fine-tuning the DeepSeek-R1-Distill-Qwen-7B Model for News Classifier (Chinese)
- A One-Stop Code-Free Model Fine-Tuning & Deployment Platform based on SageMaker and LLaMA-Factory (Chinese)
- LLaMA Factory Multi-Modal Fine-Tuning Practice: Fine-Tuning Qwen2-VL for Personal Tourist Guide (Chinese)
- LLaMA Factory: Fine-tuning Llama3 for Role-Playing (Chinese)
Changelog
[25/10/26] We supported the Megatron-core training backend via mcore_adapter. See PR #9237 to get started.
[25/08/22] We supported OFT and OFTv2. See examples for usage.
[25/08/20] We supported fine-tuning the Intern-S1-mini models. See PR #8976 to get started.
[25/08/06] We supported fine-tuning the GPT-OSS models. See PR #8826 to get started.
Full Changelog
[25/07/02] We supported fine-tuning the GLM-4.1V-9B-Thinking model.
[25/04/28] We supported fine-tuning the Qwen3 model family.
[25/04/21] We supported the Muon optimizer. See examples for usage. Thank @tianshijing's PR.
[25/04/16] We supported fine-tuning the InternVL3 model. See PR #7258 to get started.
[25/04/06] We supported fine-tuning the Llama 4 model. See PR #7611 to get started.
[25/03/31] We supported fine-tuning the Qwen2.5 Omni model. See PR #7537 to get started.
[25/03/15] We supported SGLang as inference backend. Try infer_backend: sglang to accelerate inference.
[25/02/24] Announcing EasyR1, an efficient, scalable and multi-modality RL training framework for efficient GRPO training.
[25/02/11] We supported saving the Ollama modelfile when exporting the model checkpoints. See examples for usage.
[25/02/05] We supported fine-tuning the Qwen2-Audio and MiniCPM-o-2.6 on audio understanding tasks.
[25/01/15] We supported APOLLO optimizer. See examples for usage.
[25/01/14] We supported fine-tuning the MiniCPM-o-2.6 and MiniCPM-V-2.6 models. Thank @BUAADreamer's PR.
[25/01/14] We supported fine-tuning the InternLM 3 models. Thank @hhaAndroid's PR.
[24/12/21] We supported using SwanLab for experiment tracking and visualization. See this section for details.
[24/10/09] We supported downloading pre-trained models and datasets from the Modelers Hub. See this tutorial for usage.
[24/09/19] We supported fine-tuning the Qwen2.5 models.
[24/08/30] We supported fine-tuning the Qwen2-VL models. Thank @simonJJJ's PR.
[24/08/27] We supported Liger Kernel. Try enable_liger_kernel: true for efficient training.
[24/08/09] We supported Adam-mini optimizer. See examples for usage. Thank @relic-yuexi's PR.
[24/07/04] We supported contamination-free packed training. Use neat_packing: true to activate it. Thank @chuan298's PR.
[24/06/16] We supported PiSSA algorithm. See examples for usage.
[24/06/07] We supported fine-tuning the Qwen2 and GLM-4 models.
[24/05/26] We supported SimPO algorithm for preference learning. See examples for usage.
[24/05/20] We supported fine-tuning the PaliGemma series models. Note that the PaliGemma models are pre-trained models; you need to fine-tune them with the `paligemma` template for chat completion.
[24/05/18] We supported KTO algorithm for preference learning. See examples for usage.
[24/05/14] We supported training and inference on the Ascend NPU devices. Check installation section for details.
[24/04/26] We supported fine-tuning the LLaVA-1.5 multimodal LLMs. See examples for usage.
[24/04/22] We provided a Colab notebook for fine-tuning the Llama-3 model on a free T4 GPU. Two Llama-3-derived models fine-tuned using LLaMA Factory are available at Hugging Face, check Llama3-8B-Chinese-Chat and Llama3-Chinese for details.
[24/04/21] We supported Mixture-of-Depths according to AstraMindAI's implementation. See examples for usage.
[24/04/16] We supported BAdam optimizer. See examples for usage.
[24/04/16] We supported unsloth's long-sequence training (Llama-2-7B-56k within 24GB). It achieves 117% speed and 50% memory compared with FlashAttention-2, more benchmarks can be found in this page.
[24/03/21] Our paper "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models" is available at arXiv!
[24/03/20] We supported FSDP+QLoRA that fine-tunes a 70B model on 2x24GB GPUs. See examples for usage.
[24/03/07] We supported GaLore optimizer. See examples for usage.
[24/03/07] We integrated vLLM for faster and concurrent inference. Try infer_backend: vllm to enjoy 270% inference speed.
[24/02/28] We supported weight-decomposed LoRA (DoRA). Try use_dora: true to activate DoRA training.
[24/02/15] We supported block expansion proposed by LLaMA Pro. See examples for usage.
[24/02/05] Qwen1.5 (Qwen2 beta version) series models are supported in LLaMA-Factory. Check this blog post for details.
[24/01/18] We supported agent tuning for most models, equipping models with tool-using abilities by fine-tuning with `dataset: glaive_toolcall_en`.
[23/12/23] We supported unsloth's implementation to boost LoRA tuning for the LLaMA, Mistral and Yi models. Try use_unsloth: true argument to activate unsloth patch. It achieves 170% speed in our benchmark, check this page for details.
[23/12/12] We supported fine-tuning the latest MoE model Mixtral 8x7B in our framework. See hardware requirement here.
[23/12/01] We supported downloading pre-trained models and datasets from the ModelScope Hub. See this tutorial for usage.
[23/10/21] We supported NEFTune trick for fine-tuning. Try neftune_noise_alpha: 5 argument to activate NEFTune.
[23/09/27] We supported $S^2$-Attn proposed by LongLoRA for the LLaMA models. Try shift_attn: true argument to enable shift short attention.
[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See examples for usage.
[23/09/10] We supported FlashAttention-2. Try flash_attn: fa2 argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.
[23/08/12] We supported RoPE scaling to extend the context length of the LLaMA models. Try rope_scaling: linear argument in training and rope_scaling: dynamic argument at inference to extrapolate the position embeddings.
[23/08/11] We supported DPO training for instruction-tuned models. See examples for usage.
[23/07/31] We supported dataset streaming. Try streaming: true and max_steps: 10000 arguments to load your dataset in streaming mode.
[23/07/18] We developed an all-in-one Web UI for training, evaluation and inference. Try train_web.py to fine-tune models in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in the development.
[23/07/09] We released FastEdit ⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.
[23/06/29] We provided a reproducible example of training a chat model using instruction-following datasets, see Baichuan-7B-sft for details.
[!TIP] If you cannot use the latest feature, please pull the latest code and install LLaMA-Factory again.
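Most of the switches above are single YAML arguments in a training config. A hedged, illustrative combination is sketched below; the model id, dataset name, and values are assumptions for demonstration, not tuned recommendations:

```yaml
# Illustrative sketch combining optional switches mentioned in the changelog.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # assumed model id
stage: sft
do_train: true
finetuning_type: lora
dataset: identity                # assumed dataset key from dataset_info.json
template: llama3
flash_attn: fa2                  # FlashAttention-2
enable_liger_kernel: true        # Liger Kernel
neat_packing: true               # contamination-free packed training
use_dora: true                   # weight-decomposed LoRA (DoRA)
neftune_noise_alpha: 5           # NEFTune
rope_scaling: linear             # RoPE scaling for longer contexts
streaming: true                  # dataset streaming
max_steps: 10000                 # pair with streaming, as noted above
output_dir: saves/llama3-8b/lora/sft
```

Not every pair of switches is guaranteed to compose; enable them one at a time and consult the examples directory for maintained configs.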
Supported Models
| Model | Model size | Template |
|---|---|---|
| BLOOM/BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | - |
| DeepSeek (LLM/Code/MoE) | 7B/16B/67B/236B | deepseek |
| DeepSeek 3-3.2 | 236B/671B | deepseek3 |
| DeepSeek R1 (Distill) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
| ERNIE-4.5 | 0.3B/21B/300B | ernie_nothink |
| Falcon/Falcon H1 | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
| Gemma/Gemma 2/CodeGemma | 2B/7B/9B/27B | gemma/gemma2 |
| Gemma 3/Gemma 3n | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
| GLM-4/GLM-4-0414/GLM-Z1 | 9B/32B | glm4/glmz1 |
| GLM-4.5/GLM-4.5(6)V | 9B/106B/355B | glm4_moe/glm4_5v |
| GPT-2 | 0.1B/0.4B/0.8B/1.5B | - |
| GPT-OSS | 20B/120B | gpt_oss |
| Granite 3-4 | 1B/2B/3B/7B/8B | granite3/granite4 |
| Hunyuan/Hunyuan1.5 (MT) | 0.5B/1.8B/4B/7B/13B | hunyuan/hunyuan_small |
| InternLM 2-3 | 7B/8B/20B | intern2 |
| InternVL 2.5-3.5 | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
| Intern-S1-mini | 8B | intern_s1 |
| Kimi-VL | 16B | kimi_vl |
| Ling 2.0 (mini/flash) | 16B/100B | bailing_v2 |
| LFM 2.5 (VL) | 1.2B/1.6B | lfm2/lfm2_vl |
| Llama | 7B/13B/33B/65B | - |
| Llama 2 | 7B/13B/70B | llama2 |
| Llama 3-3.3 | 1B/3B/8B/70B | llama3 |
| Llama 4 | 109B/402B | llama4 |
| Llama 3.2 Vision | 11B/90B | mllama |
| LLaVA-1.5 | 7B/13B | llava |
| LLaVA-NeXT | 7B/8B/13B/34B/72B/110B | llava_next |
| LLaVA-NeXT-Video | 7B/34B | llava_next_video |
| MiMo | 7B/309B | mimo/mimo_v2 |
| MiniCPM 4 | 0.5B/8B | cpm4 |
| MiniCPM-o/MiniCPM-V 4.5 | 8B/9B | minicpm_o/minicpm_v |
| MiniMax-M1/MiniMax-M2 | 229B/456B | minimax1/minimax2 |
| Ministral 3 | 3B/8B/14B | ministral3 |
| Mistral/Mixtral | 7B/8x7B/8x22B | mistral |
| PaliGemma/PaliGemma2 | 3B/10B/28B | paligemma |
| Phi-3/Phi-3.5 | 4B/14B | phi |
| Phi-3-small | 7B | phi_small |
| Phi-4-mini/Phi-4 | 3.8B/14B | phi4_mini/phi4 |
| Pixtral | 12B | pixtral |
| Qwen2 (Code/Math/MoE/QwQ) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
| Qwen3 (MoE/Instruct/Thinking/Next) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
| Qwen3.5 | 0.8B/2B/4B/9B/27B/35B/122B/397B | qwen3_5 |
| Qwen2-Audio | 7B | qwen2_audio |
| Qwen2.5-Omni | 3B/7B | qwen2_omni |
| Qwen3-Omni | 30B | qwen3_omni |
| Qwen2-VL/Qwen2.5-VL/QVQ | 2B/3B/7B/32B/72B | qwen2_vl |
| Qwen3-VL | 2B/4B/8B/30B/32B/235B | qwen3_vl |
| Seed (OSS/Coder) | 8B/36B | seed_oss/seed_coder |
| StarCoder 2 | 3B/7B/15B | - |
| TeleChat 2-2.5 | 3B/7B/35B/115B | telechat2 |
| Yuan 2 | 2B/51B/102B | yuan |
[!NOTE] For the "base" models, the
templateargument can be chosen fromdefault,alpaca,vicunaetc. But make sure to use the corresponding template for the "instruct/chat" models.If the model has both reasoning and non-reasoning versions, please use the
_nothinksuffix to distinguish between them. For example,qwen3andqwen3_nothink.Remember to use the SAME template in training and inference.
*: You should install the
transformersfrom main branch and useDISABLE_VERSION_CHECK=1to skip version check.**: You need to install a specific version of
transformersto use the corresponding model.
Please refer to constants.py for a full list of the models we support.
You can also add a custom chat template to template.py.
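As a concrete (hedged) fragment, selecting the matching template for a chat model and its non-reasoning counterpart might look like this; the model id is an assumption:

```yaml
model_name_or_path: Qwen/Qwen3-8B   # assumed model id
template: qwen3                     # use qwen3_nothink for the non-reasoning behavior
```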
Supported Training Approaches
| Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA | OFT | QOFT |
|---|---|---|---|---|---|---|
| Pre-Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Supervised Fine-Tuning | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Reward Modeling | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| PPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| DPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| KTO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| ORPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| SimPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
[!TIP] The implementation details of PPO can be found in this blog.
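As a sketch of how one row of the table maps onto a config, a minimal LoRA DPO run could look roughly like the following; the model id, dataset name, and hyperparameters are illustrative assumptions, and the examples directory holds the maintained configs:

```yaml
# Hedged DPO-with-LoRA sketch; values are illustrative, not tuned recommendations.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # assumed model id
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
dataset: dpo_en_demo             # assumed preference dataset key
template: llama3
pref_beta: 0.1
pref_loss: sigmoid               # orpo / simpo select the ORPO / SimPO objectives
output_dir: saves/llama3-8b/lora/dpo
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 1.0
```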
Provided Datasets
Pre-training datasets
- Wiki Demo (en)
- RefinedWeb (en)
- RedPajama V2 (en)
- Wikipedia (en)
- Wikipedia (zh)
- Pile (en)
- SkyPile (zh)
- FineWeb (en)
- FineWeb-Edu (en)
- CCI3-HQ (zh)
- CCI3-Data (zh)
- CCI4.0-M2-Base-v1 (en&zh)
- CCI4.0-M2-CoT-v1 (en&zh)
- CCI4.0-M2-Extra-v1 (en&zh)
- The Stack (en)
- StarCoder (en)
Supervised fine-tuning datasets
- Identity (en&zh)
- Stanford Alpaca (en)
- Stanford Alpaca (zh)
- Alpaca GPT4 (en&zh)
- Glaive Function Calling V2 (en&zh)
- LIMA (en)
- Guanaco Dataset (multilingual)
- BELLE 2M (zh)
- BELLE 1M (zh)
- BELLE 0.5M (zh)
- BELLE Dialogue 0.4M (zh)
- BELLE School Math 0.25M (zh)
- BELLE Multiturn Chat 0.8M (zh)
- UltraChat (en)
- OpenPlatypus (en)
- CodeAlpaca 20k (en)
- Alpaca CoT (multilingual)
- OpenOrca (en)
- SlimOrca (en)
- MathInstruct (en)
- Firefly 1.1M (zh)
- Wiki QA (en)
- Web QA (zh)
- WebNovel (zh)
- Nectar (en)
- deepctrl (en&zh)
- Advertise Generating (zh)
- ShareGPT Hyperfiltered (en)
- ShareGPT4 (en&zh)
- UltraChat 200k (en)
- Infinity Instruct (zh)
- AgentInstruct (en)
- LMSYS Chat 1M (en)
- Evol Instruct V2 (en)
- Cosmopedia (en)
- STEM (zh)
- Ruozhiba (zh)
- Neo-sft (zh)
- Magpie-Pro-300K-Filtered (en)
- Magpie-ultra-v0.1 (en)
- WebInstructSub (en)
- OpenO1-SFT (en&zh)
- Open-Thoughts (en)
- Open-R1-Math (en)
- Chinese-DeepSeek-R1-Distill (zh)
- LLaVA mixed (en&zh)
- Pokemon-gpt4o-captions (en&zh)
- DLR-Web (en)
- Open Assistant (de)
- Dolly 15k (de)
- Alpaca GPT4 (de)
- OpenSchnabeltier (de)
- Evol Instruct (de)
- Dolphin (de)
- Booksum (de)
- Airoboros (de)
- Ultrachat (de)
Preference datasets
- DPO mixed (en&zh)
- UltraFeedback (en)
- COIG-P (zh)
- RLHF-V (en)
- VLFeedback (en)
- RLAIF-V (en)
- Orca DPO Pairs (en)
- HH-RLHF (en)
- Nectar (en)
- Orca DPO (de)
- KTO mixed (en)
Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.
pip install "huggingface_hub<1.0.0"
huggingface-cli login
Requirement
| Mandatory | Minimum | Recommend |
|---|---|---|
| python | 3.9 | 3.10 |
| torch | 2.0.0 | 2.6.0 |
| torchvision | 0.15.0 | 0.21.0 |
| transformers | 4.49.0 | 4.50.0 |
| datasets | 2.16.0 | 3.2.0 |
| accelerate | 0.34.0 | 1.2.1 |
| peft | 0.14.0 | 0.15.1 |
| trl | 0.8.6 | 0.9.6 |
| Optional | Minimum | Recommend |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.16.4 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.3 | 0.8.2 |
| flash-attn | 2.5.6 | 2.7.2 |
Hardware Requirement
* estimated
| Method | Bits | 7B | 14B | 30B | 70B | `x`B |
|---|---|---|---|---|---|---|
| Full (`bf16` or `fp16`) | 32 | 120GB | 240GB | 600GB | 1200GB | `18x`GB |
| Full (`pure_bf16`) | 16 | 60GB | 120GB | 300GB | 600GB | `8x`GB |
| Freeze/LoRA/GaLore/APOLLO/BAdam/OFT | 16 | 16GB | 32GB | 64GB | 160GB | `2x`GB |
| QLoRA / QOFT | 8 | 10GB | 20GB | 40GB | 80GB | `x`GB |
| QLoRA / QOFT | 4 | 6GB | 12GB | 24GB | 48GB | `x/2`GB |
| QLoRA / QOFT | 2 | 4GB | 8GB | 16GB | 24GB | `x/4`GB |
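To land in the QLoRA / QOFT rows of the table, quantization is again a config switch. A hedged fragment (the bit width is the only essential line; everything else follows the SFT sketches above):

```yaml
# Illustrative QLoRA fragment: quantize the base model and train a LoRA adapter on top.
finetuning_type: lora
quantization_bit: 4              # 8 / 4 / 2 roughly correspond to the rows above
```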
Projects using LLaMA Factory
If you have a project that should be incorporated, please contact via email or create a pull request.
Click to show
- Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [arxiv]
- Yu et al. Open, Closed, or Small Language Models for Text Classification? 2023. [arxiv]
- Wang et al. UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language. 2023. [arxiv]
- Luceri et al. Leveraging Large Language Models to Detect Influence Campaigns in Social Media. 2023. [arxiv]
- Zhang et al. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. 2023. [arxiv]
- Wang et al. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs. KDD 2024. [arxiv]
- Wang et al. CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning. ACL 2024. [arxiv]
- Choi et al. FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. 2024. [arxiv]
- Zhang et al. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. 2024. [arxiv]
- Lyu et al. KnowTuning: Knowledge-aware Fine-tuning for Large Language Models. 2024. [arxiv]
- Yang et al. LaCo: Large Language Model Pruning via Layer Collapse. 2024. [arxiv]
- Bhardwaj et al. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. 2024. [arxiv]
- Yang et al. Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models. 2024. [arxiv]
- Yi et al. Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. ACL 2024 Findings. [arxiv]
- Cao et al. Head-wise Shareable Attention for Large Language Models. 2024. [arxiv]
- Zhang et al. Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages. 2024. [arxiv]
- Kim et al. Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [arxiv]
- Yu et al. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. ACL 2024. [arxiv]
- Huang et al. Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning. 2024. [arxiv]
- Duan et al. Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization. 2024. [arxiv]
- Xie and Schwertfeger. Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs. 2024. [arxiv]
- Wu et al. Large Language Models are Parallel Multilingual Learners. 2024. [arxiv]
- Zhang et al. EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling. 2024. [arxiv]
- Weller et al. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. 2024. [arxiv]
- Hongbin Na. CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering. COLING 2024. [arxiv]
- Zan et al. CodeS: Natural Language to Code Repository via Multi-Layer Sketch. 2024. [arxiv]
- Liu et al. Extensive Self-Contrast Enables Feedback-Free Language Model Alignment. 2024. [arxiv]
- Luo et al. BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models. 2024. [arxiv]
- Du et al. Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. 2024. [arxiv]
- Ma et al. Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation. ICML 2024. [arxiv]
- Liu et al. Dynamic Generation of Personalities with Large Language Models. 2024. [arxiv]
- Shang et al. How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models. 2024. [arxiv]
- Huang et al. LLMTune: Accelerate Database Knob Tuning with Large Language Models. 2024. [arxiv]
- Deng et al. Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction. 2024. [arxiv]
- Acikgoz et al. Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare. 2024. [arxiv]
- Zhang et al. Small Language Models Need Strong Verifiers to Self-Correct Reasoning. ACL 2024 Findings. [arxiv]
- Zhou et al. FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering. NAACL 2024. [arxiv]
- Xu et al. Large Language Models for Cyber Security: A Systematic Literature Review. 2024. [arxiv]
- Dammu et al. "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations. 2024. [arxiv]
- Yi et al. A safety realignment framework via subspace-oriented model fusion for large language models. 2024. [arxiv]
- Lou et al. SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling. 2024. [arxiv]
- Zhang et al. Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners. 2024. [arxiv]
- Zhang et al. TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. 2024. [arxiv]
- Zihong Chen. Sentence Segmentation and Sentence Punctuation Based on XunziALLM. 2024. [paper]
- Gao et al. The Best of Both Worlds: Toward an Honest and Helpful Large Language Model. 2024. [arxiv]
- Wang and Song. MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset. 2024. [arxiv]
- Hu et al. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models. 2024. [arxiv]
- Ge et al. Time Sensitive Knowledge Editing through Efficient Finetuning. ACL 2024. [arxiv]
- Tan et al. Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions. 2024. [arxiv]
- Song et al. Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters. 2024. [arxiv]
- Gu et al. RWKV-CLIP: A Robust Vision-Language Representation Learner. 2024. [arxiv]
- Chen et al. Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees. 2024. [arxiv]
- Zhu et al. Are Large Language Models Good Statisticians?. 2024. [arxiv]
- Li et al. Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning. 2024. [arxiv]
- Ding et al. IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce. 2024. [arxiv]
- He et al. COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities. 2024. [arxiv]
- Lin et al. FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving. 2024. [arxiv]
- Treutlein et al. Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data. 2024. [arxiv]
- Feng et al. SS-Bench: A Benchmark for Social Story Generation and Evaluation. 2024. [arxiv]
- Feng et al. Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement. 2024. [arxiv]
- Liu et al. Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals. 2024. [arxiv]
- Iyer et al. Exploring Very Low-Resource Translation with LLMs: The University of Edinburgh's Submission to AmericasNLP 2024 Translation Task. AmericasNLP 2024. [paper]
- Li et al. Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring. 2024. [arxiv]
- Yang et al. Financial Knowledge Large Language Model. 2024. [arxiv]
- Lin et al. DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging. 2024. [arxiv]
- Bako et al. Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization. 2024. [arxiv]
- Huang et al. RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization. 2024. [arxiv]
- Jiang et al. LLM-Collaboration on Automatic Science Journalism for the General Audience. 2024. [arxiv]
- Inouye et al. Applied Auto-tuning on LoRA Hyperparameters. 2024. [paper]
- Qi et al. Research on Tibetan Tourism Viewpoints information generation system based on LLM. 2024. [arxiv]
- Xu et al. Course-Correction: Safety Alignment Using Synthetic Preferences. 2024. [arxiv]
- Sun et al. LAMBDA: A Large Model Based Data Agent. 2024. [arxiv]
- Zhu et al. CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare. 2024. [arxiv]
- Yu et al. Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment. 2024. [arxiv]
- Xie et al. The Power of Personalized Datasets: Advancing Chinese Composition Writing for Elementary School through Targeted Model Fine-Tuning. IALP 2024. [paper]
- Liu et al. Instruct-Code-Llama: Improving Capabilities of Language Model in Competition Level Code Generation by Online Judge Feedback. ICIC 2024. [paper]
- Wang et al. Cybernetic Sentinels: Unveiling the Impact of Safety Data Selection on Model Security in Supervised Fine-Tuning. ICIC 2024. [paper]
- Xia et al. Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. 2024. [arxiv]
- Zeng et al. Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions. 2024. [arxiv]
- Xia et al. Using Pre-trained Language Model for Accurate ESG Prediction. FinNLP 2024. [paper]
- Liang et al. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. 2024. [arxiv]
- Bai et al. Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation. CIKM 2024. [paper]
- Zhang et al. CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling. ACL 2024. [paper]
- StarWhisper: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
- DISC-LawLLM: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge.
- Sunsimiao: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
- CareGPT: A series of large language models for Chinese medical domain, based on LLaMA2-7B and Baichuan-13B.
- MachineMindset: A series of MBTI Personality large language models, capable of giving any LLM 16 different personality types based on different datasets and training methods.
- Luminia-13B-v3: A large language model specialized in generating metadata for Stable Diffusion. [demo]
- Chinese-LLaVA-Med: A multimodal large language model specialized in Chinese medical domain, based on LLaVA-1.5-7B.
- AutoRE: A document-level relation extraction system based on large language models.
- NVIDIA RTX AI Toolkit: SDKs for fine-tuning LLMs on Windows PC for NVIDIA RTX.
- LazyLLM: An easy and lazy way for building multi-agent LLMs applications and supports model fine-tuning via LLaMA Factory.
- RAG-Retrieval: A full pipeline for RAG retrieval model fine-tuning, inference, and distillation. [blog]
- 360-LLaMA-Factory: A modified library that supports long sequence SFT & DPO using ring attention.
- Sky-T1: An o1-like model fine-tuned by NovaSky AI with very small cost.
- WeClone: One-stop solution for creating your digital avatar from chat logs.
- EmoLLM: A project about large language models (LLMs) and mental health.
Acknowledgement
This repo benefits from PEFT, TRL, QLoRA and FastChat. Thanks for their wonderful work.
Star History