LlamaFactory: Unified Fine-Tuning for 100+ LLMs
Used by Amazon, NVIDIA, Aliyun, etc.

Supporter: Warp, the agentic terminal for developers. Available for macOS, Linux, & Windows.

Easily fine-tune 100+ large language models with zero-code CLI and Web UI.
👋 Join our WeChat, NPU, Lab4AI, or LLaMA Factory Online user groups.
[ English | 中文 ]
Fine-tuning a large language model can be as easy as...
https://github.com/user-attachments/assets/3991a3a8-4276-4d30-9cab-4cb0c4b9b99e
Start local training:
- Please refer to usage
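As a rough illustration, a local LoRA fine-tuning run can be launched from the CLI as sketched below (the config paths are assumptions taken from the examples directory; the usage docs are the authoritative reference):

```bash
# Hedged quick-start sketch; pick config files that actually exist in your checkout.
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml   # LoRA SFT run
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml     # chat with the trained adapter
llamafactory-cli webui                                            # zero-code Web UI
```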
Start cloud training:
- Colab (free): https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
- PAI-DSW (free trial): https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory
- LLaMA Factory Online: https://www.llamafactory.com.cn/?utm_source=LLaMA-Factory
- Alaya NeW (cloud GPU deal): https://docs.alayanew.com/docs/documents/useGuide/LLaMAFactory/mutiple/?utm_source=LLaMA-Factory
Read technical notes:
- Documentation (WIP): https://llamafactory.readthedocs.io/en/latest/
- Documentation (AMD GPU): https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/fine_tune/llama_factory_llama3.html
- Official Blog: https://blog.llamafactory.net/en/
- Official Course: https://www.lab4ai.cn/course/detail?id=7c13e60f6137474eb40f6fd3983c0f46&utm_source=LLaMA-Factory
[!NOTE] Except for the links above, all other websites are unauthorized third-party websites. Please use them with caution.
Features
- Various models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen3, Qwen3-VL, DeepSeek, Gemma, GLM, Phi, etc.
Day-N Support for Fine-Tuning Cutting-Edge Models
| Support Date | Model Name |
|---|---|
| Day 0 | Qwen3 / Qwen2.5-VL / Gemma 3 / GLM-4.1V / InternLM 3 / MiniCPM-o-2.6 |
| Day 1 | Llama 3 / GLM-4 / Mistral Small / PaliGemma2 / Llama 4 |
Blogs
[!TIP] Now we have a dedicated blog for LLaMA Factory!
Website: https://blog.llamafactory.net/en/
- 💡 KTransformers Fine-Tuning × LLaMA Factory: Fine-tuning 1000B models with 2 RTX 4090 GPUs + CPU (English)
- 💡 Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge (English)
- Fine-tune a mental health LLM using LLaMA-Factory (Chinese)
- Fine-tune GPT-OSS for Role-Playing using LLaMA-Factory (Chinese)
- A One-Stop Code-Free Model Reinforcement Learning and Deployment Platform based on LLaMA-Factory and EasyR1 (Chinese)
- How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod (English)
All Blogs
- Fine-tune Llama3.1-70B for Medical Diagnosis using LLaMA-Factory (Chinese)
- Fine-tune Qwen2.5-VL for Autonomous Driving using LLaMA-Factory (Chinese)
- LLaMA Factory: Fine-tuning the DeepSeek-R1-Distill-Qwen-7B Model for News Classifier (Chinese)
- A One-Stop Code-Free Model Fine-Tuning & Deployment Platform based on SageMaker and LLaMA-Factory (Chinese)
- LLaMA Factory Multi-Modal Fine-Tuning Practice: Fine-Tuning Qwen2-VL for Personal Tourist Guide (Chinese)
- LLaMA Factory: Fine-tuning Llama3 for Role-Playing (Chinese)
Changelog
[25/10/26] We supported the Megatron-core training backend via mcore_adapter. See PR #9237 to get started.
[25/08/22] We supported OFT and OFTv2. See examples for usage.
[25/08/20] We supported fine-tuning the Intern-S1-mini models. See PR #8976 to get started.
[25/08/06] We supported fine-tuning the GPT-OSS models. See PR #8826 to get started.
Full Changelog
[25/07/02] We supported fine-tuning the GLM-4.1V-9B-Thinking model.
[25/04/28] We supported fine-tuning the Qwen3 model family.
[25/04/21] We supported the Muon optimizer. See examples for usage. Thank @tianshijing's PR.
[25/04/16] We supported fine-tuning the InternVL3 model. See PR #7258 to get started.
[25/04/06] We supported fine-tuning the Llama 4 model. See PR #7611 to get started.
[25/03/31] We supported fine-tuning the Qwen2.5 Omni model. See PR #7537 to get started.
[25/03/15] We supported SGLang as inference backend. Try infer_backend: sglang to accelerate inference.
[25/02/24] Announcing EasyR1, an efficient, scalable and multi-modality RL training framework for efficient GRPO training.
[25/02/11] We supported saving the Ollama modelfile when exporting the model checkpoints. See examples for usage.
[25/02/05] We supported fine-tuning the Qwen2-Audio and MiniCPM-o-2.6 on audio understanding tasks.
[25/01/15] We supported APOLLO optimizer. See examples for usage.
[25/01/14] We supported fine-tuning the MiniCPM-o-2.6 and MiniCPM-V-2.6 models. Thank @BUAADreamer's PR.
[25/01/14] We supported fine-tuning the InternLM 3 models. Thank @hhaAndroid's PR.
[24/12/21] We supported using SwanLab for experiment tracking and visualization. See this section for details.
[24/10/09] We supported downloading pre-trained models and datasets from the Modelers Hub. See this tutorial for usage.
[24/09/19] We supported fine-tuning the Qwen2.5 models.
[24/08/30] We supported fine-tuning the Qwen2-VL models. Thank @simonJJJ's PR.
[24/08/27] We supported Liger Kernel. Try enable_liger_kernel: true for efficient training.
[24/08/09] We supported Adam-mini optimizer. See examples for usage. Thank @relic-yuexi's PR.
[24/07/04] We supported contamination-free packed training. Use neat_packing: true to activate it. Thank @chuan298's PR.
[24/06/16] We supported PiSSA algorithm. See examples for usage.
[24/06/07] We supported fine-tuning the Qwen2 and GLM-4 models.
[24/05/26] We supported SimPO algorithm for preference learning. See examples for usage.
[24/05/20] We supported fine-tuning the PaliGemma series models. Note that the PaliGemma models are pre-trained models; you need to fine-tune them with the `paligemma` template for chat completion.
[24/05/18] We supported KTO algorithm for preference learning. See examples for usage.
[24/05/14] We supported training and inference on the Ascend NPU devices. Check installation section for details.
[24/04/26] We supported fine-tuning the LLaVA-1.5 multimodal LLMs. See examples for usage.
[24/04/22] We provided a Colab notebook for fine-tuning the Llama-3 model on a free T4 GPU. Two Llama-3-derived models fine-tuned using LLaMA Factory are available at Hugging Face, check Llama3-8B-Chinese-Chat and Llama3-Chinese for details.
[24/04/21] We supported Mixture-of-Depths according to AstraMindAI's implementation. See examples for usage.
[24/04/16] We supported BAdam optimizer. See examples for usage.
[24/04/16] We supported unsloth's long-sequence training (Llama-2-7B-56k within 24GB). It achieves 117% speed and 50% memory compared with FlashAttention-2, more benchmarks can be found in this page.
[24/03/21] Our paper "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models" is available at arXiv!
[24/03/20] We supported FSDP+QLoRA that fine-tunes a 70B model on 2x24GB GPUs. See examples for usage.
[24/03/07] We supported GaLore optimizer. See examples for usage.
[24/03/07] We integrated vLLM for faster and concurrent inference. Try infer_backend: vllm to enjoy 270% inference speed.
[24/02/28] We supported weight-decomposed LoRA (DoRA). Try use_dora: true to activate DoRA training.
[24/02/15] We supported block expansion proposed by LLaMA Pro. See examples for usage.
[24/02/05] Qwen1.5 (Qwen2 beta version) series models are supported in LLaMA-Factory. Check this blog post for details.
[24/01/18] We supported agent tuning for most models, equipping models with tool-using abilities by fine-tuning with `dataset: glaive_toolcall_en`.
[23/12/23] We supported unsloth's implementation to boost LoRA tuning for the LLaMA, Mistral and Yi models. Try use_unsloth: true argument to activate unsloth patch. It achieves 170% speed in our benchmark, check this page for details.
[23/12/12] We supported fine-tuning the latest MoE model Mixtral 8x7B in our framework. See hardware requirement here.
[23/12/01] We supported downloading pre-trained models and datasets from the ModelScope Hub. See this tutorial for usage.
[23/10/21] We supported NEFTune trick for fine-tuning. Try neftune_noise_alpha: 5 argument to activate NEFTune.
[23/09/27] We supported $S^2$-Attn proposed by LongLoRA for the LLaMA models. Try shift_attn: true argument to enable shift short attention.
[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See examples for usage.
[23/09/10] We supported FlashAttention-2. Try flash_attn: fa2 argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.
[23/08/12] We supported RoPE scaling to extend the context length of the LLaMA models. Try rope_scaling: linear argument in training and rope_scaling: dynamic argument at inference to extrapolate the position embeddings.
[23/08/11] We supported DPO training for instruction-tuned models. See examples for usage.
[23/07/31] We supported dataset streaming. Try streaming: true and max_steps: 10000 arguments to load your dataset in streaming mode.
[23/07/18] We developed an all-in-one Web UI for training, evaluation and inference. Try train_web.py to fine-tune models in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in the development.
[23/07/09] We released FastEdit ⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.
[23/06/29] We provided a reproducible example of training a chat model using instruction-following datasets, see Baichuan-7B-sft for details.
[!TIP] If you cannot use the latest feature, please pull the latest code and install LLaMA-Factory again.
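Most of the switches above are single YAML arguments in a training config. A hedged, illustrative combination is sketched below; the model id, dataset name, and values are assumptions for demonstration, not tuned recommendations:

```yaml
# Illustrative sketch combining optional switches mentioned in the changelog.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # assumed model id
stage: sft
do_train: true
finetuning_type: lora
dataset: identity                # assumed dataset key from dataset_info.json
template: llama3
flash_attn: fa2                  # FlashAttention-2
enable_liger_kernel: true        # Liger Kernel
neat_packing: true               # contamination-free packed training
use_dora: true                   # weight-decomposed LoRA (DoRA)
neftune_noise_alpha: 5           # NEFTune
rope_scaling: linear             # RoPE scaling for longer contexts
streaming: true                  # dataset streaming
max_steps: 10000                 # pair with streaming, as noted above
output_dir: saves/llama3-8b/lora/sft
```

Not every pair of switches is guaranteed to compose; enable them one at a time and consult the examples directory for maintained configs.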
Supported Models
| Model | Model size | Template |
|---|---|---|
| BLOOM/BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | - |
| DeepSeek (LLM/Code/MoE) | 7B/16B/67B/236B | deepseek |
| DeepSeek 3-3.2 | 236B/671B | deepseek3 |
| DeepSeek R1 (Distill) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
| ERNIE-4.5 | 0.3B/21B/300B | ernie_nothink |
| Falcon/Falcon H1 | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
| Gemma/Gemma 2/CodeGemma | 2B/7B/9B/27B | gemma/gemma2 |
| Gemma 3/Gemma 3n | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
| GLM-4/GLM-4-0414/GLM-Z1 | 9B/32B | glm4/glmz1 |
| GLM-4.5/GLM-4.5(6)V | 9B/106B/355B | glm4_moe/glm4_5v |
| GPT-2 | 0.1B/0.4B/0.8B/1.5B | - |
| GPT-OSS | 20B/120B | gpt_oss |
| Granite 3-4 | 1B/2B/3B/7B/8B | granite3/granite4 |
| Hunyuan/Hunyuan1.5 (MT) | 0.5B/1.8B/4B/7B/13B | hunyuan/hunyuan_small |
| InternLM 2-3 | 7B/8B/20B | intern2 |
| InternVL 2.5-3.5 | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
| Intern-S1-mini | 8B | intern_s1 |
| Kimi-VL | 16B | kimi_vl |
| Ling 2.0 (mini/flash) | 16B/100B | bailing_v2 |
| LFM 2.5 (VL) | 1.2B/1.6B | lfm2/lfm2_vl |
| Llama | 7B/13B/33B/65B | - |
| Llama 2 | 7B/13B/70B | llama2 |
| Llama 3-3.3 | 1B/3B/8B/70B | llama3 |
| Llama 4 | 109B/402B | llama4 |
| Llama 3.2 Vision | 11B/90B | mllama |
| LLaVA-1.5 | 7B/13B | llava |
| LLaVA-NeXT | 7B/8B/13B/34B/72B/110B | llava_next |
| LLaVA-NeXT-Video | 7B/34B | llava_next_video |
| MiMo | 7B/309B | mimo/mimo_v2 |
| MiniCPM 4 | 0.5B/8B | cpm4 |
| MiniCPM-o/MiniCPM-V 4.5 | 8B/9B | minicpm_o/minicpm_v |
| MiniMax-M1/MiniMax-M2 | 229B/456B | minimax1/minimax2 |
| Ministral 3 | 3B/8B/14B | ministral3 |
| Mistral/Mixtral | 7B/8x7B/8x22B | mistral |
| PaliGemma/PaliGemma2 | 3B/10B/28B | paligemma |
| Phi-3/Phi-3.5 | 4B/14B | phi |
| Phi-3-small | 7B | phi_small |
| Phi-4-mini/Phi-4 | 3.8B/14B | phi4_mini/phi4 |
| Pixtral | 12B | pixtral |
| Qwen2 (Code/Math/MoE/QwQ) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
| Qwen3 (MoE/Instruct/Thinking/Next) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
| Qwen3.5 | 0.8B/2B/4B/9B/27B/35B/122B/397B | qwen3_5 |
| Qwen2-Audio | 7B | qwen2_audio |
| Qwen2.5-Omni | 3B/7B | qwen2_omni |
| Qwen3-Omni | 30B | qwen3_omni |
| Qwen2-VL/Qwen2.5-VL/QVQ | 2B/3B/7B/32B/72B | qwen2_vl |
| Qwen3-VL | 2B/4B/8B/30B/32B/235B | qwen3_vl |
| Seed (OSS/Coder) | 8B/36B | seed_oss/seed_coder |
| StarCoder 2 | 3B/7B/15B | - |
| TeleChat 2-2.5 | 3B/7B/35B/115B | telechat2 |
| Yuan 2 | 2B/51B/102B | yuan |
[!NOTE] For the "base" models, the
templateargument can be chosen fromdefault,alpaca,vicunaetc. But make sure to use the corresponding template for the "instruct/chat" models.If the model has both reasoning and non-reasoning versions, please use the
_nothinksuffix to distinguish between them. For example,qwen3andqwen3_nothink.Remember to use the SAME template in training and inference.
*: You should install the
transformersfrom main branch and useDISABLE_VERSION_CHECK=1to skip version check.**: You need to install a specific version of
transformersto use the corresponding model.
Please refer to constants.py for a full list of the models we support.
You can also add a custom chat template to template.py.
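As a concrete (hedged) fragment, selecting the matching template for a chat model and its non-reasoning counterpart might look like this; the model id is an assumption:

```yaml
model_name_or_path: Qwen/Qwen3-8B   # assumed model id
template: qwen3                     # use qwen3_nothink for the non-reasoning behavior
```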
Supported Training Approaches
| Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA | OFT | QOFT |
|---|---|---|---|---|---|---|
| Pre-Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Supervised Fine-Tuning | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Reward Modeling | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| PPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| DPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| KTO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| ORPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| SimPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
[!TIP] The implementation details of PPO can be found in this blog.
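As a sketch of how one row of the table maps onto a config, a minimal LoRA DPO run could look roughly like the following; the model id, dataset name, and hyperparameters are illustrative assumptions, and the examples directory holds the maintained configs:

```yaml
# Hedged DPO-with-LoRA sketch; values are illustrative, not tuned recommendations.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # assumed model id
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
dataset: dpo_en_demo             # assumed preference dataset key
template: llama3
pref_beta: 0.1
pref_loss: sigmoid               # orpo / simpo select the ORPO / SimPO objectives
output_dir: saves/llama3-8b/lora/dpo
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 1.0
```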
Provided Datasets
Pre-training datasets
- Wiki Demo (en)
- RefinedWeb (en)
- RedPajama V2 (en)
- Wikipedia (en)
- Wikipedia (zh)
- Pile (en)
- SkyPile (zh)
- FineWeb (en)
- FineWeb-Edu (en)
- CCI3-HQ (zh)
- CCI3-Data (zh)
- CCI4.0-M2-Base-v1 (en&zh)
- CCI4.0-M2-CoT-v1 (en&zh)
- CCI4.0-M2-Extra-v1 (en&zh)
- The Stack (en)
- StarCoder (en)
Supervised fine-tuning datasets
- Identity (en&zh)
- Stanford Alpaca (en)
- Stanford Alpaca (zh)
- Alpaca GPT4 (en&zh)
- Glaive Function Calling V2 (en&zh)
- LIMA (en)
- Guanaco Dataset (multilingual)
- BELLE 2M (zh)
- BELLE 1M (zh)
- BELLE 0.5M (zh)
- BELLE Dialogue 0.4M (zh)
- BELLE School Math 0.25M (zh)
- BELLE Multiturn Chat 0.8M (zh)
- UltraChat (en)
- OpenPlatypus (en)
- CodeAlpaca 20k (en)
- Alpaca CoT (multilingual)
- OpenOrca (en)
- SlimOrca (en)
- MathInstruct (en)
- Firefly 1.1M (zh)
- Wiki QA (en)
- Web QA (zh)
- WebNovel (zh)
- Nectar (en)
- deepctrl (en&zh)
- Advertise Generating (zh)
- ShareGPT Hyperfiltered (en)
- ShareGPT4 (en&zh)
- UltraChat 200k (en)
- Infinity Instruct (zh)
- AgentInstruct (en)
- LMSYS Chat 1M (en)
- Evol Instruct V2 (en)
- Cosmopedia (en)
- STEM (zh)
- Ruozhiba (zh)
- Neo-sft (zh)
- Magpie-Pro-300K-Filtered (en)
- Magpie-ultra-v0.1 (en)
- WebInstructSub (en)
- OpenO1-SFT (en&zh)
- Open-Thoughts (en)
- Open-R1-Math (en)
- Chinese-DeepSeek-R1-Distill (zh)
- LLaVA mixed (en&zh)
- Pokemon-gpt4o-captions (en&zh)
- DLR-Web (en)
- Open Assistant (de)
- Dolly 15k (de)
- Alpaca GPT4 (de)
- OpenSchnabeltier (de)
- Evol Instruct (de)
- Dolphin (de)
- Booksum (de)
- Airoboros (de)
- Ultrachat (de)
Preference datasets
- DPO mixed (en&zh)
- UltraFeedback (en)
- COIG-P (zh)
- RLHF-V (en)
- VLFeedback (en)
- RLAIF-V (en)
- Orca DPO Pairs (en)
- HH-RLHF (en)
- Nectar (en)
- Orca DPO (de)
- KTO mixed (en)
Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.
pip install "huggingface_hub<1.0.0"
huggingface-cli login
Requirement
| Mandatory | Minimum | Recommend |
|---|---|---|
| python | 3.9 | 3.10 |
| torch | 2.0.0 | 2.6.0 |
| torchvision | 0.15.0 | 0.21.0 |
| transformers | 4.49.0 | 4.50.0 |
| datasets | 2.16.0 | 3.2.0 |
| accelerate | 0.34.0 | 1.2.1 |
| peft | 0.14.0 | 0.15.1 |
| trl | 0.8.6 | 0.9.6 |
| Optional | Minimum | Recommend |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.16.4 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.3 | 0.8.2 |
| flash-attn | 2.5.6 | 2.7.2 |
Hardware Requirement
* estimated
| Method | Bits | 7B | 14B | 30B | 70B | `x`B |
|---|---|---|---|---|---|---|
| Full (`bf16` or `fp16`) | 32 | 120GB | 240GB | 600GB | 1200GB | `18x`GB |
| Full (`pure_bf16`) | 16 | 60GB | 120GB | 300GB | 600GB | `8x`GB |
| Freeze/LoRA/GaLore/APOLLO/BAdam/OFT | 16 | 16GB | 32GB | 64GB | 160GB | `2x`GB |
| QLoRA / QOFT | 8 | 10GB | 20GB | 40GB | 80GB | `x`GB |
| QLoRA / QOFT | 4 | 6GB | 12GB | 24GB | 48GB | `x/2`GB |
| QLoRA / QOFT | 2 | 4GB | 8GB | 16GB | 24GB | `x/4`GB |
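To land in the QLoRA / QOFT rows of the table, quantization is again a config switch. A hedged fragment (the bit width is the only essential line; everything else follows the SFT sketches above):

```yaml
# Illustrative QLoRA fragment: quantize the base model and train a LoRA adapter on top.
finetuning_type: lora
quantization_bit: 4              # 8 / 4 / 2 roughly correspond to the rows above
```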
Projects using LLaMA Factory
If you have a project that should be incorporated, please contact via email or create a pull request.
Click to show
- Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [arxiv]
- Yu et al. Open, Closed, or Small Language Models for Text Classification? 2023. [arxiv]
- Wang et al. UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language. 2023. [arxiv]
- Luceri et al. Leveraging Large Language Models to Detect Influence Campaigns in Social Media. 2023. [arxiv]
- Zhang et al. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. 2023. [arxiv]
- Wang et al. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs. KDD 2024. [arxiv]
- Wang et al. CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning. ACL 2024. [arxiv]
- Choi et al. FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. 2024. [arxiv]
- Zhang et al. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. 2024. [arxiv]
- Lyu et al. KnowTuning: Knowledge-aware Fine-tuning for Large Language Models. 2024. [arxiv]
- Yang et al. LaCo: Large Language Model Pruning via Layer Collapse. 2024. [arxiv]
- Bhardwaj et al. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. 2024. [arxiv]
- Yang et al. Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models. 2024. [arxiv]
- Yi et al. Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. ACL 2024 Findings. [arxiv]
- Cao et al. Head-wise Shareable Attention for Large Language Models. 2024. [arxiv]
- Zhang et al. Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages. 2024. [arxiv]
- Kim et al. Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [arxiv]
- Yu et al. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. ACL 2024. [arxiv]
- Huang et al. Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning. 2024. [arxiv]
- Duan et al. Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization. 2024. [arxiv]
- Xie and Schwertfeger. Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs. 2024. [arxiv]
- Wu et al. Large Language Models are Parallel Multilingual Learners. 2024. [arxiv]
- Zhang et al. EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling. 2024. [arxiv]
- Weller et al. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. 2024. [arxiv]
- Hongbin Na. CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering. COLING 2024. [arxiv]
- Zan et al. CodeS: Natural Language to Code Repository via Multi-Layer Sketch. 2024. [arxiv]
- Liu et al. Extensive Self-Contrast Enables Feedback-Free Language Model Alignment. 2024. [arxiv]
- Luo et al. BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models. 2024. [arxiv]
- Du et al. Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. 2024. [arxiv]
- Ma et al. Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation. ICML 2024. [arxiv]
- Liu et al. Dynamic Generation of Personalities with Large Language Models. 2024. [arxiv]
- Shang et al. How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models. 2024. [arxiv]
- Huang et al. LLMTune: Accelerate Database Knob Tuning with Large Language Models. 2024. [arxiv]
- Deng et al. Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction. 2024. [arxiv]
- Acikgoz et al. Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare. 2024. [arxiv]
- Zhang et al. Small Language Models Need Strong Verifiers to Self-Correct Reasoning. ACL 2024 Findings. [arxiv]
- Zhou et al. FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering. NAACL 2024. [arxiv]
- Xu et al. Large Language Models for Cyber Security: A Systematic Literature Review. 2024. [arxiv]
- Dammu et al. "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations. 2024. [arxiv]
- Yi et al. A safety realignment framework via subspace-oriented model fusion for large language models. 2024. [arxiv]
- Lou et al. SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling. 2024. [arxiv]
- Zhang et al. Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners. 2024. [arxiv]
- Zhang et al. TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. 2024. [arxiv]
- Zihong Chen. Sentence Segmentation and Sentence Punctuation Based on XunziALLM. 2024. [paper]
- Gao et al. The Best of Both Worlds: Toward an Honest and Helpful Large Language Model. 2024. [arxiv]
- Wang and Song. MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset. 2024. [arxiv]
- Hu et al. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models. 2024. [arxiv]
- Ge et al. Time Sensitive Knowledge Editing through Efficient Finetuning. ACL 2024. [arxiv]
- Tan et al. Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions. 2024. [arxiv]
- Song et al. Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters. 2024. [arxiv]
- Gu et al. RWKV-CLIP: A Robust Vision-Language Representation Learner. 2024. [arxiv]
- Chen et al. Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees. 2024. [arxiv]
- Zhu et al. Are Large Language Models Good Statisticians?. 2024. [arxiv]
- Li et al. Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning. 2024. [arxiv]
- Ding et al. IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce. 2024. [arxiv]
- He et al. COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities. 2024. [arxiv]
- Lin et al. FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving. 2024. [arxiv]
- Treutlein et al. Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data. 2024. [arxiv]
- Feng et al. SS-Bench: A Benchmark for Social Story Generation and Evaluation. 2024. [arxiv]
- Feng et al. Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement. 2024. [arxiv]
- Liu et al. Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals. 2024. [arxiv]
- Iyer et al. Exploring Very Low-Resource Translation with LLMs: The University of Edinburgh's Submission to AmericasNLP 2024 Translation Task. AmericasNLP 2024. [paper]
- Li et al. Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring. 2024. [arxiv]
- Yang et al. Financial Knowledge Large Language Model. 2024. [arxiv]
- Lin et al. DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging. 2024. [arxiv]
- Bako et al. Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization. 2024. [arxiv]
- Huang et al. RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization. 2024. [arxiv]
- Jiang et al. LLM-Collaboration on Automatic Science Journalism for the General Audience. 2024. [arxiv]
- Inouye et al. Applied Auto-tuning on LoRA Hyperparameters. 2024. [paper]
- Qi et al. Research on Tibetan Tourism Viewpoints information generation system based on LLM. 2024. [arxiv]
- Xu et al. Course-Correction: Safety Alignment Using Synthetic Preferences. 2024. [arxiv]
- Sun et al. LAMBDA: A Large Model Based Data Agent. 2024. [arxiv]
- Zhu et al. CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare. 2024. [arxiv]
- Yu et al. Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment. 2024. [arxiv]
- Xie et al. The Power of Personalized Datasets: Advancing Chinese Composition Writing for Elementary School through Targeted Model Fine-Tuning. IALP 2024. [paper]
- Liu et al. Instruct-Code-Llama: Improving Capabilities of Language Model in Competition Level Code Generation by Online Judge Feedback. ICIC 2024. [paper]
- Wang et al. Cybernetic Sentinels: Unveiling the Impact of Safety Data Selection on Model Security in Supervised Fine-Tuning. ICIC 2024. [paper]
- Xia et al. Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. 2024. [arxiv]
- Zeng et al. Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions. 2024. [arxiv]
- Xia et al. Using Pre-trained Language Model for Accurate ESG Prediction. FinNLP 2024. [paper]
- Liang et al. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. 2024. [arxiv]
- Bai et al. Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation. CIKM 2024. [paper]
- Zhang et al. CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling. ACL 2024. [paper]
- StarWhisper: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
- DISC-LawLLM: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge.
- Sunsimiao: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
- CareGPT: A series of large language models for Chinese medical domain, based on LLaMA2-7B and Baichuan-13B.
- MachineMindset: A series of MBTI Personality large language models, capable of giving any LLM 16 different personality types based on different datasets and training methods.
- Luminia-13B-v3: A large language model specialized in generating metadata for Stable Diffusion. [demo]
- Chinese-LLaVA-Med: A multimodal large language model specialized in Chinese medical domain, based on LLaVA-1.5-7B.
- AutoRE: A document-level relation extraction system based on large language models.
- NVIDIA RTX AI Toolkit: SDKs for fine-tuning LLMs on Windows PC for NVIDIA RTX.
- LazyLLM: An easy and lazy way for building multi-agent LLMs applications and supports model fine-tuning via LLaMA Factory.
- RAG-Retrieval: A full pipeline for RAG retrieval model fine-tuning, inference, and distillation. [blog]
- 360-LLaMA-Factory: A modified library that supports long sequence SFT & DPO using ring attention.
- Sky-T1: An o1-like model fine-tuned by NovaSky AI with very small cost.
- WeClone: One-stop solution for creating your digital avatar from chat logs.
- EmoLLM: A project about large language models (LLMs) and mental health.
Acknowledgement
This repo benefits from PEFT, TRL, QLoRA and FastChat. Thanks for their wonderful work.
Star History