Unsloth

Train gpt-oss, DeepSeek, Gemma, Qwen & Llama 2x faster with 70% less VRAM!

✨ Train for Free

Notebooks are beginner friendly. Read our guide. Add your dataset, run all cells, then deploy your trained model.

| Model | Free Notebooks | Performance | Memory use |
| --- | --- | --- | --- |
| Qwen3.5 (4B) | ▶️ Start for free | 1.5x faster | 60% less |
| gpt-oss (20B) | ▶️ Start for free | 2x faster | 70% less |
| gpt-oss (20B): GRPO | ▶️ Start for free | 2x faster | 80% less |
| Qwen3: Advanced GRPO | ▶️ Start for free | 2x faster | 50% less |
| Gemma 3 (4B) Vision | ▶️ Start for free | 1.7x faster | 60% less |
| embeddinggemma (300M) | ▶️ Start for free | 2x faster | 20% less |
| Mistral Ministral 3 (3B) | ▶️ Start for free | 1.5x faster | 60% less |
| Llama 3.1 (8B) Alpaca | ▶️ Start for free | 2x faster | 70% less |
| Llama 3.2 Conversational | ▶️ Start for free | 2x faster | 70% less |
| Orpheus-TTS (3B) | ▶️ Start for free | 1.5x faster | 50% less |
  • See all our notebooks for: Kaggle, GRPO, TTS, embedding & Vision
  • See all our models and all our notebooks
  • See detailed documentation for Unsloth here

⚡ Quickstart

Linux or WSL

pip install unsloth

Windows

For Windows, pip install unsloth works only if you already have PyTorch installed. Read our Windows Guide.

Docker

Use our official Unsloth Docker image, unsloth/unsloth. Read our Docker Guide.

AMD, Intel, Blackwell & DGX Spark

For RTX 50x, B200, 6000 GPUs: pip install unsloth. Read our guides for: Blackwell and DGX Spark.
To install Unsloth on AMD and Intel GPUs, follow our AMD Guide and Intel Guide.

🦥 Unsloth News

  • Qwen3.5 - 0.8B, 2B, 4B, 9B, 27B, 35-A3B, 112B-A10B are now supported. Guide + notebooks
  • Train MoE LLMs 12x faster with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. Blog
  • Embedding models: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. Blog • Notebooks
  • New: RL with 7x longer context than any other setup, via our new batching algorithms. Blog
  • New RoPE & MLP Triton Kernels & Padding Free + Packing: 3x faster training & 30% less VRAM. Blog
  • 500K Context: Training a 20B model with >500K context is now possible on an 80GB GPU. Blog
  • FP8 & Vision RL: You can now do FP8 & VLM GRPO on consumer GPUs. FP8 Blog • Vision RL
  • Docker: Use Unsloth with no setup or environment issues via our new image. Guide • Docker image
  • gpt-oss by OpenAI: Read our RL blog, Flex Attention blog and Guide.
  • Quantization-Aware Training: We collaborated with PyTorch, recovering ~70% accuracy. Read blog
  • Memory-efficient RL: We're introducing even better RL. Our new kernels & algorithms allow faster RL with 50% less VRAM & 10× more context. Read blog
  • Mistral 3: Run Ministral 3 or Devstral 2 and fine-tune with vision/RL sudoku notebooks. Guide • Notebooks
  • Gemma 3n by Google: Read Blog. We uploaded GGUFs, 4-bit models.
  • Text-to-Speech (TTS) is now supported, including sesame/csm-1b, plus STT via openai/whisper-large-v3.
  • Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
  • Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & Aider Polyglot.
  • EVERYTHING is now supported - all models (TTS, BERT, Mamba), FFT, etc. Multi-GPU is now supported. Enable FFT (full fine-tuning) with full_finetuning = True and 8-bit loading with load_in_8bit = True; see the sketch after this list.
  • 📣 DeepSeek-R1 - run or fine-tune them with our guide. All model uploads: here.
  • 📣 Introducing Long-context Reasoning (GRPO) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
  • 📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
  • 📣 Llama 4 by Meta, including Scout & Maverick are now supported.
  • 📣 Phi-4 by Microsoft: We also fixed bugs in Phi-4 and uploaded GGUFs and 4-bit models.
  • 📣 Vision models now supported! Llama 3.2 Vision (11B), Qwen 2.5 VL (7B) and Pixtral (12B) 2409
  • 📣 Llama 3.3 (70B), Meta's latest model, is supported.
  • 📣 We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
  • 📣 We found and helped fix a gradient accumulation bug! Please update Unsloth and transformers.
  • 📣 We cut memory usage by a further 30% and now support 4x longer context windows!
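
As a minimal sketch of the FFT and 8-bit flags mentioned above (the model name and sequence length are illustrative placeholders, not recommendations):

from unsloth import FastLanguageModel

# Full fine-tuning instead of LoRA (needs considerably more VRAM):
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # hypothetical pick - any supported model works
    max_seq_length = 2048,
    full_finetuning = True, # enable FFT
)

# Or load weights in 8-bit instead of the default 4-bit:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct",
    max_seq_length = 2048,
    load_in_8bit = True, # 8-bit quantization
)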

🔗 Links and Resources

| Type | Links |
| --- | --- |
| r/unsloth Reddit | Join Reddit community |
| 📚 Documentation & Wiki | Read Our Docs |
| Twitter (aka X) | Follow us on X |
| 💾 Installation | Pip & Docker Install |
| 🔮 Our Models | Unsloth Catalog |
| ✍️ Blog | Read our Blogs |

⭐ Key Features

  • Supports full-finetuning, pretraining, 4-bit, 16-bit and FP8 training
  • Supports all models including TTS, multimodal, embedding and more! Any model that works in transformers, works in Unsloth.
  • The most efficient library for Reinforcement Learning (RL), using 80% less VRAM. Supports GRPO, GSPO, DrGRPO, DAPO etc.
  • 0% loss in accuracy - no approximation methods - all exact.
  • Export and deploy your model to GGUF, llama.cpp, vLLM, SGLang and Hugging Face; see the export sketch after this list.
  • Supports NVIDIA (since 2018), AMD and Intel GPUs. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc)
  • Works on Linux, WSL and Windows
  • All kernels written in OpenAI's Triton language. Manual backprop engine.
  • If you trained a model with 🦥Unsloth, you can use this cool sticker!  
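
For example, exporting a fine-tuned model to GGUF can look like the sketch below (the quantization method and output paths are illustrative; see the saving docs for the full API):

# After training - a sketch assuming `model` and `tokenizer` from the example further below:
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")
# Or push the GGUF straight to Hugging Face (hypothetical repo name):
# model.push_to_hub_gguf("your-username/model-gguf", tokenizer, quantization_method = "q4_k_m")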

💾 Install Unsloth

You can also see our docs for more detailed installation and updating instructions here.

Unsloth supports Python 3.13 or lower.

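A quick way to confirm your interpreter qualifies (any 3.10-3.13 build works):

python --version   # should report 3.13 or lower, e.g. Python 3.12.7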

Pip Installation

Install with pip (recommended) for Linux devices:

pip install unsloth

To update Unsloth:

pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo

See here for advanced pip install instructions.

Windows Installation

For this method, we will be utilizing Anaconda. You can view the full guide with screenshots here.

  1. Install Miniconda (or Anaconda): Miniconda is recommended. After installing, open the Anaconda PowerShell Prompt to continue.

  2. Create a Conda Environment: Create and activate a fresh Python 3.12 environment for Unsloth.

    conda create --name unsloth_env python==3.12 -y
    conda activate unsloth_env
    
  3. Check Your GPU and CUDA Version: Run nvidia-smi to confirm that your NVIDIA GPU is detected and note the CUDA version shown in the output. If nvidia-smi does not work, reinstall the latest NVIDIA drivers.

  4. Install PyTorch: Install the Windows pip build of PyTorch that matches your CUDA version. Use Install PyTorch to select the correct command for your system, then verify that PyTorch can see your GPU.

    import torch
    print(torch.cuda.is_available())  # should print True
    A = torch.ones((10, 10), device="cuda")  # allocate on the GPU
    B = torch.ones((10, 10), device="cuda")
    print(A @ B)  # a matrix multiply on the GPU should succeed
    
  5. Install Unsloth: Only install Unsloth after PyTorch is working correctly.

    pip install unsloth
    

Advanced/Troubleshooting

For advanced installation instructions or if you see weird errors during installations:

First try using an isolated environment, then pip install unsloth:

python -m venv unsloth
source unsloth/bin/activate
pip install unsloth
  1. Install torch and triton. Go to https://pytorch.org to install them, for example: pip install torch torchvision torchaudio triton
  2. Confirm that CUDA is installed correctly. Try nvcc. If that fails, install cudatoolkit or the CUDA drivers.
  3. Install xformers manually via:
pip install ninja
pip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
Check that `xformers` installed successfully with `python -m xformers.info` (see https://github.com/facebookresearch/xformers). Alternatively, install `flash-attn` for Ampere GPUs and skip `xformers`.
  4. For GRPO runs, try installing vllm and check that pip install vllm succeeds.
  5. Double check that your versions of Python, CUDA, CUDNN, torch, triton, and xformers are compatible with one another. The PyTorch Compatibility Matrix may be useful; see the version-check sketch after this list.
  6. Finally, install bitsandbytes and check it with python -m bitsandbytes
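
A quick diagnostic sketch for the version check in step 5 (assumes nothing beyond the packages themselves; missing ones are reported rather than crashing):

import sys
import torch

print("Python      :", sys.version.split()[0])
print("Torch       :", torch.__version__)
print("CUDA        :", torch.version.cuda)
for pkg in ("triton", "xformers", "bitsandbytes"):
    try:
        mod = __import__(pkg)  # report the installed version if present
        print(f"{pkg:12}:", mod.__version__)
    except ImportError:
        print(f"{pkg:12}: not installed")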

Conda Installation (Optional)

⚠️Only use Conda if you have it. If not, use Pip. We support python=3.10,3.11,3.12,3.13.

conda create --name unsloth_env python==3.12 -y
conda activate unsloth_env

Use nvidia-smi to get your CUDA version, e.g. 13.0, which maps to cu130:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
pip3 install unsloth
If you're looking to install Conda in a Linux environment, read here, or run the below 🔽
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Advanced Pip Installation

⚠️Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command differs for torch 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.10 and for different CUDA versions.

For other torch versions, we support torch211, torch212, torch220, torch230, torch240, torch250, torch260, torch270, torch280, torch290 and torch2100; for CUDA versions, we support cu118, cu121 and cu124. For Ampere devices (A100, H100, RTX 3090) and above, use cu118-ampere, cu121-ampere or cu124-ampere. Note: torch 2.10 only supports CUDA 12.6, 12.8, and 13.0.

For example, if you have torch 2.4 and CUDA 12.1, use:

pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"

Another example, if you have torch 2.9 and CUDA 13.0, use:

pip install --upgrade pip
pip install "unsloth[cu130-torch290] @ git+https://github.com/unslothai/unsloth.git"

Another example, if you have torch 2.10 and CUDA 12.6, use:

pip install --upgrade pip
pip install "unsloth[cu126-torch2100] @ git+https://github.com/unslothai/unsloth.git"

And other examples:

pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"

Or, run the below in a terminal to get the optimal pip installation command:

wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -

Or, run the below manually in a Python REPL:

try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
import re
v = V(re.match(r"[0-9\.]{3,}", torch.__version__).group(0))
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
USE_ABI = torch._C._GLIBCXX_USE_CXX11_ABI
if cuda not in ("11.8", "12.1", "12.4", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if   v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v  < V('2.3.0'): x = 'cu{}{}-torch220'
elif v  < V('2.4.0'): x = 'cu{}{}-torch230'
elif v  < V('2.5.0'): x = 'cu{}{}-torch240'
elif v  < V('2.5.1'): x = 'cu{}{}-torch250'
elif v <= V('2.5.1'): x = 'cu{}{}-torch251'
elif v  < V('2.7.0'): x = 'cu{}{}-torch260'
elif v  < V('2.7.9'): x = 'cu{}{}-torch270'
elif v  < V('2.8.0'): x = 'cu{}{}-torch271'
elif v  < V('2.8.9'): x = 'cu{}{}-torch280'
elif v  < V('2.9.1'): x = 'cu{}{}-torch290'
elif v  < V('2.9.2'): x = 'cu{}{}-torch291'
elif v  < V('2.10.1'): x = 'cu{}{}-torch2100'
else: raise RuntimeError(f"Torch = {v} too new!")
if v > V('2.6.9') and cuda not in ("11.8", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if v >= V('2.10.0') and cuda not in ("12.6", "12.8", "13.0"): raise RuntimeError(f"Torch 2.10 requires CUDA 12.6, 12.8, or 13.0! Got CUDA = {cuda}")
x = x.format(cuda.replace(".", ""), "-ampere" if False else "") # is_ampere is broken due to flash-attn
print(f'pip install --upgrade pip && pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git" --no-build-isolation')

Docker Installation

You can use our pre-built Docker container with all dependencies to use Unsloth instantly with no setup required. Read our guide.

This container requires installing NVIDIA's Container Toolkit.

docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth

Access Jupyter Lab at http://localhost:8888 and start fine-tuning!

📜 Documentation

  • Go to our official Documentation for running models, saving to GGUF, checkpointing, evaluation and more!
  • Read our Guides for: Fine-tuning, Reinforcement Learning, Text-to-Speech (TTS), Vision and any model.
  • We support Hugging Face's transformers, TRL, Trainer, Seq2SeqTrainer and native PyTorch code.

Unsloth example code to fine-tune gpt-oss-20b:

from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/gpt-oss-20b-unsloth-bnb-4bit", # or choose any model
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    max_seq_length = max_seq_length, # Choose any for long context!
    load_in_4bit = True,  # 4-bit quantization. False = 16-bit LoRA.
    load_in_8bit = False, # 8-bit quantization
    load_in_16bit = False, # 16-bit LoRA
    full_finetuning = False, # Use for full fine-tuning.
    trust_remote_code = False, # Enable to support new models
    # token = "hf_...", # use one if using gated models
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://unsloth.ai/docs for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM or SGLang
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
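
After training, inference might look like the sketch below (a minimal example reusing the model and tokenizer from above; the prompt and max_new_tokens are illustrative):

FastLanguageModel.for_inference(model) # enable Unsloth's faster inference path
messages = [{"role": "user", "content": "What is 2+2?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt = True, return_tensors = "pt",
).to(model.device)
outputs = model.generate(input_ids = inputs, max_new_tokens = 64)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))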

💡 Reinforcement Learning

RL methods including GRPO, GSPO, FP8 training, DrGRPO, DAPO, PPO, Reward Modelling and Online DPO all work with Unsloth.

Read our Reinforcement Learning Guide or our advanced RL docs for batching, generation & training parameters.

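As a rough sketch of what a GRPO run can look like, using TRL's GRPOTrainer (which Unsloth integrates with); the reward function, dataset and hyperparameters here are toy placeholders, so see the guide and notebooks for working configurations:

from trl import GRPOTrainer, GRPOConfig

# Toy reward: prefer completions close to 50 characters (illustrative only)
def reward_len(completions, **kwargs):
    return [-abs(50 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model = model,               # a model prepared with Unsloth, as above
    reward_funcs = reward_len,
    args = GRPOConfig(
        per_device_train_batch_size = 8,
        num_generations = 8,     # completions sampled per prompt
        max_completion_length = 256,
        output_dir = "grpo_outputs",
    ),
    train_dataset = dataset,     # a dataset of prompts
)
trainer.train()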

List of RL notebooks:

  • gpt-oss GRPO notebook: Link
  • FP8 Qwen3-8B GRPO notebook (L4): Link
  • Qwen3-VL GSPO notebook: Link
  • Advanced Qwen3 GRPO notebook: Link
  • ORPO notebook: Link
  • DPO Zephyr notebook: Link
  • KTO notebook: Link
  • SimPO notebook: Link

🥇 Performance Benchmarking

  • For our most detailed benchmarks, read our Llama 3.3 Blog.
  • Benchmarking of Unsloth was also conducted by 🤗Hugging Face.

We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):

| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
| --- | --- | --- | --- | --- | --- |
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |

Context length benchmarks

Llama 3.1 (8B) max. context length

We tested Llama 3.1 (8B) Instruct with 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a fixed maximum length to mimic long-context fine-tuning workloads.

| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
| --- | --- | --- |
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |

Llama 3.3 (70B) max. context length

We tested Llama 3.3 (70B) Instruct on an 80GB A100 with 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a fixed maximum length to mimic long-context fine-tuning workloads.

| GPU VRAM | 🦥 Unsloth context length | Hugging Face + FA2 |
| --- | --- | --- |
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |