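A Complete Guide to Few-Shot Learning Patterns

Original article by the 灵阙 teaching and research team

A | Recommended | Advanced | Best practices | About 8 min read | Updated 2026-02-28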

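Example selection strategies, dynamic Few-Shot construction, and engineering practice for In-Context Learning | 2026-02

1. What Is Few-Shot Learning

Few-Shot Learning guides an LLM through a specific task by placing a small number of examples in the prompt. Unlike fine-tuning in traditional machine learning, Few-Shot does not modify model weights; it relies on the LLM's In-Context Learning (ICL) ability.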

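Zero-Shot:  task description -> model output
One-Shot:   task description + 1 example -> model output
Few-Shot:   task description + N examples -> model output (N = 2-10)
Many-Shot:  task description + many examples -> model output (N > 10, relies on long context)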
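
The four settings differ only in how many demonstrations precede the query. A minimal sketch, assuming an OpenAI-style list of chat messages; the helper name `build_shot_messages` is hypothetical and not part of the original article:

```python
def build_shot_messages(
    task: str, demos: list[tuple[str, str]], query: str
) -> list[dict]:
    """Return chat messages: task description, N demonstrations, then the query."""
    messages = [{"role": "system", "content": task}]
    for demo_input, demo_output in demos:
        # Each demonstration becomes one user turn and one assistant turn
        messages.append({"role": "user", "content": demo_input})
        messages.append({"role": "assistant", "content": demo_output})
    messages.append({"role": "user", "content": query})
    return messages


task = "Classify the sentiment as positive, negative, or neutral."
demos = [("Great movie!", "positive"), ("Terrible food", "negative")]

zero_shot = build_shot_messages(task, [], "It was okay")       # N = 0
one_shot = build_shot_messages(task, demos[:1], "It was okay")  # N = 1
few_shot = build_shot_messages(task, demos, "It was okay")      # N = 2
```

Only the `demos` argument changes between the variants, which is why the same code path can later serve dynamic selection.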

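2. Example Selection Strategies

2.1 Strategy Overview

Strategy	Principle	Best suited for	Effectiveness
Random selection	Sample randomly from the example pool	Baseline	Low
Similarity selection	Pick the semantically closest examples	Classification / QA	High
Diversity selection	Cover different classes and patterns	Multi-class tasks	High
Hybrid selection	Similarity + diversity	General purpose	Highest
Difficulty progression	Order examples from easy to hard	Reasoning tasks	High
Adversarial selection	Include error-prone boundary examples	Fine-grained classification	Medium-high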
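
Two of the simpler rows can be sketched in a few lines: the random baseline and difficulty progression. This is an illustration under stated assumptions: the `Demo` class and its `difficulty` field are hypothetical, and in practice the difficulty score would have to come from annotation or some proxy.

```python
import random
from dataclasses import dataclass


@dataclass
class Demo:
    input: str
    output: str
    difficulty: int = 1  # hypothetical annotation: 1 = easy, larger = harder


def random_select(pool: list[Demo], k: int, seed: int = 0) -> list[Demo]:
    """Baseline: sample up to k demonstrations uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def order_easy_to_hard(demos: list[Demo]) -> list[Demo]:
    """Difficulty progression: present easier demonstrations first."""
    return sorted(demos, key=lambda d: d.difficulty)


pool = [
    Demo("17 * 23 + 4 = ?", "395", difficulty=3),
    Demo("2 + 2 = ?", "4", difficulty=1),
    Demo("12 * 3 = ?", "36", difficulty=2),
]
picked = random_select(pool, k=2)
ordered = order_easy_to_hard(pool)
```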

2.2 Implementing Similarity Selection

import numpy as np
from dataclasses import dataclass

@dataclass
class Example:
    input: str
    output: str
    embedding: np.ndarray | None = None

class SimilaritySelector:
    """Select examples most similar to the input query."""

    def __init__(self, examples: list[Example], embed_fn):
        self.examples = examples
        self.embed_fn = embed_fn

        # Pre-compute embeddings for all examples
        for ex in self.examples:
            if ex.embedding is None:
                ex.embedding = self.embed_fn(ex.input)

    def select(self, query: str, k: int = 3) -> list[Example]:
        query_embedding = self.embed_fn(query)

        # Compute cosine similarity
        scores = []
        for ex in self.examples:
            similarity = np.dot(query_embedding, ex.embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(ex.embedding)
            )
            scores.append((similarity, ex))

        # Sort by similarity, return top-k
        scores.sort(key=lambda x: x[0], reverse=True)
        return [ex for _, ex in scores[:k]]
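
For larger pools, the per-example loop can be replaced by a single matrix-vector product. The sketch below is self-contained and uses a toy bag-of-words embedding as a stand-in for a real embedding model; `build_vocab`, `bag_of_words`, and `top_k_similar` are hypothetical names introduced only for this illustration.

```python
import numpy as np


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Map every distinct lowercase token to a column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text: str, vocab: dict[str, int]) -> np.ndarray:
    """Toy embedding: token counts over the vocabulary (unknown tokens ignored)."""
    vec = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def top_k_similar(query: str, inputs: list[str], k: int = 3) -> list[int]:
    """Return indices of the k inputs with the highest cosine similarity."""
    vocab = build_vocab(inputs)
    matrix = np.stack([bag_of_words(text, vocab) for text in inputs])
    query_vec = bag_of_words(query, vocab)
    # Clamp the denominator so zero vectors score 0 instead of NaN
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    scores = (matrix @ query_vec) / np.maximum(norms, 1e-12)
    return np.argsort(-scores)[:k].tolist()


inputs = [
    "the battery life is great",
    "shipping was slow and the box was damaged",
    "great battery and great screen",
]
best = top_k_similar("battery life", inputs, k=2)
```

The first input shares both query tokens and the third shares one, so they rank first and second; the shipping complaint shares none and is left out.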

2.3 Diversity-Aware Selection

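def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


class DiversityAwareSelector:
    """Select examples balancing similarity and diversity."""

    def __init__(self, examples: list[Example], embed_fn):
        self.examples = examples
        self.embed_fn = embed_fn

        # Pre-compute embeddings, mirroring SimilaritySelector
        for ex in self.examples:
            if ex.embedding is None:
                ex.embedding = self.embed_fn(ex.input)

    def select(
        self, query: str, k: int = 5,
        lambda_diversity: float = 0.3,
    ) -> list[Example]:
        """MMR-style selection: Maximal Marginal Relevance."""
        query_emb = self.embed_fn(query)
        candidates = list(self.examples)
        selected = []

        # Never ask for more examples than the pool holds
        for _ in range(min(k, len(candidates))):
            best_score = -float("inf")
            best_idx = -1

            for i, cand in enumerate(candidates):
                # Relevance to query
                relevance = cosine_sim(query_emb, cand.embedding)

                # Max similarity to already selected (redundancy)
                if selected:
                    redundancy = max(
                        cosine_sim(cand.embedding, s.embedding)
                        for s in selected
                    )
                else:
                    redundancy = 0.0

                # MMR score: balance relevance and diversity
                score = (1 - lambda_diversity) * relevance - \
                        lambda_diversity * redundancy

                if score > best_score:
                    best_score = score
                    best_idx = i

            selected.append(candidates.pop(best_idx))

        return selected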
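
To see what the diversity weight does, it helps to run MMR on a tiny hand-built case. The sketch below restates the scoring rule over raw vectors (the function names and the 2-D vectors are made up for illustration): candidates 0 and 1 are near-duplicates that are both very relevant, while candidate 2 is less relevant but points in a different direction.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def mmr_indices(
    query: np.ndarray, candidates: list[np.ndarray],
    k: int, lambda_diversity: float,
) -> list[int]:
    """Greedy MMR: repeatedly pick the candidate with the best trade-off score."""
    remaining = list(range(len(candidates)))
    chosen: list[int] = []
    while remaining and len(chosen) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already chosen
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in chosen),
                default=0.0,
            )
            return (1 - lambda_diversity) * relevance - lambda_diversity * redundancy

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


query = np.array([1.0, 0.0])
candidates = [
    np.array([1.0, 0.10]),   # very relevant
    np.array([1.0, 0.12]),   # near-duplicate of candidate 0
    np.array([0.6, -0.8]),   # less relevant, but different
]
pure_similarity = mmr_indices(query, candidates, k=2, lambda_diversity=0.0)
with_diversity = mmr_indices(query, candidates, k=2, lambda_diversity=0.5)
```

With the weight at 0.0 the two near-duplicates are returned; at 0.5 the second slot goes to the dissimilar candidate instead. In this toy case a weight of 0.3 would still keep the duplicate, which suggests the weight is worth tuning per task rather than fixing in advance.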

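3. Dynamic Few-Shot Construction

3.1 Dynamic vs. Static

Dimension	Static Few-Shot	Dynamic Few-Shot
Example selection	Fixed when the prompt is written	Chosen at runtime for each input
Adaptability	Low	High
Token efficiency	Low (generic examples)	High (targeted examples)
Implementation complexity	Low	Medium
Effectiveness	Moderate	Markedly better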

3.2 A Complete Dynamic Few-Shot System

from typing import Callable

class DynamicFewShotPrompt:
    """Build prompts with dynamically selected examples."""

    def __init__(
        self,
        system_prompt: str,
        example_pool: list[Example],
        selector: Callable,
        formatter: Callable,
        max_examples: int = 5,
        max_tokens: int = 3000,  # Token budget for examples
    ):
        self.system_prompt = system_prompt
        self.example_pool = example_pool
        self.selector = selector
        self.formatter = formatter
        self.max_examples = max_examples
        self.max_tokens = max_tokens

    def build(self, query: str) -> list[dict]:
        """Build complete prompt with dynamically selected examples."""
        # Select relevant examples
        examples = self.selector(query, k=self.max_examples)

        # Fit within token budget
        examples = self._fit_token_budget(examples)

        # Format into messages
        messages = [{"role": "system", "content": self.system_prompt}]

        for ex in examples:
            messages.append({"role": "user", "content": ex.input})
            messages.append({"role": "assistant", "content": ex.output})

        messages.append({"role": "user", "content": query})
        return messages

    def _fit_token_budget(self, examples: list[Example]) -> list[Example]:
        """Trim examples to fit within token budget."""
        fitted = []
        total_tokens = 0

        for ex in examples:
            ex_tokens = estimate_tokens(
                self.formatter(ex.input, ex.output)
            )
            if total_tokens + ex_tokens > self.max_tokens:
                break
            fitted.append(ex)
            total_tokens += ex_tokens

        return fitted

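# Usage (labeled_examples, embed_fn, and estimate_tokens are assumed to be defined elsewhere)
import asyncio

from openai import AsyncOpenAI

prompt_builder = DynamicFewShotPrompt(
    system_prompt="You are a sentiment classifier. Output: positive/negative/neutral",
    example_pool=labeled_examples,
    selector=SimilaritySelector(labeled_examples, embed_fn).select,
    formatter=lambda inp, out: f"Input: {inp}\nOutput: {out}",
)

async def main() -> None:
    client = AsyncOpenAI()  # Reads OPENAI_API_KEY from the environment
    messages = prompt_builder.build("This product is amazing but overpriced")
    response = await client.chat.completions.create(
        model="gpt-4o-mini", messages=messages,
    )
    print(response.choices[0].message.content)

asyncio.run(main())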
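
The budget check in `_fit_token_budget` relies on an `estimate_tokens` helper that the listing does not define. A minimal sketch of one possible stand-in follows. It is a character-count heuristic, not a tokenizer: the ratios (about one token per CJK character, about four other characters per token) are rough assumptions, and a real system would more likely count with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from character counts (heuristic, not exact)."""
    # Characters in the main CJK Unified Ideographs block
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    # Assumed ratios: 1 token per CJK character, 4 other characters per token
    return cjk + (other + 3) // 4


english = estimate_tokens("Input: great battery\nOutput: positive")
chinese = estimate_tokens("发票")
```

Because the estimate is approximate, it seems prudent to keep `max_tokens` comfortably below the model's real limit.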

4. In-Context Learning Theory

4.1 How ICL Works

In-Context Learning Mechanism (simplified)

Input to model:
  [System] You classify sentiment.
  [User]   "Great movie!" -> positive
  [User]   "Terrible food" -> negative
  [User]   "It was okay" -> ?

What happens internally:
  1. Attention mechanism identifies pattern:
     Input text -> sentiment label
  2. Model forms implicit "task vector" from examples
  3. Task vector guides generation for new input
  4. Output: "neutral"

Key insight: The model is NOT learning new weights.
One common interpretation is that it performs approximate
Bayesian inference over possible input-output mappings.
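
The Bayesian reading can be written out explicitly. The formulation below is an illustrative addition rather than something stated in the original article: it assumes a latent task variable $\theta$ and demonstrations $D = \{(x_i, y_i)\}_{i=1}^{N}$, and it describes one theoretical interpretation of ICL, not a verified account of what the network computes.

```latex
p(y \mid x, D) = \sum_{\theta} p(y \mid x, \theta)\, p(\theta \mid D),
\qquad
p(\theta \mid D) \propto p(\theta) \prod_{i=1}^{N} p(y_i \mid x_i, \theta)
```

Under this view each additional demonstration sharpens the posterior over candidate tasks while the weights stay fixed.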

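4.2 Factors That Affect ICL Performance

Factor	Effect	Recommendation
Number of examples	3-5 usually works best; diminishing returns beyond 10	Default to 3-5
Example quality	High quality beats high quantity	Prioritize human review
Example order	The last examples have the most influence (recency effect)	Put the most relevant last
Input-output format consistency	Inconsistent formatting severely degrades results	Enforce one strict format
Label distribution	An imbalanced distribution biases predictions	Sample each class in equal proportion
Example relevance	Relevant examples far outperform random ones	Use similarity selection
Model size	Larger models have stronger ICL ability	Give smaller models more examples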
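
Two of these recommendations, balanced labels and most-relevant-last ordering, can be combined in one small post-processing step. The sketch below is one possible way to do it, assuming each candidate arrives as a `(similarity, input, label)` tuple; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def balanced_recency_order(
    scored: list[tuple[float, str, str]], k: int
) -> list[tuple[float, str, str]]:
    """Pick up to k items round-robin across labels, then put the best last."""
    by_label = defaultdict(list)
    for item in sorted(scored, key=lambda t: t[0], reverse=True):
        by_label[item[2]].append(item)

    picked = []
    # Round-robin over labels so each class contributes roughly equally
    for round_items in zip_longest(*by_label.values()):
        for item in round_items:
            if item is not None and len(picked) < k:
                picked.append(item)

    # Least similar first, most similar last (recency effect)
    return sorted(picked, key=lambda t: t[0])


scored = [
    (0.9, "Loved it", "positive"),
    (0.8, "Fantastic", "positive"),
    (0.7, "Superb", "positive"),
    (0.6, "Awful", "negative"),
    (0.3, "Meh", "neutral"),
]
ordered = balanced_recency_order(scored, k=3)
```

Plain top-3 by similarity would return three positive examples here; the round-robin step trades some similarity for one example per class.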

5. Advanced Few-Shot Patterns

5.1 Chain-of-Thought Few-Shot

# Standard Few-Shot: Input -> Output
standard_example = Example(
    input="If a train travels 60 km/h for 2.5 hours, how far does it go?",
    output="150 km",
)

# CoT Few-Shot: Input -> Reasoning -> Output
cot_example = Example(
    input="If a train travels 60 km/h for 2.5 hours, how far does it go?",
    output="""Let me think step by step:
1. Speed = 60 km/h
2. Time = 2.5 hours
3. Distance = Speed x Time = 60 x 2.5 = 150 km

The train travels 150 km.""",
)

# CoT significantly improves reasoning accuracy
# Especially effective for: math, logic, multi-step problems

5.2 Self-Consistency with Few-Shot

import asyncio
from collections import Counter

from openai import AsyncOpenAI

client = AsyncOpenAI()  # Reads OPENAI_API_KEY from the environment

async def self_consistent_few_shot(
    query: str, examples: list[Example],
    n_samples: int = 5, temperature: float = 0.7,
) -> str:
    """Generate multiple answers and vote on the most common."""
    # build_few_shot_messages and extract_final_answer are assumed helpers
    messages = build_few_shot_messages(examples, query)

    async def sample_once() -> str:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=temperature,
        )
        return extract_final_answer(response.choices[0].message.content)

    # Sample concurrently; temperature > 0 lets the reasoning paths differ
    responses = await asyncio.gather(*(sample_once() for _ in range(n_samples)))

    # Majority vote
    votes = Counter(responses)
    best_answer, count = votes.most_common(1)[0]

    return best_answer  # Confidence = count / n_samples
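
The voting step can be exercised offline, without any API calls. The sketch below assumes a simple answer extractor (the last number in the response, falling back to the last line); this `extract_final_answer` is a hypothetical stand-in for the helper used in the listing, and real tasks usually need a task-specific extractor.

```python
import re
from collections import Counter


def extract_final_answer(text: str) -> str:
    """Hypothetical extractor: last number in the text, else the last line."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    if numbers:
        return numbers[-1]
    lines = text.strip().splitlines()
    return lines[-1].strip().lower() if lines else ""


def majority_vote(responses: list[str]) -> tuple[str, float]:
    """Return the most common extracted answer and its vote share."""
    if not responses:
        raise ValueError("need at least one response to vote on")
    answers = [extract_final_answer(r) for r in responses]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / len(answers)


samples = [
    "Distance = 60 x 2.5 = 150 km. The train travels 150 km.",
    "60 * 2.5 = 150, so the answer is 150",
    "Roughly 2 hours at 60 km/h gives 120",
]
answer, confidence = majority_vote(samples)
```

Two of the three sampled reasoning paths end at 150, so the vote returns "150" with a vote share of two thirds, the same quantity the listing's closing comment calls confidence.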

5.3 Many-Shot ICL (the Long-Context Era)

# With 128K+ context windows, we can now do Many-Shot ICL
# Research shows: performance continues to improve with 100+ examples

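class ManyShotBuilder:
    """Leverage long context for many-shot in-context learning."""

    def __init__(self, system_prompt, diversity_selector, similarity_selector,
                 token_budget: int = 100_000):
        self.system_prompt = system_prompt
        self.diversity_selector = diversity_selector
        self.similarity_selector = similarity_selector
        self.token_budget = token_budget  # Leaves room for output

    def build(self, query: str, k: int = 100, n_similar: int = 5) -> list[dict]:
        """Build prompt with up to k examples."""
        # The most similar examples go last (recency bias)
        similar = self.similarity_selector.select(query, k=n_similar)
        similar_ids = {id(ex) for ex in similar}

        # Fill the remaining slots with diverse examples, skipping duplicates
        diverse = [
            ex for ex in self.diversity_selector.select(query, k=k)
            if id(ex) not in similar_ids
        ][: max(k - len(similar), 0)]

        # Reserve tokens for the similar examples first, so a tight budget
        # trims the diverse examples rather than the most relevant ones
        total = estimate_tokens(self.system_prompt) + estimate_tokens(query)
        total += sum(estimate_tokens(format_example(ex)) for ex in similar)

        fitted_diverse = []
        for ex in diverse:
            ex_tokens = estimate_tokens(format_example(ex))
            if total + ex_tokens > self.token_budget:
                break
            fitted_diverse.append(ex)
            total += ex_tokens

        # The selector returns most similar first; reverse so it ends up last.
        # _format_messages is assumed to be defined elsewhere.
        return self._format_messages(fitted_diverse + similar[::-1], query)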

6. Evaluation and Optimization

6.1 Evaluating Few-Shot Effectiveness

import asyncio
from typing import Callable

from openai import AsyncOpenAI

client = AsyncOpenAI()  # Reads OPENAI_API_KEY from the environment

async def evaluate_few_shot_strategy(
    strategies: dict[str, Callable],
    test_set: list[dict],
    model: str = "gpt-4o-mini",
) -> dict:
    """Compare different few-shot strategies."""
    # build_messages, evaluate_prediction, avg_example_count, and
    # avg_token_usage are helpers assumed to be defined elsewhere
    results = {}

    for name, strategy in strategies.items():
        scores = []
        for sample in test_set:
            examples = strategy(sample["input"])
            messages = build_messages(examples, sample["input"])

            response = await client.chat.completions.create(
                model=model, messages=messages, temperature=0,
            )
            prediction = response.choices[0].message.content
            score = evaluate_prediction(prediction, sample["expected"])
            scores.append(score)

        results[name] = {
            # Guard against an empty test set
            "accuracy": sum(scores) / len(scores) if scores else 0.0,
            "avg_examples": avg_example_count(strategy, test_set),
            "avg_tokens": avg_token_usage(strategy, test_set),
        }

    return results

# Compare strategies (the selector objects and test_data are assumed to exist)
strategies = {
    "random_3": lambda q: random_selector.select(q, k=3),
    "similar_3": lambda q: similarity_selector.select(q, k=3),
    "diverse_5": lambda q: diversity_selector.select(q, k=5),
    "mmr_5": lambda q: mmr_selector.select(q, k=5),
    "many_shot_50": lambda q: many_shot_selector.select(q, k=50),
}

results = asyncio.run(evaluate_few_shot_strategy(strategies, test_data))
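
The evaluation loop depends on an `evaluate_prediction` scorer that the listing leaves undefined. For classification-style outputs, a normalized exact match is one reasonable starting point. The sketch below is a hypothetical stand-in, not the author's scorer; generative or JSON outputs would need a different comparison.

```python
def evaluate_prediction(prediction: str, expected: str) -> float:
    """Score 1.0 on a case- and whitespace-insensitive exact match, else 0.0."""
    def normalize(text: str) -> str:
        return " ".join(text.strip().lower().split())

    return 1.0 if normalize(prediction) == normalize(expected) else 0.0


def accuracy(predictions: list[str], expected: list[str]) -> float:
    """Mean score over paired predictions and labels (0.0 for empty input)."""
    if not expected:
        return 0.0
    scores = [evaluate_prediction(p, e) for p, e in zip(predictions, expected)]
    return sum(scores) / len(expected)


demo_accuracy = accuracy(["Positive ", "neutral"], ["positive", "negative"])
```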

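6.2 Optimization Dimensions

Dimension	What to tune	Tooling
Number of examples	Search for the optimum in the 3-50 range	Grid search
Selection strategy	Similarity vs. diversity weighting	A/B testing
Example quality	Human review + automated filtering	LLM-as-judge
Ordering	Ascending vs. descending relevance	Ablation study
Format design	Concise vs. detailed format	A/B testing
Token efficiency	Compress examples while preserving information	Automatic summarization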
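
The grid search over example counts can be sketched as follows. Since more examples cost more tokens, this version picks the smallest count whose score is within a tolerance of the best. The function name is hypothetical, and `toy_score` is a synthetic saturating curve used only so the example runs; in practice the scores would come from an evaluation run such as the one in 6.1.

```python
from typing import Callable, Iterable


def grid_search_k(
    candidate_ks: Iterable[int],
    score_fn: Callable[[int], float],
    tolerance: float = 0.01,
) -> tuple[int, dict[int, float]]:
    """Return the smallest k scoring within `tolerance` of the best, plus all scores."""
    scores = {k: score_fn(k) for k in candidate_ks}
    best = max(scores.values())
    chosen = min(k for k, s in scores.items() if s >= best - tolerance)
    return chosen, scores


def toy_score(k: int) -> float:
    """Synthetic curve that stops improving after 8 examples (not real data)."""
    return min(k, 8) / 10


best_k, scores = grid_search_k([3, 5, 10, 20, 50], toy_score)
```

On the synthetic curve the counts 10, 20, and 50 all score the same, so the search settles on 10, the cheapest of the three.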

7. Case Study

7.1 Chinese Invoice Classification

# Example pool for Chinese invoice classification
# Each input reads "invoice type / seller / item / amount"; the Chinese strings are task data
invoice_examples = [
    Example(
        input="增值税专用发票 / 广州某科技有限公司 / 办公用品 / 5000元",
        output='{"category": "办公费用", "tax_type": "增值税专用", "deductible": true}',
    ),
    Example(
        input="增值税普通发票 / 某餐饮管理公司 / 餐饮服务 / 800元",
        output='{"category": "业务招待费", "tax_type": "增值税普通", "deductible": false}',
    ),
    Example(
        input="增值税电子普通发票 / 中国石化 / 汽油 / 500元",
        output='{"category": "交通费用", "tax_type": "增值税电子普通", "deductible": false}',
    ),
    # ... 50+ examples covering all categories
]

# Dynamic selection ensures the most relevant examples are used
classifier = DynamicFewShotPrompt(
    system_prompt="你是一个发票分类助手。根据发票信息输出JSON分类结果。",
    example_pool=invoice_examples,
    selector=SimilaritySelector(invoice_examples, embed_fn).select,
    formatter=lambda i, o: f"发票：{i}\n分类：{o}",
    max_examples=5,
)
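
Because the classifier is expected to answer in JSON, it is worth validating the response before using it downstream. A minimal sketch, assuming the three fields shown in the examples above are all required; the function name and the error messages are hypothetical.

```python
import json

# Field names and types taken from the example outputs above
REQUIRED_FIELDS = {"category": str, "tax_type": str, "deductible": bool}


def parse_invoice_result(raw: str) -> dict:
    """Parse the model output and check the expected fields and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"missing or mistyped field: {key}")
    return data


result = parse_invoice_result(
    '{"category": "办公费用", "tax_type": "增值税专用", "deductible": true}'
)
```

A failed validation is a natural trigger for a retry or for routing the invoice to manual review.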

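8. Summary

Few-Shot Learning is one of the most practical and most underrated techniques in LLM applications. Dynamic example selection can improve accuracy by 15-30% over static examples, and Many-Shot strategies in the long-context era raise the ceiling further.

Core recommendations:

Always use dynamic selection: an MMR strategy that combines similarity and diversity is the best starting point
Quality over quantity: 5 high-quality examples beat 20 low-quality ones
CoT is indispensable: for reasoning tasks, the examples must demonstrate the chain of thought
Let evaluation drive optimization: use automated evaluation to find the best example count and selection strategy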

Maurice | [email protected]
