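NLP: Chat Template Usage Guide

Keywords: chat_template, chat_chat_template, chat template

Reference links:
- [Official] tokenizer usage guide

Overview

Chat Templates exist to adapt conversation formatting to different models
In theory each Chat Template is bound one-to-one to a model, but models in the same series often share the same template
A Chat Template is usually defined in the chat_template field of the tokenizer_config.json file
The function that applies a Chat Template is apply_chat_template, a core method of tokenizers in the Hugging Face Transformers library (for example, a tokenizer loaded via AutoTokenizer)
This article mainly describes the usage of the apply_chat_template function in detail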

apply_chat_template function signature

Full signature:

def apply_chat_template(
    self,
    conversation: Union[list[dict[str, str]], list[list[dict[str, str]]]],
    tools: Optional[list[Union[dict, Callable]]] = None,
    documents: Optional[list[dict[str, str]]] = None,
    chat_template: Optional[str] = None,
    add_generation_prompt: bool = False,
    continue_final_message: bool = False,
    tokenize: bool = True,
    padding: Union[bool, str, PaddingStrategy] = False,
    truncation: bool = False,
    max_length: Optional[int] = None,
    return_tensors: Optional[Union[str, TensorType]] = None,
    return_dict: bool = False,
    return_assistant_tokens_mask: bool = False,
    tokenizer_kwargs: Optional[dict[str, Any]] = None,
    **kwargs,
) -> Union[str, list[int], list[str], list[list[int]], BatchEncoding]:

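apply_chat_template core parameters in detail

conversation (required)

The conversation data: either a single conversation (a list of messages) or a batch of conversations (a list of such lists). Every message must contain the "role" and "content" keys
Parameter type: Union[list[dict[str, str]], list[list[dict[str, str]]]]

Simple reference example:

# Single-turn conversation
conversation = [
    {"role": "user", "content": "Hello, what is the weather like today?"},
    {"role": "assistant", "content": "Hello! I need to know your location to give accurate weather information."}
]

# Multi-turn conversation
conversation = [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "Explain quantum computing"},
    {"role": "assistant", "content": "Quantum computing is a technology that uses quantum-mechanical phenomena to perform computation..."},
    {"role": "user", "content": "How does it differ from classical computing?"}
]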

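tokenize

Controls whether the output is tokenized (a list of token IDs) or returned as a text string
Parameter type: bool, default True
- When the result is passed directly to the model, keep it as True;
- When you want to inspect the formatted text or do custom processing, set it to False

Simple reference example:

# Return the formatted text
text_output = tokenizer.apply_chat_template(
    conversation,
    tokenize=False
)
print(text_output)  # e.g. <|user|>Hello!<|end|>

# Return token IDs
token_output = tokenizer.apply_chat_template(
    conversation,
    tokenize=True
)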

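add_generation_prompt

Whether to append the prompt that opens an assistant reply to the end of the formatted output
Parameter type: bool, default False
This is the key parameter for making the model generate an assistant reply correctly
- When set to True, the special tokens that mark the start of an assistant reply are appended to the end of the conversation

Simple reference example:

# Without the generation prompt
output_without = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=False,
    tokenize=False
)
# Output: <|user|>Hello!<|end|>

# With the generation prompt
output_with = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=False
)
# Output: <|user|>Hello!<|end|><|assistant|>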
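
To make the effect of add_generation_prompt concrete without loading a model, here is a minimal pure-Python sketch that imitates a ChatML-style format. The render_chatml helper and the marker strings are assumptions made for illustration only; the real method renders the model's own Jinja template instead.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hypothetical stand-in for a ChatML-style chat template."""
    parts = []
    for message in messages:
        # Each message is wrapped in start/end markers together with its role.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [{"role": "user", "content": "Hello!"}]
without_prompt = render_chatml(conversation, add_generation_prompt=False)
with_prompt = render_chatml(conversation, add_generation_prompt=True)
print(repr(without_prompt))
print(repr(with_prompt))
```

The only difference between the two strings is the trailing assistant header, which is what tells the model that the next tokens belong to an assistant reply.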

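continue_final_message

Whether to continue the final message instead of starting a new one
Parameter type: bool, default False
Used to "prefill" the model's reply, i.e. when you want the model to continue an existing assistant message
- Cannot be used together with add_generation_prompt

Simple reference example:

# Prefill the assistant reply
conversation = [
    {"role": "user", "content": "Please answer in JSON format"},
    {"role": "assistant", "content": '{"answer": "'}
]

# The model will continue this JSON instead of starting a new message
formatted = tokenizer.apply_chat_template(
    conversation,
    continue_final_message=True,
    tokenize=False  # return text so the effect is visible
)
# With continue_final_message=True, no end-of-turn token is appended:
# [Round 0] USER:Please answer in JSON format ASSISTANT:{"answer": "
# With continue_final_message=False (default), the end-of-turn token is appended, marking the turn as finished:
# [Round 0] USER:Please answer in JSON format ASSISTANT:{"answer": "</longcat_s>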
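
One way to picture continue_final_message is as "render the conversation normally, then cut everything after the final message's content". The sketch below imitates that idea in pure Python; render_simple, its ROLE:content layout, and the <|end|> token are hypothetical and not taken from any real template.

```python
def render_simple(messages, end_token="<|end|>"):
    """Hypothetical template: 'ROLE:content' followed by an end token per message."""
    return "".join(
        f"{m['role'].upper()}:{m['content']}{end_token}" for m in messages
    )


def render_with_continuation(messages, continue_final_message=False):
    rendered = render_simple(messages)
    if continue_final_message:
        final_content = messages[-1]["content"]
        # Keep the text up to the end of the last occurrence of the final
        # message's content; this drops the end token and leaves the turn open.
        cut = rendered.rindex(final_content) + len(final_content)
        rendered = rendered[:cut]
    return rendered


conversation = [
    {"role": "user", "content": "Please answer in JSON format"},
    {"role": "assistant", "content": '{"answer": "'},
]
open_turn = render_with_continuation(conversation, continue_final_message=True)
closed_turn = render_with_continuation(conversation, continue_final_message=False)
print(open_turn)
print(closed_turn)
```

Because the open variant ends exactly at the prefilled text, whatever the model generates next is a continuation of the same assistant message.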

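chat_template

A custom Jinja2 template string
Parameter type: Optional[str], default None
Use it when you need a non-default template or want to test a new template format

Simple reference example: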

custom_template = """
{% for message in messages %}
    {% if message['role'] == 'user' %}
        Human: {{ message['content'] }}
    {% elif message['role'] == 'assistant' %}
        Assistant: {{ message['content'] }}
    {% endif %}
{% endfor %}
"""

output = tokenizer.apply_chat_template(
    conversation,
    chat_template=custom_template,
    tokenize=False
)

tools

A list of tools, used for function-calling scenarios
Parameter type: Optional[list[Union[dict, Callable]]], default None
Each tool should be described in JSON Schema format, including its name, description, and parameter types

Simple reference example:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

formatted = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    add_generation_prompt=True
)

The function above is then invoked as, for example, get_weather(**call_info["arguments"])
- Here call_info["arguments"] is a dict that must contain "location"
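documents (not yet tried)

A list of documents, used for RAG (retrieval-augmented generation) scenarios
Parameter type: Optional[list[dict[str, str]]], default None
- The recommended format is for each document to contain the "title" and "text" keys

Simple reference example:

documents = [
    {
        "title": "Introduction to quantum computing",
        "text": "Quantum computing is a technology that uses quantum-mechanical phenomena to perform computation..."
    },
    {
        "title": "Qubits",
        "text": "The qubit is the basic unit of quantum computing..."
    }
]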

formatted = tokenizer.apply_chat_template(
    conversation,
    documents=documents,
    add_generation_prompt=True
)

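Output control parameters

return_tensors

Specifies the tensor type to return; return_tensors only takes effect when tokenize=True
To support several frameworks, the method can return framework-specific tensors directly, which saves the caller a separate type conversion
Parameter type: Optional[Union[str, TensorType]]
- Default is None, in which case native Python data structures are returned
- Accepted values: 'pt' (PyTorch), 'tf' (TensorFlow), 'np' (NumPy), 'jax' (JAX)
How the return type is determined:
- If tokenize=False, the return value is always a native Python data structure (a string, or a list of strings for a batch)
- If tokenize=True, the return type depends on return_tensors
  - None (default): native Python data structures
  - Otherwise: the tensor type specified above

Simple reference example: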

# PyTorch tensors
pytorch_output = tokenizer.apply_chat_template(
    conversation,
    return_tensors="pt",
    add_generation_prompt=True
)

# NumPy arrays
numpy_output = tokenizer.apply_chat_template(
    conversation,
    return_tensors="np",
    add_generation_prompt=True
)

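return_dict

Whether to return a dict with multiple fields; only valid when tokenize=True
Parameter type: bool, default False
When return_dict is set to True, apply_chat_template returns a dict-like result instead of the default bare token IDs
- Default (return_dict=False): with tokenize=True, only input_ids is returned (with return_tensors set and padding enabled, this is a tensor of shape [batch_size, seq_len])
- return_dict=True: returns a dict whose keys correspond to the model's input fields, so the pieces do not have to be told apart by hand
The returned output contains the following keys (the exact set depends on the arguments):
- "input_ids": the token IDs of the rendered conversation (the core model input)
- "attention_mask": the attention mask (marks which tokens are real text and which are padding, so the model does not attend to padding)
- "assistant_masks": present only when return_assistant_tokens_mask=True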

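return_assistant_tokens_mask

Whether to return a mask of assistant-generated tokens: assistant tokens get mask value 1, user and system tokens get 0
Parameter type: bool, default False
When return_assistant_tokens_mask=True, the result contains an extra assistant_masks field
- Only takes effect when tokenize=True and return_dict=True; otherwise a ValueError is raised, for example with tokenize=False:
  ValueError: `return_dict=True` is incompatible with `tokenize=False`, because there is no dict of tokenizer outputs to return.

Only chat templates that contain the {% generation %} ... {% endgeneration %} keywords are supported

These keywords do not affect normal rendering; they only mark the assistant content
Typically the markers wrap the entire assistant content, that is, everything the model should learn, including tool calls

Note: plain Jinja does not accept arbitrary custom tags like this, but apply_chat_template handles this tag specially, so it is not a problem

Verification: adding an unknown tag such as {% generationa %} ... {% endgenerationa %} produces the following error:

jinja2.exceptions.TemplateSyntaxError: Encountered unknown tag 'generationa'. Jinja was looking for the following tags: 'elif' or 'else' or 'endif'. The innermost block that needs to be closed is 'if'.

Tested in practice: return_assistant_tokens_mask=True places a requirement on the chat_template (it must contain {% generation %} to mark the assistant spans). Most current chat templates do not support it, in which case everything is masked out (the returned assistant_masks field is all 0)
- The warning reads:
  return_assistant_tokens_mask==True but chat template does not contain `{% generation %}` keyword.
Common use: training or analysis, to identify which tokens were generated by the assistant

Simple reference example: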

output = tokenizer.apply_chat_template(
    conversation,
    return_dict=True,
    tokenize=True,
    return_assistant_tokens_mask=True,
    add_generation_prompt=True
)
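# The output includes the assistant token mask

The returned output contains an extra assistant_masks field of type list; positions with value 1 are assistant tokens
Note that the tokens marked 1 may include some extra custom tokens, which then need manual handling
- For example, a <USER> token inserted in front of the assistant message is usually included
- In that case, identify and remove the relevant special tokens yourself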
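
For the training use case, a common follow-up step is to turn the mask into loss labels so that only assistant tokens contribute to the loss. The sketch below assumes the convention of marking ignored positions with -100 (the default ignore_index of PyTorch's cross-entropy loss); build_labels and the token IDs are made up for illustration.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by PyTorch cross-entropy


def build_labels(input_ids, assistant_masks, ignore_index=IGNORE_INDEX):
    """Keep token IDs where the mask is 1 and replace the rest with ignore_index."""
    if len(input_ids) != len(assistant_masks):
        raise ValueError("input_ids and assistant_masks must have the same length")
    return [
        token_id if mask == 1 else ignore_index
        for token_id, mask in zip(input_ids, assistant_masks)
    ]


# Made-up values standing in for output["input_ids"] and output["assistant_masks"].
input_ids = [101, 7592, 102, 2023, 2003, 103]
assistant_masks = [0, 0, 0, 1, 1, 1]
labels = build_labels(input_ids, assistant_masks)
print(labels)
```

Any extra special tokens that ended up with mask value 1 can be handled in the same pass, for example by setting their mask entries to 0 before building the labels.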

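Sequence processing parameters

padding

Specifies the padding strategy
Parameter type: Union[bool, str, PaddingStrategy], default False
Only takes effect when tokenize=True
- The documentation does not state this explicitly, but testing shows that no padding is applied when tokenize=False
Accepted values:
- True or 'longest': pad to the longest sequence in the batch
- 'max_length': pad to a fixed maximum length, given by the max_length parameter
- False (default) or 'do_not_pad': no padding

Simple reference example:

# Pad to the longest sequence
padded_output = tokenizer.apply_chat_template(
    batch_conversations,
    padding=True,
    tokenize=True,
    return_tensors="pt"
)

# Pad to the maximum length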
max_length_output = tokenizer.apply_chat_template(
    conversation,
    padding="max_length",
    max_length=512,
    tokenize=True,
    return_tensors="pt"
)

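Supplement: the `padding_side` attribute selects left or right padding

The apply_chat_template method itself has no parameter for choosing left padding or right padding
The padding direction is determined by the tokenizer's overall configuration (specifically padding_side), not by apply_chat_template
- To set the padding direction, configure it when initializing the tokenizer or through the tokenizer.padding_side attribute
- padding_side accepts "right" (default) or "left"
Note: apply_chat_template mainly formats the conversation history into the input format the model expects; it calls the tokenizer's encoding logic, and encoding follows the padding_side already set on the tokenizer

Example code:

from transformers import AutoTokenizer

# Option 1: set the padding direction when initializing the tokenizer
tokenizer = AutoTokenizer.from_pretrained("model_name", padding_side="left")

# Option 2: set (or override) it on an existing tokenizer; this later assignment is the one in effect
tokenizer.padding_side = "right"

# Applying the chat template follows the padding configuration above
chat = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]
inputs = tokenizer.apply_chat_template(chat, tokenize=True, return_tensors="pt", padding=True)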
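
The difference between the two padding_side values is easiest to see on raw token-ID lists. The following pure-Python sketch pads a small batch both ways and builds the matching attention masks; pad_batch and the pad token ID 0 are assumptions for illustration, not the tokenizer's internal implementation.

```python
def pad_batch(sequences, pad_token_id=0, padding_side="right"):
    """Pad token-ID lists to the longest length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = [pad_token_id] * (max_len - len(seq))
        pad_mask = [0] * len(pad)   # 0 marks padding positions
        real_mask = [1] * len(seq)  # 1 marks real tokens
        if padding_side == "left":
            input_ids.append(pad + seq)
            attention_mask.append(pad_mask + real_mask)
        elif padding_side == "right":
            input_ids.append(seq + pad)
            attention_mask.append(real_mask + pad_mask)
        else:
            raise ValueError("padding_side must be 'left' or 'right'")
    return input_ids, attention_mask


batch = [[11, 12, 13], [21]]
right_ids, right_mask = pad_batch(batch, padding_side="right")
left_ids, left_mask = pad_batch(batch, padding_side="left")
print(right_ids, right_mask)
print(left_ids, left_mask)
```

With left padding every sequence ends on a real token, which is why it is the usual choice for batched generation with decoder-only models.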

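truncation

Whether to truncate sequences that exceed the maximum length
- Enable it when handling long conversations to avoid exceeding the model's maximum length
Parameter type: bool, default False

Supplement: the `truncation_side` attribute selects left or right truncation

It is used in the same way as padding_side
truncation_side is an attribute of the tokenizer class and accepts:
- "left": truncate from the left (the beginning) of the sequence
- "right" (default): truncate from the right (the end) of the sequence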
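
The effect of the two truncation_side values can be shown on a plain list of token IDs. This is a minimal sketch with a hypothetical truncate helper, not the tokenizer's own code.

```python
def truncate(token_ids, max_length, truncation_side="right"):
    """Cut a token-ID list down to max_length from the chosen side."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if len(token_ids) <= max_length:
        return list(token_ids)
    if truncation_side == "right":
        # Keep the beginning, drop the end.
        return list(token_ids[:max_length])
    if truncation_side == "left":
        # Keep the end, drop the beginning.
        return list(token_ids[-max_length:])
    raise ValueError("truncation_side must be 'left' or 'right'")


tokens = [1, 2, 3, 4, 5, 6]
kept_right = truncate(tokens, 4, truncation_side="right")
kept_left = truncate(tokens, 4, truncation_side="left")
print(kept_right)
print(kept_left)
```

For chat inputs the choice matters: truncating on the right can remove the latest user turn or the generation prompt, while truncating on the left removes the earliest context such as the system prompt.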

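max_length

The maximum length limit (in tokens), used together with padding or truncation
Parameter type: Optional[int], default None

tokenizer_kwargs

Extra arguments passed to the tokenizer; type Optional[dict[str, Any]]

**kwargs

Extra arguments passed to the template renderer; they can be accessed inside the chat template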

Appendix: complete usage examples

Basic conversation generation

A simple conversation example:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_id = "xxx/xxx"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prepare the conversation
messages = [
    {"role": "system", "content": "You are a friendly AI assistant"},
    {"role": "user", "content": "Please explain the basic concepts of machine learning"},
]

# Format the conversation and move the token IDs to the model's device
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate the reply
with torch.no_grad():
    outputs = model.generate(
        tokenized_chat,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True
    )

# Decode the reply (the decoded text includes the prompt as well as the new tokens)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

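Batch processing

Processing several conversations at once

# Batch of conversations
batch_conversations = [
    [
        {"role": "user", "content": "What is artificial intelligence?"}
    ],
    [
        {"role": "user", "content": "How do I learn programming?"}
    ]
]

# Decoder-only models generally need left padding for batched generation
tokenizer.padding_side = "left"

# Format the batch; return_dict=True is required so the result can be unpacked into generate()
batch_output = tokenizer.apply_chat_template(
    batch_conversations,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True
).to(model.device)

# Batched generation
batch_outputs = model.generate(
    **batch_output,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7
)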

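RAG usage (to be completed)

An example of using a template for RAG

# Prepare the documents
documents = [
    {
        "title": "2025 technology trends report",
        "text": "Artificial intelligence and machine learning will continue to develop rapidly..."
    },
    {
        "title": "Progress in quantum computing",
        "text": "Quantum computers show a large advantage on specific problems..."
    }
]

# User question
conversation = [
    {"role": "user", "content": "What are the important technology trends in 2025?"}
]

# Format with the RAG template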
formatted_input = tokenizer.apply_chat_template(
    conversation,
    documents=documents,
    add_generation_prompt=True,
    return_tensors="pt"
)

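Note: most templates do not support documents; this currently includes the Llama series, the Qwen series, and others
An example of a model that supports documents: huggingface.co/CohereLabs/c4ai-command-r-v01/blob/main/tokenizer_config.json
When the chat_template does not support documents, the options include:
- Write your own Jinja2 template that inserts the documents into the system message or the first user message, then pass it via --chat-template when starting a framework such as vLLM; apply_chat_template accepts the same template through its chat_template parameter;
- Concatenate the retrieval results into a plain string outside the template, then pass it in as ordinary messages=[{"role":"user","content":"..."}]
Best practice: first find a suitable template through prompt engineering, then fix that format in a template file and use the same file for both fine-tuning and online serving. This avoids problems caused by a mismatch between fine-tuning and serving, and makes collaboration inside and outside the team easier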
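
The second fallback (concatenating retrieval results outside the template) can be sketched as below. The prompt wording, the "[Document N]" layout, and the build_rag_messages helper are assumptions chosen for illustration; a real project would settle on its own format through prompt engineering.

```python
def build_rag_messages(question, documents):
    """Prepend retrieved documents to the user question as plain text."""
    doc_blocks = [
        f"[Document {index}] {doc['title']}\n{doc['text']}"
        for index, doc in enumerate(documents, start=1)
    ]
    context = "\n\n".join(doc_blocks)
    content = (
        "Answer the question using the documents below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
    # The result is an ordinary single-message conversation.
    return [{"role": "user", "content": content}]


documents = [
    {"title": "2025 technology trends report", "text": "AI will continue to develop rapidly..."},
    {"title": "Progress in quantum computing", "text": "Quantum computers show an advantage..."},
]
messages = build_rag_messages("What are the important technology trends in 2025?", documents)
print(messages[0]["content"])
```

The resulting messages list can be passed to apply_chat_template like any other conversation, so it works with templates that ignore the documents argument.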

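Appendix: best practice recommendations

Using add_generation_prompt: at inference time, make sure add_generation_prompt=True is set so the model produces a proper assistant reply
Different models use different chat templates; use tokenizer.get_chat_template() to inspect the exact format
For long conversations, enable truncation=True and set a suitable max_length
In batch processing, use the padding parameter sensibly; without it the encoded results may have different lengths
Add appropriate error handling, especially for features the template does not support
Some models do not support the tools parameter; check the model documentation
Long sequences can cause memory problems; consider reducing the batch size or max_length
Make sure the conversation format is correct and every message has the role and content keys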
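
The last recommendation can be turned into a small pre-flight check. This sketch covers only what the article states (each message is a dict with role and content); validate_conversation is a hypothetical helper, and some templates may accept additional keys that it does not know about.

```python
def validate_conversation(conversation):
    """Return a list of problems; an empty list means the basic format is fine."""
    if not isinstance(conversation, list):
        return ["conversation must be a list of message dicts"]
    problems = []
    for index, message in enumerate(conversation):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not a dict")
            continue
        for key in ("role", "content"):
            if key not in message:
                problems.append(f"message {index} is missing '{key}'")
    return problems


good = [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "Explain quantum computing"},
]
bad = [{"role": "user"}, "hello"]
good_problems = validate_conversation(good)
bad_problems = validate_conversation(bad)
print(good_problems)
print(bad_problems)
```

Running such a check before apply_chat_template gives a clearer message than a template rendering error raised from inside Jinja.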

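Appendix: about tools types

In LLM tool-calling scenarios, code_interpreter and function are two different types of tools

`function` (function/tool)

A function is a predefined program function or API with a specific capability that lets the model call external functionality. The model generates a call instruction in the required format (such as JSON), which triggers the function and returns its result
- Examples: a weather query API, a database query function, a web crawler; e.g. the model calls get_weather(city="Beijing") to get live weather
How a function is called: the model must generate the call instruction strictly in the preset format (such as {"name": "function name", "parameters": {"parameter name": "value"}}) so that the call can be parsed and executed correctly, for example:
{"name": "translate", "parameters": {"text": "Hello", "target_lang": "zh"}}
function has low flexibility: its behavior is fixed and it can only perform predefined operations; but it is safer: execution is strictly limited to the preset functions, so the risk is controllable
function is suited to:
- Calling external services or systems (such as querying live data or operating hardware)
- Structured tasks (such as database queries or API calls)
- Fixed operations that need no dynamic logic (such as format conversion or simple calculation)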

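`code_interpreter` (code interpreter)

code_interpreter is a sandboxed environment that can execute code dynamically (usually Python), letting the model generate and run code to solve a problem. After the model generates the code, the interpreter runs it and returns the output (text, charts, and so on)
- Examples: mathematical calculation, data visualization, processing Excel sheets; e.g. the model generates Python code to compute 1+2+...+100 and obtains the result by running it in the code interpreter
code_interpreter is highly flexible: it supports arbitrary code logic and can solve complex, dynamic problems; but it is less safe: it runs code generated by the user or the model, so there is a risk of malicious code (usually mitigated by sandbox isolation)
With code_interpreter, the model generates a code snippet directly (usually wrapped in specific markers, such as a ```python ... ``` fence), which the interpreter parses and runs, for example:
import numpy as np
result = np.sum(np.arange(1, 101))
print(result)
code_interpreter is suited to:
- Complex logic and computation (such as statistical analysis or formula derivation)
- Data processing and visualization (such as plotting charts or processing CSV data)
- Writing short ad hoc scripts (such as batch text processing or solving equations)
To use code_interpreter, just add an entry { "type": "code_interpreter" } to tools; the chat_template detects this entry and emits usage instructions that tell the model how to write the code and that the code can be executed
- Taking LongCat-Flash-Chat/blob/main/tokenizer_config.json as an example, it first wraps code_interpreter into a function-like format and then emits everything uniformly; the net effect is that the model knows it can call code_interpreter to run code (the content of the "code" parameter is the code itself)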

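Overall comparison of `function` and `code_interpreter`

Note: in practice the two are often combined: function handles external interaction and code_interpreter handles complex computation, together extending what the model can do

Dimension	`function`	`code_interpreter`
Core capability	Calls predefined function interfaces	Executes code logic dynamically
Suitable scenarios	External service calls, structured tasks	Complex computation, data processing, script generation
Call format	Strict JSON-format instruction	Code snippet (e.g. Python)
Flexibility	Low (fixed functionality)	High (supports arbitrary logic)
Safety	High	Requires sandbox isolation; higher risk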

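Appendix: formatting a chat-template

The chat-template of most open-source models is compressed into a single line, which is hard to read
You can re-save the tokenizer with the following code
tokenizer.save_pretrained(output_model_name)
This also generates a chat_template.jinja file, whose layout is much more readable
Note: you can also ask an LLM to help with the formatting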
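
If you only need a readable copy of the template and do not want to load the tokenizer, the chat_template field can also be pulled out of tokenizer_config.json with the standard library. This is a sketch under the assumption that the field is a single string; export_chat_template, the demo config, and the output file name are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def export_chat_template(config_path, output_path):
    """Write the chat_template field of a tokenizer_config.json to its own file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    template = config["chat_template"]
    # JSON escapes such as \n were decoded by json.loads, so the file gets real line breaks.
    Path(output_path).write_text(template, encoding="utf-8")
    return template


# Demo with a tiny made-up config written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "tokenizer_config.json"
    output_path = Path(tmp_dir) / "chat_template_readable.jinja"
    demo_config = {
        "chat_template": "{%- for message in messages %}\n    {{- message['content'] }}\n{%- endfor %}\n"
    }
    config_path.write_text(json.dumps(demo_config), encoding="utf-8")
    template = export_chat_template(config_path, output_path)
    exported = output_path.read_text(encoding="utf-8")

print(exported)
```

This helps only when the template already contains escaped line breaks; a template that was written as one long line without them still needs manual or tool-assisted reformatting.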

Appendix: using the `continue` statement in a chat-template

In older versions of transformers, tokenizer.apply_chat_template does not support a continue statement inside the chat-template

If you hit an error like the one below, upgrading transformers resolves it:

jinja2.exceptions.TemplateSyntaxError: Encountered unknown tag 'continue'. Jinja was looking for the following tags: 'elif' or 'else' or 'endif'. The innermost block that needs to be closed is 'if'.

Appendix: Qwen2-72B-Instruct chat-template usage example

The chat-template of Qwen2-72B-Instruct is very simple
One point deserves special mention: when no System Prompt is supplied, Qwen2-72B-Instruct uses "You are a helpful assistant." as the default System Prompt

Original definition in Qwen2-72B-Instruct/tokenizer_config.json:

{
"add_prefix_space": false,
"added_tokens_decoder": {
    "151643": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false,
    "special": true
    },
    "151644": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false,
    "special": true
    },
    "151645": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false,
    "special": true
    }
},
"additional_special_tokens": ["<|im_start|>", "<|im_end|>"],
"bos_token": null,
"chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}

Note: the chat-template of Qwen2.5-72B-Instruct was revised: it adds support for tool calling and changes the default System Prompt to "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

{
...,
"chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. 
You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
...,
}

Qwen2-72B chat-template usage example:

from transformers import AutoTokenizer

model_name = "/Users/xxx/llm/model/Qwen2-72B-Instruct"

# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {
        "role": "system",
        "content": "${system_prompt}"
    },
    {
        "role": "user",
        "content": "${user_round_0}"
    },
    {
        "role": "assistant",
        "content": "${assistant_round_0}"
    },
    {
        "role": "user",
        "content": "${user_round_1}"
    },
    {
        "role": "assistant",
        "content": "${assistant_round_1}"
    },
    {
        "role": "user",
        "content": "${user_round_2}"
    }
]
output = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(output)    
# <|im_start|>system
# ${system_prompt}<|im_end|>
# <|im_start|>user
# ${user_round_0}<|im_end|>
# <|im_start|>assistant
# ${assistant_round_0}<|im_end|>
# <|im_start|>user
# ${user_round_1}<|im_end|>
# <|im_start|>assistant
# ${assistant_round_1}<|im_end|>
# <|im_start|>user
# ${user_round_2}<|im_end|>
# <|im_start|>assistant


messages = [
    {
        "role": "system",
        "content": "${system_prompt}"
    },
    {
        "role": "user",
        "content": "${user_round_0}"
    },
    {
        "role": "assistant",
        "content": "${assistant_round_0}"
    },
    {
        "role": "user",
        "content": "${user_round_1}"
    },
    {
        "role": "assistant",
        "content": "${assistant_round_1}"
    }
]

output = tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=False)
print(output)
# <|im_start|>system
# ${system_prompt}<|im_end|>
# <|im_start|>user
# ${user_round_0}<|im_end|>
# <|im_start|>assistant
# ${assistant_round_0}<|im_end|>
# <|im_start|>user
# ${user_round_1}<|im_end|>
# <|im_start|>assistant
# ${assistant_round_1}<|im_end|>


messages = [
    {
        "role": "user",
        "content": "${user_round_0}"
    },
    {
        "role": "assistant",
        "content": "${assistant_round_0}"
    },
    {
        "role": "user",
        "content": "${user_round_1}"
    },
    {
        "role": "assistant",
        "content": "${assistant_round_1}"
    }
]

output = tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=False)
print(output)
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# ${user_round_0}<|im_end|>
# <|im_start|>assistant
# ${assistant_round_0}<|im_end|>
# <|im_start|>user
# ${user_round_1}<|im_end|>
# <|im_start|>assistant
# ${assistant_round_1}<|im_end|>
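
The default System Prompt behavior seen in the last output follows directly from the template's loop.first check. As a cross-check that needs no model files, the template logic can be re-implemented in a few lines of plain Python; render_qwen2 is a hand-written sketch of the Qwen2-72B-Instruct template shown above, not code from the library.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def render_qwen2(messages, add_generation_prompt=False):
    """Plain-Python sketch of the Qwen2-72B-Instruct chat template logic."""
    parts = []
    for index, message in enumerate(messages):
        # Mirrors: loop.first and messages[0]['role'] != 'system'
        if index == 0 and message["role"] != "system":
            parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM_PROMPT}<|im_end|>\n")
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


with_default = render_qwen2(
    [{"role": "user", "content": "Hi"}],
    add_generation_prompt=True,
)
with_custom = render_qwen2(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
    add_generation_prompt=True,
)
print(with_default)
print(with_custom)
```

When the first message is a system message, it is rendered like any other message and the default prompt is skipped, which matches the first two outputs above.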
