A Spoon-Fed Beginner's Tutorial on Fine-Tuning DeepSeek-R1

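In an earlier post we covered how to deploy DeepSeek-R1. This time we go a step further and fine-tune our own DeepSeek-R1 model (specifically, the distilled DeepSeek-R1-Distill-Llama-8B). The example dataset is a medical chain-of-thought (CoT) dataset, and we use its English subset. It contains many medical questions together with step-by-step reasoning and answers. Fine-tuning on it should help the model answer medical questions better. Every line of code and every parameter below is commented, because this is also how I learned it myself. The logic is simple, so you can follow along one step at a time.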

First, my environment

The same code can produce different results in different environments. Here is mine, and matching it as closely as possible will help the fine-tuning run smoothly. I use Kaggle's resources, and in the Settings tab I set Environment Preferences to Always use the latest environment. My runtime was:

Platform: Linux
Python: 3.11.13
wandb: 0.20.1
Unsloth: 2025.7.3
Transformers: 4.52.4
GPU: Tesla T4 * 2
Torch: 2.6.0+cu124
CUDA compute capability: 7.5
CUDA Toolkit: 12.4
Triton: 3.2.0

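Because this setup always uses Kaggle's latest environment, these versions may keep changing. The most important item is the Python version: use 3.11 or later. For the other packages, the latest versions should generally work.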
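The exact versions matter, so it helps to compare your notebook against the list above programmatically rather than by eye. The sketch below is not part of the original tutorial. It assumes the installed distribution names are unsloth, transformers, torch, wandb and triton, and it uses only the standard library (sys, importlib.metadata).

```python
import sys
from importlib import metadata

# Versions from the environment list above (the author's Kaggle setup at the time of writing).
REFERENCE_VERSIONS = {
    "unsloth": "2025.7.3",
    "transformers": "4.52.4",
    "torch": "2.6.0+cu124",
    "wandb": "0.20.1",
    "triton": "3.2.0",
}


def python_is_supported(min_version=(3, 11)):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:len(min_version)] >= tuple(min_version)


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_with_reference(reference=REFERENCE_VERSIONS):
    """Map each package name to a tuple (installed version, reference version)."""
    return {name: (installed_version(name), ref) for name, ref in reference.items()}


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old, need 3.11+"
    print("Python", sys.version.split()[0], status)
    for name, (have, want) in compare_with_reference().items():
        flag = "" if have == want else "  <- differs"
        print(f"{name:12} installed={have} reference={want}{flag}")
```

A differing version is not necessarily a problem. It is simply the first place to look if a later step fails.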

1. Configure the project environment on Kaggle

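The Settings menu has three relevant options:
a. Turn on internet: the code needs network access to download the required libraries.
b. Accelerator: choose the GPU resources; we choose T4 * 2.
c. Environment Preferences: the runtime environment; we choose Always use the latest environment.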

The Add-ons menu has a Secrets option for storing environment variables. I added two keys, one for Hugging Face and one for wandb, because the fine-tuning steps below need them.

2. Check your environment

# Check the Python version
import sys
print(sys.version)

If the output is 3.11.13 (main, Jun 4 2025, 08:57:29) [GCC 11.4.0], your environment matches mine.

3. Install the fine-tuning tool Unsloth

Install the required libraries. The most important one is Unsloth, the fine-tuning tool we use to fine-tune DeepSeek-R1. Its official tagline reads: Finetune Gemma 3n, Qwen3, Llama 4, Phi-4 & Mistral 2x faster with 80% less VRAM! In other words, Unsloth claims to fine-tune these models about twice as fast while using 80% less GPU memory. Official site: https://unsloth.ai/ The other libraries are Unsloth dependencies. If a run reports that a dependency is still missing, install it for your environment.

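%%capture
# %%capture hides the (long) pip output in the notebook
!pip install trl
# Reinstall bitsandbytes (needed for 4-bit loading and 8-bit optimizers) from a clean cache
!pip uninstall bitsandbytes -y
!pip cache purge
!pip install bitsandbytes --no-cache-dir
# Install unsloth and unsloth_zoo without touching the already-installed dependencies
!pip install --upgrade --no-deps --force-reinstall --no-cache-dir unsloth unsloth_zoo
# Then overwrite unsloth with the latest version from GitHub
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git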

4. Import the other required libraries

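# Modules for fine-tuning
from unsloth import FastLanguageModel # FastLanguageModel: Unsloth's class for optimized inference and fine-tuning
import torch # PyTorch, the core open-source deep learning library
from trl import SFTTrainer # Trainer for supervised fine-tuning (SFT)
from unsloth import is_bfloat16_supported # Checks whether the GPU supports bfloat16 precision

# Hugging Face modules
from huggingface_hub import login # Logs in to the Hugging Face Hub
from transformers import TrainingArguments # Holds the training hyperparameters
from datasets import load_dataset # Loads the fine-tuning dataset

# Logging library for the fine-tuning run
import wandb # wandb (Weights & Biases) records logs and metrics during fine-tuning

# Kaggle secrets manager
from kaggle_secrets import UserSecretsClient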

5. Log in to wandb and Hugging Face

# Read the Hugging Face and wandb access tokens
user_secrets = UserSecretsClient() # Client for the secrets stored in Kaggle (Add-ons > Secrets)
hugging_face_token = user_secrets.get_secret("HF_TOKEN") # Hugging Face access token (name must match your secret)
wnb_token = user_secrets.get_secret("WB_TOKEN") # wandb API key (name must match your secret)

# Log in to Hugging Face
login(hugging_face_token)

# Log in to wandb
wandb.login(key=wnb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B', # wandb project under which this fine-tuning run is logged
    job_type="training", # wandb job type
    anonymous="allow" # Allow anonymous logging if no account is logged in
)

6. Load the DeepSeek R1 model and tokenizer

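# Set parameters
max_seq_length = 2048 # Maximum sequence length in tokens (larger values let the model learn from more text at once but need more GPU memory)
dtype = None # None lets Unsloth pick the dtype automatically (float16 on a T4, bfloat16 on Ampere or newer GPUs)
load_in_4bit = True # Load the weights with 4-bit quantization to save memory

# Load the DeepSeek R1 model and tokenizer with Unsloth's FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",  # Pretrained DeepSeek R1 distilled model (8 billion parameters)
    max_seq_length=max_seq_length, # Handle up to 2048 tokens per sequence
    dtype=dtype, # Automatic dtype (FP16 or BF16 depending on the GPU)
    load_in_4bit=load_in_4bit, # Load the model in 4-bit to save memory
    token=hugging_face_token, # Hugging Face access token
)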
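A back-of-the-envelope calculation shows why load_in_4bit matters on a T4, which has 16 GB of memory. The sketch below counts only the weights, for a model with a round 8 billion parameters (an approximation). Activations, the KV cache, the LoRA adapters, optimizer states and quantization metadata all come on top of this.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # rough size of an "8B" model; the exact count is slightly higher

for label, bits in [("FP32", 32), ("FP16/BF16", 16), ("4-bit", 4)]:
    print(f"{label:10} ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

In 16-bit precision the weights alone would fill a whole T4. In 4-bit they take roughly a quarter of it, which leaves room for training.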

7. Define a prompt for testing the model before fine-tuning

# Define a system prompt under prompt_style 
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>{}"""
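The second {} sits right after <think>. At inference time it is filled with an empty string. The prompt therefore ends with an open <think> tag, and the R1-style model starts by writing out its reasoning. Below is a minimal illustration with an abbreviated copy of the template; the instruction text is shortened only for brevity.

```python
# Abbreviated copy of prompt_style (instruction text shortened for brevity).
prompt_style = """### Instruction:
You are a medical expert. Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

# Question placeholder filled, reasoning placeholder left empty.
prompt = prompt_style.format("A sample medical question?", "")
print(prompt)
```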

8. Run inference to see how the base model performs

# The question we want the model to answer
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or 
              sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, 
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

# Switch the Unsloth model into optimized inference mode (faster and more efficient)
FastLanguageModel.for_inference(model)  # Unsloth advertises 2x faster inference

# Fill the structured prompt template (`prompt_style`) with the question and tokenize it
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")  # Convert to PyTorch tensors and move them to the GPU

# Generate a response with the model
outputs = model.generate(
    input_ids=inputs.input_ids, # Tokenized question
    attention_mask=inputs.attention_mask, # Attention mask that marks padding
    max_new_tokens=1200, # Cap the response at 1200 new tokens (prevents overly long output)
    use_cache=True, # Use the KV cache to speed up generation
)

# Decode the generated tokens into readable text
response = tokenizer.batch_decode(outputs)

# Print only the part after "### Response:"
print(response[0].split("### Response:")[1])

Model response: Okay, so I need to figure out what cystometry would show for this 61-year-old woman. Let’s start by breaking down the information given. She has a history of involuntary urine loss when she coughs or sneezes, but she doesn’t leak at night. That makes me think about possible conditions related to bladder function, especially something that affects the lower muscle layer of the bladder, like stress urinary incontinence.

She underwent a gynecological exam and a Q-tip test. I’m not entirely sure about the specifics of the Q-tip test, but I think it’s used to assess urethral function. Maybe it helps determine if the urethral sphincter is functioning properly. If the Q-tip test is positive, it might mean that the sphincter has some activity, which could contribute to her symptoms.

Now, about cystometry. From what I remember, cystometry, or urodynamic testing, involves filling the bladder and measuring how it reacts under different conditions. They usually fill the bladder with a fluid, and then they check how much volume is left after the patient can’t hold it anymore (residual volume). Then they might stimulate the bladder (like by tickling the nerve) to see if there are detrusor contractions, which are involuntary contractions of the detrusor muscle.

Given her history, she’s likely experiencing stress urinary incontinence because she loses urine on coughing or sneezing, which are activities that put pressure on the bladder. But since she doesn’t leak at night, it’s probably not a case of genuine urinary incontinence, which can happen at any time, including sleep.

So, in the cystometry, the residual volume would probably be low because she can hold her urine until she needs to release it, especially when the activity that causes the leak isn’t happening. But when she does lose it, the detrusor contractions might be increased because the bladder is trying to empty involuntarily when the pressure is applied, like from coughing.

Wait, but I’m a bit confused. If she doesn’t leak at night, maybe her bladder capacity is normal, and the detrusor contractions are normal. But when she does activities that increase intra-abdominal pressure, the contractions become stronger, leading to leakage. So in the cystometry, when they simulate those conditions, the detrusor contractions would show increased activity.

But I’m not entirely sure. Maybe I should think about the types of incontinence. Stress incontinence is due to weak pelvic floor muscles or urethral sphincter deficiency, which can be assessed with the Q-tip test. If the Q-tip test is positive, it suggests that the sphincter is not functioning well, which would mean that when pressure is applied, the sphincter can’t prevent the urine from leaking.

So in the cystometry, the detrusor contractions might be normal, but the compliance of the bladder might be normal as well. The issue is more about the sphincter’s ability to prevent leakage. So maybe the detrusor contractions are normal, but the residual volume is low because she can hold her urine until it’s no longer possible.

Wait, but the question is specifically about residual volume and detrusor contractions. So residual volume is the amount of urine left in the bladder after she can’t hold it. In stress incontinence, the residual volume is usually normal because the problem is not with emptying the bladder completely, but with the sphincter’s ability to prevent leakage when under pressure.

So, the residual volume would be within normal limits, and the detrusor contractions would be normal as well. The issue is more in the sphincter function, which isn’t measured directly in cystometry but in tests like the Q-tip.

Hmm, I’m getting a bit tangled here. Let me try to organize my thoughts:

  • Patient has stress urinary incontinence symptoms (leak on cough/sneeze) but no leakage at night.
  • Underwent gynecological exam and Q-tip test.
  • Q-tip test: Positive if the urethral sphincter doesn’t function properly.
  • Cystometry: Measures residual volume and detrusor contractions.
  • In stress incontinence, the detrusor contractions are usually normal, and the issue is sphincter deficiency.
  • So, in cystometry, residual volume is normal, and detrusor contractions are normal.

But wait, sometimes in stress incontinence, the detrusor contractions might be slightly increased when the patient is under pressure, but I’m not sure. Maybe it’s more about the sphincter’s resistance.

Alternatively, maybe the detrusor contractions are normal, and the issue is that the sphincter can’t prevent the leakage, so the residual volume is normal.

I think the key point is that in stress incontinence, the main problem is the sphincter, not the detrusor muscle. So, the cystometry would show normal residual volume and normal detrusor contractions. The sphincter function would be impaired, which is why the Q-tip test would be positive.

So, the answer is that cystometry would show normal residual volume and normal detrusor contractions.

Based on the analysis of the patient’s history and the Q-tip test results, the cystometry would reveal normal residual volume and normal detrusor contractions. The primary issue appears to be related to the urethral sphincter’s inability to prevent leakage, indicating stress urinary incontinence. Therefore, the findings of the cystometry would not show increased detrusor contractions but would instead highlight a sphincter deficiency.

This answer is rather long-winded and reads very much like typical AI output. Next, we fine-tune the reasoning model we are using, DeepSeek-R1-Distill-Llama-8B, into our own private-doctor model.
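The print call in step 8 keeps everything after ### Response:, including the reasoning. To get the reasoning and the final answer separately, a small helper can split the decoded text on the closing </think> tag. split_response below is hypothetical and not part of Unsloth or Transformers. It assumes the model emits </think> between its reasoning and its answer, as R1-style models normally do. If generation stops at max_new_tokens before that tag appears, everything is returned as reasoning.

```python
def split_response(decoded, eos_token=None, marker="### Response:"):
    """Hypothetical helper: split decoded output into (reasoning, answer).

    Keeps the text after `marker`, removes `eos_token` if given, and splits on
    the first closing think tag.
    """
    tail = decoded.split(marker, 1)[-1]  # whole string if the marker is absent
    if eos_token:
        tail = tail.replace(eos_token, "")
    if "</think>" in tail:
        reasoning, answer = tail.split("</think>", 1)
    else:  # generation was cut off before the reasoning finished
        reasoning, answer = tail, ""
    return reasoning.replace("<think>", "").strip(), answer.strip()


# Synthetic example; "<eos>" stands in for the tokenizer's real EOS token.
sample = "prompt text\n### Response:\n<think>\nstep 1\nstep 2\n</think>\nFinal answer.<eos>"
reasoning, answer = split_response(sample, eos_token="<eos>")
print("REASONING:", reasoning)
print("ANSWER:", answer)
```

In the notebook you would call it as reasoning, answer = split_response(response[0], eos_token=tokenizer.eos_token).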

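9. Modify the prompt for fine-tuning. The only difference from the original prompt is one extra placeholder {}, which is needed to fit our fine-tuning dataset.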

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

10. Download the fine-tuning dataset

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True) # Take only the first 500 rows of the English subset to speed up fine-tuning; more data generally gives better results
dataset

Print one example from the dataset:

dataset[1]

{'Question': 'A 33-year-old woman is brought to the emergency department 15 minutes after being stabbed in the chest with …',
 'Complex_CoT': "Okay, let's figure out what's going on here. A woman comes in with a stab wound from a screwdriver. It…",
 'Response': 'In this scenario, the most likely anatomical structure to be injured is the lower lobe of the left lung….'}

Each complete example has three fields: Question, Complex_CoT and Response. Question is the clinical question (a patient case). Complex_CoT is the step-by-step chain-of-thought reasoning that leads to the answer. Response is the final answer.

11. Format the dataset to fit the new fine-tuning prompt

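EOS_TOKEN = tokenizer.eos_token  # End-of-sequence token: appended to each example so the model learns when to stop generating
EOS_TOKEN  # Inspect the token (only displayed when this is the last line of a notebook cell)


# Formatting function for the fine-tuning data
def formatting_prompts_func(examples):  # Receives a batch of dataset examples
    inputs = examples["Question"]       # Medical questions
    cots = examples["Complex_CoT"]      # Chain-of-thought reasoning (step-by-step explanation)
    outputs = examples["Response"]      # Reference final answers from the dataset
    
    texts = []  # Formatted prompt texts
    
    # Fill the template with each question, its reasoning and its answer
    for input, cot, output in zip(inputs, cots, outputs):  
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN  # Insert the values into the template and append the end-of-sequence token
        texts.append(text)  # Collect the formatted text
    return {
        "text": texts,  # New "text" column holding the structured prompts
    }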

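A quick note on train_prompt_style.format(input, cot, output):

It fills the values of input, cot and output into a predefined text template (train_prompt_style). A simplified template might look like: "Input:\n{0}\n\nReasoning:\n{1}\n\nAnswer:\n{2}"

This template has three placeholders, {0}, {1} and {2}, which are replaced by input, cot and output respectively. Bare {} placeholders, as used in train_prompt_style, work the same way: they are numbered automatically from left to right.

This explains why the new fine-tuning prompt has one more placeholder: it has to hold the reasoning as well as the answer.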
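The equivalence between bare {} and numbered {0}, {1}, {2} placeholders can be checked directly in plain Python. The toy templates below are illustrations only, not the tutorial's train_prompt_style. They also show that passing too few values raises an IndexError. That is why the three-placeholder training template must always receive input, cot and output.

```python
# Toy templates: bare {} placeholders are numbered automatically from left to right.
auto_numbered = "Q: {}\n<think>\n{}\n</think>\n{}"
explicit = "Q: {0}\n<think>\n{1}\n</think>\n{2}"

filled_auto = auto_numbered.format("question", "reasoning", "answer")
filled_explicit = explicit.format("question", "reasoning", "answer")
print(filled_auto)

# Supplying only two values for three placeholders fails.
try:
    auto_numbered.format("question", "reasoning")
    too_few_raised = False
except IndexError as err:
    too_few_raised = True
    print("IndexError:", err)
```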

12. Build the new fine-tuning dataset

# Update dataset formatting
dataset_finetune = dataset.map(formatting_prompts_func, batched = True)
dataset_finetune["text"][0]

The formatted data looks like this (long passages elided with …):

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical …

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question.

### Question:
Given the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and…

### Response:
<think>
Okay, let’s see what’s going on here. …
</think>
The specific cardiac abnormality most likely to be found in this scenario is a patent foramen ovale (PFO).

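The Response section now contains two filled-in parts. The first is the chain of thought inside <think>…</think>, and the second is the final answer. They correspond to the two {} placeholders after ### Response: in the new fine-tuning prompt. Training on this format teaches the model to reason step by step before answering (chain-of-thought reasoning).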
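dataset.map(formatting_prompts_func, batched=True) passes the function a whole batch at a time. The batch is a dict of columns, mapping each column name to a list of values. The returned "text" column is added next to the existing columns. The sketch below imitates that contract in plain Python so it runs without the datasets library. Several names are assumptions for illustration only:

- map_batched is a simplified stand-in, not the library's implementation.
- "<eos>" is a placeholder for tokenizer.eos_token.
- The template is a shortened version of train_prompt_style.

```python
EOS_TOKEN = "<eos>"  # placeholder; the tutorial uses tokenizer.eos_token
template = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"  # shortened template


def formatting_prompts_func(examples):
    """Same idea as the tutorial's function: one formatted text per example in the batch."""
    texts = [
        template.format(q, c, r) + EOS_TOKEN
        for q, c, r in zip(examples["Question"], examples["Complex_CoT"], examples["Response"])
    ]
    return {"text": texts}


def map_batched(rows, fn, batch_size=1000):
    """Simplified stand-in for Dataset.map(fn, batched=True) on a list of row dicts."""
    out = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        batch = {key: [row[key] for row in chunk] for key in chunk[0]}  # rows -> columns
        new_cols = fn(batch)
        for i, row in enumerate(chunk):  # columns -> rows, keeping the original fields
            merged = dict(row)
            merged.update({key: values[i] for key, values in new_cols.items()})
            out.append(merged)
    return out


rows = [
    {"Question": "Q1", "Complex_CoT": "C1", "Response": "R1"},
    {"Question": "Q2", "Complex_CoT": "C2", "Response": "R2"},
    {"Question": "Q3", "Complex_CoT": "C3", "Response": "R3"},
]
formatted = map_batched(rows, formatting_prompts_func, batch_size=2)
print(formatted[0]["text"])
```

Every formatted text ends with the EOS token. This is how the model learns where an answer stops; without it, a fine-tuned model tends to keep generating past the end of its answer.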

13. Define the fine-tuning parameters

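The fine-tuning technique used here is LoRA (Low-Rank Adaptation of Large Language Models). There are many fine-tuning approaches, such as LoRA, QLoRA, adapters and prefix tuning; most of them belong to the family of parameter-efficient fine-tuning (PEFT) methods. LoRA is one of the most widely used in practice, because it trains only a small set of added low-rank matrices while the original weights stay frozen. Since we also loaded the base model in 4-bit, the setup here is effectively QLoRA. For the parameters below, it is enough to understand what each one means. There is no need to agonize over why these values were chosen rather than others. Large models have an enormous number of parameters, and so does fine-tuning. Trying to master all of them is neither possible nor necessary for a beginner. When learning something new, sorting out the core concepts comes first.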
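In formula form, LoRA freezes a pretrained weight matrix W_0 and learns a low-rank update. Using the parameter names from the code below (r, and lora_alpha written as α), standard LoRA computes:

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}}

\text{trainable parameters per matrix: } r\,(d_{\text{in}} + d_{\text{out}})
\quad\text{instead of}\quad d_{\text{in}}\, d_{\text{out}}

\text{e.g. } d_{\text{in}} = d_{\text{out}} = 4096,\ r = 16:\quad
16 \cdot 8192 = 131{,}072 \ \text{vs.}\ 16{,}777{,}216 \ (\approx 0.78\%)
```

With r = 16 and lora_alpha = 16, the scaling factor α/r is 1. Rank-stabilized LoRA (use_rslora=True) would instead scale by α/√r, which is 4 here. B is normally initialized to zero, so training starts exactly from the pretrained model.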

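# Apply LoRA (Low-Rank Adaptation) to the model for fine-tuning
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: size of the trainable adapters (larger = more parameters, smaller = more efficient)
    target_modules=[  # Transformer modules that receive LoRA adapters
        "q_proj",   # Query projection in self-attention
        "k_proj",   # Key projection in self-attention
        "v_proj",   # Value projection in self-attention
        "o_proj",   # Output projection of the attention layer
        "gate_proj",  # Gate projection of the feed-forward (MLP) layer
        "up_proj",    # Up projection of the feed-forward layer
        "down_proj",  # Down projection of the feed-forward layer
    ],
    lora_alpha=16,  # Scaling factor for LoRA updates (larger = stronger influence of the LoRA layers)
    lora_dropout=0,  # Dropout rate for the LoRA layers (0 = no dropout)
    bias="none",  # Do not train bias terms (saves memory)
    use_gradient_checkpointing="unsloth",  # Recompute activations instead of storing them to save memory (helps with long sequences)
    random_state=3407,  # Random seed for reproducible results
    use_rslora=False,  # Rank-stabilized LoRA disabled: standard LoRA scaling alpha/r
    loftq_config=None,  # LoftQ (quantization-aware LoRA initialization) disabled
)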
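To get a feel for how small the adapters are, the sketch below counts the LoRA parameters for the seven target modules. The layer shapes are an assumption taken from the Llama-3.1-8B architecture that DeepSeek-R1-Distill-Llama-8B is based on:

- hidden size 4096
- MLP size 14336
- 32 layers
- 8 key/value heads of size 128

With r=16 the total is about 42 million parameters, roughly half a percent of the 8 billion base weights.

```python
# Assumed Llama-3.1-8B shapes: hidden 4096, MLP 14336, 32 layers,
# 8 key/value heads x 128 dims, so k_proj/v_proj output 1024 features.
HIDDEN, MLP, LAYERS, KV_DIM = 4096, 14336, 32, 8 * 128

# (in_features, out_features) of each module in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """A is r x in_features and B is out_features x r."""
    return r * (in_features + out_features)


def trainable_lora_params(r=16, shapes=TARGET_SHAPES, layers=LAYERS):
    """Total LoRA parameters across all layers for the given rank."""
    per_layer = sum(lora_params(i, o, r) for i, o in shapes.values())
    return per_layer * layers


print(f"r=16 -> {trainable_lora_params(16):,} trainable parameters")  # prints: r=16 -> 41,943,040 trainable parameters
```

Because the count is linear in r, halving the rank halves the adapter size.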

14. Initialize the trainer

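# Initialize the fine-tuning trainer (SFTTrainer from trl)
trainer = SFTTrainer(
    model=model_lora,  # Model to fine-tune
    tokenizer=tokenizer,  # Tokenizer for processing the text
    train_dataset=dataset_finetune,  # Training dataset
    dataset_text_field="text",  # Name of the dataset column holding the training text
    max_seq_length=max_seq_length,  # Maximum input length
    dataset_num_proc=2,  # Use 2 CPU processes for data preprocessing
    # Training hyperparameters
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Batch size per device (GPU)
        gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before each weight update
        num_train_epochs=1,  # Number of full passes over the data (ignored when max_steps > 0)
        warmup_steps=5,  # Linearly warm up the learning rate over the first 5 steps
        max_steps=60,  # Maximum number of training steps (for quick runs; increase for full training)
        learning_rate=2e-4,  # Learning rate (a common choice for LoRA)
        fp16=not is_bfloat16_supported(),  # Use FP16 when BF16 is not supported (e.g. on a T4)
        bf16=is_bfloat16_supported(),  # Use BF16 on newer GPUs (better numerical stability)
        logging_steps=10,  # Log training metrics every 10 steps
        optim="adamw_8bit",  # Memory-saving 8-bit AdamW optimizer
        weight_decay=0.01,  # Weight-decay regularization against overfitting
        lr_scheduler_type="linear",  # Linear learning-rate schedule
        seed=3407,  # Fixed random seed for reproducibility
        output_dir="outputs",  # Directory for checkpoints and training outputs
    ),
)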

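Key parameters:
• Gradient accumulation: gradients from several small batches are added up before a single weight update, which works around GPU memory limits.
• Learning-rate warmup: the learning rate ramps up gradually at the start of training, which avoids violent swings in the model parameters.
• Precision: fp16 is half-precision floating point with broad hardware support. bf16 (brain floating point) has a larger dynamic range but needs an Ampere or newer GPU, so on a T4 training falls back to fp16.
• Memory: the adamw_8bit optimizer stores AdamW's two moment estimates in 8 bits instead of 32. This shrinks optimizer-state memory by roughly three quarters (the saving in total VRAM is smaller) and leaves room for larger batches.
• Stopping: max_steps=60 is commonly used to validate the pipeline quickly. For real training, remove it or increase it.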
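The numbers behind these settings can be worked out by hand. The sketch assumes training runs on a single GPU; if both T4s were used for data parallelism, the effective batch would double. It re-implements, in plain Python, the linear-with-warmup formula that lr_scheduler_type="linear" selects in transformers. It is an illustration, not the library code.

```python
def linear_schedule_with_warmup(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


per_device_batch, grad_accum, max_steps, num_examples = 2, 4, 60, 500
effective_batch = per_device_batch * grad_accum  # examples per weight update (single GPU)
examples_seen = effective_batch * max_steps      # examples processed before training stops

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} of {num_examples} ({examples_seen / num_examples:.2f} epochs)")
for step in (0, 1, 5, 30, 60):
    print(f"step {step:2d}: lr = {linear_schedule_with_warmup(step):.2e}")
```

With these settings the model sees 480 of the 500 examples, slightly less than one epoch, so num_train_epochs=1 has no effect. The learning rate peaks at 2e-4 after the 5 warmup steps and then decays linearly to zero at step 60.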

15. The most exciting moment: start fine-tuning

trainer_stats = trainer.train()

Example output of the fine-tuning run

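16. Finish the wandb run

wandb.finish() ends the run and uploads the remaining logs to wandb. It does not save the model itself. To keep the fine-tuned LoRA adapter, you can additionally call model_lora.save_pretrained("lora_model") and tokenizer.save_pretrained("lora_model"); "lora_model" here is just an example directory name.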

wandb.finish()

Summary of the fine-tuning run

17. Test the fine-tuned model

question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing 
              but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, 
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

# Switch the fine-tuned model into Unsloth's optimized inference mode
FastLanguageModel.for_inference(model_lora)  # Unsloth advertises 2x faster inference
# Format the question with the same prompt template, tokenize it and move it to the GPU
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
# Generate an answer with the LoRA fine-tuned model
outputs = model_lora.generate(
    input_ids=inputs.input_ids,          # Tokenized input IDs
    attention_mask=inputs.attention_mask, # Attention mask marking padding
    max_new_tokens=1200,                 # Maximum length of the generated answer
    use_cache=True,                      # Use the KV cache to speed up generation
)
# Decode the generated tokens into readable text
response = tokenizer.batch_decode(outputs)
# Print the model's answer after "### Response:"
print(response[0].split("### Response:")[1])

Fine-tuned model's answer: Okay, so let’s break this down. We have a 61-year-old woman who’s been dealing with some serious bladder issues for a while now. She’s been having involuntary urine loss whenever she coughs or sneezes, but interestingly, she doesn’t leak at night. That’s an important clue.

Now, she’s had a gynecological exam and a Q-tip test. Hmm, let’s think about what those tests usually show. The Q-tip test is a pretty standard way to figure out if there’s a urethral obstruction. If it’s positive, it means the urethra is kind of narrowed or blocked, which would explain why she’s having these episodes of leakage.

Let’s imagine the Q-tip test came back positive. That would mean there’s an obstruction in the urethra, probably causing some kind of pressure buildup in her bladder. This kind of situation can lead to increased bladder pressure, which might make the bladder want to contract more often than usual to release that pressure. So, if the bladder contracts more often, we’d expect the person to have more involuntary contractions.

Now, about the cystometry. Cystometry is like a bladder diary on steroids. It shows how much urine is left in the bladder at different times and how the bladder responds to filling up. If the bladder is contracting more often, you’d see more contractions in the cystometry.

But here’s the kicker: she doesn’t leak at night. This suggests that her bladder doesn’t stay full to the point where it’s leaky during sleep. So, even though she’s having a lot of contractions, she’s not accumulating enough urine to leak at night.

This makes me think about the concept of ’leakage’ versus ‘functional’ bladder capacity. She’s leaking during activities, but her bladder is still working to hold up during sleep. It’s like her bladder is managing to hold the urine well when she’s resting, even though it’s not holding up during other activities.

So, if the Q-tip test is positive and the cystometry shows more contractions, we’d expect to see a lot of contractions, but the residual volume wouldn’t be too high, and she wouldn’t leak at night because the bladder is still functional in sleep.

Yeah, that makes sense. It fits with what we know about how the bladder behaves in these situations. It’s all about balancing the bladder’s ability to hold urine versus its tendency to leak when there’s pressure.

Based on the findings from the Q-tip test, which is positive, indicating urethral obstruction, and the fact that the woman does not leak at night, cystometry would most likely reveal that her bladder has increased contractions (due to the pressure buildup from the obstruction) but not a significantly high residual volume. This suggests that her bladder is functional enough to retain urine during sleep, despite the increased contractions during activities like coughing or sneezing. Therefore, cystometry would show more contractions, but the residual volume would not be excessively high. <|end▁of▁sentence|>

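The new model's answer is noticeably more concise and has much less of the heavy AI flavor. What changed most, though, is the style. After only 60 training steps on 500 examples, this answer treats the Q-tip test as a test for urethral obstruction, whereas the test is normally used to assess urethral hypermobility in stress incontinence. The content of the answers still needs to be checked.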

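Overall, the fine-tuning process has no particularly difficult parts. If you try it yourself, keep your environment as close to mine as possible; otherwise you may run into problems. And if you fine-tune on other data, having a suitable fine-tuning dataset matters just as much.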