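The current DeepSpeed launch script is:
# RTX 4000 series cards do not support faster communication bandwidth over P2P or IB, so the following two environment variables must be set
# Disable NCCL P2P communication to avoid possible compatibility problems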
export NCCL_P2P_DISABLE="1"
# Disable NCCL IB communication, which RTX 4000 series cards do not support
export NCCL_IB_DISABLE="1"
# Point Hugging Face downloads at a mirror so that models and other resources are easier to fetch
export HF_ENDPOINT=https://hf-mirror.com
# Run simple_LLaVA_run.py with the deepspeed launcher
# --include localhost:0,1 runs the job on GPUs 0 and 1 of the local machine
# Note: localhost is the local machine; 0 and 1 are GPU indices
deepspeed --include localhost:0,1 simple_LLaVA_run.py \
--deepspeed ds_zero2_no_offload.json \
--model_name_or_path /home/louis/LK/study/transformers/lk_study/llava_study/my_llava_model/model_01 \
--train_type use_lora \
--data_path /home/louis/LK/study/transformers/lk_study/llava_study/train_llava/data \
--remove_unused_columns false \
--bf16 true \
--fp16 false \
--dataloader_pin_memory True \
--dataloader_num_workers 10 \
--dataloader_persistent_workers True \
--output_dir output_model_user_lora_simple_train \
--num_train_epochs 10 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_total_limit 3 \
--report_to "tensorboard" \
--learning_rate 4e-4 \
--logging_steps 10
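To debug, in VS Code, the Python program that this deepspeed command runs, one approach is:
1) Click the "Run and Debug" button in the sidebar
Then click the gear ("settings") icon; the "launch.json" file opens.
2) Add an entry to launch.json
Add the entry below to the "configurations" array in "launch.json". Compared with the script, it replaces --include localhost:0,1 with --num_gpus 1 and uses absolute paths for the training script and the DeepSpeed config: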
{
"name": "DeepSpeed single-GPU debug",
"type": "debugpy",
"request": "launch",
"program": "/home/louis/anaconda3/envs/unsloth_env_py311_torch240/bin/deepspeed", // replace with the path to the deepspeed executable in your own environment
"console": "integratedTerminal",
"justMyCode": true,
"args": [
"--num_gpus", "1",
"/home/louis/LK/study/transformers/lk_study/llava_study/simple_LLaVA_run.py",
"--deepspeed", "/home/louis/LK/study/transformers/lk_study/llava_study/ds_zero2_no_offload.json",
"--model_name_or_path", "/home/louis/LK/study/transformers/lk_study/llava_study/my_llava_model/model_01",
"--train_type", "use_lora",
"--data_path", "/home/louis/LK/study/transformers/lk_study/llava_study/train_llava/data",
"--remove_unused_columns", "false",
"--bf16", "true",
"--fp16", "false",
"--dataloader_pin_memory", "True",
"--dataloader_num_workers", "10",
"--dataloader_persistent_workers", "True",
"--output_dir", "output_model_user_lora_simple_train",
"--num_train_epochs", "10",
"--per_device_train_batch_size", "1",
"--per_device_eval_batch_size", "1",
"--gradient_accumulation_steps", "8",
"--evaluation_strategy", "no",
"--save_strategy", "epoch",
"--save_total_limit", "3",
"--report_to", "tensorboard",
"--learning_rate", "4e-4",
"--logging_steps", "10"
],
"env": {
"NCCL_P2P_DISABLE": "1",
"NCCL_IB_DISABLE": "1",
"HF_ENDPOINT": "https://hf-mirror.com",
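"CUDA_VISIBLE_DEVICES": "0", // key setting: force single-GPU debugging
"PYTHONUNBUFFERED": "1", // make sure log output appears immediately
"CUDA_LAUNCH_BLOCKING": "1" // run CUDA operations synchronously so errors point at the failing call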
}
}
Save the file.
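Copying each flag from the shell script into the "args" array by hand is error-prone. The following is a minimal sketch, not part of the original workflow, that turns the argument portion of a shell command into the JSON list that launch.json expects. The helper name shell_args_to_launch_args and the shortened example string are illustrative assumptions; paste in your own arguments.

```python
import json
import shlex


def shell_args_to_launch_args(command_tail: str) -> list[str]:
    """Split shell-style arguments into a list usable as launch.json "args"."""
    # Join backslash-continued lines first; shlex would otherwise keep
    # the escaped newline as part of the next token.
    joined = command_tail.replace("\\\n", " ")
    # shlex.split applies shell quoting rules, so "no" becomes no.
    return shlex.split(joined)


# Shortened example taken from the script's flags (hypothetical subset).
example = '''--bf16 true \\
--evaluation_strategy "no" \\
--learning_rate 4e-4'''

launch_args = shell_args_to_launch_args(example)
# json.dumps produces double-quoted strings that can be pasted into launch.json.
print(json.dumps(launch_args, indent=4))
```

Relative paths in the result still need to be made absolute (or "cwd" set in the configuration), since the debugger may not start in the script's directory.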
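3) Debug
Open the drop-down in the debug panel, select the DeepSpeed configuration to debug, then click the green triangle to its left to start debugging.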
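The steps above launch the program from VS Code. Another possible approach, sketched here as an assumption rather than something the original setup uses, is to start the job from the terminal as usual and have the training script wait for VS Code to attach. The sketch relies on the debugpy package being installed; DEBUG_ATTACH is a hypothetical opt-in variable, port 5678 is an arbitrary choice, and LOCAL_RANK is assumed to be set by the launcher for each worker process.

```python
import os


def maybe_wait_for_debugger(env=os.environ, port=5678):
    """Block until a debugger attaches, but only when explicitly requested.

    Returns True if the process waited for a client, False otherwise.
    """
    # DEBUG_ATTACH is a hypothetical flag; normal runs are unaffected.
    if env.get("DEBUG_ATTACH") != "1":
        return False
    # Only the first local worker listens, so several GPUs do not
    # compete for the same port.
    if env.get("LOCAL_RANK", "0") != "0":
        return False
    import debugpy  # imported lazily so normal runs do not require it

    debugpy.listen(("127.0.0.1", port))
    debugpy.wait_for_client()  # pauses here until VS Code attaches
    return True


# Call this near the top of the training script, before building the model.
waited = maybe_wait_for_debugger()
```

On the VS Code side this would need a configuration with "request": "attach" and a "connect" entry pointing at the same host and port.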