DeepSeek Empowers Nuclei: Building a "Super Assistant" for Security Detection

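Introduction

Greetings, fellow heroes, and happy weekend!

Today's topic is a very cool technical pairing: using large AI models to supercharge Nuclei and strengthen vulnerability detection!

Picture this: when a new vulnerability appears, you usually have to dig through the write-ups by hand and code a detection in Nuclei's template format. Now that large AI models such as DeepSeek are available, they can act as a "smart secretary" and generate the template in the time it takes to sip a coffee!

It is also worth noting that GitHub hosts all kinds of self-maintained Nuclei PoC repositories. How to integrate them quickly, so that the PoC collection behind your own Nuclei is as large, complete, and accurate as possible, is another question worth thinking about!

隐侠 is also building a knowledge base and a vulnerability database for the industry, along with a PoC repository on GitHub. Stay tuned; they will be released before long.

Next, let's see how this "dream team" of large AI models and Nuclei handles security detection in practice!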

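Why pair DeepSeek with Nuclei?

Nuclei's modular detection paradigm and its engineering bottlenecks

Nuclei is a widely adopted vulnerability detection tool that decouples detection logic from the engine through YAML templates. This "Detection as Code" model gives it the following properties:

Atomic detection units: each template targets the signature of one weakness or vulnerability (a CWE or CVE), and templates can be combined into composite detection strategies.
Cross-platform compatibility: an abstraction over the protocol layer (HTTP and raw requests) gives uniform detection across targets ranging from web applications to IoT devices.
Agile response: new detection rules are loaded dynamically, with no recompilation needed.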

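But writing templates has significant engineering bottlenecks:

Knowledge conversion delay: manually analyzing a vulnerability report, extracting the attack vector, and encoding it as matching rules takes about 37 minutes per template on average.
Coverage blind spots: when the Log4j2 vulnerability (Log4Shell, disclosed in December 2021) broke out, coverage in mainstream template libraries was reportedly only 68%, which prolonged the window in which enterprises' attack surface stayed exposed.

Template example: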

id: CVE-2023-25157

info:
  name: GeoServer OGC Filter - SQL Injection
  author: ritikchaddha,DhiyaneshDK,iamnoooob,rootxharsh
  severity: critical
  description: |
    GeoServer is an open source software server written in Java that allows users to share and edit geospatial data. GeoServer includes support for the OGC Filter expression language and the OGC Common Query Language (CQL) as part of the Web Feature Service (WFS) and Web Map Service (WMS) protocols. CQL is also supported through the Web Coverage Service (WCS) protocol for ImageMosaic coverages. Users are advised to upgrade to either version 2.21.4, or version 2.22.2 to resolve this issue. Users unable to upgrade should disable the PostGIS Datastore *encode functions* setting to mitigate ``strEndsWith``, ``strStartsWith`` and ``PropertyIsLike`` misuse and enable the PostGIS DataStore *preparedStatements* setting to mitigate the ``FeatureId`` misuse.
  reference:
    - https://twitter.com/parzel2/status/1665726454489915395
    - https://nvd.nist.gov/vuln/detail/CVE-2023-25157
    - https://github.com/win3zz/CVE-2023-25157
    - https://github.com/geoserver/geoserver/security/advisories/GHSA-7g5f-wrx8-5ccf
  classification:
    cvss-metrics: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
    cvss-score: 9.8
    cve-id: CVE-2023-25157
    cwe-id: CWE-89
    cpe: cpe:2.3:a:osgeo:geoserver:*:*:*:*:*:*:*:*
  metadata:
    verified: "true"
    shodan-query: title:"geoserver"
  tags: cve,cve2023,geoserver,ogc,sqli,intrusive

http:
  - raw:
      - |
        GET /geoserver/ows?service=WFS&version=1.0.0&request=GetCapabilities HTTP/1.1
        Host: {{Hostname}}
      - |
        GET /geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName={{name}}&maxFeatures=50&outputFormat=csv HTTP/1.1
        Host: {{Hostname}}
      - |
        GET /geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName={{name}}&CQL_FILTER=strStartswith({{column}},%27%27%27%27)=true HTTP/1.1
        Host: {{Hostname}}
    stop-at-first-match: true
    iterate-all: true
    matchers-condition: and
    matchers:
      - type: word
        part: body_3
        words:
          - "SQL SELECT"

      - type: word
        part: header_3
        words:
          - text/xml

    extractors:
      - type: regex
        part: body_1
        group: 1
        name: name
        regex:
          - '<FeatureType><Name>(.*?)<\/Name><Title>'
        internal: true

      - type: regex
        part: body_2
        group: 1
        name: column
        regex:
          - 'FID,([aA-zZ_]+),'
        internal: true
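To make the flow of the template above easier to follow, here is a minimal Python sketch of what its two extractors and the third request do: pull a feature type name out of the first response, pull a column name out of the second, and substitute both into the probe URL. The regexes and the request path are copied from the template; the helper names and the sample response bodies are made up for illustration, and the sketch keeps only the first match, whereas the template's iterate-all option suggests Nuclei tries every extracted value.

```python
import re

# Regexes copied from the two extractors in the template above.
NAME_RE = re.compile(r'<FeatureType><Name>(.*?)<\/Name><Title>')
COLUMN_RE = re.compile(r'FID,([aA-zZ_]+),')


def extract_first(pattern, body):
    """Return capture group 1 of the first match, or None (mirrors `group: 1`)."""
    match = pattern.search(body)
    return match.group(1) if match else None


def build_probe_path(name, column):
    """Fill the {{name}} and {{column}} placeholders of the third request."""
    return (
        "/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature"
        f"&typeName={name}&CQL_FILTER=strStartswith({column},%27%27%27%27)=true"
    )


# Made-up response bodies, for illustration only.
capabilities_body = "<FeatureType><Name>topp:states</Name><Title>USA Population</Title>"
csv_body = "FID,STATE_NAME,STATE_FIPS\nstates.1,Illinois,17\n"

name = extract_first(NAME_RE, capabilities_body)
column = extract_first(COLUMN_RE, csv_body)
print(build_probe_path(name, column))
```

The matchers then only look at the third response: the template reports a hit when its body contains "SQL SELECT" and its headers contain text/xml.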

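General-purpose large models such as DeepSeek can make this process more efficient. The structured prompt below restricts the output space to a small, fixed set of fields, and the following techniques are combined to keep the output deterministic:

Parameter freezing: key fields such as the HTTP method and match conditions are type-constrained.
Defensive parsing: AI output is validated through syntax-tree checks and sandbox verification.
Knowledge distillation: a model built on vulnerability databases improves the accuracy of CWE feature recognition.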

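# Prompt engineering example
# Fill in with prompt_template.format(endpoint=..., payload_sample=..., cvss_score=...);
# the quadrupled braces render as the literal {{BaseURL}} that Nuclei expects.
prompt_template = """
Generate a Nuclei template (JSON format) from the vulnerability report:
Input features:
- Vulnerable path: {endpoint}
- Payload sample: {payload_sample}
- CVSS score: {cvss_score}

Output constraints:
1. Must contain the {{{{BaseURL}}}} placeholder
2. Matching rules must include the triple of status code, keyword, and regex
3. Severity level is assigned according to CVSS v3.1
"""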

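The third output constraint, severity assigned according to CVSS v3.1, can also be checked on our side instead of trusting the model. Below is a minimal sketch using the CVSS v3.1 qualitative rating bands (Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0). The function name is hypothetical, and mapping the CVSS rating "None" (score 0.0) to Nuclei's "info" level is an assumption of this sketch.

```python
def cvss_to_severity(score):
    """Map a CVSS v3.1 base score to a Nuclei severity label."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "info"  # CVSS rating "None"; using "info" here is this sketch's choice
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


print(cvss_to_severity(9.8))  # the GeoServer example above carries cvss-score 9.8
```

Comparing this value with the severity the model returns is a cheap way to catch a mislabelled template before it is saved.
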
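Typical scenario: from vulnerability write-up to PoC, from "hand-made" to "instant"

1. One-click capture of vulnerability articles

The implementation here is in Python: the requests library fetches the article URL, and BeautifulSoup then takes the page apart.

If the site has an anti-crawler checkpoint (a 403 status code), the crawler retries up to 3 times rather than giving up right away! In the end it digs out the key information in the article, such as the vulnerable path and the attack payloads.

Implementation: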

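import requests
from bs4 import BeautifulSoup  # the 'lxml' parser below also needs the lxml package

def _crawl_article(self, url):
    """Fetch a vulnerability article and extract its key fields, trying up to 3 times."""
    for retry in range(3):
        try:
            resp = self.session.get(url, timeout=20)
            # Check for 403 first: raise_for_status() would otherwise raise before
            # this branch, and only RequestException subclasses are retried below.
            if resp.status_code == 403:
                raise requests.HTTPError("Anti-bot triggered (HTTP 403)", response=resp)
            resp.raise_for_status()
            soup = BeautifulSoup(resp.text, 'lxml')
            return {
                'title': self._extract_title(soup),
                'cve': self._extract_cve(soup),
                'endpoint': self._find_vuln_path(soup),
                'payloads': self._extract_payloads(soup),
                'references': self._find_references(soup),
                'raw_html': resp.text[:5000]  # Limit content size
            }
        except requests.RequestException as e:
            if retry == 2:
                raise RuntimeError(f"Request failed after 3 attempts: {e}") from e
            self.logger.warning(f"Retrying ({retry+1}/3)...")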
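2. DeepSeek's "magic translation"

This is where DeepSeek gets to shine! We feed it a carefully designed prompt template, which works like a task order: "Based on this vulnerability information, generate the Nuclei template parameters in JSON format!" To guard against a flaky network, the call is likewise retried up to 3 times. Once DeepSeek receives the instruction, it analyzes the input and quickly returns parameters that include the vulnerability ID, the matching rules, and more.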

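# content is the dict returned by _crawl_article
prompt = f"""Generate a Nuclei template (JSON format) from the vulnerability report:

Input features:
- Vulnerable path: {content['endpoint']}
- Payloads:
{chr(10).join(f'- {p}' for p in content['payloads'][:2])}
- Reference links:
{chr(10).join(content['references'][:2])}

Output requirements:
1. Assign the severity level according to the CVSS score
2. Must contain the {{{{BaseURL}}}} variable
3. Include status code, keyword, and regex matching

Output format:
{{
"id": "vulnerability ID",
"name": "vulnerability name",
"method": "HTTP method",
"paths": ["attack path"],
"matchers": {{
    "status": 200,
    "keywords": ["signature keyword"],
    "regex": ["regular expression"]
  }},
"severity": "severity level",
"references": ["reference link"],
"description": "vulnerability description",
"fofa_query": "FOFA query",
"tags": ["vulnerability type"]
}}
"""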

def _analyze_with_ai(self, content):
    """Ask DeepSeek for the template parameters as JSON, trying up to 3 times."""
    prompt = self._build_prompt(content)
    for attempt in range(3):
        try:
            response = self.client.chat.completions.create(
                model="deepseek-chat",
                messages=[
                    {"role": "system", "content": "Respond strictly in JSON format"},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.2,  # low temperature for more stable output
                max_tokens=2000,
                response_format={"type": "json_object"}
            )
            return self._process_ai_response(response.choices[0].message.content)
        except Exception as e:
            if attempt == 2:
                raise RuntimeError(f"API request failed: {e}") from e
            self.logger.warning(f"Retrying API call ({attempt+1}/3)...")
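The _process_ai_response step is where the type constraints on key fields can actually be enforced. The article does not show that method, so the following is only a sketch of one possible shape, written as a standalone function: parse the JSON, then reject anything that falls outside the format requested in the prompt. The allowed-value sets and the error messages are assumptions of this sketch.

```python
import json
import re

ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"}
ALLOWED_SEVERITIES = {"info", "low", "medium", "high", "critical"}


def process_ai_response(raw):
    """Parse the model's JSON reply and reject anything outside the expected shape."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")

    method = str(data.get("method", "")).upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unexpected HTTP method: {method!r}")

    severity = str(data.get("severity", "")).lower()
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {severity!r}")

    paths = data.get("paths")
    if not isinstance(paths, list) or not paths:
        raise ValueError("paths must be a non-empty list")
    for path in paths:
        if not isinstance(path, str) or not path.startswith("{{BaseURL}}"):
            raise ValueError(f"path must start with {{{{BaseURL}}}}: {path!r}")

    matchers = data.get("matchers")
    if not isinstance(matchers, dict):
        raise ValueError("matchers must be an object")
    status = matchers.get("status")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(status, bool) or not isinstance(status, int) or not 100 <= status <= 599:
        raise ValueError(f"invalid status code: {status!r}")
    keywords = matchers.get("keywords")
    if not isinstance(keywords, list) or not keywords or not all(isinstance(k, str) for k in keywords):
        raise ValueError("keywords must be a non-empty list of strings")
    for pattern in matchers.get("regex") or []:
        if not isinstance(pattern, str):
            raise ValueError(f"regex must be a string: {pattern!r}")
        try:
            re.compile(pattern)  # rough check only: Nuclei uses Go's regex engine
        except re.error as exc:
            raise ValueError(f"invalid regex {pattern!r}: {exc}") from exc

    data["method"], data["severity"] = method, severity
    return data
```

Because every failure surfaces as a ValueError, the retry loop in _analyze_with_ai would treat a malformed reply the same way as a network error and simply ask again.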

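3. The template "assembly master" comes online

This method acts as a "template assembly master": it assembles the parameters returned by DeepSeek into a complete Nuclei template.

It first lays out the "parts" such as the basic information and the request method; if the parameters call for regex matching, it also adds the corresponding regex matcher automatically.

Finally it saves the template in YAML format to the configured folder, with a timestamp in the file name for easier management!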

from datetime import datetime
from pathlib import Path

import yaml

def _build_template(self, ai_data):
    """Build a Nuclei template file from the AI-generated parameters."""
    template = {
        "id": ai_data.get("id", "auto-generated"),
        "info": {
            "name": ai_data.get("name", "Unknown Vulnerability"),
            "author": "AutoPOCGenerator",
            "severity": ai_data.get("severity", "medium"),
            "description": ai_data.get("description", "Generated by DeepSeek AI"),
            "reference": ai_data.get("references", []),
            "tags": ai_data.get("tags", ["ai-generated"]),
            "metadata": {
                "fofa-query": ai_data.get("fofa_query", "")
            }
        },
        # Recent Nuclei releases name this section "http", as in the example
        # template earlier; "requests" is the legacy key.
        "http": [{
            "method": ai_data.get("method", "GET"),
            "path": ai_data.get("paths", ["{{BaseURL}}"]),
            "matchers-condition": "and",
            "matchers": [
                {"type": "status", "status": [ai_data["matchers"]["status"]]},
                {"type": "word", "words": ai_data["matchers"]["keywords"]}
            ]
        }]
    }

    # Add the regex matcher only when the AI supplied one
    if ai_data["matchers"].get("regex"):
        template["http"][0]["matchers"].append({
            "type": "regex",
            "regex": ai_data["matchers"]["regex"]
        })

    # Build the file name: <template id>_<timestamp>.yaml
    template_dir = Path(self.config['paths']['template_dir'])
    template_dir.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = template_dir / f"{template['id']}_{timestamp}.yaml"

    # Write the file, keeping the key order defined above
    with open(filename, 'w', encoding='utf-8') as f:
        yaml.dump(template, f, allow_unicode=True, sort_keys=False)

    self.logger.info(f"Template saved: {filename}")
    return str(filename.resolve())

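And with that, the job of turning a vulnerability article into a usable Nuclei PoC is done.

Note that AI output is inherently unstable. First, a human needs to confirm that the PoC content matches the description in the vulnerability article. Second, the following command checks that the template is well-formed and can be loaded by Nuclei: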

./nuclei -t ./nuclei_templates/path-traversal-vite-project_20250417_2313.yaml -validate
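The same check can be wired into the generation pipeline so that every freshly written template is validated automatically. The sketch below wraps the command above with subprocess; the function names are hypothetical, and it assumes that Nuclei signals a failed validation with a non-zero exit code.

```python
import subprocess
from pathlib import Path


def build_validate_command(nuclei_binary, template_path):
    """Return the argument list for `nuclei -t <template> -validate`."""
    return [str(nuclei_binary), "-t", str(template_path), "-validate"]


def validate_template(nuclei_binary, template_path):
    """Run the validation and return (ok, combined output)."""
    if not Path(template_path).is_file():
        raise FileNotFoundError(f"Template not found: {template_path}")
    result = subprocess.run(
        build_validate_command(nuclei_binary, template_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # Assumption: a non-zero exit code means the template failed validation.
    return result.returncode == 0, result.stdout
```

A caller could, for example, move templates that fail this check into a separate review folder instead of the live template directory.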

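Nuclei's "personalized" updates and scans

1. Configuration-aware "smart update"

The function reads the relevant configuration and locates the Nuclei executable. If a proxy is enabled in the configuration, it automatically appends the proxy argument to the update command, like giving the Nuclei update a "dedicated lane". After the update command runs, the result is written to the log in detail, so any update problem is easy to spot.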

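import logging
import subprocess
from pathlib import Path

def update_nuclei():
    """Update Nuclei, going through the proxy when the configuration enables one."""
    try:
        config = load_config()  # the project's own configuration loader
        nuclei_binary = Path(config["paths"]["nuclei_binary"])
        if not nuclei_binary.exists():
            raise FileNotFoundError(f"Nuclei executable not found: {nuclei_binary}")
        # -update upgrades the Nuclei engine itself; -update-templates refreshes
        # the official template set.
        cmd = [str(nuclei_binary.resolve()), "-update"]
        if config["proxy"]["enable"]:
            cmd.extend(["-proxy", config["proxy"]["address"]])
            logging.info("Updating through the configured proxy")
        result = subprocess.run(
            cmd,
            check=True,                # raise CalledProcessError on a non-zero exit
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout for the log
            text=True
        )
        logging.info(f"Update succeeded\n{result.stdout}")
    except subprocess.CalledProcessError as e:
        error_msg = f"Update failed: {e.output if e.output else 'no error details'}"
        logging.error(error_msg)
    except Exception as e:
        logging.error(f"Unexpected error: {e}")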
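2. "Custom-tailored" scan commands

The build_command function generates a Nuclei scan command to fit our needs. Whether the target file lists one target or many, and whether you pass one PoC template path or several, it handles them all! It first checks that each PoC path really exists, then adds the rate limit and timeout, plus the proxy when the configuration enables it, like fitting the scan task with "custom gear".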

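from pathlib import Path

def build_command(config, target_file, pocs):
    """Assemble the Nuclei scan command as an argument list."""
    cmd = [
        './' + str(NUCLEI_BINARY),  # NUCLEI_BINARY: module-level path defined elsewhere
        '-list', str(target_file),  # file with one target per line
        '-rate-limit', '100',
        '-timeout', '30'
    ]
    if pocs:
        # Resolve every PoC path and fail early if one is missing
        validated_pocs = []
        for poc_path in pocs:
            path = Path(poc_path).resolve()
            if not path.exists():
                raise FileNotFoundError(f"PoC path does not exist: {path}")
            validated_pocs.append(str(path))
        cmd.extend(['-t', ','.join(validated_pocs)])
    # The proxy is opt-in: a missing 'enable' key means no proxy
    proxy = config.get('proxy', {})
    if proxy.get('enable', False):
        cmd.extend(['-proxy', proxy['address']])
    print(cmd)
    return cmd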
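Once the command is built, the scan results still have to be collected. One option is to append Nuclei's -jsonl flag to the command and parse the output line by line. The sketch below does only the parsing; the function name is hypothetical, and the field names (template-id, info.severity, matched-at) are assumed from the JSONL output of recent Nuclei releases, so check them against your own version.

```python
import json


def parse_nuclei_jsonl(lines):
    """Turn Nuclei JSONL output lines into a list of compact finding dicts."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip banner or log lines that are not JSON
        if not isinstance(record, dict):
            continue
        findings.append({
            "template_id": record.get("template-id"),
            "severity": (record.get("info") or {}).get("severity"),
            "matched_at": record.get("matched-at"),
        })
    return findings


# Made-up output lines, for illustration only.
sample_lines = [
    '{"template-id": "CVE-2023-25157", "info": {"severity": "critical"}, '
    '"matched-at": "http://example.com/geoserver/ows"}',
    "[INF] not a JSON line",
    "",
]
print(parse_nuclei_jsonl(sample_lines))
```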

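In this way our Nuclei is like a sword that is honed again and again, always kept sharp.

Enterprise-grade template management: a "smart warehouse" for templates

1. Automatic restocking of the template warehouse

The code built here acts like a "warehouse keeper" that periodically looks for the latest Nuclei template repositories on GitHub. It searches through the GitHub API, keeps the repositories updated within the last 30 days, and uses asynchronous operations to clone or update them locally in batches, so our template library always stays "fresh".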

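import asyncio

async def dynamic_repo_discovery(self):
    """Stage 1: discover template repositories and sync them locally in batches."""
    if not self.config['ENABLE_STAGE1']:
        print(f"\n{'='*30} Repository sync stage skipped {'='*30}")
        return
    print(f"\n{'='*30} Stage 1: dynamic repository sync {'='*30}")
    # Search GitHub for candidate repositories and merge them into the registry file
    new_repos = await self._fetch_github_repos()
    self._update_repo_registry(new_repos)
    # Read the registry back as a set to de-duplicate repository URLs
    with open(REPO_FILE, encoding='utf-8') as f:
        urls = {line.strip() for line in f if line.strip()}
    # Coroutines do not start until gather() awaits them, so slicing into
    # batches caps how many git operations run at once
    tasks = [self._async_git_ops(url) for url in urls]
    batch_size = self.config['GIT_PARALLEL'] * 2
    for i in range(0, len(tasks), batch_size):
        await asyncio.gather(*tasks[i:i+batch_size])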
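The _fetch_github_repos step is not shown here. As a hint of how the "updated within the last 30 days" filter could be expressed, the sketch below builds a request URL for GitHub's repository search endpoint using the pushed:>= qualifier. The function name, the search keyword nuclei-templates and the page size are assumptions of this sketch; sending the request and handling rate limits are left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_repo_search_url(keyword="nuclei-templates", days=30, today=None, per_page=100):
    """Build a search URL for repositories pushed to within the last `days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    # "pushed:>=YYYY-MM-DD" keeps only repositories with recent pushes
    query = f"{keyword} pushed:>={cutoff.isoformat()}"
    params = {"q": query, "sort": "updated", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


print(build_repo_search_url(today=date(2025, 4, 17)))
```

Passing today explicitly keeps the function deterministic, which makes the filter easy to unit-test.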