CANN/catlass: Per-Token Dequantization
Block Epilogue Per Token Dequant

catlass is the operator template library of CANN; it provides template samples for high-performance matrix multiplication and related fused operators on NPUs. Project address: https://gitcode.com/cann/catlass

Code location: the catlass repository linked above.

Function description

A partial specialization of BlockEpilogue that uses perTokenScale and perChannelScale to apply per-token and per-channel dequantization to block data.

Formula:

$blockD_{ij} = blockC_{ij} \times perChannelScale_j \times perTokenScale_i$

Currently supported data types for blockC, perChannelScale, perTokenScale, and blockD:

| blockC | perChannelScale | perTokenScale | blockD     |
|--------|-----------------|---------------|------------|
| int32  | half            | half          | half       |
| int32  | bfloat16_t      | bfloat16_t    | bfloat16_t |
| int32  | float           | float         | half       |
| int32  | float           | float         | bfloat16_t |

Dispatch policy

    // For AtlasA2, per token dequant
    template <uint32_t UB_STAGES_>
    struct EpilogueAtlasA2PerTokenDequant {
        using ArchTag = Arch::AtlasA2;
        static constexpr uint32_t UB_STAGES = UB_STAGES_;
    };

Usage example

Block assembly (see sample 12_quant_matmul):

    constexpr uint32_t ubStages = 2;
    using EpilogueDispatchPolicy = Epilogue::EpilogueAtlasA2PerTokenDequant<ubStages>;
    using ScaleType = Gemm::GemmType<bfloat16_t, layout::VectorLayout>;
    using PerTokenScaleType = Gemm::GemmType<bfloat16_t, layout::VectorLayout>;
    using DType = Gemm::GemmType<bfloat16_t, layout::RowMajor>;

    using RowBroadcastMulType = Gemm::GemmType<float, layout::RowMajor>;
    using BroadcastOneBlkType = Gemm::GemmType<float, layout::RowMajor>;
    using OneBlkColumnBroadcastMulType = Gemm::GemmType<float, layout::RowMajor>;

    using EpilogueTileShape = MatrixShape<32, 256>;
    using TileRowBroadcastMul = Epilogue::Tile::TileRowBroadcastMul<ArchTag, RowBroadcastMulType, EpilogueTileShape>;
    using TileBroadcastOneBlk = Epilogue::Tile::TileBroadcastOneBlk<ArchTag, BroadcastOneBlkType, EpilogueTileShape::ROW>;
    using TileOneBlkColumnBroadcastMul = Epilogue::Tile::TileOneBlkColumnBroadcastMul<ArchTag, OneBlkColumnBroadcastMulType, EpilogueTileShape>;
    using TileCopy = Epilogue::Tile::TileCopy<ArchTag, CType, ScaleType, PerTokenScaleType, DType>;
    using TileScheduler = Epilogue::Tile::EpilogueHorizontalTileSwizzle;

    using BlockEpilogue = Epilogue::Block::BlockEpilogue<
        EpilogueDispatchPolicy,       // selected epilogue dispatch policy
        CType,                        // type of the block before dequantization
        ScaleType,                    // type of perChannelScale
        PerTokenScaleType,            // type of perTokenScale
        DType,                        // type of the block after dequantization
        TileRowBroadcastMul,          // tile component: broadcasts the (1, n) scale to (m, n), then multiplies it with the block
        TileBroadcastOneBlk,          // tile component: broadcasts the (m, 1) perTokenScale to (m, 32B)
        TileOneBlkColumnBroadcastMul, // tile component: broadcasts the (m, 32B) perTokenScale to (m, n), then multiplies it with the block
        TileCopy,                     // tileCopy component
        TileScheduler                 // tile partitioning and scheduling
    >;

Block instantiation (see quant_matmul_multistage_workspace; in the kernel code, inside the void operator()<AscendC::AIV> function):

    BlockEpilogue blockEpilogue(resource);

Block params update (see quant_matmul_multistage_workspace; in the kernel code, inside the void operator()<AscendC::AIV> function):

    EpilogueParams epilogueParams{
        params.ptrScale,         // GM address of perChannelScale
        layoutScale,             // layout of perChannelScale
        params.ptrPerTokenScale, // GM address of perTokenScale
        layoutPerTokenScale,     // layout of perTokenScale
        params.ptrD,             // GM address of the output matrix
        layoutD                  // layout of the output matrix
    };
    blockEpilogue.UpdateParams(epilogueParams);

Block execution (see basic_matmul; in the kernel code, inside the void operator()<AscendC::AIC> function):

    blockEpilogue(
        blockShapeMNK,       // shape of the block
        blockCoordMNK,       // coordinate of the block in the output matrix (block granularity)
        actualBlockShapeMNK, // actual shape of the block to process
        gmBlockC,            // start address on GM of the block to process
        layoutBlockC         // layout of the block to process
    );

Constraints

Currently, blockC and blockD must both use the RowMajor layout, and perChannelScale and perTokenScale must both use VectorLayout.

Author's note: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
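
To check kernel output on the host, the dequantization formula above can be sketched as a plain C++ reference. This is a minimal sketch and not catlass code: the function name `ReferencePerTokenDequant` and the use of `float` for both scales and the output are assumptions made for illustration. The `half` and `bfloat16_t` combinations listed among the supported types would need an extra cast that is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference (not part of catlass).
// c:               m x n int32 block, stored row-major
// perChannelScale: n values, one per column j
// perTokenScale:   m values, one per row i
// Returns d with d[i][j] = c[i][j] * perChannelScale[j] * perTokenScale[i].
std::vector<float> ReferencePerTokenDequant(
    const std::vector<int32_t>& c,
    const std::vector<float>& perChannelScale,
    const std::vector<float>& perTokenScale,
    std::size_t m,
    std::size_t n)
{
    assert(c.size() == m * n);
    assert(perChannelScale.size() == n);
    assert(perTokenScale.size() == m);

    std::vector<float> d(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Per-channel scale first, then per-token scale, all in float.
            d[i * n + j] = static_cast<float>(c[i * n + j])
                           * perChannelScale[j]
                           * perTokenScale[i];
        }
    }
    return d;
}
```

For a 2 x 3 block, calling `ReferencePerTokenDequant(c, channelScale, tokenScale, 2, 3)` returns six values in row-major order. Comparing them against the device output with a tolerance suited to the output type is one way to validate a run.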