CANN/PTO-ISA Custom Operator Example
# Custom PyTorch Operator (KERNEL_LAUNCH) Example

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project page: https://gitcode.com/cann/pto-isa

This example shows how to implement a custom PTO-based kernel and expose it as a PyTorch operator via `torch_npu`.

## Directory Layout

```
demos/baseline/add/
├── op_extension/        # Python package entry (module loader)
├── csrc/
│   ├── kernel/          # PTO kernel implementation
│   └── host/            # Host-side PyTorch operator registration
├── test/                # Minimal Python test
├── CMakeLists.txt       # Build configuration
├── setup.py             # Wheel build script
└── README.md            # This document
```

## 1. Implement the kernel

Add a kernel source file under `demos/baseline/add/csrc/kernel/` and include it in the build. For example, to build `add_custom.cpp`, add it to `demos/baseline/add/CMakeLists.txt`:

```cmake
ascendc_library(no_workspace_kernel STATIC
    csrc/kernel/add_custom.cpp
)
```

For build options and details, refer to the Ascend community documentation: https://www.hiascend.com/ascend-c
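The kernel source itself is not reproduced in this document. As a CPU-only illustration (this is not PTO-ISA code), the sketch below shows one common way an element-wise add over `totalLength` flat elements can be split across `blockDim` blocks, with each block walking its slice tile by tile. The names `BlockRangeFor`, `AddBlock`, `RunBlockedAddDemo`, and the tile size `kTileLength` are hypothetical, and the actual `add_custom` kernel may partition its work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical number of elements processed per tile iteration.
constexpr uint32_t kTileLength = 256;

// Half-open range [begin, end) of flat element indices owned by one block.
struct BlockRange {
    uint32_t begin;
    uint32_t end;
};

// Split [0, totalLength) into blockDim contiguous, nearly equal ranges;
// the first (totalLength % blockDim) blocks receive one extra element.
BlockRange BlockRangeFor(uint32_t blockIdx, uint32_t blockDim, uint32_t totalLength) {
    assert(blockDim > 0 && blockIdx < blockDim);
    const uint32_t base = totalLength / blockDim;
    const uint32_t rem = totalLength % blockDim;
    const uint32_t begin = blockIdx * base + std::min(blockIdx, rem);
    const uint32_t len = base + (blockIdx < rem ? 1u : 0u);
    return {begin, begin + len};
}

// CPU stand-in for one block: walk the owned range tile by tile.
// On the device each tile would be loaded, added, and stored with
// PTO instructions; here it is a plain loop.
void AddBlock(const float* x, const float* y, float* z, BlockRange r) {
    for (uint32_t tileStart = r.begin; tileStart < r.end; tileStart += kTileLength) {
        const uint32_t tileEnd = std::min(tileStart + kTileLength, r.end);
        for (uint32_t i = tileStart; i < tileEnd; ++i) {
            z[i] = x[i] + y[i];
        }
    }
}

// Runs every block sequentially and checks that z == x + y everywhere,
// i.e. that every element was written by some block.
bool RunBlockedAddDemo(uint32_t blockDim, uint32_t totalLength) {
    std::vector<float> x(totalLength, 1.0f), y(totalLength, 2.0f), z(totalLength, 0.0f);
    for (uint32_t b = 0; b < blockDim; ++b) {
        AddBlock(x.data(), y.data(), z.data(), BlockRangeFor(b, blockDim, totalLength));
    }
    return std::all_of(z.begin(), z.end(), [](float v) { return v == 3.0f; });
}
```

For example, `RunBlockedAddDemo(20, 16384)` uses the same `blockDim = 20` as the host code in section 2.2. Handing the remainder to the first blocks keeps the per-block element counts within one of each other, and blocks beyond `totalLength` simply receive an empty range.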
## 2. Integrate with PyTorch (torch_npu)

The host-side implementation lives under `demos/baseline/add/csrc/host/`.

### 2.1 Define the operator schema (ATen IR)

PyTorch uses `TORCH_LIBRARY`/`TORCH_LIBRARY_FRAGMENT` to declare operator schemas that can be called from Python via `torch.ops.<namespace>.<op_name>`.

Example: register a custom `my_add` operator in the `npu` namespace:

```cpp
TORCH_LIBRARY_FRAGMENT(npu, m) {
    m.def("my_add(Tensor x, Tensor y) -> Tensor");
}
```

After this, Python can call `torch.ops.npu.my_add`.

### 2.2 Implement the operator

1. Include the kernel launch header `aclrtlaunch_<kernel_name>.h` (generated by the build system).
2. Allocate output tensors/workspace as needed.
3. Enqueue the kernel via `ACLRT_LAUNCH_KERNEL` (wrapped by `EXEC_KERNEL_CMD` in this example).

```cpp
#include "utils.h"
#include "aclrtlaunch_add_custom.h"

at::Tensor run_add_custom(const at::Tensor &x, const at::Tensor &y)
{
    at::Tensor z = at::empty_like(x);
    uint32_t blockDim = 20;     // number of blocks (cores) the kernel is launched on
    uint32_t totalLength = 1;   // total element count; assumes it fits in uint32_t
    for (uint32_t size : x.sizes()) {
        totalLength *= size;
    }
    EXEC_KERNEL_CMD(add_custom, blockDim, x, y, z, totalLength);
    return z;
}
```

### 2.3 Register the implementation

Register the implementation with `TORCH_LIBRARY_IMPL`. For NPU execution, `torch_npu` uses the `PrivateUse1` dispatch key; see the PyTorch tutorial for a detailed introduction to `PrivateUse1`: https://docs.pytorch.org/tutorials/advanced/privateuseone.html

```cpp
TORCH_LIBRARY_IMPL(npu, PrivateUse1, m) {
    m.impl("my_add", TORCH_FN(run_add_custom));
}
```

## 3. Build and run

This example requires PTO Tile Lib, PyTorch, `torch_npu`, and CANN.
Follow the official `torch_npu` installation guide: https://gitcode.com/ascend/pytorch#%E5%AE%89%E8%A3%85 or run:

```bash
python3 -m pip install -r requirements.txt
```

### 3.1 Set the target SoC

Edit `demos/baseline/add/CMakeLists.txt` and set `SOC_VERSION` to your target (example: A2/A3 uses `Ascend910B1`):

```cmake
set(SOC_VERSION "Ascendxxxyy" CACHE STRING "system on chip type")
```

You can query the chip name on the target machine via `npu-smi info` and use `Ascend<Chip Name>` as the value.

### 3.2 Build the wheel

Set the PTO Tile Lib path and build a wheel:

```bash
export ASCEND_HOME_PATH=/usr/local/Ascend/
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export PTO_LIB_PATH=[YOUR_PATH]/pto-isa
rm -rf build op_extension.egg-info
python3 setup.py bdist_wheel
```

### 3.3 Install the wheel

```bash
cd dist
pip uninstall *.whl
pip install *.whl
```

### 3.4 Run the test

From `demos/baseline/add/`:

```bash
cd test
python3 test.py
```
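`test.py` is not reproduced in this document. Assuming it compares the output of `torch.ops.npu.my_add(x, y)` against `x + y` on contiguous float32 tensors, the host-side logic of such a check can be sketched in C++ as follows. `TotalLength` and `MatchesReferenceAdd` are hypothetical helpers, not part of `torch_npu` or PTO-ISA, and the default tolerances are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Element count of a contiguous tensor with the given sizes.
// Same product as the totalLength loop in run_add_custom, but it rejects
// negative sizes and counts that do not fit in uint32_t instead of wrapping.
uint32_t TotalLength(const std::vector<int64_t>& sizes) {
    constexpr uint64_t kMax = std::numeric_limits<uint32_t>::max();
    uint64_t total = 1;
    for (int64_t s : sizes) {
        if (s < 0) {
            throw std::invalid_argument("negative dimension");
        }
        const uint64_t us = static_cast<uint64_t>(s);
        // Check before multiplying so the 64-bit product cannot overflow.
        if (us != 0 && total > kMax / us) {
            throw std::overflow_error("element count exceeds uint32_t");
        }
        total *= us;
    }
    return static_cast<uint32_t>(total);
}

// True if z matches x + y element-wise, using the same tolerance form as
// torch.allclose: |z - expected| <= atol + rtol * |expected|.
bool MatchesReferenceAdd(const std::vector<float>& x, const std::vector<float>& y,
                         const std::vector<float>& z,
                         float atol = 1e-5f, float rtol = 1e-5f) {
    if (x.size() != y.size() || x.size() != z.size()) {
        return false;
    }
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float expected = x[i] + y[i];
        if (std::fabs(z[i] - expected) > atol + rtol * std::fabs(expected)) {
            return false;
        }
    }
    return true;
}
```

The explicit overflow check matters because the `uint32_t` product in `run_add_custom` would wrap silently for tensors with more than 2^32 - 1 elements, launching the kernel with a wrong length.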