[Atlas Porting Camp, Jishi (极市) × Ascend] First Experience Porting to Atlas
2023-01-11 · via Wang's Blog

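Huawei's Atlas series hardware is built on the HUAWEI Ascend series of AI processors. Through a range of product forms, including modules, accelerator cards, edge stations, servers, and clusters, it provides an all-scenario AI infrastructure solution covering device, edge, and cloud. In the [Atlas Porting Camp] run jointly by Jishi (极市) and Ascend, I ported a YOLOv5 model to an Atlas device and learned a lot along the way.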

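The Vision Preprocessing Core (VPC) provides hardware-accelerated image cropping (crop), scaling (resize), and pasting (paste). Because these operations run on dedicated hardware, it is faster than the general-purpose vision library OpenCV and makes full use of the device. The processing flow is shown below:

[Figure: VPC processing flow]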
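To make the three operations concrete, here is a minimal CPU sketch of what crop, resize, and paste each compute. It is only an illustration in plain C++: the `Image` struct and the three functions are hypothetical stand-ins, not the DVPP/VPC API, and the resize uses nearest-neighbour sampling, which is an assumption chosen for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal single-channel 8-bit image stored row by row.
// Hypothetical CPU stand-in for illustration; not part of the Ascend DVPP API.
struct Image {
    int width;
    int height;
    std::vector<uint8_t> pixels;

    Image(int w, int h, uint8_t fill = 0)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, fill) {}

    uint8_t &at(int x, int y) { return pixels[static_cast<std::size_t>(y) * width + x]; }
    uint8_t at(int x, int y) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

// Crop: copy the w x h rectangle whose top-left corner is (x0, y0).
// The caller must keep the rectangle inside the source image.
Image crop(const Image &src, int x0, int y0, int w, int h) {
    Image dst(w, h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst.at(x, y) = src.at(x0 + x, y0 + y);
    return dst;
}

// Resize: nearest-neighbour sampling to a w x h output.
Image resize(const Image &src, int w, int h) {
    Image dst(w, h);
    for (int y = 0; y < h; ++y) {
        const int sy = y * src.height / h;  // source row for this output row
        for (int x = 0; x < w; ++x) {
            const int sx = x * src.width / w;  // source column for this output column
            dst.at(x, y) = src.at(sx, sy);
        }
    }
    return dst;
}

// Paste: overwrite part of the canvas with the patch, top-left corner at (x0, y0).
// The caller must keep the patch inside the canvas.
void paste(Image &canvas, const Image &patch, int x0, int y0) {
    for (int y = 0; y < patch.height; ++y)
        for (int x = 0; x < patch.width; ++x)
            canvas.at(x0 + x, y0 + y) = patch.at(x, y);
}
```

A typical preprocessing chain for a detector would call `crop` on the region of interest, `resize` it to the network input size, and `paste` it onto a padded canvas. On Atlas hardware the VPC performs the same kind of work without occupying the CPU.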

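The Ascend Tensor Compiler (ATC) is the model conversion tool in the Ascend CANN architecture. It converts network models from open-source frameworks, or single-operator description files (JSON format) defined in Ascend IR, into offline models in the .om format supported by Ascend AI processors. Its functional architecture is shown below:

[Figure: ATC functional architecture]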

#include "acl/acl.h"
#include "acl/ops/acl_dvpp.h"


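// Initialize ACL once per process before any other ACL call.
// Passing nullptr uses the default configuration.
aclError ret = aclInit(nullptr);
// Finalize ACL once per process, after all other ACL resources are released.
ret = aclFinalize();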


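// Query the number of available devices, then bind this thread to device 0.
uint32_t deviceCount = 0;
ret = aclrtGetDeviceCount(&deviceCount);
int32_t deviceId = 0;
ret = aclrtSetDevice(deviceId);

// Create a context on the chosen device.
aclrtContext context;
ret = aclrtCreateContext(&context, deviceId);

// Create a stream in the current context.
aclrtStream stream;
ret = aclrtCreateStream(&stream);

// Find out whether the process runs on the host (ACL_HOST) or on the device itself (ACL_DEVICE).
aclrtRunMode runmode;
ret = aclrtGetRunMode(&runmode);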

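// Load the offline model (.om) into caller-managed memory. The work buffer (mModelMptr, mModelMSize)
// and the weight buffer (mModelWptr, mModelWSize) must already be allocated on the device,
// with the sizes reported by aclmdlQuerySize.
uint32_t modelId = 0;
const char *modelFile = "***.om";
ret = aclmdlLoadFromFileWithMem(modelFile, &modelId, mModelMptr, mModelMSize, mModelWptr, mModelWSize);

// Create a model description object and fill it from the loaded model.
aclmdlDesc *modelDesc = aclmdlCreateDesc();
ret = aclmdlGetDesc(modelDesc, modelId);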



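// Teardown, in reverse order of setup.
ret = aclmdlUnload(modelId);
ret = aclmdlDestroyDesc(modelDesc);
// A stream and context that were created explicitly are destroyed before the device is reset.
ret = aclrtDestroyStream(stream);
ret = aclrtDestroyContext(context);
ret = aclrtResetDevice(deviceId);
ret = aclFinalize();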





// Ask the model description how many bytes input 0 needs, then allocate that much device memory.
size_t modelInputSize = aclmdlGetInputSizeByIndex(modelDesc, 0);
void *modelInputBuffer = nullptr;
ret = aclrtMalloc(&modelInputBuffer, modelInputSize, ACL_MEM_MALLOC_NORMAL_ONLY);



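// Wrap the device input buffer in a data buffer and add it to the input dataset.
aclmdlDataset *input_ = aclmdlCreateDataset();
aclDataBuffer *inputData = aclCreateDataBuffer(modelInputBuffer, modelInputSize);
ret = aclmdlAddDatasetBuffer(input_, inputData);
// Copy image i of the batch (float data, 3 x INPUT_H x INPUT_W) into the device input buffer.
// The copy direction depends on the run mode: host to device when running on a host,
// device to device when the process already runs on the device.
aclrtMemcpyKind copyKind = (runmode == ACL_HOST) ? ACL_MEMCPY_HOST_TO_DEVICE : ACL_MEMCPY_DEVICE_TO_DEVICE;
size_t imageElems = 3 * yolo_params_.INPUT_H * yolo_params_.INPUT_W;
ret = aclrtMemcpy(modelInputBuffer, modelInputSize, input_host_memory_ + i * imageElems, imageElems * sizeof(float), copyKind);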
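The pointer arithmetic in the copy above is easy to get wrong, so here is a small worked sketch of it. The offset is counted in float elements (because it is added to a `float` pointer), while the copy size is counted in bytes. The `InputShape` struct and helper names are hypothetical; only the formula `i * 3 * INPUT_H * INPUT_W` comes from the original code.

```cpp
#include <cstddef>

// Shape of one network input in CHW layout.
// Hypothetical stand-in for the yolo_params_ fields used in the copy call.
struct InputShape {
    std::size_t channels;
    std::size_t height;
    std::size_t width;
};

// Number of float elements in one image.
constexpr std::size_t elementsPerImage(const InputShape &s) {
    return s.channels * s.height * s.width;
}

// Offset, in float elements, of image number batchIndex inside a packed batch.
// This is the value added to the float pointer.
constexpr std::size_t elementOffset(const InputShape &s, std::size_t batchIndex) {
    return batchIndex * elementsPerImage(s);
}

// Size of one image in bytes. This is the count passed to the memory copy.
constexpr std::size_t bytesPerImage(const InputShape &s) {
    return elementsPerImage(s) * sizeof(float);
}
```

Assuming a 3 x 640 x 640 input (a common YOLOv5 size, not stated in the original) and a 4-byte `float`, one image holds 1,228,800 floats, which is 4,915,200 bytes, and image 2 of the batch starts at element 2,457,600.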

// Build the output dataset: one device buffer per model output.
aclmdlDataset *output_ = aclmdlCreateDataset();
size_t numOutputs = aclmdlGetNumOutputs(modelDesc);

for (size_t i = 0; i < numOutputs; ++i) {
    // Allocate device memory of exactly the size this output needs.
    size_t buffer_size = aclmdlGetOutputSizeByIndex(modelDesc, i);
    void *outputBuffer = nullptr;
    ret = aclrtMalloc(&outputBuffer, buffer_size, ACL_MEM_MALLOC_NORMAL_ONLY);
    if (ret != ACL_SUCCESS) {
        return -2;
    }
    // Wrap the memory in a data buffer and attach it to the dataset.
    aclDataBuffer *outputData = aclCreateDataBuffer(outputBuffer, buffer_size);
    ret = aclmdlAddDatasetBuffer(output_, outputData);
    if (ret != ACL_SUCCESS) {
        return -2;
    }
}

ret = aclmdlExecute(modelId, input_, output_);

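// Fetch output number idx from the output dataset.
aclDataBuffer* dataBuffer = aclmdlGetDatasetBuffer(output_, idx);

// Device address of the output data.
void* dataBufferDev = aclGetDataBufferAddr(dataBuffer);

// Size of the output data in bytes.
size_t bufferSize = aclGetDataBufferSizeV2(dataBuffer);

// Copy the result into host memory. This direction applies when the process runs on a host (ACL_HOST).
// The caller owns the array and must release it with delete[].
uint8_t* buffer = new uint8_t[bufferSize];
aclError aclRet = aclrtMemcpy(buffer, bufferSize, dataBufferDev, bufferSize, ACL_MEMCPY_DEVICE_TO_HOST);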
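Once the bytes are on the host they still have to be read as numbers. The sketch below shows one way to do that, under loudly stated assumptions: the output is `float` data, each detection occupies a fixed-length row, and the objectness score sits at index 4 of the row (the usual `cx, cy, w, h, obj, classes...` layout of a YOLOv5 export). None of this is confirmed by the original post, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy a raw byte buffer into a vector of floats.
// memcpy avoids the alignment and aliasing problems of a plain pointer cast.
std::vector<float> toFloats(const void *buffer, std::size_t bufferSize) {
    std::vector<float> values(bufferSize / sizeof(float));
    if (!values.empty()) {
        std::memcpy(values.data(), buffer, values.size() * sizeof(float));
    }
    return values;
}

// Return the indices of the rows whose objectness score exceeds the threshold.
// ASSUMPTION: every row has rowLength floats and the objectness score is at index 4.
std::vector<std::size_t> rowsAboveThreshold(const std::vector<float> &values,
                                            std::size_t rowLength,
                                            float threshold) {
    std::vector<std::size_t> keep;
    if (rowLength <= 4) {
        return keep;  // a row this short cannot hold an objectness score at index 4
    }
    for (std::size_t row = 0; (row + 1) * rowLength <= values.size(); ++row) {
        if (values[row * rowLength + 4] > threshold) {
            keep.push_back(row);
        }
    }
    return keep;
}
```

With the host buffer from the copy above, `toFloats(buffer, bufferSize)` yields the values, and `rowsAboveThreshold` selects candidate detections before box decoding and non-maximum suppression.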

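// Teardown of the input: after every execution the input dataset is released and created again,
// because the input part cannot be reused.
if (input_ != nullptr)
{
    for (size_t i = 0; i < aclmdlGetDatasetNumBuffers(input_); ++i)
    {
        // Destroy only the data buffer wrapper; the device memory it points to is not freed here.
        aclDataBuffer* dataBuffer = aclmdlGetDatasetBuffer(input_, i);
        aclDestroyDataBuffer(dataBuffer);
    }
    aclmdlDestroyDataset(input_);
    input_ = nullptr;
}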


// Teardown of the output: free the device memory of every output, then the wrappers, then the dataset.
if (output_ != nullptr) {
    for (size_t i = 0; i < aclmdlGetDatasetNumBuffers(output_); ++i)
    {
        aclDataBuffer* dataBuffer = aclmdlGetDatasetBuffer(output_, i);
        void* data = aclGetDataBufferAddr(dataBuffer);
        (void)aclrtFree(data);
        (void)aclDestroyDataBuffer(dataBuffer);
    }
    (void)aclmdlDestroyDataset(output_);
    output_ = nullptr;
}
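The snippets above release resources by hand, and the order matters: buffers before datasets, the model before the device, the device before finalization. One way to keep that order correct, even on early returns such as `return -2`, is a small cleanup stack that runs registered actions in reverse. This is a generic C++ sketch; `CleanupStack` is a hypothetical helper, not part of ACL.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Collects cleanup actions and runs them in reverse order of registration,
// mirroring "release in the opposite order of acquisition".
// Hypothetical helper for illustration; actions should not throw.
class CleanupStack {
public:
    CleanupStack() = default;
    CleanupStack(const CleanupStack &) = delete;
    CleanupStack &operator=(const CleanupStack &) = delete;

    // Register an action right after the matching resource was acquired.
    void push(std::function<void()> action) { actions_.push_back(std::move(action)); }

    // Run all pending actions, last registered first. Safe to call more than once.
    void run() {
        while (!actions_.empty()) {
            std::function<void()> action = std::move(actions_.back());
            actions_.pop_back();
            action();
        }
    }

    // Any actions still pending run automatically when the stack goes out of scope.
    ~CleanupStack() { run(); }

private:
    std::vector<std::function<void()>> actions_;
};
```

In the code above, each successful acquisition would be followed by a matching `push`, for example a lambda that calls `aclrtFree` on the buffer just allocated, so that an early return no longer leaks device memory.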