Paper reading: Towards Efficient SpMV on Sunway Manycore Architectures
John Doe · 2023-07-11 · via Amicoyuan

Paper link:

Towards Efficient SpMV on Sunway Manycore Architectures | Proceedings of the 2018 International Conference on Supercomputing (acm.org)

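Summary

Dual-side multi-level partitioning technique

Three levels of partitioning: Block -> Tile -> Slice

At the Tile level, some tiles are empty and need no computation

At the Slice level, some slices are also empty and need no computation

The slice, at the lowest level, is the core unit of our computation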
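The three-level partitioning above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the geometry (a block is a band of rows, a tile is a column strip of a block, a slice is a few consecutive rows of a tile) and the names `partition`, `spmv`, `block_rows`, `tile_cols`, and `slice_rows` are assumptions made for this example. Because only non-empty tiles and slices get an entry, the empty ones are skipped automatically.

```python
from collections import defaultdict


def partition(coo, block_rows, tile_cols, slice_rows):
    """Group COO nonzeros (row, col, value) into block -> tile -> slice.

    Assumed (hypothetical) geometry: a block is a band of `block_rows` rows,
    a tile is a `tile_cols`-wide column strip of a block, and a slice is
    `slice_rows` consecutive rows of a tile.
    """
    blocks = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for r, c, v in coo:
        b = r // block_rows                  # block index (row band)
        t = c // tile_cols                   # tile index inside the block
        s = (r % block_rows) // slice_rows   # slice index inside the tile
        blocks[b][t][s].append((r, c, v))
    return blocks


def spmv(blocks, x, n_rows):
    """Compute y = A * x by visiting only the non-empty tiles and slices."""
    y = [0.0] * n_rows
    for tiles in blocks.values():
        for slices in tiles.values():
            for nonzeros in slices.values():   # the slice is the unit of work
                for r, c, v in nonzeros:
                    y[r] += v * x[c]
    return y


coo = [(0, 0, 1.0), (0, 3, 2.0), (2, 1, 3.0), (5, 5, 4.0)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
blocks = partition(coo, block_rows=4, tile_cols=2, slice_rows=2)
y = spmv(blocks, x, n_rows=6)
```

In this toy 6x6 matrix there are 2 x 3 = 6 possible tiles, but only 3 are stored, so half of the tiles are never touched.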

Multi-level queues for load balancing: "The work sharing mechanism in the block and slice queues guarantee the workload balance across fleets and cores."

[Figure: image-20230711215435026]
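One way to picture the two queue levels is the sketch below: fleets pull blocks from a shared block queue, and the cores of a fleet pull slices from that fleet's slice queue, so whoever finishes early simply takes the next item. This is only an assumption about how such work sharing could look, written with Python threads. The names `run_fleet` and `process` are hypothetical, and `sum(work)` stands in for the real per-slice SpMV work.

```python
import queue
import threading


def run_fleet(block_queue, n_cores, results, lock):
    """One fleet: keep pulling blocks from the shared block queue."""
    while True:
        try:
            block_id, slices = block_queue.get_nowait()
        except queue.Empty:
            return  # no blocks left for this fleet

        # Second level: the slices of this block are shared by the fleet's cores.
        slice_queue = queue.Queue()
        for s in slices:
            slice_queue.put(s)

        def core():
            while True:
                try:
                    work = slice_queue.get_nowait()
                except queue.Empty:
                    return  # no slices left for this core
                partial = sum(work)  # stand-in for the SpMV work of one slice
                with lock:
                    results[block_id] = results.get(block_id, 0) + partial

        cores = [threading.Thread(target=core) for _ in range(n_cores)]
        for t in cores:
            t.start()
        for t in cores:
            t.join()  # the block is finished before the fleet takes another


def process(blocks, n_fleets=2, n_cores=3):
    """Distribute blocks over fleets and slices over cores via shared queues."""
    block_queue = queue.Queue()
    for item in blocks.items():
        block_queue.put(item)
    results, lock = {}, threading.Lock()
    fleets = [
        threading.Thread(target=run_fleet,
                         args=(block_queue, n_cores, results, lock))
        for _ in range(n_fleets)
    ]
    for f in fleets:
        f.start()
    for f in fleets:
        f.join()
    return results


totals = process({0: [[1, 2], [3], [4, 5, 6]], 1: [[10]], 2: [[1], [1], [1], [1]]})
```

Which fleet or core handles which item varies from run to run, but the per-block totals do not.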

Mapping details:

[Figure: image-20230711220304175]

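Compute core processing logic

Each row has 8 cores: 7 compute cores and 1 I/O core

Compute cores perform the SpMV computation

The I/O core writes the results back to memory

Multiple slices are combined into a batch, which is convenient for DMA and allows data prefetching (in units of a batch); note that the unit each compute core works on is still the slice

Vector registers are cleverly used to carry the msg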

[Figure: image-20230711221015472]
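The compute-core side can be sketched as below. This is an illustration under assumptions, not the paper's code: slices are loaded a batch at a time, the next batch is fetched before the current one is processed (a stand-in for DMA prefetching), and the computation still runs slice by slice. The names `batches` and `compute_core` are hypothetical, and a plain `(row, partial_sum)` tuple stands in for the msg that the real implementation packs into a vector register.

```python
def batches(slices, batch_size):
    """Yield consecutive groups of `batch_size` slices (one DMA unit each)."""
    for i in range(0, len(slices), batch_size):
        yield slices[i:i + batch_size]


def compute_core(slices, x, batch_size):
    """Return msgs (row, partial_sum) for the slices assigned to one core.

    Each slice is a list of (row, col, value) nonzeros.
    """
    msgs = []
    it = batches(slices, batch_size)
    current = next(it, None)           # first batch load
    while current is not None:
        upcoming = next(it, None)      # stand-in for prefetching the next batch
        for sl in current:             # the unit of computation is one slice
            partial = {}
            for r, c, v in sl:
                partial[r] = partial.get(r, 0.0) + v * x[c]
            msgs.extend(sorted(partial.items()))  # one msg per row of the slice
        current = upcoming
    return msgs


slices = [
    [(0, 0, 1.0), (0, 1, 2.0)],
    [(1, 1, 3.0)],
    [(0, 2, 4.0), (2, 0, 5.0)],
]
msgs = compute_core(slices, x=[1.0, 2.0, 3.0], batch_size=2)
```

Row 0 appears in two different slices, so it produces two msgs; combining them is left to the I/O core.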

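I/O core processing logic

Results are written back only after the whole block has been computed, avoiding repeated memory accesses

The msg carried in vector form is reduced (msg -> reduce)

Uses Sunway RMA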
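The I/O core logic above can be sketched as a local reduce followed by a single write-back. This is a minimal illustration with assumed names (`io_core`, `block_row_start`, `block_rows`). It also assumes that the rows of a block belong to exactly one I/O core, so overwriting that range of `y` is safe. The transfer of msgs from the compute cores (RMA in these notes) is modelled simply as a Python list.

```python
def io_core(block_row_start, block_rows, msgs, y):
    """Reduce the msgs of one block locally, then write back to y once."""
    local = [0.0] * block_rows                 # stand-in for an on-chip buffer
    for row, partial in msgs:
        local[row - block_row_start] += partial   # msg -> reduce
    # Single write-back after the whole block has been reduced.
    y[block_row_start:block_row_start + block_rows] = local
    return y


y = io_core(block_row_start=4, block_rows=2,
            msgs=[(4, 1.5), (5, 2.0), (4, 0.5)],
            y=[0.0] * 8)
```

Main memory (`y`) is touched once per block, no matter how many msgs arrive for the same row.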