A tiny program to benchmark image transpose algorithms
Len3d · 2017-10-22 · via 博客园 (cnblogs) - Len3d

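Here is the code. It times 50 round trips (1280×720 to 720×1280 and back, i.e. 100 transposes) of a naive transpose and of an 8×8 cache-blocked transpose, using 16-byte __m128 values as pixels, and then checks the blocked version on a 1080×1920 image. It relies on Windows/MSVC specifics (QueryPerformanceCounter, __forceinline, the m128_i32 member of __m128) and needs OpenMP enabled (/openmp) for the #pragma omp parallel for loops to actually run in parallel: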

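#include <stdio.h>
#include <xmmintrin.h>   // __m128 and SSE intrinsics
#include <windows.h>     // QueryPerformanceCounter; also defines the min() macro used below

typedef __m128 Vec;      // one "pixel": 16 bytes (4 floats), copied as a unit

typedef unsigned long long value_t;

// Current value of the high-resolution performance counter, in ticks.
__forceinline value_t now()
{
    LARGE_INTEGER n;
    QueryPerformanceCounter(&n);
    return n.QuadPart;
}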

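// Naive transpose: src is src_w pixels wide and src_h tall (row-major);
// dst becomes src_h wide and src_w tall, with dst(row j, col i) = src(row i, col j).
// Writes to dst are sequential, but consecutive reads from src are a whole
// source row apart (src_w * 16 bytes), so each read lands on a different
// cache line from the one before it.
inline void img_transpose(
    Vec *dst_img,
    const Vec *src_img,
    const int src_w,
    const int src_h)
{
    // Threads split the source columns (= destination rows).
#pragma omp parallel for
    for (int j = 0; j < src_w; ++j)
    {
        for (int i = 0; i < src_h; ++i)
        {
            dst_img[j * src_h + i] = src_img[i * src_w + j];
        }
    }
}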

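// Cache-blocked transpose: same result as img_transpose, but the work is done
// in 8x8-pixel tiles. With 16-byte pixels a tile row is 8 * 16 = 128 bytes, so
// one tile reads 8 source rows and writes 8 destination rows of 128 bytes each
// (2 KiB in total), which easily fits in L1 cache while the tile is copied.
inline void img_transpose_block(
    Vec *dst_img,
    const Vec *src_img,
    const int src_w,
    const int src_h)
{
    const int B = 8;  // tile edge in pixels

#pragma omp parallel for
    for (int j = 0; j < src_w; j += B)
    {
        for (int i = 0; i < src_h; i += B)
        {
            // Clamp the last tile in each direction when the size is not a
            // multiple of B (min is the macro from <windows.h>).
            const int nsize = min(j + B, src_w);
            const int msize = min(i + B, src_h);

            for (int n = j; n < nsize; ++n)
            {
                for (int m = i; m < msize; ++m)
                {
                    dst_img[n * src_h + m] = src_img[m * src_w + n];
                }
            }
        }
    }
}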

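int main(int argc, char *argv[])
{
    //// performance benchmark ////

    const int w = 1280;
    const int h = 720;

    // __m128 needs 16-byte alignment, which plain new[] does not guarantee
    // (e.g. on 32-bit MSVC builds), so allocate with _mm_malloc instead.
    Vec *a = (Vec *)_mm_malloc(sizeof(Vec) * w * h, 16);
    Vec *b = (Vec *)_mm_malloc(sizeof(Vec) * w * h, 16);

    // Initialize the source so it is touched (paged in) and holds defined
    // values before timing starts; the timing does not depend on the values.
    for (int k = 0; k < w * h; ++k)
        a[k] = _mm_set1_ps((float)k);

    value_t start_time, end_time;

    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    double ms_per_tick = 1000.0 / (double)freq.QuadPart;

    // Warm-up: the first OpenMP region typically starts the worker threads and
    // the first pass faults in b's pages; keep both out of the first timing.
    img_transpose(b, a, w, h);

    // Each iteration transposes w x h -> h x w and back: 100 transposes in total.
    start_time = now();
    for (int t = 0; t < 50; ++t)
    {
        img_transpose(b, a, w, h);
        img_transpose(a, b, h, w);
    }
    end_time = now();
    printf("img_transpose:       %f ms (100 transposes)\n", (double)(end_time - start_time) * ms_per_tick);

    start_time = now();
    for (int t = 0; t < 50; ++t)
    {
        img_transpose_block(b, a, w, h);
        img_transpose_block(a, b, h, w);
    }
    end_time = now();
    printf("img_transpose_block: %f ms (100 transposes)\n", (double)(end_time - start_time) * ms_per_tick);

    _mm_free(a);
    _mm_free(b);

    //// algorithm validation ////
    // Store each source pixel's own (x, y) in lanes 0 and 1, transpose, and check
    // that destination pixel (row j, column i) came from source (x = j, y = i).
    const int width = 1080;
    const int height = 1920;
    Vec *src_img = (Vec *)_mm_malloc(sizeof(Vec) * width * height, 16);
    Vec *dst_img = (Vec *)_mm_malloc(sizeof(Vec) * height * width, 16);

    for (int j = 0; j < height; ++j)
    {
        for (int i = 0; i < width; ++i)
        {
            src_img[j * width + i].m128_i32[0] = i;
            src_img[j * width + i].m128_i32[1] = j;
        }
    }

    img_transpose_block(dst_img, src_img, width, height);

    bool ok = true;
    for (int j = 0; j < width && ok; ++j)
    {
        for (int i = 0; i < height; ++i)
        {
            int pi = dst_img[j * height + i].m128_i32[0];
            int pj = dst_img[j * height + i].m128_i32[1];

            if (pi != j || pj != i)
            {
                ok = false;
                break;
            }
        }
    }

    puts(ok ? "Algorithm is correct" : "Algorithm is wrong!!!");
    printf("All done\n");

    _mm_free(src_img);
    _mm_free(dst_img);

    return ok ? 0 : 1;
}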
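If you want to try the same comparison outside Windows/MSVC, the following is one portable sketch, assuming C++17 and only the standard library. It is not the author's code: std::chrono::steady_clock stands in for QueryPerformanceCounter, std::vector for new[], a hypothetical 16-byte Pixel struct for __m128, and the tile size B is a template parameter so sizes other than 8 can be compared. OpenMP is left out, so the timings are single-threaded. The names Pixel, transpose_blocked, time_round_trips, check_transpose and run_benchmark are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for __m128: a 16-byte pixel of 4 floats.
struct alignas(16) Pixel {
    float v[4];
};

// Cache-blocked, out-of-place transpose. src is src_w wide and src_h tall
// (row-major); dst becomes src_h wide and src_w tall. B is the tile edge in pixels.
template <int B, typename T>
void transpose_blocked(T *dst, const T *src, int src_w, int src_h) {
    assert(dst != src && "in-place transpose is not supported");
    for (int j = 0; j < src_w; j += B) {
        for (int i = 0; i < src_h; i += B) {
            const int n_end = std::min(j + B, src_w);  // clamp at the right edge
            const int m_end = std::min(i + B, src_h);  // clamp at the bottom edge
            for (int n = j; n < n_end; ++n)
                for (int m = i; m < m_end; ++m)
                    dst[static_cast<std::size_t>(n) * src_h + m] =
                        src[static_cast<std::size_t>(m) * src_w + n];
        }
    }
}

// Time `reps` round trips (w x h -> h x w -> w x h) and return milliseconds.
template <int B>
double time_round_trips(int w, int h, int reps) {
    std::vector<Pixel> a(static_cast<std::size_t>(w) * h), b(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) a[k].v[0] = static_cast<float>(k);
    transpose_blocked<B>(b.data(), a.data(), w, h);  // warm-up: pages in b
    const auto t0 = std::chrono::steady_clock::now();
    for (int t = 0; t < reps; ++t) {
        transpose_blocked<B>(b.data(), a.data(), w, h);
        transpose_blocked<B>(a.data(), b.data(), h, w);
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Check the 8x8-tiled transpose on a w x h image of ints (exact comparison).
bool check_transpose(int w, int h) {
    std::vector<int> src(static_cast<std::size_t>(w) * h), dst(src.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) src[static_cast<std::size_t>(y) * w + x] = y * w + x;
    transpose_blocked<8>(dst.data(), src.data(), w, h);
    for (int r = 0; r < w; ++r)        // dst row r is src column r
        for (int c = 0; c < h; ++c)    // dst column c is src row c
            if (dst[static_cast<std::size_t>(r) * h + c] != c * w + r) return false;
    return true;
}

// Same workload as the post: 50 round trips of a 1280x720 image.
void run_benchmark() {
    const int w = 1280, h = 720, reps = 50;
    std::printf("B=1 : %f ms\n", time_round_trips<1>(w, h, reps));
    std::printf("B=8 : %f ms\n", time_round_trips<8>(w, h, reps));
    std::printf("B=16: %f ms\n", time_round_trips<16>(w, h, reps));
}
```

Call run_benchmark() from your own main() to time it. With B = 1 the loops visit pixels in exactly the same order as img_transpose, so that row doubles as the naive baseline. check_transpose is meant to be run on odd sizes such as 13×7, which exercise the edge clamping; 1280×720 and 1080×1920 are both multiples of 8, so the original program never reaches that code path.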

Posted on 2017-10-22 21:00 by Len3d