Clearing the Fog: Finding the Truth Behind a Late-Night Game Cluster Startup Failure - ccagml
2022-09-18 · via ccagml

Background

A few days ago we rolled out a new game version late at night, and the server cluster failed to start.

Investigation

  1. Found a coredump file left behind by the failed startup
  2. Used gdb to inspect the cause of the coredump
    gdb ../pf/main coredump_cds_2022-09-15_00\:25\:20.txt
    
  3. Dumped the stacks of all threads and located the thread that triggered the coredump
    thread apply all bt
    
    Thread 6 (LWP 32566):
    #0  WriteCoreDumpLimited (
        file_name=0x7ff8101470f8 "coredump_cds_2022-09-15_00\:25\:20.txt", max_length=1073741824)
        at src/coredumper.c:183
    #1  0x00007ff87208bafe in sig_handler (sig=6, 
        si=0x7ff859d63bf0, unused=0x7ff859d63ac0)
        at console_linux.cpp:238
    #2  <signal handler called>
    #3  0x00007ff87164e2c7 in raise ()
    from /lib64/libc.so.6
    #4  0x00007ff87164f9b8 in abort ()
    from /lib64/libc.so.6
    #5  0x00007ff8716470e6 in __assert_fail_base ()
    from /lib64/libc.so.6
    #6  0x00007ff871647192 in __assert_fail ()
    from /lib64/libc.so.6
    #7  0x00007ff866f757e6 in ConnMgr::threadAllocateClientConn (this=0x7ff8671aa6e0 <__g_ConnMgr_singleton>, 
        szIP=..., uPort=7085, nCookies=25130)
        at ConnMgr.cpp:745
    
  4. Examined the legacy code
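    // Look up the newly allocated fd in the fd -> connection map.
    map<int, ClientConnPtr>::iterator mapIter = m_FdClientConnMap.find(fd);
    if( mapIter != m_FdClientConnMap.end())
    {
        // The fd is already tracked: treat this as a fatal error.
        assert(0);
    }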
    
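    • As the code shows, when a connection is created and the fd it was given already exists in our m_FdClientConnMap, the new socket connection is considered broken and the assert fires
  5. Checked whether the legacy code itself was at fault
    1. Reading the code showed that when the Lua script layer fails to send a message, the corresponding socket connection is closed immediately
    2. The matching entry in m_FdClientConnMap, however, is not processed until the next frame
    3. Because Linux reuses fd numbers, a new socket connection created in that window can receive an fd that is still present in m_FdClientConnMap, which is what caused the game server cluster to fail on startup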
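
The fd reuse described in step 5 can be reproduced in isolation. The following is a minimal sketch, not the project's actual code: `newFdCollidesWithMap`, the local `fdConnMap` and the `eraseOnClose` switch are hypothetical names introduced for illustration, and a `std::string` stands in for `ClientConnPtr`. It assumes a single-threaded Linux process, where the kernel hands out the lowest free fd number.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical reproduction of the stale-fd collision.
// Returns true if a freshly created socket gets an fd that is still
// present in the fd -> connection map.
bool newFdCollidesWithMap(bool eraseOnClose) {
    // Stand-in for m_FdClientConnMap (assumption: value type simplified).
    std::map<int, std::string> fdConnMap;

    int first = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0) return false;
    fdConnMap[first] = "old connection";

    // The socket is closed immediately, as on a failed send ...
    close(first);
    // ... while the map entry is only removed if cleanup is not deferred.
    if (eraseOnClose) fdConnMap.erase(first);

    // The kernel reuses the lowest free fd number, i.e. the one just closed.
    int second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0) return false;

    bool collides = fdConnMap.find(second) != fdConnMap.end();
    close(second);
    return collides;
}
```

Calling `newFdCollidesWithMap(false)` mirrors the deferred cleanup and hits the same condition that fired the `assert(0)`. Calling it with `true` shows one possible remedy, removing the map entry at the moment the socket is closed. The original post does not say which fix was actually applied.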