
Optimizing Depth-First Search (DFS) Performance with Decision Tree and Linear Regression Models
2024-07-27 · via 鱼雨昱

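During my recent internship I ran into a performance problem with queries between an application and a database. After discussing possible solutions with colleagues, we settled on a linear regression model. How it was actually put into practice is something I can't spell out because of the NDA.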

Building the tree and generating data

I defined a simple tree structure and a function that generates a fairly large tree.

import random

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []

def createTreeBesar(depth, breadth):
    # Build a full tree with `depth` levels (the root is level 1);
    # every non-leaf node gets `breadth` children with random values.
    def addChildren(node, currentDepth):
        if currentDepth < depth:
            for _ in range(breadth):
                child = TreeNode(random.randint(1, 100))
                node.children.append(child)
                addChildren(child, currentDepth + 1)
    root = TreeNode(random.randint(1, 100))
    addChildren(root, 1)
    return root
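
To get a feel for how quickly such a tree grows, here is a small sketch that counts the nodes and compares the count with the closed-form sum 1 + breadth + breadth^2 + ... over `depth` levels. The helpers `countNodes` and `expectedNodeCount` are hypothetical names added for illustration; they are not part of the original post.

```python
import random

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []

def createTreeBesar(depth, breadth):
    # Same construction as in the post: the root is level 1.
    def addChildren(node, currentDepth):
        if currentDepth < depth:
            for _ in range(breadth):
                child = TreeNode(random.randint(1, 100))
                node.children.append(child)
                addChildren(child, currentDepth + 1)
    root = TreeNode(random.randint(1, 100))
    addChildren(root, 1)
    return root

def countNodes(root):
    # Hypothetical helper: iterative traversal with an explicit stack.
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        total += 1
        stack.extend(current.children)
    return total

def expectedNodeCount(depth, breadth):
    # Hypothetical helper: breadth**0 + breadth**1 + ... + breadth**(depth - 1).
    return sum(breadth ** level for level in range(depth))

root = createTreeBesar(depth=4, breadth=3)
print(countNodes(root), expectedNodeCount(4, 3))
```

Because the node count grows exponentially with `depth`, even modest parameters yield a tree large enough for timing experiments.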

Generating sample data

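import numpy as np

def generateSampleData():
    # 10,000 rows of [value, priority]; both columns are drawn at random.
    data = []
    for _ in range(10000):
        value = random.randint(1, 1000)
        priority = random.random()
        data.append([value, priority])
    data = np.array(data)
    X = data[:, :-1]  # feature column: value
    y = data[:, -1]   # target column: priority
    return X, y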

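Training and loading the models

To avoid retraining on every run, and to keep the performance evaluation fair, the models are saved to disk.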

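import os
import joblib
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def trainPriorityModel():
    X, y = generateSampleData()
    model = DecisionTreeRegressor()
    model.fit(X, y)
    joblib.dump(model, 'priorityModel.pkl')
    return model

def trainIndexModel(data):
    # Fit position ~ value. The fit is only meaningful as an index
    # when `data` is ordered by value.
    values = [node.value for node in data]
    positions = list(range(len(data)))
    model = LinearRegression()
    model.fit(np.array(values).reshape(-1, 1), positions)
    joblib.dump(model, 'indexModel.pkl')
    return model

def loadModel(filePath, trainFunc, *args):
    # Reuse the saved model if present; otherwise train it,
    # forwarding any arguments the training function needs.
    if os.path.exists(filePath):
        return joblib.load(filePath)
    else:
        return trainFunc(*args)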
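
loadModel trains only when no saved file exists, and a training function that takes arguments (such as trainIndexModel(data)) needs them forwarded. Below is a minimal sketch of that load-or-train pattern. It is an illustration under assumptions rather than the post's code: it uses the standard library's `pickle` in place of joblib to stay dependency-free, it writes the file inside the loader, and `loadOrTrain` and `trainDummy` are hypothetical names.

```python
import os
import pickle
import tempfile

def loadOrTrain(filePath, trainFunc, *args):
    # Return the cached object if the file exists; otherwise call
    # trainFunc(*args), cache the result, and return it.
    if os.path.exists(filePath):
        with open(filePath, "rb") as f:
            return pickle.load(f)
    model = trainFunc(*args)
    with open(filePath, "wb") as f:
        pickle.dump(model, f)
    return model

calls = []

def trainDummy(values):
    # Hypothetical stand-in for a real training function.
    calls.append(1)
    return {"mean": sum(values) / len(values)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dummyModel.pkl")
    first = loadOrTrain(path, trainDummy, [1, 2, 3])   # trains and caches
    second = loadOrTrain(path, trainDummy, [1, 2, 3])  # loads from disk
    print(first == second, len(calls))
```

One thing to watch with any cache of this kind: a stale file is reused silently, so the saved model should be deleted whenever the training data changes.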

Depth-first search and performance evaluation

import time

def standardDfs(node, visited):
    if node is None or node in visited:
        return
    visited.add(node)
    for child in node.children:
        standardDfs(child, visited)

def indexedDfs(node, visited, indexModel, data):
    if node is None or node in visited:
        return
    visited.add(node)
    for child in node.children:
        # locateNode is not shown in this post.
        locatedNode = locateNode(indexModel, child.value, data)
        indexedDfs(locatedNode, visited, indexModel, data)

def evaluatePerformance(treeRoot, priorityModel, indexModel, data):
    # priorityModel is accepted but not used in this comparison.
    startTime = time.perf_counter()
    visitedStandard = set()
    standardDfs(treeRoot, visitedStandard)
    standardTime = time.perf_counter() - startTime
    print(f"Standard DFS Run Time: {standardTime:.6f} SEC")

    startTime = time.perf_counter()
    visitedIndexed = set()
    indexedDfs(treeRoot, visitedIndexed, indexModel, data)
    indexedTime = time.perf_counter() - startTime
    print(f"Indexed DFS Run Time: {indexedTime:.6f} SEC")
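
indexedDfs calls locateNode, which the post does not show. The sketch below is one possible shape for such a lookup, written as a guess and not as the author's implementation: it sorts the nodes by value, fits a line from value to position, and corrects the prediction with a local search. It uses a hand-rolled least-squares fit in place of scikit-learn's LinearRegression to stay dependency-free, and `fitLine`, `buildIndex`, and `locateNodeSketch` are hypothetical names.

```python
import bisect

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []

def fitLine(xs, ys):
    # Ordinary least squares for y ~ slope * x + intercept.
    n = len(xs)
    if n == 0:
        return 0.0, 0.0
    meanX = sum(xs) / n
    meanY = sum(ys) / n
    varX = sum((x - meanX) ** 2 for x in xs)
    if varX == 0:
        return 0.0, meanY
    slope = sum((x - meanX) * (y - meanY) for x, y in zip(xs, ys)) / varX
    return slope, meanY - slope * meanX

def buildIndex(nodes):
    # Assumption: position prediction only makes sense on value-sorted data.
    sortedNodes = sorted(nodes, key=lambda node: node.value)
    values = [node.value for node in sortedNodes]
    slope, intercept = fitLine(values, list(range(len(values))))
    return sortedNodes, values, slope, intercept

def locateNodeSketch(index, value):
    sortedNodes, values, slope, intercept = index
    if not values:
        return None
    # Predict a position, then clamp it into the valid range.
    guess = int(round(slope * value + intercept))
    guess = min(max(guess, 0), len(values) - 1)
    # Widen a window around the guess until it brackets the value.
    lo = hi = guess
    step = 1
    while lo > 0 and values[lo] > value:
        lo = max(0, lo - step)
        step *= 2
    step = 1
    while hi < len(values) - 1 and values[hi] < value:
        hi = min(len(values) - 1, hi + step)
        step *= 2
    # Binary search inside the window only.
    pos = bisect.bisect_left(values, value, lo, hi + 1)
    if pos < len(values) and values[pos] == value:
        return sortedNodes[pos]
    return None

nodes = [TreeNode(v) for v in [42, 7, 19, 7, 88, 63]]
index = buildIndex(nodes)
print(locateNodeSketch(index, 19).value)
print(locateNodeSketch(index, 5))
```

If a lookup resolves nodes by value alone, as this sketch does, duplicate values map to a single node. An indexed traversal can then visit fewer nodes than the standard DFS, so it is worth comparing the sizes of the two visited sets before reading too much into the timings.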

Results

The results look pretty good, it seems?
[Figure: indexed_dfs_opt]