The make_classification function
lmqljt · 2024-03-01 · via 博客园 - lmqljt

sklearn.datasets.make_classification

sklearn.datasets.make_classification(n_samples=100, n_features=20, *, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)

Generates a random n-class classification problem.

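Without shuffling, X stacks the features horizontally in the following order: first the n_informative informative features, then n_redundant features that are linear combinations of the informative ones, then n_repeated duplicates drawn at random, with replacement, from the informative and redundant features. The remaining features are filled with random noise. So without shuffling, all useful features sit in the columns X[:, :n_informative + n_redundant + n_repeated].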
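The column layout described above can be checked directly. The following is a minimal sketch, assuming the default shift=0.0 and scale=1.0; the sample size, feature counts and random_state are arbitrary illustrative choices, not values from the original post.

```python
import numpy as np
from sklearn.datasets import make_classification

# Illustrative sizes (assumptions, not from the original post).
n_inf, n_red, n_rep = 3, 2, 1

X, y = make_classification(
    n_samples=200, n_features=8,
    n_informative=n_inf, n_redundant=n_red, n_repeated=n_rep,
    shuffle=False, random_state=0,
)

# With shuffle=False the columns keep their generation order.
informative = X[:, :n_inf]
redundant = X[:, n_inf:n_inf + n_red]
repeated = X[:, n_inf + n_red:n_inf + n_red + n_rep]

# Redundant columns should be (numerically) exact linear combinations
# of the informative columns, so a least-squares fit leaves no residual.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
max_residual = np.abs(informative @ coef - redundant).max()

# Each repeated column should be a copy of an informative or redundant column.
source = X[:, :n_inf + n_red]
repeat_matches = [
    bool(np.any(np.all(np.isclose(source, repeated[:, [j]]), axis=0)))
    for j in range(n_rep)
]

print(max_residual, repeat_matches)
```

The last two columns of X in this sketch are the noise features; they are the only ones that carry no information about y.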

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=6, n_classes=2, n_features=5, n_informative=5, n_redundant=0, n_clusters_per_class=1)
# display() is provided by Jupyter/IPython; use print(X, y) in a plain script
display(X, y)

"""
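n_samples=6 - 6 rows, i.e. 6 samples
n_classes=2 - the labels fall into 2 classes (binary classification)
n_features=5 - 5 features
n_informative=5 - all 5 features are informative
n_redundant=0 - no redundant features
n_clusters_per_class=1 - each class forms a single cluster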

array([[ 1.10885456, -1.97464085,  2.14372944, -0.08241471, -2.60173628],
       [ 0.98456921, -4.67257395, -0.10161149,  0.52329866,  2.0178222 ],
       [-2.92441307, -2.20249011,  0.12827954,  1.90711152,  0.24340137],
       [ 0.14524134, -1.42685331,  1.92731161, -0.72915701,  1.3529692 ],
       [-0.09694719, -0.28604481, -2.62609999, -0.46131174,  0.72515074],
       [ 0.25540393, -2.64589841, -2.05721611,  0.53203936,  0.34273113]])
       
array([0, 1, 1, 0, 1, 0])
"""
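The sample output above will differ on every run, because random_state defaults to None. A minimal sketch of making the data reproducible, assuming arbitrary seed values (42 and 7 are illustrative, not from the original post):

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters as the example above.
params = dict(n_samples=6, n_classes=2, n_features=5, n_informative=5,
              n_redundant=0, n_clusters_per_class=1)

# Two calls with the same seed produce identical data.
X1, y1 = make_classification(random_state=42, **params)
X2, y2 = make_classification(random_state=42, **params)
same_seed_equal = np.array_equal(X1, X2) and np.array_equal(y1, y2)

# A different seed produces different data.
X3, _ = make_classification(random_state=7, **params)
different_seed_equal = np.array_equal(X1, X3)

print(same_seed_equal, different_seed_equal)
```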

See also the section on imbalanced data at this link, as well as the use of cross_validate() in its code section.
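For the imbalanced-data case, the weights parameter from the signature above controls the class proportions. A minimal sketch, assuming a 90/10 split and flip_y=0.0 so that no labels are randomly reassigned; the sizes and seed are illustrative choices, not from the original post.

```python
import numpy as np
from sklearn.datasets import make_classification

# weights gives the approximate fraction of samples assigned to each class.
X_imb, y_imb = make_classification(
    n_samples=1000, n_features=5, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9, 0.1], flip_y=0.0, random_state=0,
)

# Count the samples per class and compute the minority share.
class_counts = np.bincount(y_imb)
minority_fraction = class_counts[1] / class_counts.sum()

print(class_counts, minority_fraction)
```

With a split like this, accuracy alone can look good for a model that mostly predicts the majority class, which is one reason to also report precision, recall and f1.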

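import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# X, y must be large enough for 10 folds; the 6-sample example above is too small for cv=10.
classifier = RandomForestClassifier()

# 'r2' and 'neg_mean_squared_error' are regression metrics; here they are computed on the predicted 0/1 labels.
scores = cross_validate(
    classifier, X, y, cv=10,
    scoring=['accuracy', 'precision', 'recall', 'f1', 'r2', 'neg_mean_squared_error']
)

# Average each metric over the 10 folds.
scores = pd.DataFrame(scores)
scores.mean()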

#output
fit_time                       0.296672
score_time                     0.011580
test_accuracy                  0.911000
test_precision                 0.920261
test_recall                    0.904000
test_f1                        0.910236
test_r2                        0.644000
test_neg_mean_squared_error   -0.089000
dtype: float64
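In the output above, test_neg_mean_squared_error (-0.089) is the negative of 1 - test_accuracy (1 - 0.911). For 0/1 labels this is expected: each squared error is 1 for a misclassified sample and 0 otherwise, so the mean squared error equals the error rate. A minimal self-contained sketch of that relationship, assuming an illustrative dataset and forest size (not the settings behind the numbers above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Illustrative binary dataset (assumption; the original data is not shown).
X_cv, y_cv = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_redundant=0,
    random_state=0,
)

clf = RandomForestClassifier(n_estimators=20, random_state=0)
cv_scores = cross_validate(
    clf, X_cv, y_cv, cv=5,
    scoring=['accuracy', 'neg_mean_squared_error'],
)

# Per-fold error rate and per-fold MSE on the predicted 0/1 labels.
error_rate = 1.0 - cv_scores['test_accuracy']
mse = -cv_scores['test_neg_mean_squared_error']

print(np.allclose(error_rate, mse))
```

Because the two metrics carry the same information for binary labels, the regression scorers add little here; the classification metrics are the more natural choice.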