ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piy · 2019-09-26

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. The code and the pretrained models are available at https://github.com/google-research/ALBERT.
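
The abstract does not name its two parameter-reduction techniques. The full paper describes them as a factorized embedding parameterization and cross-layer parameter sharing. The sketch below is only back-of-the-envelope arithmetic for how each one shrinks the parameter count. The function names, the sizes (`V`, `H`, `E`, `L`), and the rough `12 * H * H` per-layer estimate are illustrative assumptions, not figures taken from the abstract or from the released code.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count token-embedding parameters.

    Without factorization the table is vocab_size x hidden_size.
    With factorization it becomes a vocab_size x embedding_size table
    followed by an embedding_size x hidden_size projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params, num_layers, share_across_layers=False):
    """Count encoder parameters, storing one layer's weights if shared."""
    if share_across_layers:
        return per_layer_params
    return per_layer_params * num_layers


# Illustrative sizes (assumptions chosen for the example).
V, H, E, L = 30_000, 1_024, 128, 24

# Rough per-layer estimate: 4*H*H for attention plus 8*H*H for the
# feed-forward block, ignoring biases and layer normalization.
per_layer = 12 * H * H

print("embeddings, unfactorized:", embedding_params(V, H))
print("embeddings, factorized:  ", embedding_params(V, H, E))
print("encoder, unshared:       ", encoder_params(per_layer, L))
print("encoder, shared:         ", encoder_params(per_layer, L, share_across_layers=True))
```

Note that sharing weights across layers reduces the number of stored parameters but not the amount of computation per forward pass, since every layer is still executed.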