轶哥 · 2023-08-23 · via 轶哥博客

Objectives of Supervised Fine-Tuning:

  1. Enhance Specific Task Performance: Aligning instructions with particular tasks.
  2. Domain Adaptation: Making the model compatible with specialized areas.
  3. Improve Interpretability and Controllability: Enhancing the model's ability to be understood and directed.

Overall, the goal is to improve robustness, that is, how well the model holds up when instructions and inputs vary.

Core Considerations:

  1. Diversity: To prevent overfitting, the data must be diverse. Diversity improves not only generalization but also reasoning ability. It means covering many functional categories (task types), not just many knowledge categories. The data volume for each category should be as balanced as possible; otherwise the model may become oversensitive to some categories and undersensitive to others. Diversity can also be increased through prompt template construction or data augmentation, for example by expanding a Chinese-to-English translation instruction into several variants.
  2. Avoid Mistaking SFT for Data Supplementation: SFT is not a way to feed the model more knowledge. The model may memorize some of the data, but that is not the main purpose.
  3. Few-Shot and CoT (Chain of Thought) Data Integration: Adding few-shot and CoT samples to the training data helps the model understand instructions and handle multi-turn dialogue.
  4. Emphasis on Data Quality over Quantity in SFT: Typically, around 10,000 carefully labeled samples are enough to achieve good results.
  5. Diversity and Quality over Volume: Expanding data volume without increasing diversity yields sharply diminishing returns, while improving data quality yields notable gains.
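The balancing and template-expansion ideas in point 1 can be sketched roughly as follows. This is a minimal illustration, not a prescribed pipeline: the record fields (`category`, `instruction`, `output`), the downsampling strategy, and the three templates are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balance_by_category(samples, seed=0):
    """Downsample every category to the size of the smallest one.

    Assumption: each sample is a dict with a "category" key.
    Downsampling is only one possible way to balance; collecting more
    data for the small categories is the other obvious option.
    """
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample["category"]].append(sample)
    if not by_category:
        return []
    cap = min(len(items) for items in by_category.values())
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    balanced = []
    for items in by_category.values():
        balanced.extend(rng.sample(items, cap))
    return balanced


# Hypothetical prompt templates for one Chinese-to-English translation task.
TRANSLATION_TEMPLATES = [
    "Translate the following Chinese text into English: {text}",
    "Please render this sentence in English: {text}",
    "What is the English translation of: {text}",
]


def expand_with_templates(text, answer, templates=TRANSLATION_TEMPLATES):
    """Turn one (text, answer) pair into several differently worded samples."""
    return [
        {
            "category": "translation",
            "instruction": template.format(text=text),
            "output": answer,
        }
        for template in templates
    ]
```

For example, `expand_with_templates("你好", "Hello")` produces three samples that share an answer but differ in how the instruction is phrased.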

Data Quality Requirements:

  1. Length Constraints: Neither the question nor the answer should be overly long or overly short. Ideally, no more than 4k tokens.
  2. No Incorrect Answers: Only select high-quality data.
  3. Special Industry Requirements: For domains that demand strong reasoning ability, try to gather more CoT data.
  4. Diverse NLP Abilities Required: Including classification, structured output, creative writing, multi-turn dialogue, ancient Chinese translation, keyword recognition, reading comprehension, idiom explanation, text correction, sentiment analysis, entity recognition, programming, text matching, copywriting, song reviews, open questions, composition writing, storytelling, structured extraction, summarizing, closed questions, CoT, objective test questions, brainstorming, etc. (Avoid using only vertical domain data).
  5. Vertical Domain Data Proportions: Keep the share of vertical domain data low. Domain knowledge tends to be learned better through secondary pre-training (PT), in which case the SFT data may contain no vertical domain data at all.
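The length constraint in point 1 could be applied as a simple filter, sketched below. Several things here are assumptions: the field names `instruction` and `output`, reading "4k" as 4096, applying the limit to the question and the answer separately, the minimum of 2 tokens, and above all the whitespace-based `approx_token_count`, which is only a placeholder (it does not work for unsegmented Chinese text) and should be replaced by the target model's own tokenizer.

```python
def approx_token_count(text):
    """Placeholder token counter: counts whitespace-separated chunks.

    Assumption: this is NOT a real tokenizer. Pass the target model's
    tokenizer-based counter as `count_tokens` in real use.
    """
    return len(text.split())


def filter_by_length(samples, count_tokens=approx_token_count,
                     min_tokens=2, max_tokens=4096):
    """Keep samples whose question and answer are each within the limits.

    min_tokens and max_tokens are illustrative defaults, not values
    taken from the article beyond the "no more than 4k tokens" hint.
    """
    kept = []
    for sample in samples:
        question_len = count_tokens(sample["instruction"])
        answer_len = count_tokens(sample["output"])
        if (min_tokens <= question_len <= max_tokens
                and min_tokens <= answer_len <= max_tokens):
            kept.append(sample)
    return kept
```

A filter like this only handles length; checking that answers are actually correct still needs human or model-assisted review.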

Examples:

Good Dataset: Question: Xiao Ming's mother has three children. The first is named Yi Mao and the second Er Mao. What is the name of the third child? Answer: The question begins with "Xiao Ming's mother," so Xiao Ming must be one of her three children. The first two are Yi Mao and Er Mao, so the third child is Xiao Ming.

Poor Dataset: Question: Same as above. Answer: Xiao Ming. (The answer is correct, but it is given directly with no thought process. The good sample is better because it includes the CoT.)
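As a rough illustration, the two samples above could be stored as records like the following. The `instruction` / `output` field names and the JSON Lines layout are assumptions for this sketch; the article does not specify a storage format.

```python
import json

QUESTION = (
    "Xiao Ming's mother has three children. The first is named Yi Mao "
    "and the second Er Mao. What is the name of the third child?"
)

# The answer carries the reasoning (CoT) before the conclusion.
good_sample = {
    "instruction": QUESTION,
    "output": (
        "The question begins with \"Xiao Ming's mother,\" so Xiao Ming must "
        "be one of her three children. The first two are Yi Mao and Er Mao, "
        "so the third child is Xiao Ming."
    ),
}

# Same question, correct answer, but no reasoning for the model to learn from.
poor_sample = {
    "instruction": QUESTION,
    "output": "Xiao Ming.",
}


def to_jsonl(records):
    """Serialize records one JSON object per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


if __name__ == "__main__":
    print(to_jsonl([good_sample, poor_sample]))
```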

Q & A

Why include coding ability in SFT?

Teaching AI to write code is a way of teaching it to break problems down and assemble solutions, which greatly enhances its reasoning and structured output capabilities. Research supports this. In the same way, improving translation ability also boosts the AI's problem-solving skills, along with other seemingly unrelated abilities.

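Why don't I recommend doing SFT without PT?

If you are not doing secondary pre-training, most models currently come with a Chat version, so just use that directly. SFT places very high demands on data quality, and running SFT on a Base model with low-quality data can easily make the model worse rather than better. Raising data quality is not cheap either.

How do you judge the effect of SFT?

This is a very complex question. However, you can try breaking your question down according to your scenario and letting AI help answer it, as in the images below. You can then continue sending your specific questions and have the AI analyze them step by step.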

image.png

image.png

image.png