Supervised Fine-Tuning (SFT) is the process of taking a pretrained model and continuing to train it on labeled examples of the behavior you want.
For LLMs, SFT usually means training on instruction-response pairs so the model learns to follow instructions, answer in a useful format, and behave more like an assistant than a raw text-completion model.
Core Idea
Pretraining teaches a model broad language patterns from massive text corpora.
SFT teaches the model a specific input-output behavior.
Example:
Instruction: Explain backpropagation in simple terms.
Response: Backpropagation is how a neural network learns from its mistakes...
The model is trained to predict the response tokens given the instruction and the previous response tokens.
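As a minimal sketch, the pair above can be concatenated into a single training sequence with a prompt template. The template here is just an illustration; real chat models use their own tokenizer-specific templates.

```python
def build_training_text(instruction: str, response: str) -> str:
    """Concatenate an instruction and its target response into one training
    sequence using a simple plain-text template (illustrative only)."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"

example = build_training_text(
    "Explain backpropagation in simple terms.",
    "Backpropagation is how a neural network learns from its mistakes...",
)
print(example)
```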
Where It Fits
Typical LLM training pipeline:
- Pretraining
- Supervised Fine-Tuning
- Preference tuning or Reinforcement Learning
- On-Policy Distillation or model consolidation
SFT is often the first post-training stage because it gives the model a basic instruction-following policy before more advanced optimization methods are used.
Objective
SFT is usually trained with next-token prediction, the same basic objective as pretraining.
The difference is the data distribution: instead of random internet text, the data is curated instruction-response examples.
Given an instruction $x$ and a target response $y = (y_1, \dots, y_T)$, the model is trained to maximize the likelihood of the response tokens given the instruction.
Equivalently, it minimizes the cross-entropy loss over the target response tokens:

$$
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid x, y_{<t}\right)
$$

Where:
- $x$ = instruction or prompt
- $y$ = target response
- $y_t$ = current target token
- $y_{<t}$ = previous response tokens
- $\theta$ = model parameters
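A minimal PyTorch sketch of this loss, assuming a Hugging Face-style causal LM that returns `.logits` and prompt/response token IDs that are already tokenized. Prompt positions are set to -100 so only response tokens contribute to the cross-entropy.

```python
import torch
import torch.nn.functional as F

def sft_loss(model, prompt_ids, response_ids):
    """Cross-entropy over response tokens only (a sketch; `model` is any
    causal LM whose forward pass returns logits of shape
    [batch, seq_len, vocab_size])."""
    input_ids = torch.cat([prompt_ids, response_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.size(1)] = -100  # no loss on the prompt tokens
    logits = model(input_ids).logits
    # Shift so each position predicts the next token.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```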
Data Format
SFT datasets usually contain examples like:
- Instruction only
- Instruction + input context
- Desired response
- Multi-turn conversation
- Tool-use trace
- Chain-of-thought or reasoning trace, when appropriate
For chat models, data is often formatted into roles:
system: You are a helpful assistant.
user: Summarize this article.
assistant: ...
The model is usually trained to predict only the assistant tokens, not the user or system tokens.
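A sketch of how that masking is often implemented with a Hugging Face tokenizer's chat template. The model name is illustrative, and the prefix-masking trick below is a simplification; exact turn boundaries depend on the template.

```python
from transformers import AutoTokenizer

# Illustrative model name; any chat model with a chat template works similarly.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this article."},
    {"role": "assistant", "content": "..."},
]

# Token IDs for the full conversation and for everything before the
# assistant reply (the part we do not want to train on).
full_ids = tokenizer.apply_chat_template(messages, tokenize=True)
prompt_ids = tokenizer.apply_chat_template(
    messages[:-1], tokenize=True, add_generation_prompt=True
)

# Labels: -100 masks the system/user tokens so loss is computed only on
# the assistant tokens. Assumes prompt_ids is a prefix of full_ids, which
# holds for most templates but is worth verifying for yours.
labels = [-100] * len(prompt_ids) + full_ids[len(prompt_ids):]
```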
Why It Works
Pretrained models already know a huge amount about language, facts, coding, and reasoning patterns.
SFT does not teach everything from scratch. It nudges the model toward a desired behavior distribution:
- Follow instructions
- Answer in the expected style
- Refuse unsafe requests
- Use domain-specific terminology
- Produce structured outputs
- Match a product’s tone or workflow
This makes SFT much cheaper than pretraining because it updates an already capable model with a smaller, higher-quality dataset.
SFT vs Pretraining
Pretraining:
- Uses massive unlabeled text
- Learns general language modeling
- Optimizes broad next-token prediction
- Produces a base model
SFT:
- Uses curated labeled examples
- Learns target behavior
- Still uses next-token prediction
- Produces an instruction-following or domain-adapted model
SFT vs Preference Tuning
SFT teaches the model what a good answer looks like.
Preference tuning teaches the model which answer is better when multiple answers are possible.
SFT data:
Prompt -> Ideal response
Preference data:
Prompt -> Response A vs Response B -> Preferred response
SFT is usually simpler and more stable. Preference tuning can further improve helpfulness, reasoning, style, and alignment after the model already knows how to respond.
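Illustrative record formats for the two kinds of data (the field names are examples, not a standard schema):

```python
# SFT record: one prompt, one target response.
sft_example = {
    "prompt": "Explain backpropagation in simple terms.",
    "response": "Backpropagation is how a neural network learns from its mistakes...",
}

# Preference record: one prompt, a chosen and a rejected response.
preference_example = {
    "prompt": "Explain backpropagation in simple terms.",
    "chosen": "Backpropagation works backward through the network, assigning blame for the error...",
    "rejected": "Backpropagation is a kind of database index...",
}
```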
Benefits
- Simple and stable training objective
- Converts base models into instruction-following assistants
- Can specialize a model for a domain or workflow
- Requires much less compute than pretraining
- Works well with synthetic data from stronger models
Failure Modes
- Low-quality examples teach low-quality behavior
- Too much narrow data can cause overfitting
- The model may imitate formatting without gaining real capability
- Conflicting examples can make behavior inconsistent
- SFT alone does not optimize long-term rewards or user preferences
- Fine-tuning on small datasets can make the model forget some general abilities
Practical Notes
- Dataset quality usually matters more than dataset size
- Diverse prompts reduce overfitting to one style
- Strong SFT data should include edge cases, refusals, and hard examples
- Evaluation should test behavior, not just training loss
- SFT is often combined with LoRA or other parameter-efficient fine-tuning methods, as sketched below
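A minimal LoRA setup sketch using the peft library; the base model name and hyperparameters are illustrative, not a recommendation.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model and hyperparameters; adjust for your setup.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```

The SFT loss itself is unchanged; LoRA only restricts which parameters receive gradient updates, which keeps memory and compute requirements low.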