Supervised Fine-Tuning (SFT) is the process of taking a pretrained model and continuing to train it on labeled examples of the behavior you want.

For LLMs, SFT usually means training on instruction-response pairs so the model learns to follow instructions, answer in a useful format, and behave like an assistant rather than a raw text-completion model.

Core Idea

Pretraining teaches a model broad language patterns from massive text corpora.

SFT teaches the model a specific input-output behavior.

Example:

Instruction: Explain backpropagation in simple terms.
Response: Backpropagation is how a neural network learns from its mistakes...

The model is trained to predict the response tokens given the instruction and previous response tokens.
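
In practice, the pair is concatenated into one training sequence, and the loss is applied only to the response portion. A minimal sketch in Python of how a single example might be assembled (the template markers below are illustrative placeholders, not a standard format):

# One SFT example: instruction and response joined into a single training sequence.
example = {
    "instruction": "Explain backpropagation in simple terms.",
    "response": "Backpropagation is how a neural network learns from its mistakes...",
}

# Illustrative separator marking where the response begins, so the loss can
# later be restricted to the response tokens only.
text = (
    "### Instruction:\n" + example["instruction"] + "\n\n"
    "### Response:\n" + example["response"]
)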

Where It Fits

Typical LLM training pipeline:

  1. Pretraining
  2. Supervised Fine-Tuning
  3. Preference tuning or Reinforcement Learning
  4. On-Policy Distillation or model consolidation

SFT is often the first post-training stage because it gives the model a basic instruction-following policy before more advanced optimization methods are used.

Objective

SFT is usually trained with next-token prediction, the same basic objective as pretraining.

The difference is the data distribution: instead of random internet text, the data is curated instruction-response examples.

Given an instruction x and target response y = (y_1, ..., y_T), the model maximizes:

p_θ(y | x) = ∏_{t=1}^{T} p_θ(y_t | x, y_{<t})

Equivalently, it minimizes cross-entropy loss over the target response tokens:

L(θ) = −∑_{t=1}^{T} log p_θ(y_t | x, y_{<t})

Where:

  • x = instruction or prompt
  • y = target response
  • y_t = current target token
  • y_{<t} = previous response tokens
  • θ = model parameters
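
In code, this objective is ordinary next-token cross-entropy, restricted to the response positions. A minimal PyTorch sketch, assuming the instruction and response have already been tokenized (the token IDs and vocabulary size are placeholders, and random logits stand in for a real model's output):

import torch
import torch.nn.functional as F

vocab_size = 32000                       # placeholder vocabulary size
prompt_ids = torch.tensor([5, 17, 9])    # x: tokenized instruction (placeholder IDs)
response_ids = torch.tensor([23, 4, 8])  # y: tokenized response (placeholder IDs)

input_ids = torch.cat([prompt_ids, response_ids]).unsqueeze(0)  # shape (1, T)

# Labels match the inputs, but prompt positions are set to -100 so that only
# response tokens contribute to the loss.
labels = input_ids.clone()
labels[:, : prompt_ids.numel()] = -100

# Stand-in for model output; a real model returns logits of shape (1, T, vocab_size).
logits = torch.randn(1, input_ids.size(1), vocab_size)

# Shift so the logit at position t predicts the token at position t+1, then
# compute -sum_t log p_theta(y_t | x, y_<t) over the unmasked (response) tokens.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = labels[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)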

Data Format

SFT datasets usually contain examples like:

  • Instruction only
  • Instruction + input context
  • Desired response
  • Multi-turn conversation
  • Tool-use trace
  • Chain-of-thought or reasoning trace, when appropriate

For chat models, data is often formatted into roles:

system: You are a helpful assistant.
user: Summarize this article.
assistant: ...

The model is usually only trained to predict the assistant tokens, not the user or system tokens.
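
A common way to implement this is to build a label sequence alongside the token IDs and mask every non-assistant token with -100, the index standard cross-entropy implementations ignore. A simplified sketch, assuming a generic tokenizer object with an encode method that returns token IDs (the role markers are illustrative, not a specific chat template):

def build_chat_example(messages, tokenizer):
    """Turn role-tagged messages into (input_ids, labels) for SFT."""
    input_ids, labels = [], []
    for msg in messages:
        # Illustrative role marker; real chat models use their own special tokens.
        ids = tokenizer.encode("<|" + msg["role"] + "|>\n" + msg["content"] + "\n")
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)                 # keep assistant tokens in the loss
        else:
            labels.extend([-100] * len(ids))   # mask system and user tokens
    return input_ids, labels

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this article."},
    {"role": "assistant", "content": "..."},
]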

Why It Works

Pretrained models already know a huge amount about language, facts, coding, and reasoning patterns.

SFT does not teach everything from scratch. It nudges the model toward a desired behavior distribution:

  • Follow instructions
  • Answer in the expected style
  • Refuse unsafe requests
  • Use domain-specific terminology
  • Produce structured outputs
  • Match a product’s tone or workflow

This makes SFT much cheaper than pretraining because it updates an already capable model with a smaller, higher-quality dataset.

SFT vs Pretraining

Pretraining:

  • Uses massive unlabeled text
  • Learns general language modeling
  • Optimizes broad next-token prediction
  • Produces a base model

SFT:

  • Uses curated labeled examples
  • Learns target behavior
  • Still uses next-token prediction
  • Produces an instruction-following or domain-adapted model

SFT vs Preference Tuning

SFT teaches the model what a good answer looks like.

Preference tuning teaches the model which answer is better when multiple answers are possible.

SFT data:

Prompt -> Ideal response

Preference data:

Prompt -> Response A vs Response B -> Preferred response
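
As dataset records, the two formats might look like this (field names are illustrative and vary between libraries):

# SFT record: one prompt, one target response.
sft_example = {
    "prompt": "Explain backpropagation in simple terms.",
    "response": "Backpropagation is how a neural network learns from its mistakes...",
}

# Preference record: one prompt plus a preferred and a rejected response.
preference_example = {
    "prompt": "Explain backpropagation in simple terms.",
    "chosen": "Backpropagation computes gradients of the loss with respect to each weight...",
    "rejected": "Backpropagation just means running the network twice...",
}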

SFT is usually simpler and more stable. Preference tuning can further improve helpfulness, reasoning, style, and alignment after the model already knows how to respond.

Benefits

  • Simple and stable training objective
  • Converts base models into instruction-following assistants
  • Can specialize a model for a domain or workflow
  • Requires much less compute than pretraining
  • Works well with synthetic data from stronger models

Failure Modes

  • Low-quality examples teach low-quality behavior
  • Too much narrow data can cause overfitting
  • The model may imitate formatting without gaining real capability
  • Conflicting examples can make behavior inconsistent
  • SFT alone does not optimize long-term rewards or user preferences
  • Fine-tuning on small datasets can make the model forget some general abilities (catastrophic forgetting)

Practical Notes

  • Dataset quality usually matters more than dataset size
  • Diverse prompts reduce overfitting to one style
  • Strong SFT data should include edge cases, refusals, and hard examples
  • Evaluation should test behavior, not just training loss
  • SFT is often combined with LoRA or other parameter-efficient fine-tuning methods
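
On the last point, LoRA keeps the pretrained weights frozen and learns a small low-rank update on top of selected layers, which greatly reduces the number of trainable parameters. A minimal from-scratch PyTorch sketch of the idea (not any specific library's API; the rank and scaling values are placeholders):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # Base output plus the low-rank correction; only lora_a and lora_b get gradients.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

# Usage: wrap an existing projection layer, then fine-tune only the LoRA parameters.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))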