Image Classification with Very Little Data: Transfer Learning and Fine-Tuning in Keras (2016)

This article was generated by AI analysis.

Created: 2026-03-28 Source: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Summary

François Chollet’s classic 2016 Keras tutorial, which the author now marks as outdated, demonstrates three approaches to image classification with only 2000 training samples: training a small convnet from scratch (about 80% validation accuracy), training a classifier on bottleneck features from a pre-trained VGG16 (about 90%), and fine-tuning VGG16’s top convolutional block (about 94%). It remains a useful reference for understanding the hierarchy of transfer learning strategies.


Key Points

Three-level transfer learning hierarchy: scratch training < bottleneck features < fine-tuning
Data augmentation: random rotations, shifts, shear, zoom, and horizontal flips reduce overfitting when training examples are few
Bottleneck features trick: run VGG16 once offline, save the feature vectors to disk, then train a small classifier on top; this avoids re-running the expensive convnet every epoch
Fine-tuning rule: freeze the lower layers (general features) and unfreeze only the last convolutional block; use SGD with a very low learning rate (1e-4, momentum 0.9) rather than an adaptive optimizer such as RMSProp, so updates stay small and don't wreck the learned weights
Critical caveat: train the top classifier before fine-tuning the conv base; a randomly initialized classifier would produce large gradient updates that corrupt the pre-trained weights
Accuracy progression: 80% (scratch) → 90% (bottleneck) → 94% (fine-tuning) with only 2000 samples
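A minimal sketch of the two-phase fine-tuning recipe above, assuming TensorFlow 2.x with its bundled Keras, 150x150 RGB inputs as in the post, and a binary (two-class) task. The helper names build_model, compile_for_head_training and unfreeze_last_block are hypothetical, not Keras APIs, and the head (Flatten, Dense 256, Dropout 0.5, sigmoid output) is an illustrative choice. Selecting the top block relies on Keras naming VGG16's last conv layers block5_conv1 to block5_conv3.

```python
from tensorflow import keras

IMG_SHAPE = (150, 150, 3)  # input size used in the 2016 post


def build_model(weights="imagenet"):
    """VGG16 conv base plus a small dense head; the base starts frozen."""
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=IMG_SHAPE
    )
    base.trainable = False  # phase 1: only the new head learns

    inputs = keras.Input(shape=IMG_SHAPE)
    x = base(inputs, training=False)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(256, activation="relu")(x)
    x = keras.layers.Dropout(0.5)(x)
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)  # binary output
    return keras.Model(inputs, outputs), base


def compile_for_head_training(model):
    # Phase 1: only the randomly initialized head is trainable here.
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])


def unfreeze_last_block(model, base):
    # Phase 2: unfreeze only block5 (the top conv block); blocks 1-4 stay frozen.
    base.trainable = True
    for layer in base.layers:
        layer.trainable = layer.name.startswith("block5")
    # Recompile so the new trainable flags take effect, using small SGD steps.
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
```

Typical use would be: build the model, call compile_for_head_training and fit until the head converges, then call unfreeze_last_block and fit again for a few more epochs. With ImageNet weights, inputs would normally go through keras.applications.vgg16.preprocess_input first.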

Insights

The article is explicitly marked “very outdated” by the author. Modern approaches often use pretrained ViTs with data-efficient fine-tuning, and fit_generator is deprecated in favor of model.fit() with tf.data pipelines. Still, the three-level transfer learning hierarchy it describes remains conceptually valid.
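The model.fit plus tf.data replacement for fit_generator might look like the sketch below. It assumes TensorFlow 2.9 or later (for keras.utils.image_dataset_from_directory and the Random* preprocessing layers) and a directory layout with one subfolder per class. The augmentation factors only loosely mirror the transforms listed under Key Points and are assumptions; shear is omitted, and prepare, load_split and train are hypothetical helpers.

```python
import tensorflow as tf
from tensorflow import keras

IMG_SIZE = (150, 150)
BATCH_SIZE = 16

# Illustrative augmentation pipeline (factors are assumptions, shear omitted).
augment = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(40 / 360),     # factor is a fraction of a full turn
    keras.layers.RandomTranslation(0.2, 0.2),  # height_factor, width_factor
    keras.layers.RandomZoom(0.2),
])
rescale = keras.layers.Rescaling(1.0 / 255)


def prepare(ds, training):
    """Rescale every batch; augment only the training split."""
    ds = ds.map(lambda x, y: (rescale(x), y), num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        ds = ds.map(lambda x, y: (augment(x, training=True), y),
                    num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)


def load_split(directory, shuffle):
    # Expects one subdirectory per class, e.g. <directory>/cats and <directory>/dogs.
    return keras.utils.image_dataset_from_directory(
        directory, label_mode="binary", image_size=IMG_SIZE,
        batch_size=BATCH_SIZE, shuffle=shuffle,
    )


def train(model, train_dir, val_dir, epochs=50):
    train_ds = prepare(load_split(train_dir, shuffle=True), training=True)
    val_ds = prepare(load_split(val_dir, shuffle=False), training=False)
    # model.fit consumes tf.data datasets directly; no fit_generator needed.
    return model.fit(train_ds, validation_data=val_ds, epochs=epochs)
```

Applying augmentation inside the dataset map keeps validation batches untouched, since only the training split passes through augment.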

The bottleneck features approach is rarely needed for speed anymore: GPUs are much faster, so the frozen base can simply run inside model.fit on every batch, which also makes data augmentation possible (cached features are computed once and cannot be augmented). Its underlying insight, that early layers learn general features that transfer across domains, informs most modern fine-tuning practice, including LoRA and other parameter-efficient fine-tuning methods.
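For reference, the bottleneck-features trick itself is short. This sketch assumes TensorFlow 2.x, an in-memory NumPy array of already preprocessed images, and a hypothetical cache file name; load_or_compute_features and build_top_model are illustrative helpers, not Keras APIs. Because the features are computed once and reused, the frozen base never sees augmented images.

```python
import os

import numpy as np
from tensorflow import keras


def load_or_compute_features(base, images, path, batch_size=16):
    """Run the frozen conv base once and cache its outputs in a .npy file."""
    if os.path.exists(path):
        return np.load(path)  # later runs skip the expensive forward pass
    features = base.predict(images, batch_size=batch_size, verbose=0)
    np.save(path, features)  # path should end in .npy, or np.save appends it
    return features


def build_top_model(feature_shape):
    """Small classifier trained only on the cached bottleneck features."""
    model = keras.Sequential([
        keras.Input(shape=feature_shape),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Example (downloads ImageNet weights on first use; names are hypothetical):
# base = keras.applications.VGG16(weights="imagenet", include_top=False,
#                                 input_shape=(150, 150, 3))
# train_feats = load_or_compute_features(base, train_images, "bottleneck_train.npy")
# top = build_top_model(train_feats.shape[1:])  # (4, 4, 512) for 150x150 inputs
# top.fit(train_feats, train_labels, epochs=50, batch_size=16)
```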

The entropic capacity framing is still useful: more parameters = more capacity to memorize noise. Dropout and regularization aren’t just tricks; they’re principled ways to constrain how much information the model is allowed to store.
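To make the capacity point concrete, here is a small from-scratch convnet modelled loosely on the one in the post (three conv/pool blocks, a 64-unit dense layer, 50% dropout). It assumes TensorFlow 2.x and 150x150 inputs; build_small_convnet and describe_capacity are hypothetical names. Counting parameters per layer shows where the capacity sits: of about 1.21 million parameters, roughly 1.18 million belong to the first dense layer, and dropout is applied to that layer's output.

```python
from tensorflow import keras


def build_small_convnet(input_shape=(150, 150, 3)):
    """Few filters and one narrow dense layer: deliberately low capacity."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(32, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(32, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.5),  # drops each activation with p=0.5, training only
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def describe_capacity(model):
    """(layer name, parameter count) pairs, to see where capacity lives."""
    return [(layer.name, layer.count_params()) for layer in model.layers]
```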

Connections

Raw Excerpt

Deep learning models are by nature highly repurposable: you can take an image classification model trained on a large-scale dataset then reuse it on a significantly different problem with only minor changes.
