Personal project · 2026

CNN Architecture Comparison

Three neural networks compared on 120,000 food photos, showing that a small model can match a large one's accuracy with roughly six times fewer parameters.

View code on GitHub

Jupyter Notebook · Python · TensorFlow · Deep Learning

The problem

Bigger models usually win, but they cost more to train and run. I wanted to know how much model you actually need for one specific job: recognising food.

What I built

I trained three architectures on 120,000 food images and compared them. The small one, EfficientNetB0, reached 99.75 percent test accuracy against ResNet-50's 99.76 percent, with roughly six times fewer parameters and about a third less training time.

What it does

120,000 Images, Cleaned. I merged three Kaggle datasets, removed every duplicate by hashing the files, and split what was left by class into training, validation, and test sets.
99.75 Percent Accuracy. Once both were fine-tuned on the food images, EfficientNetB0 came within 0.01 percentage points of ResNet-50 (99.75 against 99.76 percent on the test set).
Six Times Smaller. EfficientNetB0 has 4.07M parameters against ResNet-50's 24.13M, roughly a sixth as many, and takes 40 MB on disk against 211 MB, at effectively the same accuracy.
Fair to the Rare Classes. Even classes with 113 times fewer images than the largest class kept F1 scores above 0.98, thanks to class weighting.
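
Two of the steps above, removing duplicates by hashing and weighting classes, can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the code from the repository: the function names are hypothetical, SHA-256 is an assumed choice of hash, and the weighting formula is the common "balanced" heuristic (total samples divided by the number of classes times the class count), which may differ from what the project uses.

```python
import hashlib
from collections import Counter
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_by_hash(paths):
    """Keep the first path seen for each distinct file content.

    Only byte-identical files are treated as duplicates; a resized or
    re-encoded copy of the same photo would not be caught.
    """
    seen = set()
    unique = []
    for path in paths:
        key = file_digest(Path(path))
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights and common classes get small ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

With integer class indices as keys, a dictionary like the one returned by balanced_class_weights can be passed to Keras through the class_weight argument of model.fit, so that errors on rare classes count for more in the loss.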

Model Performance Comparison

Model	Test Accuracy	Parameters	Size	Training Time
Custom CNN	97.97%	4.96M	56.9 MB	14.8h
EfficientNetB0	99.75%	4.07M	40.0 MB	6.7h
ResNet-50	99.76%	24.13M	211.0 MB	10.3h
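
The headline ratios can be recomputed directly from the table. The snippet below is only a small arithmetic check; the dictionary layout and the compare helper are illustrative, and the figures are copied from the rows above.

```python
# Figures copied from the comparison table:
# (parameters in millions, size in MB, training time in hours)
models = {
    "Custom CNN": (4.96, 56.9, 14.8),
    "EfficientNetB0": (4.07, 40.0, 6.7),
    "ResNet-50": (24.13, 211.0, 10.3),
}


def compare(small, big):
    """Return how the smaller model relates to the bigger one."""
    small_params, small_size, small_hours = models[small]
    big_params, big_size, big_hours = models[big]
    return {
        "param_ratio": big_params / small_params,    # times fewer parameters
        "size_ratio": big_size / small_size,         # times smaller on disk
        "time_saved": 1 - small_hours / big_hours,   # fraction of training time saved
    }


result = compare("EfficientNetB0", "ResNet-50")
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For EfficientNetB0 against ResNet-50 this works out to roughly 5.9 times fewer parameters, roughly 5.3 times less disk space, and about 35 percent less training time.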

Dataset Specifications

Property	Value
Total Images	120,842 (deduplicated)
Classes	14 (Fruits & Vegetables)
Split (Train/Val/Test)	84,582 / 18,119 / 18,141
Resolution	224×224 RGB
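
The split counts above are close to 70 / 15 / 15 percent. A per-class split along those lines could be sketched as follows. This is an assumption-laden illustration, not the project's code: the function name, the percentages, and the seed are all hypothetical, and the percentages are inferred from the counts in the table.

```python
import random
from collections import defaultdict


def stratified_split(labels, percents=(70, 15, 15), seed=42):
    """Split sample indices into train/val/test one class at a time.

    Each class contributes about the same share to every split, so rare
    classes are not left out of validation or test by chance.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Integer arithmetic keeps the cut points exact and reproducible.
        n_train = len(indices) * percents[0] // 100
        n_val = len(indices) * percents[1] // 100
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        # Whatever remains after train and val goes to test.
        test.extend(indices[n_train + n_val:])
    return train, val, test
```

Splitting after deduplication matters here: if the same file could land in both the training and test sets, the test accuracy would be inflated.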

The full engineering detail is on GitHub.