Conference Paper | IntelliSys | Knowledge Distillation, Peer Ranking, Adaptive Teaching, Image Classification

Classroom-Inspired Multi-mentor Distillation with Adaptive Learning Strategies

Shalini Sarode*, Muhammad Saif Ullah Khan*, Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal

Abstract

We propose ClassroomKD, a novel multi-mentor knowledge distillation framework inspired by classroom environments to enhance knowledge transfer between the student and multiple mentors with different knowledge levels. Unlike traditional methods that rely on fixed mentor-student relationships, our framework dynamically selects and adapts the teaching strategies of diverse mentors based on their effectiveness for each data sample. ClassroomKD comprises two main modules: the Knowledge Filtering (KF) module and the Mentoring module. The KF module dynamically ranks mentors based on their performance for each input, activating only high-quality mentors to minimize error accumulation and prevent information loss. The Mentoring module adjusts the distillation strategy by tuning each mentor's influence according to the dynamic performance gap between the student and mentors, effectively modulating the learning pace. Extensive experiments on image classification (CIFAR-100 and ImageNet) and 2D human pose estimation (COCO Keypoints and MPII Human Pose) demonstrate that ClassroomKD outperforms existing knowledge distillation methods across different network architectures. Our results highlight that a dynamic and adaptive approach to mentor selection and guidance leads to more effective knowledge transfer, paving the way for enhanced model performance through distillation.
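The per-sample mentor selection performed by the KF module can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: the helper names `softmax` and `select_mentors` are hypothetical, and the selection rule is our assumption based on the description of the module. A mentor is kept for a sample only if its top-1 prediction matches the label and it assigns the true class a higher probability than the student does, and the surviving mentors are ranked by that probability.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_mentors(student_logits: Sequence[float],
                   mentor_logits: Sequence[Sequence[float]],
                   label: int) -> List[int]:
    """Return indices of mentors allowed to teach this sample, best first.

    Assumed (hypothetical) criterion: the mentor's top-1 prediction equals
    the label AND its true-class probability exceeds the student's.
    """
    student_conf = softmax(student_logits)[label]
    scored = []
    for idx, logits in enumerate(mentor_logits):
        probs = softmax(logits)
        correct = probs.index(max(probs)) == label
        if correct and probs[label] > student_conf:
            scored.append((probs[label], idx))
    # Rank the surviving mentors by their confidence on the true class.
    scored.sort(reverse=True)
    return [idx for _, idx in scored]


if __name__ == "__main__":
    student = [1.0, 0.5, 0.2]          # student logits for one sample
    mentors = [
        [3.0, 0.1, 0.0],               # correct and very confident
        [0.2, 2.0, 0.1],               # wrong prediction, filtered out
        [1.5, 0.4, 0.3],               # correct, somewhat more confident
    ]
    print(select_mentors(student, mentors, label=0))  # → [0, 2]
```

Filtering per sample rather than per dataset means a mentor that is strong overall can still be silenced on the specific inputs where it is wrong, which is the error-accumulation concern the abstract raises.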

TL;DR

ClassroomKD is a knowledge distillation framework inspired by dynamic, real-world classrooms rather than static teacher–student setups.

Selective teaching: The Knowledge Filtering Module dynamically picks which mentors get to teach each data sample (only those whose predictions are both accurate and more confident than the student's), avoiding error accumulation and stale guidance.
Adaptive pacing: The Mentoring Module gauges the performance gap between student and mentor and tunes a temperature parameter accordingly: larger gaps lead to smoother, more scaffolded teaching, while smaller gaps result in sharper, more direct instruction.
Proven gains across tasks: On the CIFAR-100, ImageNet, COCO Keypoints, and MPII Human Pose benchmarks, this dynamic mentor selection and adaptive distillation outperforms existing multi-mentor and single-mentor methods in most reported settings.
Greener AI at heart: By reducing wasted learning effort and tailoring teaching strategies to each sample, ClassroomKD not only boosts accuracy but also promotes computational and energy efficiency, setting the stage for more socially inspired machine learning.
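To make the adaptive-pacing idea concrete, here is a hedged Python sketch of one possible gap-to-temperature rule and the resulting multi-mentor distillation loss for a single sample. Everything specific here is an assumption for illustration, not the paper's formulation: the function names, the gap measured as the difference in true-class probability, the linear mapping between hypothetical bounds `t_min` and `t_max`, and the plain average over the active mentors.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def adaptive_temperature(gap: float, t_min: float = 1.0, t_max: float = 4.0) -> float:
    """Hypothetical linear rule: a larger mentor-student gap gives a higher
    temperature (softer targets), a smaller gap a lower one (sharper targets)."""
    gap = min(max(gap, 0.0), 1.0)  # clamp the gap to [0, 1]
    return t_min + (t_max - t_min) * gap


def mentoring_loss(student_logits: Sequence[float],
                   mentor_logits: Sequence[Sequence[float]],
                   label: int,
                   active: Sequence[int]) -> float:
    """Average temperature-scaled KD loss over the active mentors."""
    if not active:
        return 0.0
    student_conf = softmax(student_logits)[label]
    total = 0.0
    for idx in active:
        # Assumed gap measure: difference in true-class probability.
        gap = softmax(mentor_logits[idx])[label] - student_conf
        t = adaptive_temperature(gap)
        p_mentor = softmax(mentor_logits[idx], t)
        p_student = softmax(student_logits, t)
        # The usual t^2 factor keeps gradient scales comparable across temperatures.
        total += (t ** 2) * kl_divergence(p_mentor, p_student)
    return total / len(active)


if __name__ == "__main__":
    student = [1.0, 0.5, 0.2]
    mentors = [[3.0, 0.1, 0.0], [1.5, 0.4, 0.3]]
    student_conf = softmax(student)[0]
    for idx, logits in enumerate(mentors):
        gap = softmax(logits)[0] - student_conf
        print(f"mentor {idx}: gap={gap:.3f}, T={adaptive_temperature(gap):.2f}")
    print(f"loss={mentoring_loss(student, mentors, label=0, active=[0, 1]):.4f}")
```

Under this rule, a mentor far ahead of the student teaches with softer, high-temperature targets, while a mentor only slightly ahead teaches near `t_min` with sharper targets, mirroring the adaptive-pacing description above. A non-linear or learned schedule would be an equally plausible alternative.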

Results

Table 1. Comparison with single-teacher methods on CIFAR-100 (top-1 accuracy, %)

Method	Homogeneous architectures				Heterogeneous architectures
	R110 → R20	R110 → R32	R56 → R20	VGG13 → VGG8	R32×4 → MBV2	W-40×2 → SN-V2	R50 → SN-V1	Swin-T → MBV2	R18
NOKD	69.06	71.14	69.06	70.68	64.60	71.82	70.50	64.60	74.01
FitNets	68.99	71.06	69.21	73.54	64.14	73.54	73.73	63.16	78.87
AT	70.22	72.31	70.55	72.68	59.40	72.03	73.82	-	-
VID	70.16	72.61	70.38	73.96	62.98	73.40	73.61	67.57	-
CRD	71.46	73.48	71.16	73.94	69.73	75.65	76.05	69.11	77.63
SimKD	-	-	-	-	-	77.49	-	-	-
SMKD	71.70	74.05	71.59	74.39	-	-	-	-	-
RKD	69.25	71.82	69.61	73.72	64.52	73.21	72.21	64.43	74.11
SP	70.04	72.69	69.67	73.44	66.34	75.24	75.12	-	-
SRRL	71.51	73.80	-	73.23	67.30	75.56	76.61	-	-
DIST	-	-	71.75	-	-	73.45	-	68.66	77.75
KD	70.67	73.08	70.66	72.98	67.37	74.45	74.83	65.35	78.74
PKT	70.25	72.61	70.34	73.73	67.69	74.69	73.28	66.52	-
FT	70.22	72.67	70.84	73.24	67.20	75.10	75.36	-	-
AB	69.53	70.98	69.47	74.27	-	73.71	73.34	-	-
WSLD	72.19	74.12	72.15	73.89	-	-	-	-	-
CTKD	70.99	73.52	71.19	73.52	65.58	74.63	75.45	64.87	-
DTKD	-	74.07	72.05	74.12	69.01	76.19	76.29	69.10	-
OFA	-	-	-	-	-	-	-	-	-
Ours	72.06	74.71	72.13	75.29	70.26	76.74	75.81	70.23	80.32

Table 2. Comparison with multiple-teacher methods on CIFAR-100 (top-1 accuracy, %)

Method	Homogeneous architectures				Heterogeneous architectures
	WR40×2 → WR16×2	R110 → R20	R56 → R20	VGG13 → VGG8	VGG13 → MBV2	W-40×2 → SN-V1
NOKD	73.64	69.06	69.06	70.68	64.60	70.50
DML	74.83	70.55	70.24	72.86	66.30	74.52
ONE	74.68	70.77	70.43	72.01	66.26	-
SHAKE	75.78	-	71.62	73.85	68.81	76.42
TAKD	75.04	-	70.77	73.67	-	-
AEKD	75.68	71.36	71.25	74.75	68.39	76.34
EBKD	-	-	-	74.10	68.24	76.61
DGKD	76.24	-	71.92	74.40	-	-
CA-MKD	-	-	-	74.30	69.41	77.94
AVER	74.98	71.20	71.08	73.18	62.94	73.00
Ours	76.74	72.06	72.13	75.29	70.26	75.81


@inproceedings{sarode2025classroom, 
  title={Classroom-Inspired Multi-mentor Distillation with Adaptive Learning Strategies}, 
  author={Sarode, Shalini and Khan, Muhammad Saif Ullah and Shehzadi, Tahira and Stricker, Didier and Afzal, Muhammad Zeshan}, 
  editor={Arai, Kohei}, 
  booktitle={Intelligent Systems and Applications}, 
  year={2025}, 
  publisher={Springer Nature Switzerland}, 
  address={Cham}, 
  pages={294--324}, 
  isbn={978-3-031-99965-9} 
}

Maintained by saifkhichi96 on GitHub.

The website is distributed under different open-source licenses. For more details, see the notice at the bottom of the page.