
Journal article

Dynamic Facial Expression Recognition of Learners via Adaptive Global Attention and Differential Temporal Transformer

Wei Liu, Lujia Li, Chun Yan, Yulin Zhang, Cheng Cheng, Xinyan Zhao, Mingshi Liu

CAAI Transactions on Intelligence Technology

Swansea University Author: Cheng Cheng

  • 71469.VoR.pdf

    PDF | Version of Record (1.38 MB)

    © 2026 The Author(s). This is an open access article under the terms of the Creative Commons Attribution License.


DOI (Published version): 10.1049/cit2.70115


Published in: CAAI Transactions on Intelligence Technology
ISSN: 2468-6557 (print); 2468-2322 (online)
Published: Institution of Engineering and Technology (IET), 2026
Online Access: https://doi.org/10.1049/cit2.70115

URI: https://cronfa.swan.ac.uk/Record/cronfa71469
Abstract: Analysing learners' facial expressions during learning and exploring their learning processes and emotional changes are of great significance for assisting teachers' teaching and promoting smart education. In complex learning environments, static facial expression recognition fails to capture the dynamic changes of learners' expressions losing the continuous features in the learning process, and its recognition effect is easily interfered with by factors such as occlusion and lighting variations during learning. To address the above issues, a network model based on adaptive global attention and temporal difference is proposed to recognise learners' dynamic expression sequences. Firstly, we have designed an Adaptive Global Attention (AGA) block, which adaptively models inter-channel relationships to dynamically enhance key channels that are highly correlated with learners' states while suppressing redundant information, thereby improving the model's feature representation capability under noisy environments. Secondly, we have designed a Differential Temporal Transformer (DTFormer) to extract differential information between consecutive frames, increasing the model's sensitivity to learners' facial expression dynamics and improving recognition performance. The two components complement each other in terms of spatial feature enhancement and temporal dynamic modelling effectively improving the model's overall capability for representing learners' dynamic facial expressions. Experiments were conducted on public datasets DFEW, FERV39k and the learner E-learning emotional state data set DAiSEE, and comparisons were made with classical methods using objective indicators. The results demonstrate that the proposed method outperforms the comparison methods in multiple performance indicators, thereby verifying its effectiveness.
Keywords: face analysis; facial expression recognition; spatial-temporal feature; transformer
College: Faculty of Science and Engineering
Funders: UKRI Grant EP/W020408/1 at Swansea University; the Humanities and Social Science Fund of Ministry of Education of China (Grant Number: 23YJAZH084)
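
The abstract describes two architectural components: an Adaptive Global Attention (AGA) block that gates feature channels, and a Differential Temporal Transformer (DTFormer) that attends over frame-to-frame changes. This record does not include the paper's code, so the following PyTorch sketch is purely illustrative: the squeeze-and-excitation-style channel gating, the first-order frame differencing, and all hyperparameters (reduction ratio, head count, layer depth) are assumptions chosen to match the abstract's description, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the two components named in the abstract.
# The module names follow the paper's terminology; the internals are
# illustrative assumptions, not the published method.
import torch
import torch.nn as nn


class AGABlock(nn.Module):
    """Adaptive global attention over channels (hypothetical sketch).

    Pools each channel globally, then learns a per-channel gate so that
    channels correlated with the learner's state are amplified and
    redundant ones are suppressed.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # squeeze: global spatial context per channel
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),             # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        return x * self.gate(x)       # reweight channels adaptively


class DTFormer(nn.Module):
    """Differential temporal transformer (hypothetical sketch).

    Feeds first-order differences between consecutive frame features into a
    transformer encoder, so attention operates on expression *changes*
    rather than on static appearance.
    """

    def __init__(self, dim: int, heads: int = 4, layers: int = 2):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) -- one feature vector per frame
        diff = x[:, 1:] - x[:, :-1]            # frame-to-frame differences
        return self.encoder(diff).mean(dim=1)  # pooled dynamic descriptor


if __name__ == "__main__":
    clips = torch.randn(2, 16, 512)            # 2 clips, 16 frames, 512-d features
    print(DTFormer(dim=512)(clips).shape)      # torch.Size([2, 512])
```

Running the transformer on frame differences rather than raw frame features is one straightforward way to make attention sensitive to expression dynamics, which is the behaviour the abstract attributes to DTFormer; the paper's actual differencing and fusion scheme may differ.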