Cueless EEG imagined speech for subject identification: dataset and benchmarks
Electroencephalogram (EEG) signals have emerged as a promising modality for biometric identification. While previous studies have explored the use of imagined speech with semantically meaningful words for subject identification, most have relied on additional visual or auditory cues. In this study, we introduce a cueless EEG-based imagined speech paradigm, where
subjects imagine the pronunciation of semantically meaningful words without any external cues related to the target word. This paradigm addresses the limitations of prior methods
by requiring subjects to naturally select and imagine words from a predefined list. The dataset comprises over 4,350 trials from 11 subjects across five sessions. We assess a variety of
classification methods, including traditional machine learning techniques such as Support Vector Machines (SVM) and XGBoost, as well as time-series foundation models and deep learning architectures designed specifically for EEG classification, such as EEG Conformer and Shallow ConvNet. A session-based hold-out validation strategy was employed to ensure reliable evaluation and prevent data leakage. Our results demonstrate outstanding classification accuracy, reaching 99.44%. These findings highlight the potential of cueless EEG paradigms for secure and reliable subject identification in real-world applications, such as brain-computer interfaces (BCIs).
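The session-based hold-out strategy mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the array shapes, feature count, and random data are assumptions, and only the splitting logic (held-out sessions never overlap with training sessions) reflects the described method.

```python
import numpy as np

# Illustrative stand-in for the dataset: ~4,350 trials, 11 subjects,
# 5 sessions. Feature dimension (64) is an arbitrary assumption.
rng = np.random.default_rng(0)
n_trials, n_features = 4350, 64
X = rng.normal(size=(n_trials, n_features))
subjects = rng.integers(0, 11, size=n_trials)   # identification labels
sessions = rng.integers(1, 6, size=n_trials)    # session index, 1..5

# Session-based hold-out: train on sessions 1-4, evaluate on session 5,
# so no session contributes trials to both sets (prevents leakage of
# session-specific artifacts into the test accuracy).
train_mask = sessions < 5
test_mask = sessions == 5
X_train, y_train = X[train_mask], subjects[train_mask]
X_test, y_test = X[test_mask], subjects[test_mask]
```

Splitting by session, rather than shuffling trials, is what makes the reported accuracy meaningful for identification: a trial-level shuffle would let the classifier exploit within-session correlations shared between train and test trials.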