Conference Paper (published)

Single layer tiny Co4 outpaces GPT-2 and GPT-BERT

Details

Citation

Zain NU, Naseem MR & Adeel A (2025) Single layer tiny Co4 outpaces GPT-2 and GPT-BERT. In: Charpentier L, Choshen L, Cotterell R, Gul MO, Hu MY, Liu J, Jumelet J, Linzen T, Mueller A, Ross C, Shah RS, Warstadt A, Wilcox EG & Williams A (eds.) Proceedings of the First BabyLM Workshop. Empirical Methods in Natural Language Processing, Hybrid, 04.11.2025. Association for Computational Linguistics, pp. 313-322. https://doi.org/10.18653/v1/2025.babylm-main.24

Abstract
We show that a tiny Co4 machine (CITATION) with a single layer, two heads, and 8M parameters, operating at O(N) computational cost (where N is the number of input tokens), outpaces GPT-2 (124M parameters, 12 layers, O(N²)) and GPT-BERT (30M parameters, 12 layers, O(N²)) after just 2 epochs of training, whereas both baselines were trained for 10 epochs. Co4 achieves orders-of-magnitude greater training efficiency on 10M tokens, demonstrating sample-efficient pretraining. On the BabyLM challenge evaluation pipeline, Co4 performs comparably or better across complex benchmarks, showing strong zero-shot and fine-tuning performance on SuperGLUE tasks. Specifically, Co4 outperforms GPT-2 in 5 out of 7 zero-shot metrics and 6 out of 7 fine-tuning tasks, and outperforms GPT-BERT in 4 out of 7 metrics in both settings. These results strongly suggest a need to rethink prevailing deep learning paradigms and associated scaling laws.
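The abstract compares the models on two axes: how many training tokens each one processes and how its cost grows with input length N. The Python sketch below turns the quoted figures into that back-of-envelope arithmetic. It assumes (this is not stated in the record) that all three models see the same 10M-token corpus once per epoch, and it treats "cost" purely as the asymptotic term N**p with all constants dropped, so the numbers are illustrative only and are not FLOP counts. The input length of 512 and the helper names `tokens_seen` and `asymptotic_cost` are hypothetical.

```python
# Back-of-envelope comparison of the figures quoted in the abstract.
# Assumptions (not from the paper): every model sees the same 10M-token
# corpus once per epoch, and "cost" is only the asymptotic N**p term,
# with all constant factors ignored.

CORPUS_TOKENS = 10_000_000  # assumed shared training corpus size

MODELS = {
    # name: (parameters, layers, epochs, exponent p in O(N^p))
    "Co4": (8_000_000, 1, 2, 1),
    "GPT-2": (124_000_000, 12, 10, 2),
    "GPT-BERT": (30_000_000, 12, 10, 2),
}


def tokens_seen(epochs: int, corpus_tokens: int = CORPUS_TOKENS) -> int:
    """Total training tokens processed across all epochs."""
    return epochs * corpus_tokens


def asymptotic_cost(n_tokens: int, exponent: int) -> int:
    """Asymptotic cost term N**p for one input of n_tokens (constants dropped)."""
    return n_tokens ** exponent


if __name__ == "__main__":
    n = 512  # hypothetical input length, chosen only for illustration
    for name, (params, layers, epochs, p) in MODELS.items():
        print(
            f"{name:8s} params={params:>11,} layers={layers:>2} "
            f"tokens_seen={tokens_seen(epochs):>11,} "
            f"cost(N={n})~{asymptotic_cost(n, p):,}"
        )
```

Under these assumptions Co4 processes 20M training tokens against 100M for each baseline, five times fewer. The ratio of the O(N²) term to the O(N) term is N itself, so the gap widens linearly as inputs grow longer.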

Status: Published
Funders: Advanced Research and Invention Agency
Publication date: 31/12/2025
Publication date online: 30/11/2025
Publisher: Association for Computational Linguistics
ISBN: TODO
Conference: Empirical Methods in Natural Language Processing
Conference location: Hybrid

People (3)

Dr Ahsan Adeel

Assoc. Prof. in Artificial Intelligence, Computing Science and Mathematics - Division

Mr Mohsin Naseem

Research Assistant, Computing Science

Ms Noor Ul Zain

PhD Researcher, Computing Science and Mathematics - Division