Conference Paper (published)
Details
Citation
Zain NU, Naseem MR & Adeel A (2025) Single layer tiny Co4 outpaces GPT-2 and GPT-BERT. In: Charpentier L, Choshen L, Cotterell R, Gul MO, Hu MY, Liu J, Jumelet J, Linzen T, Mueller A, Ross C, Shah RS, Warstadt A, Wilcox EG & Williams A (eds.) Proceedings of the First BabyLM Workshop. Empirical Methods in Natural Language Processing, Hybrid, 04.11.2025. Association for Computational Linguistics, pp. 313-322. https://doi.org/10.18653/v1/2025.babylm-main.24
Abstract
We show that a tiny Co4 machine (CITATION) with a single layer, two heads, and 8M parameters, operating at O(N) computational cost (where N is the number of input tokens), in just 2 epochs outpaces GPT-2 (124M, 12 layers, O(N²)) and GPT-BERT (30M, 12 layers, O(N²)), both trained for 10 epochs. Co4 achieves orders-of-magnitude greater training efficiency on 10M tokens, demonstrating sample-efficient pretraining. On the BabyLM challenge evaluation pipeline, Co4 performs comparably or better across complex benchmarks, showing strong zero-shot and fine-tuning performance on SuperGLUE tasks. Specifically, Co4 outperforms GPT-2 in 5 out of 7 zero-shot metrics and 6 out of 7 fine-tuning tasks, and GPT-BERT in 4 out of 7 metrics in both cases. These results strongly suggest a need to rethink prevailing deep learning paradigms and associated scaling laws.
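The abstract contrasts Co4's O(N) cost with the O(N²) self-attention of GPT-2 and GPT-BERT. The sketch below is not from the paper; it only illustrates, with made-up unit costs, how those two growth rates diverge as sequence length grows.

```python
# Illustrative only: compares how per-sequence cost grows for a model whose
# work is linear in sequence length (as claimed for Co4) versus full
# self-attention, which compares every token pair (O(N^2), as in GPT-2 and
# GPT-BERT). The unit costs are placeholder assumptions, not paper figures.

def linear_cost(n_tokens: int, unit: float = 1.0) -> float:
    """Cost model for a layer whose work grows linearly with sequence length."""
    return unit * n_tokens

def quadratic_cost(n_tokens: int, unit: float = 1.0) -> float:
    """Cost model for full pairwise self-attention over the sequence."""
    return unit * n_tokens ** 2

for n in (128, 512, 2048):
    ratio = quadratic_cost(n) / linear_cost(n)
    print(f"sequence length {n:>5}: quadratic/linear cost ratio = {ratio:,.0f}x")
```

With these toy numbers the gap widens linearly with N, which is the scaling argument behind the abstract's efficiency claim; the paper's actual measured training costs are reported in the publication itself.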
| Field | Value |
|---|---|
| Status | Published |
| Funders | Advanced Research and Invention Agency |
| Publication date | 31/12/2025 |
| Publication date online | 30/11/2025 |
| Publisher | Association for Computational Linguistics |
| ISBN | TODO |
| Conference | Empirical Methods in Natural Language Processing |
| Conference location | Hybrid |
| Dates | 04/11/2025 |
People (3)
Assoc. Prof. in Artificial Intelligence, Computing Science and Mathematics - Division
Research Assistant, Computing Science
PhD Researcher, Computing Science and Mathematics - Division