Conference Paper (published)
Single layer tiny Co4 outpaces GPT-2 and GPT-BERT
Zain NU, Naseem MR & Adeel A (2025) Single layer tiny Co4 outpaces GPT-2 and GPT-BERT. In: Charpentier L, Choshen L, Cotterell R, Gul MO, Hu MY, Liu J, Jumelet J, Linzen T, Mueller A, Ross C, Shah RS, Warstadt A, Wilcox EG & Williams A (eds.) Proceedings of the First BabyLM Workshop. Empirical Methods in Natural Language Processing, Hybrid, 04.11.2025. Association for Computational Linguistics, pp. 313-322. https://doi.org/10.18653/v1/2025.babylm-main.24