
Neural Information Processing Systems 33 (NeurIPS 2020), Vol. 33, pp. 1877-1901,
2020
[14] OpenAI. “GPT-4 Technical Report,” arXiv:2303.08774, 2023
[15] Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong
Wang, Furu Wei. “Retentive Network: A Successor to Transformer for Large
Language Models,” arXiv:2307.08621, 2023
[16] Albert Gu and Tri Dao. “Mamba: Linear-Time Sequence Modeling with Selective
State Spaces,” arXiv:2312.00752, 2023
[17] Yingwei Ma, Yue Liu, Yue Yu, Yuanliang Zhang, Yu Jiang, Changjian Wang,
Shanshan Li. “At Which Training Stage Does Code Data Help LLMs Reasoning?”
Proceedings of 12th International Conference on Learning Representations (ICLR-
2024), arXiv:2309.16298, 2024
[18] Patrick Lewis et al. “Retrieval-Augmented Generation for Knowledge-Intensive
NLP Tasks,” arXiv:2005.11401, 2020
[19] Diederik P. Kingma and Max Welling. “Auto-Encoding Variational Bayes,”
Proceedings of 2nd International Conference on Learning Representations (ICLR-
2014), arXiv:1312.6114, 2014
[20] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.
“High-Resolution Image Synthesis with Latent Diffusion Models,” Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR-
2022), pp. 10684-10695, 2022
[21] Lvmin Zhang, Anyi Rao, Maneesh Agrawala. “Adding Conditional Control to Text-
to-Image Diffusion Models,” Proceedings of the IEEE/CVF International Conference on
Computer Vision (ICCV-2023), pp. 3813-3824, 2023
[22] William Peebles and Saining Xie. “Scalable Diffusion Models with
Transformers,” Proceedings of the IEEE/CVF International Conference on
Computer Vision (ICCV-2023), pp. 4172-4182, 2023
[23] Alec Radford et al. “Learning Transferable Visual Models From Natural Language
Supervision,” Proceedings of the 38th International Conference on Machine
Learning (ICML-2021), pp. 8748-8763, 2021
[24] Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee. “Visual Instruction
Tuning,” arXiv:2304.08485, 2023
[25] Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan
Salakhutdinov, Abdelrahman Mohamed. “HuBERT: Self-Supervised Speech
Representation Learning by Masked Prediction of Hidden Units,” IEEE/ACM
Transactions on Audio, Speech, and Language Processing, Vol. 29, pp. 3451-3460,
2021
[26] Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. “High Fidelity