Veliki jezički modeli

Veliki jezički modeli (large language models, LLMs) are language models notable for their ability to achieve general-purpose language generation and other natural language processing tasks such as classification. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process.[1] LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.[2]
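
As an illustration of this next-token loop, the following minimal sketch (assuming the Hugging Face transformers library and the small open GPT-2 checkpoint, neither of which is discussed in this article) greedily appends the most probable token to the input twenty times; real systems usually sample from the predicted distribution instead of always taking the top token.

```python
# Illustrative sketch of autoregressive text generation with a small open model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("Large language models are", return_tensors="pt")
with torch.no_grad():
    for _ in range(20):                        # generate 20 additional tokens
        logits = model(ids).logits             # scores over the whole vocabulary
        next_id = logits[0, -1].argmax()       # greedy pick of the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tokenizer.decode(ids[0]))
```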

LLMs are artificial neural networks. The largest and most capable, as of March 2024, are built with a decoder-only transformer-based architecture, while some recent implementations are based on other architectures, such as recurrent neural network variants and Mamba (a state space model).[3][4][5]
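
The following simplified PyTorch sketch shows what one decoder-only transformer block might look like: causally masked self-attention followed by a feed-forward network, each wrapped in a residual connection and layer normalization. It is only an approximation; production models stack dozens of such blocks and add refinements such as pre-normalization and rotary position embeddings.

```python
# A simplified decoder-only transformer block (illustrative dimensions only).
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        t = x.size(1)
        # Causal mask: True marks positions a token may not attend to,
        # i.e. every position later than itself.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)            # residual connection + layer norm
        return self.norm2(x + self.ff(x))       # residual connection + layer norm

x = torch.randn(1, 16, 512)                     # one sequence of 16 token embeddings
print(DecoderBlock()(x).shape)                  # torch.Size([1, 16, 512])
```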

Up to 2020, fine-tuning was the only way a model could be adapted to accomplish specific tasks. Larger models, such as GPT-3, however, can be prompt-engineered to achieve similar results.[6] LLMs are thought to acquire knowledge of the syntax, semantics and "ontology" inherent in human language corpora, but also the inaccuracies and biases present in those corpora.[7]
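
As a rough illustration of prompt engineering, the hypothetical few-shot prompt below demonstrates a classification task with examples written directly into the input text, so no model weights are updated; it is an invented example, not taken from the cited paper.

```python
# A hypothetical few-shot prompt: the task is demonstrated with examples in the
# input text itself, instead of adapting the model's weights by fine-tuning.
prompt = (
    "Classify each review as positive or negative.\n"
    "Review: I loved this film. Sentiment: positive\n"
    "Review: A total waste of time. Sentiment: negative\n"
    "Review: The acting was superb. Sentiment:"
)
# A sufficiently large model is expected to continue the text with " positive",
# performing the classification without any task-specific training.
```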

Some notable LLMs are OpenAI's GPT series of models (e.g., GPT-3.5 and GPT-4, used in ChatGPT and Microsoft Copilot), Google's PaLM and Gemini (the latter of which is currently used in the chatbot of the same name), xAI's Grok, Meta's LLaMA family of open-source models, Anthropic's Claude models, and Mistral AI's open-source models.

Notes

References

  1. ^ "Better Language Models and Their Implications". OpenAI. 2019-02-14. Archived from the original on 2020-12-19. Retrieved 2019-08-25.
  2. ^ Bowman, Samuel R. (2023). "Eight Things to Know about Large Language Models". arXiv:2304.00612 [cs.CL].
  3. ^ Peng, Bo; et al. (2023). "RWKV: Reinventing RNNs for the Transformer Era". arXiv:2305.13048 [cs.CL].
  4. ^ Merritt, Rick (2022-03-25). "What Is a Transformer Model?". NVIDIA Blog (in English). Retrieved 2023-07-25.
  5. ^ Gu, Albert; Dao, Tri (2023-12-01). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752.
  6. ^ Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (December 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H., eds. "Language Models are Few-Shot Learners" (PDF). Advances in Neural Information Processing Systems. Curran Associates, Inc. 33: 1877–1901.
  7. ^ Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus. 151 (2): 127–138. S2CID 248377870. doi:10.1162/daed_a_01905.

Literature