(The below text version of the notes is for search purposes and convenience. See the PDF version for proper formatting such as bold, italics, etc., and graphics where applicable. Copyright: 2022 Retraice, Inc.)
Re100: News of ChatGPT, Part 3
retraice.com
An interpretation of ChatGPT's architecture.
Language models, question answering, machine translation, captioning, summarization; formal vs. natural language, grammar and syntax; conditional probabilities of strings given strings; parameters; deep neural networks; ChatGPT's three steps; supervised policy, reward model, optimized policy using PPO to choose step size in gradient descent.
Air date: Thursday, 29th Dec. 2022, 10:00 PM Eastern/US.
An attempt to explain language models
Language models aim to solve tasks such as question answering, machine translation, reading comprehension (captioning?), and summarization.^1 They are an attempt to get around the grammar and syntax problem of natural languages (unlike formal languages, natural languages have no precise, agreed-upon grammar) by assigning probabilities to strings, or, perhaps more precisely, by calculating conditional probabilities of strings given strings: how likely a string is to be said or written in response to a given string, based on previous observations of the 'environment', i.e. corpora of language.^2
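To make "conditional probabilities of strings given strings" concrete, here is a minimal sketch of a bigram model. This is a far simpler technique than the neural networks behind GPT models, and the three-sentence corpus is hypothetical. The sketch estimates P(next word | previous word) by counting, then uses the chain rule to score a continuation given a prompt. GPT-style models instead estimate P(next token | all previous tokens) with a deep neural network.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the observed "environment" of language.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]

def train_bigrams(sentences):
    """Count how often each word follows each previous word (<s> marks a sentence start)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def cond_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def continuation_prob(counts, prompt, continuation):
    """P(continuation | prompt) under the bigram assumption, via the chain rule."""
    prev = prompt.split()[-1] if prompt else "<s>"
    prob = 1.0
    for word in continuation.split():
        prob *= cond_prob(counts, prev, word)
        prev = word
    return prob

counts = train_bigrams(CORPUS)
print(continuation_prob(counts, "the", "cat sat"))   # 1/3 * 1/2, about 0.167
print(continuation_prob(counts, "the", "cat flew"))  # 0.0: "flew" never observed
```

Counting like this assigns zero probability to anything unseen, which is one reason real language models learn smoother estimates from far larger corpora.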
Deep neural networks are data structures made of layers of many adjustable input-output functions (units). Their parameters, the adjustable weights, summarize the training data.^3
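As a rough illustration of "layers of many adjustable input-output functions", here is a minimal sketch of a small fully connected network in plain Python. The layer widths, the tanh activation, and the random initialization are illustrative assumptions, and training (adjusting the parameters to fit data) is omitted. The weights and biases are the parameters; GPT-3, for comparison, has about 175 billion of them.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One layer: a weight matrix (n_out rows of n_in weights) and a bias vector."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, x):
    """Each unit computes a weighted sum of its inputs plus a bias; tanh on all but the last layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
    return x

def count_parameters(layers):
    """Total number of adjustable numbers (weights plus biases)."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)
sizes = [4, 8, 8, 2]  # hypothetical widths: 4 inputs, two hidden layers of 8, 2 outputs
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
print(count_parameters(layers))  # (4*8 + 8) + (8*8 + 8) + (8*2 + 2) = 130
print(forward(layers, [0.5, -0.2, 0.1, 0.9]))  # two output numbers
```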
An attempt to explain ChatGPT
[Figure omitted in the text version; see the PDF.]
In supervised learning, an agent "observes input-output pairs and learns a function that maps from input to output"; in unsupervised learning, an agent "learns the patterns in the input without any explicit feedback"; in reinforcement learning, an agent "learns from a series of reinforcements: rewards and punishments."^4
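The three kinds of feedback can be contrasted in a minimal sketch. The specific techniques are illustrative choices, not taken from the source: a least-squares line fit for supervised learning (labeled input-output pairs), a two-cluster k-means for unsupervised learning (inputs only), and an epsilon-greedy multi-armed bandit for reinforcement learning (rewards only). The data and parameter values are hypothetical.

```python
import random

# Supervised: observe (input, output) pairs and learn a mapping from input to output.
def fit_line(pairs):
    """Least-squares fit of y = w*x + b to labeled pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var
    return w, mean_y - w * mean_x

# Unsupervised: find patterns in the inputs alone, with no labels or feedback.
def two_means_1d(xs, iters=10):
    """1-D k-means with k = 2: split unlabeled numbers into two clusters."""
    c1, c2 = min(xs), max(xs)
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1 = sum(g1) / len(g1) if g1 else c1
        c2 = sum(g2) / len(g2) if g2 else c2
    return sorted((c1, c2))

# Reinforcement: act, receive rewards, and come to prefer actions that paid off.
def bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent estimating each action's average reward."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    counts = [0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(reward_probs))  # explore: random action
        else:
            a = max(range(len(values)), key=values.__getitem__)  # exploit: best estimate
        reward = 1.0 if rng.random() < reward_probs[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental running average
    return values

print(fit_line([(0, 1), (1, 3), (2, 5), (3, 7)]))  # (2.0, 1.0), i.e. y = 2x + 1
print(two_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5]))  # centers near 1.0 and 9.0
print(bandit([0.2, 0.8]))  # estimates roughly 0.2 and 0.8
```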
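Proximal Policy Optimization (PPO), the reinforcement-learning algorithm used to optimize the policy in ChatGPT's third step, seems to be about controlling step size in gradient descent: it limits how far each update can move the new policy away from the old one.^5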
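To make the step-size connection concrete, here is a minimal sketch in two parts. The first shows plain gradient descent on the hypothetical function f(x) = x², where the step size decides whether the iterates converge or overshoot. The second is the clipped surrogate objective described in OpenAI's PPO write-up, for a single sample, with a clip range of epsilon = 0.2 as an assumed default. This is a simplified illustration, not OpenAI's implementation: real PPO averages this objective over many sampled actions and maximizes it with a gradient-based optimizer.

```python
def gradient_descent(grad, x0, step_size, steps):
    """Plain gradient descent: repeatedly move against the gradient, scaled by step_size."""
    x = x0
    for _ in range(steps):
        x = x - step_size * grad(x)
    return x

def grad_f(x):
    """Gradient of the hypothetical f(x) = x**2, whose minimum is at x = 0."""
    return 2 * x

print(gradient_descent(grad_f, 1.0, 0.1, 50))  # small step: converges toward 0
print(gradient_descent(grad_f, 1.0, 1.1, 50))  # step too large: overshoots and diverges

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample (simplified sketch).

    ratio = pi_new(a|s) / pi_old(a|s); advantage = how much better the action
    turned out than expected. Taking the min with the clipped term removes any
    gain from pushing the ratio above 1 + epsilon (positive advantage) or below
    1 - epsilon (negative advantage), which keeps each policy update small.
    """
    clipped = max(min(ratio, 1 + epsilon), 1 - epsilon)
    return min(ratio * advantage, clipped * advantage)

print(ppo_clip_objective(1.5, 1.0))  # capped at 1 + epsilon
print(ppo_clip_objective(0.5, 1.0))  # not capped: the pessimistic (smaller) value is kept
```

With a positive advantage, raising the ratio past 1 + epsilon earns nothing extra, so the optimizer has no reason to move the policy further in that update; this is one way of keeping the effective step small.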
References
Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press. ISBN: 978-1108455145.
https://mml-book.github.io/
Searches:
https://www.amazon.com/s?k=9781108455145
https://www.google.com/search?q=isbn+9781108455145
https://lccn.loc.gov/2019040762
Retraice (2022/12/10). Re76: Gradients and Partial Derivatives Part 7 (AIMA4e pp. 119-122). retraice.com.
https://www.retraice.com/segments/re76 Retrieved 11th Dec. 2022.
Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach. Pearson, 4th ed. ISBN: 978-0134610993. Searches:
https://www.amazon.com/s?k=978-0134610993
https://www.google.com/search?q=isbn+978-0134610993
https://lccn.loc.gov/2019047498
Footnotes
^1 https://openai.com/blog/better-language-models/ p. 1 of the PDF write-up, "Language Models are Unsupervised Multitask Learners."
^2 https://openai.com/blog/better-language-models/ p. 2 of the PDF write-up, "Language Models are Unsupervised Multitask Learners"; Russell & Norvig (2020) pp. 824-826.
^3 Russell & Norvig (2020) p. 686.
Cf. https://www.retraice.com/aima4e p. 5.
^4 Russell & Norvig (2020) p. 653.
^5 https://openai.com/blog/openai-baselines-ppo/; Retraice (2022/12/10); Deisenroth et al. (2020) p. 205 in the print edition, p. 229 in https://mml-book.github.io/book/mml-book.pdf.