Notes by Retraice - Re100-NOTES.pdf - Retraice, Inc. | Podcast

Re100-NOTES.pdf

DEC 30, 2022


(The below text version of the notes is for search purposes and convenience. See the PDF version for proper formatting such as bold, italics, etc., and graphics where applicable. Copyright: 2022 Retraice, Inc.)

Re100: News of ChatGPT, Part 3

retraice.com

An interpretation of ChatGPT's architecture.
Language models, question answering, machine translation, captioning, summarization; formal vs. natural language, grammar and syntax; conditional probabilities of strings given strings; parameters; deep neural networks; ChatGPT's three steps; supervised policy, reward model, optimized policy using PPO to choose step size in gradient descent.

Air date: Thursday, 29th Dec. 2022, 10:00 PM Eastern/US.

An attempt to explain language models

Language models aim to solve tasks such as question answering, machine translation, reading comprehension [captioning?], and summarization.^1 They are an attempt to get around the grammar and syntax problem of natural languages (unlike formal languages, natural languages have no precise, agreed-upon grammar) by assigning probabilities to strings. More precisely, they calculate conditional probabilities: how likely a string is to be said or written in response to a given string, based on previous observations of the 'environment', i.e. corpora of language.^2
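To make "conditional probabilities of strings given strings" concrete, here is a minimal sketch of a bigram model, assuming a tiny hypothetical corpus and whitespace tokenization. It is far simpler than GPT-style models, which condition on long contexts using a neural network, but it scores strings the same way in principle: by the chain rule, the probability of a continuation given some context is a product of per-word conditional probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the observed "environment".
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]

# Count how often each word follows each previous word (a bigram model).
# "<s>" marks the start of a sentence.
pair_counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, nxt in zip(words, words[1:]):
        pair_counts[prev][nxt] += 1


def cond_prob(nxt, prev):
    """Estimate P(nxt | prev) as count(prev, nxt) / count(prev, anything)."""
    total = sum(pair_counts[prev].values())
    return pair_counts[prev][nxt] / total if total else 0.0


def string_prob(continuation, given):
    """P(continuation | given) via the chain rule, one factor per word.
    A bigram model only looks at the immediately preceding word."""
    prev = given.split()[-1] if given else "<s>"
    p = 1.0
    for word in continuation.split():
        p *= cond_prob(word, prev)
        prev = word
    return p


print(string_prob("sat on the mat", "the cat"))
print(string_prob("mat the on sat", "the cat"))
```

The first call gives 1/12 (P(sat | cat) = 1/2, then 1, 1, and P(mat | the) = 1/6); the scrambled word order gets probability 0 because "cat" is never followed by "mat" in this corpus. Real models smooth these estimates so unseen strings get small but nonzero probability.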

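Deep neural networks are data structures: layers of many adjustable input-output functions (units), each computing its output from the previous layer's outputs using weights that training adjusts. Those weights are the network's parameters, and the parameters summarize the training data: a fixed-size set of parameters stands in for the training examples.^3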
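A minimal sketch of "layers of many adjustable input-output functions", assuming a small fully connected network with ReLU units (a common choice, not something the notes specify) and hypothetical layer sizes. The weights W and biases b are the parameters; training would adjust them, but here they are only initialized at random. For scale, GPT-2, the model in footnote 1's paper, has up to roughly 1.5 billion parameters.

```python
import random


def make_layer(n_in, n_out, rng):
    """One layer: n_out units, each with n_in weights plus a bias (all adjustable)."""
    return {
        "W": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0 for _ in range(n_out)],
    }


def layer_forward(layer, x, activation):
    # Each unit is a small input-output function: activation(w . x + b).
    return [activation(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(layer["W"], layer["b"])]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def forward(network, x):
    # Pass the input through every layer; ReLU on hidden layers, none on the output.
    for i, layer in enumerate(network):
        act = identity if i == len(network) - 1 else relu
        x = layer_forward(layer, x, act)
    return x


def count_parameters(network):
    return sum(len(layer["W"]) * len(layer["W"][0]) + len(layer["b"])
               for layer in network)


rng = random.Random(0)
# Hypothetical network: 3 inputs, two hidden layers of 4 units, 1 output.
network = [make_layer(3, 4, rng), make_layer(4, 4, rng), make_layer(4, 1, rng)]

print(forward(network, [1.0, 0.5, -0.5]))
print(count_parameters(network))  # 16 + 20 + 5 = 41
```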

An attempt to explain ChatGPT
[Figure: see the PDF version.]
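The keyword line names ChatGPT's three steps: a supervised policy, a reward model, and a policy optimized with PPO. The following is a minimal sketch of the second step only, assuming the pairwise ranking loss OpenAI described for InstructGPT, -log sigmoid(r(preferred) - r(rejected)), together with a hypothetical linear reward model over made-up features. The real reward model is a large neural network that scores text, and its training data are human rankings of model responses.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_loss(r_preferred, r_rejected):
    """-log sigmoid(r_preferred - r_rejected), computed stably.
    Small when the preferred response already scores higher."""
    d = r_preferred - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def reward(w, features):
    # Hypothetical linear reward model: one score per (prompt, response).
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical comparisons: (features of preferred, features of rejected).
comparisons = [
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.8, 0.1], [0.3, 0.7]),
    ([0.9, 0.4], [0.2, 0.8]),
]

w = [0.0, 0.0]
learning_rate = 0.5
for _ in range(200):
    grad = [0.0, 0.0]
    for fp, fr in comparisons:
        d = reward(w, fp) - reward(w, fr)
        # d(loss)/dw = -sigmoid(-d) * (fp - fr)
        coeff = -sigmoid(-d)
        for i in range(len(w)):
            grad[i] += coeff * (fp[i] - fr[i]) / len(comparisons)
    w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]

for fp, fr in comparisons:
    print(reward(w, fp) > reward(w, fr),
          round(pairwise_loss(reward(w, fp), reward(w, fr)), 4))
```

After training, the reward model ranks every preferred response above its rejected partner; in the third step, a policy would then be optimized to produce responses this model scores highly.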

In supervised learning, an agent "observes input-output pairs and learns a function that maps from input to output"; in unsupervised learning, an agent "learns the patterns in the input without any explicit feedback"; in reinforcement learning, an agent "learns from a series of reinforcements: rewards and punishments."^4
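Supervised learning in the sense quoted above can be sketched with about the simplest possible case, assuming hypothetical input-output pairs generated from y = 2x + 1 and a straight-line hypothesis fit by gradient descent (the notes don't name a particular learner). The learning_rate parameter is the gradient-descent step size that the PPO remark is concerned with; the last call shows what happens when it is too large.

```python
# Hypothetical input-output pairs generated from y = 2x + 1.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def fit_line(pairs, learning_rate=0.05, steps=2000):
    """Learn h(x) = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        # learning_rate is the step size: how far to move against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(pairs)
print(round(w, 3), round(b, 3))  # recovers w = 2, b = 1

# Too large a step size overshoots the minimum more each step and diverges.
w_big, b_big = fit_line(pairs, learning_rate=0.3, steps=50)
print(w_big, b_big)
```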

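Proximal Policy Optimization (PPO) is a reinforcement-learning method that improves a policy by gradient steps, and it seems to be mainly about the step-size problem in gradient descent: steps that are too small make progress slow, steps that are too large can cause sudden drops in performance, and PPO's objective discourages any single update from moving the new policy far from the old one.^5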
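A minimal sketch of how PPO limits the step, assuming the clipped surrogate objective from the PPO paper (Schulman et al., 2017) with the commonly used epsilon = 0.2. The full algorithm also needs a policy network, advantage estimates, and an optimizer, all omitted here; only the per-sample objective is shown.

```python
def clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO's clipped surrogate for one sample.
    ratio = pi_new(a|s) / pi_old(a|s); advantage estimates how much better
    action a was than expected. Taking the min with the clipped term removes
    the incentive to push the ratio outside [1 - epsilon, 1 + epsilon]."""
    clipped_ratio = min(max(ratio, 1 - epsilon), 1 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


# With a positive advantage, the objective stops growing past ratio = 1.2.
for r in [0.8, 1.0, 1.1, 1.2, 1.5, 3.0]:
    print(r, clipped_objective(r, advantage=1.0))

# With a negative advantage, it stops improving below ratio = 0.8.
for r in [0.5, 0.8, 1.0, 2.0]:
    print(r, clipped_objective(r, advantage=-1.0))
```

Because the objective is flat beyond 1 + epsilon for a positive advantage (and below 1 - epsilon for a negative one), its gradient there is zero, so gradient ascent gains nothing by moving the policy further in one update. In that sense PPO handles the step size by limiting how far each update is rewarded for moving the policy, rather than by tuning a learning rate alone.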

References

Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press. ISBN: 978-1108455145.
https://mml-book.github.io/
Searches:
https://www.amazon.com/s?k=9781108455145
https://www.google.com/search?q=isbn+9781108455145
https://lccn.loc.gov/2019040762

Retraice (2022/12/10). Re76: Gradients and Partial Derivatives Part 7 (AIMA4e pp. 119-122). retraice.com.
https://www.retraice.com/segments/re76 Retrieved 11th Dec. 2022.

Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach. Pearson, 4th ed. ISBN: 978-0134610993. Searches:
https://www.amazon.com/s?k=978-0134610993
https://www.google.com/search?q=isbn+978-0134610993
https://lccn.loc.gov/2019047498

Footnotes

^1 https://openai.com/blog/better-language-models/, p. 1 of the PDF write-up, Language Models are Unsupervised Multitask Learners.

^2 https://openai.com/blog/better-language-models/, p. 2 of the PDF write-up, Language Models are Unsupervised Multitask Learners; AIMA4e pp. 824-826.

^3 Russell & Norvig (2020) p. 686.
Cf. https://www.retraice.com/aima4e p. 5.

^4 Russell & Norvig (2020) p. 653.

^5 https://openai.com/blog/openai-baselines-ppo/; Retraice (2022/12/10); Deisenroth et al. (2020) p. 205 in the print edition, p. 229 in https://mml-book.github.io/book/mml-book.pdf.
