OpenAI GPT
The Generative Pre-trained Transformer (GPT) by OpenAI is a family of autoregressive language models.
GPT utilizes the decoder architecture from the standard Transformer network (with a few engineering tweaks) as a standalone unit, i.e. without the encoder. GPT-3 couples this with an unprecedented context window of 2,048 input tokens and 175 billion parameters (requiring ~800 GB of storage).
The training method is “generative pretraining”, meaning the model is trained to predict the next token in a sequence. The resulting model demonstrated strong few-shot learning on many text-based tasks.
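To make “predict the next token” concrete, here is a minimal, purely illustrative Python sketch: a toy whitespace tokenizer and a frequency-based predictor (not GPT’s actual tokenizer or network) that shows how the autoregressive objective turns raw text into (context, next-token) training pairs and then generates text one token at a time.

```python
from collections import Counter, defaultdict

# Toy corpus and whitespace "tokenizer" (GPT itself uses byte-pair encoding).
corpus = "the cat sat on the mat the cat slept on the sofa"
tokens = corpus.split()

# Generative pretraining objective: every prefix predicts the next token.
training_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# A toy "model": count which token most often follows the previous token.
follow_counts = defaultdict(Counter)
for context, nxt in training_pairs:
    follow_counts[context[-1]][nxt] += 1

def generate(prompt_token, steps=5):
    """Autoregressive decoding: feed each predicted token back in as context."""
    out = [prompt_token]
    for _ in range(steps):
        candidates = follow_counts.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])  # greedy next-token choice
    return " ".join(out)

print(generate("the"))  # greedy continuation learned from the toy corpus
```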
The end result is the ability to generate human-like text with swift response times and great accuracy. Because the GPT family of models has been exposed to a very large dataset and has a huge number of parameters (175B in GPT-3), these language models require few or, in some cases, no examples to adapt to a downstream task; the examples are simply placed in the prompt rather than used to update the model’s weights (a process often called “prompt-based” learning). The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human, which has both benefits and risks (source).
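As a minimal sketch of what such prompt-based adaptation looks like (the task, labels, and examples here are hypothetical, not taken from this article), a few-shot prompt simply stacks a handful of solved examples in front of the new input and lets the model complete the pattern:

```python
# Hypothetical few-shot prompt for sentiment classification.
# No weights are updated; the "training examples" live entirely in the prompt text.
few_shot_prompt = """Decide whether each review is Positive or Negative.

Review: The battery lasts for days and the screen is gorgeous.
Sentiment: Positive

Review: It broke after one week and support never answered.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# The model is expected to continue the pattern with " Positive".
print(few_shot_prompt)
```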
Before GPT, language models (LMs) were typically trained on large amounts of accurately labelled data, which was hard to come by. These LMs offered great performance on the particular supervised task they were trained for, but could not easily be adapted to other tasks or domains.
Microsoft announced on September 22, 2020, that it had licensed “exclusive” use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3’s underlying model.
Let’s look below at each of GPT-1, GPT-2, and GPT-3 (with emphasis on GPT-3, as it’s more widely used today) and how they were able to make a dent in Natural Language Processing tasks.
Understanding GPT by animations: https://www.youtube.com/watch?v=MQnJZuBGmSQ&t=655s
Language Models are Few-Shot Learners (GPT-3 paper): https://arxiv.org/pdf/2005.14165.pdf
Here, once again (unoriginal me, it’s also in the header), is the GPT-3 Playground. This is the place where you should begin exploring the creative potential of GPT-3, even before you have a concrete idea for an implementation. It is also, for the most part, the right place for artists and writers who want to generate one-off text.
Here you will find UX simplicity, and you really don’t need more.
You have:
Prompt/Completion window
Preset bar
Settings
Submit button.
Preset bar – useful if you have designed presets that you want to re-use or share.
Settings — here, you can control the nature of the text:
Engine – GPT-3 provides four engines, which vary in output quality, latency (output speed) etc. “Ada” is the quickest one. “Davinci” is the most sophisticated engine, but sometimes you have to wait longer for the text to render. My choice: “Davinci”. Note: with semantic search (I’ll write about this one day), you can get relevant and quick results even with Ada.
Response Length (1–2048) – length of the completion, in tokens (approx. 1 token per word; it varies from engine to engine). My choice: 700–1000. Note: your input counts toward the 2048-token limit. The longer the text you put in, the more appropriate the output becomes, but the shorter the completion can be (everything must fit into 2048 tokens).
Temperature (0.0–1.0) – controls randomness, between boring redundancy and chaotic fiction. The higher the value, the more random the generated text becomes, while still staying coherent thanks to the Transformer’s self-attention. My use: 0.9; at this setting it is not boring, but still not too repetitive. Note: try the same prompt with various temperatures (a minimal API sketch using these settings follows after the link below).
More information: https://algowriting.medium.com/gpt-3-temperature-setting-101-41200ff0d0be
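For readers who prefer code over the Playground UI, here is a minimal sketch of the same settings using the older (pre-1.0) openai Python package’s Completion endpoint from the GPT-3 era. The API key and prompt are placeholders, and the engine names available to your account may differ:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; set your own key

# Engine, Response Length, and Temperature map directly onto API parameters.
response = openai.Completion.create(
    engine="davinci",      # "ada" is the quickest, "davinci" the most sophisticated
    prompt="Write a short story about a lighthouse keeper:",
    max_tokens=700,        # Response Length: 700-1000 tokens, as above
    temperature=0.9,       # high randomness, but still coherent
)

print(response["choices"][0]["text"])
```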
As you may have noticed, these are the parameters I use at the moment; the range of variation you can get from Length and Temperature alone is already overwhelming. But you have even more ways to control the (still unwritten) text with the following settings:
Top P (0.0–1.0) – controls the probability mass the model samples from (nucleus sampling) and therefore the diversity of the completion; it is an alternative to temperature.
Frequency Penalty (0.0–1.0) – penalizes tokens according to how often they have already been used, reducing the re-use of the same text patterns. The higher the value, the lower the chance of repeated patterns in the completion.
Presence Penalty (0.0–1.0) – by increasing the value, you widen the possibility of the completion moving on to new topics.
Best of (1–20) – generates that many variants and shows only the best one. Warning: it consumes credits for all generated variants, not just the one shown in the Completion, so use it wisely.
These settings also lend themselves to saving as specific presets that work best for you, or to experimenting with the same prompt under different parameters; the sketch below shows how they map onto API parameters.
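Continuing the sketch above (same caveats: older openai Python package, placeholder key and prompt), Top P, the two penalties, and Best of correspond to the top_p, frequency_penalty, presence_penalty, and best_of parameters of the Completion endpoint:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    engine="davinci",
    prompt="Brainstorm five unusual uses for an old lighthouse:",
    max_tokens=200,
    temperature=0.9,
    top_p=1.0,               # nucleus sampling; usually tune this OR temperature
    frequency_penalty=0.5,   # discourage repeating the same token patterns
    presence_penalty=0.5,    # encourage the model to move on to new topics
    best_of=3,               # generate 3 completions server-side, return the best
)                            # note: best_of multiplies token usage (and cost)

print(response["choices"][0]["text"])
```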
The last group of settings is essential when you generate highly structured text, such as a chat. The default chat preset is a good example of how they are used.
Stop Sequences help GPT-3 detect where to stop and jump to another line.
Inject Start Text – this marks the AI’s part of the dialogue (shown as “AI:” in the preset above). From here, GPT-3 writes until it decides to stop, then jumps to the next line and provides your part:
You can think of it as precisely defining the characters in the conversation.
In this case, GPT-3 wrote “Hello, may I ask…”, jumped to the next line, and set up your part: “Human:”. Now it’s your turn. After you write your text and click “SUBMIT” (or press Ctrl+Enter), it will continue with “AI:” followed by GPT-3-written content.
Note: if you delete the Stop Sequences, GPT-3 will continue writing the dialogue on its own, using the “characters” defined in Inject Start/Restart Text, but unsupervised.
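The chat preset translates directly into API parameters: the stop sequence ends the model’s turn, and the start/restart text is simply text you prepend or append to the prompt yourself. A minimal sketch, again using the older openai Python package and a made-up conversation:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# The running transcript; "AI:" plays the role of the Inject Start Text.
prompt = (
    "The following is a conversation with an AI assistant.\n\n"
    "Human: Hello, who are you?\n"
    "AI:"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=150,
    temperature=0.9,
    stop=["Human:"],   # Stop Sequence: end the turn before the human's next line
)

ai_turn = response["choices"][0]["text"].strip()
print("AI:", ai_turn)

# Inject Restart Text: append "Human:" so the user can type the next turn.
prompt += " " + ai_turn + "\nHuman:"
```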
As you can see, there are various control options that you can use to steer the still-unknown text in a specific direction and give it a precise character.
Using Show Probabilities, you can get insights into the generated content, with the probability of each token; you look into the Matrix, so to say:
More information: https://vimeo.com/485016927?embedded=true&source=vimeo_logo&owner=418868
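Outside the Playground, a similar view is available through the logprobs parameter of the Completion endpoint, which returns the log-probabilities of the sampled tokens and the top alternatives at each position (same sketch conventions as above: older openai Python package, placeholder key and prompt):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    engine="davinci",
    prompt="The capital of France is",
    max_tokens=5,
    temperature=0.0,
    logprobs=5,  # also return the 5 most likely alternatives per position
)

# Inspect the probability the model assigned to each generated token.
logprobs = response["choices"][0]["logprobs"]
for token, lp in zip(logprobs["tokens"], logprobs["token_logprobs"]):
    print(f"{token!r}: logprob {lp:.3f}")
```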
References: