What is generative AI and how does it work?

Published: 2023-09-26

Generative AI is a subset of artificial intelligence that has become a revolutionary force in tech. But what exactly is it? Why is it getting so much attention?

This in-depth guide digs into how generative AI models work, what they can and can't do, and what all of these elements mean.

What is generative AI?

Generative AI (genAI) refers to systems that can generate new content, whether text, images, music, or even video. Traditionally, AI/ML meant three things: supervised, unsupervised, and reinforcement learning. Each gives insights based on clustering output.

Non-generative AI models make calculations based on input (like classifying an image or translating a sentence). Generative models, by contrast, produce "new" output, such as writing essays, composing music, designing graphics, or even creating realistic human faces that don't exist in the real world.

The implications of generative AI

The rise of generative AI has significant implications. With the ability to generate content, industries like entertainment, design, and journalism are witnessing a paradigm shift.

For instance, news agencies can use AI to draft reports, while designers can get AI-assisted suggestions for graphics. AI can generate hundreds of ad slogans in seconds – whether or not those options are any good is another matter.

Generative AI can produce tailored content for individual users. Think of something like a music app that composes a unique song based on your mood, or a news app that drafts articles on topics you're interested in.

The issue is that as AI plays a more integral role in content creation, questions about authenticity, copyright, and the value of human creativity become more prevalent.

How does generative AI work?

Generative AI, at its core, is about predicting the next piece of data in a sequence, whether that's the next word in a sentence or the next pixel in an image. Let's break down how this is achieved.

Statistical models

Statistical models are the backbone of most AI systems. They use mathematical equations to represent the relationships between different variables.

For generative AI, models are trained to recognize patterns in data and then use those patterns to generate new, similar data.

If a model is trained on English sentences, it learns the statistical likelihood of one word following another, allowing it to generate coherent sentences.
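To make that concrete, here is a minimal sketch in Python – a toy bigram count, not any production model – of how often-following words can be turned into probabilities. The tiny corpus is made up for illustration:

from collections import defaultdict, Counter

# Hypothetical toy corpus; a real model trains on billions of words.
corpus = "the sky is blue the sky is clear the banana is yellow".split()

# Count how often each word follows each other word (a simple bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# Turn the counts for "the" into probabilities.
counts = following["the"]
total = sum(counts.values())
probabilities = {word: count / total for word, count in counts.items()}
print(probabilities)  # e.g. {'sky': 0.67, 'banana': 0.33}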

A basic demonstration of how text is selected from an LLM

Data collection

Both the quality and quantity of the data are crucial. Generative models are trained on vast datasets to understand patterns.

For a language model, this might mean ingesting billions of words from books, websites, and other texts.

For an image model, it could mean analyzing millions of images. The more diverse and comprehensive the training data, the better the model will be at generating diverse outputs.

How transformers and attention work

Transformers are a type of neural network architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. They have since become the foundation for most state-of-the-art language models. ChatGPT wouldn't work without transformers.

The "attention" mechanism lets the model focus on different parts of the input data, much like how humans pay attention to specific words when making sense of a sentence.

This mechanism lets the model decide which parts of the input are relevant to a given task, making it highly flexible and powerful.

The code below is a basic breakdown of transformer mechanisms, with each piece explained in plain English.

class Transformer:
    def __init__(self, vocab_size, d_model, nhead, num_layers):
        # Convert words to vectors
        # What this is : turns words into "vector embeddings" – basically numbers that represent the words and their relationships to each other.
        # Demo : "the pineapple is cool and tasty" -> [0.2, 0.5, 0.3, 0.8, 0.1, 0.9]
        self.embedding = Embedding(vocab_size, d_model)

        # Add position information to the vectors
        # What this is : Since words in a sentence have a specific order, we add information about each word's position in the sentence.
        # Demo : "the pineapple is cool and tasty" with position -> [0.2+0.01, 0.5+0.02, 0.3+0.03, 0.8+0.04, 0.1+0.05, 0.9+0.06]
        self.positional_encoding = PositionalEncoding(d_model)

        # Stack of transformer layers
        # What this is : Multiple layers of the Transformer model stacked on top of each other to process data in depth.
        # Why it does it : Each layer captures different patterns and relationships in the data.
        # Explained like I'm five : Imagine a multi-story building. Each floor (or layer) has people (or mechanisms) doing specific jobs. The more floors, the more jobs get done!
        self.transformer_layers = [TransformerLayer(d_model, nhead) for _ in range(num_layers)]

        # Convert the output vectors to word probabilities
        # What this is : A way to predict the next word in a sequence.
        # Why it does it : After processing the input, we want to guess what word comes next.
        # Explained like I'm five : After listening to a story, this tries to guess what happens next.
        self.output_layer = Linear(d_model, vocab_size)

    def forward(self, x):
        # Convert words to vectors, as above
        x = self.embedding(x)

        # Add position information, as above
        x = self.positional_encoding(x)

        # Pass through each transformer layer
        # What this is : Sending our data through each floor of our multi-story building.
        # Why it does it : To deeply process and understand the data.
        # Explained like I'm five : It's like passing a note in class. Each person (or layer) adds something to the note before passing it on, which can end up with a coherent story – or a mess.
        for layer in self.transformer_layers:
            x = layer(x)

        # Get the output word probabilities
        # What this is : Our best guess for the next word in the sequence.
        return self.output_layer(x)

In the code, you might have a Transformer class and a TransformerLayer class. It's like having a blueprint for a floor versus a blueprint for a whole building.

This TransformerLayer code shows you how specific components, like multi-head attention and a particular arrangement, work.

Image 108
Using different colors to demonstrate how attention works
class TransformerLayer:
    def __init__(self, d_model, nhead):
        # Multi-head attention mechanism
        # What this is : A mechanism that lets the model focus on different parts of the input data simultaneously.
        # Demo : "the pineapple is cool and tasty" might become "this PINEAPPLE is COOL and TASTY" as the model pays more attention to certain words.
        self.attention = MultiHeadAttention(d_model, nhead)

        # Simple feed-forward neural network
        # What this is : A basic neural network that processes the data after the attention mechanism.
        # Demo : "this PINEAPPLE is COOL and TASTY" -> [0.25, 0.55, 0.35, 0.85, 0.15, 0.95] (slight changes in numbers after processing)
        self.feed_forward = FeedForward(d_model)

    def forward(self, x):
        # Apply attention mechanism
        # What this is : The step where we focus on different parts of the sentence.
        # Explained like I'm five : It's like highlighting important parts of a book.
        attention_output = self.attention(x, x, x)

        # Pass the output through the feed-forward network
        # What this is : The step where we process the highlighted information.
        return self.feed_forward(attention_output)

A feed-forward neural network is one of the simplest types of artificial neural networks. It consists of an input layer, one or more hidden layers, and an output layer.

The data flows in one direction – from the input layer, through the hidden layers, to the output layer. There are no loops or cycles in the network.

In the context of the transformer architecture, the feed-forward neural network is used after the attention mechanism in each layer. It is a simple two-layer linear transformation with a ReLU activation in between.

# Scaled dot-product attention mechanism
class ScaledDotProductAttention:
    def __init__(self, d_model):
        # Scaling factor helps in stabilizing the gradients
        # it reduces the variance of the dot product.
        # What this is : A scaling factor based on the size of our model's embeddings.
        # What it does : Helps to make sure the dot products don't get too big.
        # Why it does it : Big dot products can make a model unstable and harder to train.
        # How it does it : By dividing the dot products by the square root of the embedding size.
        # It's used when calculating attention scores.
        # Explained like I'm five : Imagine you shouted something really loud. This scaling factor is like turning the volume down so it's not too loud.
        self.scaling_factor = d_model ** 0.5

    def forward(self, query, key, value):
        # What this is : The function that calculates how much attention each word should get.
        # What it does : Determines how relevant each word in a sentence is to every other word.
        # Why it does it : So we can focus more on important words when trying to understand a sentence.
        # How it does it : By taking the dot product (the numeric product: a way to measure similarity) of the query and key, then scaling it, and finally using that to weigh our values.
        # How it fits into the rest of the code : This function is called whenever we want to calculate attention in our model.
        # Explained like I'm five : Imagine you have a toy and you want to see which of your friends likes it the most. This function is like asking each friend how much they like the toy, and then deciding who gets to play with it based on their answers.

        # Calculate attention scores by taking the dot product of the query and key.
        scores = dot_product(query, key) / self.scaling_factor
        # Convert the raw scores to probabilities using the softmax function.
        attention_weights = softmax(scores)
        # Weight the values using the attention probabilities.
        return dot_product(attention_weights, value)

# Feed-forward neural network
# This is an extremely basic example of a neural network.
class FeedForward:
    def __init__(self, d_model):
        # First linear layer increases the dimensionality of the data.
        self.layer1 = Linear(d_model, d_model * 4)
        # Second linear layer brings the dimensionality back to d_model.
        self.layer2 = Linear(d_model * 4, d_model)

    def forward(self, x):
        # Pass the input through the first layer:
        # Input : This refers to the data you feed into the neural network.
        # First layer : Neural networks consist of layers, and each layer has neurons. When we say "pass the input through the first layer," we mean that the input data is being processed by the neurons in this layer. Each neuron takes the input, multiplies it by its weights (which are learned during training), and produces an output.
        # Apply ReLU activation to introduce non-linearity,
        # and then pass through the second layer.
        # ReLU activation : ReLU stands for Rectified Linear Unit.
        # It's a type of activation function, which is a mathematical function applied to the output of each neuron. In simpler terms, if the input is positive, it returns the input value; if the input is negative or zero, it returns zero.
        # Neural networks can model complex relationships in data by introducing non-linearities.
        # Without non-linear activation functions, no matter how many layers you stack in a neural network, it would behave just like a single-layer perceptron because summing these layers would give you another linear model.
        # Non-linearities allow the network to capture complex patterns and make better predictions.
        return self.layer2(relu(self.layer1(x)))

# Positional encoding adds information about the position of each word in the sequence.
class PositionalEncoding:
    def __init__(self, d_model):
        # What this is : A setup to add information about where each word is in a sentence.
        # What it does : Prepares to add a unique "position" value to each word.
        # Why it does it : Words in a sentence have an order, and this helps the model remember that order.
        # How it does it : By creating a special pattern of numbers for each position in a sentence.
        # How it fits into the rest of the code : Before processing words, we add their position info.
        # Explained like I'm five : Imagine you're in a line with your friends. This gives everyone a number to remember their place in line.
        pass

    def forward(self, x):
        # What this is : The main function that adds position info to our words.
        # What it does : Combines the word's original value with its position value.
        # Why it does it : So the model knows the order of words in a sentence.
        # How it does it : By adding the position values we prepared earlier to the word values.
        # How it fits into the rest of the code : This function is called whenever we want to add position info to our words.
        # Explained like I'm five : It's like giving each of your toys a tag that says if it's the 1st, 2nd, 3rd toy, and so on.
        return x

# Helper functions
def dot_product(a, b):
    # Calculate the dot product of two matrices.
    # What this is : A mathematical operation to see how similar two lists of numbers are.
    # What it does : Multiplies matching items in the lists and then adds them up.
    # Why it does it : To measure similarity or relevance between two sets of data.
    # How it does it : By multiplying and summing up.
    # How it fits into the rest of the code : Used in attention to see how relevant words are to each other.
    # Explained like I'm five : Imagine you and your friend have bags of candies. You both pour them out and match each candy type. Then, you count how many matching pairs you have.
    return a @ b.transpose(-2, -1)

def softmax(x):
    # Convert raw scores to probabilities ensuring they sum up to 1.
    # What this is : A way to turn any list of numbers into probabilities.
    # What it does : Makes the numbers between 0 and 1 and ensures they all add up to 1.
    # Why it does it : So we can understand the numbers as chances or probabilities.
    # How it does it : By using exponentiation and division.
    # How it fits into the rest of the code : Used to convert attention scores into probabilities.
    # Explained like I'm five : Let's go back to our toys. This makes sure that when you share them, everyone gets a fair share, and no toy is left behind.
    return exp(x) / sum(exp(x), axis=-1)

def relu(x):
    # Activation function that introduces non-linearity. It sets negative values to 0.
    # What this is : A simple rule for numbers.
    # What it does : If a number is negative, it changes it to zero. Otherwise, it leaves it as it is.
    # Why it does it : To introduce some simplicity and non-linearity in our model's calculations.
    # How it does it : By checking each number and setting it to zero if it's negative.
    # How it fits into the rest of the code : Used in neural networks to make them more powerful and flexible.
    # Explained like I'm five : Imagine you have some stickers, some are shiny (positive numbers) and some are dull (negative numbers). This rule says to replace all dull stickers with blank ones.
    return max(0, x)

How generative AI works – in simple terms

Think of generative AI as rolling a weighted dice. The training data determines the weights (or probabilities).

If the dice represents the next word in a sentence, a word that often follows the current word in the training data will have a higher weight. So "sky" might follow "blue" more often than "banana" does. When the AI "rolls the dice" to generate content, it's more likely to choose statistically probable sequences based on its training.
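Here is a minimal sketch of that "weighted dice" idea in Python. The probabilities are made-up numbers for illustration; a real model derives them from its training data:

import random

# Hypothetical next-word probabilities after the word "blue" (made-up numbers).
next_word_weights = {"sky": 0.70, "car": 0.20, "banana": 0.10}

# "Rolling the weighted dice": sample a next word according to those weights.
words = list(next_word_weights.keys())
weights = list(next_word_weights.values())
next_word = random.choices(words, weights=weights, k=1)[0]
print(next_word)  # most often "sky", occasionally "car" or "banana"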

So, how can LLMs generate content that "seems" original?

Let's take a fake listicle – "the best Eid al-Fitr gifts for content marketers" – and walk through how an LLM can generate this list by combining textual cues from documents about gifts, Eid al-Fitr, and content marketers.

Before processing, the text is broken down into smaller pieces called "tokens." These tokens can be as short as one character or as long as one word.

Example: "Eid al-Fitr is a celebration" becomes ["Eid", "al-Fitr", "is", "a", "celebration"].

This allows the model to work with manageable chunks of text and understand the structure of sentences.
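A rough sketch of what tokenization can look like, using naive whitespace splitting on the example above. Real LLM tokenizers (such as byte-pair encoding) split text into smaller sub-word pieces, so this is only illustrative:

# Naive word-level tokenization; real tokenizers work on sub-word pieces.
def tokenize(text):
    return text.split()

tokens = tokenize("Eid al-Fitr is a celebration")
print(tokens)  # ['Eid', 'al-Fitr', 'is', 'a', 'celebration']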

Each token is then converted into a vector (a list of numbers) using embeddings. These vectors capture the meaning and context of each word.

Positional encoding adds information to each word vector about its position in the sentence, so the model doesn't lose this order information.

Then we use an attention mechanism: this allows the model to focus on different parts of the input text when generating the output. If you remember BERT, this is what got Googlers so excited about BERT.

If our model has seen texts about "gifts" and knows that people give gifts during celebrations, and it has also seen texts about "Eid al-Fitr" being a significant celebration, it will pay "attention" to these connections.

Similarly, if it has seen texts about "content marketers" needing specific tools or resources, it can connect the idea of "gifts" to "content marketers."


Now we can combine contexts: as the model processes the input text through multiple transformer layers, it combines the contexts it has learned.

So, even if the original texts never mentioned "Eid al-Fitr gifts for content marketers," the model can bring together the concepts of "Eid al-Fitr," "gifts," and "content marketers" to generate this content.

This is because it has learned the broader context around each of these terms.

After processing the input through the attention mechanism and feed-forward networks in each transformer layer, the model produces a probability distribution over its vocabulary for the next word in the sequence.

It might think that after the words "best" and "Eid al-Fitr," the word "gifts" has a high probability of coming next. Similarly, it might associate "gifts" with potential recipients like "content marketers."




How large language models are built

The process of going from a basic transformer model to a sophisticated large language model (LLM) like GPT-3 or BERT involves scaling up and refining various components.

Here's a step-by-step breakdown:

LLMs are trained on massive amounts of text data. It's hard to overstate how much data that is.

The C4 dataset, a starting point for many LLMs, is 750 GB of text data. That's 805,306,368,000 bytes – a lot of information. The data can include books, articles, websites, forums, comment sections, and other sources.

The more varied and comprehensive the data, the better the model's understanding and generalization capabilities.

While the basic transformer architecture remains the foundation, LLMs have a significantly larger number of parameters. GPT-3, for example, has 175 billion parameters. Parameters, in this case, refer to the weights and biases in the neural network that are learned during training.

In deep learning, a model is trained to make predictions by adjusting these parameters so that the difference between its predictions and the actual outcomes is reduced.

The process of adjusting these parameters is called optimization, which uses algorithms like gradient descent.
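Here is a minimal sketch of a single gradient descent step on one weight, purely to show the idea of nudging a parameter to reduce error. The loss function and numbers are hypothetical; real training updates billions of parameters at once:

# One step of gradient descent on a single weight, purely illustrative.
weight = 0.5          # current parameter value
learning_rate = 0.1   # how big a step to take

def loss(w):
    # Hypothetical error: how far the prediction w * 3 is from the target 6.
    return (w * 3 - 6) ** 2

def gradient(w):
    # Derivative of the loss with respect to w.
    return 2 * (w * 3 - 6) * 3

# Nudge the weight in the direction that reduces the loss.
weight = weight - learning_rate * gradient(weight)
print(weight, loss(weight))  # the loss is smaller than before the update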

  • Weights: These are values in the neural network that transform input data within the network's layers. They are adjusted during training to optimize the model's output. Each connection between neurons in adjacent layers has an associated weight.
  • Biases: These are also values in the neural network, added to the output of a layer's transformation. They give the model additional flexibility, allowing it to fit the training data better. Each neuron in a layer has an associated bias.

This scaling allows the model to store and process more intricate patterns and relationships in the data.

The large number of parameters also means the model requires significant computational power and memory for training and inference. This is why training such models is resource-intensive and typically uses specialized hardware like GPUs or TPUs.

The model is trained to predict the next word in a sequence using powerful computational resources. It adjusts its internal parameters based on the errors it makes, continuously improving its predictions.

Attention mechanisms like the ones we discussed are pivotal for LLMs. They allow the model to focus on different parts of the input when generating output.

By weighing the importance of different words in a context, attention mechanisms enable the model to generate coherent and contextually relevant text. Doing this at a massive scale is what allows LLMs to work the way they do.

How does a transformer predict text?

Transformers predict text by processing the input tokens through multiple layers, each equipped with attention mechanisms and feed-forward networks.

After processing, the model produces a probability distribution over its vocabulary for the next word in the sequence. The word with the highest probability is typically selected as the prediction.
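A minimal sketch of that last step, with made-up probabilities, just to show what "pick the word with the highest probability" means in code:

# Hypothetical probability distribution over a tiny vocabulary for the next word.
next_word_probabilities = {
    "gifts": 0.62,
    "ideas": 0.21,
    "recipes": 0.12,
    "banana": 0.05,
}

# Greedy decoding: pick the single most likely word.
predicted_word = max(next_word_probabilities, key=next_word_probabilities.get)
print(predicted_word)  # "gifts"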


How are large language models built and trained?

Building an LLM involves gathering data, cleaning it, training the model, fine-tuning the model, and vigorous, ongoing testing.

The model is initially trained on a massive corpus to predict the next word in a sequence. This phase lets the model learn connections between words: connections that pick up grammatical patterns, relationships that can represent facts about the world, and connections that feel like logical reasoning. These connections also make it pick up the biases present in the training data.

After pre-training, the model is refined on a narrower dataset, often with human reviewers following guidelines.

Fine-tuning is a crucial step in building an LLM. It involves training the pre-trained model on a more specific dataset or task. Let's take ChatGPT as an example.

If you've played with GPT models, you know that prompting isn't so much "write this thing" as it is more like:

  • Prompt: Once upon a time
    • Continuation: there was an evil wizard at the top of a tower.
  • Prompt: Why did the chicken join a band?
    • Continuation: Because it had the drumsticks!

Getting from that point to ChatGPT took a lot of low-paid labor. Those workers created huge corpora to quantify the weight of GPT responses and expected behaviors. They created masses of prompt/continuation texts like the ones below (a rough sketch of how such a dataset might be stored follows the examples):

  • Prompt: Finish this story: "Once upon a time..."
    • Continuation: Sure! Once upon a time, in a land far, far away, there was a small village nestled between two majestic mountains.
  • Prompt: Tell me a joke about a chicken.
    • Continuation: Why did the chicken join a band? Because it had the drumsticks!
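Here is a rough sketch of how such prompt/continuation pairs might be stored for fine-tuning. The file name and exact schema are illustrative assumptions, not OpenAI's actual pipeline; JSON Lines is simply a common format for this kind of data:

import json

# Hypothetical fine-tuning examples in the prompt/continuation style shown above.
examples = [
    {
        "prompt": "Finish this story: \"Once upon a time...\"",
        "continuation": "Sure! Once upon a time, in a land far, far away, ...",
    },
    {
        "prompt": "Tell me a joke about a chicken.",
        "continuation": "Why did the chicken join a band? Because it had the drumsticks!",
    },
]

# Write one JSON object per line (one training example per line).
with open("fine_tune_examples.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")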

This fine-tuning process is essential for several reasons:

  • Specificity: While pre-training gives the model a broad understanding of language, fine-tuning narrows its knowledge and behavior to align with specific tasks or domains. For instance, a model fine-tuned on medical data will answer medical questions better.
  • Control: Fine-tuning gives developers more control over the model's outputs. Developers can use curated datasets to guide the model toward producing desired responses and avoiding unwanted behaviors.
  • Safety: It helps reduce harmful or biased outputs. By using guidelines during the fine-tuning process, human reviewers can ensure the model doesn't produce inappropriate content.
  • Performance: Fine-tuning can significantly improve the model's performance on specific tasks. For example, a model that has been fine-tuned for customer support will do much better at that than a generic model.

You can tell ChatGPT has been fine-tuned in particular ways.

For example, "logical reasoning" is something LLMs tend to struggle with. ChatGPT's best logical reasoning model – GPT-4 – has been trained intensively to recognize patterns in numbers explicitly.

Instead of something like this:

  • Prompt: What's 2+2?
  • Process: In math textbooks for children, 2+2=4 appears frequently. Occasionally there are references to "2+2=5," but when that happens there is usually more context relating to George Orwell or Star Trek. If this were in that context, the weight would lean more toward 2+2=5. But that context isn't present, so in this case the next token is probably 4.
  • Response: 2+2=4

The training does something like this:

  • Training: 2+2=4
  • Training: 4/2=2
  • Training: half of 4 is 2
  • Training: 2 of 2 is 4

…and so on.

This means for those more "logical" models, the training process is more rigorous, focused on ensuring the model understands and correctly applies logical and mathematical principles.

The model is exposed to various mathematical problems and their solutions, ensuring it can generalize and apply these principles to new, unseen problems.

The importance of this fine-tuning process, especially for logical reasoning, can't be overstated. Without it, the model might give incorrect or nonsensical answers to straightforward logical or mathematical questions.

Image models vs. language models

While image and language models may use similar architectures, like transformers, the data they process is fundamentally different:

Image models

These models deal with pixels and often work in a hierarchical manner, analyzing small patterns (like edges) first, then combining them to recognize larger structures (like shapes), and so on until they understand the entire image.

Language models

These models process sequences of words or characters. They need to understand context, grammar, and semantics to generate coherent and contextually relevant text.

How prominent generative AI interfaces work

Dall-E + Midjourney

Dall-E is a variant of the GPT-3 model adapted for image generation. It's trained on a vast dataset of text-image pairs. Midjourney is another image generation tool based on a proprietary model.

  • Input: You provide a textual description, like "a two-headed flamingo."
  • Process: These models encode the text into a series of numbers and then decode those vectors, finding relationships to pixels, to produce an image. The model has learned the relationships between textual descriptions and visual representations from its training data.
  • Output: An image that matches or relates to the given description.

Fingers, patterns, problems

Why can't these tools consistently generate hands that look normal? These tools work by looking at pixels next to each other.

You can see how this works when comparing earlier or more primitive generated images with more recent ones: earlier models look very fuzzy. In contrast, more recent models are a lot crisper.

These models generate images by predicting the next pixel based on the pixels they have already generated. This process is repeated millions of times to produce a complete image.

Hands, especially fingers, are intricate and have a lot of detail that needs to be captured accurately.

Each finger's position, length, and orientation can vary greatly in different images.

When generating an image from a textual description, the model has to make many assumptions about the exact pose and structure of the hand, which can lead to anomalies.

ChatGPT

ChatGPT is based on the GPT-3.5 architecture, a transformer-based model designed for natural language processing tasks.

  • Input: A prompt or a series of messages to simulate a conversation.
  • Process: ChatGPT uses its vast knowledge from diverse internet texts to generate responses. It considers the context provided in the conversation and tries to produce the most relevant and coherent reply.
  • Output: A text response that continues or answers the conversation.

Specialty

ChatGPT's strength lies in its ability to handle various topics and simulate human-like conversations, making it ideal for chatbots and virtual assistants.

Bard + Search Generative Experience (SGE)

While the specific details might be proprietary, Bard is based on transformer AI techniques, similar to other state-of-the-art language models. SGE is based on similar models but weaves in other ML algorithms that Google uses.

SGE likely generates content using a transformer-based generative model and then fuzzily extracts answers from the ranking pages in search. (This may not be true. Just a guess based on how it seems to work. Please don't sue me!)

  • Input: A prompt/command/search
  • Process: Bard processes the input and works the way other LLMs do. SGE uses a similar architecture but adds a layer where it searches its internal knowledge (gained from training data) to generate a suitable response. It considers the structure and context of the prompt and the intent to generate relevant content.
  • Output: Generated content that can be a story, an answer, or any other type of text.

Applications of generative AI (and their controversies)

Art and design

Generative AI can now create artwork, music, and even product designs. This has opened up new avenues for creativity and innovation.

Controversy

The rise of AI in art has sparked debates about job losses in creative fields.

Additionally, there are concerns about:

  • Labor violations, especially when AI-generated content is used without proper attribution or compensation.
  • Executives threatening to replace writers with AI was one of the issues that sparked the writers' strike.

Natural language processing (NLP)

AI models are now widely used for chatbots, language translation, and other NLP tasks.

Outside of the dream of artificial general intelligence (AGI), this is the best use of LLMs since they come close to a "generalist" NLP model.

Controversy

Many users find chatbots impersonal and sometimes annoying.

Moreover, while AI has made significant strides in language translation, it often lacks the nuance and cultural understanding that human translators bring, leading to impressive yet flawed translations.

Medicine and drug discovery

AI can rapidly analyze vast amounts of medical data and generate potential drug compounds, speeding up the drug discovery process. Many doctors already use LLMs to write notes and communicate with patients.

Controversy

Relying on LLMs for medical purposes can be problematic. Medicine requires precision, and any error or oversight by AI can have serious consequences.

Medicine also already has biases that only get worse when LLMs are used. There are also similar issues with privacy, efficacy, and ethics, as discussed below.

Gaming

Many AI enthusiasts are excited about using AI in gaming: they say AI can generate realistic gaming environments, characters, and even entire game plots, enhancing the gaming experience. NPC dialogue can be enhanced through the use of these tools.

Controversy

There's a debate about the intentionality of game design.

While AI can generate vast amounts of content, some argue it lacks the deliberate design and narrative cohesion that human designers bring.

Watch Dogs 2 had procedural NPCs, which did little to add to the narrative cohesion of the game as a whole.

Marketing and advertising

AI can analyze consumer behavior and generate personalized advertisements and promotional content, making marketing campaigns more effective.

LLMs have context from other people's writing, making them useful for generating user stories or more nuanced programmatic ideas. Instead of recommending TVs to someone who just bought a TV, LLMs can recommend accessories someone might want instead.

Controversy

The use of AI in marketing raises privacy concerns. There's also a debate about the ethical implications of using AI to influence consumer behavior.

Dig deeper: How to scale the use of large language models in marketing

Continuing issues with LLMs

Contextual understanding and comprehension of human speech

  • Limitation: AI models, including GPT, often struggle with nuanced human interactions, such as detecting sarcasm, humor, or lies.
  • Example: In stories where a character is lying to other characters, the AI might not always grasp the underlying deceit and might interpret statements at face value.

Pattern matching

  • Limitation: AI models, especially those like GPT, are fundamentally pattern matchers. They excel at recognizing and generating content based on patterns they've seen in their training data. However, their performance can degrade when faced with novel situations or deviations from established patterns.
  • Example: If a new slang term or cultural reference emerges after the model's last training update, it might not recognize or understand it.

Lack of common sense understanding

  • Limitation: While AI models can store vast amounts of information, they often lack a "common sense" understanding of the world, leading to outputs that might be technically correct but contextually nonsensical.

Potential to reinforce biases

  • Ethical consideration: AI models learn from data, and if that data contains biases, the model will likely reproduce and even amplify those biases. This can lead to outputs that are sexist, racist, or otherwise prejudiced.

Challenges in generating unique ideas

  • Limitation: AI models generate content based on patterns they've seen. While they can combine these patterns in novel ways, they don't "invent" like humans do. Their "creativity" is a recombination of existing ideas.

Data privacy, intellectual property, and quality control issues

  • Ethical consideration: Using AI models in applications that handle sensitive data raises concerns about data privacy. When AI generates content, questions arise about who owns the intellectual property rights. Ensuring the quality and accuracy of AI-generated content is also a significant challenge.

Bad code

  • AI models might generate code that is syntactically correct but functionally flawed or insecure when used for coding tasks. I have had to correct the code people have added to sites they generated using LLMs. It looked right, but was not. Even when the code does work, LLMs have out-of-date expectations for it, using functions like "document.write" that are no longer considered best practice.

Hot takes from an MLOps engineer and technical SEO

This section covers some hot takes I have about LLMs and generative AI. Feel free to fight with me.

Prompt engineering isn't real (for generative text interfaces)

Generative models, especially large language models (LLMs) like GPT-3 and its successors, have been touted for their ability to generate coherent and contextually relevant text based on prompts.

Because of this, and since these models have become the new "gold rush," people have started to monetize "prompt engineering" as a skill. This can take the form of $1,400 courses or prompt engineering jobs.

However, there are some critical considerations:

LLMs change rapidly

As technology evolves and new model versions are released, how they respond to prompts can change. What worked for GPT-3 might not work the same way for GPT-4 or even a newer version of GPT-3.

This constant evolution means prompt engineering can become a moving target, making it challenging to maintain consistency. Prompts that work in January may not work in March.

Uncontrollable outcomes

While you can guide LLMs with prompts, there's no guarantee they'll always produce the desired output. For instance, asking an LLM to generate a 500-word essay might result in outputs of varying lengths because LLMs don't know what numbers are.

Similarly, while you can ask for factual information, the model might produce inaccuracies because it cannot tell the difference between accurate and inaccurate information by itself.

Using LLMs in non-language-based applications is a bad idea

LLMs are primarily designed for language tasks. While they can be adapted for other purposes, there are inherent limitations:

Struggle with novel ideas

LLMs are trained on existing data, which means they're essentially regurgitating and recombining what they've seen before. They don't "invent" in the truest sense of the word.

Tasks that require genuine innovation or out-of-the-box thinking should not use LLMs.

You can see an issue with this when it comes to people using GPT models for news content – if something novel comes along, it's hard for LLMs to deal with it.

Image 113
This didn't happen, but it is published online and is currently the top result for Megan Crosby.

For example, a site that seems to be generating content with LLMs published a possibly libelous article about Megan Crosby. Crosby was caught elbowing opponents in real life.

Without that context, the LLM created a completely different, evidence-free story about a “controversial comment.”

Text-focused

At their core, LLMs are designed for text. While they can be adapted for tasks like image generation or music composition, they might not be as proficient as models specifically designed for those tasks.

LLMs don't know what the truth is

They generate outputs based on patterns encountered in their training data. This means they can't verify facts or discern true and false information.

If they've been exposed to misinformation or biased data during training, or they don't have context for something, they might propagate those inaccuracies in their outputs.

This is especially problematic in applications like news generation or academic research, where accuracy and truth are paramount.

Think about it like this: if an LLM has never come across the name “Jimmy Scrambles” before but knows it's a name, prompts to write about it will only come up with related vectors.

Designers are always better than AI-generated art

AI has made significant strides in art, from generating paintings to composing music. However, there's a fundamental difference between human-made art and AI-generated art:

Intent, feeling, vibe

Art is not just about the final product but the intent and emotion behind it.

A human artist brings their experiences, emotions, and perspectives to their work, giving it depth and nuance that's challenging for AI to replicate.

A “bad” piece of art from a person has more depth than a beautiful piece of art from a prompt.


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.