C2W3.Assignment.Language Models: Auto-Complete.Part1

Lecture: C2W3 Auto-complete and Language Models

Contents

  • 1 Load and Preprocess Data
    • 1.1 Load the data
    • 1.2 Pre-process the data
    • Exercise 01. Split data into sentences
    • Exercise 02. Tokenize sentences
    • Exercise 03
      • Split into train and test sets
    • Exercise 04
      • Handling 'Out of Vocabulary' words
    • Exercise 05
    • Exercise 06
    • Exercise 07

We covered Auto-Correct earlier; this section looks at Auto-Complete. For example:
[Figure: a search box suggesting completions for a partially typed query]
Browsers offer similar behavior.
A language model is a key component of an auto-complete system.
A language model assigns a probability to a sequence of words, so that more "likely" sequences receive higher scores. For example,

“I have a pen”
has a higher probability than
“I am a pen”

This kind of probability calculation can be used to build an auto-complete system.
Suppose the user types

“I eat scrambled”
then you can find a word x such that "I eat scrambled x" has the highest probability. If x = "eggs", the sentence becomes
“I eat scrambled eggs”
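
To make this concrete, here is a minimal sketch of that argmax over candidate words; prob (a function scoring a token sequence) and vocabulary are hypothetical placeholders, not objects defined in this assignment:

def suggest_next_word(prefix_tokens, vocabulary, prob):
    # Score "prefix + x" for every candidate word x and return the best one
    return max(vocabulary, key=lambda x: prob(prefix_tokens + [x]))

# Example (hypothetical): suggest_next_word(['i', 'eat', 'scrambled'], vocabulary, prob) -> 'eggs'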

There are many language models to choose from; here we use the simple and efficient N-gram model. The rough steps are:

  1. Load and preprocess the data
    • Load and tokenize the data.
    • Split the sentences into train and test sets.
    • Replace words with low frequency by the unknown token <unk>.
  2. Develop an N-gram-based language model
    • Count the n-grams in the given dataset.
    • Estimate the conditional probability of the next word with k-smoothing (see the formula sketch after this list).
  3. Evaluate the N-gram model by computing its perplexity score.
  4. Use the N-gram model to suggest the next word of a sentence.
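
For reference, the k-smoothed conditional probability in step 2 takes the standard form taught in this week's lectures, where $C(\cdot)$ is an n-gram count, $k$ is the smoothing constant, and $|V|$ is the vocabulary size:

$$\hat{P}(w_t \mid w_{t-n}^{t-1}) = \frac{C(w_{t-n}^{t-1}, w_t) + k}{C(w_{t-n}^{t-1}) + k\,|V|}$$

and the perplexity in step 3 is the inverse probability of the test set, normalized by its length $N$:

$$PP(W) = \left( \prod_{t=1}^{N} \frac{1}{P(w_t \mid w_{t-n}^{t-1})} \right)^{1/N}$$
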
import math
import random
import numpy as np
import pandas as pd
import nltk
# nltk.download('punkt')
import w3_unittest
nltk.data.path.append('.')

1 Load and Preprocess Data

1.1 Load the data

The data is one long string containing a great many tweets, with a newline character "\n" between tweets.

with open("./data/en_US.twitter.txt", "r", encoding="utf-8") as f:data = f.read()
print("Data type:", type(data))
print("Number of letters:", len(data))
print("First 300 letters of the data")
print("-------")
display(data[0:300])
print("-------")print("Last 300 letters of the data")
print("-------")
display(data[-300:])
print("-------")

Output:

Data type: <class 'str'>
Number of letters: 3335477
First 300 letters of the data
-------
"How are you? Btw thanks for the RT. You gonna be in DC anytime soon? Love to see you. Been way, way too long.\nWhen you meet someone special... you'll know. Your heart will beat more rapidly and you'll smile for no reason.\nthey've decided its more fun if I don't.\nSo Tired D; Played Lazer Tag & Ran A "
-------
Last 300 letters of the data
-------
"ust had one a few weeks back....hopefully we will be back soon! wish you the best yo\nColombia is with an 'o'...“: We now ship to 4 countries in South America (fist pump). Please welcome Columbia to the Stunner Family”\n#GutsiestMovesYouCanMake Giving a cat a bath.\nCoffee after 5 was a TERRIBLE idea.\n"
-------

1.2 Pre-process the data

Preprocess this data with the following steps:

  1. Split the data into sentences using "\n" as the delimiter.
  2. Split each sentence into tokens. Note that in English, "token" and "word" are used interchangeably.
  3. Assign sentences to the training or test set.
  4. Find tokens that appear at least N times in the training data.
  5. Replace tokens that appear fewer than N times with the unknown token <unk>.

Note: this lab omits validation data.

  • In real applications, we should hold out part of the data as a validation set and use it to tune the training.
  • For simplicity, that step is skipped here.

Exercise 01. Split data into sentences

# UNIT TEST COMMENT: Candidate for Table Driven Tests
### UNQ_C1 GRADED_FUNCTION: split_to_sentences ###
def split_to_sentences(data):
    """
    Split data by linebreak "\n"

    Args:
        data: str

    Returns:
        A list of sentences
    """
    ### START CODE HERE ###
    sentences = data.split("\n")
    ### END CODE HERE ###

    # Additional cleaning (This part is already implemented)
    # - Remove leading and trailing spaces from each sentence
    # - Drop sentences if they are empty strings.
    sentences = [s.strip() for s in sentences]
    sentences = [s for s in sentences if len(s) > 0]

    return sentences

Run:

# test your code
x = """
I have a pen.\nI have an apple. \nAh\nApple pen.\n
"""
print(x)
split_to_sentences(x)

Output:
I have a pen.
I have an apple.
Ah
Apple pen.
P.S. No audio here.

Exercise 02. Tokenize sentences

Convert all tokens to lowercase so that capitalized words in the original text (for example, at the start of a sentence) are treated the same as their lowercase forms. Append each sentence's list of words to the list of tokenized sentences.

# UNIT TEST COMMENT: Candidate for Table Driven Tests
### UNQ_C2 GRADED_FUNCTION: tokenize_sentences ###
def tokenize_sentences(sentences):
    """
    Tokenize sentences into tokens (words)

    Args:
        sentences: List of strings

    Returns:
        List of lists of tokens
    """
    # Initialize the list of lists of tokenized sentences
    tokenized_sentences = []

    ### START CODE HERE ###
    # Go through each sentence
    for sentence in sentences: # complete this line
        # Convert to lowercase letters
        sentence = sentence.lower()
        # Convert into a list of words
        tokenized = nltk.word_tokenize(sentence)
        # append the list of words to the list of lists
        tokenized_sentences.append(tokenized)
    ### END CODE HERE ###

    return tokenized_sentences
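
Note: nltk.word_tokenize relies on NLTK's punkt tokenizer models; if they are not available locally, uncomment the nltk.download('punkt') line in the import cell above.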

Run:

# test your code
sentences = ["Sky is blue.", "Leaves are green.", "Roses are red."]
tokenize_sentences(sentences)

Output:

[['sky', 'is', 'blue', '.'],
 ['leaves', 'are', 'green', '.'],
 ['roses', 'are', 'red', '.']]

Exercise 03

Combine the previous two exercises to obtain the tokenized data.

# UNIT TEST COMMENT: Candidate for Table Driven Tests
### UNQ_C3 GRADED_FUNCTION: get_tokenized_data ###
def get_tokenized_data(data):
    """
    Make a list of tokenized sentences

    Args:
        data: String

    Returns:
        List of lists of tokens
    """
    ### START CODE HERE ###
    # Get the sentences by splitting up the data
    sentences = split_to_sentences(data)
    # Get the list of lists of tokens by tokenizing the sentences
    tokenized_sentences = tokenize_sentences(sentences)
    ### END CODE HERE ###

    return tokenized_sentences

Test:

# test your function
x = "Sky is blue.\nLeaves are green\nRoses are red."
get_tokenized_data(x)

Output:

[['sky', 'is', 'blue', '.'],
 ['leaves', 'are', 'green'],
 ['roses', 'are', 'red', '.']]

Split into train and test sets

tokenized_data = get_tokenized_data(data)
random.seed(87)
random.shuffle(tokenized_data)

train_size = int(len(tokenized_data) * 0.8)
train_data = tokenized_data[0:train_size]
test_data = tokenized_data[train_size:]

Run:

print("{} data are split into {} train and {} test set".format(len(tokenized_data), len(train_data), len(test_data)))print("First training sample:")
print(train_data[0])print("First test sample")
print(test_data[0])

Output:

47961 data are split into 38368 train and 9593 test set
First training sample:
['i', 'personally', 'would', 'like', 'as', 'our', 'official', 'glove', 'of', 'the', 'team', 'local', 'company', 'and', 'quality', 'production']
First test sample
['that', 'picture', 'i', 'just', 'seen', 'whoa', 'dere', '!', '!', '>', '>', '>', '>', '>', '>', '>']

Exercise 04

Here we will not train on every token (word) that appears in the data, only on the more frequent words.

  • Focus only on words that appear at least N times in the data.
  • First, count how many times each word appears in the data.

This requires a double for loop: one over the sentences, another over the tokens within each sentence.

# UNIT TEST COMMENT: Candidate for Table Driven Tests
### UNQ_C4 GRADED_FUNCTION: count_words ###
def count_words(tokenized_sentences):
    """
    Count the number of word appearances in the tokenized sentences

    Args:
        tokenized_sentences: List of lists of strings

    Returns:
        dict that maps word (str) to the frequency (int)
    """
    word_counts = {}

    ### START CODE HERE ###
    # Loop through each sentence
    for sentence in tokenized_sentences: # complete this line
        # Go through each token in the sentence
        for token in sentence: # complete this line
            # If the token is not in the dictionary yet, set the count to 1
            if token not in word_counts.keys(): # complete this line with the proper condition
                word_counts[token] = 1
            # If the token is already in the dictionary, increment the count by 1
            else:
                word_counts[token] += 1
    ### END CODE HERE ###

    return word_counts
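
Incidentally, the standard library's collections.Counter builds the same word-to-count mapping in one line; an equivalent sketch (not the graded solution):

from collections import Counter

def count_words_alt(tokenized_sentences):
    # Counter is a dict subclass, so it provides the same mapping interface
    return Counter(token for sentence in tokenized_sentences for token in sentence)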

Test:

# test your code
tokenized_sentences = [['sky', 'is', 'blue', '.'], ['leaves', 'are', 'green', '.'], ['roses', 'are', 'red', '.']]
count_words(tokenized_sentences)

Output:

{'sky': 1, 'is': 1, 'blue': 1, '.': 3, 'leaves': 1, 'are': 2, 'green': 1, 'roses': 1, 'red': 1}

Handling 'Out of Vocabulary' words

When the model encounters a word it never saw during training, it has no input word to help it determine the next word to suggest; it cannot predict the next word because there are no counts for the current word.

  • Such "new" words are called unknown words, or out-of-vocabulary (OOV) words.
  • The percentage of unknown words in the test set is called the OOV rate (a short sketch computing it follows below).

To handle unknown words during prediction, use a special token <unk> to represent all unknown words.
Modify the training data so that it contains some "unknown" words to train on,
by converting words that appear infrequently in the training set into "unknown" words.
Create a list of the most frequent words in the training set, called the closed vocabulary.
Convert every word that is not part of the closed vocabulary into the token <unk>.
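
As a quick illustration of the OOV rate defined above, a minimal sketch (the function name is hypothetical, not part of the assignment):

def oov_rate(tokenized_sentences, vocabulary):
    # Fraction of tokens that fall outside the given vocabulary
    vocab = set(vocabulary)
    tokens = [token for sentence in tokenized_sentences for token in sentence]
    return sum(token not in vocab for token in tokens) / len(tokens)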

Exercise 05

Create a function that takes a text document and a threshold count_threshold.

Any word with a count greater than or equal to count_threshold is kept in the closed vocabulary.
The function returns the closed vocabulary of words.

# UNIT TEST COMMENT: Candidate for Table Driven Tests
### UNQ_C5 GRADED_FUNCTION: get_words_with_nplus_frequency ###
def get_words_with_nplus_frequency(tokenized_sentences, count_threshold):
    """
    Find the words that appear N times or more

    Args:
        tokenized_sentences: List of lists of sentences
        count_threshold: minimum number of occurrences for a word to be in the closed vocabulary.

    Returns:
        List of words that appear N times or more
    """
    # Initialize an empty list to contain the words that
    # appear at least 'minimum_freq' times.
    closed_vocab = []

    # Get the word counts of the tokenized sentences
    # Use the function that you defined earlier to count the words
    word_counts = count_words(tokenized_sentences)

    ### START CODE HERE ###
    #   UNIT TEST COMMENT: Whole thing can be one-lined with list comprehension
    #   filtered_words = None

    # for each word and its count
    for word, cnt in word_counts.items(): # complete this line
        # check that the word's count
        # is at least as great as the minimum count
        if cnt >= count_threshold: # complete this line with the proper condition
            # append the word to the list
            closed_vocab.append(word)
    ### END CODE HERE ###

    return closed_vocab
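
As the unit-test comment above hints, the whole loop can be one-lined with a list comprehension; an equivalent sketch:

def get_words_with_nplus_frequency_alt(tokenized_sentences, count_threshold):
    # Keep every word whose count reaches the threshold
    return [word for word, cnt in count_words(tokenized_sentences).items() if cnt >= count_threshold]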

Test:

# test your code
tokenized_sentences = [['sky', 'is', 'blue', '.'], ['leaves', 'are', 'green', '.'], ['roses', 'are', 'red', '.']]
tmp_closed_vocab = get_words_with_nplus_frequency(tokenized_sentences, count_threshold=2)
print(f"Closed vocabulary:")
print(tmp_closed_vocab)

Output:
Closed vocabulary:
['.', 'are']

Exercise 06

Words that appear count_threshold times or more belong to the closed vocabulary.
Every other word is treated as unknown.
Replace words that are not in the closed vocabulary with the token <unk>.

# UNIT TEST COMMENT: Candidate for Table Driven Tests
### UNQ_C6 GRADED_FUNCTION: replace_oov_words_by_unk ###
def replace_oov_words_by_unk(tokenized_sentences, vocabulary, unknown_token="<unk>"):
    """
    Replace words not in the given vocabulary with '<unk>' token.

    Args:
        tokenized_sentences: List of lists of strings
        vocabulary: List of strings that we will use
        unknown_token: A string representing unknown (out-of-vocabulary) words

    Returns:
        List of lists of strings, with words not in the vocabulary replaced
    """
    # Place vocabulary into a set for faster search
    vocabulary = set(vocabulary)

    # Initialize a list that will hold the sentences
    # after less frequent words are replaced by the unknown token
    replaced_tokenized_sentences = []

    # Go through each sentence
    for sentence in tokenized_sentences:
        # Initialize the list that will contain
        # a single sentence with "unknown_token" replacements
        replaced_sentence = []

        ### START CODE HERE (Replace instances of 'None' with your code) ###
        # for each token in the sentence
        for token in sentence: # complete this line
            # Check if the token is in the closed vocabulary
            if token in vocabulary: # complete this line with the proper condition
                # If so, append the word to the replaced_sentence
                replaced_sentence.append(token)
            else:
                # otherwise, append the unknown token instead
                replaced_sentence.append(unknown_token)
        ### END CODE HERE ###

        # Append the list of tokens to the list of lists
        replaced_tokenized_sentences.append(replaced_sentence)

    return replaced_tokenized_sentences

Test:

tokenized_sentences = [["dogs", "run"], ["cats", "sleep"]]
vocabulary = ["dogs", "sleep"]
tmp_replaced_tokenized_sentences = replace_oov_words_by_unk(tokenized_sentences, vocabulary)
print(f"Original sentence:")
print(tokenized_sentences)
print(f"tokenized_sentences with less frequent words converted to '<unk>':")
print(tmp_replaced_tokenized_sentences)

Output:

Original sentence:
[['dogs', 'run'], ['cats', 'sleep']]
tokenized_sentences with less frequent words converted to '<unk>':
[['dogs', '<unk>'], ['<unk>', 'sleep']]

Exercise 07

Now combine the functions implemented above to process the data.

  • Find tokens that appear at least count_threshold times in the training data.
  • Replace tokens that appear fewer than count_threshold times with <unk> in both the training and test data.

# UNIT TEST COMMENT: Candidate for Table Driven Tests
### UNQ_C7 GRADED_FUNCTION: preprocess_data ###
def preprocess_data(train_data, test_data, count_threshold, unknown_token="<unk>",
                    get_words_with_nplus_frequency=get_words_with_nplus_frequency,
                    replace_oov_words_by_unk=replace_oov_words_by_unk):
    """
    Preprocess data, i.e.,
        - Find tokens that appear at least N times in the training data.
        - Replace tokens that appear less than N times by "<unk>" both for training and test data.

    Args:
        train_data, test_data: List of lists of strings.
        count_threshold: Words whose count is less than this are treated as unknown.

    Returns:
        Tuple of
        - training data with low frequent words replaced by "<unk>"
        - test data with low frequent words replaced by "<unk>"
        - vocabulary of words that appear n times or more in the training data
    """
    ### START CODE HERE (Replace instances of 'None' with your code) ###

    # Get the closed vocabulary using the train data
    vocabulary = get_words_with_nplus_frequency(train_data, count_threshold)

    # For the train data, replace less common words with "<unk>"
    train_data_replaced = replace_oov_words_by_unk(train_data, vocabulary, unknown_token)

    # For the test data, replace less common words with "<unk>"
    test_data_replaced = replace_oov_words_by_unk(test_data, vocabulary, unknown_token)

    ### END CODE HERE ###
    return train_data_replaced, test_data_replaced, vocabulary

Test:

# test your code
tmp_train = [['sky', 'is', 'blue', '.'], ['leaves', 'are', 'green']]
tmp_test = [['roses', 'are', 'red', '.']]

tmp_train_repl, tmp_test_repl, tmp_vocab = preprocess_data(tmp_train, tmp_test, count_threshold=1)

print("tmp_train_repl")
print(tmp_train_repl)
print()
print("tmp_test_repl")
print(tmp_test_repl)
print()
print("tmp_vocab")
print(tmp_vocab)

Output:

tmp_train_repl
[['sky', 'is', 'blue', '.'], ['leaves', 'are', 'green']]

tmp_test_repl
[['<unk>', 'are', '<unk>', '.']]

tmp_vocab
['sky', 'is', 'blue', '.', 'leaves', 'are', 'green']

Process the full dataset:

minimum_freq = 2
train_data_processed, test_data_processed, vocabulary = preprocess_data(train_data, test_data, minimum_freq)
print("First preprocessed training sample:")
print(train_data_processed[0])
print()
print("First preprocessed test sample:")
print(test_data_processed[0])
print()
print("First 10 vocabulary:")
print(vocabulary[0:10])
print()
print("Size of vocabulary:", len(vocabulary))

Output:

First preprocessed training sample:
['i', 'personally', 'would', 'like', 'as', 'our', 'official', 'glove', 'of', 'the', 'team', 'local', 'company', 'and', 'quality', 'production']

First preprocessed test sample:
['that', 'picture', 'i', 'just', 'seen', 'whoa', 'dere', '!', '!', '>', '>', '>', '>', '>', '>', '>']

First 10 vocabulary:
['i', 'personally', 'would', 'like', 'as', 'our', 'official', 'glove', 'of', 'the']

Size of vocabulary: 14823
