Upserting YouTube transcripts

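In the following exercises, you'll build a chatbot that can answer questions about YouTube videos by ingesting video transcripts and their additional metadata into your 'pinecone-datacamp' index.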

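To begin, you'll prepare the data from the youtube_rag_data.csv file and upsert the vectors, along with all of their metadata, into the 'pinecone-datacamp' index. The data is provided in the DataFrame youtube_df.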

Here's an example transcript from the youtube_df DataFrame:

id: 
35Pdoyi6ZoQ-t0.0

title:
Training and Testing an Italian BERT - Transformers From Scratch #4

text: 
Hi, welcome to the video. So this is the fourth video in a Transformers from Scratch 
mini series. So if you haven't been following along, we've essentially covered what 
you can see on the screen. So we got some data. We built a tokenizer with it...

url: 
https://youtu.be/35Pdoyi6ZoQ

published: 
01-01-2024

Initialize the Pinecone client with your API key (the OpenAI client is available as client).
Extract the 'id', 'text', 'title', 'url', and 'published' metadata from each row.
Encode texts using 'text-embedding-3-small' from OpenAI.
Upsert the vectors and metadata into a namespace called 'youtube_rag_dataset'.
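
The four steps above could be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution. The helper names (batched, upsert_transcripts, connect_index), the batch size of 100, and the choice to reuse each row's 'id' as the vector ID are all assumptions made for the example. Rows are taken as plain dictionaries, which youtube_df.to_dict("records") would produce.

```python
from itertools import islice

# Fields copied into each vector's metadata, as listed in the instructions.
METADATA_FIELDS = ("id", "text", "title", "url", "published")


def batched(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def connect_index(api_key, index_name="pinecone-datacamp"):
    """Initialize the Pinecone client and return a handle to the index."""
    from pinecone import Pinecone  # imported lazily; needs the pinecone package

    pc = Pinecone(api_key=api_key)
    return pc.Index(index_name)


def upsert_transcripts(rows, openai_client, index, batch_size=100,
                       namespace="youtube_rag_dataset"):
    """Embed each row's text and upsert it with its metadata.

    rows is an iterable of dicts with the keys in METADATA_FIELDS.
    Returns the number of rows sent to the index.
    """
    total = 0
    for batch in batched(rows, batch_size):
        texts = [row["text"] for row in batch]

        # One embeddings request per batch; results come back in input order.
        response = openai_client.embeddings.create(
            input=texts, model="text-embedding-3-small"
        )
        embeds = [item.embedding for item in response.data]

        vectors = [
            {
                "id": row["id"],  # assumption: the row id doubles as the vector ID
                "values": embed,
                "metadata": {field: row[field] for field in METADATA_FIELDS},
            }
            for row, embed in zip(batch, embeds)
        ]
        index.upsert(vectors=vectors, namespace=namespace)
        total += len(batch)
    return total


# Hypothetical usage inside the exercise environment:
# index = connect_index("YOUR_API_KEY")
# upsert_transcripts(youtube_df.to_dict("records"), client, index)
```

Batching keeps each embeddings request and each upsert call to a bounded size. Passing the clients in as arguments means the loop can be exercised with stand-in objects before any API keys are involved.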
