Upserting YouTube transcripts

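In the following exercises, you'll build a chatbot that can answer questions about YouTube videos by ingesting video transcripts and additional metadata into your 'pinecone-datacamp' index.

To start, you'll prepare the data from the youtube_rag_data.csv file and upsert the vectors, with all of their metadata, into the 'pinecone-datacamp' index. The data is provided in the DataFrame youtube_df.

Here is a sample transcript from the youtube_df DataFrame: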

id: 
35Pdoyi6ZoQ-t0.0

title:
Training and Testing an Italian BERT - Transformers From Scratch #4

text: 
Hi, welcome to the video. So this is the fourth video in a Transformers from Scratch 
mini series. So if you haven't been following along, we've essentially covered what 
you can see on the screen. So we got some data. We built a tokenizer with it...

url: 
https://youtu.be/35Pdoyi6ZoQ

published: 
01-01-2024

Initialize the Pinecone client with your Pinecone API key (the OpenAI client is already available as client).
Extract the 'id', 'text', 'title', 'url', and 'published' metadata from each row.
Encode the texts using OpenAI's 'text-embedding-3-small' model.
Upsert the vectors and metadata into a namespace called 'youtube_rag_dataset'.
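
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution. The helper names (batched, build_vectors, upsert_transcripts, make_openai_embedder), the batch size of 100, and the choice to reuse each row's 'id' as the vector ID are all assumptions. The embedding call is passed in as a function so that the batching and metadata logic can be tried without API keys.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence

# Fields copied into each vector's metadata (taken from the instructions above).
METADATA_FIELDS = ("id", "text", "title", "url", "published")

Row = Dict[str, object]
EmbedFn = Callable[[List[str]], Sequence[Sequence[float]]]


def batched(items: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_vectors(rows: List[Row], embed: EmbedFn) -> List[dict]:
    """Embed each row's text and pair it with its ID and metadata."""
    embeddings = embed([str(row["text"]) for row in rows])
    return [
        {
            # Assumption: the row's own 'id' doubles as the vector ID.
            "id": str(row["id"]),
            "values": list(values),
            "metadata": {field: row[field] for field in METADATA_FIELDS},
        }
        for row, values in zip(rows, embeddings)
    ]


def upsert_transcripts(index, rows: Iterable[Row], embed: EmbedFn,
                       namespace: str = "youtube_rag_dataset",
                       batch_size: int = 100) -> int:
    """Upsert all rows in batches and return the number of vectors sent."""
    total = 0
    for batch in batched(list(rows), batch_size):
        vectors = build_vectors(batch, embed)
        # `index` is expected to behave like a Pinecone index object.
        index.upsert(vectors=vectors, namespace=namespace)
        total += len(vectors)
    return total


def make_openai_embedder(client, model: str = "text-embedding-3-small") -> EmbedFn:
    """Wrap an OpenAI client so it matches the `embed` callable used above."""
    def embed(texts: List[str]) -> List[List[float]]:
        response = client.embeddings.create(input=texts, model=model)
        return [item.embedding for item in response.data]
    return embed
```

With real clients, this might be used by creating the index handle with `Pinecone(api_key="<YOUR_PINECONE_API_KEY>").Index("pinecone-datacamp")` and then calling `upsert_transcripts(index, youtube_df.to_dict("records"), make_openai_embedder(client))`. Batching keeps each embedding and upsert request small. Metadata values are assumed to be plain strings or numbers, so a 'published' column parsed as a datetime would need converting to a string first.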
