Upserting YouTube transcripts

In the following exercises, you'll build a chatbot that can answer questions about YouTube videos by loading video transcripts and additional metadata into your 'pinecone-datacamp' index.

To start, prepare the data from the youtube_rag_data.csv file and upsert the vectors, together with all of their metadata, into the 'pinecone-datacamp' index. The data is provided in the DataFrame youtube_df.

Here is an example transcript from the youtube_df DataFrame:

id: 
35Pdoyi6ZoQ-t0.0

title:
Training and Testing an Italian BERT - Transformers From Scratch #4

text: 
Hi, welcome to the video. So this is the fourth video in a Transformers from Scratch 
mini series. So if you haven't been following along, we've essentially covered what 
you can see on the screen. So we got some data. We built a tokenizer with it...

url: 
https://youtu.be/35Pdoyi6ZoQ

published: 
01-01-2024

Initialize the Pinecone client with your API key (the OpenAI client is available as client).
Extract the 'id', 'text', 'title', 'url', and 'published' metadata from each row.
Encode texts using 'text-embedding-3-small' from OpenAI.
Upsert the vectors and metadata to a namespace called 'youtube_rag_dataset'.
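
The steps above can be sketched as a small batching loop. This is a minimal sketch under stated assumptions, not the exercise's reference solution: the helper names (`batched`, `build_vectors`, `upsert_transcripts`), the batch size of 100, and the choice to use 'id' as the vector ID while storing the other four fields as metadata are all illustrative. The embedding and upsert steps are passed in as plain callables so the loop can be tried without API keys; the commented section at the end shows how they might be wired to the Pinecone and OpenAI clients.

```python
from itertools import islice

# Assumption: 'id' becomes the vector ID and these four fields become metadata.
METADATA_FIELDS = ("text", "title", "url", "published")


def batched(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def build_vectors(rows, embeddings):
    """Pair each row with its embedding as an id/values/metadata dict."""
    return [
        {
            "id": row["id"],
            "values": embedding,
            "metadata": {field: row[field] for field in METADATA_FIELDS},
        }
        for row, embedding in zip(rows, embeddings)
    ]


def upsert_transcripts(rows, embed, upsert, batch_size=100):
    """Embed and upsert rows batch by batch; return the number of vectors sent.

    embed:  callable taking a list of strings, returning one vector per string.
    upsert: callable taking a list of vector dicts.
    """
    total = 0
    for batch in batched(rows, batch_size):
        embeddings = embed([row["text"] for row in batch])
        vectors = build_vectors(batch, embeddings)
        upsert(vectors)
        total += len(vectors)
    return total


# Possible wiring to the real clients (not executed here; it needs API keys
# and network access). `client` and `youtube_df` come from the exercise.
#
# from pinecone import Pinecone
#
# pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key
# index = pc.Index("pinecone-datacamp")
#
# def embed(texts):
#     response = client.embeddings.create(
#         input=texts, model="text-embedding-3-small"
#     )
#     return [item.embedding for item in response.data]
#
# def upsert(vectors):
#     index.upsert(vectors=vectors, namespace="youtube_rag_dataset")
#
# upsert_transcripts(youtube_df.to_dict("records"), embed, upsert)
```

Sending rows in batches keeps each embedding request and each upsert call small, which tends to matter once the DataFrame holds more than a few hundred transcripts. Passing the two API calls in as functions is only a convenience for testing; calling the clients directly inside the loop would work equally well.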
