
Tokenizing text

You want to leverage a pre-trained model from Hugging Face and fine-tune it with data from your company's support team to classify interactions by churn risk. This will help the team prioritize what to address first, and how, making them more proactive.

Prepare the training and test data for fine-tuning by tokenizing the text.

The AutoTokenizer and AutoModelForSequenceClassification classes have been loaded for you.

This exercise is part of the course

Introduction to LLMs in Python

Exercise instructions

  • Load the pre-trained model and tokenizer in preparation for fine-tuning.
  • Tokenize both the train_data["interaction"] and test_data["interaction"], enabling padding and sequence truncation.

Hands-on interactive exercise

Complete the sample code to finish this exercise.

# Load the model and tokenizer
model = ____.____("distilbert-base-uncased")
tokenizer = ____.____("distilbert-base-uncased")

# Tokenize the data
tokenized_training_data = ____(train_data["interaction"], return_tensors="pt", ____, ____, max_length=20)

tokenized_test_data = ____(test_data["interaction"], return_tensors="pt", ____, ____, max_length=20)

print(tokenized_training_data)
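For reference, here is one way the completed scaffold could look. The small `train_data` and `test_data` dicts below are illustrative stand-ins for the exercise's preloaded datasets, which expose raw text under the `"interaction"` key:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative stand-ins for the exercise's preloaded datasets
train_data = {"interaction": ["My order never arrived.", "Great service, thanks!"]}
test_data = {"interaction": ["I want to cancel my subscription."]}

# Load the pre-trained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize with padding and truncation so every sequence fits within 20 tokens
tokenized_training_data = tokenizer(
    train_data["interaction"], return_tensors="pt",
    padding=True, truncation=True, max_length=20,
)
tokenized_test_data = tokenizer(
    test_data["interaction"], return_tensors="pt",
    padding=True, truncation=True, max_length=20,
)

print(tokenized_training_data)
```

Passing `padding=True` pads each batch to its longest sequence, while `truncation=True` with `max_length=20` cuts longer interactions down, giving `"pt"` (PyTorch) tensors of uniform shape that the model can consume directly.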