
Unreliable data source identification

Your team is developing a model to help generate accurate reports for the automotive safety industry. You have gathered preference data from three sources: the "GlobalDrive Safety Institute," the "AutoTech Safety Alliance," and "QuickScan Auto Review." Concerns have recently been raised about the integrity of this data, and you have been asked to check whether any of the sources is unreliable.

automotive_df is a combined DataFrame, loaded with the pre-imported pandas library, that contains the data from all three sources. Applied to each 'id' group, the pre-imported majority_vote function yields a dictionary-like object that maps each 'id' to its majority (chosen, rejected) pair.
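To make the majority vote concrete, here is a minimal sketch on a toy DataFrame. The exercise does not show how majority_vote is implemented, so the version below is a hypothetical stand-in. The column names 'chosen' and 'rejected', the source labels, and the toy rows are also assumptions made for illustration. A plain dict built by iterating over the groups stands in for the dictionary-like object.

```python
from collections import Counter

import pandas as pd


def majority_vote(group):
    """Hypothetical stand-in for the pre-imported helper (real implementation not shown)."""
    # Count each (chosen, rejected) pair in the group and return the most common one.
    votes = Counter(zip(group["chosen"], group["rejected"]))
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: three sources label the same two ids.
toy_df = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2],
        "source": ["source_a", "source_b", "source_c"] * 2,
        "chosen": ["x", "x", "y", "p", "p", "p"],
        "rejected": ["y", "y", "x", "q", "q", "q"],
    }
)

# Map each id to its majority (chosen, rejected) pair.
majority = {pair_id: majority_vote(group) for pair_id, group in toy_df.groupby("id")}
print(majority[1])  # ('x', 'y')
print(majority[2])  # ('p', 'q')
```

For id 1, two of the three sources chose "x" over "y", so that pair wins the vote even though source_c reversed it.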

This exercise is part of the course Reinforcement Learning from Human Feedback (RLHF).

Exercise instructions

  • Define the condition for counting one disagreement with the majority vote for a given data source.

Hands-on interactive exercise

Complete this sample code to finish the exercise.

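def detect_unreliable_source(merged_df):
    # Majority (chosen, rejected) pair for each 'id'
    df_majority = merged_df.groupby('id').apply(majority_vote)
    # One disagreement counter per data source
    disagreements = {source: 0 for source in merged_df['source'].unique()}
    for _, row in merged_df.iterrows():
        # Condition to find a disagreement with majority vote
        ____
    # The source with the most disagreements is flagged as unreliable
    unreliable_source = max(disagreements, key=disagreements.get)
    return unreliable_source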

disagreement = detect_unreliable_source(automotive_df)
print("Unreliable Source:", disagreement)
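One way the disagreement condition could be written is sketched below as a self-contained example: a row counts as one disagreement for its source when its (chosen, rejected) pair differs from the majority pair for the same 'id'. The column names 'chosen' and 'rejected', the majority_vote stand-in, the function names count_disagreements and detect_unreliable_source_sketch, and the toy rows are all assumptions for illustration and are not taken from the exercise environment.

```python
from collections import Counter

import pandas as pd


def majority_vote(group):
    """Hypothetical stand-in for the pre-imported helper."""
    return Counter(zip(group["chosen"], group["rejected"])).most_common(1)[0][0]


def count_disagreements(merged_df):
    """Count, per source, the rows whose pair differs from the majority pair."""
    majority = {pid: majority_vote(g) for pid, g in merged_df.groupby("id")}
    disagreements = {source: 0 for source in merged_df["source"].unique()}
    for _, row in merged_df.iterrows():
        # Assumed condition: this row's pair is not the majority pair for its id.
        if (row["chosen"], row["rejected"]) != majority[row["id"]]:
            disagreements[row["source"]] += 1
    return disagreements


def detect_unreliable_source_sketch(merged_df):
    """Return the source with the most disagreements."""
    counts = count_disagreements(merged_df)
    return max(counts, key=counts.get)


# Toy data, invented for illustration: source_c flips the pair on ids 1 and 3.
toy_df = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "source": ["source_a", "source_b", "source_c"] * 3,
        "chosen": ["x", "x", "y", "p", "p", "p", "m", "m", "n"],
        "rejected": ["y", "y", "x", "q", "q", "q", "n", "n", "m"],
    }
)

counts = count_disagreements(toy_df)
flagged = detect_unreliable_source_sketch(toy_df)
print(counts)   # {'source_a': 0, 'source_b': 0, 'source_c': 2}
print(flagged)  # source_c
```

Note that max returns only one source, so a tie in disagreement counts is broken by dictionary order. Inspecting the full counts is safer when the sources are close.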