Unreliable data source identification

Your team is developing a model that helps generate accurate reports for the automotive safety industry. You have gathered preference data from three sources: "GlobalDrive Safety Institute," "AutoTech Safety Alliance," and "QuickScan Auto Review." Recently, concerns have arisen about the integrity of the data, and you have been asked to check it for unreliable sources.

automotive_df is a combined DataFrame loaded using the pre-imported pandas library. It contains data from the three sources. The pre-imported majority_vote function creates a dictionary-like object with the majority (chosen, rejected) pair per 'id'.
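
As a rough illustration of what such a helper might look like, the sketch below implements a stand-in majority_vote with collections.Counter and applies it to a small made-up DataFrame. The 'chosen' and 'rejected' column names, the toy rows, and the implementation itself are assumptions; the actual pre-imported function is not shown in this exercise and may differ.

```python
from collections import Counter

import pandas as pd


def majority_vote(group):
    """Hypothetical stand-in: most common (chosen, rejected) pair in one 'id' group."""
    votes = Counter(zip(group["chosen"], group["rejected"]))
    # most_common(1) returns a one-element list of ((chosen, rejected), count)
    return votes.most_common(1)[0][0]


# Made-up preference rows: three sources label the same two items
toy_df = pd.DataFrame(
    {
        "id": ["q1", "q1", "q1", "q2", "q2", "q2"],
        "source": ["GlobalDrive", "AutoTech", "QuickScan"] * 2,
        "chosen": ["A", "A", "B", "C", "C", "C"],
        "rejected": ["B", "B", "A", "D", "D", "D"],
    }
)

# Build an id -> majority pair mapping, one entry per group
majority = {pair_id: majority_vote(group) for pair_id, group in toy_df.groupby("id")}
print(majority)
```

In this sketch a tie falls to the pair that appears first within the group, which is an arbitrary choice.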

This exercise is part of the course

Reinforcement Learning from Human Feedback (RLHF)

Instructions

  • Define the condition for counting one disagreement with the majority vote for a given data source.

Hands-on interactive exercise

Try this exercise by completing this sample code.

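def detect_unreliable_source(merged_df):
    # Majority (chosen, rejected) pair for each 'id'
    df_majority = merged_df.groupby('id').apply(majority_vote)
    # Start every source with zero disagreements
    disagreements = {source: 0 for source in merged_df['source'].unique()}
    for _, row in merged_df.iterrows():
        # Condition to find a disagreement with majority vote
        ____
    # The source with the most disagreements is flagged as unreliable
    unreliable_source = max(disagreements, key=disagreements.get)
    return unreliable_source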

disagreement = detect_unreliable_source(automotive_df)
print("Unreliable Source:", disagreement)
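
One possible way to fill in the blank is sketched below: a row counts as one disagreement when its (chosen, rejected) pair differs from the majority pair for the same 'id'. The majority_vote stand-in, the 'chosen' and 'rejected' column names, and the toy data are assumptions made for illustration, not the course's reference solution.

```python
from collections import Counter

import pandas as pd


def majority_vote(group):
    # Hypothetical stand-in for the pre-imported helper
    pairs = Counter(zip(group["chosen"], group["rejected"]))
    return pairs.most_common(1)[0][0]


def detect_unreliable_source(merged_df):
    # Majority (chosen, rejected) pair per 'id', as a Series indexed by id
    df_majority = merged_df.groupby("id")[["chosen", "rejected"]].apply(majority_vote)
    disagreements = {source: 0 for source in merged_df["source"].unique()}
    for _, row in merged_df.iterrows():
        # One disagreement: this row's pair differs from the majority pair for its id
        if (row["chosen"], row["rejected"]) != df_majority[row["id"]]:
            disagreements[row["source"]] += 1
    # The source with the most disagreements is flagged as unreliable
    return max(disagreements, key=disagreements.get)


# Made-up data: QuickScan flips the pair on two of three items
sources = ["GlobalDrive Safety Institute", "AutoTech Safety Alliance", "QuickScan Auto Review"]
toy_df = pd.DataFrame(
    {
        "id": ["q1"] * 3 + ["q2"] * 3 + ["q3"] * 3,
        "source": sources * 3,
        "chosen": ["A", "A", "B", "C", "C", "D", "E", "E", "E"],
        "rejected": ["B", "B", "A", "D", "D", "C", "F", "F", "F"],
    }
)

print("Unreliable Source:", detect_unreliable_source(toy_df))
```

Comparing the pair as a tuple means a row that swaps chosen and rejected is counted once, not twice.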