Unreliable data source identification

Your team is developing a model to help generate accurate reports in the automotive safety industry. You have gathered preference data from three sources: the "GlobalDrive Safety Institute," the "AutoTech Safety Alliance," and "QuickScan Auto Review." Concerns have recently arisen about the integrity of the data, and you have been asked to check it for unreliable sources.

automotive_df is a combined DataFrame loaded using the pre-imported pandas library. It contains data from the three sources. The pre-imported majority_vote function creates a dictionary-like object with the majority (chosen, rejected) pair per 'id'.
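The helper's implementation is not shown in this exercise, so the following is only a minimal sketch of what a majority vote over (chosen, rejected) pairs might look like. The `majority_vote` body, the `toy_df` data, and the `source_a`/`source_b`/`source_c` labels are all assumptions made up for illustration; only the column names 'id', 'source', 'chosen', and 'rejected' come from the exercise.

```python
from collections import Counter

import pandas as pd


def majority_vote(group):
    """Hypothetical stand-in: most common (chosen, rejected) pair in one 'id' group."""
    pairs = Counter(zip(group["chosen"], group["rejected"]))
    return pairs.most_common(1)[0][0]


# Toy data (made up): three sources label the same two items.
toy_df = pd.DataFrame({
    "id": ["q1", "q1", "q1", "q2", "q2", "q2"],
    "source": ["source_a", "source_b", "source_c"] * 2,
    "chosen": ["x", "x", "y", "p", "p", "p"],
    "rejected": ["y", "y", "x", "q", "q", "q"],
})

# One majority pair per id, collected in a plain dict keyed by id.
majority = {id_: majority_vote(group) for id_, group in toy_df.groupby("id")}
print(majority)
```

For "q1", two of the three toy sources prefer "x" over "y", so that pair is the majority. A plain dict is used here instead of `groupby(...).apply(...)` only to make the per-id lookup explicit.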

This exercise is part of the course

Reinforcement Learning from Human Feedback (RLHF)

Exercise instructions

  • Define the condition for counting one disagreement with the majority vote for a given data source.

Hands-on interactive exercise

Try this exercise by completing the sample code.

def detect_unreliable_source(df):
    df_majority = df.groupby('id').apply(majority_vote)
    disagreements = {source: 0 for source in df['source'].unique()}
    for _, row in df.iterrows():
        # Condition to find a disagreement with majority vote
        ____
    unreliable_source = max(disagreements, key=disagreements.get)
    return unreliable_source

disagreement = detect_unreliable_source(automotive_df)
print("Unreliable Source:", disagreement)
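One possible way to fill in the blank is to compare each row's own (chosen, rejected) pair with the majority pair for that row's 'id' and count a disagreement when they differ. The sketch below is self-contained and rests on assumptions: `majority_vote` is a hypothetical stand-in for the pre-imported helper, and `toy_df` with its `source_*` labels is made-up data, not the exercise's `automotive_df`.

```python
from collections import Counter

import pandas as pd


def majority_vote(group):
    # Hypothetical stand-in for the pre-imported helper.
    return Counter(zip(group["chosen"], group["rejected"])).most_common(1)[0][0]


def detect_unreliable_source(df):
    # Series of majority (chosen, rejected) tuples, indexed by id.
    df_majority = df.groupby("id")[["chosen", "rejected"]].apply(majority_vote)
    disagreements = {source: 0 for source in df["source"].unique()}
    for _, row in df.iterrows():
        # A row disagrees when its pair differs from the majority pair for its id.
        if (row["chosen"], row["rejected"]) != df_majority.loc[row["id"]]:
            disagreements[row["source"]] += 1
    # The source with the most disagreements is flagged as unreliable.
    return max(disagreements, key=disagreements.get)


# Toy data (made up): source_c flips the preference on q1 and q3.
toy_df = pd.DataFrame({
    "id": ["q1"] * 3 + ["q2"] * 3 + ["q3"] * 3,
    "source": ["source_a", "source_b", "source_c"] * 3,
    "chosen": ["x", "x", "y", "p", "p", "p", "m", "m", "n"],
    "rejected": ["y", "y", "x", "q", "q", "q", "n", "n", "m"],
})

print("Unreliable Source:", detect_unreliable_source(toy_df))
```

Selecting the 'chosen' and 'rejected' columns before `apply` keeps the grouping column out of the helper's input, which sidesteps version-dependent behavior in pandas. Note that `max` returns only one source, so ties between equally unreliable sources are not reported.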