Unreliable data source identification
Your team is developing a model to help generate accurate reports in the automotive safety industry. You have gathered preference data from three sources: "GlobalDrive Safety Institute", "AutoTech Safety Alliance", and "QuickScan Auto Review". Recently, concerns have arisen about the integrity of the data, and you have been asked to check it for unreliable sources.
automotive_df is a combined DataFrame loaded using the pre-imported pandas library. It contains data from the three sources. The pre-imported majority_vote function creates a dictionary-like object with the majority (chosen, rejected) pair per 'id'.
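The exercise does not show how majority_vote is implemented. As a rough illustration only, a helper with the described behavior could look like the sketch below. The name majority_vote_sketch, the 'chosen' and 'rejected' column names, and the toy rows are all assumptions made for this example, not the course's actual code or data.

```python
from collections import Counter

import pandas as pd


def majority_vote_sketch(group):
    """Return the most frequent (chosen, rejected) pair in one 'id' group.

    Hypothetical stand-in for the pre-imported majority_vote; assumes the
    columns are named 'chosen' and 'rejected'.
    """
    pairs = Counter(zip(group['chosen'], group['rejected']))
    # most_common(1) returns [(pair, count)]; ties go to the pair seen first
    return pairs.most_common(1)[0][0]


# Invented toy data: three sources label the same item
toy = pd.DataFrame({
    'id': [1, 1, 1],
    'source': ['source_a', 'source_b', 'source_c'],
    'chosen': ['x', 'x', 'y'],
    'rejected': ['y', 'y', 'x'],
})

# One majority pair per id; the resulting Series can be indexed like a dict
majority = toy.groupby('id')[['chosen', 'rejected']].apply(majority_vote_sketch)
print(majority.loc[1])
```

Selecting the two label columns before apply keeps the grouping column out of the helper, which avoids version-dependent behavior in groupby().apply().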
This exercise is part of the course
Reinforcement Learning from Human Feedback (RLHF)
Exercise instructions
- Define the condition for counting one disagreement with the majority vote for a given data source.
Hands-on interactive exercise
Try this exercise by completing this sample code.
def detect_unreliable_source(merged_df):
    # Majority (chosen, rejected) pair for each 'id'
    df_majority = merged_df.groupby('id').apply(majority_vote)
    # Start every source at zero disagreements
    disagreements = {source: 0 for source in merged_df['source'].unique()}
    for _, row in merged_df.iterrows():
        # Condition to find a disagreement with majority vote
        ____
    # The source with the most disagreements is flagged as unreliable
    unreliable_source = max(disagreements, key=disagreements.get)
    return unreliable_source

disagreement = detect_unreliable_source(automotive_df)
print("Unreliable Source:", disagreement)
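For readers who want to see the whole idea run end to end, here is a self-contained sketch on invented toy data. It is one possible way to express the disagreement condition, not necessarily the course's expected answer. The helper majority_pair, the function detect_unreliable_source_sketch, the 'chosen' and 'rejected' column names, and the source labels are all assumptions introduced for illustration.

```python
from collections import Counter

import pandas as pd


def majority_pair(group):
    # Hypothetical stand-in for majority_vote: most frequent (chosen, rejected)
    pairs = Counter(zip(group['chosen'], group['rejected']))
    return pairs.most_common(1)[0][0]


def detect_unreliable_source_sketch(df):
    # Majority pair per id, indexed by id
    majority = df.groupby('id')[['chosen', 'rejected']].apply(majority_pair)
    disagreements = {source: 0 for source in df['source'].unique()}
    for _, row in df.iterrows():
        # A disagreement: this row's pair differs from the majority for its id
        if (row['chosen'], row['rejected']) != majority.loc[row['id']]:
            disagreements[row['source']] += 1
    # Return the counts as well so the result can be inspected
    return max(disagreements, key=disagreements.get), disagreements


# Invented toy data: source_c contradicts the other two on ids 1 and 2
rows = [
    (1, 'source_a', 'x', 'y'), (1, 'source_b', 'x', 'y'), (1, 'source_c', 'y', 'x'),
    (2, 'source_a', 'p', 'q'), (2, 'source_b', 'p', 'q'), (2, 'source_c', 'q', 'p'),
    (3, 'source_a', 'm', 'n'), (3, 'source_b', 'm', 'n'), (3, 'source_c', 'm', 'n'),
]
toy_df = pd.DataFrame(rows, columns=['id', 'source', 'chosen', 'rejected'])

unreliable, counts = detect_unreliable_source_sketch(toy_df)
print("Unreliable Source:", unreliable)
print("Disagreement counts:", counts)
```

Note that max() always returns some source, even when every count is zero, so in practice it may be worth checking the counts before calling a source unreliable.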