Name: CIS Seminar: “Rater Equivalence: An Interpretable Measure of Classifier Accuracy Against Human Labels”
Start: 2022-10-20T15:30:00-04:00
End: 2022-10-20T16:30:00-04:00
Location: Wu and Chen Auditorium (Room 101), Levine Hall

CIS Seminar: “Rater Equivalence: An Interpretable Measure of Classifier Accuracy Against Human Labels”

October 20, 2022 at 3:30 PM - 4:30 PM

Details

Date: October 20, 2022

Time: 3:30 PM - 4:30 PM

Event Tags: CIS

Organizer

Computer and Information Science

Phone: 215-898-8560

Email: cherylh@cis.upenn.edu

Venue

Wu and Chen Auditorium (Room 101), Levine Hall, 3330 Walnut Street, Philadelphia, PA 19104

In many classification tasks, the ground truth is either noisy or subjective. Examples of noisy ground truth include: Does this radiology image show a cancerous growth? Does this radar data portend an imminent tornado? Examples of subjective ground truth include: Which of two alternative paper titles is better? Is this comment toxic? What is the political leaning of this news article? We refer to tasks in which human labels are the only indication of ground truth available at the time decisions must be made as survey settings. In these settings, measures of classifier accuracy against human labels, such as precision, recall, and cross-entropy, confound the quality of the classifier with the level of agreement among human raters. Thus, they have no meaningful interpretation on their own.

We describe a procedure that, given a dataset with predictions from a classifier and K labels per item, rescales any underlying accuracy measure into one that has an intuitive interpretation. The K raters are divided into a source panel and a target panel. For each item, the source panel's labels are combined to produce a predicted label for another rater. Both the source panel's predictions and the classifier's predictions are scored against the same target panel's labels. The rater equivalence of a classifier is the minimum number of source raters needed to produce the same expected score as the classifier achieves. We explore the stability of the rater equivalence measure as the target panel size varies and find one underlying measure, determinant mutual information, for which it is invariant.
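The procedure described in the abstract can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not the speakers' implementation. It assumes binary labels, majority vote (ties broken toward 1) as the way source-panel labels are combined, a target panel of a single rater, and plain agreement (accuracy) as the underlying score. The function names `majority_vote`, `panel_score`, `classifier_score`, and `rater_equivalence` are illustrative only. Other combiners and scoring functions, such as cross-entropy or determinant mutual information, would slot in where agreement is computed.

```python
from fractions import Fraction
from itertools import combinations
from typing import Optional, Sequence

# Hypothetical sketch of rater equivalence under simplifying assumptions:
# binary labels, a majority-vote combiner, a single-rater target panel,
# agreement (accuracy) as the underlying score, and the same K raters per item.


def majority_vote(labels: Sequence[int]) -> int:
    """Combine source-panel labels; ties go to 1 (an arbitrary assumption)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def panel_score(ratings: Sequence[Sequence[int]], k: int) -> Fraction:
    """Expected agreement between a k-rater majority vote and one held-out rater.

    Averages over every item, every choice of target rater, and every
    k-subset of the remaining raters used as the source panel.
    """
    agree, count = 0, 0
    for row in ratings:
        for t in range(len(row)):
            others = [row[j] for j in range(len(row)) if j != t]
            for source in combinations(others, k):
                agree += int(majority_vote(source) == row[t])
                count += 1
    return Fraction(agree, count)


def classifier_score(ratings: Sequence[Sequence[int]],
                     predictions: Sequence[int]) -> Fraction:
    """Expected agreement between the classifier's label and one target rater."""
    agree, count = 0, 0
    for row, pred in zip(ratings, predictions):
        for label in row:
            agree += int(pred == label)
            count += 1
    return Fraction(agree, count)


def rater_equivalence(ratings: Sequence[Sequence[int]],
                      predictions: Sequence[int]) -> Optional[int]:
    """Smallest source-panel size whose score reaches the classifier's score.

    Returns None if even K - 1 source raters score below the classifier.
    """
    target = classifier_score(ratings, predictions)
    num_raters = len(ratings[0])
    for k in range(1, num_raters):
        if panel_score(ratings, k) >= target:
            return k
    return None


if __name__ == "__main__":
    # Toy data: 4 items, each labeled by K = 3 raters.
    ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
    print(rater_equivalence(ratings, [1, 0, 0, 1]))  # 1
    print(rater_equivalence(ratings, [1, 0, 1, 0]))  # None (beats every panel)
```

Enumerating every target rater and every source subset keeps the estimate exact for small K; with many raters per item, randomly sampling panels would be the practical alternative. This sketch returns an integer panel size and does not interpolate between sizes.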
