BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Penn Engineering Events - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Penn Engineering Events
X-ORIGINAL-URL:https://seasevents.nmsdev7.com
X-WR-CALDESC:Events for Penn Engineering Events
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20210314T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20211107T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20220313T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20221106T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20221020T153000
DTEND;TZID=America/New_York:20221020T163000
DTSTAMP:20260405T142857Z
CREATED:20221005T153839Z
LAST-MODIFIED:20221005T153839Z
UID:7612-1666279800-1666283400@seasevents.nmsdev7.com
SUMMARY:CIS Seminar: "Rater Equivalence: An Interpretable Measure of Classifier Accuracy Against Human Labels"
DESCRIPTION:In many classification tasks\, the ground truth is either noisy or subjective. Examples of noisy ground truth include: Does this radiology image show a cancerous growth? Does this radar data portend an imminent tornado? Examples of subjective ground truth include: Which of two alternative paper titles is better? Is this comment toxic? What is the political leaning of this news article? We refer to tasks where human labels are the only indication of ground truth available at the time that decisions must be made as survey settings. In these settings\, measures of classifier accuracy against human labels\, such as precision\, recall\, and cross-entropy\, confound the quality of the classifier with the level of agreement among human raters. Thus\, they have no meaningful interpretation on their own. We describe a procedure that\, given a dataset with predictions from a classifier and K labels per item\, rescales any underlying accuracy measure into one that has an intuitive interpretation. The K raters are divided into a source panel and a target panel. The source panel’s labels for an item are combined to produce a predicted label for another rater. Both the source panel predictions and classifier predictions are scored against the same target panel’s labels. The rater equivalence of any classifier is the minimum number of source raters needed to produce the same expected score as that found for the classifier. We explore the stability of the rater equivalence measure as the target panel size varies and find one underlying measure\, determinant mutual information\, for which it is invariant.
URL:https://seasevents.nmsdev7.com/event/cis-seminar-rater-equivalence-an-interpretable-measure-of-classifier-accuracy-against-human-labels/
LOCATION:Wu and Chen Auditorium (Room 101)\, Levine Hall\, 3330 Walnut Street\, Philadelphia\, PA\, 19104\, United States
ORGANIZER;CN="Computer and Information Science":MAILTO:cherylh@cis.upenn.edu
END:VEVENT
END:VCALENDAR