Related work
Aggregating token log-probabilities is a well-known idea. It appears as a
subroutine in papers ranging from GPT-2[1] to Self-Consistency[2], InPars[3],
hallucination detection[4], and SimPO[5]. The cappr implementation adds a few
computational and statistical optimizations while keeping a simple interface.
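As a rough illustration of the basic idea (not cappr's actual implementation or interface), the sketch below scores each candidate completion by aggregating its token log-probabilities, then normalizes the scores across completions. The function names `aggregate` and `predict_proba` and the log-probability values are made up for this example; in practice the values would come from a language model.

```python
import math


def aggregate(token_log_probs, average=True):
    """Collapse one completion's token log-probabilities into a single score."""
    total = sum(token_log_probs)
    return total / len(token_log_probs) if average else total


def predict_proba(log_probs_per_completion, average=True):
    """Softmax the aggregated scores so they sum to 1 across completions."""
    scores = [aggregate(lp, average) for lp in log_probs_per_completion]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    normalizer = sum(exps)
    return [e / normalizer for e in exps]


# Hypothetical values of log Pr(token | prompt, previous tokens)
# for two candidate completions of the same prompt.
log_probs = [
    [-0.2, -0.5],        # completion 0: two tokens
    [-1.5, -2.0, -0.7],  # completion 1: three tokens
]
probs = predict_proba(log_probs)
prediction = max(range(len(probs)), key=probs.__getitem__)
```

Averaging instead of summing is one possible design choice: a plain sum tends to favor completions with fewer tokens, since every extra token adds another negative term.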
The following papers focus specifically on aggregating token log-probabilities.
This paper[6] presents a transposed version of CAPPr. Its method was used in CAPPr’s demo for the Winograd Schema Challenge.
PET with multiple masks[7] also aggregates token log-probabilities to do prompt-completion classification. But these log-probabilities are assumed to come from masked language models like BERT.