cappr.huggingface.classify

Perform prompt-completion classification using a model which can be loaded via transformers.AutoModelForCausalLM.from_pretrained or auto_gptq.AutoGPTQForCausalLM.from_quantized.

You probably just want the predict() or predict_examples() functions :-)

In the implementation, attention block keys and values for prompts are automatically cached and shared across completions.
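
To make the caching idea concrete, here is a minimal NumPy sketch of a single causal attention layer that computes the prompt's keys and values once and reuses them for every completion. It is a toy illustration under stated assumptions, not CAPPr's implementation: the weights `W_q`, `W_k`, `W_v`, the helper `attend`, and the random "token embeddings" are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding size (hypothetical)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))


def attend(x_new, K_past, V_past):
    """Causal attention for new tokens, reusing cached keys and values."""
    K = np.vstack([K_past, x_new @ W_k])
    V = np.vstack([V_past, x_new @ W_v])
    Q = x_new @ W_q
    n_past, n_new = K_past.shape[0], x_new.shape[0]
    scores = Q @ K.T / np.sqrt(d)
    # New token i sits at absolute position n_past + i and must not
    # attend to any later position.
    future = np.triu(
        np.ones((n_new, n_past + n_new), dtype=bool), k=n_past + 1
    )
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, K, V


empty = np.empty((0, d))
prompt = rng.normal(size=(5, d))
completions = [rng.normal(size=(2, d)), rng.normal(size=(3, d))]

# Process the prompt once and keep its keys and values.
_, K_prompt, V_prompt = attend(prompt, empty, empty)

matches = []
for completion in completions:
    # Cached path: only the completion tokens are processed.
    out_cached, _, _ = attend(completion, K_prompt, V_prompt)
    # Uncached path: the prompt is re-processed for every completion.
    out_full, _, _ = attend(np.vstack([prompt, completion]), empty, empty)
    matches.append(np.allclose(out_cached, out_full[len(prompt):]))
```

Both paths produce the same outputs for the completion tokens, but the cached path skips the repeated prompt computation.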

cappr.huggingface.classify.cache(model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], prefixes: str | Sequence[str], clear_cache_on_exit: bool = True, logits_all: bool = True)

In this context, every prompt processed by model_and_tokenizer starts with a fixed prefix. As a result, computations in this context are faster.

Parameters:

model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
prefixes (str | Sequence[str]) – prefix(es) for all strings that will be processed in this context, e.g., a string containing shared prompt instructions, or a string containing instructions and exemplars for few-shot prompting. prefixes and future strings are assumed to be separated by a whitespace.
clear_cache_on_exit (bool, optional) – whether or not to clear the cache and render the returned model and tokenizer unusable when we exit the context. This is important because it saves memory, and makes code more explicit about the model’s state. By default, True
logits_all (bool, optional) – whether or not to have the cached model include logits for all tokens (including the past). By default, past token logits are included

Example

Usage with predict_proba():

import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache, predict_proba

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_and_tokenizer = (model, tokenizer)

# Create data
prompt_prefix = '''Instructions: complete the sequence.
Here are examples:
A, B, C => D
1, 2, 3 => 4

Complete this sequence:'''

prompts = ["a, b, c =>", "X, Y =>"]
completions = ["d", "Z", "Hi"]

# Compute
with cache(
    model_and_tokenizer, prompt_prefix
) as cached_model_and_tokenizer:
    # prompt_prefix and each prompt are separated by a whitespace
    pred_probs = predict_proba(
        prompts, completions, cached_model_and_tokenizer
    )

# The above computation is equivalent to this one:
prompts_full = [prompt_prefix + " " + prompt for prompt in prompts]
pred_probs_wo_cache = predict_proba(
    prompts_full, completions, model_and_tokenizer
)
assert np.allclose(pred_probs, pred_probs_wo_cache, atol=1e-5)

print(pred_probs.round(1))
# [[1. 0. 0.]
#  [0. 1. 0.]]

Here’s a more complicated example, which might help in explaining usage:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache
from cappr.huggingface._utils import (
    does_tokenizer_need_prepended_space,
    logits_texts,
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_and_tokenizer = (model, tokenizer)

# Assume that all strings will be separated by a whitespace
delim = " "
if not does_tokenizer_need_prepended_space(tokenizer):
    # for SentencePiece tokenizers like Llama's
    delim = ""

def logits(*args, **kwargs):
    """Returns next-token logits for each token in an inputted text."""
    return logits_texts(*args, **kwargs)[0]

with cache(model_and_tokenizer, "a") as cached_a:
    with cache(cached_a, delim + "b c") as cached_a_b_c:
        with cache(cached_a_b_c, delim + "d") as cached_a_b_c_d:
            logits1 = logits([delim + "e f"], cached_a_b_c_d)
            logits2 = logits([delim + "x"], cached_a_b_c_d)
        logits3 = logits([delim + "1 2 3"], cached_a_b_c)
    logits4 = logits([delim + "b c d"], cached_a)

logits_correct = lambda texts, **kwargs: logits(
    texts, model_and_tokenizer, drop_bos_token=False
)

atol = 1e-4
assert torch.allclose(logits1, logits_correct(["a b c d e f"]), atol=atol)
assert torch.allclose(logits2, logits_correct(["a b c d x"]), atol=atol)
assert torch.allclose(logits3, logits_correct(["a b c 1 2 3"]), atol=atol)
assert torch.allclose(logits4, logits_correct(["a b c d"]), atol=atol)

cappr.huggingface.classify.cache_model(model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], prefixes: str | Sequence[str], logits_all: bool = True) → tuple[ModelForCausalLM, PreTrainedTokenizerBase]

Caches the model so that every future computation with it starts with prefixes. As a result, computations with this model are faster.

Use this function instead of the context manager cache() to keep the cache for future computations, including those outside of a context.

Parameters:

model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
prefixes (str | Sequence[str]) – prefix(es) for all future strings that will be processed, e.g., a string containing shared prompt instructions, or a string containing instructions and exemplars for few-shot prompting. prefixes and future strings are assumed to be separated by a whitespace.
logits_all (bool, optional) – whether or not to have the cached model include logits for all tokens (including the past). By default, past token logits are included

Returns:

cached model and the (unmodified) tokenizer

Return type:

tuple[ModelForCausalLM, PreTrainedTokenizerBase]

Example

Usage with predict_proba():

import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache_model, predict_proba

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_and_tokenizer = (model, tokenizer)

# Create data
prompt_prefix = '''Instructions: complete the sequence.
Here are examples:
A, B, C => D
1, 2, 3 => 4

Complete this sequence:'''

prompts = ["a, b, c =>", "X, Y =>"]
completions = ["d", "Z", "Hi"]

# Cache
cached_model_and_tokenizer = cache_model(
    model_and_tokenizer, prompt_prefix
)

# Compute
pred_probs = predict_proba(
    prompts, completions, cached_model_and_tokenizer
)

# The above computation is equivalent to this one:
prompts_full = [prompt_prefix + " " + prompt for prompt in prompts]
pred_probs_wo_cache = predict_proba(
    prompts_full, completions, model_and_tokenizer
)
assert np.allclose(pred_probs, pred_probs_wo_cache, atol=1e-5)

print(pred_probs.round(1))
# [[1. 0. 0.]
#  [0. 1. 0.]]

cappr.huggingface.classify.log_probs_conditional(prompts: str | Sequence[str], completions: Sequence[str], model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], end_of_prompt: Literal[' ', ''] = ' ', show_progress_bar: bool | None = None, batch_size: int = 2, batch_size_completions: int | None = None, **kwargs) → list[list[float]] | list[list[list[float]]]

Log-probabilities of each completion token conditional on each prompt and previous completion tokens.

Parameters:

prompts (str | Sequence[str]) – string(s), where, e.g., each contains the text you want to classify
completions (Sequence[str]) – strings, where, e.g., each one is the name of a class which could come after a prompt
model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
end_of_prompt (Literal[' ', ''], optional) – whitespace or empty string to join prompt and completion, by default whitespace
show_progress_bar (bool | None, optional) – whether or not to show a progress bar. By default, it will be shown only if there are at least 5 prompts
batch_size (int, optional) – the maximum number of prompts that the model will process in parallel, by default 2
batch_size_completions (int, optional) – the maximum number of completions that the model will process in parallel. By default, all completions are processed in parallel

Returns:

log_probs_completions – If prompts is a string, then a 2-D list is returned: log_probs_completions[completion_idx][completion_token_idx] is the log-probability of the completion token in completions[completion_idx], conditional on prompt + end_of_prompt and previous completion tokens.

If prompts is a sequence of strings, then a 3-D list is returned: log_probs_completions[prompt_idx][completion_idx][completion_token_idx] is the log-probability of the completion token in completions[completion_idx], conditional on prompts[prompt_idx] + end_of_prompt and previous completion tokens.

Return type:

list[list[float]] | list[list[list[float]]]

Note

To efficiently aggregate log_probs_completions, use cappr.utils.classify.agg_log_probs().

Example

Here we’ll use single characters (which are single tokens) to more clearly demonstrate what this function does:

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import log_probs_conditional

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Create data
prompts = ["x y", "a b c"]
completions = ["z", "d e"]

# Compute
log_probs_completions = log_probs_conditional(
    prompts, completions, model_and_tokenizer=(model, tokenizer)
)

# Outputs (rounded) next to their symbolic representation

print(log_probs_completions[0])
# [[-4.5],        [[log Pr(z | x, y)],
#  [-5.6, -3.2]]   [log Pr(d | x, y),    log Pr(e | x, y, d)]]

print(log_probs_completions[1])
# [[-9.7],        [[log Pr(z | a, b, c)],
#  [-0.2, -0.03]]  [log Pr(d | a, b, c), log Pr(e | a, b, c, d)]]
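
As a rough illustration of what aggregating these token-level values can look like, the sketch below turns each completion's token log-probabilities into a single score in two ways. Summing follows from the chain rule, e.g., log Pr(d, e | x, y) = log Pr(d | x, y) + log Pr(e | x, y, d). Averaging is a common length-normalized alternative. The helper names are hypothetical, and this is not a statement of what cappr.utils.classify.agg_log_probs() does internally.

```python
import math

# Rounded values copied from the printed output for prompts[0] = "x y"
log_probs_completions = [
    [-4.5],        # completion "z"
    [-5.6, -3.2],  # completion "d e"
]


def sequence_log_prob(token_log_probs):
    """Sum of token log-probs: the log-probability of the whole completion."""
    return sum(token_log_probs)


def mean_token_log_prob(token_log_probs):
    """Length-normalized score, so longer completions aren't penalized."""
    return sum(token_log_probs) / len(token_log_probs)


joint = [math.exp(sequence_log_prob(lp)) for lp in log_probs_completions]
averaged = [math.exp(mean_token_log_prob(lp)) for lp in log_probs_completions]
```

With these numbers, "z" scores higher under the summed score while "d e" scores higher under the averaged one, so the choice of aggregation can change which completion looks most likely.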

cappr.huggingface.classify.log_probs_conditional_examples(examples: Example | Sequence[Example], model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], show_progress_bar: bool | None = None, batch_size: int = 2, batch_size_completions: int | None = None) → list[list[float]] | list[list[list[float]]]

Log-probabilities of each completion token conditional on each prompt and previous completion tokens.

Parameters:

examples (Example | Sequence[Example]) – Example object(s), where each contains a prompt and its set of possible completions
model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
show_progress_bar (bool | None, optional) – whether or not to show a progress bar. By default, it will be shown only if there are at least 5 examples
batch_size (int, optional) – the maximum number of examples that the model will process in parallel, by default 2
batch_size_completions (int, optional) – the maximum number of completions that the model will process in parallel. By default, all completions are processed in parallel

Returns:

log_probs_completions – If examples is a cappr.Example, then a 2-D list is returned: log_probs_completions[completion_idx][completion_token_idx] is the log-probability of the completion token in example.completions[completion_idx], conditional on example.prompt + example.end_of_prompt and previous completion tokens.

If examples is a sequence of cappr.Example objects, then a 3-D list is returned: log_probs_completions[example_idx][completion_idx][completion_token_idx] is the log-probability of the completion token in examples[example_idx].completions[completion_idx], conditional on examples[example_idx].prompt + examples[example_idx].end_of_prompt and previous completion tokens.

Return type:

list[list[float]] | list[list[list[float]]]

Note

To aggregate log_probs_completions, use cappr.utils.classify.agg_log_probs().

Note

The attribute cappr.Example.prior is unused.

Example

Here we’ll use single characters (which are single tokens) to more clearly demonstrate what this function does:

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr import Example
from cappr.huggingface.classify import log_probs_conditional_examples

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Create examples
examples = [
    Example(prompt="x y", completions=("z", "d e")),
    Example(prompt="a b c", completions=("1 2",), normalize=False),
]

# Compute
log_probs_completions = log_probs_conditional_examples(
    examples, model_and_tokenizer=(model, tokenizer)
)

# Outputs (rounded) next to their symbolic representation

print(log_probs_completions[0])  # corresponds to examples[0]
# [[-4.5],        [[log Pr(z | x, y)],
#  [-5.6, -3.2]]   [log Pr(d | x, y),    log Pr(e | x, y, d)]]

print(log_probs_completions[1])  # corresponds to examples[1]
# [[-5.0, -1.7]]  [[log Pr(1 | a, b, c), log Pr(2 | a, b, c, 1)]]

cappr.huggingface.classify.predict(prompts: str | Sequence[str], completions: Sequence[str], model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], prior: Sequence[float] | None = None, end_of_prompt: Literal[' ', ''] = ' ', discount_completions: float = 0.0, log_marg_probs_completions: Sequence[Sequence[float]] | None = None, show_progress_bar: bool | None = None, batch_size: int = 2, batch_size_completions: int | None = None) → str | list[str]

Predict which completion is most likely to follow each prompt.

Parameters:

prompts (str | Sequence[str]) – string(s), where, e.g., each contains the text you want to classify
completions (Sequence[str]) – strings, where, e.g., each one is the name of a class which could come after a prompt
model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
prior (Sequence[float] | None, optional) – a probability distribution over completions, representing a belief about their likelihoods regardless of the prompt. By default, each completion in completions is assumed to be equally likely
end_of_prompt (Literal[' ', ''], optional) – whitespace or empty string to join prompt and completion, by default whitespace
discount_completions (float, optional) – experimental feature: set it to >0.0 (e.g., 1.0 may work well) if a completion is consistently getting over-predicted. You could instead fudge the prior, but this hyperparameter may be easier to tune than the prior. By default 0.0
log_marg_probs_completions (Sequence[Sequence[float]] | None, optional) – experimental feature: pre-computed log probabilities of completion tokens conditional on previous completion tokens (not prompt tokens). Only used if not discount_completions. Pre-compute them by passing completions, model, and end_of_prompt to token_logprobs(). By default, if not discount_completions, they are (re-)computed
show_progress_bar (bool | None, optional) – whether or not to show a progress bar. By default, it will be shown only if there are at least 5 prompts
batch_size (int, optional) – the maximum number of prompts that the model will process in parallel, by default 2
batch_size_completions (int, optional) – the maximum number of completions that the model will process in parallel. By default, all completions are processed in parallel

Returns:

preds – If prompts is a string, then the completion from completions which is predicted to most likely follow prompt + end_of_prompt is returned.

If prompts is a sequence of strings, then a list with length len(prompts) is returned. preds[prompt_idx] is the completion in completions which is predicted to follow prompts[prompt_idx] + end_of_prompt.

Return type:

str | list[str]

Note

In this function, the set of possible completions which could follow each prompt is the same for every prompt. If instead, each prompt could be followed by a different set of completions, then construct a sequence of cappr.Example objects and pass them to predict_examples().

Example

Let’s have GPT-2 (small) predict where stuff is in the kitchen:

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Define a classification task
prompts = ["The tacos are cooking", "Ice cream is"]
class_names = ("on the stove", "in the freezer", "in the fridge")
prior = (1 / 5, 2 / 5, 2 / 5)

preds = predict(
    prompts,
    completions=class_names,
    model_and_tokenizer=(model, tokenizer),
    prior=prior,
)
print(preds)
# ['on the stove',
#  'in the freezer']
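
To see how a prior can change a prediction, here is a small sketch that assumes the prior is combined with the model's completion-after-prompt likelihoods via Bayes' rule. The parameter description suggests this but does not spell it out, so treat it as an assumption. The likelihood values and the `posterior` helper are made up for illustration.

```python
# Hypothetical likelihoods Pr(completion | prompt); made up for illustration
class_names = ("on the stove", "in the freezer", "in the fridge")
likelihoods = [0.30, 0.25, 0.20]
prior = (1 / 5, 2 / 5, 2 / 5)


def posterior(likelihoods, prior):
    """Bayes' rule: multiply by the prior, then renormalize to sum to 1."""
    unnormalized = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


post = posterior(likelihoods, prior)
# Predict the completion with the highest posterior probability
pred = class_names[max(range(len(post)), key=post.__getitem__)]
```

Here the raw likelihoods favor "on the stove", but the prior shifts the prediction to "in the freezer".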

cappr.huggingface.classify.predict_examples(examples: Example | Sequence[Example], model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], show_progress_bar: bool | None = None, batch_size: int = 2, batch_size_completions: int | None = None) → str | list[str]

Predict which completion is most likely to follow each prompt.

Parameters:

examples (Example | Sequence[Example]) – Example object(s), where each contains a prompt and its set of possible completions
model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
show_progress_bar (bool | None, optional) – whether or not to show a progress bar. By default, it will be shown only if there are at least 5 examples
batch_size (int, optional) – the maximum number of examples that the model will process in parallel, by default 2
batch_size_completions (int, optional) – the maximum number of completions that the model will process in parallel. By default, all completions are processed in parallel

Returns:

preds – If examples is a cappr.Example, then the completion from example.completions which is predicted to most likely follow example.prompt + example.end_of_prompt is returned.

If examples is a sequence of cappr.Example objects, then a list with length len(examples) is returned: preds[example_idx] is the completion in examples[example_idx].completions which is predicted to most likely follow examples[example_idx].prompt + examples[example_idx].end_of_prompt.

Return type:

str | list[str]

Example

GPT-2 (small) doing media trivia:

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr import Example
from cappr.huggingface.classify import predict_examples

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Create examples
examples = [
    Example(
        prompt="Jodie Foster played",
        completions=("Clarice Starling", "Trinity in The Matrix"),
    ),
    Example(
        prompt="Batman, from Batman: The Animated Series, was played by",
        completions=("Pete Holmes", "Kevin Conroy", "Spongebob!"),
        prior=(1 / 3, 2 / 3, 0),
    ),
]

preds = predict_examples(
    examples, model_and_tokenizer=(model, tokenizer)
)
print(preds)
# ['Clarice Starling',
#  'Kevin Conroy']

cappr.huggingface.classify.predict_proba(prompts: str | Sequence[str], completions: Sequence[str], model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], prior: Sequence[float] | None = None, end_of_prompt: Literal[' ', ''] = ' ', normalize: bool = True, discount_completions: float = 0.0, log_marg_probs_completions: Sequence[Sequence[float]] | None = None, show_progress_bar: bool | None = None, batch_size: int = 2, batch_size_completions: int | None = None) → npt.NDArray[np.floating]

Predict probabilities of each completion coming after each prompt.

Parameters:

prompts (str | Sequence[str]) – string(s), where, e.g., each contains the text you want to classify
completions (Sequence[str]) – strings, where, e.g., each one is the name of a class which could come after a prompt
model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
prior (Sequence[float] | None, optional) – a probability distribution over completions, representing a belief about their likelihoods regardless of the prompt. By default, each completion in completions is assumed to be equally likely
end_of_prompt (Literal[' ', ''], optional) – whitespace or empty string to join prompt and completion, by default whitespace
normalize (bool, optional) – whether or not to normalize completion-after-prompt probabilities into a probability distribution over completions. Set this to False if you’d like the raw completion-after-prompt probability, or you’re solving a multi-label prediction problem. By default, True
discount_completions (float, optional) – experimental feature: set it to >0.0 (e.g., 1.0 may work well) if a completion is consistently getting over-predicted, i.e., its predicted probabilities are consistently too high. You could instead fudge the prior, but this hyperparameter may be easier to tune than the prior. By default 0.0
log_marg_probs_completions (Sequence[Sequence[float]] | None, optional) – experimental feature: pre-computed log probabilities of completion tokens conditional on previous completion tokens (not prompt tokens). Only used if not discount_completions. Pre-compute them by passing completions, model, and end_of_prompt to token_logprobs(). By default, if not discount_completions, they are (re-)computed
show_progress_bar (bool | None, optional) – whether or not to show a progress bar. By default, it will be shown only if there are at least 5 prompts
batch_size (int, optional) – the maximum number of prompts that the model will process in parallel, by default 2
batch_size_completions (int, optional) – the maximum number of completions that the model will process in parallel. By default, all completions are processed in parallel

Returns:

pred_probs – If prompts is a string, then an array with shape (len(completions),) is returned: pred_probs[completion_idx] is the model’s estimate of the probability that completions[completion_idx] comes after prompt + end_of_prompt.

If prompts is a sequence of strings, then an array with shape (len(prompts), len(completions)) is returned: pred_probs[prompt_idx, completion_idx] is the model’s estimate of the probability that completions[completion_idx] comes after prompts[prompt_idx] + end_of_prompt.

Return type:

npt.NDArray[np.floating]

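Note

In this function, the set of possible completions which could follow each prompt is the same for every prompt. If instead, each prompt could be followed by a different set of completions, then construct a sequence of cappr.Example objects and pass them to predict_proba_examples().
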
Example

Let’s have GPT-2 (small) predict where stuff is in the kitchen. This example also conveys that it’s not the greatest model out there:

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Define a classification task
prompts = ["The tacos are cooking", "Ice cream is"]
class_names = ("on the stove", "in the freezer", "in the fridge")
prior = (1 / 5, 2 / 5, 2 / 5)

pred_probs = predict_proba(
    prompts,
    completions=class_names,
    model_and_tokenizer=(model, tokenizer),
    prior=prior,
)
pred_probs_rounded = pred_probs.round(1)  # just for cleaner output

# predicted probability that tacos cook on the stove
print(pred_probs_rounded[0, 0])
# 0.4

# predicted probability that ice cream is in the freezer
print(pred_probs_rounded[1, 1])
# 0.5

# predicted probability that ice cream is in the fridge
print(pred_probs_rounded[1, 2])
# 0.4
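
The difference between normalize=True and normalize=False can be sketched as follows. The raw probabilities, the `normalize` helper, and the threshold are hypothetical; the sketch only illustrates the single-label versus multi-label distinction described for the normalize parameter.

```python
# Hypothetical raw completion-after-prompt probabilities for one prompt
raw_probs = [0.02, 0.03, 0.05]


def normalize(probs):
    """Rescale so the probabilities form a distribution over completions."""
    total = sum(probs)
    return [p / total for p in probs]


# Single-label: completions compete, so the scores sum to 1.
normalized = normalize(raw_probs)

# Multi-label: keep raw scores and threshold each completion independently.
threshold = 0.025  # hypothetical cutoff; tune it on your own data
selected = [i for i, p in enumerate(raw_probs) if p >= threshold]
```

In the multi-label case, more than one completion (or none) can be selected for the same prompt, which is why the raw values are kept.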

cappr.huggingface.classify.predict_proba_examples(examples: Example | Sequence[Example], model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], show_progress_bar: bool | None = None, batch_size: int = 2, batch_size_completions: int | None = None) → npt.NDArray[np.floating] | list[npt.NDArray[np.floating]]

Predict probabilities of each completion coming after each prompt.

Parameters:

examples (Example | Sequence[Example]) – Example object(s), where each contains a prompt and its set of possible completions
model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
show_progress_bar (bool | None, optional) – whether or not to show a progress bar. By default, it will be shown only if there are at least 5 examples
batch_size (int, optional) – the maximum number of examples that the model will process in parallel, by default 2
batch_size_completions (int, optional) – the maximum number of completions that the model will process in parallel. By default, all completions are processed in parallel

Returns:

pred_probs – If examples is a cappr.Example, then an array with shape (len(example.completions),) is returned: pred_probs[completion_idx] is the model’s estimate of the probability that example.completions[completion_idx] comes after example.prompt + example.end_of_prompt.

If examples is a sequence of cappr.Example objects, then a list with length len(examples) is returned: pred_probs[example_idx][completion_idx] is the model’s estimate of the probability that examples[example_idx].completions[completion_idx] comes after examples[example_idx].prompt + examples[example_idx].end_of_prompt. If the number of completions per example is a constant k, then an array with shape (len(examples), k) is returned instead of a list of 1-D arrays.

Return type:

npt.NDArray[np.floating] | list[npt.NDArray[np.floating]]

Example

GPT-2 (small) doing media trivia:

from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr import Example
from cappr.huggingface.classify import predict_proba_examples

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Create examples
examples = [
    Example(
        prompt="Jodie Foster played",
        completions=("Clarice Starling", "Trinity in The Matrix"),
    ),
    Example(
        prompt="Batman, from Batman: The Animated Series, was played by",
        completions=("Pete Holmes", "Kevin Conroy", "Spongebob!"),
        prior=(1 / 3, 2 / 3, 0),
    ),
]

pred_probs = predict_proba_examples(
    examples, model_and_tokenizer=(model, tokenizer)
)

# predicted probability that Jodie Foster played Clarice Starling, not Trinity
print(pred_probs[0][0].round(2))
# 0.7

# predicted probability that Batman was played by Kevin Conroy
print(pred_probs[1][1].round(2))
# 0.97
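
Because the return value may be either a list of 1-D arrays or a single 2-D array, code that consumes it is simplest when it iterates example by example, which works for both shapes. The sketch below uses plain lists as stand-ins for the arrays, and the numbers are illustrative apart from the rounded 0.7 and 0.97 printed above. `top_completion` is a hypothetical helper.

```python
# Stand-ins for the returned probabilities: one row per example, and rows
# may have different lengths when examples have different completions.
completions_per_example = [
    ("Clarice Starling", "Trinity in The Matrix"),
    ("Pete Holmes", "Kevin Conroy", "Spongebob!"),
]
pred_probs = [
    [0.7, 0.3],
    [0.03, 0.97, 0.0],
]


def top_completion(probs, completions):
    """Return the completion with the highest predicted probability."""
    best_idx = max(range(len(probs)), key=lambda i: probs[i])
    return completions[best_idx]


preds = [
    top_completion(probs, completions)
    for probs, completions in zip(pred_probs, completions_per_example)
]
```

Indexing as pred_probs[example_idx][completion_idx], rather than pred_probs[example_idx, completion_idx], likewise works for both return types.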

cappr.huggingface.classify.token_logprobs(texts: str | Sequence[str], model_and_tokenizer: tuple[ModelForCausalLM, PreTrainedTokenizerBase], end_of_prompt: Literal[' ', ''] = ' ', show_progress_bar: bool | None = None, add_bos: bool = False, batch_size: int = 16, **kwargs) → list[float] | list[list[float]]

For each text, compute each token’s log-probability conditional on all previous tokens in the text.

Parameters:

texts (str | Sequence[str]) – input text(s)
model_and_tokenizer (tuple[ModelForCausalLM, PreTrainedTokenizerBase]) – a model and its tokenizer
end_of_prompt (Literal[' ', ''], optional) – This string gets added to the beginning of each text. It’s important to set this if you’re using the discount feature. Otherwise, set it to "". By default " "
show_progress_bar (bool | None, optional) – whether or not to show a progress bar. By default, it will be shown only if there are at least 5 texts
add_bos (bool, optional) – whether or not to add a beginning-of-sentence token to each text in texts if the tokenizer has a beginning-of-sentence token, by default False
batch_size (int, optional) – the maximum number of texts that the model will process in parallel, by default 16

Returns:

log_probs – If texts is a string, then a 1-D list is returned: log_probs[token_idx] is the log-probability of the token at token_idx of texts conditional on all previous tokens in texts.

If texts is a sequence of strings, then a 2-D list is returned: log_probs[text_idx][token_idx] is the log-probability of the token at token_idx of texts[text_idx] conditional on all previous tokens in texts[text_idx].

Return type:

list[float] | list[list[float]]

Warning

Set end_of_prompt="", add_bos=True unless you’re using the discount feature.

Note

For each text, the first token’s log-probability is always None because no autoregressive LM directly estimates the marginal probability of a token.

Raises:

TypeError – if texts is not a sequence
ValueError – if texts is empty
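
Since the first token's log-probability is None, downstream code has to skip it before doing arithmetic. A minimal sketch, using made-up values shaped like the 2-D return value described above (`text_log_prob` is a hypothetical helper):

```python
# Hypothetical output of token_logprobs for two texts; numbers are made up.
log_probs = [
    [None, -2.1, -0.7],
    [None, -3.4],
]


def text_log_prob(token_log_probs):
    """Sum the available token log-probs, skipping the leading None."""
    return sum(lp for lp in token_log_probs if lp is not None)


totals = [text_log_prob(lps) for lps in log_probs]
```

Each total is the log-probability of the text's remaining tokens conditional on its first token.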