Select a prompt-completion format#

“If you can dodge a wrench, you can dodge a ball!”

—Patches

With CAPPr, your job is to conceptually write up your classification task as a formatted string, called the prompt-completion format:

{prompt}{end_of_prompt}{completion}

The text you want to classify should appear in the prompt. end_of_prompt is either a whitespace " " (default) or the empty string "". Each completion should contain a choice/class which the text could belong to.[1]

Each completion should flow naturally after {prompt}{end_of_prompt}. So pay close attention to the use of whitespaces, newlines, and word casing. CAPPr doesn’t do any string processing for you; it just concatenates the three strings and sends it! It’s on you to format the prompt according to the model’s instruction/chat format, assuming that’s applicable and beneficial. Before calling any CAPPr functions, consider printing {prompt}{end_of_prompt}{completion} for each completion in your list of possible completions/choices, and ensure that each passes the eye test.

And yes, you’ll likely need to do a bit of prompt engineering. But if you can write a sentence, you can write a prompt. It’s mostly a matter of trial and error. Here’s an external guide if you’d like to survey research in this field.[3]

Empirically, the impact of the prompt-completion format on accuracy depends on the quality of the language model. For larger, instruction-trained models, the format is not as important. For smaller, less instruction-trained models, it can be critical to get the format right.

Feel free to open an issue if you’re having trouble with writing a prompt. It’s not always an easy task. Models can be surprisingly finicky.

Prompt-completion formats#

Most successful prompt-completion formats adhere to one of these three styles: Concat-Class, Yes-No, Multiple Choice.

Concat-Class#

In this format, the full name of the class is directly used as the completion. In other words, the class name is directly concatenated after {prompt}{end_of_prompt}.

Here’s an example:

prompts = [
    "Stephen Curry is a",
    "Martina Navratilova was a",
    "Dexter, from the TV Series Dexter's Laboratory, is a",
    "LeBron James is a",
]
end_of_prompt = " "

# Each of the prompts could possibly be completed with one of these:
class_names = ("basketball player", "tennis player", "scientist")

# What strings will CAPPr see?
for i, prompt in enumerate(prompts):
    print(f"For prompt {i + 1}")
    print("------------")
    for completion in class_names:
        print(f"{prompt}{end_of_prompt}{completion}")
    print()

# For prompt 1
# ------------
# Stephen Curry is a basketball player
# Stephen Curry is a tennis player
# Stephen Curry is a scientist
#
# For prompt 2
# ------------
# Martina Navratilova was a basketball player
# Martina Navratilova was a tennis player
# Martina Navratilova was a scientist
#
# For prompt 3
# ------------
# Dexter, from the TV Series Dexter's Laboratory, is a basketball player
# Dexter, from the TV Series Dexter's Laboratory, is a tennis player
# Dexter, from the TV Series Dexter's Laboratory, is a scientist
#
# For prompt 4
# ------------
# LeBron James is a basketball player
# LeBron James is a tennis player
# LeBron James is a scientist

For each prompt, CAPPr will pick the completion which makes the sentence make the most sense (according to the model). A reasonable hypothesis is that this prompt-completion format is especially well-suited to less instruction-trained models. See the footnote for a reason for this hypothesis.[2]

It’s often a good idea to also mention the choices/completions in the prompt, or the system prompt for an instruction-trained model. For example:

class_names = ("basketball player", "tennis player", "scientist")
class_names_str = '\n'.join(class_names)
prompt_prefix = f'''Every input belongs to one of these categories:
{class_names_str}'''
print(prompt_prefix)
# Every input belongs to one of these categories:
# basketball player
# tennis player
# scientist

Warning

CAPPr hasn’t been systematically evaluated on completions where some are longer than 30 tokens. Consider this domain uncharted and risky for CAPPr.

Examples#

For examples of this prompt-completion format in action, see any of the Hugging Face demos excluding sciq.ipynb, or any of the Llama CPP demos.

For minimal examples, see the Example section for each of these functions:

cappr.openai.classify.predict()

cappr.huggingface.classify.predict()

cappr.llama_cpp.classify.predict()

cappr.openai.classify.predict_examples()

cappr.huggingface.classify.predict_examples()

cappr.llama_cpp.classify.predict_examples()

For minimal examples with a shared prompt prefix, see the Example section for each of these functions:

cappr.huggingface.classify.cache_model()

cappr.llama_cpp.classify.cache_model()

Yes-No#

Sometimes, your task can be framed as a yes or no question.

Here’s an example of a successful format for instruction-trained models, which was pulled from this demo:

def prompt_yes_or_no(text: str) -> str:
    return (
        "The following sentence was taken from a medical case report: "
        f'"{text}"\n'
        "Does the sentence describe an adverse effect of a pharmaceutical "
        "drug or substance?\n"
        "Answer Yes or No:"
    )

end_of_prompt = " "
completions = ("Yes", "No")

medical_case_report = (
    "We describe the case of a 10-year-old girl with two "
    "epileptic seizures and subcontinuous spike-waves during sleep, who "
    "presented unusual side-effects related to clobazam (CLB) monotherapy."
)

prompt = prompt_yes_or_no(medical_case_report)
for completion in completions:
    print(f"{prompt}{end_of_prompt}{completion}")
    print()

Examples#

For another example of this prompt-completion format in action, see this demo.

Multiple Choice#

Many models have been extensively trained to answer multiple choice questions. One caveat is that the number of choices ideally shouldn’t be more than five, because multiple choice question formats seen during training are usually limited to the letters from school exams: A, B, C, D, E. Based on a few experiments, multiple choice questions are less appropriate for less instruction-trained models.

Also, ensure that the system prompt is explicit about answering with one of the letters. Here’s an example of the system prompt used for the Llama 2 COPA demo:

system_prompt_copa = (
    "Identify the cause or effect of a premise given two choices. Each "
    "choice is identified by a letter, A or B.\n"
    "Respond only with the letter corresponding to the correct cause or "
    "effect."
)

Here’s a little utility function which automatically writes out the letters and choices:

from string import ascii_uppercase as alphabet

def multiple_choice(*choices) -> str:
    if len(choices) > len(alphabet):
        raise ValueError("There are more choices than letters.")
    letters_and_choices = [
        f"{letter}. {choice}" for letter, choice in zip(alphabet, choices)
    ]
    return "\n".join(letters_and_choices)

choices = [
    "Don't Wanna Know",
    "Shit",
    "All Time Low",
    "Welcome to the Internet",
    "Bezos II",
]
print(multiple_choice(*choices))
# A. Don't Wanna Know
# B. Shit
# C. All Time Low
# D. Welcome to the Internet
# E. Bezos II

Examples#

For an example of this prompt-completion format in action, see this demo.

Quirks#

Most models are sensitive to quirky differences between prompts.

For models using SentencePiece tokenization, e.g., Llama and Mistral, only

{prompt} {completion}

formats are possible. In other words, end_of_prompt is always a whitespace, regardless of your input. The placement of whitespaces can also affect performance. Consider adding the whitespace to the end of the prompt string.

Another quirk is that when using a Concat-Class style prompt with a less instruction-trained model, it’s possible to achieve higher accuracy by abandoning the chat/instruction format. See, e.g., the Llama 2 COPA demo.

These notes will be updated as more quirks are discovered.

Wrangle step-by-step completions#

Step-by-step[4] and chain-of-thought[5] prompts are highly effective for more involved tasks. While CAPPr is not immediately well-suited to these prompts, it may be applied to post-process completions:

  1. Get the completion from the step-by-step / chain-of-thought prompt

  2. Pass this completion in a second prompt, and have CAPPr classify the answer. You can probably get away with using a cheap model for this task, as it just takes a bit of semantic parsing.

Here’s an example:

from cappr.openai.api import gpt_chat_complete
from cappr.openai.classify import predict

# Task for a student in school: pick the next prereq to take
class_to_prereqs = {
    "CS-101": "no prerequisites",
    "CS-102": "CS-101",
    "MATH-101": "no prerequisites",
    "MATH-102": "MATH-101",
    "ML-101": "CS-101, MATH-102, STAT-101",
    "STAT-101": "MATH-101",
    "STAT-102": "STAT-101, MATH-102",
}
class_to_prereqs_str = "\n".join(
    f"{class_}: {prereqs}" for class_, prereqs in class_to_prereqs.items()
)

prompt_raw = f"""
Hi Professor. I'm interested in taking ML-101, but I'm struggling to decide
which course I need to take before that. I've already taken CS-101. Which
course should I take next?

Here's a list of courses and their prerequisites which I pulled from the
course catalog.

{class_to_prereqs_str}
"""

prompt_step_by_step = prompt_raw + "\n" + "Let's think step by step."

chat_api_response = gpt_chat_complete(
   prompt_step_by_step,
   model="gpt-4",
   system_msg=(
       "You are a computer scientist mentoring a student. End your response "
       "to the student's question with the final answer, which is the name "
       "of a course."
   ),
   max_tokens=1_000,
   temperature=0,
)

step_by_step_answer = chat_api_response[0]["message"]["content"]

prompt_answer = f'''
Here is an answer about which course a student needs to take:

"""
{step_by_step_answer}
"""

According to this answer, the very next course that the student should
take is'''

answer = predict(
    prompt_answer,
    completions=class_to_prereqs.keys(),
    model="text-babbage-001",
)

print(answer)
# MATH-101

A note on few-shot prompts#

While almost all of the examples in the documentation are “zero-shot” prompts (they don’t include examples of inputs and expected outputs), nothing about CAPPr prevents you from using few-shot prompts / in-context learning. Just make sure you’re not paying too much money, latency, or data for a small benefit. Use cappr.llama_cpp.classify.cache_model() or cappr.huggingface.classify.cache_model() if applicable. And consider that you may not need to label many (or any!) examples for few-shot prompting to work well.[6]

Footnotes#

References#