Select a prompt-completion format ================================= *"If you can dodge a wrench, you can dodge a ball!"* -- Patches With CAPPr, your job is to conceptually write up your classification task as a formatted string, called the prompt-completion format: .. code:: python {prompt}{end_of_prompt}{completion} The text you want to classify should appear in the ``prompt``. ``end_of_prompt`` is either a whitespace ``" "`` (default) or the empty string ``""``. Each ``completion`` should contain a choice/class which the text could belong to.\ [#]_ Each ``completion`` should flow naturally after ``{prompt}{end_of_prompt}``. So pay close attention to the use of whitespaces, newlines, and word casing. CAPPr doesn't do any string processing for you; it just concatenates the three strings and sends it! **It's on you to format the prompt according to the model's instruction/chat format**, assuming that's applicable and beneficial. Before calling any CAPPr functions, consider printing ``{prompt}{end_of_prompt}{completion}`` for each completion in your list of possible completions/choices, and ensure that each passes the eye test. And yes, you'll likely need to do a bit of prompt engineering. But if you can write a sentence, you can write a prompt. It's mostly a matter of trial and error. Here's an `external guide`_ if you'd like to survey research in this field.\ [3]_ Empirically, the impact of the prompt-completion format on accuracy depends on the quality of the language model. For larger, instruction-trained models, the format is not as important. For smaller, less instruction-trained models, it can be critical to get the format right. .. _external guide: https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/ Feel free to `open an issue `_ if you're having trouble with writing a prompt. It's not always an easy task. Models can be surprisingly finicky. Prompt-completion formats ------------------------- Most successful prompt-completion formats adhere to one of these three styles: Concat-Class, Yes-No, Multiple Choice. Concat-Class ~~~~~~~~~~~~ In this format, the full name of the class is directly used as the completion. In other words, the class name is directly concatenated after ``{prompt}{end_of_prompt}``. Here's an example: .. code:: python prompts = [ "Stephen Curry is a", "Martina Navratilova was a", "Dexter, from the TV Series Dexter's Laboratory, is a", "LeBron James is a", ] end_of_prompt = " " # Each of the prompts could possibly be completed with one of these: class_names = ("basketball player", "tennis player", "scientist") # What strings will CAPPr see? for i, prompt in enumerate(prompts): print(f"For prompt {i + 1}") print("------------") for completion in class_names: print(f"{prompt}{end_of_prompt}{completion}") print() # For prompt 1 # ------------ # Stephen Curry is a basketball player # Stephen Curry is a tennis player # Stephen Curry is a scientist # # For prompt 2 # ------------ # Martina Navratilova was a basketball player # Martina Navratilova was a tennis player # Martina Navratilova was a scientist # # For prompt 3 # ------------ # Dexter, from the TV Series Dexter's Laboratory, is a basketball player # Dexter, from the TV Series Dexter's Laboratory, is a tennis player # Dexter, from the TV Series Dexter's Laboratory, is a scientist # # For prompt 4 # ------------ # LeBron James is a basketball player # LeBron James is a tennis player # LeBron James is a scientist For each prompt, CAPPr will pick the completion which makes the sentence make the most sense (according to the model). A reasonable hypothesis is that this prompt-completion format is especially well-suited to less instruction-trained models. See the footnote for a reason for this hypothesis.\ [#]_ It's often a good idea to also mention the choices/completions in the prompt, or the system prompt for an instruction-trained model. For example: .. code:: python class_names = ("basketball player", "tennis player", "scientist") class_names_str = '\n'.join(class_names) prompt_prefix = f'''Every input belongs to one of these categories: {class_names_str}''' print(prompt_prefix) # Every input belongs to one of these categories: # basketball player # tennis player # scientist .. warning:: CAPPr hasn't been systematically evaluated on ``completions`` where some are longer than 30 tokens. Consider this domain uncharted and risky for CAPPr. Examples ++++++++ For examples of this prompt-completion format in action, see any of the `Hugging Face demos `_ excluding ``sciq.ipynb``, or any of the `Llama CPP demos `_. For minimal examples, see the **Example** section for each of these functions: :func:`cappr.openai.classify.predict` :func:`cappr.huggingface.classify.predict` :func:`cappr.llama_cpp.classify.predict` :func:`cappr.openai.classify.predict_examples` :func:`cappr.huggingface.classify.predict_examples` :func:`cappr.llama_cpp.classify.predict_examples` For minimal examples with a shared prompt prefix, see the **Example** section for each of these functions: :func:`cappr.huggingface.classify.cache_model` :func:`cappr.llama_cpp.classify.cache_model` Yes-No ~~~~~~ Sometimes, your task can be framed as a yes or no question. Here's an example of a successful format for instruction-trained models, which was pulled from `this demo `_: .. code:: python def prompt_yes_or_no(text: str) -> str: return ( "The following sentence was taken from a medical case report: " f'"{text}"\n' "Does the sentence describe an adverse effect of a pharmaceutical " "drug or substance?\n" "Answer Yes or No:" ) end_of_prompt = " " completions = ("Yes", "No") medical_case_report = ( "We describe the case of a 10-year-old girl with two " "epileptic seizures and subcontinuous spike-waves during sleep, who " "presented unusual side-effects related to clobazam (CLB) monotherapy." ) prompt = prompt_yes_or_no(medical_case_report) for completion in completions: print(f"{prompt}{end_of_prompt}{completion}") print() Examples ++++++++ For another example of this prompt-completion format in action, see `this demo `_. Multiple Choice ~~~~~~~~~~~~~~~ Many models have been extensively trained to answer multiple choice questions. One caveat is that the number of choices ideally shouldn't be more than five, because multiple choice question formats seen during training are usually limited to the letters from school exams: A, B, C, D, E. Based on a few experiments, multiple choice questions are less appropriate for less instruction-trained models. Also, ensure that the system prompt is explicit about answering with one of the letters. Here's an example of the system prompt used for the `Llama 2 COPA demo `_: .. code:: python system_prompt_copa = ( "Identify the cause or effect of a premise given two choices. Each " "choice is identified by a letter, A or B.\n" "Respond only with the letter corresponding to the correct cause or " "effect." ) Here's a little utility function which automatically writes out the letters and choices: .. code:: python from string import ascii_uppercase as alphabet def multiple_choice(*choices) -> str: if len(choices) > len(alphabet): raise ValueError("There are more choices than letters.") letters_and_choices = [ f"{letter}. {choice}" for letter, choice in zip(alphabet, choices) ] return "\n".join(letters_and_choices) choices = [ "Don't Wanna Know", "Shit", "All Time Low", "Welcome to the Internet", "Bezos II", ] print(multiple_choice(*choices)) # A. Don't Wanna Know # B. Shit # C. All Time Low # D. Welcome to the Internet # E. Bezos II Examples ++++++++ For an example of this prompt-completion format in action, see `this demo `_. Quirks ------ Most models are sensitive to quirky differences between prompts. For models using SentencePiece tokenization, e.g., Llama and Mistral, only .. code:: python {prompt} {completion} formats are possible. In other words, ``end_of_prompt`` is always a whitespace, regardless of your input. The placement of whitespaces can also affect performance. Consider adding the whitespace to the end of the ``prompt`` string. Another quirk is that when using a Concat-Class style prompt with a less instruction-trained model, it's possible to achieve higher accuracy by abandoning the chat/instruction format. See, e.g., the `Llama 2 COPA demo`_. These notes will be updated as more quirks are discovered. Wrangle step-by-step completions -------------------------------- Step-by-step\ [4]_ and chain-of-thought\ [5]_ prompts are highly effective for more involved tasks. While CAPPr is not immediately well-suited to these prompts, it may be applied to post-process completions: 1. Get the completion from the step-by-step / chain-of-thought prompt 2. Pass this completion in a second prompt, and have CAPPr classify the answer. You can probably get away with using a cheap model for this task, as it just takes a bit of semantic parsing. Here's an example: .. code:: python from cappr.openai.api import gpt_chat_complete from cappr.openai.classify import predict # Task for a student in school: pick the next prereq to take class_to_prereqs = { "CS-101": "no prerequisites", "CS-102": "CS-101", "MATH-101": "no prerequisites", "MATH-102": "MATH-101", "ML-101": "CS-101, MATH-102, STAT-101", "STAT-101": "MATH-101", "STAT-102": "STAT-101, MATH-102", } class_to_prereqs_str = "\n".join( f"{class_}: {prereqs}" for class_, prereqs in class_to_prereqs.items() ) prompt_raw = f""" Hi Professor. I'm interested in taking ML-101, but I'm struggling to decide which course I need to take before that. I've already taken CS-101. Which course should I take next? Here's a list of courses and their prerequisites which I pulled from the course catalog. {class_to_prereqs_str} """ prompt_step_by_step = prompt_raw + "\n" + "Let's think step by step." chat_api_response = gpt_chat_complete( prompt_step_by_step, model="gpt-4", system_msg=( "You are a computer scientist mentoring a student. End your response " "to the student's question with the final answer, which is the name " "of a course." ), max_tokens=1_000, temperature=0, ) step_by_step_answer = chat_api_response[0]["message"]["content"] prompt_answer = f''' Here is an answer about which course a student needs to take: """ {step_by_step_answer} """ According to this answer, the very next course that the student should take is''' answer = predict( prompt_answer, completions=class_to_prereqs.keys(), model="text-babbage-001", ) print(answer) # MATH-101 A note on few-shot prompts -------------------------- While almost all of the examples in the documentation are "zero-shot" prompts (they don't include examples of inputs and expected outputs), nothing about CAPPr prevents you from using few-shot prompts / in-context learning. Just make sure you're not paying too much money, latency, or data for a small benefit. Use :func:`cappr.llama_cpp.classify.cache_model` or :func:`cappr.huggingface.classify.cache_model` if applicable. And consider that you may not need to label many (or any!) examples for few-shot prompting to work well.\ [6]_ Footnotes --------- .. [#] These are not hard rules. For example, the `demo for the Winograd Schema Challenge `_ flips the roles of the ``prompt`` and ``completion``. Just don't use the ``prior`` keyword argument in that case. .. [#] CAPPr may be able to lean more on what was learned during pretraining than methods which rely on instruction-style prompts. Consider the `COPA task `_. A smaller language model probably hasn't seen enough of the instruction-style prompt: .. code:: The man broke his toe because A. He got a hole in his sock B. He dropped a hammer on his foot Answer A or B. But from pretraining, the model has probably seen many sentences like: .. code:: The man broke his toe because he dropped a hammer on his foot. And it would therefore give higher probability to the correct choice: ``he dropped a hammer on his foot``. References ---------- .. [3] Weng, Lilian. (Mar 2023). Prompt Engineering. Lil'Log. https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/. .. [4] Kojima, Takeshi, et al. "Large language models are zero-shot reasoners." arXiv preprint arXiv:2205.11916 (2022). .. [5] Wei, Jason, et al. "Chain of thought prompting elicits reasoning in large language models." arXiv preprint arXiv:2201.11903 (2022). .. [6] Min, Sewon, et al. "Rethinking the role of demonstrations: What makes in-context learning work?." arXiv preprint arXiv:2202.12837 (2022).