Design choices ============== This is the first open source software project I've created. I thought I'd write about some of its design choices, and what I learned from creating it. These thoughts are more reflective than educational. An alternate design ------------------- As noted on the previous page, ``predict_proba`` is a transformation of the output of ``log_probs_conditional``—a transformation which is always independent of the LM interface. ``predict`` is just the argmax of the output of ``predict_proba``. These patterns suggest a design which does some type of function-wrapping. The current design uses decorators. An alternate design is to create an abstract base class which all LM interfaces inherit from / implement. .. code:: python from abc import ABC, abstractmethod import numpy.typing as npt class _AbstractClassifier(ABC): @classmethod @abstractmethod def log_probs_conditional( cls, prompts: str | Sequence[str], completions: Sequence[str], model, **kwargs, ) -> list[list[float]] | list[list[list[float]]]: pass @classmethod def predict_proba( cls, prompts: str | Sequence[str], completions: Sequence[str], model, **kwargs, ) -> npt.NDArray[np.floating]: log_probs_completions = cls.log_probs_conditional( prompts, completions, model, **kwargs ) return ( # these methods don't exist, but you know what I mean log_probs_completions .average(axis=-1) .exponentiate() .normalize(axis=1) ) @classmethod def predict( cls, prompts: str | Sequence[str], completions: Sequence[str], model, **kwargs, ) -> str | list[str]: pred_probs = cls.predict_proba(prompts, completions, model, **kwargs) pred_completions = [ completions[completion_idx] for completion_idx in pred_probs.argmax(axis=1) ] return pred_completions As far as I can see, there is no benefit to this design. It's tempting to think that it reduces writing LM modules to writing sublcasses which implement ``log_probs_conditional`` and nothing else. That's not practically true. Because if you're writing a new LM module, it most likely has unique ``kwargs`` which you need to supply in the signature and explain in the docstring. So you need to write out the method anyway. Moreover, the amount of repeated code that's saved is insignificant. There is a small cost to this design. Enabling the simple import interface— :: from cappr.{LM interface name}.classify import predict_proba —would require adding a bit of logic to ``__init__.py`` and Sphinx's ``docs/source/conf.py`` for every new module. I could automate this stuff by implementing some slightly tricky logic. But more logic is worse than less logic. Overall, this design doesn't align with my style. Inheritance isn't needed to achieve function wrapping. Decorators are sufficient. Input checks ------------ There are some silent and some loud errors which would be hard to debug for a user. For example: - inputting an unordered iterable will cause an output array of predicted probabilities to be meaningless - inputting an incorrectly structured prior will cause all of the model's compute to be a waste - inputting an empty object causes an obscure index error in some downstream model/tokenization functions. To combat this, every decorator does input checking. These checks aren't too restrictive. For example, ``isinstance(ordered_object, Sequence)`` is not required. ``ordered_object`` just has to have an obvious, deterministic order when ``x for x in ordered_object`` is done. As a result, the user can be lazy: they can input a pandas Series, a dictionary's ``.keys()``, a numpy array, or torch tensor, and the function will work just fine. Context managers ---------------- The Hugging Face module requires that the ``model_and_tokenizer`` input is set up in a particular way. It looks like many other tools solve this problem by creating a loading function or method which does the required set up. Its returned object is internal to the package. I don't think this extra abstraction is necessary. And it comes at the small cost of requiring the user to learn a new way to load their model. Instead, ``cappr.huggingface`` lets the user initialize the object however they want. It then internally sets it up as required, rolling back these changes when finished. This pattern is accomplished by the context managers `here `_. No string formatting abstractions --------------------------------- This package will do one thing well: pick a completion from a user-created prompt. If users need or want to use a string formatter, that's on them. Repeat docstrings ----------------- Lots of text in docstrings are repeated. After all, every LM module is implementing a protocol. I previously experimented with `an automation `_ that dynamically writes the docstring via decorators. This pattern is used throughout Hugging Face ``transformers``, for example. I decided against this pattern because it sacrifices an important convenience: hovering over a function to see what it does. Code analyzers like Pylance won't show the ``__doc__`` attribute that was dynamically constructed. I personally am slightly annoyed when I have to open up a function's documentation in my browser, and look back and forth at my browser and IDE. I like the convenience of hovering over the function in my IDE itself. So I opted to do what numpy, scipy, and scikit-learn do in their docstrings: repeat text. It's definitely tedious to make modifications. But that tediousness is outweighed by the benefits to the user. Testing ------- This package's tests are designed in a sophisticated (complicated) way. It took me a while to think about what they should look like. The goal was to allow for 2 things: #. shared test cases universal to all ``classify`` modules—these are the parametrizations in ``_base.TestPromptsCompletions`` and ``_base.TestExamples`` (see this `module `_) #. LM-interface-specific fixtures and parametrizations to test LM-interface-specific setups and arguments, e.g., ``batch_size`` in ``cappr.huggingface``. The current testing design accomplishes these things through inheritance, because pytest is incredibly powerful with inheritance. For an example, see the `tests `_ for llama-cpp models. There are still a few testing todos. One problem is that there are dependencies in the tests; if ``test_log_probs_conditional`` fails, the rest of the tests will fail. Ideally, for example, ``test_predict_proba`` assumes ``log_probs_conditional`` is correct. Mistakes were made ------------------ I made plenty of mistakes while developing CAPPr. Too many breaking changes ~~~~~~~~~~~~~~~~~~~~~~~~~ `Releases `_ were not as backwards compatible as they could've been. I was tripped up by the OpenAI v1.0 release. I've been figuring stuff out on the fly and releasing whenever I think something is good enough for the short term. Too many half-measures ~~~~~~~~~~~~~~~~~~~~~~ It's well known that attention keys and values can be cached whenever substrings are repeated for inference. Getting this feature to align with the CAPPr scheme took nitty gritty work. My first few implementations of caching were suboptimal from both a computational and a UI perspective. I got lost in the sauce of making lots and lots of incremental improvements. Eventually, I `re-did `_ the whole thing with some success. There are still probably important optimizations I left on the table, but it'll do for now. Marketing matters ~~~~~~~~~~~~~~~~~ The first version of the `User Guide `_ was written for ML types, when it should've been written for software engineers. What's text classification? What are "labeled examples"? What's a prior? Why is a probability distribution useful? Docs for other tools answer, or successfully dodge, these questions much more effectively. Pleasant surprises ------------------ See `this page of the documentation `_. Besides the algorithmic stuff, I was pleasantly surprised to find that I enjoyed engineering this project from the ground up. Mulling over design decisions and managing myself was fun. I also became much more aware of open source tools and practices. I appreciate open source software at a higher level.