Contributor guide¶
We strongly recommend you get familiar with lincs by reading all the user documentation before reading this guide. This will help you contribute in a way that is consistent with the rest of the project.
We also recommend that you find and read the code pointed at by the examples in this document. It will give a much more concrete idea of what we are talking about.
Finally, please be aware that this document is a work in progress. Writing a contributor guide is kind of an exercise in clairvoyance: we have to predict what you will try to do. So, if we got that part wrong, do not hesitate to contact us as explained at the beginning of the Readme.
Do contribute!¶
We value contributions of any scale, from minor details to major refactoring. If you see a typo, please fix that typo! Using the GitHub web interface spares you the need to even clone the repository. If you think our entire architecture deserves a rewrite, please… discuss it with us https://github.com/MICS-Lab/lincs/discussions. (Don’t spend time on something that we might reject for reasons not entirely apparent to you at the moment.) If you have an idea but don’t want (or know how) to implement it yourself, please tell us about it. If you find a bug, please report it. Everything helps!
We also recognize that contributing to an open source project can be intimidating, and that not everyone has the same experience and fluency with the tools and programming languages we use. If you’re willing to get our feedback on your contribution, you can be assured that we will take time to provide this feedback in a kind and constructive manner.
A good way to get started is often to look for something similar in the code base and get inspiration from it.
Using git grep TheOldTHing
is a good way to find all places where that similar thing is used,
to help you make changes everywhere TheNewThing
is needed.
You can also use git blame
to identify a commit that changed something similar,
and see what else was changed in that commit or the next few.
Development dependencies¶
To modify lincs, you need reasonably recent versions of:
Bash
Git
Docker
If you have an CUDA-compatible NVidia GPU and want to run code that uses it, you’ll need to configure the NVidia Docker runtime.
Development cycle¶
The main loop when working on lincs is:
make some changes
run
./run-development-cycle.sh
repeat
The ./run-development-cycle.sh
script first builds a Docker image with the development dependencies.
It can take a long time the first time you run it, but the Docker cache makes subsequent builds much faster.
It then runs that image as a Docker container to build the library, run its C++ and Python unit tests, install it, run its integration tests, etc.
It provides a few options to speed things up; see its --help
option.
Eventually, if you are a maintainer of the PyPI package, you can publish a new version of the package using… ./publish.sh
.
Else, please open a pull request on GitHub!
./run-development-cycle.sh¶
./run-development-cycle.sh [OPTIONS]
Options
- --with-docs¶
Build the documentation. The built documentation is published at https://mics-lab.github.io/lincs/ using GitHub Pages. So, it should only be pushed to GitHub when a new version of the package is published. Use this option to see the impact of your changes on the documentation, but do not commit them.
- --single-python-version¶
Run tests under a single Python version to save time. Please run the full development cycle at least once before submitting your changes.
- --unit-coverage¶
Measure coverage of unit tests, stop right after that. Implies –single-python-version. Quite long.
- --skip-unit¶
Skip unit tests to save time.
- --skip-long-unit¶
Skip long unit tests to save time.
- --skip-cpp-unit¶
Skip C++ unit tests to save time.
- --skip-python-unit¶
Skip Python unit tests to save time.
- --skip-install¶
Skip installation to save time.
- --skip-notebooks¶
Skip notebooks to save time.
- --skip-unchanged-notebooks¶
Skip notebooks that have no ‘git diff’ to save time.
- --forbid-gpu¶
Skip all tests that use the GPU. This is done automatically (with a warning) if your machine does not have a GPU. Using this option explicitly avoids the warning.
- --forbid-chrones¶
Build lincs without Chrones.
- --doctest-option <doctest_option>¶
Pass an option verbatim to doctest. Can be used multiple times.
./publish.sh¶
./publish.sh [OPTIONS] COMMAND [ARGS]...
finalize¶
./publish.sh finalize [OPTIONS]
Options
- --dry-run¶
first-pre¶
./publish.sh first-pre [OPTIONS] {patch|minor|major} {alpha|beta|rc}
Options
- --dry-run¶
Arguments
- LEVEL¶
Required argument
- PRE¶
Required argument
next-pre¶
./publish.sh next-pre [OPTIONS]
Options
- --dry-run¶
quick¶
./publish.sh quick [OPTIONS] {patch|minor|major}
Options
- --dry-run¶
Arguments
- LEVEL¶
Required argument
Directory structure¶
All source code is in the lincs
directory.
The development
directory contains scripts and files used for development.
Scripts ./run-development-cycle.sh
and ./publish.sh
at the root are thin wrappers for scripts in that directory.
The root directory also contains basic packaging files like the README.rst
, setup.py
, MANIFEST.in
files, as well as the licence files.
Documentation sources are in doc-sources
, and built documentation is in docs
, to be published in GitHub Pages.
Published documentation should only be committed when publishing a new version.
You can build it locally to check how your changes are rendered using ./run-development-cycle.sh --with-docs
.
If you do that, docs
will contains documentation for your current -dev
version of lincs.
We don’t want that published, so you must commit only doc-sources
and not docs
.
Integration tests are in… integration-tests
.
Each test consists of a run.sh
script and accompanying files.
Some tests are generated from the documentation to check that the code examples are correct.
This is done by the ./run-development-cycle.sh
script before it runs them.
General design¶
Strategies¶
The strategy design pattern is a form of dependency injection that allows selecting an algorithm at runtime. You can then implement several algorithms for the same task, and easily switch between them.
This is particularly fitting for experimenting with different algorithms and their variants, and for benchmarking them.
lincs makes an extensive use of strategies. They are described from a user perspective in our user guide.
From a contributor perspective, it’s important to make the distinction between:
an extension point, which defines the interface of a strategy and calls it
an actual strategy, which implements that interface
Strategies can recursively call other strategies, so it’s possible for a piece of code to be a strategy and to define extension points.
Examples:
LearnMrsortByWeightsProfilesBreed
defines several extension points (e.g.WeightsOptimizationStrategy
,ProfilesImprovementStrategy
, etc.)ImproveProfilesWithAccuracyHeuristicOnCpu
andImproveProfilesWithAccuracyHeuristicOnGpu
are strategies for theProfilesImprovementStrategy
extension point
But beware of virtual function calls¶
Virtual function calls are somewhat costly (see integration-tests/virtual-cost
) so, although they are useful, they must be used with care.
It’s best to keep them for cases where they are not called too often; up to a few thousands per learning should be OK.
When polymorphism is required for frequent calls, it’s best to use template-based static polymorphism.
An example of that can be found in lincs/liblincs/learning/mrsort-by-weights-profiles-breed/optimize-weights/linear-program.hpp
,
where the LP solver is injected using the LinearProgram
template parameter, at no runtime cost.
So, why not all templates?¶
One could now consider using templates everywhere, and not use virtual function calls at all. This would have the following negative consequences:
The number of explicit template instantiations would explode in a combinatorial way.
For example, the LinearProgram
template parameter of .../optimize-weights/linear-program.hpp
is currently instantiated explicitly for each LP solver in .../optimize-weights/linear-program.cpp
.
If LearnMrsortByWeightsProfilesBreed
was a template, it would have to be instantiated for the whole Cartesian product of all variants of each strategy, to a great maintenance cost.
Note that this is not specific to explicit template instanciation, because we expose lincs as a Python library:
the Python module has to give access to all instanciations of LearnMrsortByWeightsProfilesBreed
.
It would forbid customization from the Python side. By nature, Python customization happens at runtime, which requires virtual functions. For example, from the Python side, it’s possible to add a termination strategy, but it’s not possible to add an LP solver.
That explains why we use virtual functions where we do.
How-tos¶
Update the documentation¶
To update the documentation, you’ll have to get familiar with the following tools:
And to less extent:
You can then edit README.rst
and files in doc-sources
and run ./run-development-cycle.sh --with-docs
.
Open docs/index.html
in your browser to check the result.
Choose Python or C++ for your change¶
This section is an anticipation for when we do publish the Python and C++ APIs of lincs. This should happen in 2024 (famous last words…), so we like to keep it in mind.
lincs is written partly in C++ and partly in Python. One important reason for a Python part is usability: Python is arguably easier to get started with than C++, so having a Python interface makes it easier for users to get started with lincs. The main reason for writing the core of lincs in C++ is performance: for CPU-intensive tasks, compiled C++ is definitely faster than interpreted Python; even more so for multi-threaded code.
Here is how we suggest you choose what language to use for your changes:
Do you know both languages?
If you only know one of those languages, well, use it. It may not be the best choice for the project, but it is the best choice for you. If your contribution requires and deserves to be re-implemented in the other language, then someone else may do it, or you may become motivated enough to learn the other language.
Should the new feature be exposed in the C++ library?
The core of lincs is usable as a C++ library (synthetic data generation, learning, classification).
A counter example is the visualization
module, which is only usable from Python.
If the new feature should be usable through the C++ library, then it must be written in C++.
How computationally-intensive is the new feature?
Most computationally-intensive parts should be written in C++, and Python can be used for the rest.
For example, a LearnMrsortByWeightsProfilesBreed::BreedingStrategy
that reduces the number of iterations of the LearnMrsortByWeightsProfilesBreed
can be written in Python because this high-level strategy is called only a few times per learning.
On the other side, a variant of OptimizeWeightsUsingGlop
that spares a few CPU cycles should be written in C++ because this is where most CPU time is spent.
Tweak an existing strategy¶
Just do it! Locate the strategy you want to change, and change it.
But this should be quite rare and only for minor changes. For anything more significant, you should add a new strategy instead. This will make it easy to compare the new and old strategies, and then choose the one we want to advertise more or set as default.
Add a new strategy¶
To implement a new way to do something, you should add a new strategy.
It boils down to:
implement the strategy
expose the strategy in the Python API
expose the strategy on the command-line
This is a case where using git grep
on existing strategies is very effective to find all the places where you need to make changes.
Add a new extension point¶
Dynamic or static?¶
By default, you should start with a dynamic extension point unless it’s very clear from the beginning that it will be called bilions of times per learning.
Definition of an dynamic extension point¶
@todo(Documentation, later) Write
Definition of an static extension point¶
@todo(Documentation, later) Write
Default behavior for backward compatibility¶
If your extension point is a new step in some previously existing algorithm, then you should provide a default implementation that mimics the previous behavior.
This can even be a null strategy if the previous behavior was to do nothing.
Add an external solver¶
As explained above, solvers are injected statically via template parameters. So their interface is not specified explicitly (e.g. by an abstract base class), but relies on duck-typing.
To add a new one, you have to:
install the solver library in
development/Dockerfile
add it in the extension’s
libraries
insetup.py
write a new solver class, e.g. in
lincs/liblincs/linear-programming/foobar.hpp
and.../foobar.cpp
test it to check that its interface and functionality are compatible with the existing solvers, e.g. in
.../linear-programming/test.cpp
optionally add specific tests in
.../foobar.cpp
add explicit template instantiations everywhere you want it used, e.g. in
.../learning/mrsort-by-weights-profiles-breed/optimize-weights/linear-program.cpp
add typedefs for the instantiations in
.../liblincs/learning.hpp
expose typedefs in the Python module
.../liblincs/liblincs_module.cpp
import it in
lincs/__init__.py
add options to use it in
lincs/command_line_interface.py
run and fix integration tests; expect failure of the
help-all
test, and update itsexpected.txt
file accordingly
The procedure above gives examples for linear programming solvers, but the same ideas apply to SAT and max-SAT solvers.
Here is the commit that introduced the Alglib linear programming solver: 0a790ef. The modifications we had to make to the “Get started” guide are no longer needed now that we publish binary wheels, and the user guide has been entirely restructured since this commit. Besides that, changes to the code still look like what you’d have to do to add a new solver.