.. WARNING: this file is generated from 'doc-sources/get-started.rst.tmpl'. MANUAL EDITS WILL BE LOST.

.. Copyright 2023-2024 Vincent Jacques

===========
Get started
===========

Get *lincs*
===========

We provide binary wheels for *lincs* on Linux, Windows and macOS for x86_64 processors, so running ``pip install lincs --only-binary lincs`` should be enough on those systems.

We generally recommend you use ``pip`` in a virtual environment (``python -m venv``) or directly ``pipx`` to install any package, including *lincs*. Recent Ubuntu systems will even enforce that, by `refusing to install PyPI packages `_ in the "externally managed" default environment.
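If you're not sure how to set that up, here is a minimal sketch using standard Python tooling (nothing here is specific to *lincs*, and the environment name ``lincs-venv`` is arbitrary):

.. code:: shell

    # Create a virtual environment and install lincs inside it
    python -m venv lincs-venv
    . lincs-venv/bin/activate  # on Windows: lincs-venv\Scripts\activate
    pip install lincs --only-binary lincs

    # Or let pipx create and manage the environment for you
    pipx install lincs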
Alternatively, you can use our `Docker image `_ (``docker run --rm -it jacquev6/lincs:latest``) and run the commands below in it.

If you're on a platform for which we don't make wheels and our Docker image doesn't cover your needs, you'll have to build *lincs* from sources. We don't recommend you do that, because it can be a lot of work. If you really want to go that route, you may want to start by reading the `GitHub Actions workflow `_ we use to build the binary wheels. You'll probably start by trying ``pip install lincs``, seeing what dependencies are missing, installing them, and iterating from there. If you end up modifying *lincs* to make it work on your platform, we kindly ask you to contribute your changes back to the project.

.. _start-command-line:

Start using *lincs*' command-line interface
===========================================

Even if you plan to use *lincs* mainly through its Python API, we do recommend you go through this section first. It will make it easier for you when you go through our :doc:`Python API guide `. If you're a Jupyter user, you can `download the notebook `_ this section is based on.

The command-line interface is the easiest way to get started with *lincs*, starting with ``lincs --help``, which should output something like:

.. code:: text

    Usage: lincs [OPTIONS] COMMAND [ARGS]...

      lincs (Learn and Infer Non-Compensatory Sorting) is a set of tools for
      training and using MCDA models.

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      classification-accuracy  Compute a classification accuracy.
      classify                 Classify alternatives.
      describe                 Provide human-readable descriptions.
      generate                 Generate synthetic data.
      info                     Get information about lincs itself.
      learn                    Learn a model.
      visualize                Make graphs from data.

It's organized into sub-commands, the first one being ``generate``, to generate synthetic pseudo-random data. *lincs* is designed to handle real-world data, but it's often easier to start with synthetic data to get familiar with the tooling and the required file formats. Synthetic data is described in our :ref:`conceptual overview documentation `.

So, start by generating a classification problem with 4 criteria and 3 categories:

.. code:: shell

    lincs generate classification-problem 4 3 --output-problem problem.yml

The generated ``problem.yml`` should look like:

.. code:: yaml

    # Reproduction command (with lincs version 1.1.0): lincs generate classification-problem 4 3 --random-seed 40
    kind: classification-problem
    format_version: 1
    criteria:
      - name: Criterion 1
        value_type: real
        preference_direction: increasing
        min_value: 0
        max_value: 1
      - name: Criterion 2
        value_type: real
        preference_direction: increasing
        min_value: 0
        max_value: 1
      - name: Criterion 3
        value_type: real
        preference_direction: increasing
        min_value: 0
        max_value: 1
      - name: Criterion 4
        value_type: real
        preference_direction: increasing
        min_value: 0
        max_value: 1
    ordered_categories:
      - name: Worst category
      - name: Intermediate category 1
      - name: Best category

You can edit this file to change the criteria names, the number of categories, *etc.*, as long as you keep the same format. That format is explained in detail in our :ref:`user guide `. The concept of "classification problem" is described in our :ref:`conceptual overview documentation `.

Note that to keep this "Get started" simple, we only consider the most basic kind of criteria: real-valued, with normalized minimal and maximal values, and increasing preference direction. There are many other kinds of criteria, and you can read about them in our user guide.

If you want a human-readable explanation of the problem, you can use:

.. code:: shell

    lincs describe classification-problem problem.yml

It will tell you something like:

.. code:: text

    This is a classification problem into 3 ordered categories named "Worst category", "Intermediate category 1" and "Best category".
    The best category is "Best category" and the worst category is "Worst category".
    There are 4 classification criteria (in no particular order).
    Criterion "Criterion 1" takes real values between 0.0 and 1.0 included.
    Higher values of "Criterion 1" are known to be better.
    Criterion "Criterion 2" takes real values between 0.0 and 1.0 included.
    Higher values of "Criterion 2" are known to be better.
    Criterion "Criterion 3" takes real values between 0.0 and 1.0 included.
    Higher values of "Criterion 3" are known to be better.
    Criterion "Criterion 4" takes real values between 0.0 and 1.0 included.
    Higher values of "Criterion 4" are known to be better.

Then generate an NCS classification model:

.. code:: shell

    lincs generate classification-model problem.yml --output-model model.yml

It should look like:

.. code:: yaml

    # Reproduction command (with lincs version 1.1.0): lincs generate classification-model problem.yml --random-seed 41 --model-type mrsort
    kind: ncs-classification-model
    format_version: 1
    accepted_values:
      - kind: thresholds
        thresholds: [0.255905151, 0.676961303]
      - kind: thresholds
        thresholds: [0.0551739037, 0.324553937]
      - kind: thresholds
        thresholds: [0.162252158, 0.673279881]
      - kind: thresholds
        thresholds: [0.0526000932, 0.598555863]
    sufficient_coalitions:
      - &coalitions
        kind: weights
        criterion_weights: [0.147771254, 0.618687689, 0.406786472, 0.0960085914]
      - *coalitions

The file format, including the ``*coalitions`` YAML reference, is documented in our :ref:`user guide `.

You can visualize it using:

.. code:: shell

    lincs visualize classification-model problem.yml model.yml model.png

It should output something like:

.. image:: get-started/model.png
    :alt: Model visualization
    :align: center

The model format is quite generic, to ensure *lincs* can evolve to handle future models, so you may want to get a human-readable description of a model, including whether it's an MR-Sort or Uc-NCS model, using:

.. code:: shell

    lincs describe classification-model problem.yml model.yml

It should output something like:

.. code:: text

    This is a MR-Sort (a.k.a. 1-Uc-NCS) model: an NCS model where the sufficient coalitions are specified using the same criterion weights for all boundaries.
    The weights associated to each criterion are:
      - Criterion "Criterion 1": 0.15
      - Criterion "Criterion 2": 0.62
      - Criterion "Criterion 3": 0.41
      - Criterion "Criterion 4": 0.10
    To get into an upper category, an alternative must be better than the following profiles on a set of criteria whose weights add up to at least 1:
      - For category "Intermediate category 1": at least 0.26 on criterion "Criterion 1", at least 0.06 on criterion "Criterion 2", at least 0.16 on criterion "Criterion 3", and at least 0.05 on criterion "Criterion 4"
      - For category "Best category": at least 0.68 on criterion "Criterion 1", at least 0.32 on criterion "Criterion 2", at least 0.67 on criterion "Criterion 3", and at least 0.60 on criterion "Criterion 4"
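As a quick sanity check of how these numbers play together (using the rounded weights printed above), criteria 2 and 3 alone form a sufficient coalition:

.. math::

    w_2 + w_3 = 0.62 + 0.41 = 1.03 \ge 1

whereas criteria 1 and 4 together only reach 0.15 + 0.10 = 0.25. An alternative that is better than a boundary's profile on criteria 2 and 3 therefore gets past that boundary, no matter how badly it does on criteria 1 and 4.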
And finally generate a set of classified alternatives:

.. code:: shell

    lincs generate classified-alternatives problem.yml model.yml 1000 --output-alternatives learning-set.csv

The file format is documented in our :ref:`reference documentation `.

.. @todo(Feature, later)
    Should we provide utilities to split a set of alternatives into a training set and a testing set?
    Currently we suggest generating two sets from a synthetic model, but for real-world data it could be useful to split a single set.
    Then we'll need to think about how the ``--max-imbalance`` option interacts with that feature.

It should start with something like this, and contain 1000 alternatives:

.. code:: text

    # Reproduction command (with lincs version 1.1.0): lincs generate classified-alternatives problem.yml model.yml 1000 --random-seed 42 --misclassified-count 0
    name,"Criterion 1","Criterion 2","Criterion 3","Criterion 4",category
    "Alternative 1",0.37454012,0.796543002,0.95071429,0.183434784,"Best category"
    "Alternative 2",0.731993914,0.779690981,0.598658502,0.596850157,"Intermediate category 1"
    "Alternative 3",0.156018645,0.445832759,0.15599452,0.0999749228,"Worst category"
    "Alternative 4",0.0580836125,0.4592489,0.866176128,0.333708614,"Best category"
    "Alternative 5",0.601114988,0.14286682,0.708072603,0.650888503,"Intermediate category 1"

You can visualize its first five alternatives using:

.. code:: shell

    lincs visualize classification-model problem.yml model.yml --alternatives learning-set.csv --alternatives-count 5 alternatives.png

It should output something like:

.. image:: get-started/alternatives.png
    :alt: Alternatives visualization
    :align: center
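A side note: *lincs* doesn't currently provide a command to split a set of alternatives, because with synthetic data you can simply generate a separate testing set, as we do below. If you're instead working from a single file of real-world classified alternatives, a rough shell split (assuming the two header lines of the format shown above: one comment line and one CSV header; the output file names are arbitrary) could look like this:

.. code:: shell

    # Keep the two header lines in both output files
    head -n 2 learning-set.csv | tee train.csv > test.csv
    # Shuffle the remaining lines and deal them out, e.g. 800/200 out of 1000
    # (shuf is in GNU coreutils; on macOS you may need gshuf from Homebrew's coreutils)
    tail -n +3 learning-set.csv | shuf > shuffled.tmp
    head -n 800 shuffled.tmp >> train.csv
    tail -n +801 shuffled.tmp >> test.csv
    rm shuffled.tmp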
You now have a (synthetic) learning set. You can use it to train a new model:

.. code:: shell

    lincs learn classification-model problem.yml learning-set.csv --output-model trained-model.yml

The trained model has the same structure as the original (synthetic) model because they are both MR-Sort models for the same problem. The learning set doesn't contain all the information from the original model, and the trained model was reconstituted from this partial information, so it is numerically different:

.. code:: yaml

    # Reproduction command (with lincs version 1.1.0): lincs learn classification-model problem.yml learning-set.csv --model-type mrsort --mrsort.strategy weights-profiles-breed --mrsort.weights-profiles-breed.models-count 9 --mrsort.weights-profiles-breed.accuracy-heuristic.random-seed 43 --mrsort.weights-profiles-breed.initialization-strategy maximize-discrimination-per-criterion --mrsort.weights-profiles-breed.weights-strategy linear-program --mrsort.weights-profiles-breed.linear-program.solver glop --mrsort.weights-profiles-breed.profiles-strategy accuracy-heuristic --mrsort.weights-profiles-breed.accuracy-heuristic.processor cpu --mrsort.weights-profiles-breed.breed-strategy reinitialize-least-accurate --mrsort.weights-profiles-breed.reinitialize-least-accurate.portion 0.5 --mrsort.weights-profiles-breed.target-accuracy 1.0
    kind: ncs-classification-model
    format_version: 1
    accepted_values:
      - kind: thresholds
        thresholds: [0.339874953, 0.421424538]
      - kind: thresholds
        thresholds: [0.0556534864, 0.326433569]
      - kind: thresholds
        thresholds: [0.162616938, 0.67343241]
      - kind: thresholds
        thresholds: [0.0878681168, 0.252649099]
    sufficient_coalitions:
      - &coalitions
        kind: weights
        criterion_weights: [0, 1.01327896e-06, 0.999998987, 0]
      - *coalitions

If the training is effective, the resulting trained model should however behave very similarly to the original one. To see how close a trained model is to the original one, you can reclassify a testing set.

First, generate a testing set from the original model:

.. code:: shell

    lincs generate classified-alternatives problem.yml model.yml 3000 --output-alternatives testing-set.csv

Then ask the trained model to classify it:

.. code:: shell

    lincs classify problem.yml trained-model.yml testing-set.csv --output-alternatives reclassified-testing-set.csv

There are a few differences between the original testing set and the reclassified one:

.. code:: shell

    diff testing-set.csv reclassified-testing-set.csv

That command should show a few alternatives that are not classified the same way by the original and the trained model:

.. code:: diff

    522c522
    < "Alternative 520",0.617141366,0.326259822,0.901315808,0.460642993,"Best category"
    ---
    > "Alternative 520",0.617141366,0.326259822,0.901315808,0.460642993,"Intermediate category 1"
    615c615
    < "Alternative 613",0.547554553,0.0552174859,0.690436542,0.511019647,"Intermediate category 1"
    ---
    > "Alternative 613",0.547554553,0.0552174859,0.690436542,0.511019647,"Worst category"
    2596c2596
    < "Alternative 2594",0.234433308,0.780464768,0.162389532,0.622178912,"Intermediate category 1"
    ---
    > "Alternative 2594",0.234433308,0.780464768,0.162389532,0.622178912,"Worst category"
    2610c2610
    < "Alternative 2608",0.881479025,0.055544015,0.82936728,0.853676081,"Intermediate category 1"
    ---
    > "Alternative 2608",0.881479025,0.055544015,0.82936728,0.853676081,"Worst category"

You can also measure the classification accuracy of the trained model on that testing set:

.. code:: shell

    lincs classification-accuracy problem.yml trained-model.yml testing-set.csv

It should be close to 100%:

.. code:: text

    2996/3000
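The two outputs are consistent: each pair of ``<``/``>`` lines in the diff above is one misclassified alternative, and 3000 - 2996 = 4. If you want that count directly, a standard ``diff``-based one-liner works, since each mismatch contributes exactly one line starting with ``<``:

.. code:: shell

    # Count the alternatives classified differently by the two models; prints 4 here
    diff testing-set.csv reclassified-testing-set.csv | grep -c '^<'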
What now?
=========

If you haven't done so yet, we recommend you now read our :doc:`conceptual overview documentation `.

Keep in mind that we've only demonstrated the default learning approach in this guide. See our :doc:`user guide ` for more details.

.. @todo(Documentation, later)
    Add an intermediate document, a case study, that shows a realistic use case.

Once you're comfortable with the concepts and tooling, you can use a learning set based on real-world data and train a model that you can use to classify new real-world alternatives.
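In outline, with hypothetical file names (``my-problem.yml``, ``my-learning-set.csv``, *etc.*) and only the sub-commands demonstrated above, that workflow could look like:

.. code:: shell

    # Write your own problem file and collect classified alternatives from your
    # domain experts, in the same formats as shown above, then learn a model:
    lincs learn classification-model my-problem.yml my-learning-set.csv --output-model my-model.yml

    # Use the learned model to classify new alternatives (see the user guide for
    # how not-yet-classified alternatives are represented in the CSV format):
    lincs classify my-problem.yml my-model.yml new-alternatives.csv --output-alternatives classified-alternatives.csv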