The Python API

This document builds on our “Get started” guide and our user guide, and introduces lincs’ Python API. This API is more flexible, albeit more complex, than the command-line interface you’ve been using so far.

If you’re a Jupyter user, you can download the notebook this document is based on.

Do it again, in Python

First, let’s do exactly the same thing as in our “Get started” guide, but using the Python API.

from lincs import classification as lc

Generate a synthetic classification problem:

problem = lc.generate_problem(criteria_count=4, categories_count=3, random_seed=40)

The first difference from the command-line interface is the third argument in the call to generate_problem: the pseudo-random seed. It’s optional on the command line (the --random-seed option), but all pseudo-random seeds are mandatory in the Python API, so that you have full control over reproducibility. If you don’t care about it, you can use random.randrange(2**30) to get pseudo-random pseudo-random seeds. (No typo here: the pseudo-random seeds are themselves pseudo-random.)
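
If you do, a minimal sketch could look like this (throwaway_problem is a name of ours; it’s not used in the rest of this guide):

import random

throwaway_problem = lc.generate_problem(criteria_count=4, categories_count=3, random_seed=random.randrange(2**30))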

Generated problems are returned as Python objects of class lincs.Problem. You can dump them:

import sys
problem.dump(sys.stdout)
kind: classification-problem
format_version: 1
criteria:
  - name: Criterion 1
    value_type: real
    preference_direction: increasing
    min_value: 0
    max_value: 1
  - name: Criterion 2
    value_type: real
    preference_direction: increasing
    min_value: 0
    max_value: 1
  - name: Criterion 3
    value_type: real
    preference_direction: increasing
    min_value: 0
    max_value: 1
  - name: Criterion 4
    value_type: real
    preference_direction: increasing
    min_value: 0
    max_value: 1
ordered_categories:
  - name: Worst category
  - name: Intermediate category 1
  - name: Best category

Description functions generate a list of strings:

print("\n".join(lc.describe_problem(problem)))
This a classification problem into 3 ordered categories named "Worst category", "Intermediate category 1" and "Best category".
The best category is "Best category" and the worst category is "Worst category".
There are 4 classification criteria (in no particular order).
Criterion "Criterion 1" takes real values between 0.0 and 1.0 included.
Higher values of "Criterion 1" are known to be better.
Criterion "Criterion 2" takes real values between 0.0 and 1.0 included.
Higher values of "Criterion 2" are known to be better.
Criterion "Criterion 3" takes real values between 0.0 and 1.0 included.
Higher values of "Criterion 3" are known to be better.
Criterion "Criterion 4" takes real values between 0.0 and 1.0 included.
Higher values of "Criterion 4" are known to be better.

Generate a synthetic MR-Sort classification model, again with an explicit pseudo-random seed:

model = lc.generate_mrsort_model(problem, random_seed=41)

model.dump(problem, sys.stdout)
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [0.255905151, 0.676961303]
  - kind: thresholds
    thresholds: [0.0551739037, 0.324553937]
  - kind: thresholds
    thresholds: [0.162252158, 0.673279881]
  - kind: thresholds
    thresholds: [0.0526000932, 0.598555863]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [0.147771254, 0.618687689, 0.406786472, 0.0960085914]
  - *coalitions

Visualization functions interface with Matplotlib:

import matplotlib.pyplot as plt
axes = plt.subplots(1, 1, figsize=(6, 4), layout="constrained")[1]
lc.visualize_model(problem, model, [], axes)

[figure: visualization of the generated model]

Get the model’s description:

print("\n".join(lc.describe_model(problem, model)))
This is a MR-Sort (a.k.a. 1-Uc-NCS) model: an NCS model where the sufficient coalitions are specified using the same criterion weights for all boundaries.
The weights associated to each criterion are:
  - Criterion "Criterion 1": 0.15
  - Criterion "Criterion 2": 0.62
  - Criterion "Criterion 3": 0.41
  - Criterion "Criterion 4": 0.10
To get into an upper category, an alternative must be better than the following profiles on a set of criteria whose weights add up to at least 1:
  - For category "Intermediate category 1": at least 0.26 on criterion "Criterion 1", at least 0.06 on criterion "Criterion 2", at least 0.16 on criterion "Criterion 3", and at least 0.05 on criterion "Criterion 4"
  - For category "Best category": at least 0.68 on criterion "Criterion 1", at least 0.32 on criterion "Criterion 2", at least 0.67 on criterion "Criterion 3", and at least 0.60 on criterion "Criterion 4"

Generate a synthetic learning set (with an explicit pseudo-random seed):

learning_set = lc.generate_alternatives(problem, model, alternatives_count=1000, random_seed=42)

Dump it (in memory rather than to sys.stdout, to print only the first few lines):

import io
f = io.StringIO()
learning_set.dump(problem, f)
print("\n".join(f.getvalue().splitlines()[:6] + ['...']))
name,"Criterion 1","Criterion 2","Criterion 3","Criterion 4",category
"Alternative 1",0.37454012,0.796543002,0.95071429,0.183434784,"Best category"
"Alternative 2",0.731993914,0.779690981,0.598658502,0.596850157,"Intermediate category 1"
"Alternative 3",0.156018645,0.445832759,0.15599452,0.0999749228,"Worst category"
"Alternative 4",0.0580836125,0.4592489,0.866176128,0.333708614,"Best category"
"Alternative 5",0.601114988,0.14286682,0.708072603,0.650888503,"Intermediate category 1"
...

Visualize it:

axes = plt.subplots(1, 1, figsize=(6, 4), layout="constrained")[1]
lc.visualize_model(problem, model, learning_set.alternatives[:5], axes)

[figure: the model with the first five alternatives of the learning set]

Let’s now train a new model from this synthetic learning set. The command-line interface of lincs learn classification-model accepts quite a few options. Most of them set up the strategies used for the learning, as described further in our user guide. When using the Python API, you have to create these strategies yourself:

learning_data = lc.LearnMrsortByWeightsProfilesBreed.LearningData(problem, learning_set, models_count=9, random_seed=43)  # Shared state of the learning
profiles_initialization_strategy = lc.InitializeProfilesForProbabilisticMaximalDiscriminationPowerPerCriterion(learning_data)  # How profiles are initialized
weights_optimization_strategy = lc.OptimizeWeightsUsingGlop(learning_data)  # How MR-Sort weights are optimized (with the GLOP linear solver)
profiles_improvement_strategy = lc.ImproveProfilesWithAccuracyHeuristicOnCpu(learning_data)  # How profiles are improved (accuracy heuristic, on the CPU)
breeding_strategy = lc.ReinitializeLeastAccurate(learning_data, profiles_initialization_strategy=profiles_initialization_strategy, count=4)  # How models are bred: reinitialize the 4 least accurate ones
termination_strategy = lc.TerminateAtAccuracy(learning_data, target_accuracy=len(learning_set.alternatives))  # When to stop: here, at perfect accuracy on the learning set

Then create the learning itself:

learning = lc.LearnMrsortByWeightsProfilesBreed(
    learning_data,
    profiles_initialization_strategy,
    weights_optimization_strategy,
    profiles_improvement_strategy,
    breeding_strategy,
    termination_strategy,
)

And call its .perform method to create the learned Model object:

learned_model = learning.perform()
learned_model.dump(problem, sys.stdout)
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [0.339874953, 0.421424538]
  - kind: thresholds
    thresholds: [0.0556534864, 0.326433569]
  - kind: thresholds
    thresholds: [0.162616938, 0.67343241]
  - kind: thresholds
    thresholds: [0.0878681168, 0.252649099]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [0, 1.01327896e-06, 0.999998987, 0]
  - *coalitions

Create a testing set and classify it, taking note of the accuracy of the new model on that testing set:

testing_set = lc.generate_alternatives(problem, model, alternatives_count=3000, random_seed=44)
classification_result = lc.classify_alternatives(problem, learned_model, testing_set)
classification_result.changed, classification_result.unchanged
(4, 2996)
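
Since the testing set was generated with the original model, unchanged counts the alternatives that the learned model classifies like the original one, so the accuracy is easy to derive (a small sketch using only the values above):

accuracy = classification_result.unchanged / len(testing_set.alternatives)
print(f"{accuracy:.2%}")
99.87%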

This covers what was done in our “Get started” guide. As you can see, the Python API is more verbose, but for a good reason: it’s more powerful, as you’ll see in the next section.

Do more, with the Python API

Create classification objects

You don’t have to use our pseudo-random generation functions; you can create Problem, Model, etc. instances yourself.

Create a Problem

problem = lc.Problem(
    criteria=[
        lc.Criterion("Physics grade", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 0, 100)),
        lc.Criterion("Literature grade", lc.Criterion.EnumeratedValues(["f", "e", "d", "c", "b", "a"])),
    ],
    ordered_categories=[lc.Category("Failed"), lc.Category("Passed"), lc.Category("Congratulations")],
)

problem.dump(sys.stdout)
kind: classification-problem
format_version: 1
criteria:
  - name: Physics grade
    value_type: integer
    preference_direction: increasing
    min_value: 0
    max_value: 100
  - name: Literature grade
    value_type: enumerated
    ordered_values: [f, e, d, c, b, a]
ordered_categories:
  - name: Failed
  - name: Passed
  - name: Congratulations

You can access all their attributes in code as well:

criterion = problem.criteria[0]
criterion.name
'Physics grade'
criterion.value_type, criterion.is_real, criterion.is_integer, criterion.is_enumerated
(liblincs.ValueType.integer, False, True, False)
values = criterion.integer_values
values.preference_direction, values.is_increasing, values.is_decreasing
(liblincs.PreferenceDirection.isotone, True, False)
values.min_value, values.max_value
(0, 100)
criterion = problem.criteria[1]
criterion.name
'Literature grade'
criterion.value_type, criterion.is_real, criterion.is_integer, criterion.is_enumerated
(liblincs.ValueType.enumerated, False, False, True)
values = criterion.enumerated_values
list(values.ordered_values)
['f', 'e', 'd', 'c', 'b', 'a']
values.get_value_rank(value="a")
5

Create a Model

model = lc.Model(
    problem,
    accepted_values=[
        lc.AcceptedValues(lc.AcceptedValues.IntegerThresholds([50, 80])),
        lc.AcceptedValues(lc.AcceptedValues.EnumeratedThresholds(["c", "a"])),
    ],
    sufficient_coalitions=[
        lc.SufficientCoalitions(lc.SufficientCoalitions.Weights([0.5, 0.5])),
        lc.SufficientCoalitions(lc.SufficientCoalitions.Weights([0.5, 0.5])),
    ],
)

model.dump(problem, sys.stdout)
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [50, 80]
  - kind: thresholds
    thresholds: [c, a]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [0.5, 0.5]
  - *coalitions
accepted = model.accepted_values[0]
accepted.value_type, accepted.is_real, accepted.is_integer, accepted.is_enumerated
(liblincs.ValueType.integer, False, True, False)
accepted.kind, accepted.is_thresholds
(liblincs.Kind.thresholds, True)
list(accepted.integer_thresholds.thresholds)
[50, 80]
accepted = model.accepted_values[1]
accepted.value_type, accepted.is_real, accepted.is_integer, accepted.is_enumerated
(liblincs.ValueType.enumerated, False, False, True)
accepted.kind, accepted.is_thresholds
(liblincs.Kind.thresholds, True)
list(accepted.enumerated_thresholds.thresholds)
['c', 'a']
sufficient = model.sufficient_coalitions[0]
sufficient.kind, sufficient.is_weights, sufficient.is_roots
(liblincs.Kind.weights, True, False)
list(sufficient.weights.criterion_weights)
[0.5, 0.5]

Create (classified) Alternatives

alternatives = lc.Alternatives(problem, [
    lc.Alternative(
        "Unclassified alternative",
        [
            lc.Performance(lc.Performance.Integer(50)),
            lc.Performance(lc.Performance.Enumerated("c")),
        ],
        None
    ),
    lc.Alternative(
        "Classified alternative",
        [
            lc.Performance(lc.Performance.Integer(90)),
            lc.Performance(lc.Performance.Enumerated("a")),
        ],
        2
    ),
])
alternative = alternatives.alternatives[0]
alternative.category_index is None
True
performance = alternative.profile[0]
performance.value_type, performance.is_real, performance.is_integer, performance.is_enumerated
(liblincs.ValueType.integer, False, True, False)
performance.integer.value
50
problem.ordered_categories[alternatives.alternatives[1].category_index].name
'Congratulations'
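
As a quick sketch of what you can do with these objects: classify them with the model created above (remember that classify_alternatives mutates the alternatives in place). Given that model’s thresholds, the first alternative, with performances 50 and "c", meets the lower thresholds exactly but not the upper ones, so it should land in "Passed":

lc.classify_alternatives(problem, model, alternatives)
problem.ordered_categories[alternatives.alternatives[0].category_index].name
'Passed'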

Clone classification objects

Just use copy.deepcopy:

import copy

copied_problem = copy.deepcopy(problem)
copied_model = copy.deepcopy(model)
copied_alternatives = copy.deepcopy(alternatives)

This is especially useful if you want to identify alternatives that are classified differently by two models, because lc.classify_alternatives mutates the alternatives: clone the Alternatives, classify the copy, then iterate over the zip of both Alternatives objects, comparing their .category_index attributes.
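
Here is a minimal sketch of that recipe (the function name is ours); it returns the indexes of the alternatives on which the two models disagree:

def differently_classified(problem, model_a, model_b, alternatives):
    # Classify two deep copies, leaving the input alternatives untouched
    classified_a = copy.deepcopy(alternatives)
    classified_b = copy.deepcopy(alternatives)
    lc.classify_alternatives(problem, model_a, classified_a)
    lc.classify_alternatives(problem, model_b, classified_b)
    return [
        index
        for index, (a, b) in enumerate(zip(classified_a.alternatives, classified_b.alternatives))
        if a.category_index != b.category_index
    ]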

Serialize classification objects

In YAML and CSV, like the command line

(and the upcoming C++ API)

Classification objects have a .dump method, and their classes have a static .load method; both accept file-like objects.

We’ve used them above to print classification objects to sys.stdout. Here is an example of how to use them with actual files:

with open("problem.yml", "w") as f:
    problem.dump(f)

with open("model.yml", "w") as f:
    model.dump(problem, f)

with open("alternatives.csv", "w") as f:
    alternatives.dump(problem, f)

with open("problem.yml") as f:
    problem = lc.Problem.load(f)

with open("model.yml") as f:
    model = lc.Model.load(problem, f)

with open("alternatives.csv") as f:
    alternatives = lc.Alternatives.load(problem, f)

And here with in-memory io objects:

f = io.StringIO()
problem.dump(f)
s = f.getvalue()
print(s)
kind: classification-problem
format_version: 1
criteria:
  - name: Physics grade
    value_type: integer
    preference_direction: increasing
    min_value: 0
    max_value: 100
  - name: Literature grade
    value_type: enumerated
    ordered_values: [f, e, d, c, b, a]
ordered_categories:
  - name: Failed
  - name: Passed
  - name: Congratulations
f = io.StringIO(s)
problem = lc.Problem.load(f)

Using the Python-specific pickle module

Classification objects support pickling and unpickling out of the box. We nevertheless recommend using the YAML and CSV formats whenever possible, because they are not tied to the Python language (or to the lincs library, for that matter).

import pickle

pickle.loads(pickle.dumps(problem)).dump(sys.stdout)
kind: classification-problem
format_version: 1
criteria:
  - name: Physics grade
    value_type: integer
    preference_direction: increasing
    min_value: 0
    max_value: 100
  - name: Literature grade
    value_type: enumerated
    ordered_values: [f, e, d, c, b, a]
ordered_categories:
  - name: Failed
  - name: Passed
  - name: Congratulations

Note however that learning objects (e.g. instances of LearnMrsortByWeightsProfilesBreed) are not picklable.

Customize the model visualization

We’ve decided not to implement any customization of the visualizations produced by lincs.classification.visualize_model (and thus lincs visualize classification-model), because this would make them less robust and increase their complexity beyond the scope of lincs.

However, lincs.classification.visualize_model is written in Python, using the Python API described in this guide. This means that you can get inspiration from its source code (i.e. copy-paste it) to produce custom visualizations for your needs.
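
For example, assuming your installation ships that function as plain Python source, you can read it directly from a session:

import inspect

print(inspect.getsource(lc.visualize_model))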

Be aware that our implementation supports a few edge cases, so it might be a bit more complex than what you actually need:

  • problems and models with a single criterion are visualized in a sensible way

problem = lc.Problem([lc.Criterion("Criterion", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.increasing, 0, 100))], [lc.Category("Bad"), lc.Category("Good")])
model = lc.generate_mrsort_model(problem, 42)
axes = plt.subplots(1, 1, figsize=(6, 4), layout="constrained")[1]
lc.visualize_model(problem, model, [], axes)

[figure: a model with a single criterion]

  • all value types (real, integer and enumerated) are visualized on vertical axes

problem = lc.Problem(
    [
        lc.Criterion("Real criterion", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.increasing, 0, 100)),
        lc.Criterion("Integer criterion", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 0, 100)),
        lc.Criterion("Enumerated criterion", lc.Criterion.EnumeratedValues(["E", "D", "C", "B", "A"])),
    ],
    [lc.Category("Bad"), lc.Category("Good")]
)
model = lc.generate_mrsort_model(problem, 43)
axes = plt.subplots(1, 1, figsize=(6, 4), layout="constrained")[1]
lc.visualize_model(problem, model, [], axes)

[figure: real, integer, and enumerated criteria on vertical axes]

  • increasing and decreasing criteria are visualized with axes going up or down

problem = lc.Problem(
    [
        lc.Criterion("Increasing criterion", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.increasing, 0, 100)),
        lc.Criterion("Decreasing criterion", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.decreasing, 0, 100)),
    ],
    [lc.Category("Bad"), lc.Category("Good")]
)
model = lc.generate_mrsort_model(problem, 42)
axes = plt.subplots(1, 1, figsize=(6, 4), layout="constrained")[1]
lc.visualize_model(problem, model, [], axes)

[figure: increasing and decreasing criteria, with axes going up and down]

  • any min and max values are aligned horizontally at the top and bottom of vertical axes

problem = lc.Problem(
    [
        lc.Criterion("A", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.increasing, 0, 1)),
        lc.Criterion("B", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.increasing, -10, 10)),
    ],
    [lc.Category("Bad"), lc.Category("Good")]
)
model = lc.generate_mrsort_model(problem, 44)
axes = plt.subplots(1, 1, figsize=(6, 4), layout="constrained")[1]
lc.visualize_model(problem, model, [], axes)

[figure: min and max values aligned at the top and bottom of the axes]

  • integer criteria with any number of intervals get sensible axis labels

problem = lc.Problem(
    [
        lc.Criterion("1 interval", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 0, 1)),
        lc.Criterion("2 intervals", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 10, 12)),
        lc.Criterion("3 intervals", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, -10, -7)),
        lc.Criterion("Even number\nof intervals", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 100, 200)),
        lc.Criterion("Odd number\nof intervals", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 1000, 1201)),
    ],
    [lc.Category("Bad"), lc.Category("Good")]
)
model = lc.generate_mrsort_model(problem, 43)
axes = plt.subplots(1, 1, figsize=(6, 4), layout="constrained")[1]
lc.visualize_model(problem, model, [], axes)

[figure: axis labels for integer criteria with various numbers of intervals]

Create your own learning strategies

The “Weights, profiles, breed” learning approach uses strategies for each of its steps. lincs provides effective strategies as you’ve seen in the first part of this guide. You can also define your own strategies, to experiment with new ways to learn.

Let’s create a problem and learning set for this section:

problem = lc.Problem(
    [
        lc.Criterion("Criterion 1", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.decreasing, 0, 10)),
        lc.Criterion("Criterion 2", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 0, 100)),
        lc.Criterion("Criterion 3", lc.Criterion.EnumeratedValues(["F", "E", "D", "C", "B", "A"])),
    ],
    [lc.Category("Bad"), lc.Category("Medium"), lc.Category("Good")],
)
learning_set = lc.generate_alternatives(problem, lc.generate_mrsort_model(problem, random_seed=42), alternatives_count=1000, random_seed=43)

LearningData

First, let’s get more familiar with the LearningData. You’ve seen it briefly in the first part of this guide, but its purpose might still be quite obscure at this point.

learning_data = lc.LearnMrsortByWeightsProfilesBreed.LearningData(problem, learning_set, models_count=9, random_seed=43)

This object is shared by all strategies, which communicate by mutating it through side effects: it is the state of the WPB (“weights, profiles, breed”) algorithm. It contains two families of attributes.

Input data

The first family of attributes is about the problem and the learning set. These attributes never change. First, the counts (note that boundaries_count is always categories_count - 1, since each boundary separates two consecutive categories):

(learning_data.criteria_count, learning_data.categories_count, learning_data.boundaries_count, learning_data.alternatives_count)
(3, 3, 2, 1000)

The learning set is pre-processed in the LearningData so that the WPB algorithm doesn’t have to manipulate the different types of criterion values. In the LearningData, we keep only the ranks of the performances of each alternative in the learning set. The learning set is also destructured into a few arrays. Here are the attributes that describe this pre-processed learning set:

The number of distinct values actually seen for each criterion (including the min and max values for numerical criteria):

list(learning_data.values_counts)  # Indexed by [criterion_index]
[1002, 101, 6]

We see that the learning data knows 1002 values for the real-valued criterion. This is typical: it’s rare for two floating-point values to be exactly equal, so the 1000 alternatives have 1000 distinct values, and the min and max add two more. The learning data contains 101 values for the integer-valued criterion, meaning that the alternatives in the learning set actually cover the whole range of possible values, and similarly 6 values for the enumerated criterion.
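
A quick sanity check of that arithmetic (it holds here because all 1000 real values happen to be distinct):

assert learning_data.values_counts[0] == len(learning_set.alternatives) + 2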

For each criterion, the ranks of the performance of each alternative:

[list(v)[:10] + ['...'] for v in learning_data.performance_ranks]  # Indexed by [criterion_index][alternative_index]
[[883, 900, 753, 216, 365, 410, 302, 852, 738, 45, '...'],
 [50, 13, 17, 86, 4, 2, 25, 81, 47, 87, '...'],
 [3, 0, 1, 0, 3, 3, 2, 0, 0, 3, '...']]

The assignment of each alternative, i.e. the index of its category:

list(learning_data.assignments)[:10] + ['...']  # Indexed by [alternative_index]
[2, 2, 2, 0, 1, 1, 1, 2, 2, 0, '...']

All these attributes are iterable and allow random access through an integer index. They do not support slicing.

learning_data.assignments[0]
2
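
Since slicing is not supported, materialize the attribute with list(...) as above, or index explicitly:

[learning_data.assignments[i] for i in range(5)]
[2, 2, 2, 0, 1]
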
In-progress data

The second family of attributes is about the WPB algorithm itself.

The WPB approach operates on several “in-progress” models. Their number is constant:

learning_data.models_count
9

Each model comes with a uniform random bits generator (URBG for short):

[str(urbg)[:43] + '>' for urbg in learning_data.urbgs]  # Indexed by [model_index]
['<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>',
 '<liblincs.UniformRandomBitsGenerator object>']

This lets heuristic strategies operate in parallel on models and still produce deterministic results. URBGs are callable to get the next pseudo-random integer:

[r() for r in learning_data.urbgs]
[494155588,
 870190228,
 2450414687,
 641676058,
 661088198,
 363471047,
 1448606581,
 1348129397,
 2542538607]
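
Because everything is seeded, re-creating a LearningData with the same random_seed should reproduce the same sequences (a sketch of that check; fresh_learning_data is a name of ours):

fresh_learning_data = lc.LearnMrsortByWeightsProfilesBreed.LearningData(problem, learning_set, models_count=9, random_seed=43)
[r() for r in fresh_learning_data.urbgs]  # Expected to yield the same nine integers as the first call above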

The WPB learning approach is iterative, and the iteration_index is stored in the learning data. It starts at zero and tells you the current iteration:

learning_data.iteration_index
0

The remaining attributes are modified at each iteration, and start uninitialized. For this presentation, we’ll first run one iteration of WPB so that their values make sense.

learning_data = lc.LearnMrsortByWeightsProfilesBreed.LearningData(problem, learning_set, models_count=9, random_seed=43)
profiles_initialization_strategy = lc.InitializeProfilesForProbabilisticMaximalDiscriminationPowerPerCriterion(learning_data)
weights_optimization_strategy = lc.OptimizeWeightsUsingGlop(learning_data)
profiles_improvement_strategy = lc.ImproveProfilesWithAccuracyHeuristicOnCpu(learning_data)
breeding_strategy = lc.ReinitializeLeastAccurate(learning_data, profiles_initialization_strategy=profiles_initialization_strategy, count=4)
termination_strategy = lc.TerminateAfterIterations(learning_data, max_iterations_count=1)

lc.LearnMrsortByWeightsProfilesBreed(
    learning_data,
    profiles_initialization_strategy,
    weights_optimization_strategy,
    profiles_improvement_strategy,
    breeding_strategy,
    termination_strategy,
).perform()

assert(learning_data.iteration_index == 0)

Its model_indexes attribute contains the indexes of the in-progress models, in increasing order of accuracy:

list(learning_data.model_indexes)
[1, 5, 4, 8, 3, 6, 2, 0, 7]

Its accuracies attribute holds the accuracy of each model, stored as the count of correctly classified alternatives, between 0 and alternatives_count inclusive:

list(learning_data.accuracies)  # Indexed by model_index
[728, 593, 725, 671, 637, 609, 676, 833, 640]

If you iterate the accuracies in the order of model_indexes, they are sorted:

[learning_data.accuracies[index] for index in learning_data.model_indexes] == sorted(learning_data.accuracies)
True

Its profile_ranks attribute holds, for each in-progress model, boundary, and criterion, the rank of the boundary’s performance, on the same scale as the performance_ranks attribute:

[[list(vv) for vv in v] for v in learning_data.profile_ranks]  # Indexed by [model_index][boundary_index][criterion_index]
[[[124, 9, 0], [633, 9, 3]],
 [[492, 8, 1], [965, 8, 4]],
 [[268, 15, 0], [506, 54, 3]],
 [[230, 8, 1], [272, 26, 4]],
 [[201, 1, 1], [201, 2, 3]],
 [[86, 64, 0], [86, 99, 0]],
 [[223, 60, 2], [310, 80, 5]],
 [[235, 20, 2], [595, 20, 3]],
 [[261, 52, 0], [262, 52, 3]]]

Its weights attribute holds the MR-Sort weight of each criterion in each in-progress model.

[list(v) for v in learning_data.weights]  # Indexed by [model_index][criterion_index]
[[1.0132789611816406e-06, 0.9999989867210388, 1.0132789611816406e-06],
 [0.9999989867210388, 0.9999989867210388, 1.0132789611816406e-06],
 [1.0, 0.0, 0.9999989867210388],
 [0.5000004768371582, 0.4999994933605194, 0.4999994933605194],
 [1.0132789611816406e-06, 0.9999989867210388, 0.0],
 [0.9999989867210388, 0.0, 1.0132789611816406e-06],
 [1.0, 0.0, 0.0],
 [1.0, 0.9999989867210388, 1.0132789611816406e-06],
 [0.9999989867210388, 0.0, 0.9999989867210388]]

Finally, LearningData has two methods about the best model found so far: get_best_model and get_best_accuracy, which return that Model and its accuracy respectively.

learning_data.get_best_model().dump(problem, sys.stdout)
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [7.7909708, 4.06594753]
  - kind: thresholds
    thresholds: [20, 20]
  - kind: thresholds
    thresholds: [D, C]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [1, 0.999998987, 1.01327896e-06]
  - *coalitions
learning_data.get_best_accuracy()
833

Observer strategies

With this hopefully better understanding of LearningData, let’s write our own Observer strategy. It’s arguably the simplest kind to start with, because it’s not expected to change the LearningData, only to observe it at some key points of the learning.

To start as simply as possible, let’s reproduce the behavior of the --...-verbose flag of the command line, by creating an observer that just prints the best accuracy at each step.

Observer strategies must define two methods, called by the learning algorithm: after_iteration, called at the end of each iteration (after the breeding of models is done), and before_return, called just before the final model is returned.

class VerboseObserver(lc.LearnMrsortByWeightsProfilesBreed.Observer):
    def __init__(self, learning_data):
        super().__init__()
        self.learning_data = learning_data

    def after_iteration(self):
        print(f"Best accuracy (after {self.learning_data.iteration_index + 1} iterations): {self.learning_data.get_best_accuracy()}")

    def before_return(self):
        print(f"Final accuracy (after {self.learning_data.iteration_index + 1} iterations): {self.learning_data.get_best_accuracy()}")

We can now pass it to a learning and perform that learning to observe its effects:

profiles_initialization_strategy = lc.InitializeProfilesForProbabilisticMaximalDiscriminationPowerPerCriterion(learning_data)
weights_optimization_strategy = lc.OptimizeWeightsUsingGlop(learning_data)
profiles_improvement_strategy = lc.ImproveProfilesWithAccuracyHeuristicOnCpu(learning_data)
breeding_strategy = lc.ReinitializeLeastAccurate(learning_data, profiles_initialization_strategy=profiles_initialization_strategy, count=4)
termination_strategy = lc.TerminateAtAccuracy(learning_data, target_accuracy=len(learning_set.alternatives))
observer = VerboseObserver(learning_data)

learned_model = lc.LearnMrsortByWeightsProfilesBreed(
    learning_data,
    profiles_initialization_strategy,
    weights_optimization_strategy,
    profiles_improvement_strategy,
    breeding_strategy,
    termination_strategy,
    [observer],
).perform()
Best accuracy (after 1 iterations): 938
Best accuracy (after 2 iterations): 992
Best accuracy (after 3 iterations): 997
Best accuracy (after 4 iterations): 997
Best accuracy (after 5 iterations): 997
Best accuracy (after 6 iterations): 997
Best accuracy (after 7 iterations): 997
Best accuracy (after 8 iterations): 999
Final accuracy (after 9 iterations): 1000

Now let’s do something slightly more complicated: our goal for IntermediatesObserver is to keep track of the best model so far at different times during the learning. Specifically, we want to keep the models at iterations 1, 2, 4, 8, etc.

import math

class IntermediatesObserver(lc.LearnMrsortByWeightsProfilesBreed.Observer):
    def __init__(self, problem, learning_data):
        super().__init__()
        self.problem = problem
        self.learning_data = learning_data
        self.intermediate_models = []

    def after_iteration(self):
        if math.log2(self.learning_data.iteration_index + 1).is_integer():
            self.intermediate_models.append(self.learning_data.get_best_model())

    def before_return(self):
        pass

learning_data = lc.LearnMrsortByWeightsProfilesBreed.LearningData(problem, learning_set, models_count=9, random_seed=43)  # Do *not* reuse the same `LearningData` for several learnings
profiles_initialization_strategy = lc.InitializeProfilesForProbabilisticMaximalDiscriminationPowerPerCriterion(learning_data)
weights_optimization_strategy = lc.OptimizeWeightsUsingGlop(learning_data)
profiles_improvement_strategy = lc.ImproveProfilesWithAccuracyHeuristicOnCpu(learning_data)
breeding_strategy = lc.ReinitializeLeastAccurate(learning_data, profiles_initialization_strategy=profiles_initialization_strategy, count=4)
termination_strategy = lc.TerminateAtAccuracy(learning_data, target_accuracy=len(learning_set.alternatives))
observer = IntermediatesObserver(problem, learning_data)

final_model = lc.LearnMrsortByWeightsProfilesBreed(
    learning_data,
    profiles_initialization_strategy,
    weights_optimization_strategy,
    profiles_improvement_strategy,
    breeding_strategy,
    termination_strategy,
    [observer],
).perform()

for model in observer.intermediate_models:
    model.dump(problem, sys.stdout)
final_model.dump(problem, sys.stdout)
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [7.7909708, 4.06594753]
  - kind: thresholds
    thresholds: [20, 20]
  - kind: thresholds
    thresholds: [D, C]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [1, 0.999998987, 1.01327896e-06]
  - *coalitions
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [7.95116329, 3.89878368]
  - kind: thresholds
    thresholds: [0, 21]
  - kind: thresholds
    thresholds: [C, B]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [1, 0, 1]
  - *coalitions
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [7.96338844, 3.82566905]
  - kind: thresholds
    thresholds: [73, 84]
  - kind: thresholds
    thresholds: [B, B]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [1, 0, 1]
  - *coalitions
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [7.96338844, 3.74707603]
  - kind: thresholds
    thresholds: [94, 99]
  - kind: thresholds
    thresholds: [B, B]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [1, 0, 1]
  - *coalitions
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [7.95116329, 3.74707603]
  - kind: thresholds
    thresholds: [94, 99]
  - kind: thresholds
    thresholds: [B, B]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [1, 0, 1]
  - *coalitions

Other strategies

Coming up with new interesting strategies is far from easy, so in this guide, we’ll just describe the interfaces that the WPB approach expects from these strategies, and not even try to do anything intelligent. That part is up to you!

Each strategy must inherit from a specific abstract base class and override a specific method, as detailed below.

Profiles initialization strategies must implement .initialize_profiles(model_indexes_begin, model_indexes_end), which should initialize all profile_ranks of the models at indexes [learning_data.model_indexes[i] for i in range(model_indexes_begin, model_indexes_end)].

class SillyProfilesInitializationStrategy(lc.LearnMrsortByWeightsProfilesBreed.ProfilesInitializationStrategy):
    def __init__(self, learning_data):
        super().__init__()
        self.learning_data = learning_data

    def initialize_profiles(self, model_indexes_begin, model_indexes_end):
        print("initialize_profiles", model_indexes_begin, model_indexes_end, file=sys.stderr)
        for model_index_index in range(model_indexes_begin, model_indexes_end):
            model_index = self.learning_data.model_indexes[model_index_index]
            for boundary_index in range(self.learning_data.boundaries_count):
                for criterion_index in range(self.learning_data.criteria_count):
                    self.learning_data.profile_ranks[model_index][boundary_index][criterion_index] = 0

Weights optimization strategies must implement .optimize_weights(model_indexes_begin, model_indexes_end), which should optimize all weights of the models at indexes [learning_data.model_indexes[i] for i in range(model_indexes_begin, model_indexes_end)].

class SillyWeightsOptimizationStrategy(lc.LearnMrsortByWeightsProfilesBreed.WeightsOptimizationStrategy):
    def __init__(self, learning_data):
        super().__init__()
        self.learning_data = learning_data

    def optimize_weights(self, model_indexes_begin, model_indexes_end):
        print("optimize_weights", file=sys.stderr)
        for model_index_index in range(model_indexes_begin, model_indexes_end):
            model_index = self.learning_data.model_indexes[model_index_index]
            for criterion_index in range(self.learning_data.criteria_count):
                self.learning_data.weights[model_index][criterion_index] = 1.1 / self.learning_data.criteria_count

Profiles improvement strategies must implement .improve_profiles(model_indexes_begin, model_indexes_end), which should improve the profile_ranks of the models at indexes [learning_data.model_indexes[i] for i in range(model_indexes_begin, model_indexes_end)].

class SillyProfilesImprovementStrategy(lc.LearnMrsortByWeightsProfilesBreed.ProfilesImprovementStrategy):
    def __init__(self, learning_data):
        super().__init__()
        self.learning_data = learning_data

    def improve_profiles(self, model_indexes_begin, model_indexes_end):
        print("improve_profiles", file=sys.stderr)
        for model_index_index in range(model_indexes_begin, model_indexes_end):
            model_index = self.learning_data.model_indexes[model_index_index]
            for boundary_index in range(self.learning_data.boundaries_count):
                for criterion_index in range(self.learning_data.criteria_count):
                    rank = (boundary_index + 1) * (self.learning_data.values_counts[criterion_index] // (self.learning_data.boundaries_count + 1))
                    self.learning_data.profile_ranks[model_index][boundary_index][criterion_index] = rank

Breeding strategies must implement .breed(), which should breed all models.

class SillyBreedingStrategy(lc.LearnMrsortByWeightsProfilesBreed.BreedingStrategy):
    def __init__(self, learning_data):
        super().__init__()
        self.learning_data = learning_data

    def breed(self):
        print("breed", file=sys.stderr)
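
For a slightly less silly example, here is a hypothetical breeding strategy (not part of lincs) that overwrites the least accurate models with copies of the best one, using only the LearningData attributes shown earlier. Each overwritten model still evolves differently afterwards, since it keeps its own URBG:

class CopyBestBreedingStrategy(lc.LearnMrsortByWeightsProfilesBreed.BreedingStrategy):
    def __init__(self, learning_data, count):
        super().__init__()
        self.learning_data = learning_data
        self.count = count

    def breed(self):
        # model_indexes is sorted by increasing accuracy, so the best model is last
        best_index = self.learning_data.model_indexes[self.learning_data.models_count - 1]
        for model_index_index in range(self.count):
            model_index = self.learning_data.model_indexes[model_index_index]
            for criterion_index in range(self.learning_data.criteria_count):
                self.learning_data.weights[model_index][criterion_index] = self.learning_data.weights[best_index][criterion_index]
                for boundary_index in range(self.learning_data.boundaries_count):
                    self.learning_data.profile_ranks[model_index][boundary_index][criterion_index] = self.learning_data.profile_ranks[best_index][boundary_index][criterion_index]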

Termination strategies must implement .terminate(), which should return True to terminate the learning.

class SillyTerminationStrategy(lc.LearnMrsortByWeightsProfilesBreed.TerminationStrategy):
    def __init__(self, learning_data):
        super().__init__()
        self.learning_data = learning_data

    def terminate(self):
        print("terminate", file=sys.stderr)
        return self.learning_data.iteration_index == 3
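
A more realistic termination strategy (hypothetical, not part of lincs) could combine a wall-clock budget with the perfect-accuracy check, using only methods shown earlier:

import time

class TimeBudgetTerminationStrategy(lc.LearnMrsortByWeightsProfilesBreed.TerminationStrategy):
    def __init__(self, learning_data, budget_in_seconds):
        super().__init__()
        self.learning_data = learning_data
        self.deadline = time.monotonic() + budget_in_seconds

    def terminate(self):
        # Stop when the learning set is perfectly classified or the time budget is exhausted
        if self.learning_data.get_best_accuracy() == self.learning_data.alternatives_count:
            return True
        return time.monotonic() > self.deadline

Let’s now assemble the silly strategies and run a learning:
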
problem = lc.Problem(
    [
        lc.Criterion("Criterion 1", lc.Criterion.RealValues(lc.Criterion.PreferenceDirection.decreasing, 0, 10)),
        lc.Criterion("Criterion 2", lc.Criterion.IntegerValues(lc.Criterion.PreferenceDirection.increasing, 0, 100)),
        lc.Criterion("Criterion 3", lc.Criterion.EnumeratedValues(["F", "E", "D", "C", "B", "A"])),
    ],
    [lc.Category("Bad"), lc.Category("Medium"), lc.Category("Good")],
)
learning_set = lc.generate_alternatives(problem, lc.generate_mrsort_model(problem, random_seed=42), alternatives_count=1000, random_seed=43)

learning_data = lc.LearnMrsortByWeightsProfilesBreed.LearningData(problem, learning_set, models_count=9, random_seed=43)
profiles_initialization_strategy = SillyProfilesInitializationStrategy(learning_data)
weights_optimization_strategy = SillyWeightsOptimizationStrategy(learning_data)
profiles_improvement_strategy = SillyProfilesImprovementStrategy(learning_data)
breeding_strategy = SillyBreedingStrategy(learning_data)
termination_strategy = SillyTerminationStrategy(learning_data)

Here are the logs produced by these silly strategies during the learning:

learned_model = lc.LearnMrsortByWeightsProfilesBreed(
    learning_data,
    profiles_initialization_strategy,
    weights_optimization_strategy,
    profiles_improvement_strategy,
    breeding_strategy,
    termination_strategy,
).perform()
initialize_profiles 0 9
optimize_weights
improve_profiles
terminate
breed
optimize_weights
improve_profiles
terminate
breed
optimize_weights
improve_profiles
terminate
breed
optimize_weights
improve_profiles
terminate

And here is the silly model they produce:

learned_model.dump(problem, sys.stdout)
kind: ncs-classification-model
format_version: 1
accepted_values:
  - kind: thresholds
    thresholds: [6.94936371, 3.24692106]
  - kind: thresholds
    thresholds: [33, 66]
  - kind: thresholds
    thresholds: [D, B]
sufficient_coalitions:
  - &coalitions
    kind: weights
    criterion_weights: [0.366666675, 0.366666675, 0.366666675]
  - *coalitions