
How to install Multi Expression Programming X

Windows (64bit)
Download the program from here: mepx_win64.zip (2.37 MB). Unzip the archive and run the mepx.exe application. There is no installation kit, so remember where you saved the program in order to run it next time.

Apple macOS / OS X (64bit) (version 10.9 or newer)
Download the program from here: mepx_macos.zip (2.35 MB). It is a .zip archive. Double-click it in Finder, and it should be decompressed in the same folder as the archive. Open the program from there (right-click its icon and choose the Open command).

Ubuntu (64bit) (tested on Ubuntu 18)
Download the program from here: mepx.deb (2.23 MB) and install it with Ubuntu Software Center. If no icons are shown on the buttons, open a Terminal and run the following command, which enables icons on buttons and menus (they are disabled by default):
gsettings set org.gnome.settings-daemon.plugins.xsettings overrides "{'Gtk/ButtonImages': <1>, 'Gtk/MenuImages': <1>}"

Test data
Test projects (taken from PROBEN1 and other datasets) can be downloaded from: MEPX test projects on GitHub. Download a .xml file and press the Load project button in MEPX to load it.

User manual
Quick start
Data
Data are loaded from csv or txt files. Values must be separated by blank spaces, tabs or ;. The last value on each line is the target (expected output). Test data can be given without outputs (they may have one column less than the training data).

Currently, problems can have only 1 output (with one exception for classification problems, described below). Files containing multiple outputs must be split accordingly (for instance, the Building problem from PROBEN1 has 3 outputs: energy, hot water and cold water).

For classification problems, the last column may contain only the values 0 or 1 (for binary classification) or the values 0, 1, ..., num_classes - 1 (for multiclass classification). The output for classification problems can also be given in one-of-m format: for instance, if the problem has 5 classes, the output has 5 values, one of them set to 1 and all others set to 0. This format is loaded from files with the .dt extension.

Training data is compulsory; validation and test data are optional. Alphanumerical values can also be loaded and then converted to numerical values; several specialised buttons are available for that.
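The loading rules above can be illustrated with a minimal Python sketch. It is not MEPX's own loader: the helper names `parse_data` and `one_of_m_to_class` are hypothetical, and the sketch assumes that every non-empty line contains only numerical values (alphanumerical columns would have to be converted first).

```python
import re

# Values may be separated by blank spaces, tabs or ';'.
_SEPARATORS = re.compile(r"[ \t;]+")


def parse_data(text, has_target=True):
    """Hypothetical helper: parse data lines into (inputs, targets).

    Each non-empty line is one example. When has_target is True, the last
    value on the line is the target (expected output); test data without
    outputs can be read with has_target=False.
    """
    inputs, targets = [], []
    for line in text.splitlines():
        fields = [f for f in _SEPARATORS.split(line.strip()) if f]
        if not fields:
            continue  # skip empty lines
        values = [float(f) for f in fields]
        if has_target:
            inputs.append(values[:-1])
            targets.append(values[-1])
        else:
            inputs.append(values)
    return inputs, targets


def one_of_m_to_class(row):
    """Hypothetical helper: map a one-of-m output such as [0, 0, 1, 0, 0]
    to the index of its class (2 in this case)."""
    if sorted(row) != [0] * (len(row) - 1) + [1]:
        raise ValueError("exactly one value must be 1 and all others 0")
    return row.index(1)
```

The sketch does not detect the one-of-m format from the .dt extension; it only shows how one such output row corresponds to a class index in the 0, 1, ..., num_classes - 1 encoding.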
The user can also scale numerical values to a given interval.

Parameters
Fitness function
Fitness (the error) is computed as follows. For symbolic regression problems, the fitness is either the Mean Absolute Error (sum of absolute errors divided by the number of examples) or the Mean Squared Error (sum of squared errors divided by the number of examples). For classification problems, the fitness is computed in multiple ways depending on the problem or the strategy. However, what is reported in the results tables is the percentage of incorrectly classified data (the number of incorrectly classified examples divided by the number of examples, multiplied by 100).

Problem type
The problem types referred to in this manual are symbolic regression, binary classification and multiclass classification.

A problem with 2 classes can be solved by selecting either binary classification or multiclass classification. Binary classification uses a threshold to distinguish between the classes: values less than or equal to the threshold are classified as class 0 and the others as class 1. In the case of binary classification, the threshold is computed automatically, which is why binary classification can sometimes be slower. For multiclass classification there are 3 strategies.
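The error measures and the binary threshold rule described above can be sketched in Python as follows. The Mean Absolute Error, Mean Squared Error and misclassification percentage follow the definitions given under Fitness function. The manual does not say how MEPX computes the threshold automatically, so `best_threshold` is a hypothetical stand-in that tries each distinct output value (plus one value below all of them) and keeps the threshold with the fewest misclassified examples; all function names are illustrative.

```python
def mean_absolute_error(outputs, targets):
    # Sum of absolute errors divided by the number of examples.
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(targets)


def mean_squared_error(outputs, targets):
    # Sum of squared errors divided by the number of examples.
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def classification_error_percent(predicted, actual):
    # Number of incorrectly classified examples / number of examples * 100.
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


def classify_binary(outputs, threshold):
    # Outputs <= threshold belong to class 0; all other outputs to class 1.
    return [0 if o <= threshold else 1 for o in outputs]


def best_threshold(outputs, targets):
    """Hypothetical automatic threshold search (not MEPX's documented method).

    Tries every distinct output value, plus one value below all outputs
    (which puts every example in class 1), and returns the threshold with
    the lowest misclassification percentage together with that percentage.
    """
    candidates = sorted(set(outputs))
    candidates.insert(0, candidates[0] - 1.0)
    best_t, best_err = None, float("inf")
    for t in candidates:
        err = classification_error_percent(classify_binary(outputs, t), targets)
        if err < best_err:  # keep the first (smallest) threshold on ties
            best_t, best_err = t, err
    return best_t, best_err
```

Even this naive search evaluates the error once per candidate threshold, which gives some intuition for why automatic thresholding can make binary classification slower than a fixed decision rule.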
If Use validation set is checked, then at each generation the best individual is run against the validation set, and the best of these individuals (among those tested against the validation set) is the output of the program; it is also the one applied to the test data.

It is possible to run the optimization on a smaller subset of the training data. In that case, set Random subset size to a value smaller than the size of the training set. The subset is changed after the number of generations given by Num generations for which random subset is kept fixed.

Functions (or operators)
Classic arithmetic operators +, -, *, ... nothing new here. Do not confuse them with genetic operators! Note that trigonometric operators work with radians.

The algorithm
MEPX uses a steady-state model with multiple subpopulations. Steady-state means that, inside one subpopulation, the worst individuals are replaced with newer ones (if the newer ones are better). The user may specify the number of subpopulations. Each subpopulation runs independently of the others and, after each generation, they exchange a few individuals. The genetic operators (crossover and mutation) are classic ... nothing new here.

It is possible to specify how often variables, operators and constants should appear in a chromosome. This is done probabilistically: if you want more operators to appear, increase the operators probability. More operators means more complex expressions. The operators (functions) probability, the variables probability and the constants probability must sum to 1; the constants probability is computed automatically as 1 - (operators probability + variables probability).

Constants
To enable constants, the constants probability must be greater than 0. You cannot edit that probability directly, but constants_probability + operators_probability + variables_probability = 1. So if you set the operators and variables probabilities such that their sum is less than 1, the constants probability will be greater than 0.
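The relation between the three probabilities, and one way they can drive the choice of a gene, is sketched below in Python. The helper names are hypothetical and the code is not taken from the MEPX source; it only assumes that each newly generated gene is an operator, a variable or a constant with the configured probabilities, the constants probability being whatever remains up to 1.

```python
import random


def constants_probability(operators_probability, variables_probability):
    # Constants get the remaining probability, so that the three sum to 1.
    p = 1.0 - (operators_probability + variables_probability)
    if p < 0:
        raise ValueError("operators + variables probability must not exceed 1")
    return p


def choose_gene_type(rng, operators_probability, variables_probability):
    """Hypothetical helper: pick 'operator', 'variable' or 'constant'.

    A single uniform draw in [0, 1) is compared against the cumulative
    probabilities; whatever is left over selects a constant.
    """
    r = rng.random()  # uniform in [0, 1)
    if r < operators_probability:
        return "operator"
    if r < operators_probability + variables_probability:
        return "variable"
    return "constant"
```

With an operators probability of 0.5 and a variables probability of 0.3, roughly 20% of the generated genes are constants; if the two probabilities sum to exactly 1, constants are disabled.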
Constants can be user defined or generated by the program (over a given interval). Generated constants can be kept fixed during the entire evolution or they can also evolve. A constant is mutated by adding a random value from [-max_delta, +max_delta].

Runs
Usually multiple runs must be performed in order to compute statistics. It is also possible to specify the seed of the first run (each subsequent run starts from the previous seed + 1). Num threads runs the subpopulations on multiple CPU cores, which can increase the speed of the analysis significantly. If you have a quad core processor with hyperthreading, you may set the number of threads to 8. For best results, make sure that the number of subpopulations is a multiple of the number of threads.

Results
The following results are displayed:
Reporting problems, bugs, comments
If you have problems with this program, please save the project (by pressing the Save Project button in the main toolbar) and send it to mihai.oltean@gmail.com.