ARM-Java

Java implementation of the Adaptive Regression by Mixing algorithm

View the Project on GitHub ignacioarnaldo/arm-java

Adaptive Regression by Mixing

Adaptive Regression by Mixing (ARM) was introduced in:

Yuhong Yang: Adaptive Regression by Mixing. Journal of the American Statistical Association, 96:454, 574-588, 2001.

ARM fuses the predictions of a set of models according to an estimate of their accuracy. The fused model obtained with ARM is a linear combination of the predictions of the original models, so predictions produced by different models and algorithms can be combined. For instance, one could combine the predictions of a neural network with those of CART.
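As a compact way of reading "linear combination", the fused prediction can be written as a weighted sum. The symbols here are illustrative assumptions rather than notation taken from the paper: M is the number of models, \hat{y}_j(x) is the prediction of model j for exemplar x, and w_j is the weight ARM assigns to that model.

```latex
\hat{y}_{\mathrm{fused}}(x) = \sum_{j=0}^{M-1} w_j \, \hat{y}_j(x)
```

In the example output shown in the tutorial, the weights are non-negative and sum to 1.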

For further details, please check the overview of the ARM algorithm.

Tutorial

Step 1: Data format

Data must be provided in CSV format, where:

  1. each line corresponds to an exemplar
  2. each column contains the predictions of one model (procedure in the paper)
  3. the target/true values are placed in the last column
Any additional line or column containing labels or nominal values (for example, a header row) must be removed.
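The format rules above can be checked before running the tool. The following is a minimal sketch of such a check, not code from armfusion.jar: the class name PredictionMatrixSketch and the method parse are hypothetical, and the sketch assumes comma-separated values and silently skips blank lines. It rejects non-numeric cells (such as a header row) and rows with an inconsistent number of columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the prediction matrix; not part of armfusion.jar.
class PredictionMatrixSketch {

    // Parses CSV text into a matrix: one row per exemplar,
    // one column per model, and the target value in the last column.
    static double[][] parse(String csvText) {
        List<double[]> rows = new ArrayList<>();
        String[] lines = csvText.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue; // assumption of this sketch: blank lines are ignored
            }
            // The -1 limit keeps trailing empty cells so they are reported as errors.
            String[] cells = line.split(",", -1);
            if (cells.length < 2) {
                throw new IllegalArgumentException("Line " + (i + 1)
                        + ": expected at least one prediction column plus the target column");
            }
            if (!rows.isEmpty() && cells.length != rows.get(0).length) {
                throw new IllegalArgumentException("Line " + (i + 1)
                        + ": expected " + rows.get(0).length + " columns, found " + cells.length);
            }
            double[] row = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                try {
                    row[j] = Double.parseDouble(cells[j].trim());
                } catch (NumberFormatException e) {
                    // Labels and nominal values end up here.
                    throw new IllegalArgumentException("Line " + (i + 1) + ", column " + (j + 1)
                            + ": not a number: '" + cells[j] + "'");
                }
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        // Two exemplars, three models, target in the last column (made-up values).
        String csv = "1.0,1.2,0.9,1.1\n2.1,1.9,2.0,2.0\n";
        double[][] matrix = parse(csv);
        int models = matrix[0].length - 1;
        System.out.println(matrix.length + " exemplars, " + models + " models");
    }
}
```

A file that passes this check has a purely numeric, rectangular layout, which is what the three rules above describe.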

Step 2: Download the armfusion.jar file from here

Step 3: Running ARM

All you need to provide is the path to your prediction matrix in CSV format and the number of iterations:

$ java -jar armfusion.jar -csv path_to_preds -iters num_iters

At the end of the run, the fused model is printed. The example below shows a fused model obtained from five models:

   0.7 * y0
+ 0.0 * y1
+ 0.1 * y2
+ 0.2 * y3
+ 0.0 * y4

In this example, the first model, y0, is assigned a weight of 0.7. Models y2 and y3 receive weights of 0.1 and 0.2, respectively, while the remaining models, y1 and y4, receive zero weight.
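To use the printed model on new data, each model's prediction is multiplied by its weight and the products are summed. The sketch below illustrates this with the weights from the example above. The class FusedModelSketch, the method fuse, and the input predictions are hypothetical and are not part of armfusion.jar.

```java
// Hypothetical illustration of applying a fused model; not part of armfusion.jar.
class FusedModelSketch {

    // Weights copied from the example output above, in the order y0..y4.
    static final double[] WEIGHTS = {0.7, 0.0, 0.1, 0.2, 0.0};

    // Returns the weighted sum of the individual model predictions for one exemplar.
    static double fuse(double[] weights, double[] predictions) {
        if (weights.length != predictions.length) {
            throw new IllegalArgumentException("Expected " + weights.length
                    + " predictions, found " + predictions.length);
        }
        double fused = 0.0;
        for (int j = 0; j < weights.length; j++) {
            fused += weights[j] * predictions[j];
        }
        return fused;
    }

    public static void main(String[] args) {
        // Made-up predictions of the five models for a single exemplar.
        double[] predictions = {2.0, 5.0, 3.0, 1.0, 4.0};
        // 0.7 * 2.0 + 0.1 * 3.0 + 0.2 * 1.0, i.e. approximately 1.9.
        System.out.println("Fused prediction: " + fuse(WEIGHTS, predictions));
    }
}
```

Models with zero weight, here y1 and y4, do not influence the fused prediction, so their outputs need not be computed when the fused model is deployed.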

Publications

This implementation of ARM has been used in the following publications:

I. Arnaldo, K. Veeramachaneni, U.-M. O'Reilly: Flash: A GP-GPU Ensemble Learning System for Handling Large Datasets. Genetic Programming, Lecture Notes in Computer Science, Volume 8599, pp. 13-24, 2014.

K. Veeramachaneni, I. Arnaldo, O. Derby, U.-M. O'Reilly: FlexGP: Cloud-Based Ensemble Learning with Genetic Programming for Large Regression Problems. Journal of Grid Computing, November 2014.

Authors and Contributors

This project is developed by Ignacio Arnaldo (@ignacioarnaldo) of the Any-Scale Learning For All (ALFA) group at MIT. Contact us by email at iarnaldo@mit.edu.

ALFA