Scratching that itch from ifelse

Okay, as I wrote yesterday, ifelse is rather slow, at least compared to working in C++. As my current project uses ifelse rather a lot, I decided to write a small utility function. Since I expect to collect a number of similar functions, I made a package out of it and posted it on GitHub: https://github.com/ojessen/ojUtils

In the benchmarks below, the median speedup is between roughly 30 and 55 times, and it holds for numeric, character, integer and logical targets alike.

Feedback and corrections greatly appreciated.

Thanks to the people at Travis for providing a free CI server which works directly with GitHub. This is of course a tiny example, but it is good to know that the whole workflow can be set up in about five minutes.

And thanks to Romain François for showing some Rcpp sugar.

Some benchmarks, one per target type (numeric, character, integer, logical):

require(ojUtils)
## Loading required package: ojUtils
require(microbenchmark)
## Loading required package: microbenchmark
# Numeric target
test = sample(c(TRUE, FALSE), size = 1e5, replace = TRUE)
yes = runif(1e5)
no = runif(1e5)

microbenchmark(ifelse(test, yes, no), ifelseC(test, yes, no))
## Loading required package: Rcpp
## Unit: microseconds
##                    expr   min      lq  median      uq    max neval
##   ifelse(test, yes, no) 31925 33404.8 34065.1 58083.5  71891   100
##  ifelseC(test, yes, no)   620   647.5   721.8   817.7 209254   100

# Character target
test = sample(c(TRUE, FALSE), size = 1e5, replace = TRUE)
yes = rep("a", 1e5)
no = rep("b", 1e5)

microbenchmark(ifelse(test, yes, no), ifelseC(test, yes, no))
## Unit: milliseconds
##                    expr    min     lq median     uq   max neval
##   ifelse(test, yes, no) 57.313 58.763 59.626 72.435 87.92   100
##  ifelseC(test, yes, no)  1.747  1.837  1.926  2.749 29.56   100

# Integer target
test = sample(c(TRUE, FALSE), size = 1e5, replace = TRUE)
yes = rep(1L, 1e5)
no = rep(2L, 1e5)

microbenchmark(ifelse(test, yes, no), ifelseC(test, yes, no))
## Unit: microseconds
##                    expr     min      lq  median      uq   max neval
##   ifelse(test, yes, no) 30747.6 31868.5 32274.8 32829.0 59412   100
##  ifelseC(test, yes, no)   453.7   548.9   581.5   646.2 27575   100

# Logical target
test = sample(c(TRUE, FALSE), size = 1e5, replace = TRUE)
yes = rep(TRUE, 1e5)
no = rep(FALSE, 1e5)

microbenchmark(ifelse(test, yes, no), ifelseC(test, yes, no))
## Unit: microseconds
##                    expr     min      lq  median      uq   max neval
##   ifelse(test, yes, no) 29331.2 31167.3 31719.7 32455.3 60589   100
##  ifelseC(test, yes, no)   460.1   537.1   566.8   640.7 27118   100
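
The per-type speedup can be read off the medians in the output above. The following is only a small sketch of that arithmetic: the numbers are copied by hand from the four benchmark runs, and the names `medians`, `base_us` and `cpp_us` are made up for illustration.

```r
# Median timings copied from the microbenchmark output above, in microseconds.
# The character run was reported in milliseconds and is converted here.
medians <- data.frame(
  type    = c("numeric", "character", "integer", "logical"),
  base_us = c(34065.1, 59.626 * 1000, 32274.8, 31719.7),  # ifelse()
  cpp_us  = c(721.8,   1.926 * 1000,  581.5,   566.8)     # ifelseC()
)

# Ratio of the medians: how many times faster ifelseC was in each run
medians$speedup <- round(medians$base_us / medians$cpp_us, 1)
print(medians)
```

On these figures the ratio ranges from about 31 (character) to about 56 (logical), so the gain depends somewhat on the target type. A single run of 100 evaluations on one machine is of course only a rough indication.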

Comparing ifelse with a C++ for loop with ifs

I am currently reading a bit about Rcpp and its potential for speeding up R. I found one unexpected example in the lecture by Hadley Wickham:

require(Rcpp)
signR <- function(x) {
  if (x > 0) {
    1
  } else if (x == 0) {
    0
  } else {
    -1
  }
}

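# x is declared as double: with an int argument, Rcpp coerces the
# rnorm() draw to integer first, so e.g. 0.3 would be treated as 0
cppFunction('int signC(double x) {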
  if (x > 0) {
    return 1;
  } else if (x == 0) {
    return 0;
  } else {
    return -1;
  }
}')

require(microbenchmark)
microbenchmark(signC(rnorm(1)), signR(rnorm(1)),times = 1e5)
## Unit: microseconds
##             expr   min    lq median   uq  max neval
##  signC(rnorm(1)) 2.832 3.186  3.540 3.54 4130 1e+05
##  signR(rnorm(1)) 2.478 3.186  3.186 3.54 2641 1e+05

As expected, the two versions perform nearly identically. Now for the surprise: I changed the scalar version of signC into a vectorized version:

library(Rcpp)

cppFunction('IntegerVector signCVec(NumericVector x) {
  int n = x.size();
  IntegerVector out(n);
  for (int i = 0; i < n; i++) {
    if (x[i] > 0) {
      out[i] = 1;
    } else if (x[i] == 0) {
      out[i] = 0;
    } else {
      out[i] = -1;
    }
  }
  return out;
}')

signRVec <- function(x) {
  ifelse(x > 0, 1, ifelse(x == 0, 0, -1))
}

I would have expected these two functions to perform rather similarly as well, but see for yourself:

x = rnorm(1e6)

microbenchmark(signCVec(x), signRVec(x),times = 10)
## Unit: milliseconds
##         expr    min      lq  median      uq     max neval
##  signCVec(x)   8.07   8.103   8.311   8.761   8.952    10
##  signRVec(x) 571.91 581.988 607.664 620.322 743.546    10

Wow: a roughly 70-fold reduction in run time using Rcpp (median of about 608 ms versus about 8.3 ms).
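
A timing comparison is only meaningful if both variants return the same answer, so a quick sanity check seems worthwhile. The sketch below compares signRVec against base R's built-in sign(), which serves as an independent reference. signRVec is repeated so the snippet runs on its own; the seed and the vector length are arbitrary choices made for illustration.

```r
# Repeated from above so that this snippet is self-contained
signRVec <- function(x) {
  ifelse(x > 0, 1, ifelse(x == 0, 0, -1))
}

set.seed(1)               # arbitrary seed, for reproducibility only
x <- c(rnorm(1000), 0)    # append an exact zero to exercise the middle branch

# sign() is base R's vectorized sign function; both should agree elementwise
stopifnot(all(signRVec(x) == sign(x)))
```

The same comparison should work for signCVec once it has been compiled. Because signCVec returns an integer vector while sign() returns doubles, an elementwise `==` as above is the safer check there, rather than identical().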