Discussion:
[Rcpp-devel] Speed of RCppEigen Cholesky decomposition on sparse matrix
Hoffman, Gabriel
2018-11-26 17:23:24 UTC
Permalink
I am developing a statistical model and I have a prototype working in R
code. I make extensive use of sparse matrices, so the R code is pretty
fast, but hoped that using RCppEigen to evaluate the log-likelihood
function could avoid a lot of memory copying and be substantially
faster. However, in a simple example I am seeing that RCppEigen is
3-5x slower than standard R code for cholesky decomposition of a sparse
matrix. This is the case on R 3.5.1 using RcppEigen_0.3.3.4.0 on both
OS X and CentOS 6.9.

Since this simple operation is so much slower it doesn't seem like
using RCppEigen is worth it in this case. Is this an issue with BLAS,
some libraries or compiler options, or is R code really the fastest
option?


library(Matrix)
library(inline)

# construct sparse matrix
#########################

# construct a matrix C that is N x N with S total entries
# set C = crossprod(X)
N = 100000
S = 1000000
i = sample(1:1000, S, replace=TRUE)
j = sample(1:1000, S, replace=TRUE)
values = runif(S, 0, .3)
X = sparseMatrix(i=i, j=j, x = values, symmetric=FALSE )

C = as(crossprod(X), "dgCMatrix")

# check sparsity fraction
S / N^2

# define RCppEigen code
CholeskyCppSparse<-'
using Rcpp::as;
using Eigen::Map;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;
using Eigen::SimplicialLLT;

// get data into RcppEigen
const MappedSparseMatrix<double> Sigma(as<MappedSparseMatrix<double> >(Sigma_in));

// compute Cholesky
typedef SimplicialLLT<SparseMatrix<double> > SpChol;
const SpChol Ch(Sigma);
'

CholSparse <- cxxfunction(signature(Sigma_in = "dgCMatrix"), CholeskyCppSparse, plugin = "RcppEigen")

# compare times
system.time(replicate(10, chol( C )))
# output:
# user system elapsed
# 0.341 0.014 0.355

system.time(replicate(10, CholSparse( C )))
# output:
# user system elapsed
# 1.639 0.046 1.687

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14

Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] inline_0.3.15 Matrix_1.2-15

loaded via a namespace (and not attached):
[1] compiler_3.5.1 RcppEigen_0.3.3.4.0 Rcpp_1.0.0
[4] grid_3.5.1 lattice_0.20-38

Changing the size of the matrix and the number of entries does not
change the relative times much

Thanks,
- Gabriel
Dirk Eddelbuettel
2018-11-26 17:47:05 UTC
Permalink
Gabriel,

On 26 November 2018 at 17:23, Hoffman, Gabriel wrote:
| I am developing a statistical model and I have a prototype working in R
| code. I make extensive use of sparse matrices, so the R code is pretty
| fast, but hoped that using RCppEigen to evaluate the log-likelihood

(lowercase c generally: Rcpp, RcppEigen, ...)

| function could avoid a lot of memory copying and be substantially
| faster. However, in a simple example I am seeing that RCppEigen is
| 3-5x slower than standard R code for cholesky decomposition of a sparse
| matrix. This is the case on R 3.5.1 using RcppEigen_0.3.3.4.0 on both
| OS X and CentOS 6.9.
|
| Since this simple operation is so much slower it doesn't seem like
| using RCppEigen is worth it in this case. Is this an issue with BLAS,
| some libraries or compiler options, or is R code really the fastest
| option?

Well Eigen (famously) does not use BLAS (for dense matrices) so that
comparison is off. But then again sparse matrices never use BLAS anyway.

Eigen had sparse matrix code earlier which is why Doug originally connected
it to R via Rcpp and RcppEigen. We now also have sparse matrix code in
RcppArmadillo (thanks chiefly to Binxiang during his Google Summer of Code)
so you could compare those two. And then there is of course the Matrix
package which had all of this first first so you could consider that (from C
/ C++ as well). That may be fastest, but the most work to get "wired up". Or
maybe not.

But in essence: "what you see is what you get". There should be no layer in
either of these packages slowing access down, and ultimately the sparse
matrix libraries are often called. I think there may be some run-time
checks. RcppArmadillo has a vignette on this too...

I actually don't use sparse matrices all that much so maybe others will pipe
in.

Dirk

| library(Matrix)
| library(inline)
|
| # construct sparse matrix
| #########################
|
| # construct a matrix C that is N x N with S total entries
| # set C = crossprod(X)
| N = 100000
| S = 1000000
| i = sample(1:1000, S, replace=TRUE)
| j = sample(1:1000, S, replace=TRUE)
| values = runif(S, 0, .3)
| X = sparseMatrix(i=i, j=j, x = values, symmetric=FALSE )
|
| C = as(crossprod(X), "dgCMatrix")
|
| # check sparsity fraction
| S / N^2
|
| # define RCppEigen code
| CholeskyCppSparse<-'
| using Rcpp::as;
| using Eigen::Map;
| using Eigen::SparseMatrix;
| using Eigen::MappedSparseMatrix;
| using Eigen::SimplicialLLT;
|
| // get data into RcppEigen
| const MappedSparseMatrix<double> Sigma(as<MappedSparseMatrix<double> >(Sigma_in));
|
| // compute Cholesky
| typedef SimplicialLLT<SparseMatrix<double> > SpChol;
| const SpChol Ch(Sigma);
| '
|
| CholSparse <- cxxfunction(signature(Sigma_in = "dgCMatrix"), CholeskyCppSparse, plugin = "RcppEigen")
|
| # compare times
| system.time(replicate(10, chol( C )))
| # output:
| # user system elapsed
| # 0.341 0.014 0.355
|
| system.time(replicate(10, CholSparse( C )))
| # output:
| # user system elapsed
| # 1.639 0.046 1.687
|
| sessionInfo()
| R version 3.5.1 (2018-07-02)
| Platform: x86_64-apple-darwin15.6.0 (64-bit)
| Running under: macOS 10.14
|
| Matrix products: default
| BLAS:
| /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
| LAPACK:
| /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
|
| locale:
| [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
|
| attached base packages:
| [1] stats graphics grDevices datasets utils methods base
|
| other attached packages:
| [1] inline_0.3.15 Matrix_1.2-15
|
| loaded via a namespace (and not attached):
| [1] compiler_3.5.1 RcppEigen_0.3.3.4.0 Rcpp_1.0.0
| [4] grid_3.5.1 lattice_0.20-38
|
| Changing the size of the matrix and the number of entries does not
| change the relative times much
|
| Thanks,
| - Gabriel
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-***@lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
Avraham Adler
2018-11-26 17:59:36 UTC
Permalink
If I recall correctly, Eigen does not use the BLAS, but has reprogrammed
the operations in C++. If you want to take advantage of a fast system BLAS,
try Armadillo, or in this case RcppArmadillo.

Thanks,

Avi
Post by Hoffman, Gabriel
I am developing a statistical model and I have a prototype working in R
code. I make extensive use of sparse matrices, so the R code is pretty
fast, but hoped that using RCppEigen to evaluate the log-likelihood
function could avoid a lot of memory copying and be substantially
faster. However, in a simple example I am seeing that RCppEigen is
3-5x slower than standard R code for cholesky decomposition of a sparse
matrix. This is the case on R 3.5.1 using RcppEigen_0.3.3.4.0 on both
OS X and CentOS 6.9.
Since this simple operation is so much slower it doesn't seem like
using RCppEigen is worth it in this case. Is this an issue with BLAS,
some libraries or compiler options, or is R code really the fastest
option?
library(Matrix)
library(inline)
# construct sparse matrix
#########################
# construct a matrix C that is N x N with S total entries
# set C = crossprod(X)
N = 100000
S = 1000000
i = sample(1:1000, S, replace=TRUE)
j = sample(1:1000, S, replace=TRUE)
values = runif(S, 0, .3)
X = sparseMatrix(i=i, j=j, x = values, symmetric=FALSE )
C = as(crossprod(X), "dgCMatrix")
# check sparsity fraction
S / N^2
# define RCppEigen code
CholeskyCppSparse<-'
using Rcpp::as;
using Eigen::Map;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;
using Eigen::SimplicialLLT;
// get data into RcppEigen
const MappedSparseMatrix<double> Sigma(as<MappedSparseMatrix<double>
Post by Hoffman, Gabriel
(Sigma_in));
// compute Cholesky
typedef SimplicialLLT<SparseMatrix<double> > SpChol;
const SpChol Ch(Sigma);
'
CholSparse <- cxxfunction(signature(Sigma_in = "dgCMatrix"),
CholeskyCppSparse, plugin = "RcppEigen")
# compare times
system.time(replicate(10, chol( C )))
# user system elapsed
# 0.341 0.014 0.355
system.time(replicate(10, CholSparse( C )))
# user system elapsed
# 1.639 0.046 1.687
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
[1] stats graphics grDevices datasets utils methods base
[1] inline_0.3.15 Matrix_1.2-15
[1] compiler_3.5.1 RcppEigen_0.3.3.4.0 Rcpp_1.0.0
[4] grid_3.5.1 lattice_0.20-38
Changing the size of the matrix and the number of entries does not
change the relative times much
Thanks,
- Gabriel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
Sent from Gmail Mobile
Yixuan Qiu
2018-11-26 18:22:50 UTC
Permalink
I guess BLAS plays a minor role here. The operation here is a sparse
Cholesky decomposition, and if I'm correct the Matrix package in R uses
SuiteSparse/CHOLMOD to do the factorization, while Eigen uses its own
implementation. The underlying numerical algorithms may be different.


Best,
Yixuan
Post by Hoffman, Gabriel
I am developing a statistical model and I have a prototype working in R
code. I make extensive use of sparse matrices, so the R code is pretty
fast, but hoped that using RCppEigen to evaluate the log-likelihood
function could avoid a lot of memory copying and be substantially
faster. However, in a simple example I am seeing that RCppEigen is
3-5x slower than standard R code for cholesky decomposition of a sparse
matrix. This is the case on R 3.5.1 using RcppEigen_0.3.3.4.0 on both
OS X and CentOS 6.9.
Since this simple operation is so much slower it doesn't seem like
using RCppEigen is worth it in this case. Is this an issue with BLAS,
some libraries or compiler options, or is R code really the fastest
option?
library(Matrix)
library(inline)
# construct sparse matrix
#########################
# construct a matrix C that is N x N with S total entries
# set C = crossprod(X)
N = 100000
S = 1000000
i = sample(1:1000, S, replace=TRUE)
j = sample(1:1000, S, replace=TRUE)
values = runif(S, 0, .3)
X = sparseMatrix(i=i, j=j, x = values, symmetric=FALSE )
C = as(crossprod(X), "dgCMatrix")
# check sparsity fraction
S / N^2
# define RCppEigen code
CholeskyCppSparse<-'
using Rcpp::as;
using Eigen::Map;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;
using Eigen::SimplicialLLT;
// get data into RcppEigen
const MappedSparseMatrix<double> Sigma(as<MappedSparseMatrix<double>
Post by Hoffman, Gabriel
(Sigma_in));
// compute Cholesky
typedef SimplicialLLT<SparseMatrix<double> > SpChol;
const SpChol Ch(Sigma);
'
CholSparse <- cxxfunction(signature(Sigma_in = "dgCMatrix"),
CholeskyCppSparse, plugin = "RcppEigen")
# compare times
system.time(replicate(10, chol( C )))
# user system elapsed
# 0.341 0.014 0.355
system.time(replicate(10, CholSparse( C )))
# user system elapsed
# 1.639 0.046 1.687
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
[1] stats graphics grDevices datasets utils methods base
[1] inline_0.3.15 Matrix_1.2-15
[1] compiler_3.5.1 RcppEigen_0.3.3.4.0 Rcpp_1.0.0
[4] grid_3.5.1 lattice_0.20-38
Changing the size of the matrix and the number of entries does not
change the relative times much
Thanks,
- Gabriel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Serguei Sokol
2018-11-27 14:33:55 UTC
Permalink
Post by Hoffman, Gabriel
I am developing a statistical model and I have a prototype working in R
code.  I make extensive use of sparse matrices, so the R code is pretty
fast, but hoped that using RCppEigen to evaluate the log-likelihood
function could avoid a lot of memory copying and be substantially
faster.  However, in a simple  example I am seeing that RCppEigen is
3-5x slower than standard R code for cholesky decomposition of a sparse
matrix.  This is the case on R 3.5.1 using RcppEigen_0.3.3.4.0 on both
OS X and CentOS 6.9.
Since this simple operation is so much slower it doesn't seem like
using RCppEigen is worth it in this case.  Is this an issue with BLAS,
some libraries or compiler options, or is R code really the fastest
option?
After few checks, it seems to be a test issue. Matrix package stores the
decomposition somewhere in attributes of the submitted matrix. So the
the repetitive calls requiring chol() decomposition are not really doing
the job. Instead, previously stored result is reused.

You can easily convince yourself by "modifying" the matrix C (and thus
invalidating previous decomposition) by doing something like "C + 0." :

system.time(replicate(10, chol( C )))
#utilisateur système écoulé
# 0.459 0.011 0.471
system.time(replicate(10, chol( C+0. )))
#utilisateur système écoulé
# 5.365 0.060 5.425
system.time(replicate(10, CholSparse( C+0. )))
#utilisateur système écoulé
# 3.627 0.035 3.665

On my machine, I have almost identical timing for CholSparse() with or
without C modification:

system.time(replicate(10, CholSparse( C )))
#utilisateur système écoulé
# 3.283 0.004 3.289
which proves that Eigen doesn't store the decomposition for future reuse
and redo the decomposition at each call on the same matrix.

Best,
Serguei.
Post by Hoffman, Gabriel
library(Matrix)
library(inline)
# construct sparse matrix
#########################
# construct a matrix C that is N x N with S total entries
# set C = crossprod(X)
N = 100000
S = 1000000
i = sample(1:1000, S, replace=TRUE)
j = sample(1:1000, S, replace=TRUE)
values = runif(S, 0, .3)
X = sparseMatrix(i=i, j=j, x = values, symmetric=FALSE )
C = as(crossprod(X), "dgCMatrix")
# check sparsity fraction
S / N^2
# define RCppEigen code
CholeskyCppSparse<-'
using Rcpp::as;
using Eigen::Map;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;
using Eigen::SimplicialLLT;
// get data into RcppEigen
const MappedSparseMatrix<double> Sigma(as<MappedSparseMatrix<double>
Post by Hoffman, Gabriel
(Sigma_in));
// compute Cholesky
typedef SimplicialLLT<SparseMatrix<double> > SpChol;
const SpChol Ch(Sigma);
'
CholSparse <- cxxfunction(signature(Sigma_in = "dgCMatrix"),
CholeskyCppSparse, plugin = "RcppEigen")
# compare times
system.time(replicate(10, chol( C )))
#   user  system elapsed
#  0.341   0.014   0.355
system.time(replicate(10, CholSparse( C )))
#   user  system elapsed
# 1.639   0.046   1.687
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14
Matrix products: default
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
[1] stats     graphics  grDevices datasets  utils     methods   base
[1] inline_0.3.15 Matrix_1.2-15
[1] compiler_3.5.1      RcppEigen_0.3.3.4.0 Rcpp_1.0.0
[4] grid_3.5.1          lattice_0.20-38
Changing the size of the matrix and the number of entries does not
change the relative times much
Thanks,
- Gabriel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Yixuan Qiu
2018-11-27 16:27:47 UTC
Permalink
This is really a brilliant observation!


Best,
Yixuan
Post by Serguei Sokol
Post by Hoffman, Gabriel
I am developing a statistical model and I have a prototype working in R
code. I make extensive use of sparse matrices, so the R code is pretty
fast, but hoped that using RCppEigen to evaluate the log-likelihood
function could avoid a lot of memory copying and be substantially
faster. However, in a simple example I am seeing that RCppEigen is
3-5x slower than standard R code for cholesky decomposition of a sparse
matrix. This is the case on R 3.5.1 using RcppEigen_0.3.3.4.0 on both
OS X and CentOS 6.9.
Since this simple operation is so much slower it doesn't seem like
using RCppEigen is worth it in this case. Is this an issue with BLAS,
some libraries or compiler options, or is R code really the fastest
option?
After few checks, it seems to be a test issue. Matrix package stores the
decomposition somewhere in attributes of the submitted matrix. So the
the repetitive calls requiring chol() decomposition are not really doing
the job. Instead, previously stored result is reused.
You can easily convince yourself by "modifying" the matrix C (and thus
system.time(replicate(10, chol( C )))
#utilisateur système écoulé
# 0.459 0.011 0.471
system.time(replicate(10, chol( C+0. )))
#utilisateur système écoulé
# 5.365 0.060 5.425
system.time(replicate(10, CholSparse( C+0. )))
#utilisateur système écoulé
# 3.627 0.035 3.665
On my machine, I have almost identical timing for CholSparse() with or
system.time(replicate(10, CholSparse( C )))
#utilisateur système écoulé
# 3.283 0.004 3.289
which proves that Eigen doesn't store the decomposition for future reuse
and redo the decomposition at each call on the same matrix.
Best,
Serguei.
Loading...