[Rcpp-devel] segmentation fault when wrapping a big Eigen 2-d array
Wasey, Jack O
2018-04-09 11:22:53 UTC
Dear Rcpp developers,

I'm having trouble with a reproducible error when wrapping a big Eigen dense matrix.

Rcpp::LogicalMatrix mat_out_bool = Rcpp::wrap(result);

Thread 1 "R" received signal SIGSEGV, Segmentation fault.
coerceToLogical (v=0x7fffa1a1c010) at coerce.c:441

Where 'result' is an Eigen::MatrixXi of size 472132x30, which is the result of a sparse x dense matrix multiplication.
DenseMap result = visit_codes_sparse * map;

The backtrace begins:
#0 coerceToLogical (v=0x7fffa1a1c010) at coerce.c:441
#1 Rf_coerceVector (v=<optimized out>, type=<optimized out>) at coerce.c:1243
#2 0x00007fffdc660515 in Rcpp::internal::basic_cast<10> (x=0x7fffa1a1c010) at /usr/local/lib/R/site-library/Rcpp/include/Rcpp/r_cast.h:66
#3 0x00007fffdc65edc5 in Rcpp::internal::r_true_cast<10> (x=0x7fffa1a1c010) at /usr/local/lib/R/site-library/Rcpp/include/Rcpp/r_cast.h:95
#4 0x00007fffdc664a9d in Rcpp::r_cast<10> (x=0x7fffa1a1c010) at /usr/local/lib/R/site-library/Rcpp/include/Rcpp/r_cast.h:163
#5 0x00007fffdc666f99 in Rcpp::Matrix<10, Rcpp::PreserveStorage>::Matrix (this=0x7fffffff23a0, x=0x7fffa1a1c010)
at /usr/local/lib/R/site-library/Rcpp/include/Rcpp/vector/Matrix.h:53
#6 0x00007fffdc68603e in icd9Comorbid_alt_MatMul (icd9df=..., icd9Mapping=..., visitId="visit_id", icd9Field="code", threads=8, chunk_size=256, omp_chunk_size=1)
at comorbid_alt_MatMul.cpp:271
#7 0x00007fffdc64a597 in _icd_icd9Comorbid_alt_MatMul_try (icd9dfSEXP=0x555558590cd8, icd9MappingSEXP=0x55555839d9d0, visitIdSEXP=0x55555a25e2b8, icd9FieldSEXP=0x55555c066298,
threadsSEXP=0x5555623279f8, chunk_sizeSEXP=0x5555586122e8, omp_chunk_sizeSEXP=0x555558612258) at RcppExports.cpp:347

I would love to give a small reproducible example, but at this point, I can only trigger the problem with a big data set (which makes valgrind unreasonably slow), and only by repeating the computation, as sometimes it does succeed. The code is in the eigen-sparse branch of https://github.com/jackwasey/icd . I do not expect anyone to download and debug my code (which FWIW compiles without any warnings), but I would like to know whether, in principle, I am doing something wrong with the integer to logical 'cast' when wrapping, or if this is possibly an Rcpp or RcppEigen bug. I can induce the crash on a server with huge amounts of RAM, and on Mac and Linux platforms, with and without OpenMP enabled.

Thanks,
Jack
Dirk Eddelbuettel
2018-04-09 12:09:55 UTC
From the "well then don't do it" school: can you cast _before_ you interact
with R / Rcpp / RcppEigen? What happens when you try the "big" operation
entirely in C++ / Eigen and then "merely" transfer a known object (i.e. an int
vector/matrix)? If that works, try bool and then see if that maps to Logical.

Some code paths are simply less well trodden. Int would be the first route to try.

I may of course be entirely off the page. Early morning ...

Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
Jack Wasey
2018-04-09 13:51:21 UTC
I seem to tread in unusual code paths often, which is
why I appreciate so much your guidance. I can't reproduce the error now
I've converted from integer Eigen matrix to Rcpp::IntegerMatrix, then
within Rcpp, Integer to Logical matrices.

I naively assumed that if the compiler let me do it, it was a valid
cast. Does this mean I unearthed a bug somewhere down the line of
dependencies?

Thanks,
Jack
Dirk Eddelbuettel
2018-04-12 01:07:27 UTC
On 9 April 2018 at 09:51, Jack Wasey wrote:
| I seem to tread in unusual code paths often, which is
| why I appreciate so much your guidance. I can't reproduce the error now
| I've converted from integer Eigen matrix to Rcpp::IntegerMatrix, then
| within Rcpp, Integer to Logical matrices.

I think it is super tempting to write compact code like that. But the
template magic is a little fragile, and nesting does not seem to help. So I
just learned to be defensive and do it one step at a time.

Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
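
The "one step at a time" advice in this thread can be sketched in plain C++. This is only an illustration under stated assumptions: it uses neither Rcpp nor Eigen, and IntMatrix, LogicalMatrix and to_logical are hypothetical stand-ins, not names from either library or from the icd package. The idea is to finish the integer result first and then convert it to 0/1 values explicitly, instead of relying on an implicit coercion during the hand-off.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense, column-major integer matrix
// (Eigen::MatrixXi and R matrices are both column-major by default).
struct IntMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> data;  // rows * cols entries, stored column by column
};

// Hypothetical stand-in for an R logical matrix. R stores logicals as
// ints, so 0 plays the role of FALSE and 1 the role of TRUE here.
struct LogicalMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> data;
};

// Step 1 (not shown) fully materialises the integer result.
// Step 2 converts it explicitly, element by element: any non-zero
// count becomes 1, zero stays 0. Missing values are not handled.
LogicalMatrix to_logical(const IntMatrix& m) {
    assert(m.data.size() == m.rows * m.cols);
    LogicalMatrix out{m.rows, m.cols, std::vector<int>(m.data.size())};
    for (std::size_t i = 0; i < m.data.size(); ++i) {
        out.data[i] = (m.data[i] != 0) ? 1 : 0;
    }
    return out;
}
```

In the thread itself the working variant went through Rcpp::IntegerMatrix before converting to a Logical matrix; the sketch above only mirrors that two-step shape and says nothing about why the one-line version crashed.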