Discussion:
[Rcpp-devel] Subsetter uses int for indexing (among other issues)?
William Nolan
2018-11-07 06:45:53 UTC
Permalink
Hi all,

Longtime user and lurker here.

I got "index error" thrown by Rcpp when trying to subset a matrix with
width * height == 644764 * 3776 greater than MAXINT:

Allocating 647764 x 3776 matrix...
Catchpoint 1 (exception thrown), 0x00007ffff4baa920 in __cxa_throw () from
/usr/lib64/libstdc++.so.6
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545
"index error")
at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt,
std::forward<Args>(args)... ).c_str() );
(gdb) down
#0 0x00007ffff4baa920 in __cxa_throw () from /usr/lib64/libstdc++.so.6
(gdb) up
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545
"index error")
at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt,
std::forward<Args>(args)... ).c_str() );
(gdb) up
#2 0x00007fffe7bb75a6 in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13,
true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::check_indices
(this=0x7fffffffc450, x=0x13f9920,
n=3751, size=-1849010432) at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/:138
138 stop("index error");
(gdb) up
#3 0x00007fffe7bb6a3c in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13,
true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::get_indices
(this=0x7fffffffc450, t=...)
at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/Subsetter.h:149
149 check_indices(ptr, rhs_n, lhs_n);
(gdb) l
144 #endif
145
146 void get_indices( traits::identity< traits::int2type<INTSXP> >
t ) {
147 indices.reserve(rhs_n);
148 int* ptr = INTEGER(rhs);
149 check_indices(ptr, rhs_n, lhs_n);
150 for (int i=0; i < rhs_n; ++i) {
151 indices.push_back( rhs[i] );
152 }
153 indices_n = rhs_n;

As we can see from the stack trace and below, lhs.size() is negative when
cast to int:

(gdb) p (int)(lhs.size())
$12 = -1849010432

This is all coming from the assignment (via operator []) of the subsetting
of one matrix to another matrix's subset:

(static_cast<NumericVector&>(mat))[lhsI] =
(static_cast<NumericVector&>(signals))[rhsI];

(lhsI and rhsI are IntegerVector's)

Now, setting aside whether I *should* be doing that -- what I *do* see in
Subsetter.h (including what I understand to be the most recent version,
1.0.0 from github) is the use of int for indices all over the place in this
file, including in the member variable:

std::vector<int> indices;

Is there any reason why the indices that Subsetter uses internally
shouldn't be size_t or an equivalently capable type like R_xlen_t?
For example, Subsetter's check_indices function takes int's as arguments,
while Vector's size method returns R_xlen_t.

I'll change my code to manually copy elements via operator() using the
row/column arguments for now. Seems like Subsetter is maybe not quite
ready for prime time.
Qiang Kou
2018-11-07 07:23:02 UTC
Permalink
Hi, William,

Can you give us a small piece of code to reproduce the error? That will
ease our discussion.

Best,

KK
Post by William Nolan
Hi all,
Longtime user and lurker here.
I got "index error" thrown by Rcpp when trying to subset a matrix with
Allocating 647764 x 3776 matrix...
Catchpoint 1 (exception thrown), 0x00007ffff4baa920 in __cxa_throw () from
/usr/lib64/libstdc++.so.6
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545
"index error")
at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt,
std::forward<Args>(args)... ).c_str() );
(gdb) down
#0 0x00007ffff4baa920 in __cxa_throw () from /usr/lib64/libstdc++.so.6
(gdb) up
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545
"index error")
at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt,
std::forward<Args>(args)... ).c_str() );
(gdb) up
#2 0x00007fffe7bb75a6 in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13,
true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::check_indices
(this=0x7fffffffc450, x=0x13f9920,
n=3751, size=-1849010432) at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/:138
138 stop("index error");
(gdb) up
#3 0x00007fffe7bb6a3c in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13,
true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::get_indices
(this=0x7fffffffc450, t=...)
at
/home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/Subsetter.h:149
149 check_indices(ptr, rhs_n, lhs_n);
(gdb) l
144 #endif
145
146 void get_indices( traits::identity< traits::int2type<INTSXP> >
t ) {
147 indices.reserve(rhs_n);
148 int* ptr = INTEGER(rhs);
149 check_indices(ptr, rhs_n, lhs_n);
150 for (int i=0; i < rhs_n; ++i) {
151 indices.push_back( rhs[i] );
152 }
153 indices_n = rhs_n;
As we can see from the stack trace and below, lhs.size() is negative when
(gdb) p (int)(lhs.size())
$12 = -1849010432
This is all coming from the assignment (via operator []) of the subsetting
(static_cast<NumericVector&>(mat))[lhsI] =
(static_cast<NumericVector&>(signals))[rhsI];
(lhsI and rhsI are IntegerVector's)
Now, setting aside whether I *should* be doing that -- what I *do* see in
Subsetter.h (including what I understand to be the most recent version,
1.0.0 from github) is the use of int for indices all over the place in this
std::vector<int> indices;
Is there any reason why the indices that Subsetter uses internally
shouldn't be size_t or an equivalently capable type like R_xlen_t?
For example, Subsetter's check_indices function takes int's as arguments,
while Vector's size method returns R_xlen_t.
I'll change my code to manually copy elements via operator() using the
row/column arguments for now. Seems like Subsetter is maybe not quite
ready for prime time.
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Kevin Ushey
2018-11-07 16:29:05 UTC
Permalink
I agree we should be using R_xlen_t here. As always, PRs are welcome.

You might want to file this at https://github.com/RcppCore/Rcpp/issues
just so it doesn't get lost.

Thanks,
Kevin
Post by Qiang Kou
Hi, William,
Can you give us a small piece of code to reproduce the error? That will ease our discussion.
Best,
KK
Post by William Nolan
Hi all,
Longtime user and lurker here.
Allocating 647764 x 3776 matrix...
Catchpoint 1 (exception thrown), 0x00007ffff4baa920 in __cxa_throw () from /usr/lib64/libstdc++.so.6
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545 "index error")
at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt, std::forward<Args>(args)... ).c_str() );
(gdb) down
#0 0x00007ffff4baa920 in __cxa_throw () from /usr/lib64/libstdc++.so.6
(gdb) up
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545 "index error")
at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt, std::forward<Args>(args)... ).c_str() );
(gdb) up
#2 0x00007fffe7bb75a6 in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13, true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::check_indices (this=0x7fffffffc450, x=0x13f9920,
n=3751, size=-1849010432) at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/:138
138 stop("index error");
(gdb) up
#3 0x00007fffe7bb6a3c in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13, true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::get_indices (this=0x7fffffffc450, t=...)
at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/Subsetter.h:149
149 check_indices(ptr, rhs_n, lhs_n);
(gdb) l
144 #endif
145
146 void get_indices( traits::identity< traits::int2type<INTSXP> > t ) {
147 indices.reserve(rhs_n);
148 int* ptr = INTEGER(rhs);
149 check_indices(ptr, rhs_n, lhs_n);
150 for (int i=0; i < rhs_n; ++i) {
151 indices.push_back( rhs[i] );
152 }
153 indices_n = rhs_n;
(gdb) p (int)(lhs.size())
$12 = -1849010432
(static_cast<NumericVector&>(mat))[lhsI] = (static_cast<NumericVector&>(signals))[rhsI];
(lhsI and rhsI are IntegerVector's)
std::vector<int> indices;
Is there any reason why the indices that Subsetter uses internally shouldn't be size_t or an equivalently capable type like R_xlen_t?
For example, Subsetter's check_indices function takes int's as arguments, while Vector's size method returns R_xlen_t.
I'll change my code to manually copy elements via operator() using the row/column arguments for now. Seems like Subsetter is maybe not quite ready for prime time.
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
William Nolan
2018-11-08 00:19:29 UTC
Permalink
Thanks. Will put together a small working example and file it. I might even attempt a fix in my copious free time ;-).

Cheers,
Will
Post by Kevin Ushey
I agree we should be using R_xlen_t here. As always, PRs are welcome.
You might want to file this at https://github.com/RcppCore/Rcpp/issues
just so it doesn't get lost.
Thanks,
Kevin
Post by Qiang Kou
Hi, William,
Can you give us a small piece of code to reproduce the error? That will ease our discussion.
Best,
KK
Post by William Nolan
Hi all,
Longtime user and lurker here.
Allocating 647764 x 3776 matrix...
Catchpoint 1 (exception thrown), 0x00007ffff4baa920 in __cxa_throw () from /usr/lib64/libstdc++.so.6
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545 "index error")
at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt, std::forward<Args>(args)... ).c_str() );
(gdb) down
#0 0x00007ffff4baa920 in __cxa_throw () from /usr/lib64/libstdc++.so.6
(gdb) up
#1 0x0000000000473a05 in Rcpp::stop<>(char const*) (fmt=0x7fffe7bc3545 "index error")
at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/exceptions/cpp11/exceptions.h:52
52 throw Rcpp::exception( tfm::format(fmt, std::forward<Args>(args)... ).c_str() );
(gdb) up
#2 0x00007fffe7bb75a6 in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13, true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::check_indices (this=0x7fffffffc450, x=0x13f9920,
n=3751, size=-1849010432) at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/:138
138 stop("index error");
(gdb) up
#3 0x00007fffe7bb6a3c in Rcpp::SubsetProxy<14, Rcpp::PreserveStorage, 13, true, Rcpp::Vector<13, Rcpp::PreserveStorage> >::get_indices (this=0x7fffffffc450, t=...)
at /home/nolanw/R/x86_64-redhat-linux-gnu-library/3.5/Rcpp/include/Rcpp/vector/Subsetter.h:149
149 check_indices(ptr, rhs_n, lhs_n);
(gdb) l
144 #endif
145
146 void get_indices( traits::identity< traits::int2type<INTSXP> > t ) {
147 indices.reserve(rhs_n);
148 int* ptr = INTEGER(rhs);
149 check_indices(ptr, rhs_n, lhs_n);
150 for (int i=0; i < rhs_n; ++i) {
151 indices.push_back( rhs[i] );
152 }
153 indices_n = rhs_n;
(gdb) p (int)(lhs.size())
$12 = -1849010432
(static_cast<NumericVector&>(mat))[lhsI] = (static_cast<NumericVector&>(signals))[rhsI];
(lhsI and rhsI are IntegerVector's)
std::vector<int> indices;
Is there any reason why the indices that Subsetter uses internally shouldn't be size_t or an equivalently capable type like R_xlen_t?
For example, Subsetter's check_indices function takes int's as arguments, while Vector's size method returns R_xlen_t.
I'll change my code to manually copy elements via operator() using the row/column arguments for now. Seems like Subsetter is maybe not quite ready for prime time.
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Dirk Eddelbuettel
2018-11-08 02:30:20 UTC
Permalink
On 8 November 2018 at 09:19, William Nolan wrote:
| Thanks. Will put together a small working example and file it. I might even attempt a fix in my copious free time ;-).

Awesome :)

We'll help. Small steps and iterations work well.

Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
William Nolan
2018-11-08 05:45:46 UTC
Permalink
Thanks, guys.

I submitted issue #919 <https://github.com/RcppCore/Rcpp/issues/919> with a
small-ish test case showing the issue.

Will take a look at what's necessary to address. I see subsetting is in
the unit tests, but this is a funny one in that verifying the fix works
there would require a big chunk of memory (16GB, i.e. 2^31 * 8 bytes for a
Numeric) to be present on the tester's machine -- seems excessive.

Maybe it's enough for the fix to pass the existing unit tests, and pass the
testbed I put in the github issue. Anyway, TBD
Post by Dirk Eddelbuettel
| Thanks. Will put together a small working example and file it. I might
even attempt a fix in my copious free time ;-).
Awesome :)
We'll help. Small steps and iterations work well.
Dirk
--
William Nolan
2018-11-08 06:52:00 UTC
Permalink
Okay. Changes are done and appear to work for *just* my test case. Of
course I'd like to ensure I didn't break anything along the way.

I'd like to run the unit test suite on them before creating a PR, but am
unsure on what steps to take.

Given my local cloned git repo, with my edits to Subsetter.h, what's the
right way to recompile against it and run the unit tests there?

Many thanks,
Will
Post by William Nolan
Thanks, guys.
I submitted issue #919 <https://github.com/RcppCore/Rcpp/issues/919> with
a small-ish test case showing the issue.
Will take a look at what's necessary to address. I see subsetting is in
the unit tests, but this is a funny one in that verifying the fix works
there would require a big chunk of memory (16GB, i.e. 2^31 * 8 bytes for a
Numeric) to be present on the tester's machine -- seems excessive.
Maybe it's enough for the fix to pass the existing unit tests, and pass
the testbed I put in the github issue. Anyway, TBD
Post by Dirk Eddelbuettel
| Thanks. Will put together a small working example and file it. I might
even attempt a fix in my copious free time ;-).
Awesome :)
We'll help. Small steps and iterations work well.
Dirk
--
Qiang Kou
2018-11-08 07:28:03 UTC
Permalink
Hi, William,

I think you can send the PR first. It will trigger the unit tests on github.

Best,

KK
Post by William Nolan
Okay. Changes are done and appear to work for *just* my test case. Of
course I'd like to ensure I didn't break anything along the way.
I'd like to run the unit test suite on them before creating a PR, but am
unsure on what steps to take.
Given my local cloned git repo, with my edits to Subsetter.h, what's the
right way to recompile against it and run the unit tests there?
Many thanks,
Will
Post by William Nolan
Thanks, guys.
I submitted issue #919 <https://github.com/RcppCore/Rcpp/issues/919>
with a small-ish test case showing the issue.
Will take a look at what's necessary to address. I see subsetting is in
the unit tests, but this is a funny one in that verifying the fix works
there would require a big chunk of memory (16GB, i.e. 2^31 * 8 bytes for a
Numeric) to be present on the tester's machine -- seems excessive.
Maybe it's enough for the fix to pass the existing unit tests, and pass
the testbed I put in the github issue. Anyway, TBD
Post by Dirk Eddelbuettel
| Thanks. Will put together a small working example and file it. I might
even attempt a fix in my copious free time ;-).
Awesome :)
We'll help. Small steps and iterations work well.
Dirk
--
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Dirk Eddelbuettel
2018-11-08 12:35:30 UTC
Permalink
On 7 November 2018 at 23:28, Qiang Kou wrote:
| Hi, William,
|
| I think you can send the PR first. It will trigger the unit tests on github.

And running 'R CMD build ...' followed by 'R CMD check ...' on your modified
sources tests them locally. There are equivalent GUI buttons in RStudio you
can use and devtools has support too (that I am not familiar with).

Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
William Nolan
2018-11-09 05:14:50 UTC
Permalink
Thanks. That did the trick. After testing w/my own test harness, I made
sure the unit tests work with a baseline cloned fork.
Then integrated my changes into the fork, re-built/checked with bumped-up
version # and tests all passed again. Pushed fork to github.

I have just created PR #920 <https://github.com/RcppCore/Rcpp/pull/920>.
Thanks for all the help, and hopefully I didn't mess it up too badly on the
first try :-).

Regards,
Will
Post by Dirk Eddelbuettel
| Hi, William,
|
| I think you can send the PR first. It will trigger the unit tests on github.
And running 'R CMD build ...' followed by 'R CMD check ...' on your modified
sources tests them locally. There are equivalent GUI buttons in RStudio you
can use and devtools has support too (that I am not familiar with).
Dirk
--
Dirk Eddelbuettel
2018-11-09 12:36:24 UTC
Permalink
On 9 November 2018 at 14:14, William Nolan wrote:
| Thanks. That did the trick. After testing w/my own test harness, I made
| sure the unit tests work with a baseline cloned fork.
| Then integrated my changes into the fork, re-built/checked with bumped-up
| version # and tests all passed again. Pushed fork to github.

We mostly get emails from people who are stuck and do not work this out, so
it is refreshing to have a positive result :)

And yes, that's the way to do it. Even better than normal by first confirming
the baseline. Well done.

| I have just created PR #920 <https://github.com/RcppCore/Rcpp/pull/920>.
| Thanks for all the help, and hopefully I didn't mess it up too badly on the
| first try :-).

Looks good so far as I mentioned there, will get more scrutiny. So thanks
(and congrats :) already.

Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
Loading...