Discussion:
[Rcpp-devel] RcppParallel with string or char?
brian knaus
2018-02-27 20:43:03 UTC
Hello,

I have an Rcpp function that I would like to speed up, so RcppParallel
seemed like a solution. The issue I'm having is that my data is not
numeric. I currently use an Rcpp::StringMatrix as input and
Rcpp::CharacterMatrix as output from the function. The examples at the Rcpp
Gallery all seem to use RMatrix<double> or RVector<double>. I was able to
find the SO post below, which seems to ask my question but does not appear
to have been resolved.

https://stackoverflow.com/questions/29105531/rcppparallel-converting-charactermatrix-to-rmatrixt/29115437

Is there a thread-safe non-numeric data structure that can be used with
RcppParallel?

Thank you!
Brian Knaus
Dirk Eddelbuettel
2018-02-27 22:30:10 UTC
On 27 February 2018 at 12:43, brian knaus wrote:
| I have an Rcpp function that I would like to speed up, so RcppParallel
| seemed like a solution. The issue I'm having is that my data is not
| numeric. I currently use an Rcpp::StringMatrix as input and
| Rcpp::CharacterMatrix as output from the function. The examples at the Rcpp
| Gallery all seem to use RMatrix<double> or RVector<double>. I was able to
| find the SO post below, which seems to ask my question but does not appear
| to have been resolved.
|
| https://stackoverflow.com/questions/29105531/rcppparallel-converting-charactermatrix-to-rmatrixt/29115437
|
| Is there a thread-safe non-numeric data structure that can be used with
| RcppParallel?

You may have to craft a container similar to RMatrix:
https://github.com/RcppCore/RcppParallel/blob/master/inst/include/RcppParallel/RMatrix.h
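
A rough sketch of what such a container might look like for strings follows. Unlike RMatrix<double>, which wraps R's memory directly, this hypothetical StringMatrixCopy class (not part of RcppParallel) owns a copy of the strings, stored column-major as R stores matrices, so worker threads never touch R-managed memory. Filling it from R (for example from Rcpp::as<std::vector<std::string>>(x)) is assumed to happen on the main thread before any parallel work starts.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RMatrix-like container for strings. It owns its data
// (a column-major copy, matching R's storage order), so it can be read
// and written from worker threads without touching R's memory.
class StringMatrixCopy {
public:
    StringMatrixCopy(std::size_t nrow, std::size_t ncol)
        : nrow_(nrow), ncol_(ncol), data_(nrow * ncol) {}

    StringMatrixCopy(std::size_t nrow, std::size_t ncol,
                     std::vector<std::string> data)
        : nrow_(nrow), ncol_(ncol), data_(std::move(data)) {
        if (data_.size() != nrow_ * ncol_)
            throw std::invalid_argument("data size must equal nrow * ncol");
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t ncol() const { return ncol_; }
    std::size_t length() const { return data_.size(); }

    // Zero-based element (i, j); column-major like R: index = i + j * nrow.
    std::string& operator()(std::size_t i, std::size_t j) {
        assert(i < nrow_ && j < ncol_);
        return data_[i + j * nrow_];
    }
    const std::string& operator()(std::size_t i, std::size_t j) const {
        assert(i < nrow_ && j < ncol_);
        return data_[i + j * nrow_];
    }

    // Linear access, handy for parallelFor-style [begin, end) ranges.
    std::string& operator[](std::size_t k) { return data_[k]; }
    const std::string& operator[](std::size_t k) const { return data_[k]; }

    // Underlying storage, e.g. for converting back to R on the main thread.
    const std::vector<std::string>& data() const { return data_; }

private:
    std::size_t nrow_;
    std::size_t ncol_;
    std::vector<std::string> data_;
};
```

For example, StringMatrixCopy m(2, 3, {"a", "b", "c", "d", "e", "f"}) lays the strings out the way R's matrix(letters[1:6], nrow = 2) does, so m(0, 1) is "c" and m(1, 2) is "f".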

Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
brian knaus
2018-02-27 23:46:52 UTC
Thanks for your suggestion, Dirk! But it may be beyond my abilities. I may
have to learn more than I anticipated on this one.

Brian
Dirk Eddelbuettel
2018-02-27 23:51:22 UTC
On 27 February 2018 at 15:46, brian knaus wrote:
| Thanks for your suggestion, Dirk! But it may be beyond my abilities. I may
| have to learn more than I anticipated on this one.

Yes, it looks like a handful.

OTOH you may be able to get by with a standard C++ container, possibly at the
cost of one initial copy. Maybe std::vector<std::string> can do the trick
for you; we already have converters for those.
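
To make the "one initial copy" concrete: Rcpp's converter Rcpp::as<std::vector<std::string>>(x), applied to a character matrix, yields its elements as one flat vector in R's column-major order. The plain C++ sketch below uses hypothetical helper names (split_columns and join_columns, not part of Rcpp) to show how that flat copy maps onto a per-column std::vector<std::vector<std::string>> and back, since both layouts come up later in the thread. Either form is an independent copy of the R data.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split a flat column-major vector (R's storage order
// for an nrow x ncol matrix) into one std::vector<std::string> per column.
std::vector<std::vector<std::string>>
split_columns(const std::vector<std::string>& flat, std::size_t nrow) {
    if (nrow == 0 || flat.size() % nrow != 0)
        throw std::invalid_argument("flat.size() must be a multiple of nrow");
    std::size_t ncol = flat.size() / nrow;
    std::vector<std::vector<std::string>> cols(ncol);
    for (std::size_t j = 0; j < ncol; ++j)
        cols[j].assign(flat.begin() + j * nrow, flat.begin() + (j + 1) * nrow);
    return cols;
}

// Hypothetical inverse: concatenate equal-length columns back into
// column-major order, ready to hand back to R on the main thread.
std::vector<std::string>
join_columns(const std::vector<std::vector<std::string>>& cols) {
    std::size_t nrow = cols.empty() ? 0 : cols[0].size();
    std::vector<std::string> flat;
    flat.reserve(nrow * cols.size());
    for (const auto& col : cols) {
        if (col.size() != nrow)
            throw std::invalid_argument("all columns must have the same length");
        flat.insert(flat.end(), col.begin(), col.end());
    }
    return flat;
}
```

Which layout fits better likely depends on the work: if each task handles whole columns, the nested form is convenient; otherwise the flat vector avoids a second copy.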

Dirk
brian knaus
2018-02-28 02:38:13 UTC
Thanks for the suggestion! Unfortunately, I don't follow you. This is
likely because this is my first attempt at parallel code outside of one-off
mclapply(). My hope is to use this in a package on CRAN, so I'd like
something more portable than mclapply().

My understanding is that we need to work with 'thread-safe' data
structures. For example, the worker in the parallel matrix transform
gallery article (http://gallery.rcpp.org/articles/parallel-matrix-transform/)
uses RMatrix<double> for its input and output matrices. Part of the point
of RMatrix and RVector is to provide these 'thread-safe' data structures
so the rest of us do not need to learn the details of their complexities.
When you say I could try using std::vector<std::string>, do you mean as a
substitute for RMatrix in the worker? Or perhaps
std::vector<std::vector<std::string>> as a substitute for RMatrix?

Thank you!
Brian
Dirk Eddelbuettel
2018-02-28 02:58:33 UTC
On 27 February 2018 at 18:38, brian knaus wrote:
| Thanks for the suggestion! Unfortunately, I don't follow you. This is
| likely because this is my first attempt at parallel code outside of one-off
| mclapply(). My hope is to use this in a package on CRAN, so I'd like
| something more portable than mclapply().
|
| My understanding is that we need to work with 'thread-safe' data
| structures. For example, the worker in the parallel matrix transform
| gallery article (http://gallery.rcpp.org/articles/parallel-matrix-transform/)
| uses RMatrix<double> for its input and output matrices. Part of the point
| of RMatrix and RVector is to provide these 'thread-safe' data structures
| so the rest of us do not need to learn the details of their complexities.
| When you say I could try using std::vector<std::string>, do you mean as a
| substitute for RMatrix in the worker? Or perhaps
| std::vector<std::vector<std::string>> as a substitute for RMatrix?

RMatrix (and RVector) exist because we cannot use Rcpp::NumericMatrix (and
Rcpp::NumericVector) in worker threads: those are "proxy objects" which reuse
the R-allocated memory. That is not thread safe, as R may trigger a garbage
collection (gc) event.

Converting your R text objects into a std::vector<std::string> is
thread-safe as well, because it gives you a distinct copy. That is why I
suggested it earlier.
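
A minimal sketch of this copy-then-parallelize pattern follows. It is plain C++11 so it runs without R: the hypothetical ReverseWorker has the same shape as an RcppParallel worker (a functor with operator()(std::size_t begin, std::size_t end)), and the hypothetical parallel_for_range uses std::thread as a stand-in for RcppParallel::parallelFor. In a package you would instead derive the worker from RcppParallel::Worker and call parallelFor(0, n, worker); reversing each string is only a placeholder for the real per-element work. Each thread reads the shared input copy and writes only its own slice of the pre-sized output, so no locking is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical worker with the same call signature as an RcppParallel worker:
// it processes elements [begin, end). Input and output are plain C++ copies,
// never R objects, so no thread ever touches R-managed memory.
struct ReverseWorker {
    const std::vector<std::string>& input;
    std::vector<std::string>& output;

    ReverseWorker(const std::vector<std::string>& in, std::vector<std::string>& out)
        : input(in), output(out) {}

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t k = begin; k < end; ++k)
            output[k] = std::string(input[k].rbegin(), input[k].rend());
    }
};

// Stand-in for RcppParallel::parallelFor: split [0, n) into contiguous
// chunks and run the worker on each chunk in its own std::thread.
template <typename WorkerT>
void parallel_for_range(std::size_t n, WorkerT& worker, unsigned nthreads) {
    if (nthreads == 0) nthreads = 1;
    std::size_t chunk = (n + nthreads - 1) / nthreads;  // ceiling division
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back([&worker, begin, end] { worker(begin, end); });
    }
    for (auto& t : threads) t.join();
}

// Copy in, work in parallel, copy out. In a package the input vector would
// come from Rcpp::as<std::vector<std::string>>(x) on the main thread.
std::vector<std::string> reverse_all(const std::vector<std::string>& input,
                                     unsigned nthreads) {
    std::vector<std::string> output(input.size());  // pre-sized: each slot written once
    ReverseWorker worker(input, output);
    parallel_for_range(input.size(), worker, nthreads);
    return output;
}
```

Converting the result back to an R character matrix (for example with Rcpp::wrap plus a dim attribute) would again happen on the main thread, after all workers have finished.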

You said you had string data, so you likely need something like
std::list<std::string> or std::vector<std::string> anyway.

Does that make sense? One word of caution, though: RcppParallel and
friends are user-friendly compared to doing it by hand using OpenMP etc.,
but not quite as easy as mclapply. Maybe just write a simpler C++-based
package and call it from the parallel instances governed by mclapply?

Dirk
brian knaus
2018-02-28 16:18:35 UTC
Thanks Dirk! I think that does make sense. I'll work on trying to implement
it. I don't think I'll get to it today, but I'll try to post back to the
group if I arrive at a solution.

Thanks!
Brian