Discussion:
[Rcpp-devel] Populate a Matrix in Parallel (Text Version)
Joseph Wood
2018-09-26 16:17:40 UTC
Permalink
As the subject states, this question is regarding populating a matrix
in parallel. I am currently reading "C++ Concurrency in Action:
Practical Multithreading" as I'd like to take some algorithms I have
to the next level. I have looked at the RcppParallel package, but the
features offered there do not seem to apply to this situation. I will
explain my reasoning further down. First, we set up our scenario.

1. We have an empty matrix with the number of rows equal to numRows
2. We are able to generate the ith row of the matrix at will
3. Our underlying subroutine populates the matrix from any
particular starting point one by one.

This scenario easily extends to a parallel setup. We have each entry
in our matrix being visited exactly one time by only one thread. The
idea is that if we have m threads, we can split up the work so that
each thread is responsible for populating roughly (numRows / m) number
of rows of our matrix.

Here is a simplified example that represents my real situation (In my
project I don't have the cpp11 plugin as I take care of this in
Makevars file with CXX_STD = CXX11):

#include <Rcpp.h>
#include <thread>

// [[Rcpp::plugins(cpp11)]]

int myFactorial(int n) {
int res = 1;
for (int i = 1; i <= n; ++i)
res *= i;

return res;
}

std::vector<int> nthPerm(int n, int index) {
int temp = myFactorial(n);
std::vector<int> indexVec(n), res(n);
std::iota(indexVec.begin(), indexVec.end(), 1);

for (int k = 0, r = n; k < n; ++k, --r) {
temp /= r;
int j = (int) index / temp;
res[k] = indexVec[j];
index -= (temp * j);
indexVec.erase(indexVec.begin() + j);
}

return res;
}

// Simplified version for demonstration only. The real subroutines
// that carry out this task are more complicated
void PopulateMatrix(Rcpp::IntegerMatrix permuteMatrix,
std::vector<int> z, int count, int nRows, int n) {
for (; count < nRows; ++count) {
for (int j = 0; j < n; ++j)
permuteMatrix(count, j) = z[j];

std::next_permutation(z.begin(), z.end());
}
}

// [[Rcpp::export]]
SEXP ParallelPerms(int n, int userThrds = 0) {
int nThreads = std::thread::hardware_concurrency() - 1;
std::vector<std::thread> myThreads;
nThreads = (userThrds > 0) ? std::min(userThrds, nThreads) : nThreads;

int step = 0, numRows = myFactorial(n);
int stepSize = numRows / nThreads;
int nextStep = stepSize;
std::vector<int> z(n);
std::iota(z.begin(), z.end(), 1);

Rcpp::IntegerMatrix myMat = Rcpp::no_init_matrix(numRows, n);

for (std::size_t j = 0; j < (nThreads - 1); ++j) {
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z,
step, nextStep, n);
step += stepSize;
nextStep += stepSize;
z = nthPerm(n, step);
}

// Guarantee that we get all the rows. N.B. We are passing numRows
// instead of nextStep... they aren't guaranteed to be the same
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z, step,
numRows, n);

for (auto& thr: myThreads)
thr.join();

return myMat;
}

I have read that Rcpp objects are not thread safe as they make
unpredictable calls to the garbage collector
(https://github.com/RcppCore/RcppParallel/issues/17), however Romain
Francois states:

"As soon as you don't use references for Rcpp types, you are not safe.
If you use references, it all depends on what you do with them."

I have a couple of questions regarding this.

My initial thought was I thought Rcpp objects were passed by reference
by default. Secondly, if this isn't the case, is it as simple as
adding an ampersand to all of my Rcpp objects in the function
parameters?

The project that I'm implementing this in (RcppAlgos) builds fine on
win-builder as well as all of the various rhub checks with no errors
(even check_with_sanitizers() and check_with_valgrind()). When I
submitted v 2.1.0 to CRAN, there were sporadic build errors on some of
the linux platforms. By sporadic, I mean sometimes it passes, and
other times it would fail with the error : segfault from C stack
overflow. The current version (v 2.2.0) still has the argument for
parallel computing, but it doesn't do anything. It is only there for
backwards compatibility.

When I initially submitted, it should be noted that I did not have my
matrices wrapped with std::ref in the call to create new threads, so
I'm not sure what effect this will have on those builds if I were to
submit to CRAN now. I will say that after I saw the errors on the
CRAN checks, I immediately sought to replicate them. I was successful
in extreme situations. For example, if I called the parallel
implementation thousands of times I could generate the error. I would
put these extreme tests in my tests folder and check them in a unit
test environment so as not to crash my r session.

I then sought out a more robust solution to my situation and found
that thread function arguments are by default pass by value, and if
you have a particular variable that is expected to be passed by
reference, then you must add std::ref (See
https://en.cppreference.com/w/cpp/thread/thread/thread). I have done
this and have noticed that I can't generate the issues with the
extreme tests above. HOWEVER, If I call it say 50000 times 10 times in
a row, I can sometimes generate an issue (not necessarily the segfault
above.. most of the time it is a stack imbalance warning... the
warning you get when the number of UNPROTECTS doesn't match the number
of PROTECTS in the R C API).

I then revisited the RcppParallel package to see if there were any
solutions there. I know that RcppParallel implements fully thread safe
objects like RMatrix<T>, however I can't see a way to set up a Worker
to populate my matrix. I guess the issue I see is that if you look at
my set-up above, we first get the starter vector with a call to
nthPerm from the parent function, then I pass this to PopulateMatrix
which proceeds to populate rows of my matrix for a given number of
rows. With the examples I've seen, there is no dependency on an
external function to get a specific entry point. I thought about
bypassing the RcppParallel::Worker altogether and simply use the
RMatrix<T> object, however I don't think this is how that object is to
be utilized. For example, I have not seen any RcppParallel examples
that preallocate an RMatrix<T> object.

My question is, is there a way to make my current set up thread-safe?
If not, is it possible at all to simply populate an object in parallel
safely?

Alternatively, if my question seems a bit naive, a nudge in the right
direction would be greatly appreciated. I don't mind going back to
square one, if need be.

Thanks,
Joseph Wood
Ralf Stubner
2018-09-26 16:37:28 UTC
Permalink
I haven’t read your message in detail, but the second example from here might be helpful: https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html

There a matrix is filled in parallel by splitting the columns among the threads. Splitting by columns is helpful since matrices in R are stored that way. You could use the same method to obtain the transpose of your matrix.

Greetings
Ralf

—
Ralf Stubner
Senior Software Engineer / Trainer

daqana GmbH
Dortustraße 48
14467 Potsdam

T: +49 331 23 70 81 66
F: +49 331 23 70 81 67
M: +49 162 20 91 196
Mail: ***@r-institute.com

Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr: DE300072622
GeschÀftsfÌhrer: Prof. Dr. Dr. Karl-Kuno Kunze
Post by Joseph Wood
As the subject states, this question is regarding populating a matrix
Practical Multithreading" as I'd like to take some algorithms I have
to the next level. I have looked at the RcppParallel package, but the
features offered there do not seem to apply to this situation. I will
explain my reasoning further down. First, we set up our scenario.
1. We have an empty matrix with the number of rows equal to numRows
2. We are able to generate the ith row of the matrix at will
3. Our underlying subroutine populates the matrix from any
particular starting point one by one.
This scenario easily extends to a parallel setup. We have each entry
in our matrix being visited exactly one time by only one thread. The
idea is that if we have m threads, we can split up the work so that
each thread is responsible for populating roughly (numRows / m) number
of rows of our matrix.
Here is a simplified example that represents my real situation (In my
project I don't have the cpp11 plugin as I take care of this in
#include <Rcpp.h>
#include <thread>
// [[Rcpp::plugins(cpp11)]]
int myFactorial(int n) {
int res = 1;
for (int i = 1; i <= n; ++i)
res *= i;
return res;
}
std::vector<int> nthPerm(int n, int index) {
int temp = myFactorial(n);
std::vector<int> indexVec(n), res(n);
std::iota(indexVec.begin(), indexVec.end(), 1);
for (int k = 0, r = n; k < n; ++k, --r) {
temp /= r;
int j = (int) index / temp;
res[k] = indexVec[j];
index -= (temp * j);
indexVec.erase(indexVec.begin() + j);
}
return res;
}
// Simplified version for demonstration only. The real subroutines
// that carry out this task are more complicated
void PopulateMatrix(Rcpp::IntegerMatrix permuteMatrix,
std::vector<int> z, int count, int nRows, int n) {
for (; count < nRows; ++count) {
for (int j = 0; j < n; ++j)
permuteMatrix(count, j) = z[j];
std::next_permutation(z.begin(), z.end());
}
}
// [[Rcpp::export]]
SEXP ParallelPerms(int n, int userThrds = 0) {
int nThreads = std::thread::hardware_concurrency() - 1;
std::vector<std::thread> myThreads;
nThreads = (userThrds > 0) ? std::min(userThrds, nThreads) : nThreads;
int step = 0, numRows = myFactorial(n);
int stepSize = numRows / nThreads;
int nextStep = stepSize;
std::vector<int> z(n);
std::iota(z.begin(), z.end(), 1);
Rcpp::IntegerMatrix myMat = Rcpp::no_init_matrix(numRows, n);
for (std::size_t j = 0; j < (nThreads - 1); ++j) {
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z,
step, nextStep, n);
step += stepSize;
nextStep += stepSize;
z = nthPerm(n, step);
}
// Guarantee that we get all the rows. N.B. We are passing numRows
// instead of nextStep... they aren't guaranteed to be the same
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z, step,
numRows, n);
for (auto& thr: myThreads)
thr.join();
return myMat;
}
I have read that Rcpp objects are not thread safe as they make
unpredictable calls to the garbage collector
(https://github.com/RcppCore/RcppParallel/issues/17), however Romain
"As soon as you don't use references for Rcpp types, you are not safe.
If you use references, it all depends on what you do with them."
I have a couple of questions regarding this.
My initial thought was I thought Rcpp objects were passed by reference
by default. Secondly, if this isn't the case, is it as simple as
adding an ampersand to all of my Rcpp objects in the function
parameters?
The project that I'm implementing this in (RcppAlgos) builds fine on
win-builder as well as all of the various rhub checks with no errors
(even check_with_sanitizers() and check_with_valgrind()). When I
submitted v 2.1.0 to CRAN, there were sporadic build errors on some of
the linux platforms. By sporadic, I mean sometimes it passes, and
other times it would fail with the error : segfault from C stack
overflow. The current version (v 2.2.0) still has the argument for
parallel computing, but it doesn't do anything. It is only there for
backwards compatibility.
When I initially submitted, it should be noted that I did not have my
matrices wrapped with std::ref in the call to create new threads, so
I'm not sure what effect this will have on those builds if I were to
submit to CRAN now. I will say that after I saw the errors on the
CRAN checks, I immediately sought to replicate them. I was successful
in extreme situations. For example, if I called the parallel
implementation thousands of times I could generate the error. I would
put these extreme tests in my tests folder and check them in a unit
test environment so as not to crash my r session.
I then sought out a more robust solution to my situation and found
that thread function arguments are by default pass by value, and if
you have a particular variable that is expected to be passed by
reference, then you must add std::ref (See
https://en.cppreference.com/w/cpp/thread/thread/thread). I have done
this and have noticed that I can't generate the issues with the
extreme tests above. HOWEVER, If I call it say 50000 times 10 times in
a row, I can sometimes generate an issue (not necessarily the segfault
above.. most of the time it is a stack imbalance warning... the
warning you get when the number of UNPROTECTS doesn't match the number
of PROTECTS in the R C API).
I then revisited the RcppParallel package to see if there were any
solutions there. I know that RcppParallel implements fully thread safe
objects like RMatrix<T>, however I can't see a way to set up a Worker
to populate my matrix. I guess the issue I see is that if you look at
my set-up above, we first get the starter vector with a call to
nthPerm from the parent function, then I pass this to PopulateMatrix
which proceeds to populate rows of my matrix for a given number of
rows. With the examples I've seen, there is no dependency on an
external function to get a specific entry point. I thought about
bypassing the RcppParallel::Worker altogether and simply use the
RMatrix<T> object, however I don't think this is how that object is to
be utilized. For example, I have not seen any RcppParallel examples
that preallocate an RMatrix<T> object.
My question is, is there a way to make my current set up thread-safe?
If not, is it possible at all to simply populate an object in parallel
safely?
Alternatively, if my question seems a bit naive, a nudge in the right
direction would be greatly appreciated. I don't mind going back to
square one, if need be.
Thanks,
Joseph Wood
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Alexis Sarda
2018-09-26 17:05:26 UTC
Permalink
I'm not sure I understand the problem. You've found RcppParallel, have you
looked at the examples? There's one with handling matrices here:

http://gallery.rcpp.org/articles/parallel-matrix-transform/

In the call to parallelFor() you can change the values you iterate over and
change your logic based on it.
In my package I do a lot of cross-distance matrix calculations with
RcppParallel, see e.g.:

https://github.com/asardaes/dtwclust/blob/master/src/distmat/fillers.cpp
https://github.com/asardaes/dtwclust/blob/master/CONTRIBUTING.md

Regards,
Alexis.
Post by Ralf Stubner
I haven’t read your message in detail, but the second example from here
https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html
There a matrix is filled in parallel by splitting the columns among the
threads. Splitting by columns is helpful since matrices in R are stored
that way. You could use the same method to obtain the transpose of your
matrix.
Greetings
Ralf
—
Ralf Stubner
Senior Software Engineer / Trainer
daqana GmbH
Dortustraße 48
14467 Potsdam
T: +49 331 23 70 81 66
F: +49 331 23 70 81 67
M: +49 162 20 91 196 <+49%20162%2020%2091%20196>
Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr: DE300072622
GeschÀftsfÌhrer: Prof. Dr. Dr. Karl-Kuno Kunze
As the subject states, this question is regarding populating a matrix
Practical Multithreading" as I'd like to take some algorithms I have
to the next level. I have looked at the RcppParallel package, but the
features offered there do not seem to apply to this situation. I will
explain my reasoning further down. First, we set up our scenario.
1. We have an empty matrix with the number of rows equal to numRows
2. We are able to generate the ith row of the matrix at will
3. Our underlying subroutine populates the matrix from any
particular starting point one by one.
This scenario easily extends to a parallel setup. We have each entry
in our matrix being visited exactly one time by only one thread. The
idea is that if we have m threads, we can split up the work so that
each thread is responsible for populating roughly (numRows / m) number
of rows of our matrix.
Here is a simplified example that represents my real situation (In my
project I don't have the cpp11 plugin as I take care of this in
#include <Rcpp.h>
#include <thread>
// [[Rcpp::plugins(cpp11)]]
int myFactorial(int n) {
int res = 1;
for (int i = 1; i <= n; ++i)
res *= i;
return res;
}
std::vector<int> nthPerm(int n, int index) {
int temp = myFactorial(n);
std::vector<int> indexVec(n), res(n);
std::iota(indexVec.begin(), indexVec.end(), 1);
for (int k = 0, r = n; k < n; ++k, --r) {
temp /= r;
int j = (int) index / temp;
res[k] = indexVec[j];
index -= (temp * j);
indexVec.erase(indexVec.begin() + j);
}
return res;
}
// Simplified version for demonstration only. The real subroutines
// that carry out this task are more complicated
void PopulateMatrix(Rcpp::IntegerMatrix permuteMatrix,
std::vector<int> z, int count, int nRows, int n) {
for (; count < nRows; ++count) {
for (int j = 0; j < n; ++j)
permuteMatrix(count, j) = z[j];
std::next_permutation(z.begin(), z.end());
}
}
// [[Rcpp::export]]
SEXP ParallelPerms(int n, int userThrds = 0) {
int nThreads = std::thread::hardware_concurrency() - 1;
std::vector<std::thread> myThreads;
nThreads = (userThrds > 0) ? std::min(userThrds, nThreads) : nThreads;
int step = 0, numRows = myFactorial(n);
int stepSize = numRows / nThreads;
int nextStep = stepSize;
std::vector<int> z(n);
std::iota(z.begin(), z.end(), 1);
Rcpp::IntegerMatrix myMat = Rcpp::no_init_matrix(numRows, n);
for (std::size_t j = 0; j < (nThreads - 1); ++j) {
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z,
step, nextStep, n);
step += stepSize;
nextStep += stepSize;
z = nthPerm(n, step);
}
// Guarantee that we get all the rows. N.B. We are passing numRows
// instead of nextStep... they aren't guaranteed to be the same
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z, step,
numRows, n);
for (auto& thr: myThreads)
thr.join();
return myMat;
}
I have read that Rcpp objects are not thread safe as they make
unpredictable calls to the garbage collector
(https://github.com/RcppCore/RcppParallel/issues/17), however Romain
"As soon as you don't use references for Rcpp types, you are not safe.
If you use references, it all depends on what you do with them."
I have a couple of questions regarding this.
My initial thought was I thought Rcpp objects were passed by reference
by default. Secondly, if this isn't the case, is it as simple as
adding an ampersand to all of my Rcpp objects in the function
parameters?
The project that I'm implementing this in (RcppAlgos) builds fine on
win-builder as well as all of the various rhub checks with no errors
(even check_with_sanitizers() and check_with_valgrind()). When I
submitted v 2.1.0 to CRAN, there were sporadic build errors on some of
the linux platforms. By sporadic, I mean sometimes it passes, and
other times it would fail with the error : segfault from C stack
overflow. The current version (v 2.2.0) still has the argument for
parallel computing, but it doesn't do anything. It is only there for
backwards compatibility.
When I initially submitted, it should be noted that I did not have my
matrices wrapped with std::ref in the call to create new threads, so
I'm not sure what effect this will have on those builds if I were to
submit to CRAN now. I will say that after I saw the errors on the
CRAN checks, I immediately sought to replicate them. I was successful
in extreme situations. For example, if I called the parallel
implementation thousands of times I could generate the error. I would
put these extreme tests in my tests folder and check them in a unit
test environment so as not to crash my r session.
I then sought out a more robust solution to my situation and found
that thread function arguments are by default pass by value, and if
you have a particular variable that is expected to be passed by
reference, then you must add std::ref (See
https://en.cppreference.com/w/cpp/thread/thread/thread). I have done
this and have noticed that I can't generate the issues with the
extreme tests above. HOWEVER, If I call it say 50000 times 10 times in
a row, I can sometimes generate an issue (not necessarily the segfault
above.. most of the time it is a stack imbalance warning... the
warning you get when the number of UNPROTECTS doesn't match the number
of PROTECTS in the R C API).
I then revisited the RcppParallel package to see if there were any
solutions there. I know that RcppParallel implements fully thread safe
objects like RMatrix<T>, however I can't see a way to set up a Worker
to populate my matrix. I guess the issue I see is that if you look at
my set-up above, we first get the starter vector with a call to
nthPerm from the parent function, then I pass this to PopulateMatrix
which proceeds to populate rows of my matrix for a given number of
rows. With the examples I've seen, there is no dependency on an
external function to get a specific entry point. I thought about
bypassing the RcppParallel::Worker altogether and simply use the
RMatrix<T> object, however I don't think this is how that object is to
be utilized. For example, I have not seen any RcppParallel examples
that preallocate an RMatrix<T> object.
My question is, is there a way to make my current set up thread-safe?
If not, is it possible at all to simply populate an object in parallel
safely?
Alternatively, if my question seems a bit naive, a nudge in the right
direction would be greatly appreciated. I don't mind going back to
square one, if need be.
Thanks,
Joseph Wood
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Joseph Wood
2018-09-28 00:04:52 UTC
Permalink
Hey Ralf and Alexis,

Thank you so much for your replies. All of the resources and
information are very much appreciated.

Alexis, yes I have looked at many examples including the one you
linked to dealing with matrix transform. I'm really glad you linked
that, because this is a perfect example of why this will not work for
my situation. I will first note that all of the examples I have seen
thus far, are carrying out operations that are independent of the
other entries in the matrix.

My situation is fundamentally different. The algorithm that fills the
matrix does so in a way that relies on the previous row and more
importantly, has a starter vector that isn't apart of the populate
process. That is, in the example I gave, you will notice I get the ith
entry point by calling nthPerm, and once I call PopulateMatrix, the
matrix is populated calling std::next_permutation which is dependent
on the previous permutation. The std::next_permutation is just an
example, but perfectly represents my challenge here. This algorithm
can't produce the 10th permutation without first being fed the 9th
permutation. In the matrix transform example, the square root of the
10th row has nothing to do with the 9th row whatsoever.

I hope this clarifies my question above, and I'm sorry for not
explaining more fully.

Again, I really appreciate the advice.

Joseph Wood
Post by Alexis Sarda
http://gallery.rcpp.org/articles/parallel-matrix-transform/
In the call to parallelFor() you can change the values you iterate over and change your logic based on it.
https://github.com/asardaes/dtwclust/blob/master/src/distmat/fillers.cpp
https://github.com/asardaes/dtwclust/blob/master/CONTRIBUTING.md
Regards,
Alexis.
I haven’t read your message in detail, but the second example from here might be helpful: https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html
There a matrix is filled in parallel by splitting the columns among the threads. Splitting by columns is helpful since matrices in R are stored that way. You could use the same method to obtain the transpose of your matrix.
Greetings
Ralf

Ralf Stubner
Senior Software Engineer / Trainer
daqana GmbH
Dortustraße 48
14467 Potsdam
T: +49 331 23 70 81 66
F: +49 331 23 70 81 67
M: +49 162 20 91 196
Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr: DE300072622
Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
As the subject states, this question is regarding populating a matrix
Practical Multithreading" as I'd like to take some algorithms I have
to the next level. I have looked at the RcppParallel package, but the
features offered there do not seem to apply to this situation. I will
explain my reasoning further down. First, we set up our scenario.
1. We have an empty matrix with the number of rows equal to numRows
2. We are able to generate the ith row of the matrix at will
3. Our underlying subroutine populates the matrix from any
particular starting point one by one.
This scenario easily extends to a parallel setup. We have each entry
in our matrix being visited exactly one time by only one thread. The
idea is that if we have m threads, we can split up the work so that
each thread is responsible for populating roughly (numRows / m) number
of rows of our matrix.
Here is a simplified example that represents my real situation (In my
project I don't have the cpp11 plugin as I take care of this in
#include <Rcpp.h>
#include <thread>
// [[Rcpp::plugins(cpp11)]]
int myFactorial(int n) {
int res = 1;
for (int i = 1; i <= n; ++i)
res *= i;
return res;
}
std::vector<int> nthPerm(int n, int index) {
int temp = myFactorial(n);
std::vector<int> indexVec(n), res(n);
std::iota(indexVec.begin(), indexVec.end(), 1);
for (int k = 0, r = n; k < n; ++k, --r) {
temp /= r;
int j = (int) index / temp;
res[k] = indexVec[j];
index -= (temp * j);
indexVec.erase(indexVec.begin() + j);
}
return res;
}
// Simplified version for demonstration only. The real subroutines
// that carry out this task are more complicated
void PopulateMatrix(Rcpp::IntegerMatrix permuteMatrix,
std::vector<int> z, int count, int nRows, int n) {
for (; count < nRows; ++count) {
for (int j = 0; j < n; ++j)
permuteMatrix(count, j) = z[j];
std::next_permutation(z.begin(), z.end());
}
}
// [[Rcpp::export]]
SEXP ParallelPerms(int n, int userThrds = 0) {
int nThreads = std::thread::hardware_concurrency() - 1;
std::vector<std::thread> myThreads;
nThreads = (userThrds > 0) ? std::min(userThrds, nThreads) : nThreads;
int step = 0, numRows = myFactorial(n);
int stepSize = numRows / nThreads;
int nextStep = stepSize;
std::vector<int> z(n);
std::iota(z.begin(), z.end(), 1);
Rcpp::IntegerMatrix myMat = Rcpp::no_init_matrix(numRows, n);
for (std::size_t j = 0; j < (nThreads - 1); ++j) {
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z,
step, nextStep, n);
step += stepSize;
nextStep += stepSize;
z = nthPerm(n, step);
}
// Guarantee that we get all the rows. N.B. We are passing numRows
// instead of nextStep... they aren't guaranteed to be the same
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z, step,
numRows, n);
for (auto& thr: myThreads)
thr.join();
return myMat;
}
I have read that Rcpp objects are not thread safe as they make
unpredictable calls to the garbage collector
(https://github.com/RcppCore/RcppParallel/issues/17), however Romain
"As soon as you don't use references for Rcpp types, you are not safe.
If you use references, it all depends on what you do with them."
I have a couple of questions regarding this.
My initial thought was I thought Rcpp objects were passed by reference
by default. Secondly, if this isn't the case, is it as simple as
adding an ampersand to all of my Rcpp objects in the function
parameters?
The project that I'm implementing this in (RcppAlgos) builds fine on
win-builder as well as all of the various rhub checks with no errors
(even check_with_sanitizers() and check_with_valgrind()). When I
submitted v 2.1.0 to CRAN, there were sporadic build errors on some of
the linux platforms. By sporadic, I mean sometimes it passes, and
other times it would fail with the error : segfault from C stack
overflow. The current version (v 2.2.0) still has the argument for
parallel computing, but it doesn't do anything. It is only there for
backwards compatibility.
When I initially submitted, it should be noted that I did not have my
matrices wrapped with std::ref in the call to create new threads, so
I'm not sure what effect this will have on those builds if I were to
submit to CRAN now. I will say that after I saw the errors on the
CRAN checks, I immediately sought to replicate them. I was successful
in extreme situations. For example, if I called the parallel
implementation thousands of times I could generate the error. I would
put these extreme tests in my tests folder and check them in a unit
test environment so as not to crash my r session.
I then sought out a more robust solution to my situation and found
that thread function arguments are by default pass by value, and if
you have a particular variable that is expected to be passed by
reference, then you must add std::ref (See
https://en.cppreference.com/w/cpp/thread/thread/thread). I have done
this and have noticed that I can't generate the issues with the
extreme tests above. HOWEVER, If I call it say 50000 times 10 times in
a row, I can sometimes generate an issue (not necessarily the segfault
above.. most of the time it is a stack imbalance warning... the
warning you get when the number of UNPROTECTS doesn't match the number
of PROTECTS in the R C API).
I then revisited the RcppParallel package to see if there were any
solutions there. I know that RcppParallel implements fully thread safe
objects like RMatrix<T>, however I can't see a way to set up a Worker
to populate my matrix. I guess the issue I see is that if you look at
my set-up above, we first get the starter vector with a call to
nthPerm from the parent function, then I pass this to PopulateMatrix
which proceeds to populate rows of my matrix for a given number of
rows. With the examples I've seen, there is no dependency on an
external function to get a specific entry point. I thought about
bypassing the RcppParallel::Worker altogether and simply use the
RMatrix<T> object, however I don't think this is how that object is to
be utilized. For example, I have not seen any RcppParallel examples
that preallocate an RMatrix<T> object.
My question is, is there a way to make my current set up thread-safe?
If not, is it possible at all to simply populate an object in parallel
safely?
Alternatively, if my question seems a bit naive, a nudge in the right
direction would be greatly appreciated. I don't mind going back to
square one, if need be.
Thanks,
Joseph Wood
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Jeff Newmiller
2018-09-28 00:59:18 UTC
Permalink
You cannot parallelize a serial calculation. Sorry, parallelism is not a magic wand that you can wave at any problem. If you find portions of your calculations are independent, then you can parallelize those portions and do the rest serially.
Post by Joseph Wood
Hey Ralf and Alexis,
Thank you so much for your replies. All of the resources and
information are very much appreciated.
Alexis, yes I have looked at many examples including the one you
linked to dealing with matrix transform. I'm really glad you linked
that, because this is a perfect example of why this will not work for
my situation. I will first note that all of the examples I have seen
thus far, are carrying out operations that are independent of the
other entries in the matrix.
My situation is fundamentally different. The algorithm that fills the
matrix does so in a way that relies on the previous row and more
importantly, has a starter vector that isn't apart of the populate
process. That is, in the example I gave, you will notice I get the ith
entry point by calling nthPerm, and once I call PopulateMatrix, the
matrix is populated calling std::next_permutation which is dependent
on the previous permutation. The std::next_permutation is just an
example, but perfectly represents my challenge here. This algorithm
can't produce the 10th permutation without first being fed the 9th
permutation. In the matrix transform example, the square root of the
10th row has nothing to do with the 9th row whatsoever.
I hope this clarifies my question above, and I'm sorry for not
explaining more fully.
Again, I really appreciate the advice.
Joseph Wood
Post by Alexis Sarda
I'm not sure I understand the problem. You've found RcppParallel,
have you looked at the examples? There's one with handling matrices
Post by Alexis Sarda
http://gallery.rcpp.org/articles/parallel-matrix-transform/
In the call to parallelFor() you can change the values you iterate
over and change your logic based on it.
Post by Alexis Sarda
In my package I do a lot of cross-distance matrix calculations with
https://github.com/asardaes/dtwclust/blob/master/src/distmat/fillers.cpp
Post by Alexis Sarda
https://github.com/asardaes/dtwclust/blob/master/CONTRIBUTING.md
Regards,
Alexis.
On Wed, Sep 26, 2018 at 6:37 PM Ralf Stubner
I haven’t read your message in detail, but the second example from
https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html
Post by Alexis Sarda
There a matrix is filled in parallel by splitting the columns among
the threads. Splitting by columns is helpful since matrices in R are
stored that way. You could use the same method to obtain the transpose
of your matrix.
Post by Alexis Sarda
Greetings
Ralf

Ralf Stubner
Senior Software Engineer / Trainer
daqana GmbH
Dortustraße 48
14467 Potsdam
T: +49 331 23 70 81 66
F: +49 331 23 70 81 67
M: +49 162 20 91 196
Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr: DE300072622
Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
As the subject states, this question is regarding populating a
matrix
Post by Alexis Sarda
Practical Multithreading" as I'd like to take some algorithms I have
to the next level. I have looked at the RcppParallel package, but
the
Post by Alexis Sarda
features offered there do not seem to apply to this situation. I
will
Post by Alexis Sarda
explain my reasoning further down. First, we set up our scenario.
1. We have an empty matrix with the number of rows equal to numRows
2. We are able to generate the ith row of the matrix at will
3. Our underlying subroutine populates the matrix from any
particular starting point one by one.
This scenario easily extends to a parallel setup. We have each entry
in our matrix being visited exactly one time by only one thread. The
idea is that if we have m threads, we can split up the work so that
each thread is responsible for populating roughly (numRows / m)
number
Post by Alexis Sarda
of rows of our matrix.
Here is a simplified example that represents my real situation (In
my
Post by Alexis Sarda
project I don't have the cpp11 plugin as I take care of this in
#include <Rcpp.h>
#include <thread>
// [[Rcpp::plugins(cpp11)]]
int myFactorial(int n) {
int res = 1;
for (int i = 1; i <= n; ++i)
res *= i;
return res;
}
std::vector<int> nthPerm(int n, int index) {
int temp = myFactorial(n);
std::vector<int> indexVec(n), res(n);
std::iota(indexVec.begin(), indexVec.end(), 1);
for (int k = 0, r = n; k < n; ++k, --r) {
temp /= r;
int j = (int) index / temp;
res[k] = indexVec[j];
index -= (temp * j);
indexVec.erase(indexVec.begin() + j);
}
return res;
}
// Simplified version for demonstration only. The real subroutines
// that carry out this task are more complicated
void PopulateMatrix(Rcpp::IntegerMatrix permuteMatrix,
std::vector<int> z, int count, int nRows, int n)
{
Post by Alexis Sarda
for (; count < nRows; ++count) {
for (int j = 0; j < n; ++j)
permuteMatrix(count, j) = z[j];
std::next_permutation(z.begin(), z.end());
}
}
// [[Rcpp::export]]
SEXP ParallelPerms(int n, int userThrds = 0) {
int nThreads = std::thread::hardware_concurrency() - 1;
std::vector<std::thread> myThreads;
nThreads;
Post by Alexis Sarda
int step = 0, numRows = myFactorial(n);
int stepSize = numRows / nThreads;
int nextStep = stepSize;
std::vector<int> z(n);
std::iota(z.begin(), z.end(), 1);
Rcpp::IntegerMatrix myMat = Rcpp::no_init_matrix(numRows, n);
for (std::size_t j = 0; j < (nThreads - 1); ++j) {
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z,
step, nextStep, n);
step += stepSize;
nextStep += stepSize;
z = nthPerm(n, step);
}
// Guarantee that we get all the rows. N.B. We are passing
numRows
Post by Alexis Sarda
// instead of nextStep... they aren't guaranteed to be the same
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z, step,
numRows, n);
for (auto& thr: myThreads)
thr.join();
return myMat;
}
I have read that Rcpp objects are not thread safe as they make
unpredictable calls to the garbage collector
(https://github.com/RcppCore/RcppParallel/issues/17), however Romain
"As soon as you don't use references for Rcpp types, you are not
safe.
Post by Alexis Sarda
If you use references, it all depends on what you do with them."
I have a couple of questions regarding this.
My initial thought was I thought Rcpp objects were passed by
reference
Post by Alexis Sarda
by default. Secondly, if this isn't the case, is it as simple as
adding an ampersand to all of my Rcpp objects in the function
parameters?
The project that I'm implementing this in (RcppAlgos) builds fine on
win-builder as well as all of the various rhub checks with no errors
(even check_with_sanitizers() and check_with_valgrind()). When I
submitted v 2.1.0 to CRAN, there were sporadic build errors on some
of
Post by Alexis Sarda
the linux platforms. By sporadic, I mean sometimes it passes, and
other times it would fail with the error : segfault from C stack
overflow. The current version (v 2.2.0) still has the argument for
parallel computing, but it doesn't do anything. It is only there for
backwards compatibility.
When I initially submitted, it should be noted that I did not have
my
Post by Alexis Sarda
matrices wrapped with std::ref in the call to create new threads, so
I'm not sure what effect this will have on those builds if I were to
submit to CRAN now. I will say that after I saw the errors on the
CRAN checks, I immediately sought to replicate them. I was
successful
Post by Alexis Sarda
in extreme situations. For example, if I called the parallel
implementation thousands of times I could generate the error. I
would
Post by Alexis Sarda
put these extreme tests in my tests folder and check them in a unit
test environment so as not to crash my r session.
I then sought out a more robust solution to my situation and found
that thread function arguments are by default pass by value, and if
you have a particular variable that is expected to be passed by
reference, then you must add std::ref (See
https://en.cppreference.com/w/cpp/thread/thread/thread). I have done
this and have noticed that I can't generate the issues with the
extreme tests above. HOWEVER, If I call it say 50000 times 10 times
in
Post by Alexis Sarda
a row, I can sometimes generate an issue (not necessarily the
segfault
Post by Alexis Sarda
above.. most of the time it is a stack imbalance warning... the
warning you get when the number of UNPROTECTS doesn't match the
number
Post by Alexis Sarda
of PROTECTS in the R C API).
I then revisited the RcppParallel package to see if there were any
solutions there. I know that RcppParallel implements fully thread
safe
Post by Alexis Sarda
objects like RMatrix<T>, however I can't see a way to set up a
Worker
Post by Alexis Sarda
to populate my matrix. I guess the issue I see is that if you look
at
Post by Alexis Sarda
my set-up above, we first get the starter vector with a call to
nthPerm from the parent function, then I pass this to PopulateMatrix
which proceeds to populate rows of my matrix for a given number of
rows. With the examples I've seen, there is no dependency on an
external function to get a specific entry point. I thought about
bypassing the RcppParallel::Worker altogether and simply use the
RMatrix<T> object, however I don't think this is how that object is
to
Post by Alexis Sarda
be utilized. For example, I have not seen any RcppParallel examples
that preallocate an RMatrix<T> object.
My question is, is there a way to make my current set up
thread-safe?
Post by Alexis Sarda
If not, is it possible at all to simply populate an object in
parallel
Post by Alexis Sarda
safely?
Alternatively, if my question seems a bit naive, a nudge in the
right
Post by Alexis Sarda
direction would be greatly appreciated. I don't mind going back to
square one, if need be.
Thanks,
Joseph Wood
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Post by Alexis Sarda
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
Sent from my phone. Please excuse my brevity.
Joseph Wood
2018-09-28 01:26:28 UTC
Permalink
Thank you Jeff for your reply,

I understand that parallelism is not a magic wand. Have you read my
original post? I have managed to parallelize generating permutations
by taking advantage of the fact that I can generate the ith
permutation via nthPerm. My question is about making this thread safe
not if it is possible.

I encourage you to run my code above.

Thanks again.
Joseph Wood
Post by Jeff Newmiller
You cannot parallelize a serial calculation. Sorry, parallelism is not a magic wand that you can wave at any problem. If you find portions of your calculations are independent, then you can parallelize those portions and do the rest serially.
Post by Joseph Wood
Hey Ralf and Alexis,
Thank you so much for your replies. All of the resources and
information are very much appreciated.
Alexis, yes I have looked at many examples including the one you
linked to dealing with matrix transform. I'm really glad you linked
that, because this is a perfect example of why this will not work for
my situation. I will first note that all of the examples I have seen
thus far, are carrying out operations that are independent of the
other entries in the matrix.
My situation is fundamentally different. The algorithm that fills the
matrix does so in a way that relies on the previous row and more
importantly, has a starter vector that isn't apart of the populate
process. That is, in the example I gave, you will notice I get the ith
entry point by calling nthPerm, and once I call PopulateMatrix, the
matrix is populated calling std::next_permutation which is dependent
on the previous permutation. The std::next_permutation is just an
example, but perfectly represents my challenge here. This algorithm
can't produce the 10th permutation without first being fed the 9th
permutation. In the matrix transform example, the square root of the
10th row has nothing to do with the 9th row whatsoever.
I hope this clarifies my question above, and I'm sorry for not
explaining more fully.
Again, I really appreciate the advice.
Joseph Wood
Post by Alexis Sarda
I'm not sure I understand the problem. You've found RcppParallel,
have you looked at the examples? There's one with handling matrices
Post by Alexis Sarda
http://gallery.rcpp.org/articles/parallel-matrix-transform/
In the call to parallelFor() you can change the values you iterate
over and change your logic based on it.
Post by Alexis Sarda
In my package I do a lot of cross-distance matrix calculations with
https://github.com/asardaes/dtwclust/blob/master/src/distmat/fillers.cpp
Post by Alexis Sarda
https://github.com/asardaes/dtwclust/blob/master/CONTRIBUTING.md
Regards,
Alexis.
On Wed, Sep 26, 2018 at 6:37 PM Ralf Stubner
I haven’t read your message in detail, but the second example from
https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html
Post by Alexis Sarda
There a matrix is filled in parallel by splitting the columns among
the threads. Splitting by columns is helpful since matrices in R are
stored that way. You could use the same method to obtain the transpose
of your matrix.
Post by Alexis Sarda
Greetings
Ralf

Ralf Stubner
Senior Software Engineer / Trainer
daqana GmbH
Dortustraße 48
14467 Potsdam
T: +49 331 23 70 81 66
F: +49 331 23 70 81 67
M: +49 162 20 91 196
Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr: DE300072622
Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
As the subject states, this question is regarding populating a
matrix
Post by Alexis Sarda
Practical Multithreading" as I'd like to take some algorithms I have
to the next level. I have looked at the RcppParallel package, but
the
Post by Alexis Sarda
features offered there do not seem to apply to this situation. I
will
Post by Alexis Sarda
explain my reasoning further down. First, we set up our scenario.
1. We have an empty matrix with the number of rows equal to numRows
2. We are able to generate the ith row of the matrix at will
3. Our underlying subroutine populates the matrix from any
particular starting point one by one.
This scenario easily extends to a parallel setup. We have each entry
in our matrix being visited exactly one time by only one thread. The
idea is that if we have m threads, we can split up the work so that
each thread is responsible for populating roughly (numRows / m)
number
Post by Alexis Sarda
of rows of our matrix.
Here is a simplified example that represents my real situation (In
my
Post by Alexis Sarda
project I don't have the cpp11 plugin as I take care of this in
#include <Rcpp.h>
#include <thread>
// [[Rcpp::plugins(cpp11)]]
int myFactorial(int n) {
int res = 1;
for (int i = 1; i <= n; ++i)
res *= i;
return res;
}
std::vector<int> nthPerm(int n, int index) {
int temp = myFactorial(n);
std::vector<int> indexVec(n), res(n);
std::iota(indexVec.begin(), indexVec.end(), 1);
for (int k = 0, r = n; k < n; ++k, --r) {
temp /= r;
int j = (int) index / temp;
res[k] = indexVec[j];
index -= (temp * j);
indexVec.erase(indexVec.begin() + j);
}
return res;
}
// Simplified version for demonstration only. The real subroutines
// that carry out this task are more complicated
void PopulateMatrix(Rcpp::IntegerMatrix permuteMatrix,
std::vector<int> z, int count, int nRows, int n)
{
Post by Alexis Sarda
for (; count < nRows; ++count) {
for (int j = 0; j < n; ++j)
permuteMatrix(count, j) = z[j];
std::next_permutation(z.begin(), z.end());
}
}
// [[Rcpp::export]]
SEXP ParallelPerms(int n, int userThrds = 0) {
int nThreads = std::thread::hardware_concurrency() - 1;
std::vector<std::thread> myThreads;
nThreads;
Post by Alexis Sarda
int step = 0, numRows = myFactorial(n);
int stepSize = numRows / nThreads;
int nextStep = stepSize;
std::vector<int> z(n);
std::iota(z.begin(), z.end(), 1);
Rcpp::IntegerMatrix myMat = Rcpp::no_init_matrix(numRows, n);
for (std::size_t j = 0; j < (nThreads - 1); ++j) {
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z,
step, nextStep, n);
step += stepSize;
nextStep += stepSize;
z = nthPerm(n, step);
}
// Guarantee that we get all the rows. N.B. We are passing
numRows
Post by Alexis Sarda
// instead of nextStep... they aren't guaranteed to be the same
myThreads.emplace_back(PopulateMatrix, std::ref(myMat), z, step,
numRows, n);
for (auto& thr: myThreads)
thr.join();
return myMat;
}
I have read that Rcpp objects are not thread safe as they make
unpredictable calls to the garbage collector
(https://github.com/RcppCore/RcppParallel/issues/17), however Romain
"As soon as you don't use references for Rcpp types, you are not
safe.
Post by Alexis Sarda
If you use references, it all depends on what you do with them."
I have a couple of questions regarding this.
My initial thought was I thought Rcpp objects were passed by
reference
Post by Alexis Sarda
by default. Secondly, if this isn't the case, is it as simple as
adding an ampersand to all of my Rcpp objects in the function
parameters?
The project that I'm implementing this in (RcppAlgos) builds fine on
win-builder as well as all of the various rhub checks with no errors
(even check_with_sanitizers() and check_with_valgrind()). When I
submitted v 2.1.0 to CRAN, there were sporadic build errors on some
of
Post by Alexis Sarda
the linux platforms. By sporadic, I mean sometimes it passes, and
other times it would fail with the error : segfault from C stack
overflow. The current version (v 2.2.0) still has the argument for
parallel computing, but it doesn't do anything. It is only there for
backwards compatibility.
When I initially submitted, it should be noted that I did not have
my
Post by Alexis Sarda
matrices wrapped with std::ref in the call to create new threads, so
I'm not sure what effect this will have on those builds if I were to
submit to CRAN now. I will say that after I saw the errors on the
CRAN checks, I immediately sought to replicate them. I was
successful
Post by Alexis Sarda
in extreme situations. For example, if I called the parallel
implementation thousands of times I could generate the error. I
would
Post by Alexis Sarda
put these extreme tests in my tests folder and check them in a unit
test environment so as not to crash my r session.
I then sought out a more robust solution to my situation and found
that thread function arguments are by default pass by value, and if
you have a particular variable that is expected to be passed by
reference, then you must add std::ref (See
https://en.cppreference.com/w/cpp/thread/thread/thread). I have done
this and have noticed that I can't generate the issues with the
extreme tests above. HOWEVER, If I call it say 50000 times 10 times
in
Post by Alexis Sarda
a row, I can sometimes generate an issue (not necessarily the
segfault
Post by Alexis Sarda
above.. most of the time it is a stack imbalance warning... the
warning you get when the number of UNPROTECTS doesn't match the
number
Post by Alexis Sarda
of PROTECTS in the R C API).
I then revisited the RcppParallel package to see if there were any
solutions there. I know that RcppParallel implements fully thread
safe
Post by Alexis Sarda
objects like RMatrix<T>, however I can't see a way to set up a
Worker
Post by Alexis Sarda
to populate my matrix. I guess the issue I see is that if you look
at
Post by Alexis Sarda
my set-up above, we first get the starter vector with a call to
nthPerm from the parent function, then I pass this to PopulateMatrix
which proceeds to populate rows of my matrix for a given number of
rows. With the examples I've seen, there is no dependency on an
external function to get a specific entry point. I thought about
bypassing the RcppParallel::Worker altogether and simply use the
RMatrix<T> object, however I don't think this is how that object is
to
Post by Alexis Sarda
be utilized. For example, I have not seen any RcppParallel examples
that preallocate an RMatrix<T> object.
My question is, is there a way to make my current set up
thread-safe?
Post by Alexis Sarda
If not, is it possible at all to simply populate an object in
parallel
Post by Alexis Sarda
safely?
Alternatively, if my question seems a bit naive, a nudge in the
right
Post by Alexis Sarda
direction would be greatly appreciated. I don't mind going back to
square one, if need be.
Thanks,
Joseph Wood
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Post by Alexis Sarda
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
_______________________________________________
Rcpp-devel mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
--
Sent from my phone. Please excuse my brevity.
Ralf Stubner
2018-09-28 09:07:49 UTC
Permalink
Hi Joseph,
Post by Joseph Wood
I understand that parallelism is not a magic wand. Have you read my
original post? I have managed to parallelize generating permutations
by taking advantage of the fact that I can generate the ith
permutation via nthPerm. My question is about making this thread safe
not if it is possible.
unfortunately, you messages are a bit long and contradictory. I had the
same reaction as Jeff when I read
Post by Joseph Wood
Post by Joseph Wood
My situation is fundamentally different. The algorithm that fills the
matrix does so in a way that relies on the previous row and more
[...]

If you have code that generates the n-th row based on some global input,
then the example I refered to would be the right starting point. It uses
global state (seed) and column indices to fill a matrix by column. It
does it for multiple columns together, since it is more efficient to
have one thread process multiple columns. However, the default grain
size for parallelFor is one, so it is easy to create one thread per
column and perform some action based on some global input and on the
column index:


#include <Rcpp.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
// [[Rcpp::plugins(cpp11)]]

struct ParallelFill : public RcppParallel::Worker {
RcppParallel::RMatrix<double> output;
RcppParallel::RVector<double> global_input;

ParallelFill(Rcpp::NumericMatrix output,
Rcpp::NumericVector input) :
output(output), global_input(input) {};

std::vector<double> create_column(std::size_t index) {
std::vector<double> result(global_input.size());
std::transform(global_input.begin(),
global_input.end(),
result.begin(),
[&index] (double a) {return a + index;});
return result;
}

// default grain size is 1, i.e. end == begin + 1
void operator()(std::size_t begin, std::size_t end) {
std::vector<double> column = create_column(begin);
std::copy(column.begin(), column.end(),
output.begin() + begin * output.nrow());
}
};

// [[Rcpp::export]]
Rcpp::NumericMatrix parallel_matrix(const int n,
Rcpp::NumericVector input) {
Rcpp::NumericMatrix res(input.length(), n);
ParallelFill parallelFill(res, input);
RcppParallel::parallelFor(0, n, parallelFill);
return res;
}

/*** R
set.seed(42)
res <- parallel_matrix(8, runif(1e7))
head(res)
*/

All this is done *by column*, since matrices in R are stored that way.

cheerio
ralf

Loading...