[Rcpp-devel] new to R, so don't understand Rcpp limits
George Coles
2017-08-03 15:30:22 UTC
Hi,
I have recently been writing a lot of C++. I have been a big Python user
the last couple of years, and have never really used R but have it installed.

I need to share some data between my Python code and my C++ code. C++ does
not really have a lot of nice ideas like DataFrames. But if you save a
dataframe from Python into CSV, you can readily read it using R. CSV is not
the best way to go, but it is a simple case.

I have generally been noticing as I google around that R has a healthy and
seemingly growing list of packages that can be accessed by C++ code. From
C++, R does not look so bad to me, and I would like to get access to this
large library of native routines in R.

First on the list: I hope to read a dataframe or something like it
from data in a file, and then transform that dataframe or other tabular
object into something I can use in my C++ code for linear algebra, like an
Armadillo matrix.

So is there any native code in the R world that I can use to read a
dataframe from a file?

please forgive my ignorance. I think Rcpp is really cool, it might make me
a backdoor R user.

thx
George Coles
Dirk Eddelbuettel
2017-08-03 15:55:14 UTC
On 3 August 2017 at 11:30, George Coles wrote:
| Hi,
| I have recently been writing a lot of c++. I have been a big python user
| the last couple of years, and never really use R but have it installed.

Coming from a common base is not a bad starting point at all. See my
RcppAnnoy package on top of the C++ (with Python glue) Annoy package, and
also see https://cran.r-project.org/package=fastcluster which "comes with"
Python and R bindings.

| I need to share some data between my Python code and my c++ code, C++ does
| not really have a lot of nice ideas like DataFrames. But if you save a
| dataframe from Python into csv, you can readily read it using R. Csv is not
| the best way to go, but it is a simple case.

CSVs are indeed a terrible format, yet annoyingly common. Try binary
alternatives if you can.

There are also feather and Apache Arrow, 'data.frame' reimplementations usable
from both R and Python...

Lastly, there is now reticulate (on CRAN) to access Python from R.

| I have generally been noticing as I google around, that R has a healthy and
| seemingly growing list of packages that can be accessed by c++ code. From
| c++, R does not look so bad to me, and I would like to get access to this
| large library of native routines in R.
|
| First on the list, is that I hope to read a dataframe or something like it
| from data in a file, and then transform that dataframe or other tabular
| object into something I can use in my c++ code for linear algebra, like an
| Armadillo matrix.
|
| So is there any native code in the R world that I can use to read a
| dataframe from a file?
|
| please forgive my ignorance. I think Rcpp is really cool, it might make me
| a backdoor R user.

Poke around the examples, and e.g. the Rcpp Gallery (at gallery.rcpp.org). The
seamless passage from R to C++ and back is really, really useful and smooth.
JJ and I have been riffing for years on the need for a talk on 'R as C++ shell'.
No time for that yet though :)

Cheers, Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | ***@debian.org
Christian Gunning
2017-08-04 18:17:53 UTC
Post by Dirk Eddelbuettel
| Hi,
| I need to share some data between my Python code and my c++ code, C++ does
| not really have a lot of nice ideas like DataFrames. But if you save a
| dataframe from Python into csv, you can readily read it using R. Csv is not
| the best way to go, but it is a simple case.
CSVs are indeed a terrible format, yet annoyingly common. Try binary
alternatives if you can.

I disagree with Dirk that CSV is "a terrible format" - it excels (hah) at
human readability, is decently machine-readable and easily compressible,
but certainly is inappropriate for many tasks that require efficiency.
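
For the simple CSV case raised in the original question, the following is a minimal sketch of reading purely numeric, comma-separated text into a flat column-major buffer using only the C++ standard library. The names `NumericTable` and `parse_numeric_csv` are hypothetical and invented for this illustration. The sketch assumes no header row, no quoting, and no missing values, so it is not a general CSV reader.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: a numeric table flattened in column-major order.
struct NumericTable {
    std::size_t nrow = 0;
    std::size_t ncol = 0;
    std::vector<double> data;  // element (i, j) lives at data[i + j * nrow]
};

// Assumption: input is plain numeric CSV (no header, no quotes, no NAs).
NumericTable parse_numeric_csv(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::vector<double> row;
        std::stringstream fields(line);
        std::string cell;
        while (std::getline(fields, cell, ',')) {
            row.push_back(std::stod(cell));  // throws if a cell is not numeric
        }
        if (!rows.empty() && row.size() != rows.front().size()) {
            throw std::runtime_error("rows have differing numbers of fields");
        }
        rows.push_back(std::move(row));
    }

    NumericTable t;
    t.nrow = rows.size();
    t.ncol = rows.empty() ? 0 : rows.front().size();
    t.data.resize(t.nrow * t.ncol);
    // Transpose the row-wise text into column-major storage.
    for (std::size_t i = 0; i < t.nrow; ++i) {
        for (std::size_t j = 0; j < t.ncol; ++j) {
            t.data[i + j * t.nrow] = rows[i][j];
        }
    }
    assert(t.data.size() == t.nrow * t.ncol);
    return t;
}
```

The function accepts any `std::istream`, so the same code can be fed a `std::ifstream` opened on a file or a `std::istringstream` holding test data. The buffer uses the column-major layout described later in this message for R and Armadillo matrices.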
Post by Dirk Eddelbuettel
| I have generally been noticing as I google around, that R has a healthy and
| seemingly growing list of packages that can be accessed by c++ code. From
| c++, R does not look so bad to me, and I would like to get access to this
| large library of native routines in R.
|
| First on the list, is that I hope to read a dataframe or something like it
| from data in a file, and then transform that dataframe or other tabular
| object into something I can use in my c++ code for linear algebra, like an
| Armadillo matrix.
|
| So is there any native code in the R world that I can use to read a
| dataframe from a file?

First off, only use a data.frame if what you really want is a data.frame.
Otherwise, stick with a matrix (or convert to one as early as possible).

* Data.frame = ordered collection of like-sized vectors, possibly of
heterogeneous type.
* Matrix = ordered collection of values, of known / fixed dimension, by
default stored internally column by column (column-major order) in both R
and Armadillo (as in LAPACK).

In R, for modest-sized objects, going between these two types is
"relatively seamless". But in C++/Rcpp, the underlying differences are more
apparent to the user. Matrices "just work" (e.g. easy construction of an
"identical" armadillo object), whereas data.frames require some care and
attention, and possibly extra object creation and destruction. When possible,
stick with matrices.
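
To illustrate the column-major layout mentioned in the list above, here is a minimal standard-library sketch. `ColMajorMatrix` and `fill_like_r` are hypothetical names made up for this example; they are not part of Rcpp or Armadillo, and they only show how a flat buffer is indexed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense numeric matrix held as one flat buffer
// in column-major order.
struct ColMajorMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;  // length nrow * ncol

    ColMajorMatrix(std::size_t r, std::size_t c)
        : nrow(r), ncol(c), data(r * c, 0.0) {}

    // Zero-based element (i, j): column j starts at offset j * nrow.
    double& at(std::size_t i, std::size_t j) {
        assert(i < nrow && j < ncol);
        return data[i + j * nrow];
    }
};

// Fill the buffer with 1, 2, 3, ... in storage order, which fills each
// column top to bottom before moving on to the next column.
ColMajorMatrix fill_like_r(std::size_t nrow, std::size_t ncol) {
    ColMajorMatrix m(nrow, ncol);
    for (std::size_t k = 0; k < m.data.size(); ++k) {
        m.data[k] = static_cast<double>(k + 1);
    }
    return m;
}
```

With `fill_like_r(10, 3)`, the zero-based element `(0, 1)` is 11, matching what `matrix(1:30, ncol=3)[1, 2]` gives in R. Because both sides share this layout, a matrix buffer can be viewed from either side without reordering, whereas a data.frame is a list of separate column vectors.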
Post by Dirk Eddelbuettel
| I think Rcpp is really cool, it might make me
| a backdoor R user.

I became a backdoor C++ programmer through C++. +1 really cool.

I found Google Protocol Buffers absurdly useful for moving between R and
cpp in complex projects. It's well-documented, fast, encourages
separate/good metadata documentation, and works smoothly for R, C++, and
Python. I never did use protobufs for vector data, though. I did write
some test code using repeated fields, but didn't get to the point of
comfort there. For arrays of fixed dimension, I can imagine using one field
per dimension to code that dimension's length, and then a final repeated
field with the payload. See below for example.

Question for Dirk (et al):

Has anyone used protobuf messages for, e.g., passing arrays? Any obvious
downsides? When I last googled, I didn't find much re protobuf repeated
fields or Rcpp + protobufs...


// File PbTest.proto
syntax = "proto2";
package Array;
// see https://developers.google.com/protocol-buffers/docs/reference/cpp-generated#fields

message a2d {
  optional uint32 dim1 = 10;
  optional uint32 dim2 = 20;
  // add more dims here
  //
  // numeric vector
  repeated float payload = 50;
}

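## File pbarray.R
library(RProtoBuf)
aa <- matrix(1:30, ncol=3)
## load the message type from the .proto file and create an empty message
bb <- new(P("Array.a2d", file='PbTest.proto'))
## one field per dimension
bb$dim1 <- dim(aa)[1]
bb$dim2 <- dim(aa)[2]
## payload: the matrix values as a flat vector (R stores them column by column)
bb$add(field='payload', values=aa)

## print the message in text format
cat(as.character(bb))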
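
On the C++ side, the same "dimensions plus flat payload" idea might look like the sketch below. It uses a plain struct, `A2dMessage`, as a hypothetical stand-in for the class that `protoc` would generate from PbTest.proto; `pack` and `element` are likewise invented for illustration and are not protobuf API. It assumes the payload is flattened column by column, as R stores a matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical plain-struct mirror of the a2d message. Real code would use
// the protoc-generated class instead.
struct A2dMessage {
    std::uint32_t dim1 = 0;      // number of rows
    std::uint32_t dim2 = 0;      // number of columns
    std::vector<float> payload;  // values flattened in column-major order
};

// Pack a column-major buffer and its dimensions into the message layout.
A2dMessage pack(const std::vector<float>& values,
                std::uint32_t nrow, std::uint32_t ncol) {
    if (values.size() != static_cast<std::size_t>(nrow) * ncol) {
        throw std::invalid_argument("payload length must equal dim1 * dim2");
    }
    A2dMessage msg;
    msg.dim1 = nrow;
    msg.dim2 = ncol;
    msg.payload = values;
    assert(msg.payload.size() == values.size());
    return msg;
}

// Read zero-based element (i, j) back out, checking that the declared
// dimensions are consistent with the payload length.
float element(const A2dMessage& msg, std::uint32_t i, std::uint32_t j) {
    const std::size_t expected = static_cast<std::size_t>(msg.dim1) * msg.dim2;
    if (msg.payload.size() != expected) {
        throw std::runtime_error("dimensions do not match payload length");
    }
    if (i >= msg.dim1 || j >= msg.dim2) {
        throw std::out_of_range("index outside declared dimensions");
    }
    return msg.payload[i + static_cast<std::size_t>(j) * msg.dim1];
}
```

One consequence of this layout is that nothing in the message itself ties `dim1 * dim2` to the payload length, so the receiver has to check it. Note also that the `float` field is single precision, while R numeric vectors are double precision.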

best,
Christian
http://www.x14n.org
