8  Reproducibility

8.1 Does R-universe archive old versions of packages? How does it work with renv?

R-universe does not archive old versions of packages, but it records the upstream git URL and commit ID in each package's DESCRIPTION file. This allows tools like renv to restore packages that were originally installed from R-universe, even after newer versions have replaced them in the universe. For more details, see the tech note "How renv restores packages from r-universe for reproducibility or production".
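To see this provenance information for a package you have installed, you can read the relevant fields from its DESCRIPTION file. The sketch below is hypothetical: pkg_provenance() is not part of any package, and it assumes the fields are named Repository, RemoteUrl and RemoteSha. Any field that is absent is returned as NA.

```r
# Hypothetical helper: report where an installed package came from.
# Assumes the source is recorded in the Repository, RemoteUrl and
# RemoteSha fields of DESCRIPTION; absent fields come back as NA.
pkg_provenance <- function(pkg, lib.loc = NULL) {
  desc_file <- system.file("DESCRIPTION", package = pkg, lib.loc = lib.loc)
  if (!nzchar(desc_file)) {
    stop("Package not installed: ", pkg)
  }
  fields <- c("Package", "Version", "Repository", "RemoteUrl", "RemoteSha")
  # read.dcf() returns a one-row matrix; take that row as a named vector
  read.dcf(desc_file, fields = fields)[1, ]
}

# Usage, for a package installed from R-universe:
# pkg_provenance("V8")
```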

You can also archive fixed versions of a universe for production or reproducibility, using what we call repository snapshots.

8.2 Repository snapshots

8.2.1 What is a snapshot of a universe?

A snapshot is a copy of a universe in the standard CRAN-like layout: a few directories of static files containing source packages, binary packages, and the index files that describe them.
Because these files do not change after you download them, snapshots allow you to host fixed versions of packages and install them as needed.
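Base R's contrib.url() shows where a client looks for the index and package files of each package type, which is the layout a snapshot follows. In the sketch below, the root path is a hypothetical location. The Windows binary path ends in the major.minor version of the R session that runs it.

```r
# Hypothetical root of a snapshot served from local disk
repo <- "file:///srv/r-snapshot"

# Directories that install.packages() reads PACKAGES and package files from
repo_dirs <- c(
  source  = contrib.url(repo, type = "source"),
  windows = contrib.url(repo, type = "win.binary")
)
print(repo_dirs)
```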

8.2.2 Downloading repository snapshots

The snapshot API lets you download a full copy of any repository on R-universe.
You can use this snapshot to mirror the CRAN-like repository on your own servers or to build a stable, validated release of your package suite.

The API endpoint is /api/snapshot and has several options to filter content. By default, the endpoint returns a zip file with all the repository’s packages, binaries, and documentation. If needed, you can filter the content to include only specific binaries, R versions, or subsets of packages.

The available parameters are listed on each universe's API page, for instance https://ggseg.r-universe.dev/apis.
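If you script snapshot downloads, a small helper that assembles the query string keeps the parameters explicit. snapshot_url() is hypothetical. The binaries parameter is the one used in the download example in the next section. The packages parameter name, and the comma-separated form for multiple values, are assumptions, so check your universe's /apis page for the exact names it supports.

```r
# Hypothetical helper: build a snapshot download URL for a universe.
# 'binaries' restricts binaries to given R versions; other parameter
# names (e.g. 'packages') are assumptions to check against /apis.
snapshot_url <- function(universe, ...) {
  base <- sprintf("https://%s.r-universe.dev/api/snapshot/zip", universe)
  params <- list(...)
  if (length(params) == 0) {
    return(base)
  }
  if (is.null(names(params)) || any(names(params) == "")) {
    stop("All query parameters must be named")
  }
  # Multiple values become a comma-separated list; unsafe characters are encoded
  values <- vapply(params, function(v) {
    utils::URLencode(paste(v, collapse = ","))
  }, character(1))
  paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

snapshot_url("jeroen", binaries = "4.3")
snapshot_url("jeroen", binaries = c("4.3", "4.4"), packages = c("V8", "mongolite"))
```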

8.2.3 Using snapshots

A CRAN-like R package repository is essentially a static directory of package files and indexes, with a specific naming structure. R-universe automatically builds and updates all these files based on a registry of packages and their upstream git sources. Once everything is generated, you can copy the entire folder to another server to get a frozen CRAN-like package repository there. For instance…

  • An organization could host an internal mirror of its universe on its intranet, refreshing it daily, monthly, or on whatever schedule suits it.

  • You could create a GitHub Action that regularly downloads a snapshot from R-universe and publishes it on GitHub Pages. Here is a minimal example of such an action: https://github.com/jeroen/backup.

This action is fast: downloading and extracting the snapshot from R-universe takes only a few seconds, so it can easily run on demand or on a regular schedule.

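You can also install packages in R directly from a local snapshot folder by passing the folder's path, prefixed with file://, as the repos argument of install.packages(). On Windows, the path must first be normalized to use forward slashes, and the prefix needs an extra slash (file:///) because the path starts with a drive letter: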

# Download and extract the snapshot (binaries=4.3 limits binaries to R 4.3)
curl::curl_download("https://jeroen.r-universe.dev/api/snapshot/zip?binaries=4.3", "snapshot.zip")
snapshot <- file.path(tempdir(), "jeroen")
unzip("snapshot.zip", exdir = snapshot)

# Install packages from the local repository.
# Windows paths start with a drive letter (C:/...), so they need a third slash.
prefix <- if (.Platform$OS.type == "windows") "file:///" else "file://"
repos <- paste0(prefix, normalizePath(snapshot, winslash = "/"))
install.packages(c("V8", "mongolite"), repos = repos)
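Before installing, you can check which packages and versions a local snapshot offers by reading its source index with available.packages(). In the sketch below, local_repo_url() wraps the file:// logic from the code above as a function. list_snapshot_packages() is likewise a hypothetical helper, and it only inspects the source index (src/contrib/PACKAGES).

```r
# Turn a local folder into a repos URL that install.packages() accepts.
# Windows paths start with a drive letter, so they need a third slash.
local_repo_url <- function(path) {
  path <- normalizePath(path, winslash = "/", mustWork = TRUE)
  prefix <- if (.Platform$OS.type == "windows") "file:///" else "file://"
  paste0(prefix, path)
}

# Hypothetical helper: list the source packages indexed in a local snapshot
list_snapshot_packages <- function(path) {
  db <- utils::available.packages(repos = local_repo_url(path), type = "source")
  db[, c("Package", "Version"), drop = FALSE]
}

# Usage, with the snapshot folder extracted above:
# list_snapshot_packages(snapshot)
```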

8.2.4 Using the S3 API

R-universe also exposes a partial S3-compatible API that you can use to list, download, or mirror package files.

In R, you can use the {paws} package to access the S3 API. Note that this requires using the virtual addressing scheme, where r-universe.dev is the endpoint and the universe name is the bucket.

library(paws)
client <- paws::s3(
  config = list(
    endpoint = "https://r-universe.dev",
    s3_virtual_address = TRUE
  ),
  credentials = list(anonymous = TRUE),
  # A region is required for API compatibility, but is not used
  region = "us-east-1" 
)
all_files <- client$list_objects_v2(Bucket = "jeroen")
sapply(all_files$Contents, \(x) x$Key) |> 
  head()
client$download_file(
  # Bucket is the universe name
  Bucket = "jeroen",  
  # Key is the path to the file
  Key = "src/contrib/RAppArmor_3.2.5.tar.gz", 
  Filename = "RAppArmor_3.2.5.tar.gz"
)
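S3 listings are paginated: by default, list_objects_v2 returns at most 1,000 keys per call and signals further results through IsTruncated and NextContinuationToken. Whether R-universe's partial S3 API paginates large universes this way is an assumption here. The hypothetical helper below follows the standard S3 continuation protocol and takes a client like the one created above.

```r
# Hypothetical helper: collect every key in a bucket (universe) by
# following S3 continuation tokens until the listing is complete.
list_all_keys <- function(client, bucket) {
  keys <- character()
  token <- NULL
  repeat {
    page <- client$list_objects_v2(Bucket = bucket, ContinuationToken = token)
    keys <- c(keys, vapply(page$Contents, function(x) x$Key, character(1)))
    if (!isTRUE(page$IsTruncated)) {
      break
    }
    token <- page$NextContinuationToken
  }
  keys
}

# Usage with the paws client above, keeping only source tarballs:
# keys <- list_all_keys(client, "jeroen")
# grep("^src/contrib/.*\\.tar\\.gz$", keys, value = TRUE)
```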

Outside of R, tools such as the AWS CLI or Rclone (see below) can be used to access the S3 API.

8.2.5 Example: Mirroring a universe with Rclone

R-multiverse uses Rclone to mirror universes efficiently, incrementally downloading only the files that have changed since the previous mirror.

8.2.5.1 Configuration

After installing Rclone, run the following command in a terminal to configure Rclone for R-universe:

rclone config create r-universe s3 \
  list_version=2 force_path_style=false \
  endpoint=https://r-universe.dev provider=Other

Next, register an individual universe as an Rclone remote. For example, to configure https://maelle.r-universe.dev, the following command creates an alias called maelle-universe that points to the maelle universe on the r-universe remote. Later Rclone commands refer to the universe through this alias:

rclone config create maelle-universe alias remote=r-universe:maelle

rclone config show should now print the following contents[1]:

[r-universe]
type = s3
list_version = 2
force_path_style = false
endpoint = https://r-universe.dev
provider = Other

[maelle-universe]
type = alias
remote = r-universe:maelle
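If you mirror several universes, the alias step can be scripted. The sketch below assumes the r-universe remote configured above and an rclone executable on the PATH. rclone_alias_args() and register_universes() are hypothetical helpers that build and run the same rclone config create command once per universe name.

```r
# Hypothetical helper: arguments for the alias command shown above,
# e.g. "rclone config create maelle-universe alias remote=r-universe:maelle"
rclone_alias_args <- function(universe) {
  c("config", "create", paste0(universe, "-universe"), "alias",
    paste0("remote=r-universe:", universe))
}

# Run the command once per universe; assumes rclone is on the PATH
register_universes <- function(universes) {
  for (u in universes) {
    status <- system2("rclone", rclone_alias_args(u))
    if (status != 0) {
      stop("rclone config create failed for universe: ", u)
    }
  }
  invisible(universes)
}

# Usage:
# register_universes(c("maelle", "jeroen"))
```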

8.2.5.2 Local downloads

After configuration, Rclone can download from the universe you configured. The following rclone copy command downloads all the package files from https://maelle.r-universe.dev to a local folder called local_folder_name, speeding up the process with up to 8 parallel checkers and 8 parallel file transfers[2]:

rclone copy maelle-universe: local_folder_name \
  --ignore-size --progress --checkers 8 --transfers 8

The local folder now contains the full contents of the universe:

fs::dir_tree("local_folder_name", recurse = FALSE)
#> local_folder_name
#> ├── bin
#> └── src
fs::dir_tree("local_folder_name/src", recurse = TRUE)
#> local_folder_name/src
#> └── contrib
#>     ├── PACKAGES
#>     ├── PACKAGES.gz
#>     ├── cransays_0.0.0.9000.tar.gz
#>     ├── glitter_0.2.999.tar.gz
#>     └── roblog_0.1.0.tar.gz
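After a copy, a quick consistency check is to compare the index with the files on disk. The hypothetical helper below reads src/contrib/PACKAGES with read.dcf() and reports any listed package whose tarball is missing. It assumes the standard Package_Version.tar.gz naming and checks only the source directory.

```r
# Hypothetical helper: return the tarball names that PACKAGES lists
# but that are missing from a local mirror's src/contrib folder
missing_tarballs <- function(mirror) {
  contrib <- file.path(mirror, "src", "contrib")
  index <- read.dcf(file.path(contrib, "PACKAGES"), fields = c("Package", "Version"))
  expected <- paste0(index[, "Package"], "_", index[, "Version"], ".tar.gz")
  expected[!file.exists(file.path(contrib, expected))]
}

# Usage: an empty result means every indexed source package was copied
# missing_tarballs("local_folder_name")
```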

8.2.6 Remote mirroring

You may wish to mirror a universe to remote storage, such as an Amazon S3 bucket or a Cloudflare R2 bucket[3]. For Cloudflare R2, you will need to give Rclone the credentials of the bucket by creating a new remote:

rclone config create cloudflare-remote s3 \
  provider=Cloudflare \
  access_key_id=YOUR_CLOUDFLARE_ACCESS_KEY_ID \
  secret_access_key=YOUR_CLOUDFLARE_SECRET_ACCESS_KEY \
  endpoint=https://YOUR_CLOUDFLARE_ACCOUNT_ID.r2.cloudflarestorage.com \
  acl=private \
  no_check_bucket=true

Then, you can copy files directly from the universe to the bucket[4]:

rclone copy maelle-universe: cloudflare-remote:YOUR_BUCKET_NAME \
  --ignore-size --progress --checkers 8 --transfers 8

This command downloads each package file from https://maelle.r-universe.dev and uploads it to the bucket. Although the files pass through your local computer in transit, the full set of packages is never stored on your local disk at once. This makes it feasible to mirror large universes, which is why R-multiverse uses this pattern to create production snapshots.

8.2.6.1 Partial uploads

To upload only part of a universe, you can pass Rclone filtering flags. If you do, you should also edit the PACKAGES and PACKAGES.gz index files in src/contrib and in each binary directory under bin/, so that they list only the packages you uploaded. PACKAGES is written in Debian Control Format (DCF), and PACKAGES.gz is a gzip-compressed copy of PACKAGES. The read.dcf() and write.dcf() functions in base R read and write DCF files, and R.utils::gzip() creates gzip archives.
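For example, the hypothetical helper below keeps only selected packages in one index directory. It reads PACKAGES with read.dcf(), drops the other records, and rewrites both PACKAGES and PACKAGES.gz. It uses base R's gzfile() connection in place of R.utils::gzip() so that it needs no extra packages. Run it on src/contrib and on each binary directory under bin/ that you upload. It does not touch other index files such as PACKAGES.rds, which you would need to regenerate or delete if your mirror contains one.

```r
# Hypothetical helper: restrict the index in one repository directory
# (e.g. "mirror/src/contrib") to the packages named in 'keep'
filter_packages_index <- function(dir, keep) {
  path <- file.path(dir, "PACKAGES")
  index <- read.dcf(path)
  index <- index[index[, "Package"] %in% keep, , drop = FALSE]
  # Plain-text index
  write.dcf(index, file = path)
  # Gzip-compressed copy with the same contents
  con <- gzfile(file.path(dir, "PACKAGES.gz"), open = "w")
  on.exit(close(con))
  write.dcf(index, file = con)
  invisible(index[, "Package"])
}

# Usage:
# filter_packages_index("mirror/src/contrib", keep = c("glitter", "roblog"))
```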


  1. Rclone configuration is stored in an rclone.conf text file located at the path returned by rclone config file.

  2. See https://rclone.org/docs/ and https://rclone.org/commands/rclone_copy/ for documentation on the command line arguments.

  3. Cloudflare has its own Rclone documentation.

  4. To upload to a specific prefix inside a bucket, replace cloudflare-remote:YOUR_BUCKET_NAME with cloudflare-remote:YOUR_BUCKET_NAME/YOUR_PREFIX.