Replication to a Publish Server

Requirements

Basic Idea

While it is possible to serve the public release of a site from the same server the editors use (compare e.g. the in-place replication setup), this is sometimes undesirable for a site with high load. Depending on the performance and availability demands, it can be useful to separate concerns and set up one or several separate servers or clusters of servers (called publish servers here). This requires deploying the application code on those servers (which can be done with the standard Sling means) and replicating the content of the site from the author server. In the simplest case, with very infrequent changes, this could be done by manually creating packages of the site content (e.g. after in-place replication has copied the content to /public) and deploying them on the publish servers. For larger sites with frequent changes, however, this quickly becomes infeasible. This page discusses the requirements and our solution for automatically replicating the site content to the publish servers.

In advanced cases it might also be necessary to share, replicate or collect user-generated content on the site, such as comments, likes, orders etc. We leave this out for now, since there is a wide variety of conceivable setups for this. If the availability and performance requirements for this content are low enough, it might be easiest to put it on a separate server and integrate it in the browser via AJAX, or to simply keep this data on the publish server or in a common datastore used by all hosts.

If you are familiar with AEM replication: there is a fundamental difference between AEM's way of managing content and ours. In AEM, you publish pages more or less individually: after you trigger the publication, the page is transferred to all publish servers. In Composum, page versions are collected into releases, and you trigger the publication of a whole release. This ensures that the pages are consistent with respect to each other, makes it easy to deploy or roll back content changes that affect many pages, and makes it easy to synchronize servers automatically. On the other hand, if you are just continuously publishing lots of small page changes, it might seem better not to use the release mechanism, since that would amount to a daily release or worse. But Composum allows you to emulate that behaviour: the latest release is the open release you're working on, where you can easily add or change pages, and you can set that release to be published immediately, so that page changes become effective within seconds once you activate a page (that is, create a version and add it to the open release).

Implementation of the transfer

Basics

Since a larger site can easily have thousands of pages and gigabytes of assets, we want to minimize the transfer needed for release changes (that is, only transfer the actual changes) and split it up into parts of manageable size.
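As a rough sketch of splitting the transfer into parts of manageable size, the following Python function groups changed versionables into upload batches whose estimated total size stays below a limit. It assumes that the package size of each versionable can be estimated beforehand; the function name and the byte limit are hypothetical and not part of the Composum API.

```python
from typing import Iterable, Iterator, List, Tuple


def batch_by_size(items: Iterable[Tuple[str, int]], max_bytes: int) -> Iterator[List[str]]:
    """Group (path, estimated_size) pairs into batches of at most max_bytes.

    A single item larger than max_bytes still gets a batch of its own,
    so every changed versionable ends up in exactly one batch.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    batch: List[str] = []
    total = 0
    for path, size in items:
        # Close the current batch if adding this item would exceed the limit.
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(path)
        total += size
    if batch:
        yield batch


if __name__ == "__main__":
    changes = [("/content/site/a", 40), ("/content/site/b", 40),
               ("/content/site/c", 30), ("/content/site/assets/video", 200)]
    for part in batch_by_size(changes, max_bytes=100):
        print(part)
```

Since the function is a generator, each batch can be uploaded as soon as it is complete, without building the full list of parts first.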

There are several facets of the content that need to be transferred:

  • the content of the versionables
  • the attributes of their parent nodes
  • the ordering of the siblings at the parent nodes.

We also want to avoid buffering everything in memory, and instead stream the data into the requests and out of the responses while processing it in parallel.
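A minimal sketch of the streaming idea, assuming a simple line-based wire format with one tab-separated path and version ID per line (the actual format used by the replication servlet is not described here): the writer emits one record at a time, and the reader yields each record as soon as its line has been read, so neither side needs the whole list in memory.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple


def write_records(out: BinaryIO, records: Iterable[Tuple[str, str]]) -> int:
    """Stream (path, version_id) pairs to `out`, one UTF-8 line per record.

    The tab-separated line format is an assumption for illustration.
    Returns the number of records written.
    """
    count = 0
    for path, version_id in records:
        out.write(f"{path}\t{version_id}\n".encode("utf-8"))
        count += 1
    return count


def read_records(inp: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Lazily yield (path, version_id) pairs while reading `inp` line by line."""
    for raw in inp:
        line = raw.decode("utf-8").rstrip("\n")
        if line:
            path, version_id = line.split("\t", 1)
            yield path, version_id


if __name__ == "__main__":
    buffer = io.BytesIO()  # stands in for a request or response body
    write_records(buffer, [("/content/site/a", "1.0"), ("/content/site/b", "1.3")])
    buffer.seek(0)
    for record in read_records(buffer):
        print(record)
```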

Flow of the implementation

The transfer is done in the following steps using the replication servlet:

  1. Operation startupdate creates a temporary directory on the publisher's side and returns its ID.
  2. Operation contentstate determines which versionables are present on the publisher's side but not on the author's side. The resulting nodes to delete are kept in memory on the author's side.
  3. Operation comparecontent determines which versionables are missing or outdated on the publisher. The request contains the version IDs of the versionable nodes on the author's side; the publisher checks which versionables are new, have changed paths or were updated, and returns the information about the changed versionables to the author in the response.
  4. Operation pathupload is used repeatedly to transmit all new, updated and deleted versionables to the publisher. (The packages for deleted versionables only contain the parent nodes, as far as these are still present.)
  5. Operation commitupdate transmits both the versionables to delete and the sibling orderings of all parent nodes of new or updated versionables to the publisher, which then updates the actual content and deletes the temporary directory. The attributes of the parent nodes are also set from the information contained in the packages.
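The author-side control flow of these five steps could be sketched as follows. The ReplicationClient protocol, its method names and the in-memory test double are hypothetical stand-ins for HTTP calls to the operations startupdate, contentstate, comparecontent, pathupload and commitupdate. In particular, the sketch assumes that contentstate returns the paths of all versionables on the publisher, from which the author computes the deletions.

```python
from typing import Dict, List, Optional, Protocol, Set, Tuple


class ReplicationClient(Protocol):
    """Hypothetical client wrapping the replication servlet's operations."""

    def start_update(self) -> str: ...
    def content_state(self, update_id: str) -> Set[str]: ...
    def compare_content(self, update_id: str, versions: Dict[str, str]) -> List[str]: ...
    def path_upload(self, update_id: str, path: str) -> None: ...
    def commit_update(self, update_id: str, deleted: List[str],
                      orderings: Dict[str, List[str]]) -> None: ...


def replicate(client: ReplicationClient, author_versions: Dict[str, str],
              orderings: Dict[str, List[str]]) -> List[str]:
    """Run one update; author_versions maps versionable path -> version id."""
    update_id = client.start_update()                             # 1. startupdate
    remote_paths = client.content_state(update_id)                # 2. contentstate
    to_delete = sorted(remote_paths - set(author_versions))       #    kept in memory
    changed = client.compare_content(update_id, author_versions)  # 3. comparecontent
    for path in changed + to_delete:                              # 4. pathupload
        client.path_upload(update_id, path)
    client.commit_update(update_id, to_delete, orderings)         # 5. commitupdate
    return changed


class InMemoryPublisher:
    """Test double for a publish server; not a real Composum class."""

    def __init__(self, versions: Dict[str, str]) -> None:
        self.versions = dict(versions)
        self.uploaded: List[str] = []
        self.committed: Optional[Tuple[List[str], Dict[str, List[str]]]] = None

    def start_update(self) -> str:
        return "update-1"

    def content_state(self, update_id: str) -> Set[str]:
        return set(self.versions)

    def compare_content(self, update_id: str, versions: Dict[str, str]) -> List[str]:
        # New versionables or ones whose version differs; moves are not modelled here.
        return sorted(p for p, v in versions.items() if self.versions.get(p) != v)

    def path_upload(self, update_id: str, path: str) -> None:
        self.uploaded.append(path)

    def commit_update(self, update_id: str, deleted: List[str],
                      orderings: Dict[str, List[str]]) -> None:
        self.committed = (deleted, orderings)
```

In a real setting, the pathupload calls would transmit batched packages rather than one path per request.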

All operations can be done in one transaction. At the beginning and at the end of each step, the release change number of the transmitted release is checked; if it has changed, the whole process is aborted and repeated.
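The abort-and-repeat rule can be sketched as a guard around each step plus a retry loop around the whole process. The exception class, the function names and the retry limit are illustrative assumptions; the text above does not say how often a replication is repeated.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ReleaseChangedError(Exception):
    """The release change number moved while a replication step was running."""


def run_step(current_change_number: Callable[[], str], expected: str,
             step: Callable[[], T]) -> T:
    """Run one step, checking the release change number before and after it."""
    if current_change_number() != expected:
        raise ReleaseChangedError("release changed before step")
    result = step()
    if current_change_number() != expected:
        raise ReleaseChangedError("release changed during step")
    return result


def replicate_with_retry(current_change_number: Callable[[], str],
                         run_all_steps: Callable[[str], T],
                         max_attempts: int = 3) -> T:
    """Repeat the whole process from scratch whenever the release changed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        expected = current_change_number()  # snapshot for this attempt
        try:
            return run_all_steps(expected)
        except ReleaseChangedError:
            if attempt >= max_attempts:
                raise
            attempt += 1
```

Because each attempt takes a fresh snapshot of the change number, a repeated run replicates the release in the state it has after the change.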

Requirements

  1. A change or release switch should be transactional on the publish server - that is, all pages should change at the same time after the changed content was fully replicated.
  2. The editor and publish servers should run independently - downtimes of one of them should not influence the other.
  3. The replication should be capable of handling several gigabytes of data for a site.
  4. It should be possible to replicate several releases (e.g. the public and the preview release) to one server. It should also be possible to replicate different sites, and different parts of a site, to different locations.
  5. Switching the published release, or adding content to or deleting content from a release, should not be impeded by a failing replication. The replication should be carried out as a separate job.

Basic implementation decisions

  • Just like for staging, the basic unit of replication is the versionable, whose state in a particular JCR version is replicated. This avoids the need to replicate a whole release of a site at once (which could be gigabytes of data). It does, however, require mechanisms that still make the whole replication transactional, so that all parts of a release become active at the same time.
  • To have several releases of the same site on one server, it is necessary to rewrite the references in the content. Depending on the configuration, this can be done on the author server's side (first, in-place replication is done, and then the replicated content is transferred to the publish server) or on the publish server's side (by providing a hook mechanism that could do that).
  • The configuration for the replication for a site /content/{pathtosite}/{site} is at /conf/{pathtosite}/{site}/replication/ with subnodes for each replication to perform.
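The path convention for the configuration location can be expressed as a small helper. This is only a sketch: the function names are hypothetical, and the code merely restates the mapping from /content/{pathtosite}/{site} to /conf/{pathtosite}/{site}/replication.

```python
import posixpath


def replication_config_root(site_path: str) -> str:
    """Map /content/{pathtosite}/{site} to /conf/{pathtosite}/{site}/replication."""
    path = posixpath.normpath(site_path)  # also drops a trailing slash
    prefix = "/content/"
    if not path.startswith(prefix):
        raise ValueError(f"not a site below /content: {site_path!r}")
    return "/conf/" + path[len(prefix):] + "/replication"


def replication_config_path(site_path: str, name: str) -> str:
    """Path of one replication configuration, a subnode of the root above."""
    if not name or "/" in name:
        raise ValueError(f"invalid configuration node name: {name!r}")
    return replication_config_root(site_path) + "/" + name
```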

Implementation of repository comparison

For a large site it is infeasible to compare the whole site content. To simplify this, we assume that the publication of a versionable does not change its content, which seems reasonable since it is transferred in one piece as a package. So we only need to make sure that:

  1. release change numbers agree. (If they don't, there isn't much point in even comparing the contents.)
  2. all versionables present on our side have the same version on the other side
  3. all versionables present on the other side have the same version on our side
  4. the child orderings of all parent nodes of versionables that have orderable children are the same
  5. the attributes of all parent nodes of versionables are the same (excluding protected attributes like jcr:created)
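The five conditions translate fairly directly into a comparison of two repository snapshots. Points 2 and 3 rely on the assumption above that a version ID identifies the content. The RepoState structure and the set of excluded properties below are assumptions made for illustration (the text only names jcr:created as an example of a protected attribute); a real implementation would read this data from the JCR rather than from dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed set of protected JCR properties that are ignored for point 5.
PROTECTED = frozenset({"jcr:created", "jcr:createdBy", "jcr:primaryType",
                       "jcr:mixinTypes", "jcr:uuid"})


@dataclass
class RepoState:
    """Hypothetical snapshot of the replicated part of one repository."""
    release_change_number: str
    versions: Dict[str, str]  # versionable path -> version id
    orderings: Dict[str, List[str]] = field(default_factory=dict)  # parent -> child names
    attributes: Dict[str, Dict[str, str]] = field(default_factory=dict)  # parent -> props


def _unprotected(props: Dict[str, str]) -> Dict[str, str]:
    return {k: v for k, v in props.items() if k not in PROTECTED}


def compare(ours: RepoState, theirs: RepoState) -> List[str]:
    """Return human-readable differences; an empty list means 'in sync'."""
    # 1. If the release change numbers differ, comparing contents is pointless.
    if ours.release_change_number != theirs.release_change_number:
        return ["release change number differs"]
    diffs: List[str] = []
    # 2. Every local versionable has the same version remotely.
    for path, version_id in ours.versions.items():
        if theirs.versions.get(path) != version_id:
            diffs.append(f"missing or different version remotely: {path}")
    # 3. Every remote versionable exists locally (mismatches were found in 2).
    for path in sorted(theirs.versions.keys() - ours.versions.keys()):
        diffs.append(f"only present remotely: {path}")
    # 4. Child orderings of the parent nodes agree.
    for parent in sorted(ours.orderings.keys() | theirs.orderings.keys()):
        if ours.orderings.get(parent) != theirs.orderings.get(parent):
            diffs.append(f"child order differs: {parent}")
    # 5. Parent node attributes agree, ignoring protected properties.
    for parent in sorted(ours.attributes.keys() | theirs.attributes.keys()):
        mine = _unprotected(ours.attributes.get(parent, {}))
        other = _unprotected(theirs.attributes.get(parent, {}))
        if mine != other:
            diffs.append(f"attributes differ: {parent}")
    return diffs
```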

Flow of the implementation of comparison

  • call operation contentstate on the remote system to verify point 2
  • call operation comparecontent on the remote system to verify point 1

Configuration

Configuration of the Servers

The following configurations have to be set for the remote replication to be operative.

Author server:

  • Composum Platform Remote Replication Service has to be enabled
  • Composum Platform Replication Receiver Backend Service has to be enabled (enabled by default) if in-place replication is wanted
  • Composum Platform Credential Service has to be enabled and configured (you'll want to configure a master password or master password file to have stored passwords encrypted)

Publisher server:

  • Composum Platform Replication Receiver Backend Service has to be enabled (enabled by default)

Configuration of the Replications

For a site at /content/{pathtosite}/{site}, each replication configuration is saved as a subnode of /conf/{pathtosite}/{site}/replication, with the following attributes:

  • replicationType : the type of replication service; defaults to "default"
  • jcr:title : a short name for the replication configuration
  • jcr:description : an optional description giving details
  • enabled : boolean value that allows to enable or disable a configuration
  • stage : either "preview" or "public"
  • sourcePath : optional, an absolute path within the site below which this configuration applies
  • targetPath : optional, an absolute path to which the sourcePath is moved on the publish server
  • targetUrl : the url of the replication receiver servlet of the environment we are replicating to. Normally that'll be http://publisher:8080/bin/cpm/platform/replication/publishreceiver (with appropriate hostname / port, of course)
  • credentialId : the ID under which the Credential Service keeps the user and password for the replication server
  • proxyKey : optional key for the proxy service we use to connect from the author host to the replication receiver (targetUrl)
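A minimal sketch of reading these attributes into a typed configuration object and applying sourcePath / targetPath to a content path. The class and method names are hypothetical, the sketch only covers the default servlet-based type (where targetUrl and credentialId are required), and defaults not stated above, such as treating a missing enabled flag as disabled, are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


def _strip_slash(path: Optional[str]) -> Optional[str]:
    return path.rstrip("/") if path else None


@dataclass(frozen=True)
class ReplicationConfig:
    title: str
    stage: str                      # "preview" or "public"
    target_url: str
    credential_id: str
    replication_type: str = "default"
    enabled: bool = False           # assumption: a missing flag means disabled
    description: Optional[str] = None
    source_path: Optional[str] = None
    target_path: Optional[str] = None
    proxy_key: Optional[str] = None

    @classmethod
    def from_properties(cls, props: Mapping[str, Any]) -> "ReplicationConfig":
        """Build a configuration from the properties of a configuration node."""
        stage = props["stage"]
        if stage not in ("preview", "public"):
            raise ValueError(f"stage must be 'preview' or 'public', not {stage!r}")
        return cls(
            title=props["jcr:title"],
            stage=stage,
            target_url=props["targetUrl"],
            credential_id=props["credentialId"],
            replication_type=props.get("replicationType", "default"),
            enabled=bool(props.get("enabled", False)),
            description=props.get("jcr:description"),
            source_path=_strip_slash(props.get("sourcePath")),
            target_path=_strip_slash(props.get("targetPath")),
            proxy_key=props.get("proxyKey"),
        )

    def map_path(self, path: str) -> Optional[str]:
        """Return the path on the publish server, or None if not covered."""
        src = self.source_path
        if src is None:
            return path  # applies to the whole site, no relocation
        if path != src and not path.startswith(src + "/"):
            return None  # outside sourcePath: this configuration does not apply
        if self.target_path is None:
            return path
        return self.target_path + path[len(src):]
```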

This basic configuration is tailored to replication via a remote servlet that receives the data. Depending on the type of replicator, there can be additional attributes, or some of these attributes can be meaningless; the mechanism is extensible by adding new replication services through OSGi.
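In Composum the replication services are contributed through OSGi; the following Python sketch only illustrates the idea of choosing a service by the replicationType attribute. The registry, the service protocol and the example service are hypothetical stand-ins, not Composum's actual interfaces.

```python
from typing import Dict, Mapping, Protocol


class ReplicationService(Protocol):
    """Hypothetical interface a replication service would implement."""

    def replicate(self, config: Mapping[str, object], release: str) -> str: ...


class ServiceRegistry:
    """Stand-in for an OSGi service lookup keyed by replicationType."""

    def __init__(self) -> None:
        self._services: Dict[str, ReplicationService] = {}

    def register(self, replication_type: str, service: ReplicationService) -> None:
        self._services[replication_type] = service

    def for_config(self, config: Mapping[str, object]) -> ReplicationService:
        rtype = str(config.get("replicationType", "default"))
        try:
            return self._services[rtype]
        except KeyError:
            raise LookupError(f"no replication service for type {rtype!r}") from None


class RemoteServletReplication:
    """Hypothetical 'default' service; a real one would send packages to targetUrl."""

    def replicate(self, config: Mapping[str, object], release: str) -> str:
        return f"{release} -> {config['targetUrl']}"
```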

If the optional tenants module is installed, an editor for the replication configurations of the tenants' sites is available.

Open points

  • Reverse replication of user generated data and/or statistics to the editor server
  • How to deal with changes in the application code
  • Invalidation of cached content in a cache like Apache or a CDN like Akamai.

(Yet?) unused ideas

  • Replication into a filesystem (probably only makes sense for static content: files etc.)
  • Replication of parent node attributes, both for the node immediately above jcr:content and for nodes further up.