Composum Blog

various aspects of the Composum world

Migration to Sling Launcher 12

This post describes some of the experiences, approaches and findings from migrating our live system from Sling Launcher 11 to Sling Launcher 12.

Hans-Peter Störr
22.03.2022

Introduction

Version 12 of the Sling Starter has finally been released, and it brings many major changes. Most notably, it moves to the OSGi Feature Model for deploying bundles and packages and is started with the Sling Feature Launcher. This requires various changes to the scripts that start it.

Setup of the system

A major difference between Sling Starter 11 and Sling Starter 12 is that version 12 uses features and feature archives (FAR) to collect and deploy bundles and packages. There are prebuilt feature archives for an Oak TAR repository and for an Oak MongoDB repository, which collect the many bundles that make up a Sling system. The Sling Feature Launcher, which starts the Sling system, can take the artifacts for the features from the FAR, but also from $HOME/.m2/repository (if you are running it on a developer machine) or from public Maven repositories. Feature files are JSON files like the following example, which includes a bundle and a content package. Building a FAR from them requires a Maven plugin; the composum-launch project contains some examples for Composum.

Caution:
 It is convenient to deploy content packages like this: just start the Sling launcher, and the whole system comes up with everything deployed. However, this requires that the content packages carefully specify their dependencies so that they are installed in the right order. If they don't, a manual intervention should suffice: trigger "install" in the package manager for the packages that failed to install. Upgrading packages might currently need such manual intervention, too.

{
  "bundles": [
    {
      "id": "com.composum.assets:metadata-extractor-bundle::${composum.assets.version}",
      "start-order": "9"
    }
  ],
  "content-packages:ARTIFACTS|required": [
    {
      "id": "com.composum.platform:composum-platform-commons-package:zip:${composum.platform.version}"
    }
  ]
}
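The `id` values above are Maven coordinates in the notation of the Sling feature model, `groupId:artifactId[:type[:classifier]]:version`; the `${...}` placeholders are presumably replaced with actual versions by the Maven plugin when the feature is built. To see where the launcher will look for such an artifact in `$HOME/.m2/repository` or in a remote Maven repository, it helps to translate an ID into the standard Maven repository layout. The following POSIX shell function is a minimal sketch of that translation; its name is our own, and it assumes, as the `::` in the bundle ID suggests, that an empty type means `jar`.

```shell
#!/bin/sh
# Translate a Sling feature model artifact ID of the form
#   groupId:artifactId[:type[:classifier]]:version
# into its relative path in the standard Maven repository layout:
#   group/as/dirs/artifactId/version/artifactId-version[-classifier].type
# The body runs in a subshell ( ... ), so IFS and variable changes stay local.
artifact_path() (
    set -f                      # no filename globbing while splitting
    IFS=:
    set -- $1                   # split the ID at the colons
    group=$1 artifact=$2 type= classifier=
    case $# in
        3) version=$3 ;;
        4) type=$3 version=$4 ;;
        5) type=$3 classifier=$4 version=$5 ;;
        *) echo "unsupported artifact id" >&2; exit 1 ;;
    esac
    [ -n "$type" ] || type=jar  # assumption: "group:artifact::version" means jar
    group_dir=$(printf '%s' "$group" | tr . /)
    file=$artifact-$version
    if [ -n "$classifier" ]; then file=$file-$classifier; fi
    printf '%s/%s/%s/%s.%s\n' "$group_dir" "$artifact" "$version" "$file" "$type"
)
```

For example, `artifact_path org.apache.felix:org.apache.felix.framework:7.0.3` prints `org/apache/felix/org.apache.felix.framework/7.0.3/org.apache.felix.framework-7.0.3.jar`.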

Until SLING-11220 is resolved, the feature launcher always tries to find the needed artifacts in the Maven repositories before it even considers looking them up in the FAR. This can be desirable, but it is a problem if the server has no internet access, and on a production system it opens up ways to inject code into the server, so you might want to switch it off. We will discuss a way to do that shortly.

The feature launcher has a quite different command line than the previous starter; compare its description. The following is an example command line for starting a server, mostly tailored to local development (especially the JMX remote and debugging settings), which you can adapt to your needs. It defines some system properties before the -jar argument. With -f you specify the FAR (or JSON feature files, if you would rather gather the artifacts from the Maven repositories), followed by some Felix framework properties given with -D, among them the HTTP port to use. (Note that the syntax of the framework properties, -D key=value, differs from that of the system properties, -Dkey=value!)

There are a few particularities here. First, neither the Sling Feature Launcher nor the Sling Starter FARs contain the actual Felix framework. To avoid loading it from the network, we packed it into an archive and added that as a "Maven repository" in the form of a jar: file URL, using the argument -u. Second, because of SLING-11220 discussed above, we also give the Sling Starter FAR as a "Maven repository". This is not strictly necessary, but it currently avoids some 2900 log messages about artifacts not being found in the Maven repositories.

java -server -Xms768m -Xmx2048m -Djava.awt.headless=true \
  -Dcom.sun.management.jmxremote.port=9005 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Djava.rmi.server.hostname=localhost \
  --add-opens java.base/java.lang=ALL-UNNAMED \
  -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9999 \
  -jar starter/org.apache.sling.feature.launcher-1.1.26.jar \
  -f starter/org.apache.sling.starter-12-SNAPSHOT-oak_tar_far.far \
  -D org.osgi.framework.system.packages.extra=sun.misc \
  -D org.osgi.service.http.port=9090 \
  -D sling.run.modes=local,develop \
  -u jar:file:starter/org.apache.sling.starter-12-SNAPSHOT-oak_tar_far.far! \
  -u jar:file:starter/composum-launcher-feature-felixcontainer-1.2.1-SNAPSHOT-zip.zip! \
  -D sling.fileinstall.dir=fileinstall \
  -D felix.startlevel.bundle=30 \
  -v -fv 7.0.3
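The archive passed with the second -u argument only has to hold the Felix framework jar at the place where a Maven repository would keep it. A minimal sketch for preparing such an archive, assuming that the launcher looks for the framework under its usual coordinates org.apache.felix:org.apache.felix.framework with the version given by -fv, and that the jar alone (without a POM) is sufficient. The function name and its arguments are hypothetical:

```shell
#!/bin/sh
# Stage a locally available Felix framework jar in Maven repository layout.
# Zipped, the staging directory can be given to the feature launcher as a
# repository with  -u jar:file:<archive>!  so the framework is not downloaded.
stage_felix_repo() (
    felix_jar=$1    # path to a local org.apache.felix.framework jar
    version=$2      # must match the launcher's -fv argument, e.g. 7.0.3
    staging=$3      # directory that becomes the root of the archive
    dir=$staging/org/apache/felix/org.apache.felix.framework/$version
    mkdir -p "$dir" &&
        cp "$felix_jar" "$dir/org.apache.felix.framework-$version.jar"
)

# Usage sketch (requires the zip tool):
#   stage_felix_repo org.apache.felix.framework-7.0.3.jar 7.0.3 felix-repo &&
#       (cd felix-repo && zip -r ../felix-repo.zip .)
# and then start the launcher with  -u jar:file:felix-repo.zip!  -fv 7.0.3
```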

The composum-launch project contains an extended example with various scripts that start Composum in the feature launcher using the approach discussed here.
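The example command line mixes development-only settings with general ones. One way to adapt it, sketched here under our own assumptions, is a small start script that adds the JMX and debugger options only on request and keeps the two property syntaxes apart. The variables DEVELOP, HTTP_PORT and RUN_MODES are hypothetical names for this sketch, not settings of the launcher; the defaults are the values from the example.

```shell
#!/bin/sh
# JVM options go *before* -jar and use the JVM syntax "-Dkey=value".
# The JMX and debugger settings are only added if DEVELOP=1.
jvm_options() (
    opts="-server -Xms768m -Xmx2048m -Djava.awt.headless=true"
    opts="$opts --add-opens java.base/java.lang=ALL-UNNAMED"
    if [ "${DEVELOP:-0}" = 1 ]; then
        opts="$opts -Dcom.sun.management.jmxremote.port=9005"
        opts="$opts -Dcom.sun.management.jmxremote.authenticate=false"
        opts="$opts -Dcom.sun.management.jmxremote.ssl=false"
        opts="$opts -Djava.rmi.server.hostname=localhost"
        opts="$opts -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9999"
    fi
    printf '%s\n' "$opts"
)

# Framework properties go *after* the launcher jar and use the launcher
# syntax "-D key=value" (note the space).
framework_options() {
    printf '%s' "-D org.osgi.framework.system.packages.extra=sun.misc"
    printf ' %s' "-D org.osgi.service.http.port=${HTTP_PORT:-9090}"
    printf ' %s\n' "-D sling.run.modes=${RUN_MODES:-local,develop}"
}

# Usage sketch; the unquoted $( ) expansions rely on word splitting,
# which is fine as long as no option value contains whitespace:
#   DEVELOP=1 java $(jvm_options) \
#     -jar starter/org.apache.sling.feature.launcher-1.1.26.jar \
#     -f starter/org.apache.sling.starter-12-SNAPSHOT-oak_tar_far.far \
#     $(framework_options)
```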

Content migration

Since the Jackrabbit Oak version changes, we have to consider how to migrate the JCR repository. There are also many changes in Sling and in /libs, /apps, configurations etc. Thus the easiest and most sensible approach seems to be to recreate the system from scratch and to copy the JCR content that needs to be kept into the new system using the oak-upgrade tool. This tool allows specifying which paths are copied, so we can e.g. copy the content in /content (keeping the versions in the version storage) but omit /libs, /apps and other paths that come from the Sling Launcher, from the installed applications or from configurations. So we have to decide carefully which paths to copy.

A system contains the following types of content in its JCR repository:

  • Actual user content (assets, sites, pages) in /content. For this we can copy /content as it is (unless we want to keep some content that the Sling launcher creates there, which requires some exceptions).

  • (OSGi) configurations. Our approach is that these are changed via the Felix console only in exceptional cases; normally they are contained in the code or in special setup artifacts ("configuration is code"). Thus they can be ignored for the content transfer.

  • Users and groups. These can be transferred by including some paths within /home/users and /home/groups in the set of copied paths. Note that these trees also contain system users created by the Sling launcher and service users created by the applications, so it doesn't seem wise to copy everything there; instead, pick carefully chosen paths. Although many users would be recreated automatically by single sign-on, they still have to be copied to avoid losing their group assignments. In our case that means copying the folders /home/groups/composum, /home/groups/tenants, /home/users/keycloak (the SSO users) and /home/users/tenants.

  • Some system-specific administrative paths. Various paths in /var or /etc contain automatically created content (e.g. cached client library content), but some also contain important data: /etc/tenants (tenant configuration), /var/composum/content (metadata about content releases), /var/composum/platform/mail (the mail queue), /public and /preview (content that has been replicated), /robots.txt, and /jcr:content with some configuration. For /var/composum the sanest approach seemed to be to copy all of it, excluding /var/composum/clientlibs and /var/composum/tmp, which contain temporary data.

  • JCR version storage at /jcr:system/jcr:versionStorage. As long as a version history is referenced by versionable documents (pages, assets), it is transferred automatically by the oak-upgrade tool.
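With this many paths, the comma-separated --include-paths argument of oak-upgrade becomes hard to maintain by hand. A minimal sketch that keeps the paths grouped by the categories above and joins them; the variable names and the helper join_paths are our own, and the paths are the ones from our oak-upgrade command line (OTHER_PATHS collects those not discussed in the list).

```shell
#!/bin/sh
# Paths to migrate, grouped by category.
USER_CONTENT="/content"
USERS_AND_GROUPS="/home/groups/composum /home/groups/tenants
    /home/users/keycloak /home/users/tenants"
ADMIN_PATHS="/etc/tenants /var/composum /public /preview /robots.txt /jcr:content"
OTHER_PATHS="/conf/composum /conf/content /conf/tenants
    /etc/map.live /etc/map.test /var/audit"
# Paths below included ones that must not be copied.
EXCLUDED="/content/slingshot /content/starter
    /var/composum/clientlibs /var/composum/tmp"

# Join whitespace-separated paths with commas, as oak-upgrade expects.
join_paths() (
    set -f              # no globbing while splitting
    set -- $*           # split all arguments at whitespace (incl. newlines)
    IFS=,
    printf '%s\n' "$*"  # "$*" joins with the first character of IFS
)

INCLUDE_PATHS=$(join_paths "$USER_CONTENT $USERS_AND_GROUPS $ADMIN_PATHS $OTHER_PATHS")
EXCLUDE_PATHS=$(join_paths "$EXCLUDED")

# Usage sketch:
#   java -mx4g -jar oak-upgrade-*.jar oldrepository launcher/repository \
#     --copy-binaries --include-paths="$INCLUDE_PATHS" \
#     --exclude-paths="$EXCLUDE_PATHS"
```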

For the migration, the old server has to be stopped so that a backup of its content can be taken as the migration source. The new server should be started and fully initialized, so that all applications are deployed. Then it has to be stopped, and the old content can be copied in.
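Taking the backup of the stopped old server can be as simple as copying its repository directory. A minimal sketch, assuming an Oak TAR repository that lives in a single directory (like oldrepository in our oak-upgrade command line); the function name and the optional pid file check are hypothetical conventions of this sketch, to be adapted to however your start scripts track the running server.

```shell
#!/bin/sh
# Copy the repository directory of the stopped old server into a new,
# timestamped backup directory and print the path of the copy.
backup_repository() (
    repo=$1 backup_root=$2 pidfile=${3:-}
    # Refuse to copy a repository that is still in use by a running server.
    if [ -n "$pidfile" ] && [ -f "$pidfile" ] &&
        kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "server with pid $(cat "$pidfile") is still running" >&2
        exit 1
    fi
    target=$backup_root/repository-$(date +%Y%m%d-%H%M%S)
    if [ -e "$target" ]; then
        echo "backup $target already exists" >&2
        exit 1
    fi
    # -p keeps modes and timestamps, -R copies the whole tree.
    mkdir -p "$backup_root" &&
        cp -pR "$repo" "$target" &&
        printf '%s\n' "$target"
)
```

The printed path can then serve as the source argument of oak-upgrade in place of oldrepository.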

The project composum-launcher contains an example content migration script. The actual command line looks like this:

java -mx4g -jar oak-upgrade-*.jar oldrepository launcher/repository --copy-binaries \
  --include-paths=/content,/conf/composum,/conf/content,/conf/tenants,/etc/tenants,/etc/map.live,/etc/map.test,/var/audit,/var/composum,/home/groups/composum,/home/groups/tenants,/home/users/keycloak,/home/users/tenants,/public,/preview,/robots.txt,/jcr:content \
  --exclude-paths=/content/slingshot,/content/starter,/var/composum/clientlibs,/var/composum/tmp \
  --merge-paths=/content/slingshot,/content/starter,/var/composum/clientlibs
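Since the include and exclude lists are edited by hand, a small sanity check before running the migration can catch typos: assuming that with --include-paths only the listed subtrees are copied, an excluded path that lies below none of the included paths has no effect and is probably a mistake. This check is our own addition, not a feature of oak-upgrade, and the function name is hypothetical:

```shell
#!/bin/sh
# Report every excluded path that is not equal to or below an included path.
# Both arguments are comma-separated lists as passed to oak-upgrade.
check_excludes() (
    includes=$1 excludes=$2
    set -f          # no globbing while splitting the lists
    IFS=,           # split the lists at commas
    status=0
    for ex in $excludes; do
        covered=no
        for inc in $includes; do
            case "$ex" in
                "$inc"|"$inc"/*) covered=yes ;;
            esac
        done
        if [ "$covered" = no ]; then
            echo "excluded path $ex is not below any included path" >&2
            status=1
        fi
    done
    exit $status
)
```

With the lists from the command line above, `check_excludes` succeeds, since /content/slingshot and /content/starter lie below /content and the two temporary paths below /var/composum.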

Special Considerations

We stumbled over a particular problem with oak-upgrade: it copies the version histories of mix:versionable nodes, but it does not check for other references into the version storage. Our content releases, however, aggregate the versions of pages or assets by keeping references to the corresponding versions in the version storage. Thus, if a page or asset has been put into a release but has been deleted in the meantime, its version history isn't copied, which leaves the repository in an inconsistent state. So before executing the content migration we had to run a script that checks out all those orphaned version histories into a "cpl:attic" directory of the site, which can later be deleted on the new system. (It has to be a subdirectory of the site, since otherwise users aren't able to access the versions: a version history is readable if the path where it was last checked out would be readable.)