
Fine-Tuning the Publishers-Deployers Infrastructure

To squeeze the most performance out of publishing in a Tridion / SDL Web system, a large number of parameters and configurations must all be tuned. This post presents a few of these parameters, especially in the context of an AWS infrastructure.


Content Manager Explorer

The Tridion GUI server -- the Content Manager Explorer (CME) -- is a normal Tridion installation, but with the Publisher and Transport services disabled. This server is used only for serving the CME website.


Publisher

Publisher servers are Tridion installations without the CME. These servers have the Publisher and Transport services enabled; any other Tridion services, if installed, are disabled. The purpose of a Publisher is simply to look up publishable items in the Publishing Queue, render them, hand them over to the Transport service, and have the transport package pushed to the Content Delivery side.

Publishers should use approximately twice the number of CPU cores as rendering threads, and the same count again as transporting threads. Using more threads typically yields no significant performance gain; instead, it is better to scale out the Publisher servers.
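The rule of thumb above can be distilled into a tiny helper. This is a sketch of the sizing heuristic only, not any actual Tridion configuration API; the function name is made up for illustration.

```python
import os

def recommended_publisher_threads(cpu_cores: int) -> dict:
    """Sizing heuristic: roughly twice the number of CPU cores for
    rendering threads, and the same count for transporting threads.
    Beyond that, scale out to more Publisher servers instead."""
    threads = 2 * cpu_cores
    return {"rendering_threads": threads, "transporting_threads": threads}

# Example: an instance exposing 4 vCPUs (e.g. an m4.xlarge).
print(recommended_publisher_threads(4))
# {'rendering_threads': 8, 'transporting_threads': 8}

# On the machine running this, os.cpu_count() gives the core count.
print(recommended_publisher_threads(os.cpu_count()))
```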

In an AWS context, it is advisable to scale out the Publishers to 2-4 instances in order to obtain optimal performance. Scaling out beyond 4 Publishers yields little to no performance gain, mostly due to limits on database transactions and on the number of permitted I/O operations -- more about that below.

In terms of instance type, anything from t2.medium to m4.xlarge is an advisable Publisher instance type. Notably, several smaller instances often perform better than fewer larger ones.

Deployer Receiver

The receiver of transport packages on the Content Delivery side is the so-called Deployer Receiver, which simply listens for incoming HTTP connections that post transport packages and other admin/update notification requests.

A receiver is usually a lightweight server, so smaller instances in the t2.large vicinity will do the job nicely.

It is possible to scale out receivers in an active-active manner by placing them behind an Elastic Load Balancer using normal round-robin request allocation.
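The round-robin allocation the load balancer performs is nothing more than cycling through the healthy receivers in turn. A minimal sketch of that pattern, with hypothetical receiver endpoints:

```python
from itertools import cycle

# Hypothetical receiver endpoints registered behind the ELB.
receivers = [
    "http://receiver-1:8083",
    "http://receiver-2:8083",
]

# Round-robin: each incoming transport package goes to the next
# receiver in turn, wrapping around at the end of the list.
allocator = cycle(receivers)
targets = [next(allocator) for _ in range(4)]
print(targets)
# ['http://receiver-1:8083', 'http://receiver-2:8083',
#  'http://receiver-1:8083', 'http://receiver-2:8083']
```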

The location where deployers store the incoming transport packages can be configured as a shared file system (e.g. EFS) or a Redis instance. Redis performs better than a shared file system, and it is also more reliable, because shared file systems are prone to locking issues under load. However, one big limitation of Redis is that it cannot store transport packages larger than 512MB. So, as a rule of thumb: use Redis if possible; otherwise fall back to a shared file system, for example EFS.
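That rule of thumb amounts to a simple size check against Redis's 512MB value limit. Note that in practice the binary storage backend is fixed in the Deployer configuration rather than chosen per package; the helper below only illustrates the decision rule, and its name is made up.

```python
# Redis caps a single string value at 512 MB.
REDIS_MAX_VALUE_BYTES = 512 * 1024 * 1024

def choose_binary_storage(package_size_bytes: int) -> str:
    """Prefer Redis for speed and reliability; fall back to a shared
    file system (e.g. EFS) when a transport package would exceed the
    Redis value-size limit."""
    if package_size_bytes < REDIS_MAX_VALUE_BYTES:
        return "redis"
    return "shared-fs"

print(choose_binary_storage(100 * 1024 * 1024))  # redis
print(choose_binary_storage(600 * 1024 * 1024))  # shared-fs
```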

The notification mechanism the receiver uses to tell workers that new transport packages are available can be configured to use file system queues (e.g. on EFS) or JMS (e.g. SQS or ActiveMQ). The file system queues have to exist on the shared file system, and their performance suffers greatly because no notification is actually sent; instead, the workers monitor a certain folder in order to detect changes in it. This approach is also prone to file locking issues and is in general less stable than a messaging system. The JMS implementation (e.g. SQS) is therefore highly recommended: the receiver posts a message to SQS, and that message is relayed to all listening worker servers.
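The advantage of the messaging approach is that workers block on a queue and are woken up the moment a message arrives, instead of repeatedly scanning a folder. A minimal sketch of that push-style flow, using an in-process queue as a stand-in for SQS (not the actual Tridion or SQS API):

```python
import queue
import threading

# Stand-in for the JMS queue (e.g. SQS): the receiver posts one
# notification per transport package; workers block on the queue
# instead of polling a shared folder for file-system changes.
notifications: "queue.Queue[str]" = queue.Queue()

def receiver_notify(package_id: str) -> None:
    # Receiver side: announce that a new transport package is available.
    notifications.put(package_id)

handled = []

def worker_loop() -> None:
    # Worker side: wait for a notification, then handle the package.
    while True:
        package_id = notifications.get()
        if package_id == "STOP":  # sentinel to shut the worker down
            break
        handled.append(package_id)

worker = threading.Thread(target=worker_loop)
worker.start()
receiver_notify("pkg-001")
receiver_notify("pkg-002")
notifications.put("STOP")
worker.join()
print(handled)  # ['pkg-001', 'pkg-002']
```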

Deployer Worker

This server is the one actually performing the deploy/undeploy of content. Upon noticing that a transport package is available, it proceeds to handle it and reports its status back to the receiver.

The configuration that yields the best performance is to use Redis as binary storage, if possible, together with a JMS notification mechanism such as SQS.

The worker server can be configured with a number of threads that perform the deploying/undeploying internally. A good number is around 15 worker threads. Fewer than 10 results in poor performance, while more than 20 does not yield any noticeable performance gain.
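The worker's internal threading can be pictured as a fixed-size pool draining a list of packages. This is a hypothetical sketch of that pattern only; `deploy` is a placeholder, not the Deployer's real API.

```python
from concurrent.futures import ThreadPoolExecutor

# Around 15 threads per the guidance above: fewer than 10 underperforms,
# more than 20 brings no noticeable gain.
WORKER_THREADS = 15

def deploy(package_id: str) -> str:
    # Placeholder for the real deploy/undeploy work on one package.
    return f"deployed {package_id}"

packages = [f"pkg-{i}" for i in range(30)]
with ThreadPoolExecutor(max_workers=WORKER_THREADS) as pool:
    # map() preserves input order in its results, even though the
    # packages are processed concurrently across the pool.
    results = list(pool.map(deploy, packages))

print(len(results), results[0])  # 30 deployed pkg-0
```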

In an AWS context, the instance type of a Deployer Worker can range from t2.medium to m4.xlarge; as with Publishers, several smaller instances can pleasantly outperform fewer larger ones. 2-3 instances are sufficient to yield great publishing throughput.

