A Java Service Oriented Architecture with Content Delivery Web Service

Since the introduction of Tridion's Content Delivery Web Service (aka OData), we see more and more implementations that choose Service Oriented Architecture (SOA) over the classic hosted CD stack approach.

So what does this imply? Well, rather than implementing the Content Delivery stack on each AppServer, we separate the Tridion servers from the other 'middle-tier' servers, such that:
  • the App Servers 1..n contain customer-specific logic and only a very thin client to communicate with the Content Delivery Web Service servers (CDWS Servers 1..n), potentially behind a Load Balancer;
  • the CDWS Servers 1..n run the Tridion Content Delivery Web Service (OData). These are the only servers in the architecture running SDL Tridion code. They connect to either a shared or a dedicated Content Delivery Database (CD DB) -- formerly known as the Broker DB.
[Figure: Service Oriented Architecture using lightweight clients connecting to the Tridion OData service]
This article presents the above-mentioned approach in a sample Java implementation. The entire code is available in my Google Code project under SOA with OData in Java.

The general idea is:
  • Request comes in and goes through the Load Balancer into one of the App Servers;
  • Servlet on App Server intercepts request;
  • Servlet asks the OData service to look up the Tridion Page by URL;
  • OData service responds with Page entity expanded with PageContent;
  • Servlet writes PageContent into the response;

App Servers 1..n

These servers are basically the Presentation servers, but they only run a very thin client layer that connects to the OData endpoint (potentially through a Load Balancer). The client application communicates with the OData service and queries for content. In my implementation, these servers are Java Application Servers (e.g. Apache Tomcat).

Content Assembly Servlet

My thin client consists of a Java Servlet that intercepts all URL patterns and requests that resource from the OData service. I chose to communicate with the OData service using the OData4j v0.7 open source library. Out of the entire OData4j stack, I only made use of the consumer JAR odata4j-0.7.0-clientbundle.jar, a self-contained archive that includes all the necessary third-party dependencies, such as Jersey, Joda, Core4j, Stax, JAX-RS, etc.
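
Before wiring up the servlet, a quick way to verify connectivity is to create a consumer and list the entity sets the service exposes. A minimal sketch, assuming the endpoint configured later in web.xml (the class name is mine, and the exact set names depend on the Tridion version):

import org.odata4j.consumer.ODataConsumer;
import org.odata4j.consumer.ODataConsumers;
import org.odata4j.core.EntitySetInfo;

public class ODataSmokeTest {

    public static void main(String[] args) {
        // Endpoint assumed; matches the ODataEndpoint init-param below
        ODataConsumer consumer = ODataConsumers.create("http://localhost:9091/odata/odata.svc");

        // Print the entity sets exposed by the CD Web Service (Pages, PageContents, etc.)
        for (EntitySetInfo entitySet : consumer.getEntitySets()) {
            System.out.println(entitySet.getTitle());
        }
    }
}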

The declaration of the Servlet is standard, simply extending the HttpServlet class. In the init() method, however, I instantiate the ODataConsumer client (this class is part of the OData4j project). I use this client for the entire lifecycle of the servlet; as it does not maintain an open connection to the OData endpoint, it makes sense to create it once and reuse it for every request.

The OData endpoint is configurable in the web.xml (see below) as the servlet init parameter ODataEndpoint, which I read in the init() method.

public class ContentAssemblerServlet extends HttpServlet {

    private ODataConsumer client;

    @Override
    public void init(ServletConfig servletConfig) throws ServletException {
        super.init(servletConfig);
        String oDataEndpoint = servletConfig.getInitParameter("ODataEndpoint");
        client = ODataConsumers.create(oDataEndpoint);
    }
...
}

The methods doGet and doPost are handled the same way, so both delegate to the handleRequest method.

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    handleRequest(request, response);
}

@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    handleRequest(request, response);
}

Next, handleRequest -- the main logic of the servlet. The method starts by resolving the URL to use for looking up the Tridion Page. Since this is a Java web application, we need to strip the context path from the beginning of the request URI, if there is one. For example, a request URI of /myapp/news/index.html with context path /myapp resolves to the page URL /news/index.html.

We create the filter we will use when making a request to OData: $filter=Url eq 'theUrl', where Url is the property of the Tridion Page entity exposed by OData.

The actual call to the OData service is made using OData4j's execute() method. I pass a $top=2 limit in order to check whether more than one Page exists with the same URL. I also expand the result with the linked PageContent entity, as that is the entity containing the actual page content.

private void handleRequest(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    String servletUrl = request.getRequestURI().substring(request.getContextPath().length());
    String filter = String.format("Url eq '%s'", servletUrl);
    Enumerable<OEntity> pageEntities = client.getEntities("Pages").filter(filter)
        .expand("PageContent").top(2).execute();

It is now time to check how many Page entities we received from the web service. There are three cases: a) no entities, meaning there is no Page with the requested URL; b) two Pages, meaning there are multiple Pages with the same URL -- we cannot identify the Page solely by its URL, so in this simple implementation we simply return an error; and c) exactly one Page -- the Page whose content we want.

    int count = pageEntities.count();
    if (count == 0) {
        response.sendError(HttpServletResponse.SC_NOT_FOUND, "Not found");
        return;
    } else if (count == 2) {
        // top(2) caps the result set, so 2 here means "more than one"
        response.sendError(HttpServletResponse.SC_PRECONDITION_FAILED,
                "Multiple resources available for the same URL");
        return;
    }

If we reached this point, we have one and only one Page for the given URL. We can proceed with retrieving its linked PageContent entity. Since we expanded the original request with it, we don't have to make another call to the service. We do, however, check whether the linked entity exists; it could be that the page content was not published to the CD DB.

    OEntity pageContentEntity = pageEntities.first().getLink("PageContent", OLink.class)
        .getRelatedEntity();
    if (pageContentEntity == null) {
        response.sendError(HttpServletResponse.SC_NOT_FOUND, "Not found");
        return;
    }

Finally, we retrieve the actual page content from the Content property and write it to the response writer. Note that any dynamic parts (e.g. REL tags) the Page content might have contained have already been resolved by this point.

    String pageContent = pageContentEntity.getProperty("Content", String.class).getValue();
    // Assuming HTML pages; adjust the content type if other formats are published
    response.setContentType("text/html; charset=UTF-8");
    PrintWriter writer = response.getWriter();
    writer.print(pageContent);
}
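
Once deployed, requesting any page URL through the App Server should return the published page markup, for example (the soa-client context path is hypothetical):

curl http://localhost:8080/soa-client/news/index.html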

web.xml

The deployment descriptor declares the servlet and maps it to all requests (/*). This approach is fine if all requests are expected to go to the OData service to retrieve content. In real implementations, the mapping rules would be more complex. Things to consider are binaries -- are they stored on the file system? Would they be served by the application server or by the web server? Would requests for binaries actually reach the application server?

What about other URL patterns, for custom logic that does not come from SDL Tridion? These URLs should not reach the ContentAssemblerServlet.
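
One way to keep at least static resources out of the servlet's path is to map them to the container's default servlet, since a more specific url-pattern takes precedence over the /* catch-all. A sketch, assuming binaries live under /static and the App Server is Tomcat (whose default servlet is named "default"):

<!-- Hypothetical mapping: let the container serve /static/* directly -->
<servlet-mapping>
      <servlet-name>default</servlet-name>
      <url-pattern>/static/*</url-pattern>
</servlet-mapping>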

The web.xml also specifies the OData endpoint (or Load Balancer address) to use; change the init-param ODataEndpoint accordingly.

<servlet>
      <servlet-name>Content Assembler Servlet</servlet-name>
      <servlet-class>mitza.net.ContentAssemblerServlet</servlet-class>
      <init-param>
            <param-name>ODataEndpoint</param-name>
            <param-value>http://localhost:9091/odata/odata.svc</param-value>
      </init-param>
</servlet>

<servlet-mapping>
      <servlet-name>Content Assembler Servlet</servlet-name>
      <url-pattern>/*</url-pattern>
</servlet-mapping>

Trouble with Edm.DateTime

There seems to be a bug in Tridion 2011 SP1 CD WS -- the Edm.DateTime values (the Tridion dates) are in an invalid format according to the OData specification. The values coming from OData contain timezone information, while they should not: the Edm.DateTime format does not allow timezone information. There is, however, an OData type Edm.DateTimeOffset, which does allow dates with timezone information.
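
For illustration, with hypothetical values -- the service returns dates in the first form, whereas Edm.DateTime only allows the second:

2012-07-10T14:30:00+02:00   (timezone offset present -- valid only as Edm.DateTimeOffset)
2012-07-10T14:30:00         (valid Edm.DateTime)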

When using the out-of-the-box OData4j libraries, you get an IllegalArgumentException with the message "Illegal datetime format". I fixed this issue by patching the OData4j library (thank you for being open source! ;-) ).

In class org.odata4j.internal.InternalUtil, change the method parseDateTimeFromXml: comment out the throw statement at the end and instead delegate the parsing to parseDateTimeOffsetFromXml.

...
    // Patch: parse the value as Edm.DateTimeOffset, then discard the offset
    return parseDateTimeOffsetFromXml(value).toLocalDateTime();
    // throw new IllegalArgumentException("Illegal datetime format " + value);
}

The new patched odata4j-0.7.0-clientbundle-patched.jar is available in my Google Code project.

CDWS Servers 1..n

These are standard Content Delivery Web Service installations, with or without Ambient Data Framework (depending on the actual need), hosted either in an App Server (Java implementation) or IIS (.NET implementation).

The CDWS servers connect to a Content Delivery Database -- either a common DB (shared by all CDWS servers) or a dedicated DB for each CDWS server.

Make sure, however, that all content is published to the CD DB (the only exception might be binaries, which are not served by the OData service anyway).

Also, if there are any dynamic parts in the Page content (e.g. dynamic Component Presentations, dynamic links, custom REL tags), make sure the Publication Target's Target Language is set to REL.

