
Retrieve Classified Items for Multiple Keywords

Using the standard Content Delivery API, you can retrieve related content (i.e. items that are classified against Keywords) for only one Keyword at a time. If you need to read the classified items for all Keywords in a Taxonomy, calling the API once per Keyword quickly becomes a performance killer.

The following example illustrates what I'm looking for:

Keyword A -- classified with Component 1, Component 2
    Keyword B -- classified with Component 3, Page 1
    Keyword C -- classified with Component 1, Page 2

I want to read the classified items for all Keywords in a Taxonomy and still maintain the 'classified' relationship between each Keyword and its corresponding list of Components & Pages.

The way I did it was to write my own Spring/Hibernate query and piggy-back on the existing DAO objects and methods available in the CD Storage API.

The key is the bean com.tridion.storage.RelatedKeyword, which basically maps the records in DB table ITEM_CATEGORIES_AND_KEYWORDS. This table contains all information needed in the following columns:
  • publicationId
  • itemId
  • taxonomyId
  • keywordId
This provides the mapping from (TaxonomyId, KeywordId) to (PublicationId, ItemId). The only missing piece is the ItemType, which is available in table ITEMS. So it's enough to join the two tables on PublicationId and ItemId and retrieve the items by type. We are only interested in Components and Pages, so we'll run 2 queries -- one for each type.

The Hibernate query looks like this:

select distinct(rk) from RelatedKeyword rk, ItemMeta im
    where rk.publicationId = :publicationId and rk.taxonomyId = :taxonomyId and
    im.itemType = :itemType and rk.itemId = im.itemId and rk.publicationId = im.publicationId
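
To make the join concrete, here is a rough in-memory equivalent of the selection the query performs. This is only a sketch: Classification and Item are hypothetical stand-ins for rows of ITEM_CATEGORIES_AND_KEYWORDS and ITEMS, not the real Tridion beans, and the sample ids are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class JoinSketch {

    // Hypothetical stand-in for a row of ITEM_CATEGORIES_AND_KEYWORDS.
    static final class Classification {
        final int publicationId, itemId, taxonomyId, keywordId;

        Classification(int publicationId, int itemId, int taxonomyId, int keywordId) {
            this.publicationId = publicationId;
            this.itemId = itemId;
            this.taxonomyId = taxonomyId;
            this.keywordId = keywordId;
        }
    }

    // Hypothetical stand-in for a row of ITEMS (only the columns used by the join).
    static final class Item {
        final int publicationId, itemId, itemType;

        Item(int publicationId, int itemId, int itemType) {
            this.publicationId = publicationId;
            this.itemId = itemId;
            this.itemType = itemType;
        }
    }

    // Keeps the classifications of one taxonomy whose item has the requested type.
    // Assumes the input rows are already unique, which is what distinct(rk) guarantees in the query.
    static List<Classification> select(List<Classification> classifications, List<Item> items,
            int publicationId, int taxonomyId, int itemType) {
        // Index the items of the requested type by their (publicationId, itemId) pair.
        Set<String> wanted = new HashSet<>();
        for (Item item : items) {
            if (item.itemType == itemType) {
                wanted.add(item.publicationId + ":" + item.itemId);
            }
        }
        List<Classification> result = new ArrayList<>();
        for (Classification c : classifications) {
            if (c.publicationId == publicationId && c.taxonomyId == taxonomyId
                    && wanted.contains(c.publicationId + ":" + c.itemId)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Classification> rows = Arrays.asList(
                new Classification(5, 101, 40, 10),
                new Classification(5, 201, 40, 11));
        List<Item> items = Arrays.asList(new Item(5, 101, 16), new Item(5, 201, 64));
        // 16 is the Component item type, so only the first row is selected.
        System.out.println(select(rows, items, 5, 40, 16).size());
    }
}
```

In the real setup the database does this work, which is the point of the approach: one query per item type instead of one API call per Keyword.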

We execute the Hibernate query in the following way:

public List<RelatedKeyword> getRelatedItems(String taxonomyURI, int itemType)
        throws ParseException, StorageException {
    // Parsing the URI can throw java.text.ParseException.
    TCMURI taxonomyTcmUri = new TCMURI(taxonomyURI);
    int publicationId = taxonomyTcmUri.getPublicationId();

    // Bind the named parameters used in the Hibernate query above.
    Map<String, Object> queryParams = new HashMap<>();
    queryParams.put("publicationId", publicationId);
    queryParams.put("taxonomyId", taxonomyTcmUri.getItemId());
    queryParams.put("itemType", itemType);

    // Piggy-back on the existing ItemMeta DAO to run our own query.
    JPABaseDAO itemDAO = (JPABaseDAO) StorageManagerFactory.getDAO(
        publicationId, StorageTypeMapping.ITEM_META);
    // THE_QUERY holds the Hibernate query shown above.
    return itemDAO.executeQueryListResult(THE_QUERY, queryParams);
}

From the calling code, we need to call getRelatedItems twice -- once for Components and once for Pages:

List<RelatedKeyword> components = getRelatedItems(taxonomyURI, ItemTypes.COMPONENT);
List<RelatedKeyword> pages = getRelatedItems(taxonomyURI, ItemTypes.PAGE);

Finally, I want to merge the components and pages lists into one single Map, where the key is the Keyword TCMURI and the value is a Set of TCMURIs of the items directly classified against that Keyword. The following method accomplishes just that:

private void mergeRelatedItems(List<RelatedKeyword> relatedKeywords,
        Map<String, Set<TCMURI>> result, int itemType) {
    for (RelatedKeyword keyword : relatedKeywords) {
        int publicationId = keyword.getPublicationId();
        String key = String.format("tcm:%d-%d-1024", publicationId, keyword.getKeywordId());
        Set<TCMURI> itemList = result.get(key);
        if (itemList == null) {
            itemList = new TreeSet<>();
            result.put(key, itemList);
        }
        TCMURI itemURI = new TCMURI(publicationId, keyword.getItemId(), itemType, 0);
        itemList.add(itemURI);
    }
}

We call the method with the following code:

Map<String, Set<TCMURI>> result = new HashMap<>();
mergeRelatedItems(components, result, ItemTypes.COMPONENT);
mergeRelatedItems(pages, result, ItemTypes.PAGE);
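
For readers who want to try the merge logic without a Tridion installation, here is a minimal self-contained sketch of the same idea. It is not the code above: Row is a hypothetical stand-in for RelatedKeyword, URIs are kept as plain strings instead of TCMURI objects, the ids are invented to mirror the Keyword A/B/C example, and computeIfAbsent replaces the explicit null check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class MergeSketch {

    // Item type ids as they appear in TCM URIs.
    static final int COMPONENT = 16;
    static final int PAGE = 64;

    // Hypothetical stand-in for RelatedKeyword: only the ids the merge reads.
    static final class Row {
        final int publicationId, keywordId, itemId;

        Row(int publicationId, int keywordId, int itemId) {
            this.publicationId = publicationId;
            this.keywordId = keywordId;
            this.itemId = itemId;
        }
    }

    // Adds every row to the result, grouped by the URI of its Keyword.
    static void merge(List<Row> rows, Map<String, Set<String>> result, int itemType) {
        for (Row row : rows) {
            String key = String.format("tcm:%d-%d-1024", row.publicationId, row.keywordId);
            String item = String.format("tcm:%d-%d-%d", row.publicationId, row.itemId, itemType);
            // Create the set on first use of a Keyword, then add the item URI to it.
            result.computeIfAbsent(key, k -> new TreeSet<>()).add(item);
        }
    }

    // Builds the map for the example: Keywords A, B, C get the made-up ids 10, 11, 12.
    static Map<String, Set<String>> example() {
        List<Row> components = Arrays.asList(
                new Row(5, 10, 101), new Row(5, 10, 102), // Keyword A: Component 1, Component 2
                new Row(5, 11, 103),                      // Keyword B: Component 3
                new Row(5, 12, 101));                     // Keyword C: Component 1
        List<Row> pages = Arrays.asList(
                new Row(5, 11, 201),                      // Keyword B: Page 1
                new Row(5, 12, 202));                     // Keyword C: Page 2
        Map<String, Set<String>> result = new TreeMap<>();
        merge(components, result, COMPONENT);
        merge(pages, result, PAGE);
        return result;
    }

    public static void main(String[] args) {
        // Print each Keyword URI with the URIs of the items classified against it.
        for (Map.Entry<String, Set<String>> entry : example().entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Because the values are Sets, an item that shows up twice for the same Keyword is stored only once, while the same Component can still appear under several Keywords.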

