A DD4T.net Implementation - Taxonomy Performance Issues

Retrieving Classified Items for Multiple Keywords

My intention when retrieving the classified items for a taxonomy is to resolve the entire taxonomy (i.e. all its keywords) first, and then cache it. This way, retrieving the related content for a given keyword is very fast. I described this approach for DD4T Java in the post Retrieve Classified Items for Multiple Keywords.

There is one issue with that approach in .NET -- it is not part of the standard API, so one must write custom Java Hibernate code to retrieve the classified items. In DD4T Java that is not an issue, but in DD4T .NET, exposing the custom Java logic to the .NET CLR is not easy. It would involve writing JNI proxy classes to bridge the two virtual machines. I just don't feel like writing that code.

Enter the second-best approach -- resolving classified items on the fly, one keyword at a time, on demand, as described in the post Taxonomy Factory.

Retrieving Large Taxonomies

Another performance issue is retrieving large taxonomies. The API call to read an entire taxonomy is TaxonomyFactory.GetTaxonomyKeywords(taxonomyUri). This is the only method on the Tridion.ContentDelivery.Taxonomies.TaxonomyFactory class that retrieves a root Keyword with all its Parent/Child keyword properties resolved, so the taxonomy is fully navigable up and down.

However, the method above is a bottleneck for large (really large) taxonomies. Internally, it reads all keywords in the taxonomy and all their custom meta objects, which can significantly hurt performance.

My solution for this problem is a discovery algorithm that reads one keyword in the taxonomy and resolves its parent keywords up to the root. Resolving means reading the keyword's custom meta and the items directly classified against it. The root keyword is cached, together with all its discovered child keywords.

When a new keyword is requested, the algorithm first tries to find it under the cached root keyword (as one of its possible descendants). If that search does not find the keyword, we assume it has not been resolved yet; the discovery process starts again and attaches the new keyword at its appropriate level in the taxonomy.

Gradually and on demand, the taxonomy structure is built, and it consists only of the keywords that were requested. The following code shows this algorithm. Note the use of the TaxonomyFactory.GetTaxonomyKeyword() method, which returns a partially resolved keyword.

using dd4t = DD4T.ContentModel;
using tridion = Tridion.ContentDelivery.Taxonomies;

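// Entry point: returns the resolved keyword, discovering it (and its parents)
// on demand and caching the taxonomy root.
public IMyKeyword ResolveKeywordLazy(dd4t.IKeyword keyword)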
{
    IMyKeyword result;

    if (keyword == null)
    {
        return null;
    }

    if (keyword is IMyKeyword)
    {
        result = (IMyKeyword)keyword;
    }
    else
    {
        IMyKeyword root;
        string key = GetKey(keyword.TaxonomyId);
        CacheWrapper.TryGet(key, out root);
        if (root == null)
        {
            result = ResolveKeywordLazyRecursive(keyword, out root);
            CacheWrapper.Insert(key, root, cacheMinutes);
        }
        else
        {
            result = ResolveKeywordLazyRecursive(root, keyword);
        }
    }
    return result;
}

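// Looks the keyword up under the cached root; if it is missing, reads it and
// recursively attaches it (and any missing parents) to the cached tree.
private IMyKeyword ResolveKeywordLazyRecursive(IMyKeyword root, dd4t.IKeyword keyword)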
{
    if (keyword == null)
    {
        return null;
    }

    IMyKeyword result = GetKeywordByUri(root, keyword.Id);
    if (result == null)
    {
        tridion.Keyword tridionKeyword = taxonomyFactory.GetTaxonomyKeyword(keyword.Id);
        result = new TaxonomyConverter().ConvertToDD4T(tridionKeyword);
        IMyKeyword parent = ResolveKeywordLazyRecursive(root, result.ParentKeyword);

        if (parent != null)
        {
            result.ParentKeywords.Clear();
            result.ParentKeywords.Add(parent);
            parent.ChildKeywords.Add(result);
        }
    }

    return result;
}

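// No cached root yet: reads the keyword and walks up its parents until the
// root is reached, returning the root through the out parameter.
private IMyKeyword ResolveKeywordLazyRecursive(dd4t.IKeyword keyword, out IMyKeyword root)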
{
    if (keyword == null)
    {
        root = null;
        return null;
    }

    tridion.Keyword tridionKeyword = taxonomyFactory.GetTaxonomyKeyword(keyword.Id);
    IMyKeyword result = new TaxonomyConverter().ConvertToDD4T(tridionKeyword);
    IMyKeyword parent = ResolveKeywordLazyRecursive(result.ParentKeyword, out root);

    if (parent == null)
    {
        root = result;
    }
    else
    {
        result.ParentKeywords.Clear();
        result.ParentKeywords.Add(parent);
        parent.ChildKeywords.Add(result);
    }

    return result;
}
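
The code above calls GetKeywordByUri(root, keyword.Id), which is not shown in this post. Below is a minimal sketch of how such a lookup might work, assuming the keyword type exposes an Id and a ChildKeywords collection. SketchKeyword and KeywordSearch are hypothetical stand-ins made up for this illustration; they are not part of DD4T or the Tridion API, and the real helper may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IMyKeyword; only the two members the search
// needs are modelled here (assumption, not the real interface).
public class SketchKeyword
{
    public string Id { get; set; }
    public List<SketchKeyword> ChildKeywords { get; } = new List<SketchKeyword>();
}

public static class KeywordSearch
{
    // Depth-first search for the keyword with the given URI, starting at root.
    // Returns null when the keyword has not been discovered (attached) yet.
    public static SketchKeyword GetKeywordByUri(SketchKeyword root, string uri)
    {
        if (root == null || uri == null)
        {
            return null;
        }

        // An explicit stack avoids deep recursion on very deep taxonomies.
        var pending = new Stack<SketchKeyword>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            SketchKeyword current = pending.Pop();
            if (string.Equals(current.Id, uri, StringComparison.OrdinalIgnoreCase))
            {
                return current;
            }

            foreach (SketchKeyword child in current.ChildKeywords)
            {
                pending.Push(child);
            }
        }

        return null;
    }
}
```

Because only requested keywords are ever attached to the cached root, this search visits just the discovered part of the taxonomy, so it should stay cheap even when the full taxonomy is very large.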

