Sunday, August 22, 2010

Using Coveo with Sitecore. Part 1: Indexing Sitecore items

Coveo Search Platform provides a Sitecore package with a module called "Sitecore Connector", which allows you to crawl not just pages, but also content items without presentation, the media library, etc. While Connector configuration is described rather well in this document, there are a lot of important points that are not clear enough to a programmer who is just starting to use it.
There is a lot to cover with Coveo, so I'll split the whole story into several blog posts:
  • Indexing Sitecore items - this post
  • Performing a query (Coveo controls and custom Query Wrappers)
  • Tips and Tricks (snippets, hidden features and real-world examples)

Let's begin by creating a Sitecore template that contains different field types:

And a very simple content tree:

Then add sample content to all items.

It's time to set up Coveo fields for our template.

I will add the same fields that are defined in the Sitecore template

and configure additional settings:
  1. "Fields Queries" - allowed for all fields, as we may want to search against a specific field
  2. "Free Text Queries" - it makes sense to check this option for text fields only, as we don't want IDs, etc. to be searchable in our front-end search interface
  3. "Group By" - it makes sense to group our items by the "Tags" and "Related Items" fields
  4. "Sort By" - enabled for the "Published" field only

You can actually check all of these options for every field, but that may affect performance, so I'd recommend enabling "Sort By" and "Group By" only for the fields you will really group or sort results by.
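
To keep track of which option is enabled for which field, the choices above can be written down as a small table in code. This is only an illustrative sketch in Python, not anything Coveo reads: the `FieldOptions` class and the `names_with` helper are made up for this post, and the "Title" field is a hypothetical stand-in for a text field of the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldOptions:
    """One Coveo field and the options checked for it (illustrative only)."""
    name: str
    field_queries: bool = True   # "Fields Queries": allowed for all fields
    free_text: bool = False      # "Free Text Queries": text fields only
    group_by: bool = False       # "Group By": only where we really group
    sort_by: bool = False        # "Sort By": only where we really sort


FIELDS = [
    FieldOptions("Title", free_text=True),       # hypothetical text field
    FieldOptions("Tags", group_by=True),
    FieldOptions("Related Items", group_by=True),
    FieldOptions("Published", sort_by=True),
]


def names_with(option):
    """Return the names of all fields that have the given option enabled."""
    return [f.name for f in FIELDS if getattr(f, option)]
```

Calling `names_with("group_by")` lists only the fields that pay the extra "Group By" cost, which makes it easy to see that the expensive options are enabled sparingly.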

We have two ways to map Sitecore fields to the Coveo index:

  1. Output Sitecore field values as meta tags. This approach does not require any customization on the Coveo side, but it only works for items that have presentation
  2. Set up an index mapping file that tells Coveo how to find the appropriate fields in Sitecore

Also, you can map some predefined Sitecore fields, such as the item or template name, out of the box. The second approach is generally more flexible, but you can also mix the two.
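
To give an idea of what the first approach boils down to, here is a minimal sketch of rendering field values as meta tags. It is written in Python purely for illustration (in a real Sitecore solution this would live in a layout or sublayout), the `render_meta_tags` function is hypothetical, and I'm assuming the meta tag name simply matches the field name.

```python
from html import escape


def render_meta_tags(fields):
    """Render a dict of field name -> value as HTML meta tags, one per line."""
    lines = []
    for name, value in fields.items():
        # Escape both parts so quotes or ampersands in content cannot break the tag.
        lines.append('<meta name="{}" content="{}" />'.format(
            escape(name, quote=True), escape(value, quote=True)))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical field values for a sample item.
    print(render_meta_tags({"Tags": "News & Events", "Published": "2010-08-22"}))
```

Since the tags are part of the rendered page, this only helps for items the crawler can actually request as pages, which is exactly the limitation mentioned above.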

Here's the Index Mapping file for our sample template:

When Coveo encounters an item based on a template defined in the Index Mapping file, it simply copies the mapped Sitecore fields to the index.
You may have noticed something strange in the "Tags" field: <tags>%[Tags._CESSCName]</tags>. Since Sitecore stores IDs in fields of the TreeList type, Coveo can find each related item by its ID and index any of its fields as well.
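
Conceptually, this lookup works roughly as sketched below. This is not Coveo's actual code, just an illustration in Python: the `resolve_names` function, the item names and the GUIDs are all made up. The only thing taken from Sitecore itself is that a TreeList field stores its raw value as a pipe-separated list of item IDs.

```python
def resolve_names(raw_value, names_by_id):
    """Turn a pipe-separated list of item IDs into the referenced item names.

    IDs that cannot be found (e.g. deleted items) are skipped.
    """
    ids = [part for part in raw_value.split("|") if part]
    return [names_by_id[item_id] for item_id in ids if item_id in names_by_id]


if __name__ == "__main__":
    # Hypothetical tag items keyed by their (made-up) Sitecore IDs.
    tags = {
        "{11111111-1111-1111-1111-111111111111}": "Sitecore",
        "{22222222-2222-2222-2222-222222222222}": "Coveo",
    }
    raw = "{11111111-1111-1111-1111-111111111111}|{22222222-2222-2222-2222-222222222222}"
    print(resolve_names(raw, tags))
```

So instead of the raw list of IDs, the index receives a field of the referenced items, in this case their names.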

Now click "Rebuild" in the index definition and you'll get the following results in the index viewer:

I've highlighted the custom fields. Coveo also adds a few system fields to the index, and some of them can be very useful, for example the index name ("@syssource") and the file type. More about that later.

Note that the "Tags" field contains item names instead of IDs. It's a very useful feature. In the next blog post of this series I'll show how it can be used, along with some advanced use cases. Stay tuned!