Monday, August 19, 2013

Making Lucene index smaller in Sitecore 7

With Sitecore 7, indexing is now extremely fast, but if the database is large enough, you can easily end up with 500+ megabytes of disk space occupied by indexes. Index rebuild time might still be an issue, and so might query performance. This might not be a big deal for a global website search, but if you only use the Lucene index to find a few of the most recent articles and the like, it can be a problem.


Also, the "multiple vs. single index" question is a rather hot topic; see, for example, the following discussion: http://stackoverflow.com/questions/2746568/multiple-or-single-index-in-lucene

When you don't want to index everything, it would be great to be able to choose which templates to index and leave the rest of the data out, instead of manually excluding hundreds of templates. Unfortunately, this is not supported out of the box, even though there are some unused methods / properties for the "list:IncludeTemplates" functionality. It can be achieved, however, by customizing the LuceneIndexConfiguration class and using the following trick: put all templates into "Exclude Templates", except the ones that were added to "Include Templates".
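The trick boils down to a set difference: the exclude list is every known template minus the ones you explicitly include. Here is a minimal sketch of that logic only, written in Python for brevity. It is not the Sitecore API; the function name and the template IDs are hypothetical placeholders, and the case-insensitive comparison is an assumption about how IDs should be matched.

```python
def templates_to_exclude(all_templates, include_templates):
    """Return every template ID that is NOT explicitly included.

    Hypothetical helper: IDs are compared case-insensitively and with
    surrounding whitespace ignored, which is an assumption.
    """
    included = {t.strip().lower() for t in include_templates}
    return [t for t in all_templates if t.strip().lower() not in included]


# Placeholder IDs, not real Sitecore template IDs.
all_templates = ["{A}", "{B}", "{C}", "{D}"]
include = ["{b}", "{D}"]

print(templates_to_exclude(all_templates, include))
```

In a real implementation, the resulting list would be fed into the index configuration's exclude list, so the crawler skips everything except the included templates.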

Step 1. Compile the following code:


Step 2. Replace Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration with %your namespace%.ExtendedIndexConfiguration in your index config.


Step 3. Configure the templates you want to index in the following way:

I've managed to reduce the index size from 500 MB to ~12 MB and speed up index rebuilds by up to 15 times by using this trick. This functionality will probably be implemented in future versions of the CMS, so check whether it is already supported if you're using any version newer than 7.0 RTM.