Michael Schau

My SharePoint Blog

Wrong wordbreaker on Index Server

Recently I noticed that I didn't get the expected search results when querying for data indexed through BDC.

The search term was System.ArithmeticException and I knew for sure that this term existed in the BDC datasource, but I got no results. I started investigating, but I'll spare you that story and instead tell you why this happened.

To get correct search behaviour you need to make sure that the same language-specific wordbreaker is used both at query time and at indexing time. So if you're indexing English content, your query wordbreaker should also be English. At query time, MOSS uses the browser's language settings to determine which wordbreaker to use.
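To see why a mismatch produces zero hits, here is a minimal sketch in Python. The two breakers are hypothetical stand-ins that I made up for illustration; they are not the real English or Danish wordbreakers, and I'm not claiming this is how either of them treats dots. The only point is that if the indexer and the query side cut the same text into different tokens, the lookup finds nothing.

```python
import re

def break_on_punctuation(text):
    """Toy breaker A (hypothetical): splits on any non-alphanumeric character."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def break_on_whitespace(text):
    """Toy breaker B (hypothetical): splits on whitespace only, dots stay in the token."""
    return [t.lower() for t in text.split()]

def build_index(docs, breaker):
    """Build a tiny inverted index: token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in breaker(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def search(index, query, breaker):
    """Return ids of documents that contain every token of the query."""
    tokens = breaker(query)
    if not tokens:
        return set()
    return set.intersection(*[index.get(t, set()) for t in tokens])

docs = {1: "Unhandled System.ArithmeticException in job"}
index = build_index(docs, break_on_whitespace)  # indexed with breaker B

# Same breaker on both sides: the document is found.
print(search(index, "System.ArithmeticException", break_on_whitespace))   # {1}
# Different breaker at query time: the tokens don't line up, no hits.
print(search(index, "System.ArithmeticException", break_on_punctuation))  # set()
```

The index only ever contains what the indexing breaker produced, so the query breaker has to produce exactly the same tokens to find anything.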

At indexing time it's a different story. The iFilter checks whether the text chunk from a document (or BDC source) has Locale info. If it does, the wordbreaker associated with that Locale is used.

If it doesn't contain Locale info (the BDC datasource didn't), the System Locale of the Index Server is used. In this case the Regional Settings on the Index Server were set to Danish, meaning that the BDC source was indexed using the Danish wordbreaker. Ouch!!
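The fallback rule can be written down in a few lines of Python. This is only a sketch of the decision as I described it, not actual indexer code: the function name, the locale strings and the sample chunks are all made up for illustration, and I'm assuming the Word document carried an English Locale.

```python
def pick_wordbreaker_locale(chunk_locale, system_locale):
    """Hypothetical helper: use the chunk's own Locale if present,
    otherwise fall back to the Index Server's System Locale."""
    return chunk_locale if chunk_locale else system_locale

# Assumed sample data: (source, Locale info carried by the chunk, or None).
chunks = [
    ("Word document", "en-US"),  # assumption: the document carries Locale info
    ("txt file", None),          # plain text has no Locale info
    ("BDC row", None),           # the BDC datasource had none either
]

system_locale = "da-DK"  # Regional Settings of the Index Server in this story

for source, chunk_locale in chunks:
    print(source, "->", pick_wordbreaker_locale(chunk_locale, system_locale))
```

With the server set to Danish, everything without its own Locale info ends up going through the Danish wordbreaker.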

I did a quick test: I put the text System.ArithmeticException into a Word document and indexed it. No problem, I got a hit on it when searching. I tried the same with a txt document. This time I didn't get a hit, because a txt file doesn't contain Locale info.

Changing the Regional Settings to English and reindexing all content solved the problem...

Now you know what it means if the Regional Settings of your Index Server are NOT set to English. Of course, in some cases it might make sense to set them to something other than English, but not in this case.