Enrique Blanco

December 04
MOSS Crawler: Avoid indexing certain parts of the page

Everybody knows that MOSS search is security trimmed, but there are certain circumstances where that is not enough. For example:

Imagine a site that uses a custom master page with some links or text in the header, the footer and maybe the side menu. The problem is that when you search for a word located in the master page, every page in the site (at least every page that uses that master page) is returned, and that can sometimes be a problem. For example, search for “reserved” on www.ferrari.com, or try some of the sites listed at http://www.wssdemo.com/Pages/topwebsites.aspx, and you will see what I mean ;)

The solution we found was to hide those parts of the page from the crawler.

How do you do that? I suggest two options:

  1. If the content is rendered inside a control, you can decide whether to display it in the control's own logic.
  2. If the content is static, you can place it inside a custom control that shows or hides the content depending on whether the visit comes from the search service. The control must inherit from the System.Web.UI.WebControls.Panel class and override its Render (or RenderChildren) method.
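The second option boils down to a wrapper that renders its children for normal visitors and nothing for the crawler. The sketch below illustrates that idea in Python rather than as the actual ASP.NET control; the class `CrawlerHiddenPanel`, its `render` method and the `is_crawler` flag are hypothetical names. In the real Panel subclass, this decision would sit in the overridden render method, and the flag would come from the user-agent check described in the next section.

```python
from typing import Callable, List


class CrawlerHiddenPanel:
    """Hypothetical stand-in for a Panel subclass that hides its
    contents from the search crawler."""

    def __init__(self, children: List[Callable[[], str]]):
        # Each child is a callable that returns its own HTML fragment,
        # similar to child controls rendering themselves.
        self.children = children

    def render(self, is_crawler: bool) -> str:
        # For the crawler, emit nothing, so the wrapped header/footer
        # text never reaches the index.
        if is_crawler:
            return ""
        # For everyone else, render the children in order.
        return "".join(child() for child in self.children)


footer = CrawlerHiddenPanel([
    lambda: "<div>Reserved area</div>",
    lambda: "<a href='/contact'>Contact</a>",
])
print(footer.render(is_crawler=False))
print(footer.render(is_crawler=True))
```

Normal visitors get both fragments, while the crawler gets an empty string, so words like "reserved" that live only in the master page no longer match every page.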

How can you know if the visitor is the search service?

Some people might think that identifying the user is the best way, but I don’t like that method: on production servers the search service might run under its own account, but on most developer machines all services run under the same account, so we need to find another way.

We checked the User Agent server variable instead. The MOSS search service sends a user agent similar to this one:

“Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 5.0 Robot Crawler)”

So, by checking the user agent, you can tell whether the visitor is the search service and then trim the content shown. By the way, this usually does not cause problems with the cache, since the user agent is taken into account by default and the crawler is not an anonymous user :)
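A minimal sketch of the check, written in Python for illustration (in ASP.NET the string would come from the request, for example `Request.UserAgent`). The function names and the `"ms search"` marker are assumptions based on the user agent quoted above, not part of any SharePoint API; other crawler versions may send a different string, so confirm it against your own crawls.

```python
from typing import Optional

# Assumption: the crawler identifies itself with "MS Search" in its
# User-Agent, as in the string quoted above. Adjust if yours differs.
CRAWLER_MARKER = "ms search"


def is_search_crawler(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent looks like the MOSS crawler."""
    if not user_agent:
        # A missing header is treated as an ordinary visitor.
        return False
    # Case-insensitive substring match on the marker.
    return CRAWLER_MARKER in user_agent.lower()


def trim_for_crawler(user_agent: Optional[str], full_html: str) -> str:
    """Hypothetical helper: drop a master-page fragment for the crawler."""
    return "" if is_search_crawler(user_agent) else full_html


crawler_ua = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 5.0 Robot Crawler)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
print(is_search_crawler(crawler_ua))  # True
print(is_search_crawler(browser_ua))  # False
```

Matching a substring rather than the whole string keeps the check working if the version number in the user agent changes, at the cost of trusting a header that any client can set; that is acceptable here because the goal is cleaner search results, not security.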

After updating your master page and controls, you must run a full crawl; after that, only the proper portions of the pages will be indexed.

Here is more info on MOSS search user agents.

Comments

Furqan

Could I get any sample code to hide navigation from search???
System Account on 12/9/2009 10:18 AM

AJ

Rock solid method...!!!
I loved this, compared to suggestions like "check users...blah blah". Checking users is not just risky but also extremely slow; checking whether the current user exists in the Advanced Permissions is kind of a useless technique IMHO, since it does a lot of API processing.
This technique, on the other hand, seems straightforward (no SPWeb objects needed).
Out of curiosity, how did you find out which user agent the crawler uses? Is there any way (any logs) to confirm this?
And as someone said, a code snippet would be good. Not that it's going to be complex code or anything, but it's better to see what others are doing...lol
System Account on 12/9/2009 5:17 PM

Enrique

When I was writing the article, I took the user agent from here: http://sharepoint.microsoft.com/blogs/LKuhn/Lists/Posts/Post.aspx?List=29310d0a-1eda-4834-bb4c-06ee575a40c3&ID=49
To find it out, I launched a crawl and debugged :)
System Account on 12/9/2009 5:24 PM

Markuz

When you're blocking the navigation items (in our case) for the search crawler, it isn't able to see or index them. How do you ensure that the crawler reaches all your pages within the site collection?
System Account on 1/28/2010 4:00 PM

Enrique

In our case the crawler gets all the pages in the sites (I assume it does so by accessing allitems.aspx with the crawler account). We use pages as data repositories and those pages are never linked in the site, but the crawler still indexes them. A good test is to hide the links, reindex the content and then check the crawl log. Let me know if that works for you.
System Account on 1/28/2010 8:00 PM
