SharePoint crawling not working with a non-default zone (public-facing site)

It's been a very long time since I was last here sharing workaround fixes. I recently configured a public-facing branded site in a SharePoint 2013 multi-tier environment.

Servers in the farm

SPAPP-V01
SPAPP-V02
SPWFE-V01
SPWFE-V02
SPSQL-AG (Availability group listener)
SPSQL-V01
SPSQL-V02
SPSQL-V03

The web application has been extended to the Internet zone, which is exposed publicly through an F5 load balancer, with SSL configured on IIS.

The alternate access mappings (AAM) look like this:

Internal URL               Zone        Public URL for zone
http://SPAPP-V01:8080      Default     http://SPAPP-V01:8080
https://www.sathiya.io     Internet    https://www.sathiya.io
http://www.sathiya.io      Internet    https://www.sathiya.io
http://sathiya.io          Internet    https://www.sathiya.io
https://sathiya.io         Internet    https://www.sathiya.io
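
SharePoint matches an incoming request URL against the internal URLs in this table to find its zone. It then uses that zone's public URL when building links. The Python sketch below is a simplified model of that lookup over the table above. The `AAM` dictionary and the `resolve_zone` function are hypothetical illustrations, not SharePoint APIs. The real matching also handles cases this sketch ignores, such as explicit default ports.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory copy of the AAM table above:
# internal URL (lower-cased) -> (zone, public URL for that zone)
AAM = {
    "http://spapp-v01:8080": ("Default", "http://SPAPP-V01:8080"),
    "https://www.sathiya.io": ("Internet", "https://www.sathiya.io"),
    "http://www.sathiya.io": ("Internet", "https://www.sathiya.io"),
    "http://sathiya.io": ("Internet", "https://www.sathiya.io"),
    "https://sathiya.io": ("Internet", "https://www.sathiya.io"),
}


def resolve_zone(request_url: str) -> tuple[str, str]:
    """Return (zone, URL rewritten onto the zone's public URL)."""
    parts = urlsplit(request_url)
    # Scheme and host are case-insensitive, so match on lower-cased values.
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}"
    try:
        zone, public_url = AAM[key]
    except KeyError:
        raise LookupError(f"no alternate access mapping for {key}") from None
    # Keep the path and query, but swap in the zone's public URL.
    rewritten = public_url + parts.path
    if parts.query:
        rewritten += "?" + parts.query
    return zone, rewritten


print(resolve_zone("http://sathiya.io/about?x=1"))
```

For example, a request to http://sathiya.io/about lands in the Internet zone and is rewritten onto https://www.sathiya.io.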

SharePoint search was configured with the following topology:
SPAPP-V01 : Admin, Crawler, Content Processing, Analytics Processing & Index Partition
SPAPP-V02 : Admin, Crawler, Content Processing, Analytics Processing & Index Partition
SPWFE-V01 : Query Processing
SPWFE-V02 : Query Processing
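
In this layout every search component runs on two servers, so losing any single server still leaves each role hosted somewhere. That property can be checked mechanically. The Python sketch below does so over a hypothetical in-memory copy of the list above. `TOPOLOGY`, `hosts_per_component` and `single_points_of_failure` are illustrative names, and nothing here reads the topology from SharePoint itself.

```python
# Hypothetical model of the search topology listed above.
TOPOLOGY = {
    "SPAPP-V01": {"Admin", "Crawler", "Content Processing",
                  "Analytics Processing", "Index Partition"},
    "SPAPP-V02": {"Admin", "Crawler", "Content Processing",
                  "Analytics Processing", "Index Partition"},
    "SPWFE-V01": {"Query Processing"},
    "SPWFE-V02": {"Query Processing"},
}


def hosts_per_component(topology: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each search component to the sorted list of servers hosting it."""
    result: dict[str, list[str]] = {}
    for server, components in topology.items():
        for component in components:
            result.setdefault(component, []).append(server)
    return {component: sorted(servers) for component, servers in result.items()}


def single_points_of_failure(topology: dict[str, set[str]]) -> list[str]:
    """Return the components hosted on fewer than two servers, sorted."""
    return sorted(
        component
        for component, servers in hosts_per_component(topology).items()
        if len(servers) < 2
    )


print(single_points_of_failure(TOPOLOGY))  # empty list for this layout
```

Removing SPAPP-V02 from the dictionary would leave every component except Query Processing on a single server.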

Let's dig into the fixes.
Most forum answers say to use the Default zone URL in your search content source, but whether that works depends purely on the application's requirements. Here, the crawl had to go through the Internet zone.

  • Ignore SSL warnings if SSL is off-loaded on the F5. Go to Central Administration -> Manage service applications -> select the search service application -> Farm Search Administration -> Ignore SSL warnings, and make sure the option to ignore SSL certificate name warnings is selected.
  • Make sure the default content access account has permission on the site. Open the public-facing site with that account and confirm access before moving on.
  • Create a new content source with the Internet zone URL.
  • Configure search and offline availability on every site from Site Settings -> Search -> Search and Offline Availability:
    • Allow this site to appear in search results? Yes
    • This site contains fine-grained permissions. Specify the site’s ASPX page indexing behavior: Always index all Web Parts on this site
    • Allow items from this site to be downloaded to offline clients? Yes
  • Make sure all lists and libraries are searchable.
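  • Disable the loopback check and restart the machine. To do this, set the DWORD value DisableLoopbackCheck to 1 under HKLM\SYSTEM\CurrentControlSet\Control\Lsa, for example with PowerShell's New-ItemProperty.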
  • If the crawl log shows access denied errors, configure a crawl rule that uses the default content access account. Navigate to Crawl Rules -> New Crawl Rule -> enter the path -> under Crawl Configuration, select Include all items in this path -> under Specify Authentication, select Use the default content access account.
  • Many public-facing sites remove HTTP response headers as a security hardening step. If you have removed the MicrosoftSharePointTeamServices header, the crawler won't reach child site collections or subsites; only the top level gets crawled. In that case, bring the MicrosoftSharePointTeamServices header back and add a <clear /> tag under <customHeaders>.
  • In some environments the crawl servers cannot resolve the site's URL. You then need either a proxy or a hosts file entry on each crawl server that points the site's host name to one of the WFE servers. Note that a hosts entry takes a host name, not a URL (for example, 192.168.10.153    www.sathiya.io).
  • Reset the index, then run a full crawl.
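
A hosts file only maps names to IP addresses and has no notion of scheme, port or path. The line for this site therefore has to contain www.sathiya.io rather than https://www.sathiya.io. The Python sketch below derives that line from a site URL. `hosts_entry` is a hypothetical helper. On Windows the resulting line would go into C:\Windows\System32\drivers\etc\hosts on each crawl server.

```python
import ipaddress
from urllib.parse import urlsplit


def hosts_entry(ip: str, site_url: str) -> str:
    """Return a hosts-file line pointing site_url's host name at ip."""
    # Raises ValueError if ip is not a valid IPv4 or IPv6 address.
    ipaddress.ip_address(ip)
    # Keep only the host name; scheme, port and path mean nothing to hosts.
    host = urlsplit(site_url).hostname
    if not host:
        raise ValueError(f"cannot extract a host name from {site_url!r}")
    return f"{ip}\t{host}"


print(hosts_entry("192.168.10.153", "https://www.sathiya.io"))
```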
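
In IIS configuration, `<clear />` removes the header entries inherited from parent configuration levels, so it belongs before any `<add>` entries in the collection. The Python sketch below inserts it as the first child of system.webServer/httpProtocol/customHeaders in a web.config string. `add_clear_to_custom_headers` is a hypothetical helper. ElementTree drops XML comments and the XML declaration when parsing, so treat this as an illustration and back up web.config before editing a real one.

```python
import xml.etree.ElementTree as ET


def add_clear_to_custom_headers(config_xml: str) -> str:
    """Return config_xml with <clear /> as the first child of customHeaders."""
    root = ET.fromstring(config_xml)  # root element is <configuration>
    headers = root.find("./system.webServer/httpProtocol/customHeaders")
    if headers is None:
        raise ValueError("system.webServer/httpProtocol/customHeaders not found")
    # Insert only once, and ahead of any <add> entries, so inherited headers
    # are cleared before this file's own entries are applied.
    if headers.find("clear") is None:
        headers.insert(0, ET.Element("clear"))
    return ET.tostring(root, encoding="unicode")


sample = (
    "<configuration><system.webServer><httpProtocol><customHeaders>"
    '<add name="X-Powered-By" value="ASP.NET" />'
    "</customHeaders></httpProtocol></system.webServer></configuration>"
)
print(add_clear_to_custom_headers(sample))
```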

Hope this helps. If it doesn't cover your scenario, please drop a comment below so we can discuss further.

Cheers…!
