
iSeries Webserver Search Engine - Set up for Crawling Sites Requiring Authentication
This page documents the authentication setup for the web crawler. Be sure to read Getting Started - Running the web crawler.
Some sites cannot be entered without some form of authentication, such as a Userid and Password or certificate authentication. The web crawler can handle either case as long as you complete the required setup.
For a site requiring a Userid and Password, you must create a Validation List Object containing the URL, Userid, and Password. Then be sure to specify the Validation List Object when you start crawling. See How to create a validation list object.
For a site that requires Client Authentication, you will need to follow the steps listed below.
- Make sure base option 34 for 5722SS1 is installed in order to use the Digital Certificate Manager.
- Next, get a VeriSign level 1 certificate using Netscape Navigator:
- Go to VeriSign, Inc. at http://www.verisign.com/.
- Under Home & Home Office, click Secure E-Mail.
- In the right-hand column, click Try a Digital ID FREE for 60 Days.
- Under Class 1 Digital ID, click the Enroll Now button.
- Enter a first name such as Web and a last name such as Crawler.
- Enter a valid e-mail address to receive information about your certificate.
- Use test for the Challenge Phrase, without quotes
- Make sure you select I'd like to test drive a 60-day trial Digital ID for free.
- Skip the Billing Information fields.
- Click Accept at the bottom when you are done.
- In the Generate A Private Key window that pops up, click OK.
- Type test in the Password Entry Dialog
- You should get an e-mail within half an hour, or within the time stated in the message.
- When you get the e-mail, follow the instructions there.
- Go to the web site specified in the email, https://digitalid.verisign.com/enrollment/nspickup.htm.
- Paste (or enter) your Digital ID PIN in the appropriate box and select the Submit button to install
- Accept the ID into Netscape
- Then click the Security button on the Netscape toolbar.
- In the left-hand column of the window that pops up, under Certificates, click Yours.
- The certificate you just got from VeriSign should be listed there (for example, "webcrawler's Verisign, Inc. ID ...").
- Select it
- After you select it, click the Export button
- Enter your password, test, twice.
- Save the certificate on your PC (for example, as web_cert.p12), then FTP the certificate to your iSeries system, saving it in a directory. Be sure to transfer the certificate in binary mode.
- Now set up your system for authentication:
- In your browser, go to the ADMIN Server at http://server:2001/.
- You must have already created your Server Certificate.
- Click the Digital Certificate Manager link
- Click the Select a Certificate Store button
- Select *SYSTEM
- Click Continue.
- Enter the password test for your certificate.
- Now click Fast path in the left frame
- Click Work with Server and Client Certificates
- On the Work with Server and Client Certificates form, click Import at the bottom.
- Select the file you just FTPed as your import file
- Then click Continue.
- Scroll to the bottom and click Import.
- In the Import file: field, enter the path and filename of your certificate, for example, /directory/web_cert.p12.
- Click Continue.
- Enter your password, test.
- You should receive a success message: The certificate has been imported.
- Now, in the left-hand column under Fast Path, click Work with server applications.
- Select the Webserver Search Engine radio button.
- Then click the Work with Application button at the bottom.
- Click the Update Certificate Assignment button.
- Select the certificate you just imported, then click the Assign New Certificate button at the bottom.
- You should receive a success message: The certificate was assigned to the application.
- You are now ready to crawl.
- Be sure to enter the following when you start the crawling session or when you create an options object:
- Proxy server for HTTPS
- Proxy port for HTTPS
- Return to Getting Started - Running the web crawler.
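The binary-mode transfer called for in the certificate export step above can also be scripted instead of using an interactive FTP client. The following is only a minimal sketch using Python's standard ftplib module; the function name upload_certificate is hypothetical, and the host, user, and directory names in the comments are placeholders, not values from this page.

```python
from pathlib import Path


def upload_certificate(ftp, local_path, remote_dir):
    """Upload an exported certificate file without any text translation.

    `ftp` is expected to be a logged-in ftplib.FTP connection (or any
    object offering the same cwd/storbinary methods).
    """
    local = Path(local_path)
    ftp.cwd(remote_dir)
    with local.open("rb") as fh:
        # storbinary switches the session to binary (TYPE I) before
        # sending, so the certificate bytes arrive unchanged.
        ftp.storbinary(f"STOR {local.name}", fh)
    # Path to type into the Import file: field later on.
    return f"{remote_dir.rstrip('/')}/{local.name}"


# Example use (placeholder host, user, and directory):
#
#   from ftplib import FTP
#   with FTP("server") as ftp:
#       ftp.login("user", "password")
#       print(upload_certificate(ftp, "web_cert.p12", "/directory"))
```

A text-mode (ASCII) transfer would alter the bytes of the certificate file, which is why the sketch opens the file with "rb" and uses storbinary rather than storlines.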
Note: To see more detail about the crawling activity, start a trace by entering mkdir '/CRLTRC' before you start the crawl. Once crawling has started, a file called TRCHTTPCRL.xxxxx is created in this directory, containing statements about the crawling activity. A new file with a different number is created for each new session.
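Because each session writes its own numbered trace file, it can help to locate the newest one after a crawl. The following is a rough sketch in Python, assuming the /CRLTRC directory is readable from wherever the script runs (for example, through a mapped drive or file share); the function name latest_trace_file is hypothetical.

```python
from pathlib import Path


def latest_trace_file(trace_dir="/CRLTRC"):
    """Return the most recently modified TRCHTTPCRL.* file, or None."""
    directory = Path(trace_dir)
    if not directory.is_dir():
        # The trace directory does not exist, so there is nothing to read.
        return None
    traces = [p for p in directory.glob("TRCHTTPCRL.*") if p.is_file()]
    # One file per session: pick the newest by modification time.
    return max(traces, key=lambda p: p.stat().st_mtime, default=None)
```

The function returns None both when the directory is missing and when no trace file has been written yet, so a caller can treat either case as "no trace available".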