HTTP Server for i


iSeries Webserver Search Engine - Features

This page contains the documentation that is specific to the iSeries Webserver Search Engine features. A tips and techniques section supplies help for various topics.

Table of Contents
  Configure HTTP Search Command
  Field Search
  Sorting Results
  Search in Results
  Thesaurus Support
  Tips and Techniques: Sample HTML, Net.Data macro, and PTFs

Configure HTTP Search Command

The CL command Configure HTTP Search (CFGHTTPSCH) lets you run search administration functions interactively or in a batch job. It is especially useful for updating your indexes regularly as documents change; use job scheduling to do this.

Command options

This command lets you perform many of the same actions that are available from the Configuration and Administration interface. See also Getting Started - Running the Web Crawler for options related to web crawling.

To create an index using the command, you must first create a document list:

CFGHTTPSCH OPTION(*CRTDOCL) DOCLIST('/QIBM/UserData/HTTPSVR/index/myindex.DOCUMENT.LIST') STRDIR('/mydir/mydocs')

This creates the document list /QIBM/UserData/HTTPSVR/index/myindex.DOCUMENT.LIST by traversing /mydir/mydocs and all of its subdirectories and collecting every file with an *.HTM* extension.
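
The file selection described above can be pictured with a short Python sketch. It only illustrates the rule (recursive traversal, files matching *.HTM*); it is not how CFGHTTPSCH is implemented. The function name build_document_list and the case-insensitive match are assumptions.

```python
import fnmatch
import os


def build_document_list(start_dir):
    """Return the sorted paths under start_dir whose names match *.HTM*."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(start_dir):
        for name in filenames:
            # Assumption: the match ignores case, so .htm and .HTML both qualify.
            if fnmatch.fnmatchcase(name.upper(), "*.HTM*"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling build_document_list('/mydir/mydocs') would return the paths that a document list built from that directory would be expected to contain.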

Next create the index: 

CFGHTTPSCH OPTION(*CRTIDX) IDX(myindex) DOCLIST('/QIBM/UserData/HTTPSVR/index/myindex.DOCUMENT.LIST')

This creates an index called myindex in the default directory /QIBM/USERDATA/HTTPSVR/INDEX. The defaults specify that the documents are HTML, that indexing continues when a file cannot be processed (the error is skipped), that case-sensitive search is allowed, that both alphabetic and numeric characters are valid characters, and that no fields are indexed.

Examples for Scheduling a Job Using the ADDJOBSCDE Command

This example shows how to submit a job to update an index on Friday of every week at 11:30 p.m. 

ADDJOBSCDE JOB(UPDATE) CMD(CFGHTTPSCH OPTION(*ADDDOC) IDX('myindex') DOCLIST('/QIBM/UserData/HTTPSVR/index/myindex.DOCUMENT.LIST'))
      FRQ(*WEEKLY)
      SCDDATE(*NONE)
      SCDDAY(*FRI)
      SCDTIME('23:30:00')
 

This example shows how to submit a job to update an index at 11:30 p.m. on the last day of every month.

ADDJOBSCDE JOB(UPDATE) CMD(CFGHTTPSCH OPTION(*ADDDOC) IDX('myindex') DOCLIST('/QIBM/UserData/HTTPSVR/index/myindex.DOCUMENT.LIST'))
      FRQ(*MONTHLY)
      SCDDATE(*MONTHEND)
      SCDTIME('23:30:00')

Field Search

If you want to limit your searches to the titles of documents or to the contents of a META tag, then you need to index your documents by selecting the fields you want to be able to search. This is done when you create your index.

A field is character text between or within selected HTML tags of an HTML document, such as the text between the <TITLE> and </TITLE> tags or a quoted portion of a tagged section. For example, a typical META tag section of an HTML document might contain a NAME = "Keywords" entry whose CONTENT = string lists the keywords for the document.

To allow users to search on one of the keywords listed, you must select the META tag NAME = Keywords field when you create your index. All of the terms in the tag are indexed. Additionally, the terms are included in the indexing of the entire document.

You can search the entire document, which now includes the fields, or just the individual fields. Fields are indexed only when the document content selected is HTML; they are ignored if the document content selected is TEXT. When you update the index with a new document list, these fields are indexed for the new documents as well.

The fields selected for the index are retrieved on the search form and listed in the drop-down box labeled "Select the scope of the search". You can choose to search the entire document or to limit the search to one field, for example "Search on META tag NAME = Keywords".

The following field types are supported:

  • Title - The text between the <TITLE> and </TITLE> tags.
  • All strings in META tags - All quoted strings following CONTENT = when there is a NAME = attribute.
  • META tag NAME = "Abstract" - The quoted string following CONTENT = when NAME = "Abstract".
  • META tag NAME = "Author" - The quoted string following CONTENT = when NAME = "Author".
  • META tag NAME = "Description" - The quoted string following CONTENT = when NAME = "Description".
  • META tag NAME = "Keywords" - The quoted string following CONTENT = when NAME = "Keywords".
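
To make the notion of a field concrete, here is a minimal Python sketch that pulls the title text and the META NAME/CONTENT pairs out of an HTML document using the standard html.parser module. It illustrates what the field types above refer to; it is not the indexer's implementation, and the names FieldExtractor and extract_fields are hypothetical.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the title text and META NAME/CONTENT pairs of an HTML document."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr.get("name") and attr.get("content") is not None:
            # Assumption: field names are compared without regard to case.
            self.fields[attr["name"].lower()] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def extract_fields(html_text):
    """Return a dict such as {'title': ..., 'keywords': ...} for html_text."""
    parser = FieldExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.fields
```

For a document whose head contains a title and a META tag with NAME="Keywords", extract_fields returns both under the keys 'title' and 'keywords'.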

Sorting Results

The sample Net.Data macro has a drop-down box that lets you select how the results are sorted. By default, results are sorted by ranking (rating), with the highest-rated document first. You can instead sort by document title or by the time the document was last updated. The sort order can be ascending (best for a title sort) or descending (best for a sort by rank or time).
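
The three sort choices and two sort orders can be modeled in a few lines of Python. This is a sketch of the behavior described above, not the search engine's code; the record keys (rating, title, modified) and the function name sort_results are assumptions chosen for illustration.

```python
def sort_results(results, sort_by="rank", descending=True):
    """Sort result records (dicts) by 'rank', 'title', or 'modified'."""
    keys = {
        "rank": lambda r: r["rating"],
        "title": lambda r: r["title"].lower(),
        "modified": lambda r: r["modified"],  # e.g. an ISO date string
    }
    return sorted(results, key=keys[sort_by], reverse=descending)
```

The defaults mirror the default behavior described above: sort by rank, highest rating first.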

Search in Results

The sample Net.Data macro output form has two radio buttons: New search and Search in results. If you select Search in results, clear the query text box, enter a new search term, and run the search. The expanded query is displayed just under the query text box.

On a simple search, be sure to select AND for the operator. For example, using the sample recipes index, enter crab on the original search. Note the number of documents found. Now on the results page, clear the box and enter cakes. The search query is crab cakes using the operator AND. Notice there are fewer documents found. 

On an advanced search, the new query must have operators inserted if more than one term is entered. The new terms will be ANDed with the previous terms. For example, if you enter crab AND mushroom on the first search, then enter quiche on the next query, checking search in results, the actual query will be (crab AND mushroom) AND (quiche).
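
The way an advanced search-in-results query is built can be sketched in Python as follows. The function name search_in_results is hypothetical; the output format follows the (crab AND mushroom) AND (quiche) example above.

```python
def search_in_results(previous_query, new_query):
    """AND a new advanced query onto the previous one, parenthesizing both."""
    new_query = new_query.strip()
    if not new_query:
        # Nothing new was entered, so the previous query stands unchanged.
        return previous_query
    return f"({previous_query}) AND ({new_query})"
```

For example, search_in_results("crab AND mushroom", "quiche") builds the query shown in the paragraph above.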

Thesaurus Support

The Webserver Search Engine thesaurus support allows you to automatically expand a search query by using a thesaurus. You can put words into your thesaurus dictionary that are relevant to your documents.

How a thesaurus works

A thesaurus can improve search results by broadening the search condition. A thesaurus contains words that are synonyms or related terms of a key word. Using ping-pong as an example, ping-pong and table tennis are synonyms, and ping-pong and ball game are related terms. A search for ping-pong without a thesaurus finds only those documents containing the exact word ping-pong. A search using a thesaurus that includes synonyms such as table tennis finds documents containing either ping-pong or table tennis.

In another example, if ping-pong is included as a related term of ball game, the search results would increase again. Thesaurus support provides an automated way for users to easily find documents with material relevant to the search term entered.

Net.Data samples with thesaurus support:

A sample Net.Data search macro called thesaurus_sample_search.ndm is supplied. It contains all features, such as field search, search in results, and sorting of search results. Comments inside the macro will help you modify the thesaurus-related sections.

A sample thesaurus definition file is also supplied which contains items specifically for the sample recipes. It can be used to test the creation of a thesaurus dictionary.

The sample search form with thesaurus support is in:
 /QIBM/ProdData/HTTP/Public/HTTPSVR/thesaurus_sample_search.ndm.

The sample thesaurus definition file for the sample recipes is in:
 /QIBM/ProdData/HTTP/Public/HTTPSVR/sample_thesaurus.txt.

Other samples:

The sample search form without thesaurus support is in:
 /QIBM/ProdData/HTTP/Public/HTTPSVR/sample_search.ndm.

The sample HTML form is in:
 /QIBM/ProdData/HTTP/Public/HTTPSVR/sample_html.html
 

Search Net.Data Macros

Both the simple and the advanced sample search forms allow you to use a thesaurus on a search. The sample also contains a form that appears as a pop-up window when you select the option to open a new window to expand terms. This is a good way to find out how your thesaurus will work on a search.

Thesaurus radio buttons:

  • Do not use a thesaurus - This is the default, which searches using only the entered terms.
  • Open a new window to expand terms - Selecting this button opens a new window. This new form (thesaurus) allows you to enter a term and one or more relationships. The Expand Terms button retrieves into a table the terms expanded from the original term, using the selected relationships. You can add terms to the original term and expand the terms again. Any text entered in this text box is reflected in the text box on the search form. Use this new form to find out how your thesaurus will expand terms on an actual search with thesaurus.
  • Automatically use a thesaurus on a search - Selecting this button initiates a search and expands the search terms, using the selected relationships.

The simple search form has been re-arranged so that the operator buttons immediately follow the thesaurus buttons.

Be sure to select the operator OR when you use thesaurus support to expand the search. If you use the pop-up window, clicking the Expand Terms button displays a table of expanded terms for the term you entered. Selecting a check box adds that term to the query; clearing the check box removes the term from the query.

The advanced search form also has the thesaurus buttons added. If you use the pop-up window to expand the search terms, you must add the terms manually, being sure to include an operator between each term. Other attributes, such as @F1 for a field search, can be specified on the terms you want expanded, either in the pop-up window or on the search form.

If you are using thesaurus support, you should remove the Search in results radio button from the Net.Data output form. On a simple search with thesaurus, the operator is set to OR to broaden the search. Searching in results requires the operator AND in order to narrow the search.
 

Configure HTTP Search (CFGHTTPSCH) CL command

This command has options for thesaurus support:

  • Create a thesaurus dictionary - OPTION(*CRTTHSDCT)
  • Retrieve a thesaurus definition file from a dictionary - OPTION(*RTVTHSDFNF)
  • Delete a thesaurus dictionary - OPTION(*DLTTHSDCT)

To create a thesaurus dictionary, the following are required:

  •  The name of a thesaurus definition file -- This file can be either in the Root (/) directory of the IFS or the QSYS.LIB file system.
  •  The name of the thesaurus -- The name of the thesaurus can be no longer than 8 single byte characters and cannot contain a ".".
  •  The name of the thesaurus directory -- The directory cannot be the same one used for indexes. The default directory is /QIBM/UserData/HTTPSVR/search.

 CFGHTTPSCH OPTION(*CRTTHSDCT)  THSDCT(mydict)  THSDFNF('/mydir/mythesdef.txt')

This command creates a thesaurus dictionary called mydict using the thesaurus definition file /mydir/mythesdef.txt. All thesaurus-related files are stored in the default directory /QIBM/USERDATA/HTTPSVR/SEARCH.
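
Before running the command, it can be useful to check a proposed thesaurus name against the naming rules listed above. The following Python sketch does that; the function name is hypothetical, and treating "single byte" as plain ASCII is an assumption.

```python
def is_valid_thesaurus_name(name):
    """Check the naming rules: 1 to 8 single-byte characters and no period."""
    if not name or len(name) > 8 or "." in name:
        return False
    # Assumption: "single byte" is approximated here as plain ASCII.
    return name.isascii()
```

The name mydict used in the command above passes this check; a name such as my.dict or a name longer than eight characters does not.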

The following thesaurus-related files are created under the directory specified, where xxxx is the name of the thesaurus dictionary.

    xxxx.wdf    xxxx.wdv    xxxx.grf    xxxx.grv
    xxxx.MEY    xxxx.ROS    xxxx.NEY    xxxx.SOS   xxxx.lkn (where n is a digit)
 

Generating a Thesaurus Dictionary

To generate a thesaurus dictionary, you will need to create a thesaurus definition file. The definition file contains terms to be used to expand the search plus the relationships between terms. Create the thesaurus definition file as a text file using a text editor.

Planning the thesaurus definition file

You should plan the thesaurus by selecting terms and topics appropriate to the documents. Create a list of terms that users are likely to enter on a search. If your documents do not contain these terms, adding them to the thesaurus lets the search find the relevant documents anyway. For example, if your documents contain AS/400 (with a slash) but a user enters AS400 (without a slash) or iSeries, no documents will be found. If you add AS400 and iSeries as synonyms of AS/400 to your thesaurus, these documents will be found.

Term relationships can be defined as higher, lower, synonyms, or related to:

  • higher_than - The term is more general than another term. For example, dessert is higher than pie.
  • lower_than - The term is a more specific instance of the general term. Using the same example, pie is lower than dessert.
  • synonym_of - The term has the same or nearly the same meaning as another term. For example, PC and personal computer are synonyms.
  • related_to - The term is not a synonym but is in the same category or context as another term. For example, pie and tart are related to pastry.

If you specify a higher or lower relationship, the opposite is implied. If you define pie as lower than dessert, you do not need to define dessert as higher than pie.

In the thesaurus definition file you can define many relationships, including higher than, lower than, synonym, and related to. However, when a search is performed, search terms are only expanded using the relationships specified on the search request. Multiple relationships can be specified on a simple request.

When thesaurus terms are expanded, the relationship is extended to the new terms. Here is an example of a thesaurus definition:

:WORDS:SYNONYM
fish
seafood

:WORDS
.LOWER_THAN fish
crab
shrimp
lobster
walleye

Here are the results from expanding the term crab using various relationships:
  Term: crab
   Specified relationship: HIGHER
   Resulting additional terms: fish

  Term: crab
   Specified relationship: HIGHER, SYNONYM
   Resulting additional terms: fish, seafood (seafood is a synonym for fish, which is higher than crab)

  Term: crab
   Specified relationship: HIGHER, SYNONYM, LOWER
   Resulting additional terms: fish, seafood, lobster, shrimp, walleye (lobster, shrimp, and walleye are lower than fish)
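
The three expansions above can be reproduced with a small Python model. It assumes that expansion repeatedly follows the selected relationships from each newly found term, which is consistent with the results shown but is not confirmed to be the engine's exact algorithm. The data structures and the function names related_terms and expand are hypothetical.

```python
from collections import deque

# Relationships from the sample definition above: fish and seafood are
# synonyms; crab, shrimp, lobster, and walleye are lower than fish.
LOWER_THAN = {"crab": "fish", "shrimp": "fish", "lobster": "fish", "walleye": "fish"}
SYNONYM_PAIRS = [("fish", "seafood")]


def related_terms(term, relationships):
    """Return the terms one step away from term through the selected relationships."""
    found = []
    if "HIGHER" in relationships and term in LOWER_THAN:
        found.append(LOWER_THAN[term])  # the broader term
    if "LOWER" in relationships:
        found.extend(t for t, broader in LOWER_THAN.items() if broader == term)
    if "SYNONYM" in relationships:
        for a, b in SYNONYM_PAIRS:
            if term == a:
                found.append(b)
            elif term == b:
                found.append(a)
    return found


def expand(term, relationships):
    """Breadth-first expansion; returns the additional terms, excluding term itself."""
    seen = {term}
    queue = deque([term])
    added = []
    while queue:
        for candidate in related_terms(queue.popleft(), relationships):
            if candidate not in seen:
                seen.add(candidate)
                added.append(candidate)
                queue.append(candidate)
    return added
```

With only HIGHER selected, expand("crab", {"HIGHER"}) stops at fish; adding SYNONYM brings in seafood, and adding LOWER brings in the other terms defined as lower than fish.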

As you create the thesaurus definition, remember that your Net.Data macro will most likely have a hard-coded relationship rather than allowing the user to select from the list of relationships on the browser form. This might limit the variety of relationships you need to define.
 

Thesaurus Definition File Format

Example thesaurus definition file:
***************************************************

# Recipe thesaurus definition file
 

:WORDS:SYNONYM
fish
seafood
shellfish

:WORDS
crab
shrimp
lobster
.LOWER_THAN shellfish
 

:WORDS
halibut
walleye
trout
.LOWER_THAN fish

# the following  blocks show the two methods for defining synonyms

:WORDS:SYNONYM
chocolate cake
chocolate torte
chocolate rum cake

:WORDS
     chicken
  .SYNONYM_OF poultry
  .SYNONYM_OF rooster
  .SYNONYM_OF hen

# End of recipe thesaurus definition file
***********************************************

A thesaurus definition file consists of blocks containing elements. Each element of the block is defined by a capitalized keyword and contains terms, which are single or multiple words: cake is a term, and chocolate cake is a term. All of the terms grouped under :WORDS are called the member terms.
 

    :WORDS[:RELATED or :SYNONYM]    Starts the definition block.
        :RELATED    Member terms are related to one another, although not synonyms.
        :SYNONYM    Member terms are synonyms of one another.
    Member terms go here, one per line.
    (The following are relationships and can be specified in any order. One or more is required.)
        .LOWER_THAN   term  --  Member terms are lower (narrower in meaning) than this term.
        .HIGHER_THAN  term  --  Member terms are higher (broader in meaning) than this term.
        .RELATED_TO   term  --  Member terms are related to this term.
        .SYNONYM_OF   term  --  Member terms are synonyms of this term.
    Member terms can also be listed after the relationships.

    An * or # in the first column marks a comment line.
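
As an illustration of the format, the following Python sketch reads a definition file's text into a list of blocks. It is a simplified reader written for this page, not the parser that CFGHTTPSCH uses; the function name and the dictionary layout are hypothetical, and it assumes that a single blank separates a relationship keyword from its term.

```python
def parse_thesaurus_definition(text):
    """Parse definition text into a list of blocks (dicts)."""
    blocks = []
    current = None
    for raw in text.splitlines():
        if raw[:1] in ("*", "#"):  # comment: * or # in the first column
            continue
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":WORDS"):
            # ":WORDS:SYNONYM" -> "SYNONYM"; a plain ":WORDS" -> None
            group = line[len(":WORDS"):].lstrip(":") or None
            current = {"group": group, "members": [], "relations": []}
            blocks.append(current)
        elif current is None:
            raise ValueError("term found before the first :WORDS keyword")
        elif line.startswith("."):
            # ".LOWER_THAN fish" -> ("LOWER_THAN", "fish")
            keyword, _, target = line[1:].partition(" ")
            current["relations"].append((keyword, target.strip()))
        else:
            current["members"].append(line)
    return blocks
```

Because relationship lines and member terms are collected separately, a block gives the same result whether its relationships come before or after its member terms.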

Term relationships are completely at the discretion of the person building the thesaurus definition file. For example, you might choose to use a looser interpretation for the meaning of synonym by defining both related terms and actual synonyms as synonyms. In this case, you would not define any related terms.

A thesaurus will contain multiple definition blocks.
This is one definition block:
-------------------------------------------------------------------------
 |    :WORDS
 |            beef  wellington
 |            chicken  cordon bleu
 |            pork chops
 |         .LOWER_THAN dinner
 |         .RELATED_TO meat
-------------------------------------------------------------------------

The following block is interpreted exactly the same as the one above.
-------------------------------------------------------------------------
 |    :WORDS
 |         .LOWER_THAN dinner
 |         .RELATED_TO meat
 |            beef  wellington
 |            chicken  cordon bleu
 |            pork chops
-------------------------------------------------------------------------
Each block must begin with the keyword :WORDS
At least one term must be in the definition block and at least one relationship must be defined.

Terms will be handled as follows:

  • Preceding and trailing blanks and control characters are removed.
  • A term beginning with the 1 byte character period (".") or the 1 byte character colon (":") cannot be specified.
  • Terms containing a blank character in the middle can be specified, for example, computer science.
  • 1 byte character and 2 byte characters of the same character are regarded as the same.
  • Capital letters and small letters of the same character are regarded as the same.
  • The maximum length of a term is 64 bytes (64 single-byte characters).
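
The term-handling rules above can be approximated in Python as shown below. This is a sketch under stated assumptions: Unicode NFKC normalization stands in for treating 1 byte and 2 byte forms of a character as the same, and the 64-byte limit is approximated by counting characters. The function names are hypothetical.

```python
import unicodedata


def _is_blank_or_control(ch):
    return ch.isspace() or unicodedata.category(ch) == "Cc"


def clean_term(term):
    """Apply the term-handling rules; return the cleaned term or raise ValueError."""
    # Assumption: NFKC folds 2 byte (full-width) forms onto their 1 byte forms.
    folded = unicodedata.normalize("NFKC", term)
    start, end = 0, len(folded)
    while start < end and _is_blank_or_control(folded[start]):
        start += 1
    while end > start and _is_blank_or_control(folded[end - 1]):
        end -= 1
    cleaned = folded[start:end]
    if not cleaned:
        raise ValueError("term is empty")
    if cleaned[0] in ".:":
        raise ValueError("term cannot begin with a period or colon")
    if len(cleaned) > 64:  # assumption: one character per byte
        raise ValueError("term is longer than 64 characters")
    return cleaned


def same_term(a, b):
    """Compare two terms without regard to case or character width."""
    return clean_term(a).casefold() == clean_term(b).casefold()
```

Blanks inside a term, as in computer science, are kept; only leading and trailing blanks and control characters are removed.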

Creating the thesaurus dictionary

Use the CFGHTTPSCH CL command with OPTION(*CRTTHSDCT) to create the dictionary from the thesaurus definition file.
The name and directory of the thesaurus must then be set in the thesaurus_sample_search.ndm Net.Data macro.

When the dictionary is created, the terms in it are sorted and normalized as follows:
    Letters with no diacritical marks (English letters) are capitalized.
     (table tennis -> TABLE TENNIS)
    Letters with diacritical marks are not changed.
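
The normalization rule can be expressed in one line of Python. This sketch assumes that "English letters" means the unaccented letters a to z; the function name is hypothetical.

```python
def normalize_for_dictionary(term):
    """Capitalize unaccented English letters; leave every other character unchanged."""
    return "".join(c.upper() if "a" <= c <= "z" else c for c in term)
```

Note that Python's built-in str.upper() would also capitalize accented letters, which is why the sketch restricts the conversion to a through z.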
 

Retrieving the thesaurus definition file from a thesaurus dictionary

Once a thesaurus dictionary is created, you can retrieve the thesaurus definition from it. Use the CFGHTTPSCH CL command with OPTION(*RTVTHSDFNF) to retrieve the definition file from a thesaurus dictionary. The definition can be retrieved into either the Root (/) directory of the IFS or the QSYS.LIB file system as a source physical file. This new file can be updated and then used to create a new thesaurus dictionary. Be sure to delete the old thesaurus before creating a new one.


Tips and Techniques: Sample HTML and Net.Data macros

If you are using sample_html.html, you will need to make sure that all the parameters for function call @search_index in the sample macros are set as hidden values in sample_html.html. Updated versions of the samples have been available for several releases. However, if you are using older versions of the samples that are not providing the expected results, check below.

Added parameters in sample_search.ndm

These parameters have been added to the @search_index function call:

  • frmFieldName - field search PTF
  • frmSortOrder - sort results PTF
  • frmAscDesc - sort results PTF

This is the function call; the new parameters are the last three in the list (frmFieldName, frmSortOrder, frmAscDesc):

@search_index("Search", frmQueryType, frmIndexName, frmDir, String2, frmStartNum, frmMaxCount, frmCaseSensitive, frmPrecision, frmStemming, frmLogical, frmPreference, frmMapFile, Hits, Occurrences, DocsReturned, ResultsTable, idxRC, frmFieldName, frmSortOrder, frmAscDesc)

The section of hidden values in sample_html.html must contain an entry for each of these parameters so that they are passed correctly to sample_search.ndm.

Alternatively, you can set these variables in the %DEFINE block in the macro; then you do not need to change sample_html.html. See the macro for the allowable constants. Some of these variables are already set.

%DEFINE  {

frmFieldName = "BODY"  (already set)
frmSortOrder="SORT_BY_RANK"  (not set to a value)
frmAscDesc="DESCENDING"  (not set to a value)

 %} 
 


Macro for Thesaurus Support - thesaurus_sample_search.ndm

This macro provides all of the same functions as sample_search.ndm. Make sure all of the following variables are set in sample_html.html.

These parameters have been added to the @search_index function call:

  • frmFieldName
  • frmSortOrder
  • frmAscDesc
  • frmThesaurusName
  • frmThesaurusDir
  • relationship
  • frmTerms

This is the function call; the new parameters are the last seven in the list (frmFieldName through frmTerms):

@search_index("Search", frmQueryType, frmIndexName, frmDir, String2, frmStartNum, frmMaxCount, frmCaseSensitive, frmPrecision, frmStemming, frmLogical, frmPreference, frmMapFile, Hits, Occurrences, DocsReturned, ResultsTable, idxRC, frmFieldName,frmSortOrder, frmAscDesc,frmThesaurusName,frmThesaurusDir,relationship,frmTerms)

The section of hidden values in sample_html.html must contain an entry for each of these parameters so that they are passed correctly to thesaurus_sample_search.ndm.

Alternatively, you can set these variables in the %DEFINE block in the macro; then you do not need to change sample_html.html. See the macro for the allowable constants. Some of these variables are already set.

%DEFINE  {

frmFieldName = "BODY"  (already set)
frmSortOrder="SORT_BY_RANK"  (not set to a value)
frmAscDesc="DESCENDING"  (not set to a value) 
frmThesaurusName="RecipesT" (already set)
frmThesaurusDir="/QIBM/UserData/HTTPSVR/search" (already set)
relationship="ALL"  (change the %LIST variable to this)
frmTerms="5" (already set)

 %} 

Error from incorrect parameters on function call @search_index:
Values that are not set on the search_index function call result in the error "One of the input parameters to the application program is invalid. Please check your calling routine."

Debugging a function call
To check the value of a form variable that is passed on a function call, add the following to the macro right before the function call to see the actual value being passed.

 

frmIndexName=$(frmIndexName)

If the browser page shows frmIndexName=Myindex, the value is set correctly.
If it shows frmIndexName= with nothing after it, the variable has no value because it has never been set.

Use the new Net.Data macro without sort support
(Customer question)
In some of the example Net.Data macros it was possible to define a macro that did not sort the results. Is this still possible? Is it simply a case of not specifying the 'sortby' and 'order' parameters?
(Answer)
Use either of the following methods.
Method 1:
If you are using the macro with these variables, set them in the DEFINE section  at the beginning of the macro and then delete the drop down boxes for sorting. These are the original default values.

 %DEFINE {
    frmSortOrder = "SORT_BY_RANK"
    frmAscDesc = "DESCENDING"
 %}
Method 2:
The direct call to search_index changes as new search features are added. New parameters are always added to the end of the list. The notes to the right of the parameters show which update introduced them.

 %FUNCTION(DTW_DIRECTCALL)
      search_index(IN   CHAR(6)     Request_Type,
                        CHAR(8)     Query_Type,
                        CHAR(10)    Index_Name,
                        CHAR(256)   Index_Dir,
                        CHAR(256)   SearchString,
                        CHAR(6)     StartNumber,
                        CHAR(6)     MaxCount,
                        CHAR(4)     Case_Sensitive,
                        CHAR(3)     Precision,
                        CHAR(4)     Stemming,
                        CHAR(6)     Logical,
                        CHAR(8)     SearchPreference,
                        CHAR(256)   Map_File,
                   OUT  CHAR(6)     DocumentHits,
                        CHAR(6)     Occurrences,
                        CHAR(6)     DocumentsReturned,
                        DTWTABLE    Results,
                        CHAR(8)     Return_Code,     Original set  - required
                   IN   CHAR(30)    Field,           Next update
                        CHAR(30)    SortBy,          Last update
                        CHAR(30)    Order ){

Our next update was for thesaurus support, for which we added a new macro with additional parameters. You can pass any of the above groups of parameters (18, 19, or 21) and the search will work. However, you must also change the actual call to search_index so that the parameter lists match.

 @search_index("Search", frmQueryType, frmIndexName, frmDir, String2, frmStartNum, frmMaxCount, frmCaseSensitive, frmPrecision, frmStemming, frmLogical, frmPreference, frmMapFile, Hits, Occurrences, DocsReturned, ResultsTable, idxRC, frmFieldName, frmSortOrder, frmAscDesc)

Removing the subject section from the search results

Sometimes the subject text is not appropriate for search results. The subject can be removed by deleting a line from the while loop that displays the results. The following shows you which line can be deleted from the macro so that the subject is not displayed. 

 %WHILE (loopCounter <= $(loopStop)) { 

 %{ This displays the number of the document and the title %} 
 

@DTW_TB_rGETV(ResultsTable, loopCounter, resDocNum). 
 
  @DTW_TB_rGETV(ResultsTable, loopCounter, resDocTitle) 

  %{ This displays the subject of the document which is 
    the first 150 characters of the text. If you do not want the 
    text to appear on the results form, delete this line. %} 
  
@DTW_TB_rGETV(ResultsTable, loopCounter, resDocText)
 

   %{ This line displays the URL of the document %} 
   URL: @DTW_TB_rGETV(ResultsTable, 
                        loopCounter, resDocURL)
 

  %{ This displays the rating, modify date, size %} 
    Rating: @DTW_TB_rGETV(ResultsTable, loopCounter, resDocRating)%; 
     Last Modified: @DTW_TB_rGETV(ResultsTable, loopCounter, 
                                resDocModified); 
     File Size in Bytes: @DTW_TB_rGETV(ResultsTable, loopCounter, 
                                resDocSize) 
    %{ This is setting a variable - do not delete %} 
   @dtw_Assign(LastDoc, @DTW_TB_rGETV(ResultsTable, loopCounter, 
    resDocNum)) 

      @dtw_Add(loopCounter, "1", loopCounter) 
  %} 
 
 
