Making Add-on Modules
(make an add-on module that parses a search engine's output)
 

SEARCH ENGINE PROFILE:
 
Search Engine Name:
example: Altavista

Main URL of the search engine's web-site:
example: http://www.altavista.com/

Default search terms to put automatically before each user-entered metasearch query string. Their purpose is to restrict the expected result set by default: the module prepends these bounding terms to every query it sends to the search engine. This can improve the relevancy of the returned results when the wanted results belong to the topic these words describe. The string must use the search engine's native boolean syntax because it is sent 'AS IS'. Use only a few terms, the ones that matter most for the topic: some engines cut overly long query strings, and then the user-entered query terms could fail to reach the search engine.
example: (CV OR Resume OR Vitae) AND

Extra terms to append automatically to each metasearch query. Their purpose is to restrict the expected result set further, for example by excluding 'prohibited' words, which can improve the relevancy of the returned results even more. The string must use the search engine's native boolean syntax because it is appended 'AS IS'. Put less important query terms in this box than in the one above: if the remote search engine cuts the query, these trailing terms are the ones dropped, so we do not lose much accuracy.
tip: start with the most important search terms and continue with less important ones, because some engines may remove the last query words if the query is too long.
example: -send -post -apply -jobs
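The two boxes above wrap every user query. The Python sketch below (the add-on script itself is Perl) shows one way the final query string could be assembled, assuming the prefix, the user's query and the suffix are simply joined with single spaces; `build_query` and `truncate_words` are hypothetical names, and the script's real joining rule is not documented here. `truncate_words` simulates an engine that keeps only the first N words, which is why the most important terms should come first.

```python
def build_query(prefix, user_query, suffix):
    """Join default prefix terms, the user's query and extra suffix terms.

    Assumption: parts are joined with single spaces and empty parts are
    skipped. The strings are passed through unchanged, so they must already
    use the target engine's native boolean syntax.
    """
    parts = [p.strip() for p in (prefix, user_query, suffix)]
    return " ".join(p for p in parts if p)


def truncate_words(query, max_words):
    """Simulate an engine that silently keeps only the first max_words words."""
    return " ".join(query.split()[:max_words])
```

For example, build_query("(CV OR Resume OR Vitae) AND", "perl programmer", "-send -post -apply -jobs") gives "(CV OR Resume OR Vitae) AND perl programmer -send -post -apply -jobs"; an engine keeping only 7 words would already lose "programmer" from the user's own query.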

 
INITIAL HTTP REQUEST DATA:

The role of an add-on module is to define two things: which HTTP request to send to the search engine, and how to extract the search results from the HTML page it returns. This section defines the first item: the HTTP request to send to the search engine in order to get an HTML page with the requested search results.

How do you determine the request URL for this search engine? Perform a search with the search engine; once the results page has loaded, the address/location bar of your web browser shows the request URL for that query. It has 2 parts, separated by the '?' sign: the part before '?' is the URL of the CGI form handler, and the string after '?' is the parameters part of the CGI request.
example if the request method is GET: 99% of today's search engines use the GET HTTP method, so most likely this is the example to look at first. If the query URL was http://www.altavista.com/sites/search/web?q=test&kl=XX then the URL of the CGI handler is http://www.altavista.com/sites/search/web and the parameters part is q=test&kl=XX
In the parameters part, replace test (the query you typed) with the special string $e_qry_str. At runtime the script replaces this string with the actual query string of the current metasearch query, and that is how the query gets submitted to the search engine. So for our purposes the parameters part is: q=$e_qry_str&kl=XX
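A minimal Python sketch of the GET steps above, assuming you know the name of the query field (q in the Altavista example). `split_get_url` is a hypothetical helper: it splits the URL at '?' and replaces the value of the query field with the $e_qry_str tag, keeping the other name=value pairs in their original order.

```python
from urllib.parse import parse_qsl, quote_plus, urlencode, urlsplit


def split_get_url(url, query_field):
    """Return (cgi_handler_url, parameters_template) for a GET search URL."""
    parts = urlsplit(url)
    # Everything before '?': scheme, host (and port) and path.
    handler = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values keeps pairs such as 'excID=' that have no value.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    encoded = [
        f"{quote_plus(name)}=$e_qry_str" if name == query_field
        else urlencode([(name, value)])
        for name, value in pairs
    ]
    return handler, "&".join(encoded)
```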
example if the request method is POST: A few engines require the POST method. If the HTML source of the search form contains METHOD="POST" in a tag like <FORM ACTION="/path/to/seeker" METHOD="POST">, you can be sure that this search engine uses the POST request method. In this case the CGI form handler URL is the URL you see in the browser's address/location bar. The parameters string is built by joining the name and value pairs of all the form's input fields in one line: separate the pairs with the '&' sign and put '=' between each field name and its value. For example, if the form code is:
<FORM ACTION="/path/to/seeker" METHOD="POST">
  <INPUT TYPE=TEXT NAME="query">
  <INPUT TYPE=HIDDEN NAME="atx" VALUE="">
  <INPUT TYPE=HIDDEN NAME="p" VALUE="10">
</FORM>

The parameters request data string will be: query=$e_qry_str&atx=&p=10
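The same joining rule for POST forms, as a Python sketch. It assumes you have already copied the (name, value) pairs from the form's HTML by hand; it does not parse HTML itself, and `post_params` is a hypothetical name. Fields with an empty value still produce a 'name=' pair.

```python
from urllib.parse import quote_plus


def post_params(fields, query_field):
    """Build a POST data string from a form's (name, value) pairs."""
    out = []
    for name, value in fields:
        if name == query_field:
            # The text input that holds the user's query becomes the tag.
            out.append(f"{quote_plus(name)}=$e_qry_str")
        else:
            out.append(f"{quote_plus(name)}={quote_plus(value)}")
    return "&".join(out)
```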

Search engine's CGI form handler/script URL: This is the part before the '?' sign in the request URL if the method is 'GET'.
example1: http://www.altavista.com/sites/search/web
example2: http://www.searchfeed.com/rd/feed/JavaScriptFeed.jsp

CGI parameters data string, in URL-encoded (application/x-www-form-urlencoded) format, passed to the CGI form handler/script: This is the part after the '?' sign in the request URL if the method is 'GET'. You can embed the following parameter tags in this string; the script replaces them with real values when it executes the query:
$e_qry_str - url encoded query string or use:
$e_keywords[0] - first keyword
$e_keywords[1] - second keyword
$e_keywords[2] - third keyword
.....
$$in{key} - elements of the cgi input perl hash $in
$e_affiliateID - URL-encoded affiliate ID. If the search site supports affiliate codes in the URL, replace the current code with this string. You can set or change your actual Affiliate ID at any time from the Admin Panel.
$ip - web server IP address
$e_ip - URL-encoded web server IP address
$user_ip - user IP address
$e_user_ip - URL-encoded user IP address
example1: q=$e_qry_str&kl=XX
example2: cat=$e_qry_str&pID=$e_affiliateID&nl=20&excID=
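To illustrate what happens to these tags at runtime, here is a Python sketch of the substitution. It is not the script's actual (Perl) code: the keyword splitting, the URL-encoding rule (quote_plus, so spaces become '+') and the helper name `expand_tags` are assumptions, and the Perl-only $$in{key} form is left out.

```python
import re
from urllib.parse import quote_plus

# \b makes each tag end at a word boundary, so e.g. $ipv6 is not read as $ip.
_TAG = re.compile(r"\$(e_qry_str|e_affiliateID|e_user_ip|user_ip|e_ip|ip)\b")
_KEYWORD = re.compile(r"\$e_keywords\[(\d+)\]")


def expand_tags(template, query, affiliate_id="", server_ip="", user_ip=""):
    """Replace the documented request tags in a parameters template.

    Assumptions: keywords are the whitespace-separated words of the query,
    and a missing $e_keywords[n] becomes an empty string.
    """
    keywords = query.split()
    values = {
        "e_qry_str": quote_plus(query),
        "e_affiliateID": quote_plus(affiliate_id),
        "e_user_ip": quote_plus(user_ip),
        "user_ip": user_ip,
        "e_ip": quote_plus(server_ip),
        "ip": server_ip,
    }

    def keyword(match):
        i = int(match.group(1))
        return quote_plus(keywords[i]) if i < len(keywords) else ""

    out = _KEYWORD.sub(keyword, template)
    return _TAG.sub(lambda m: values[m.group(1)], out)
```

For example, expand_tags("q=$e_qry_str&kl=XX", "cv resume") gives "q=cv+resume&kl=XX".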

HTTP Request Method: The 'GET' option should work in 99% of the cases.

Use the following test parameter values: Parameter names always start with a dollar '$' sign.

parameter (name always starts with '$') | parameter's test value (Perl expressions are allowed too)

 
Parameters with values depending on the category: you can define additional parameter tags whose values depend on the metasearch category. For example, you can define a parameter $language with the value 'fr' for the category 'French' and the value 'any' for the category 'web'. This way one module sends different request data for each category. Enter the parameter names in the first row of the table below; each parameter name starts with a dollar '$' sign.

$out{per_category_table}
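A per-category parameter is essentially a lookup from category name to value. The Python sketch below uses the $language example from the text; the table contents, the parameter name lr in the usage example and the function `apply_category` are hypothetical, and the values are assumed to be URL-encoded already.

```python
# Hypothetical per-category table: category -> {parameter tag: value}.
PER_CATEGORY = {
    "French": {"$language": "fr"},
    "web": {"$language": "any"},
}


def apply_category(template, category, table=PER_CATEGORY):
    """Replace per-category parameter tags with the values for `category`.

    Tags are replaced longest-first so that a short tag such as $lang cannot
    overwrite the beginning of a longer one such as $language.
    Unknown categories leave the template unchanged.
    """
    params = table.get(category, {})
    out = template
    for name in sorted(params, key=len, reverse=True):
        out = out.replace(name, params[name])
    return out
```

For example, apply_category("q=$e_qry_str&lr=$language", "French") gives "q=$e_qry_str&lr=fr"; the remaining $e_qry_str tag is filled in later with the query.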

Tip: You can access password-protected sites by using URLs of this form: http://username:password@host:port/path/to/something.htm
Example: http://user123:dh34fH@myhost.com:8080/hello.asp
Currently only basic HTTP authorization is supported. The username and password must be URL-encoded, so if either contains the "@" sign, replace it with its code %40; the code of ":" is %3A.
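A Python sketch of both halves of basic authorization: building the URL with percent-encoded credentials, and the Authorization header that HTTP Basic authentication (RFC 7617) sends, which carries the plain (decoded) "username:password" string in base64. The helper names are hypothetical; how the script itself decodes the URL is not documented here.

```python
import base64
from urllib.parse import quote


def url_with_credentials(scheme, user, password, hostport, path):
    """Embed basic-auth credentials in a URL.

    quote(..., safe="") percent-encodes every reserved character, so '@'
    becomes %40 and ':' becomes %3A and neither is read as a URL delimiter.
    """
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{hostport}{path}"


def basic_auth_header(user, password):
    """Authorization header value for HTTP Basic authentication.

    Takes the plain username and password, not their URL-encoded form.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

For example, a user "me@corp" with password "a:b" on myhost.com gives http://me%40corp:a%3Ab@myhost.com/x for the path /x.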

 
NEXT RESULTS PAGES HTTP REQUEST DATA:

NOTE: The following section is OPTIONAL. By default a metasearch query requests and downloads only the first (top) search results. However, you can let this module access and parse the search results on the second, third, etc. results pages that the search engine returns for the current query. Skip this section if you need only the top (first page) results and do not need to access and parse the next results pages.

Usually the request data for the next results pages differs from the initial request data only in a few parameters, which most likely depend on the number of the requested 'next' page. For example, when you request the second results page, the parameters part of the request most probably contains a parameter like page=2 or start_from=20. For this purpose there is an additional parameter, $page, that you can use within the parameters string. Because the data string is evaluated as a Perl expression at runtime, you can also embed any valid Perl expression that uses the variables allowed so far. For example, you can embed some of the following derived Perl expressions within the parameters part of the request:
$page  - the number of the current results page requested
${\(1+$page)}  - the number of the next results page we want to receive
${\(10*$page)}  - how many results have been retrieved so far (if we get 10 results per page)
${\(10*$page+1)}  - start results from this index
${\(10*$page+10)}  - the last result index to deliver
.....................
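For reference, the values of the expressions listed above, written as a Python function. It assumes the first results page is page 1 and that the engine returns `per_page` results per page (10 in the examples); `page_values` is a hypothetical name.

```python
def page_values(page, per_page=10):
    """Python equivalents of the derived 'next page' expressions."""
    return {
        "next_page": 1 + page,                      # ${\(1+$page)}
        "retrieved_so_far": per_page * page,        # ${\(10*$page)}
        "start_index": per_page * page + 1,         # ${\(10*$page+1)}
        "last_index": per_page * page + per_page,   # ${\(10*$page+10)}
    }
```

With page 1 this gives next page 2, 10 results retrieved so far, and the next page covering results 11 to 20.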

If that is easier for you, skip this section and instead define, in the next module-building step, a regular expression that parses the 'next page' request URL directly from the results HTML source code.
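If you take that route, the regular expression only has to capture the link's URL. A Python sketch of the idea, assuming the results page links to the next page with an <a href="..."> whose text starts with "Next"; the pattern and the `next_page_url` helper are hypothetical and must be adapted to each engine's HTML. HTML-escaped ampersands (&amp;) are decoded and relative links are resolved against the results page URL.

```python
import re
from html import unescape
from urllib.parse import urljoin

# Hypothetical pattern: an <a ... href="..."> tag immediately followed by "Next".
NEXT_LINK = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>\s*Next', re.IGNORECASE)


def next_page_url(html, base_url):
    """Return the absolute 'next page' URL found in html, or None."""
    match = NEXT_LINK.search(html)
    if match is None:
        return None
    # Links in HTML often contain &amp; and may be relative to the page URL.
    return urljoin(base_url, unescape(match.group(1)))
```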

Search engine's 'Next Page' CGI form handler/script url:
example: http://www.altavista.com/sites/search/web

'Next Page' parameters, in URL-encoded (application/x-www-form-urlencoded) format, passed to the CGI form handler/script:
example: q=$e_qry_str&kl=XX&stq=${\(10*$page)}
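In Perl, ${\(EXPR)} interpolates the value of an arbitrary expression into a string: \(EXPR) takes a reference to the value and ${...} dereferences it. The script evaluates these expressions as Perl; the Python sketch below only mimics the idea for integer arithmetic (+, -, *) on $page, which covers the derived expressions listed in this section. `render_next_page` is a hypothetical name.

```python
import ast
import operator
import re
from urllib.parse import quote_plus

# Only integer +, - and * are supported: a tiny subset of what Perl accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_EXPR = re.compile(r"\$\{\\\((.*?)\)\}")  # matches ${\( ... )}
_PAGE = re.compile(r"\$page\b")


def _eval_int(node):
    """Evaluate a parsed expression made only of integers and + - *."""
    if isinstance(node, ast.Expression):
        return _eval_int(node.body)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_int(node.left), _eval_int(node.right))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def render_next_page(template, query, page):
    """Fill a 'Next Page' parameters template for the given page number."""
    def expr_sub(match):
        expr = _PAGE.sub(str(page), match.group(1))
        return str(_eval_int(ast.parse(expr, mode="eval")))

    out = _EXPR.sub(expr_sub, template)  # e.g. ${\(10*$page)} -> 20 for page 2
    out = out.replace("$e_qry_str", quote_plus(query))
    return _PAGE.sub(str(page), out)     # any bare $page left over
```

For example, the template above with the query "cv resume" and page 2 becomes q=cv+resume&kl=XX&stq=20.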

HTTP Request Method: