FAQs (Frequently Asked Questions)

 
Installation

Structure and Performance of this Software

Usage and Customization

Troubleshooting

  


 

Installation

Q: How to use this software in mod_perl or FastCGI mode? Examples.
mod_perl is an extension of the Apache web server that embeds a Perl interpreter in the server. Running Perl scripts in mod_perl Registry mode can increase performance up to 20 times compared with standard cgi mode. FastCGI works similarly to the mod_perl cgi interface, but it is also available for non-Apache web servers.
A good setup for this software is to run 2 Apache servers compiled in DSO (memory sharing) mode:
- The 1st is a plain Apache web server running on the standard port 80. It serves static objects such as html files and images, and proxies cgi requests to the second Apache server. It has two modules enabled: mod_proxy and mod_rewrite.
- The 2nd is a mod_perl or FastCGI enabled Apache server running on port 8080. For optimal performance it serves only requests to the cgi scripts.
You can put the configuration directives in separate files and include them in the main config file httpd.conf with a directive such as:

      include "conf/modules.conf"

Sample configuration files can be found in the data/docs directory:

   modules.conf  (sample configuration of some frequently used modules, including mod_perl and mod_fastcgi)
   proxy.conf  (sample proxy configuration for mod_proxy and mod_rewrite)

Q: Known problems.
 - On Windows, the tcp socket connect function may fail to time out and keep waiting for up to 50 seconds. This is especially likely if you have a DNS problem. If this affects you, consider a Unix platform.

 

Structure and Performance of this Software

Q: Files, directories and their purpose.

Directories:
    MTS4/STD/ - perl modules with relatively standard functionality (could be used in other modules/projects also).
    MTS4/NAVG/ - perl navigation modules. They receive the request data and build response pages using the standard modules.
    MTS4/ADMIN/ - perl modules used only by the administration script wrappers like: admin/admin.cgi
    MTS4/MODULES/ - add-on search engines modules
    data/ - all data files used and created by this software during its life.
    data/templates/ - html templates by theme
    data/logs/ - log files
    data/tmp/ - a temporary directory used by the software for internal purposes

Files:
    admin/admin.cgi - your Admin Panel (Security Tip: password protect http access to the admin directory)
    x.cgi - standard cgi script wrapper
    nph-x.cgi - nph-cgi script wrapper
    x.fcgi - FastCGI script wrapper
    MTS4/STD/Parameters.pm - contains this software's parameters (updatable via Admin Panel)
     ... files with -t or t in their name have perl's taint checks turned off.

Q: What is an nph-script?
An NPH (Non-Parsed-Header) script is one whose output the web server does not buffer: the script talks to the client directly. It is used when the client should receive streamed data from the script rather than an immediately generated html response, usually because the http request triggers a process that could take a long time to produce a complete response. For NPH behavior we use the nph-x.cgi, ... cgi wrappers.
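
A minimal sketch of what distinguishes an NPH script from a regular cgi script, not taken from nph-x.cgi itself: since the web server does not add anything to the output, the script sends the complete HTTP status line on its own, and it switches off output buffering so that partial results reach the client at once. The helper name nph_header is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the complete response header block. An NPH script has to send the
# status line itself because the server passes its output through unparsed.
sub nph_header {
    my ($content_type) = @_;
    my $protocol = $ENV{SERVER_PROTOCOL} || 'HTTP/1.0';
    return "$protocol 200 OK\r\n"
         . "Content-Type: $content_type\r\n"
         . "\r\n";
}

$| = 1;    # disable Perl's output buffering on STDOUT

print nph_header('text/html');
for my $step (1 .. 3) {
    # each chunk is written out as soon as it is printed
    print "<p>partial result $step</p>\n";
}
```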

Q: How to improve performance? Some test results.
    - run as a FastCGI application or under mod_perl (Apache::Registry mode)
    - use the latest versions of Apache, mod_perl and FastCGI to benefit from bug fixes and performance improvements
    - install and enable the Apache::DBI perl module. It makes database connections persistent under mod_perl, which is very useful if this software actively uses an SQL database.
    - if you notice that swap space is being used on your server, add more RAM. More RAM relieves the CPU by caching and buffering a lot of frequently used data.
    - mod_perl is a memory hungry technology. Using proxy servers will reduce the RAM usage tremendously.
Performance test results: a test on a single-processor 800 MHz PC with Apache 1.3.20 and perl 5.6.1 gave the following times, in milliseconds, for access to the public part of this software:
 

MetaSearch results
                                    results clustering   0 results   50 results   100 results   200 results
standard CGI mode                   yes                  380 ms      630 ms       900 ms        1,150 ms
                                    no                               580 ms       750 ms        900 ms
mod_perl Registry or FastCGI mode   yes                  28 ms       300 ms       570 ms        780 ms
                                    no                               250 ms       400 ms        540 ms

Tests were made with Chatologica HTTP Benchmark Utility v1.1

 

Usage and Customization

Q: Can I rename files and directories?
In general this is not recommended, because this software may fail to find some of its components if you rename them. You may, however, rename any of the script wrappers: admin/admin.cgi, x.cgi, nph-x.cgi, .... If you do, you have to use the new name in the nph-script wrapper too. So if you have a script named myscript.pl, its nph-script counterpart must be named nph-myscript.pl. Similarly, the admin script could be named my_new_admin.cgi, with the counterpart nph-my_new_admin.cgi. It's not recommended to remove the nph- and nph prefix from the nph-script names because it is used to recognize nph scripts. You can't rename the admin directory.
You can rename add-on modules only through the Modules Test utility, because a module's source code and its file name are tied together.

Q: How to make my own design? Templates usage.
The templates are files with html, javascript or plain text content that reside under your data/templates directory.
They are used by the perl modules of this software as a base for making the final http response pages. When a template is requested at runtime the software loads its content and replaces the chatologica special tags and strings found there with real values generated by the software. Then it sends the reworked template as a web page to the client. You can control the look of this software by modifying these template files.
The following special chatologica template constructions are currently supported:
    $out{my_key} - this string will be replaced with a value generated by the requested NAVG perl module.
    <C!OUT=my_key> - the same as above but less recommended, because the insertion tag is invisible when the template is previewed during the html design phase. Also, some advanced html editors may complain about this tag and delete it, or you might delete it by mistake. Use it only as a last resort!
    <C!INCLUDE=http://some_url> - will fetch the content of this url and insert it 'AS IS' in place of this tag.
    <C!INCLUDE_AS_HTML=http://some_url> - will fetch the content of this url, html encode it, and insert it in place of this tag.
    <C!INCLUDE=path_to/file> - will read a file and insert its content 'AS IS' in place of this tag.
    <C!INCLUDE_AS_HTML=path_to/file> - will read a file and insert its html encoded content in place of this tag.
    <C!SHOW=ALL> - include this tag in any template file to display all special tags available for that template.
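
The replacement of the two insertion forms can be pictured with the short sketch below. It is an illustration only, not the code this software uses: the function name render_template is hypothetical, the INCLUDE and SHOW tags are not handled, and leaving unknown keys untouched is an assumption.

```perl
use strict;
use warnings;

# Replace $out{key} strings and <C!OUT=key> tags with values from %$out.
# ASSUMPTION: keys without a value are left in place unchanged.
sub render_template {
    my ($template, $out) = @_;
    my $lookup = sub {
        my ($whole, $key) = @_;
        return exists $out->{$key} ? $out->{$key} : $whole;
    };
    # one pass over the template, so inserted values are never re-scanned
    $template =~ s/(\$out\{(\w+)\}|<C!OUT=(\w+)>)/$lookup->($1, defined $2 ? $2 : $3)/ge;
    return $template;
}

my $page = render_template(
    '<b>$out{h_qry_str}</b> (<C!OUT=total_results> results)',
    { h_qry_str => 'perl &amp; apache', total_results => 42 },
);
print "$page\n";
```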

Tags available by default for most templates:
    $out{real_script_url} - the actual url of the non-nph-script wrapper
    $out{cgi_url} - url of the directory where the scripts are
    $out{real_nph_script_url} - the actual url of the nph-script wrapper
    $out{script_url} - url of the non-nph-script wrapper: the actual one, or the one set in the Admin Panel
    $out{nph_script_url} - url of the nph-script wrapper: the actual one, or the one set in the Admin Panel
    $out{random100} - random number between 1 and 100 (good for LinkExchange banners)

You can use the Templates Manager in your Admin Panel to edit, delete and create html template files online. Another approach is to design templates offline and then upload them to the selected theme directory. Before editing an original theme (one that comes with this distribution), it's strongly recommended that you create a new, unique theme directory and copy the files there. This safely preserves the edited theme if you later reinstall the software or make a minor version upgrade over a previous installation. When editing a template you have to preserve all special chatologica tags if you want it to keep working properly. You can use the "Find" function of your html/text editor to see which chatologica tags a template file contains.
Some special template files used by this software for each theme:
    header.htm - html code that we show at the beginning of each non-admin page
    footer.htm - html code that we show at the end of each non-admin page
    notice.htm - custom message shown to client
In most templates the following convention works: $out{h_something} is the html encoded value of $out{something}, and $out{he_something} is its html and url encoded value.
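
To make the h_ and he_ convention concrete, here is a small sketch of the two encodings. The helper names html_encode and url_encode are hypothetical, input is treated as a byte string, and the order used for the he_ value (url encoding first, then html encoding) is an assumption about how the software combines them.

```perl
use strict;
use warnings;

# HTML-encode the characters that are special in html text and attributes.
sub html_encode {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# URL-encode every byte except the unreserved characters of RFC 3986.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

my %out = (qry_str => 'cats & "dogs"');
$out{h_qry_str}  = html_encode($out{qry_str});              # for text inside a page
$out{he_qry_str} = html_encode(url_encode($out{qry_str}));  # for a url inside a page

print "$_ = $out{$_}\n" for sort keys %out;
```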

Q: Template files.
Below are some template files and the chatologica tags you can use in them.

Ad.htm - inserted after every few search results displayed. Usually you put your banner rotation html code there. Some available tags are:
   $out{qry_str} - the query string 'AS IS'.
   $out{h_qry_str} - html encoded query string - suitable for insertion within a web page.
   $out{he_qry_str} - html and url encoded query string - suitable for insertion within a url in a web page.
   $out{random100} - random number between 1 and 100 - good for LinkExchange banners.

result_default.htm - a search result will be displayed by default through this template.
   $out{order} - the search result order of appearance.
   $out{relevancy_percent} - search result relevancy percent.
   $out{h_source_link} - links to the search engines where the search result was found. 
   $out{bh_title}, $out{bh_description}, $out{bh_description_line_1}, $out{bh_description_line_2}, $out{bh_truncated_real_url} - the title, description and shortened url, ready for insertion into html code, with all query keywords highlighted. If you do not want highlighting, use these tags instead: $out{h_title}, $out{h_description}, $out{h_truncated_real_url}
   $out{he_follow_url}, $out{he_real_url} - the click-through url and the actual site url of the result in html and url encoded form. You can use these tags to make links to Altavista's online translation service, etc.
   $out{stars} - star/flag images indicating how many times this url is duplicated in the results.

highlighted_result_default.htm - accepts the same tags as result_default.htm but is used by highlighted modules.

results_default.htm - a search results page includes this template by default. It lays out a number of search results.
   $out{total_results} - total results received.
   $out{search_time} - how many seconds elapsed to generate the results.
   $out{page} - which search result page is displayed.
   $out{start_results} - the number of the first result displayed.
   $out{end_results} - the number of the last result displayed.
   $out{pages_links} - links to all search result pages, Next & Previous pages.
   $out{results} - the sequence of search results html codes and possible Ad.htm inclusions.
   $out{per_page} - how many results to display on one search results page.
   $out{pages} - number of search result pages.
   $out{xxx_per_page_url} - url to search results page showing xxx results per page
   $out{resultsadditional_category} - url to search results produced by additional_category.
   $out{results_summary} - summary information about the search engines responses for the 'main' metasearch query.
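
How the paging tags relate to each other can be shown with a little arithmetic. The sketch below is an illustration under assumptions, not this software's code: the function name paging_values is hypothetical, and clamping an out-of-range page number is a choice made for the example.

```perl
use strict;
use warnings;

# Derive the paging values from the total result count, page size and current page.
sub paging_values {
    my ($total_results, $per_page, $page) = @_;
    my $pages = int(($total_results + $per_page - 1) / $per_page);   # ceiling division
    $pages = 1      if $pages < 1;
    $page  = 1      if $page < 1;         # ASSUMPTION: clamp out-of-range pages
    $page  = $pages if $page > $pages;
    my $start = $total_results ? ($page - 1) * $per_page + 1 : 0;
    my $end   = $page * $per_page;
    $end = $total_results if $end > $total_results;
    return (
        pages         => $pages,
        page          => $page,
        start_results => $start,
        end_results   => $end,
    );
}

my %out = paging_values(45, 10, 5);
printf "Results %d-%d of 45, page %d of %d\n",
    $out{start_results}, $out{end_results}, $out{page}, $out{pages};
```

With 45 results at 10 per page, the fifth and last page shows results 41 to 45.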

clustered_results_default.htm - accepts the same tags as results_default.htm and is used when clustered results have to be displayed. Additionally you can use:
   $out{h_phrase} - the phrase of the current cluster in html encoded format.
   $out{count} - number of search results in the current cluster.

BeforeResults.htm - we send this template just after sending the header.htm template and just before the actual querying of the search engines starts. Usually the metasearch cgi form goes there.
   $out{h_qry_str} - the html encoded query string.
   $out{select_category} - select options for category 'drop down select menu'.
   $out{modules_checkboxes} - html code of the checkboxes used to select which engines take part in the metasearch query.
   $out{select_per_page_10} - has value ' SELECTED' if the per_page setting is 10. Each value has its own tag, such as: $out{select_per_page_20}, $out{select_per_page_30}, ...
   $out{select_timeout_2} - has value ' SELECTED' if the timeout setting is 2
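
The select_... tags exist so that the current setting is preselected in a drop down menu. A rough sketch of how such tags could be produced and used follows. The function name select_flags is hypothetical, and the empty string for non-selected values is an assumption.

```perl
use strict;
use warnings;

# Build select_<name>_<value> entries: ' SELECTED' for the current value,
# an empty string (ASSUMPTION) for every other choice.
sub select_flags {
    my ($name, $current, @choices) = @_;
    my %flags;
    for my $choice (@choices) {
        $flags{"select_${name}_$choice"} = $choice == $current ? ' SELECTED' : '';
    }
    return %flags;
}

my %out = select_flags('per_page', 20, 10, 20, 30);
for my $n (10, 20, 30) {
    my $flag = $out{"select_per_page_$n"};
    print qq{<option value="$n"$flag>$n</option>\n};
}
```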

Results.htm - we send this template once the results are received, sorted and ready for display.
   $out{results_page} - results from the main and the additional metasearch queries in one html piece of code.

no_results_found.htm - included if no results are found for a search query in a 'main' metasearch category.

notice.htm - included when a notification has to be shown.
   $out{message} - the message

header.htm - this is the header file which every page starts with.
   $out{charset} - charset value used in the content-type header metatag.
   $out{base_url} - used in the <BASE> html header tag to define the base url against which all relative urls are resolved. Usually points to the images directory for the current theme.

cluster_link.htm - link to a cluster of search results relative to a phrase.
   $out{pages_count} - count of links in the results cluster.
   $out{h_url} - url of the search results page showing the clustered results.
   $out{bh_phrase} - the cluster's phrase in html encoded format with keywords highlighted
   $out{h_phrase} - the cluster's phrase in html encoded format with keywords non-highlighted

cluster_links.htm - all links to clusters formatted and ready to be placed in the main results template.
   $out{clustered_links} - all links to clusters appearing consecutively.

summary.htm - displays a summary of the 'main' metasearch process.
   $out{summary} - details about search results counts and errors if any for each queried search engine.

summary_line.htm - displays a summary of the metasearch response for one queried search engine. It is used to build the content of the summary.htm template.
   $out{h_site} - name of the search engine html encoded.
   $out{h_site_URL} - url encoded address of a search engine.
   $out{reported_results_msg} - a message about how many results the search engine reports having found (if any).
   $out{received_results} - number of actually received results.
   $out{error_msg} - error message (if any).

You can also control the look of the pages through style sheets. The definition file is available at html/images/the_theme_dir/styles.css. For example, through this file you can change the appearance of the links to the result pages. Some classes to modify:
    .highlight - used to highlight query words.
    .page_link - used for the link to a results page.
    .page_no_link - used to display the current results page in the list of links to results pages.

NOTE: Most of the tags available in one template can be accessed in all other templates too. Whether a tag is accessible usually depends on when it is defined. If you are not sure what runtime value a tag has, just include it in a template file, run the script and view the response html code to find its value. You can also use the tag <C!SHOW=ALL> to see the names and current values of all special chatologica tags supported for a template. For a better understanding of the tags, review the default templates used in this software. Use the Templates Manager to view and edit templates.

Q: How to extend this software's functionality?
Write additional perl modules under the MTS4/NAVG/ directory. Each module defines a function run() that must implement the actual work for the new functionality. Examine the current modules in your MTS4/NAVG/ directory and use them as examples. Access urls look like:  http://.../x.cgi?NAVG=SomeModule. Underlying logic can be placed in modules under the MTS4/STD/ directory, especially if you plan to use it in other modules.
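
The general shape of such a module and of the NAVG=... dispatch might look like the sketch below. Everything in it is hypothetical: the module name Hello, the dispatch function, and above all the calling convention of run() (a hash ref of request parameters in, a hash ref of template values out). Check the existing modules in MTS4/NAVG/ for the real interface.

```perl
use strict;
use warnings;

# Hypothetical add-on module; in the real layout it would live in MTS4/NAVG/Hello.pm.
package MTS4::NAVG::Hello;

# ASSUMPTION: run() receives the request parameters as a hash ref and
# returns a hash ref of values for the templates.
sub run {
    my ($params) = @_;
    my $name = defined $params->{name} ? $params->{name} : 'world';
    return { message => "Hello, $name!" };
}

package main;

# Map the NAVG request value to a module and call its run() function.
sub dispatch {
    my ($params) = @_;
    my $module = defined $params->{NAVG} ? $params->{NAVG} : '';
    die "invalid module name\n" unless $module =~ /\A\w+\z/;   # reject paths and '::'
    my $run = "MTS4::NAVG::$module"->can('run')
        or die "module $module has no run() function\n";
    return $run->($params);
}

# Corresponds to a request such as x.cgi?NAVG=Hello&name=visitor
my $out = dispatch({ NAVG => 'Hello', name => 'visitor' });
print "$out->{message}\n";
```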

Q: Modules usage.
A module is just a definition of the search site profile data and the regular expressions used for parsing. By default it uses the MTS4::STD::SE_Query::output_parsing subroutine. If you want custom parsing, you can copy this subroutine into your module and modify it. Modules should not use the '-' symbol in their names.
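
As a rough picture of "profile data plus parsing regular expressions", consider the sketch below. It does not reproduce MTS4::STD::SE_Query::output_parsing or the real module format: the key names, the imaginary search site and the function parse_results are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical profile of an imaginary search site; every key name is an assumption.
my %profile = (
    name      => 'ExampleEngine',
    query_url => 'http://search.example/?q=',
    result_re => qr{<a class="r" href="([^"]+)">(.*?)</a>\s*<p>(.*?)</p>}s,
);

# Apply the profile's regular expression to a response page and collect the results.
sub parse_results {
    my ($profile, $html) = @_;
    my $re = $profile->{result_re};
    my @results;
    while ($html =~ /$re/g) {
        push @results, { url => $1, title => $2, description => $3 };
    }
    return @results;
}

my $page = '<a class="r" href="http://a.example/">First</a> <p>One</p>'
         . '<a class="r" href="http://b.example/">Second</a><p>Two</p>';
for my $result (parse_results(\%profile, $page)) {
    printf "%s (%s): %s\n", $result->{title}, $result->{url}, $result->{description};
}
```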

Q: How to password protect my admin directory?
This software comes with an automated password setup utility that currently works only for web servers, such as Apache and Netscape, that understand .htaccess or .nsconfig files.
Here are the instructions if you decide to write a .htaccess file manually. Upload a .htaccess file like this one to your admin directory in ASCII mode:

AuthUserFile "/usr/home/chatologica/www/cgi-bin/myscript/admin/.htpasswd"
AuthGroupFile /dev/null
AuthName "Admin Panel"
AuthType Basic
<Limit GET POST>
require valid-user
</Limit>

You have a sample.htaccess file in the same directory. Replace the example path above,  /usr/home/chatologica/www/cgi-bin/myscript/admin/.htpasswd,  with the full path to the .htpasswd file in your own admin directory. Then, from the shell, cd to the admin directory and type the command:
     htpasswd -bc .htpasswd admin mypassword
This will create a password file .htpasswd and set the password mypassword for user admin. Now when you access http://.../admin/admin.cgi a dialog box will ask you for a username and password. Enter your username (admin in this example) and the password mypassword to access your Admin Panel. If you later forget your login data, just empty the admin/.htaccess and admin/.nsconfig files and repeat the procedure above. Deleting them is not recommended; just save them with empty content. The reason is that recreating deleted files requires chmod 777 on the admin directory to allow file creation, and some web servers refuse to execute scripts in world-writable directories. Keeping the files empty does not require chmod 777 and at the same time allows password-free http access.
If you can't set up a password for one reason or another, you can just rename your admin scripts. Nobody can access the Admin Panel provided that nobody knows the names of your admin scripts.

 

Troubleshooting

Q: Error message: "403 Forbidden. You do not have the right to access this file." or what file permissions do I have to set?
This error indicates that the web server user id (usually nobody) does not have permission to read or execute the requested file or directory. Set world-read and/or execute permission (usually chmod 755) to avoid this problem. See README.TXT for the file permissions you have to set in order to run this software successfully.
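
To see quickly which files lack world-read or world-execute permission, a small check such as the following can help. It is only a sketch for Unix-like systems: the function name world_access is hypothetical, and it looks at the permission bits alone, not at ownership or at the permissions of parent directories.

```perl
use strict;
use warnings;

# Report whether "other" users (such as the web server's user id) may read
# and execute a file, judging only by its permission bits.
sub world_access {
    my ($path) = @_;
    my @st = stat($path) or return;     # nothing returned if the file is missing
    my $mode = $st[2] & 07777;          # keep only the permission bits
    return {
        mode       => sprintf('%04o', $mode),
        readable   => ($mode & 0004) ? 1 : 0,
        executable => ($mode & 0001) ? 1 : 0,
    };
}

# Usage: perl check_perms.pl x.cgi nph-x.cgi admin/admin.cgi
for my $file (@ARGV) {
    my $info = world_access($file);
    if (!$info) {
        printf "%s: cannot stat\n", $file;
        next;
    }
    printf "%s: mode %s, world-readable: %d, world-executable: %d\n",
        $file, $info->{mode}, $info->{readable}, $info->{executable};
}
```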

Q: Error message: "500 Internal Server Error" and some suggestions on how to fix similar problems.
This error indicates that the script has not produced valid output. Check the following:

Q: "core" file - what is this?
This is a so-called 'core dump', which happens on memory faults (wrong pointers in the underlying C libraries). The operating system writes the current state of the process's memory to a file named 'core' and kills the failed process. Don't worry; you can simply delete this file.