FAQs (Frequently Asked Questions)
Installation
Structure and Performance of this Software
Installation
Q: How to use this software with mod_perl or FastCGI mode? Examples.
mod_perl is an extension of the Apache web server that embeds a perl
interpreter in the server. Running perl scripts in mod_perl Registry mode
can increase performance up to 20 times compared with standard cgi
mode. FastCGI works similarly to the mod_perl cgi interface, but it is
also available for non-Apache web servers.
A good setup for this software is to run 2 Apache servers compiled
in DSO (Dynamic Shared Object) mode:
- The 1st is a plain Apache web server running on the standard port 80. It
serves static objects such as html files and images, and proxies cgi requests
to the second Apache server. It has two modules enabled: mod_proxy and
mod_rewrite.
- The 2nd is a mod_perl or FastCGI enabled Apache server
running on port 8080. For optimal performance it serves only requests
for the cgi scripts.
You can put the configuration directives in separate files and include them
in the main config file httpd.conf with a directive such as:
include "conf/modules.conf"
Sample configuration files can be found in the data/docs directory:
modules.conf - sample configuration of some frequently used modules,
including mod_perl and mod_fastcgi
proxy.conf - sample proxy configuration for mod_proxy and mod_rewrite
Q: Known problems.
- On the Windows platform, the tcp socket connect function may fail to time
out and keep waiting for up to 50 seconds. This is especially likely if
you have a DNS problem. If this matters to you, consider a Unix platform.
Structure and Performance of this Software
Q: Files, directories and their purpose.
Directories:
MTS4/STD/ - perl modules with relatively standard functionality
(could be used in other modules/projects also).
MTS4/NAVG/ - perl navigation modules. They receive the
request data and build response pages using the standard modules.
MTS4/ADMIN/ - perl modules used only by the administration
script wrappers like: admin/admin.cgi
MTS4/MODULES/ - add-on search engines modules
data/ - all data files used and created by this software
during its life.
data/templates/ - html templates by theme
data/logs/ - log files
data/tmp/ - a temporary directory used by the software
for internal purposes
Files:
admin/admin.cgi - your Admin Panel (Security Tip:
protect http access to the admin directory with a password)
x.cgi - standard cgi script wrapper
nph-x.cgi - nph-cgi script wrapper
x.fcgi - FastCGI script wrapper
MTS4/STD/Parameters.pm - contains this software's parameters
(updatable via Admin Panel)
... files with -t or t in their names have
perl's taint checks turned off.
Q: What is an nph-script?
An NPH (Non-Parsed-Header) script is one whose output the web server does
not buffer: the script talks to the client directly. It
is used when the client has to receive streamed data from the script rather
than an immediately generated html response, usually because the http request
triggers a process that could take a long time before the response is
complete. For NPH behavior we use the nph-x.cgi, ... cgi wrappers.
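To make the difference concrete, here is a minimal sketch of what an NPH script has to do on its own. It is written in Python purely for illustration (this software's wrappers are perl, and their actual code is not shown here); the function names `nph_header` and `stream_response` are hypothetical. Because the server does not parse the output, the script sends the full http status line itself and flushes each piece of output as soon as it is ready.

```python
import sys
import time


def nph_header(status="200 OK", content_type="text/html"):
    """Build the complete header block that an NPH script must send itself."""
    lines = [
        f"HTTP/1.0 {status}",             # status line, normally added by the server
        f"Content-Type: {content_type}",
        "",                                # blank line terminates the headers
        "",
    ]
    return "\r\n".join(lines)


def stream_response(chunks, out=sys.stdout, delay=0.0):
    """Write the header, then push every chunk to the client immediately."""
    out.write(nph_header())
    out.flush()
    for chunk in chunks:
        out.write(chunk)
        out.flush()          # do not wait for the whole page to be ready
        time.sleep(delay)    # stands in for slow work such as a remote query


if __name__ == "__main__":
    stream_response(["<p>searching...</p>\n", "<p>done</p>\n"])
```

A non-NPH script would print only the `Content-Type` line and leave the status line to the web server.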
Q: How to improve performance? Some test results.
- run as a FastCGI application or under mod_perl
(Apache::Registry mode)
- use the latest versions of Apache, mod_perl and FastCGI because
of bug fixes and performance improvements
- install and enable the Apache::DBI perl module. It makes
database connections persistent under mod_perl, which is very useful if this
software makes heavy use of an SQL database.
- if you notice that swap space is being used on your
server, add more RAM. More RAM saves CPU time by caching and buffering
a lot of frequently used data.
- mod_perl is a memory hungry technology. Putting a lightweight proxy
server in front of it means fewer large mod_perl processes are needed, which
reduces RAM usage considerably.
Performance test results: a test on a single processor 800 MHz PC with Apache
1.3.20 and perl 5.6.1 gave the following times in milliseconds for access
to the public part of this software:
MetaSearch results       | results clustering | 0 results | 50 results | 100 results | 200 results |
standard CGI mode        | yes                | 380 ms    | 630 ms     | 900 ms      | 1,150 ms    |
standard CGI mode        | no                 |           | 580 ms     | 750 ms      | 900 ms      |
mod_perl Registry or ... | yes                | 28 ms     | 300 ms     | 570 ms      | 780 ms      |
mod_perl Registry or ... | no                 |           | 250 ms     | 400 ms      | 540 ms      |
(Only one 0 results time is listed per mode.)
Tests were made with Chatologica HTTP Benchmark Utility v1.1
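The measured speedup can be worked out directly from these numbers. The short Python sketch below (illustrative only; the variable and function names are made up for this example) divides the standard cgi times by the mod_perl Registry times for the rows measured with results clustering enabled.

```python
# Times in milliseconds, taken from the rows with results clustering = yes.
RESULT_COUNTS = [0, 50, 100, 200]
STANDARD_CGI_MS = [380, 630, 900, 1150]
MOD_PERL_MS = [28, 300, 570, 780]


def speedups(slow_ms, fast_ms):
    """Return how many times faster the second series is, to one decimal."""
    return [round(slow / fast, 1) for slow, fast in zip(slow_ms, fast_ms)]


if __name__ == "__main__":
    ratios = speedups(STANDARD_CGI_MS, MOD_PERL_MS)
    for count, ratio in zip(RESULT_COUNTS, ratios):
        print(f"{count:>3} results: {ratio}x faster")
```

In this test the gain is about 13.6 times for an empty result page and falls to roughly 1.5 times at 200 results, which suggests that the persistent modes mainly remove per-request startup cost rather than speeding up the result processing itself.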
Usage and Customization
Q: Can I rename files and directories?
In general it's not recommended because this software may fail to find some
of its components if you rename them. However, feel free to rename any of
the script wrappers: admin/admin.cgi, x.cgi, nph-x.cgi, ....
If you do, the nph-script wrapper must carry the new name too. So
if you have a script named myscript.pl, its nph-script counterpart
must be named nph-myscript.pl. Similarly, the admin script could be
named my_new_admin.cgi and its counterpart nph-my_new_admin.cgi.
It's not recommended to remove the nph- and nph prefix from
the nph-script names because it is used to recognize nph scripts. You can't
rename the admin directory.
You can rename add-on modules only through the Modules Test utility
because each module's source code and file name are tied together.
Q: How to make my own design? Templates usage.
The templates are files with html, javascript or plain text content that
reside under your data/templates directory.
The perl modules of this software use them as a base for building the
final http response pages. When a template is requested at runtime, the
software loads its content and replaces the special chatologica tags and
strings found there with real values generated by the software. Then it sends
the processed template to the client as a web page. You can control the look
of this software by modifying these template files.
The following special chatologica template constructions are currently
supported:
$out{my_key} - is replaced with a value generated by the requested
NAVG perl module.
<C!OUT=my_key> - the same as above but less recommended because the
insertion tag is invisible when the template is previewed during the html
design phase. Also, some advanced html editors may complain about this tag
and delete it, or you might delete it by mistake. Use it only as a last
resort!
<C!INCLUDE=http://some_url> - fetches the content of this url and inserts
it 'AS IS' in place of this tag.
<C!INCLUDE_AS_HTML=http://some_url> - fetches the content of this url, html
encodes it, and inserts it in place of this tag.
<C!INCLUDE=path_to/file> - reads a file and inserts its content 'AS IS' in
place of this tag.
<C!INCLUDE_AS_HTML=path_to/file> - reads a file and inserts its html encoded
content in place of this tag.
<C!SHOW=ALL> - include this tag in any template file to display all special
tags available for the template.
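The two value insertion forms can be pictured as a simple search and replace over the template text. The following Python sketch is only an illustration of that idea, not this software's implementation (which is perl and not shown here). The `render` function is hypothetical, and replacing unknown keys with an empty string is an assumption.

```python
import re

# Matches either $out{key} or <C!OUT=key>; exactly one group is set per match.
TAG = re.compile(r"\$out\{(\w+)\}|<C!OUT=(\w+)>")


def render(template, values):
    """Replace both insertion forms with values from a dictionary."""
    def lookup(match):
        key = match.group(1) or match.group(2)
        # Assumption: a key without a value is rendered as an empty string.
        return str(values.get(key, ""))
    return TAG.sub(lookup, template)


if __name__ == "__main__":
    page = "<b>$out{h_qry_str}</b> page <C!OUT=page>"
    print(render(page, {"h_qry_str": "perl", "page": 2}))
```

Doing the replacement in a single pass means that text inserted for one tag is never scanned again for further tags.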
Tags available by default for most templates:
$out{real_script_url} - the actual url of the non-nph-script wrapper
$out{cgi_url} - url of the directory where the scripts are
$out{real_nph_script_url} - the actual url of the nph-script wrapper
$out{script_url} - url of the non-nph-script wrapper: the one set in the
Admin Panel, or else the actual one
$out{nph_script_url} - url of the nph-script wrapper: the one set in the
Admin Panel, or else the actual one
$out{random100} - random number between 1 and 100 (good for LinkExchange
banners)
You can use the Templates Manager from your Admin Panel to edit, delete and
create your html template files online. Another approach is to design
templates offline and then upload them to the selected theme directory.
Before editing an original theme (one that comes with this distribution),
it's strongly recommended that you create a new, uniquely named theme
directory and copy the files there. This keeps your edited theme safe if you
later reinstall the software or install a minor version upgrade over the
previous installation. When editing a template you have to preserve all
special chatologica tags if you want it to keep working properly. You can use
the "Find" function of your html/text editor to see which chatologica tags
a template file includes.
Some special template files used by this software for each theme:
header.htm - html code shown at the beginning of each non-admin page
footer.htm - html code shown at the end of each non-admin page
notice.htm - custom message shown to the client
In most templates the following convention applies:
$out{h_something} is the html encoded value of $out{something}, and
$out{he_something} is the html and url encoded value of $out{something}.
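As a rough picture of this naming convention, the Python sketch below derives the three forms of one value. It is an illustration only: the function name `encoded_variants` is hypothetical, and the order used for the he_ form (url encode first, then html encode) is an assumption, since the exact encoding steps are not described here.

```python
from html import escape
from urllib.parse import quote


def encoded_variants(name, value):
    """Return the plain, h_ and he_ forms of one template value."""
    return {
        name: value,                                   # 'AS IS'
        f"h_{name}": escape(value),                    # for html text
        # Assumption: url encode first, then html encode the result.
        f"he_{name}": escape(quote(value, safe="")),   # for a url inside html
    }


if __name__ == "__main__":
    for key, val in encoded_variants("qry_str", "cats & dogs").items():
        print(f"$out{{{key}}} = {val}")
```

The h_ form is the one to use in visible page text, while the he_ form is meant for values placed inside a link.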
Q: Template files.
Below are some template files and the chatologica tags you can use in them.
Ad.htm - is inserted after every few search results displayed. This is
usually where you put your banner rotation html code. Some available tags
are:
$out{qry_str} - the query string 'AS IS'.
$out{h_qry_str} - html encoded query string, suitable for insertion within
a web page.
$out{he_qry_str} - html and url encoded query string, suitable for insertion
within a url in a web page.
$out{random100} - random number between 1 and 100, good for LinkExchange
banners.
result_default.htm - a search result is displayed through this template by
default.
$out{order} - the search result's order of appearance.
$out{relevancy_percent} - search result relevancy percent.
$out{h_source_link} - links to the search engines where the search result
was found.
$out{bh_title}, $out{bh_description}, $out{bh_description_line_1},
$out{bh_description_line_2}, $out{bh_truncated_real_url} - the title,
description and shortened url, ready for insertion within html code, with
all query keywords highlighted. If you do not want highlighting, use the
following tags instead: $out{h_title}, $out{h_description},
$out{h_truncated_real_url}
$out{he_follow_url}, $out{he_real_url} - the click through url and the
actual site url of the result in html and url encoded form. You can use
these tags to make links to Altavista's online translation service, etc.
$out{stars} - stars/flags images indicating how many times this url is
duplicated in the results.
highlighted_result_default.htm - accepts the same tags as result_default.htm but is used by highlighted modules.
results_default.htm - a search results page includes this template by
default. It lays out a number of search results.
$out{total_results} - total results received.
$out{search_time} - how many seconds it took to generate the results.
$out{page} - which search results page is displayed.
$out{start_results} - the number of the first result displayed.
$out{end_results} - the number of the last result displayed.
$out{pages_links} - links to all search result pages and to the Next &
Previous pages.
$out{results} - the sequence of search results html codes and possible
Ad.htm inclusions.
$out{per_page} - how many results to display on one search results page.
$out{pages} - number of search result pages.
$out{xxx_per_page_url} - url to a search results page showing xxx results
per page
$out{resultsadditional_category} - url to search results produced by
additional_category.
$out{results_summary} - summary information about the search engines'
responses for the 'main' metasearch query.
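The paging tags above are related by simple arithmetic. The Python sketch below shows one plausible way $out{pages}, $out{start_results} and $out{end_results} could follow from $out{total_results}, $out{per_page} and $out{page}. It is an assumption made for illustration (1-based page numbers, hypothetical function name `page_window`), not the documented behavior of this software.

```python
import math


def page_window(total_results, per_page, page):
    """Compute paging values for a 1-based page number."""
    if total_results == 0:
        return {"pages": 0, "start_results": 0, "end_results": 0}
    pages = math.ceil(total_results / per_page)
    start = (page - 1) * per_page + 1
    end = min(page * per_page, total_results)   # the last page may be short
    return {"pages": pages, "start_results": start, "end_results": end}


if __name__ == "__main__":
    # 45 results at 10 per page: the 5th page shows results 41 to 45.
    print(page_window(45, 10, 5))
```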
clustered_results_default.htm - accepts the same tags as results_default.htm
and is used when clustered results have to be displayed. Additionally you
can use:
$out{h_phrase} - the phrase of the current cluster in html encoded format.
$out{count} - number of search results in the current cluster.
BeforeResults.htm - we send this template just after sending the header.htm
template and just before the actual querying of the search engines starts.
This is usually where we put the metasearch cgi form.
$out{h_qry_str} - the html encoded query string.
$out{select_category} - select options for the category 'drop down select
menu'.
$out{modules_checkboxes} - html codes of the checkboxes used to select which
engines participate in the metasearch query.
$out{select_per_page_10} - has the value ' SELECTED' if the per_page setting
is 10. There is a separate tag for each value, such as
$out{select_per_page_20}, $out{select_per_page_30}, ...
$out{select_timeout_2} - has the value ' SELECTED' if the timeout setting
is 2
Results.htm - we send this template once the results are received, sorted
and ready for display.
$out{results_page} - results from the main and the additional metasearch
queries in one piece of html code.
no_results_found.htm - included if no results are found for a search query in a 'main' metasearch category.
notice.htm - included when a notification has to be shown.
$out{message} - the message
header.htm - the header file that every page starts with.
$out{charset} - charset value used in the content-type header metatag.
$out{base_url} - used in the <BASE> html header tag to define the base url
against which all relative urls are resolved. Usually points to the images
directory for the current theme.
cluster_link.htm - link to a cluster of search results related to a phrase.
$out{pages_count} - count of links in the results cluster.
$out{h_url} - url of the search results page showing the clustered results.
$out{bh_phrase} - the cluster's phrase in html encoded format with keywords
highlighted
$out{h_phrase} - the cluster's phrase in html encoded format without keyword
highlighting
cluster_links.htm - all links to clusters, formatted and ready to be placed
in the main results template.
$out{clustered_links} - all links to clusters appearing consecutively.
summary.htm - displays a summary of the 'main' metasearch process.
$out{summary} - details about search result counts and errors (if any) for
each queried search engine.
summary_line.htm - displays a summary of the metasearch response for one
queried search engine. It is used to build the content of the summary.htm
template.
$out{h_site} - name of the search engine, html encoded.
$out{h_site_URL} - url encoded address of a search engine.
$out{reported_results_msg} - a message about how many found results a search
engine reports (if any).
$out{received_results} - number of results actually received.
$out{error_msg} - error message (if any).
You can also control the look of the pages through style sheets. The
definition file is available at html/images/the_theme_dir/styles.css. For
example, through this file you can change the appearance of the links to the
result pages. Some classes to modify:
.highlight - used to highlight query words.
.page_link - used for the link to a results page.
.page_no_link - used to display the current results page in the list of
links to results pages.
NOTE: Most of the tags available in one template can be accessed in all other templates too. Usually a tag's accessibility depends on when it is defined. If you are not sure what runtime value a tag has, just include it in a template file, run the script and view the response html code to find the value. You can also use the tag <C!SHOW=ALL> to see the names and current values of all special chatologica tags supported for a template. For a better understanding of the tags, review the default templates used in this software. Use the Templates Manager to view and edit templates.
Q: How to extend this software's functionality?
Write additional perl modules under the MTS4/NAVG/ directory. Each module
defines a run() function that implements the actual work of the new
functionality. Examine the current modules in your MTS4/NAVG/ directory
and use them as examples. Access urls look like:
http://.../x.cgi?NAVG=SomeModule. Underlying logic can be
placed in modules under the MTS4/STD/ directory, especially if you plan
to use it in other modules too.
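The relationship between the NAVG url parameter and a module's run() function can be pictured with the small Python sketch below. It is a conceptual illustration only: the real modules are perl, and the `Hello` module, the `NAVG_MODULES` registry and the `dispatch` function are all hypothetical names invented for this example. Only the run() entry point and the NAVG parameter come from the description above.

```python
from urllib.parse import parse_qs


class Hello:
    """Hypothetical navigation module with the required run() entry point."""

    @staticmethod
    def run(params):
        name = params.get("name", ["world"])[0]
        return f"Hello, {name}!"


# Hypothetical registry standing in for the modules under MTS4/NAVG/.
NAVG_MODULES = {"Hello": Hello}


def dispatch(query_string):
    """Pick the module named by the NAVG parameter and call its run()."""
    params = parse_qs(query_string)
    module_name = params.get("NAVG", [""])[0]
    module = NAVG_MODULES.get(module_name)
    if module is None:
        return f"Unknown module: {module_name}"
    return module.run(params)


if __name__ == "__main__":
    print(dispatch("NAVG=Hello&name=Ann"))
```

Looking the name up in a fixed registry, rather than loading whatever name arrives in the url, keeps a request from reaching code that was never meant to be a navigation module.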
Q: Modules usage.
A module is just a definition of the search site's profile data and parsing
regular expressions. By default it uses the
MTS4::STD::SE_Query::output_parsing subroutine. If you want custom parsing,
you can copy this subroutine into your module and modify it. Module names
should not contain the '-' symbol.
Q: How to password protect my admin directory?
This software comes with an automated password setup utility that currently
works only for web servers, such as Apache and Netscape, that understand
.htaccess or .nsconfig files.
Here are the instructions if you decide to write a .htaccess file manually.
Upload to your admin directory, in ASCII mode, a .htaccess file like this
one:
AuthUserFile "/usr/home/chatologica/www/cgi-bin/myscript/admin/.htpasswd"
AuthGroupFile /dev/null
AuthName "Admin Panel"
AuthType Basic
<Limit GET POST>
require valid-user
</Limit>
A sample.htaccess file is provided in the same directory. Replace the example
path above:
/usr/home/chatologica/www/cgi-bin/myscript/admin/.htpasswd
with the full path to the .htpasswd file in your own admin directory. Then,
from a shell, cd to the admin directory and type the command:
htpasswd -bc .htpasswd admin mypassword
This creates a password file .htpasswd and sets the password
mypassword for the user admin. Now when you access
http://.../admin/admin.cgi you will get a message box on the screen asking
you to enter a username and password. Enter your username (admin in this
example) and the password mypassword and you will access your Admin Panel.
If you later forget your login data, just empty the admin/.htaccess
and admin/.nsconfig files and repeat the procedure above. Deleting them is
not recommended; save them with empty content instead. If the files were
deleted, you would have to chmod 777 the admin directory to allow them to be
created again, and some web servers refuse to execute scripts in
world-writable directories. Keeping the files empty does not require chmod
777 and at the same time allows password-free http access.
If you can't set up a password for one reason or another, you can just rename
your admin scripts. As long as nobody knows the names of your admin scripts,
nobody can reach the Admin Panel, but this is weaker than real password
protection.
Troubleshooting
Q: Error message: "403 Forbidden. You do not have the right to access this
file." or what file permissions do I have to set?
This error indicates that the web server user id (usually nobody)
does not have permission to read or execute the requested file/directory.
Set world-read and/or execute permission (usually chmod
755) to avoid this problem. Look at README.TXT to see which file
permissions you have to set in order to run this software successfully.
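If you are unsure what permissions a file currently has, a small check like the following Python sketch can help. It is only an illustration (the function name `world_access` is hypothetical) and it assumes the web server user is neither the owner of the file nor a member of its group, so that only the "other" permission bits apply to it.

```python
import os
import stat


def world_access(path):
    """Report the permission bits that apply to users other than owner/group."""
    mode = os.stat(path).st_mode
    return {
        "octal": oct(stat.S_IMODE(mode)),            # e.g. '0o755'
        "world_read": bool(mode & stat.S_IROTH),
        "world_execute": bool(mode & stat.S_IXOTH),
    }


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, world_access(name))
```

A script that the web server has to run should report both world_read and world_execute as true, which is the case for mode 755 (rwxr-xr-x).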
Q: Error message: "500 Internal Server Error"
and some suggestions on how to fix similar problems.
This error indicates that the script has not produced valid output. Check
the following:
Q: "core" file - what is this?
This is a so-called 'core dump'. It is produced when a memory fault occurs
(for example, a wrong pointer in the underlying C libraries). The operating
system writes the current state of the process's memory to a file named
'core' and kills the failed process. There is no need to worry; you can
safely delete this file.