Copyright © 2005, 2006 Interchange Development Group
This documentation is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Abstract
The purpose of this document is to describe the search "subsystem" in Interchange and link together all search-related topics.
Table of Contents
ac — mv_all_chars
bd — mv_base_directory
bs — mv_begin_string
ck — mv_cache_key
cs — mv_case
op — mv_column_op
co — mv_coordinate
cv — mv_verbatim_columns
de — mv_dict_end
df — mv_dict_fold
di — mv_dict_limit
dl — mv_dict_look
DL — mv_raw_dict_look
do — mv_dict_order
dr — mv_record_delim
em — mv_exact_match
er — mv_spelling_errors
fc — mv_force_coordinate
ff — mv_field_file
fi — mv_search_file
ft — mv_field_title
fm — mv_first_match
fn — mv_field_names
hs — mv_head_skip
id — mv_index_delim
lb — mv_search_label
lf — mv_like_field
lo — mv_list_only
lr — mv_search_line_return
ls — mv_like_spec
ma — mv_more_alpha
mc — mv_more_alpha_chars
md — mv_more_decade
mi — mv_more_id
ml — mv_matchlimit
mm — mv_max_matches
MM — mv_more_matches
mp — mv_profile
ms — mv_min_string
ne — mv_negate
ng — mv_negate
nh — mv_no_hide
nm — mv_no_more
np — mv_nextpage
ns — mv_next_search
nu — mv_numeric
os — mv_orsearch
pm — mv_more_permanent
ra — mv_return_all
rd — mv_return_delim
re — mv_search_reference
rf — mv_return_fields
rg — mv_range_alpha
rl — mv_range_look
rm — mv_range_min
rn — mv_return_file_name
rr — mv_return_reference
rs — mv_return_spec
rx — mv_range_max
sd — mv_small_data
se — mv_searchspec
sf — mv_search_field
sg — mv_search_group
si — mv_search_immediate
sm — mv_start_match
sp — mv_search_page
sq — mv_sql_query
sr — mv_search_relate
st — mv_searchtype
su — mv_substring_match
tf — mv_sort_field
to — mv_sort_option
un — mv_unique
va — mv_value
The Swish search module allows you to search index files generated by Swish-e.
To enable any Swish searching, modify your interchange.cfg to add:
Require module Vend::Swish
AddDirective Swish hash
Variable swish Vend::Swish
To configure your catalog to use Swish, modify the appropriate catalog.cfg and add:
Swish command /usr/bin/swish-e
Swish index products/swish-e.db
Finally, in search parameters, use mv_searchtype=swish or the shorthand notation st=swish.
The fields to be returned from Swish to Interchange are configurable, and default to:
mv_return_fields=code score title url mod_date filesize
mv_field_names=code score title url mod_date filesize
These correspond to:
Interchange field   Swish property
code                swishreccount
score               swishrank
url                 swishdocpath
title               swishtitle
filesize            swishdocsize
mod_date            swishlastmodified
The date in the mod_date field is returned in the format %Y-%m-%d %H:%M:%S. You can change that with the date_format option:
Swish date_format "%d %b %Y"
See the time glossary entry for supported format strings.
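As a rough illustration of the two format strings above, the following Python sketch renders one timestamp both ways. It assumes that Python's strftime interprets these directives the same way as the formatter used for date_format; the sample timestamp is invented for the example.

```python
from datetime import datetime

# Hypothetical modification time, used only for illustration.
mod_date = datetime(2006, 3, 9, 14, 5, 0)

default_format = "%Y-%m-%d %H:%M:%S"  # default format of the mod_date field
custom_format = "%d %b %Y"            # value used in the date_format example

default_text = mod_date.strftime(default_format)
custom_text = mod_date.strftime(custom_format)

print(default_text)
print(custom_text)
```

The month abbreviation produced by %b depends on the active locale.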
Simple search for the term Swish:
swish-e -w Swish
The same search, specifying the index file:
swish-e -w Swish -f db/xmldocs
You can include properties in the output:
swish-e -w Swish -f db/xmldocs -p purpose
Or search within a property:
swish-e -w purpose=LWP -f db/xmldocs
Indexing web sites is pretty easy. Swish provides a spider script, which is simply called with the parameters default starting_URL. Create a configuration file similar to the following:
IndexFile db/icdevgroup
IndexDir /usr/local/lib/swish-e/spider.pl
SwishProgParameters default http://www.icdevgroup.org/docs/
Now you can start indexing with swish-e -S prog -c icdevgroup.conf.
bd — mv_base_directory
(directory_name, default ProductDir)
base directory in which to look up text files to search (related option fi). Directory paths can be absolute, provided that the pathname is equal to the MV_SEARCH_FILE variable, or a scratch variable named after the pathname is set to 1. To enable searching in, say, /etc/dict, use either [calcn]$Variable->{MV_SEARCH_FILE} = '/etc/dict'; return[/calcn] or [tmp /etc/dict]1[/tmp].
bs — mv_begin_string
(1/0, default false)
the search string matches only at the beginning of a column.
ck — mv_cache_key
(search_reference_pointer, default none)
not intended for common use. When the more tag is used, this option automatically provides a pointer to the search reference.
op — mv_column_op
(rm | eq | tq | aq, default rm)
operation to perform to check a field for a match. The tq and aq operations match using the Text::Query module.
co — mv_coordinate
(0/1, default 0)
the so-called "coordinated" search allows for multiple search options to be stacked on top of each other. If the number of search fields (sf options) equals the number of search specs (se options), the search will return items that match all, or any one, of the field-specification blocks (controlled with mv_orsearch). When the two numbers do not match, coordinated mode will be automatically and silently turned off. To force a coordinated search, see mv_force_coordinate.
When coordinated searching is used, case sensitivity, substring matching, negation and other options can be specified multiple times and work on a field-by-field basis, according to the following rules:
If only one instance of the option is set, it will affect all fields (search specifications).
If the number of instances of the option is greater than or equal to the number of search specifications, all will be used independently. (Any excess trailing instances will be ignored.)
If more than one instance of the option is set, but fewer than the total number of search specifications, the default, documented setting will be used for the trailing search specifications.
If a search specification is blank, it will be removed and all case-sensitivity, negation, substring and other options will be adjusted accordingly. If you need to match on a blank string, use quotes ("").
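The three rules for repeated options in a coordinated search can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not Interchange's own code; the function name expand_option is hypothetical, and the handling of an option that is not set at all (every specification gets the default) is an assumption.

```python
def expand_option(instances, spec_count, default):
    """Return one option value per search specification."""
    if len(instances) == 0:
        # Assumption: an option that was never set uses its default everywhere.
        return [default] * spec_count
    if len(instances) == 1:
        # A single instance affects every search specification.
        return instances * spec_count
    if len(instances) >= spec_count:
        # Enough instances: use them one by one and ignore the excess.
        return instances[:spec_count]
    # More than one but too few: trailing specifications get the default.
    return instances + [default] * (spec_count - len(instances))


# Three search specifications, with the option given 1, 4 and 2 times.
print(expand_option([1], 3, 0))
print(expand_option([1, 0, 1, 1], 3, 0))
print(expand_option([1, 1], 3, 0))
```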
df — mv_dict_fold
(/, default )
Make dictionary matching case-insensitive. Ignored unless mv_dict_look is set.
do — mv_dict_order
(/, default )
Make dictionary matching follow dictionary order, where only word characters and whitespace matter. Ignored unless mv_dict_look is set.
dr — mv_record_delim
(record_delimiter, default \n)
delimiter for counting records in search index files. The default, a newline, works well for most line-based index files.
em — mv_exact_match
(0/1, default 0)
require that the search field matches the search specification exactly (as opposed to the default word-based matching, or substring matching with su). The search specification will behave as if it were enclosed in quotes.
fc — mv_force_coordinate
(0/1, default 0)
force coordinated search (enabled with mv_coordinate).
Normally, coordinated mode is automatically turned off when the number of search specifications does not match the number of search fields. With this option, however, instead of disabling coordinated mode, Interchange ensures the number of search specifications does match the number of fields by filling the missing specifications with the last one specified, or by discarding extras.
This option is useful when you want to search for one string in multiple fields with different options.
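The balancing step described above can be sketched in Python as follows. The function name force_coordinate is hypothetical, the code only models the documented behaviour, and returning an empty list when no specification is given is an assumption.

```python
def force_coordinate(fields, specs):
    """Pair every search field with a search specification."""
    if not specs:
        # Assumption: with no specification there is nothing to repeat.
        return []
    if len(specs) < len(fields):
        # Fill the missing specifications with the last one specified.
        specs = specs + [specs[-1]] * (len(fields) - len(specs))
    # Discard extras by keeping only as many specifications as fields.
    return list(zip(fields, specs[:len(fields)]))


# One search string applied to three fields.
print(force_coordinate(["title", "artist", "category"], ["monet"]))
```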
ff — mv_field_file
(header_filename, default none)
specify filename containing a single line with the list of database fields, separated by TABs. This is used when you are searching databases without the "field header" on the first line, but still want to refer to fields by their names.
fm — mv_first_match
(search_result_number, default 1)
return search results from the specified result number onwards. When this option is set, Interchange will return search results starting from the match number specified even if there is only one page of results. If set to a value greater than the total number of matches, it will act as if no matches were found.
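In list terms, this behaviour amounts to a 1-based slice of the result list. A minimal Python sketch, with a hypothetical function name and made-up results:

```python
def from_first_match(results, first_match=1):
    """Return results starting at the 1-based match number first_match."""
    start = max(first_match, 1) - 1  # convert to a 0-based index
    return results[start:]


results = ["a", "b", "c", "d", "e"]
print(from_first_match(results, 3))   # starts at the third match
print(from_first_match(results, 9))   # beyond the last match: nothing found
```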
hs — mv_head_skip
(row_count, default 1 for text files, 0 otherwise)
number of lines to skip at the beginning of a search index or text file. Interchange normally skips one line for text-based searches (st=text) to exclude the header line.
id — mv_index_delim
(field_delimiter, default \t)
delimiter for counting fields in search index files. The default, a TAB character, works well for most line-based index files.
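To show how the record delimiter (mv_record_delim), the field delimiter (mv_index_delim) and the header skip (mv_head_skip) fit together, here is a small Python sketch that splits a text index with the documented defaults. The function name read_index and the sample data are hypothetical, and dropping empty records (such as the one after a trailing newline) is an assumption.

```python
def read_index(text, record_delim="\n", index_delim="\t", head_skip=1):
    """Split a text index into records and fields, skipping header lines."""
    # Assumption: empty records, e.g. after a trailing delimiter, are dropped.
    records = [r for r in text.split(record_delim) if r != ""]
    return [r.split(index_delim) for r in records[head_skip:]]


sample = "code\ttitle\n1\tMona Lisa\n2\tThe Scream\n"
for fields in read_index(sample):
    print(fields)
```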
lf — mv_like_field
(field_name, default none)
perform search similar to SQL "LIKE" functionality. When defined, mv_like_spec is required as well.
ls — mv_like_spec
(search_specification, default none)
string to search for in mv_like_field. The behaviour of the % character and case-sensitivity depend upon your SQL implementation.
ml — mv_matchlimit
(record_count, default 50)
maximum number of records (search results) to return from a search. When all the results are displayed on a single page, this option is equivalent to mm. When the more tag is used to display results across multiple pages, this option determines the number of results per page. To specify unlimited, use none or all, not 0.
mm — mv_max_matches
(record_count, default unlimited)
final, maximum number of records (search results) to return from a search (related option ml).
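The interplay between the per-page limit (ml) and the overall cap (mm) can be pictured with the Python sketch below. It is a simplified model under stated assumptions, not Interchange's implementation: the function name paginate is hypothetical, the cap is applied before the results are split into pages, and a limit of 0 is rejected because the text above says to use none or all for unlimited.

```python
def paginate(matches, matchlimit=50, max_matches=None):
    """Split matches into pages of matchlimit, after capping at max_matches."""
    if max_matches is not None:
        matches = matches[:max_matches]        # overall cap (mm)
    if matchlimit in ("none", "all"):
        return [matches] if matches else []    # unlimited: a single page
    size = int(matchlimit)
    if size <= 0:
        raise ValueError("use 'none' or 'all' for unlimited, not 0")
    return [matches[i:i + size] for i in range(0, len(matches), size)]


pages = paginate(list(range(120)), matchlimit=50, max_matches=60)
print([len(page) for page in pages])
```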
ms — mv_min_string
(min_length, default 1 for text-based searches)
minimum size of a search string for a search operation.
nu — mv_numeric
(1/0, default 0)
search operator will perform numeric (instead of string) comparison.
si — mv_search_immediate
(1/0, default 0)
the one and only match from the search will be the value of mv_searchspec itself. Useful in testing, or as a yes/no confirmation of whether the search string was found.
sq — mv_sql_query
(SQL_Query, default none)
for text-based searches (st=text only), this option specifies the SQL query to run over the lines in the file. This is not the same as an external SQL database search. Furthermore, the SQL_Query undergoes a little modification before it is used. Here's a practical example:
Artist: <input name="artist" />
Title: <input name="title" />
<input type="hidden" name="mv_sql_query" value="
  SELECT code FROM products
  WHERE artist LIKE artist
  AND title LIKE title
" />
If the right-hand side of a part of the expression is an unquoted alphanumeric word, it is replaced with the value of the form variable of that name (or, in a one-click search, the scratch variable of that name). Quoted right-hand side values are taken literally.
If the left-hand side of a part of the expression is a quoted word, the behavior is reversed: that part is replaced with the value of the form variable of that name (or, in a one-click search, the scratch variable). Unquoted left-hand side values are taken literally.
Here's an example that allows users to select whether they want to search in title or artist fields:
Search for: <input name="searchstring" /><br />
Search in
<input type="radio" name="column" value="title" /> title
<input type="radio" name="column" value="artist" /> artist
<input type="hidden" name="mv_sql_query" value="
  SELECT code FROM products
  WHERE 'column' LIKE searchstring
" />
Just for reference, here's what the two examples above would look like when coded "manually":
[page search="
  co=yes
  sf=artist
  op=rm
  se=[value artist]
  sf=title
  op=rm
  se=[value title]
"] Search for [value artist], [value title] </a>
[page search="
  co=yes
  sf=[value column]
  op=rm
  se=[value searchstring]
"] Search for [value searchstring] in [value column] </a>
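The replacement rules for mv_sql_query can be modelled with the short Python sketch below, which turns the WHERE clause into (field, specification) pairs comparable to the sf and se values in a manually coded search. It is a simplified illustration only: the function name resolve_conditions is hypothetical, only LIKE conditions joined by AND are handled, and the form values are invented.

```python
import re

# One "lhs LIKE rhs" condition; either side may be wrapped in single quotes.
PART = re.compile(
    r"^\s*(?P<lq>'?)(?P<lhs>\w+)(?P=lq)\s+LIKE\s+(?P<rq>'?)(?P<rhs>.+?)(?P=rq)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def resolve_conditions(query, form):
    """Return (field, spec) pairs after substituting form variable values."""
    where = re.split(r"\bWHERE\b", query, maxsplit=1, flags=re.IGNORECASE)[1]
    pairs = []
    for part in re.split(r"\s+AND\s+", where.strip(), flags=re.IGNORECASE):
        m = PART.match(part)
        if m is None:
            raise ValueError(f"unsupported condition: {part!r}")
        # A quoted left-hand side names a form variable; unquoted is literal.
        field = form[m["lhs"]] if m["lq"] else m["lhs"]
        # An unquoted alphanumeric right-hand side names a form variable;
        # anything else is taken literally.
        if not m["rq"] and re.fullmatch(r"\w+", m["rhs"]):
            spec = form[m["rhs"]]
        else:
            spec = m["rhs"]
        pairs.append((field, spec))
    return pairs


form = {"column": "title", "searchstring": "Lilies"}
query = "SELECT code FROM products WHERE 'column' LIKE searchstring"
print(resolve_conditions(query, form))
```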
st — mv_searchtype
([ glimpse | db | sql | text | ref ], default none)
select search type. glimpse uses the Glimpse search engine (see Glimpse), db (or the equivalent sql) iterates over every row in the SQL database, text searches the corresponding database text source files, and ref iterates over the results from a previous, already-performed search (related option lb).
su — mv_substring_match
(0/1, default 0)
match on substrings as well as whole words. This is typically set in dictionary-based searches.
tf — mv_sort_field
(field_name_or_index [,field_name2_or_index2...], default none)
determine sort order of the returned data. It is possible to refer to columns both by name (if the search is such that column names are known) and by index, starting from 0.
un — mv_unique
(0/1, default 0)
removes duplicate records from the result set. Duplicates are determined by comparing the value of the first search return field (set with rf).
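Duplicate removal keyed on the first return field can be sketched in Python as below. The function name is hypothetical, and keeping the first occurrence of each key (rather than the last) is an assumption; the sample rows are invented.

```python
def unique_by_first_field(rows):
    """Drop rows whose first field was already seen in an earlier row."""
    seen = set()
    kept = []
    for row in rows:
        key = row[0]            # first search return field
        if key not in seen:     # assumption: the first occurrence wins
            seen.add(key)
            kept.append(row)
    return kept


rows = [["os28004", "Ladder"], ["os28005", "Brush"], ["os28004", "Ladder (large)"]]
print(unique_by_first_field(rows))
```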
va — mv_value
(value_variable_name=value, default none)
assign value to a value variable. This is exactly what happens with normal variables in search profiles when you use the variable_name=value syntax, so you should use this option only where variables cannot be set directly (i.e. in one-click searches):
[page href=scan
      arg="se=Renaissance
           se=Impressionists
           va=category_name=Renaissance and Impressionist Paintings
           os=yes"
]Renaissance and Impressionist Paintings</a>