Swish-e 2.X is a new version of the indexer package SWISH-E built by Kevin Hughes. The original URL is http://sunsite.berkeley.edu/SWISH_E, where you can find the documentation, the mailing list, etc.

Swish-e 1.3.X is free software, so Swish-e 2.x is also free software. You can use it wherever you want. More information about the license is available on the main site.

The latest stable version is 2.0.5. This version includes the following additions:
Phrase search. Use " to delimit the phrase.
 Eg: swish-e -w 'tit="this is a phrase"' -f index_file
 Eg: swish-e -w 'tit="th* is a pra*"' -f index_file
Limited support for XML documents. Tags like <field> </field> are supported, and nested tags are allowed.
 Eg: <field1> bla bla <field2> bla bla </field2> bla bla</field1>
Fixes several bugs and some memory leaks present in 1.3.X.
New option for sorting results: sort by relevance (the previous behavior) or by one or more fields (properties).
 Eg: swish-e -w 'search' -f index_file -s title address
Faster indexing and searching.
Automatic extraction of fields (metanames) by including the reserved word automatic in the Metanames option of the config file. Do not use this feature with HTML documents, because they include tags such as <p> without a corresponding end tag.

Includes the external filtering option from Rainer Scherg, which allows parsing PDF documents, Word documents, etc.
You can now put your stopwords in an external file using IgnoreWords File:path-to-file in the config file. Stopword files are available in German (contributed by Rainer Scherg), English (taken from the 1.3.X distribution), Dutch (contributed by Bas Meijer) and Spanish (contributed by me). This option was contributed by Rainer Scherg.
 Eg: IgnoreWords File:/path/german.txt
New option for the config file: TranslateCharacters. This option replaces some characters with others before a word is indexed. It is very useful, for example, for replacing accented characters with their unaccented equivalents, which mainly helps with non-English languages.
 Eg: TranslateCharacters áéíóú aeiou
 With this configuration the word camión will be indexed as camion and the word árbol as arbol
The old option -D now shows more information about the contents of the index file. If you also use -v 4, the output is even richer.
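The effect of TranslateCharacters on a word before indexing can be illustrated with a short Python sketch. This is not swish-e's own code: the helper name translate_word is hypothetical, and the sketch assumes that characters are replaced one for one by position, with the five accented vowels á, é, í, ó, ú mapped to a, e, i, o, u as in the camión/árbol example.

```python
# Illustrative sketch only: mimics a one-for-one character translation
# applied to a word before it is indexed. Not swish-e source code.
SOURCE_CHARS = "áéíóú"   # assumed source characters (accented vowels)
TARGET_CHARS = "aeiou"   # replacement at the same position in the string

# str.maketrans pairs each source character with the target at the same index.
TABLE = str.maketrans(SOURCE_CHARS, TARGET_CHARS)


def translate_word(word: str) -> str:
    """Return the form of the word that would be indexed (hypothetical helper)."""
    return word.translate(TABLE)


if __name__ == "__main__":
    for word in ("camión", "árbol"):
        print(word, "->", translate_word(word))
```

Characters that do not appear in the source list pass through unchanged, so plain English words are unaffected by such a mapping.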
  
  
Development of new features continues. The latest beta is 2.1-dev20:
A C library. The code has been partly rewritten to produce a "thread-safe" C library. This library is being used in the development of a Perl module and a PHP extension.
An example, fully functional Perl module based on the C library. It makes writing Perl CGI scripts easier.
You can now define document types using DefaultContents and IndexContents in your config file. So far there are only three document types: text (TXT), HTML (HTML) and XML (XML).
 Eg:
 DefaultContents TXT
 IndexContents XML .xml
 IndexContents HTML .htm .html .php .php3
Optionally, the indexing process can use less memory in economic mode (option -e). When set, the indexing process writes part of its information to temporary files. This is very useful if your machine does not have enough memory, which you can detect when indexing takes a long time (look at your swap usage). By default (without -e), swish-e keeps all data in memory while indexing.
Extended search output with the -x option. If your search uses more than one index file at the same time, the header information of all the index files is displayed. In addition, each result line gets a new value: the index file the result came from. All results are merged, as if you had searched a single index file.
Optional compression of the file data (file path, title and properties). Indexing is slower, but input/output during searches is reduced.
As in version 2.0.X, the result list can be sorted by relevance or by properties, but you can now also combine ascending and descending sorting (using asc and desc).
 Eg: swish-e -w 'search' -f index_file -s title asc otherfield desc
New directive in the config file: BumpPositionCounterCharacters. When one of the listed characters is found, the word position counter is incremented. This is useful for separating phrases inside a document.
 Eg:
 BumpPositionCounterCharacters |
 Consider this document: this is a phrase | this is another phrase. With the option set, you cannot find the phrase "phrase this". Without it you can, because "phrase" and "this" have consecutive position counters.
New directive in the config file: UseWords. With this option, only the words in the list are indexed. Like IgnoreWords, it can use an external file.
 Eg:
 UseWords word1 word2 word3
 Eg:
 UseWords File: path_to_external_file
New command line option -k. It returns all the words in the index file starting with the given character.
 Eg:
 swish-e -k t -f index_file
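The way BumpPositionCounterCharacters interacts with phrase search can be modeled in a few lines of Python. This is only a sketch of the idea described in the list above, not swish-e's implementation: the function names, the whitespace tokenizer and the assumption that a bump character adds exactly one to the position counter are all illustrative.

```python
# Illustrative sketch only: models word position counters and a bump character.
# Not swish-e source code; all names here are hypothetical.

def index_positions(text, bump_chars=frozenset()):
    """Return a list of (word, position) pairs for a whitespace-split text."""
    pairs = []
    position = 0
    for token in text.split():
        if token in bump_chars:
            position += 1            # leave a gap in the position sequence
        elif token.isalnum():
            position += 1
            pairs.append((token.lower(), position))
        # any other punctuation-only token is simply skipped in this sketch
    return pairs


def phrase_found(pairs, phrase):
    """True if the phrase words occur in order at consecutive positions."""
    words = phrase.lower().split()
    for start in range(len(pairs) - len(words) + 1):
        window = pairs[start:start + len(words)]
        same_words = [word for word, _ in window] == words
        consecutive = all(
            window[i + 1][1] - window[i][1] == 1 for i in range(len(window) - 1)
        )
        if same_words and consecutive:
            return True
    return False


DOCUMENT = "this is a phrase | this is another phrase"

if __name__ == "__main__":
    without_bump = index_positions(DOCUMENT)
    with_bump = index_positions(DOCUMENT, bump_chars={"|"})
    print(phrase_found(without_bump, "phrase this"))
    print(phrase_found(with_bump, "phrase this"))
```

In this model the bump character leaves a gap of one position between "phrase" and the following "this", so a phrase match across the separator fails while phrases on either side still match.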
  
  
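The combined ascending and descending sort described for the -s option can be sketched in Python as a multi-key sort. This is a hedged illustration, not swish-e's code: the function names and sample records are hypothetical, and treating a property with no direction after it as ascending is an assumption.

```python
# Illustrative sketch only: parses a sort specification such as
# "title asc otherfield desc" and applies it to result records.
# Not swish-e source code; names and sample data are hypothetical.

def parse_sort_spec(tokens):
    """Turn ['title', 'asc', 'x', 'desc'] into [('title', False), ('x', True)].

    Assumption: a property with no direction after it sorts ascending.
    """
    spec = []
    i = 0
    while i < len(tokens):
        name = tokens[i]
        descending = False
        if i + 1 < len(tokens) and tokens[i + 1] in ("asc", "desc"):
            descending = tokens[i + 1] == "desc"
            i += 1
        spec.append((name, descending))
        i += 1
    return spec


def sort_results(results, spec):
    """Sort dict records by several properties, each with its own direction."""
    ordered = list(results)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for name, descending in reversed(spec):
        ordered.sort(key=lambda record: record[name], reverse=descending)
    return ordered


if __name__ == "__main__":
    results = [
        {"title": "b", "otherfield": 1},
        {"title": "a", "otherfield": 1},
        {"title": "a", "otherfield": 2},
    ]
    spec = parse_sort_spec("title asc otherfield desc".split())
    for record in sort_results(results, spec):
        print(record["title"], record["otherfield"])
```

Records with the same title are ordered by the second property in descending order, which is the behavior a combined asc/desc specification is meant to express.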
Swish-e 2.X has been developed entirely under Linux and has been tested on Solaris and AIX.

Although it was not initially developed for Windows, Windows users can find binaries at http://www.webaugur.com/wares/swish (thanks to David Norris).


Credits
jmruiz@boe.es