Fuzzy search

Fuzzy search

Chris-PATON7J9CR
1.) I tried to do a fuzzy search, but the server timed out every time:
Fuzzy Searches
Fuzzy searches are based on the Levenshtein Distance, or Edit Distance, algorithm. To do a fuzzy search, use the tilde symbol, "~", at the end of a single-word term. For example, to search for a term similar in spelling to "roam", use the fuzzy search:
 
roam~
This search will find terms like foam and roams.
A parameter can specify the required similarity. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. For example:
 
roam~0.8

2.) What is the meaning of ^? I couldn't find it in the help file.
Thanks for any suggestions
CH-PATON
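
The help text quoted above relies on the Levenshtein (edit) distance. A minimal Python sketch of that distance follows, for illustration only. The `similarity` function is an assumption: it normalises the distance by the length of the shorter term, which is one plausible way to map a distance onto the 0 to 1 scale the help text describes. The search engine's actual scoring may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalisation (an assumption, not taken from the thread):
    1 - distance / length of the shorter term."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a, b) / shortest


if __name__ == "__main__":
    for candidate in ("foam", "roams", "road"):
        print(candidate, levenshtein("roam", candidate), similarity("roam", candidate))
```

Under this assumed formula, foam and roams are each one edit away from roam and score 0.75, so they would pass a low threshold but not a threshold of 0.8.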

Re: Fuzzy search

Iustin
Administrator
1) Use the fuzzy search only with the EN_TI, EN_AB, FR_TI, FR_AB ... fields (titles and abstracts). For example:
EN_TI:roam~ works fine.
Using a generic field, as in ALL:roam~, will time out because of the huge number of terms in the description and claims fields (they are OCR-ed, so they contain many bad terms).

2) We use Lucene as the search engine, where ^ is the boost operator (it raises the relevance weight of the term it follows). For details about the query operators, see:
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

Iustin
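
The advice in point 1 can be sketched as a small query builder that limits a fuzzy term to the title and abstract fields. This is only an illustration: `fuzzy_query` is a hypothetical helper, not part of the search site, and joining the clauses with OR assumes the standard Lucene query parser syntax is accepted as-is. The field names are the ones mentioned in the post.

```python
def fuzzy_query(term, fields=("EN_TI", "EN_AB"), similarity=None):
    """Build a Lucene-style fuzzy query restricted to the given fields.

    Hypothetical helper: the default fields are the English title and
    abstract fields named in the thread; similarity is the optional
    0..1 value that follows the tilde.
    """
    if not term or any(ch.isspace() for ch in term):
        raise ValueError("fuzzy search applies to a single-word term")
    if similarity is not None and not 0 <= similarity <= 1:
        raise ValueError("similarity must be between 0 and 1")
    suffix = "~" if similarity is None else f"~{similarity}"
    # One clause per field, so the generic ALL field is never queried.
    return " OR ".join(f"{field}:{term}{suffix}" for field in fields)


if __name__ == "__main__":
    print(fuzzy_query("roam"))
    print(fuzzy_query("roam", fields=("FR_TI", "FR_AB"), similarity=0.8))
```

Keeping the field list explicit makes it harder to fall back on ALL by accident, which is the case that times out.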