Starting with FEP-7, WCS provides REST services that interact with the Solr server directly, without going through the BOD / Component Services layer. This post explores the new search framework in detail and touches on the customization points the framework provides for implementing customer requirements.
The best way to start learning the WCS Solr framework is to first familiarize yourself with the basics of Solr. Working with Solr mainly consists of two parts:
1. Indexing the data.
2. Querying the index to get the desired results.
In this article we will look at how to query the index without worrying too much about the steps involved in building it, so it helps to be familiar with the query syntax of Solr's edismax (Extended DisMax) parser, which WCS uses. At a minimum you should understand the q, fq, fl, qf, facet and bq parameters. (The Solr Admin UI provides an intuitive interface for querying the index and learning more about the query syntax.)
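As a quick orientation, the sketch below collects those parameters into a single request. It is a minimal Python sketch using only the standard library. The field names come from the examples later in this post. The bq value (mfName_ntk_cs:Apple^2.0) is a hypothetical boost chosen only for illustration. Parameters are kept as a list of (name, value) pairs because Solr accepts repeated parameters such as fq.

```python
from urllib.parse import urlencode, parse_qsl

# Core edismax parameters, as (name, value) pairs so that repeated names are allowed.
params = [
    ("defType", "edismax"),                    # use the Extended DisMax query parser
    ("q", "Apple"),                            # the user's search term
    ("qf", "name^10.0 shortDescription^5.0"),  # fields to search, with per-field boosts
    ("fq", "published:1"),                     # filter: restricts matches, does not affect scoring
    ("fl", "catentry_id,name"),                # fields to return for each document
    ("facet", "true"),                         # enable faceting
    ("facet.field", "mfName_ntk_cs"),          # field to build facet counts on
    ("bq", "mfName_ntk_cs:Apple^2.0"),         # hypothetical boost query: lifts matching docs
]

# URL-encode the parameters the way they would be sent to Solr's select handler.
query_string = urlencode(params)
print(query_string)

# Decoding gives back exactly the same pairs.
assert parse_qsl(query_string) == params
```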
Once you are comfortable with basic Solr query syntax, you are ready to start with the WCS Solr framework. Let us consider a simple example: searching for the keyword 'Apple'.
How do you write a DB query to select products whose name or short description contains the keyword 'Apple'?
Let's start with a basic query:
Select * from catentry where catentry_id in (select catentry_id from catentdesc where language_id = -1 and (name like '%Apple%' or shortdescription like '%Apple%'))
Then add the store constraint:
and member_id = (select member_id from storeent where storeent_id = 10001)
To return only product beans and ignore item beans, add the constraint below:
and catenttype_id = 'ProductBean'
To return only published catalog entries, add the constraint below (PUBLISHED is a CATENTDESC column, so this goes inside the subquery):
and catentdesc.published = 1
To return only products belonging to the current sales catalog, restrict them through CATGPENREL:
and catentry_id in (select catentry_id from catgpenrel where catalog_id = ( ))
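To see how these constraints fit together, here is a minimal, runnable sketch using Python's built-in sqlite3 module. The four tables are heavily simplified, hypothetical stand-ins for the real WCS schema (only the columns used above), the sample rows are invented, and 10052 is borrowed from the Solr example below as the catalog id. Note that SQLite's LIKE is case-insensitive for ASCII, unlike DB2 or Oracle.

```python
import sqlite3

# Hypothetical, heavily simplified versions of the WCS tables used above.
SCHEMA = """
CREATE TABLE storeent   (storeent_id INTEGER PRIMARY KEY, member_id INTEGER);
CREATE TABLE catentry   (catentry_id INTEGER PRIMARY KEY, member_id INTEGER,
                         catenttype_id TEXT);
CREATE TABLE catentdesc (catentry_id INTEGER, language_id INTEGER, name TEXT,
                         shortdescription TEXT, published INTEGER);
CREATE TABLE catgpenrel (catgroup_id INTEGER, catentry_id INTEGER, catalog_id INTEGER);
"""

# All of the constraints from the text, combined into one statement.
QUERY = """
SELECT ce.catentry_id
FROM catentry ce
WHERE ce.catentry_id IN (
        SELECT catentry_id FROM catentdesc
        WHERE language_id = -1
          AND published = 1
          AND (name LIKE :pattern OR shortdescription LIKE :pattern))
  AND ce.member_id = (SELECT member_id FROM storeent WHERE storeent_id = :store_id)
  AND ce.catenttype_id = 'ProductBean'
  AND ce.catentry_id IN (
        SELECT catentry_id FROM catgpenrel WHERE catalog_id = :catalog_id)
ORDER BY ce.catentry_id
"""


def build_demo_db():
    """Create an in-memory database with a few invented catalog entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO storeent VALUES (?, ?)", [(10001, 101), (10002, 102)])
    conn.executemany("INSERT INTO catentry VALUES (?, ?, ?)", [
        (1, 101, "ProductBean"),  # matches every constraint
        (2, 101, "ItemBean"),     # excluded: item, not product
        (3, 101, "ProductBean"),  # excluded: not published
        (4, 102, "ProductBean"),  # excluded: owned by another store's member
        (5, 101, "ProductBean"),  # excluded: keyword not present
        (6, 101, "ProductBean"),  # matches via the short description
        (7, 101, "ProductBean"),  # excluded: different catalog
    ])
    conn.executemany("INSERT INTO catentdesc VALUES (?, -1, ?, ?, ?)", [
        (1, "Apple iPhone", "Smartphone", 1),
        (2, "Apple iPhone 64GB", "Item of the iPhone product", 1),
        (3, "Apple Watch", "Smartwatch", 0),
        (4, "Apple iPad", "Tablet", 1),
        (5, "Banana", "Fresh fruit", 1),
        (6, "Charger", "Works with Apple devices", 1),
        (7, "Apple TV", "Streaming box", 1),
    ])
    conn.executemany("INSERT INTO catgpenrel VALUES (?, ?, ?)", [
        (1, 1, 10052), (1, 2, 10052), (1, 3, 10052), (1, 4, 10052),
        (2, 5, 10052), (3, 6, 10052), (1, 7, 10099),
    ])
    return conn


def search_products(conn, keyword, store_id, catalog_id):
    """Return catentry_ids of published products matching keyword in one store and catalog."""
    # SQLite's LIKE is case-insensitive for ASCII; DB2 and Oracle LIKE are case-sensitive.
    rows = conn.execute(QUERY, {"pattern": f"%{keyword}%",
                                "store_id": store_id, "catalog_id": catalog_id})
    return [row[0] for row in rows]


conn = build_demo_db()
print(search_products(conn, "Apple", 10001, 10052))  # [1, 6]
```

Every extra requirement (price range, category, sorting) would add another subquery or join to this statement, which is exactly the growth the next paragraph describes.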
What happens if we also want to return only products within a certain price range that belong to a certain category, and want the result set sorted by price or relevance? The query becomes more and more complicated and difficult to manage.
Let us see what the Solr query looks like in this case.
Start with a simple query using the 'q' parameter:
q="Apple"
Then tell Solr to search the name and short description fields. Products where the search term appears in the name are more relevant than products where it appears only in the short description, so the name field gets the higher boost:
qf=name^10.0 shortDescription^5.0
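The qf value is simply a space-separated list of field^boost terms. A minimal Python sketch that produces it from a mapping of field names to boosts (build_qf is a hypothetical helper, not part of WCS or Solr):

```python
def build_qf(boosts):
    """Format {field: boost} as an edismax qf value: space-separated field^boost terms.

    A match in a field with a higher boost contributes more to the document's score.
    """
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


print(build_qf({"name": 10.0, "shortDescription": 5.0}))
# name^10.0 shortDescription^5.0
```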
Add filter queries to return only published products (ignoring item beans) that belong to the current catalog and store:
fq=catalog_id:"10052"
fq=storeent_id:("10001")
fq=published:1
fq=-catenttype_id_ntk_cs:ItemBean
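Each filter goes in its own fq parameter rather than being ANDed into q: filter queries restrict the result set without affecting relevance scores, and Solr can cache each fq separately in its filter cache. A minimal Python sketch that produces the four filters above (filter_params is a hypothetical helper):

```python
from urllib.parse import urlencode


def filter_params(store_id, catalog_id):
    """Return the fq filters above as (name, value) pairs.

    A list of pairs (not a dict) is used because every filter is sent
    as its own repeated fq parameter.
    """
    return [
        ("fq", f'catalog_id:"{catalog_id}"'),
        ("fq", f'storeent_id:("{store_id}")'),
        ("fq", "published:1"),
        ("fq", "-catenttype_id_ntk_cs:ItemBean"),  # leading '-' excludes item beans
    ]


pairs = filter_params("10001", "10052")
for name, value in pairs:
    print(f"{name}={value}")
print(urlencode(pairs))  # four separate fq=... pairs, URL-encoded
```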
Return only a few selected fields:
fl=catentry_id,storeent_id,buyable,partNumber_ntk
Also return facets based on category and brand name:
facet=true
facet.field=parentCatgroup_id_search
facet.field=mfName_ntk_cs
f.mfName_ntk_cs.facet.limit=21
f.mfName_ntk_cs.facet.mincount=1
f.mfName_ntk_cs.facet.sort=count
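The f.<field>.facet.* form is Solr's per-field override syntax: here facet.limit, facet.mincount and facet.sort apply only to mfName_ntk_cs (at most 21 brand values, each with a count of at least 1, sorted by count), while parentCatgroup_id_search keeps the defaults. A minimal Python sketch that generates the lines above (facet_params is a hypothetical helper):

```python
def facet_params(fields, per_field=None):
    """Build facet parameters.

    fields: names passed as repeated facet.field parameters.
    per_field: optional {field: {option: value}} overrides, emitted with
    Solr's per-field syntax f.<field>.facet.<option>.
    """
    params = [("facet", "true")]
    params += [("facet.field", field) for field in fields]
    for field, options in (per_field or {}).items():
        for option, value in options.items():
            params.append((f"f.{field}.facet.{option}", str(value)))
    return params


facets = facet_params(
    ["parentCatgroup_id_search", "mfName_ntk_cs"],
    {"mfName_ntk_cs": {"limit": 21, "mincount": 1, "sort": "count"}},
)
for name, value in facets:
    print(f"{name}={value}")
```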
Add spell checking:
spellcheck=true
spellcheck.count=5
spellcheck.onlyMorePopular=false
spellcheck.accuracy=0.3
spellcheck.alternativeTermCount=5
spellcheck.maxResultsForSuggest=3
spellcheck.q=apple
Finally, add parameters for pagination, the time limit, the query parser and debugging:
start=0
rows=50
timeAllowed=15000
defType=edismax
echoHandler=true
echoParams=all
debug=true
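start is a zero-based offset into the result set and rows is the page size, so a page number has to be converted into an offset (timeAllowed, by contrast, is a limit in milliseconds). A minimal Python sketch, where page_params is a hypothetical helper and the default of 50 is taken from the rows value above:

```python
def page_params(page, page_size=50):
    """Translate a 1-based page number into Solr's start (offset) and rows (page size)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [("start", str((page - 1) * page_size)), ("rows", str(page_size))]


print(page_params(1))  # [('start', '0'), ('rows', '50')]
print(page_params(3))  # [('start', '100'), ('rows', '50')]
```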
As you can see, Solr provides a simple yet powerful syntax for querying the index and organizing the results the way you want. You can choose which fields to search and which to retrieve, boost results by field, add sorting, spell checking and highlighting, get statistics about the result set, control faceting, and filter on various conditions, all quite easily.
The final Solr query will look something like this:
fl=catentry_id,partNumber_ntk,name,shortDescription,thumbnail,storeent_id,childCatentry_id,catentry_id,partNumber_ntk,name,shortDescription,thumbnail,storeent_id,childCatentry_id
&start=0
&rows=4
&timeAllowed=15000
&defType=edismax
&qf=name^10.0 defaultSearch^1.0 categoryname^100.0 shortDescription^5.0 name_suggest^1.0 shortDesc_suggest^1.0
&pf=name^10.0 defaultSearch^1.0 categoryname^100.0 shortDescription^5.0 name_suggest^1.0 shortDesc_suggest^1.0
&ps=100
&mm=1
&tie=0.1
&tie=0.1
&wt=json
&json.nl=map
&q="apple"
&fq=catalog_id:"10052"
&fq=storeent_id:("10001")
&fq=published:1
&fq=-(catenttype_id_ntk_cs:ItemBean AND parentCatentry_id:[* TO *])
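Reading a final query like this is easier once it is split back into its named parts, which is also how the framework thinks about it. A minimal Python sketch using urllib.parse.parse_qsl. FINAL_QUERY is a shortened copy of the query above (fl, qf and pf trimmed for space) in its readable, unencoded form; the string actually sent over HTTP would be URL-encoded.

```python
from collections import defaultdict
from urllib.parse import parse_qsl

# Shortened, readable copy of the final query above (fl, qf and pf trimmed).
FINAL_QUERY = (
    'fl=catentry_id,partNumber_ntk,name&start=0&rows=4&defType=edismax'
    '&qf=name^10.0 shortDescription^5.0&mm=1&tie=0.1&wt=json'
    '&q="apple"&fq=catalog_id:"10052"&fq=storeent_id:("10001")'
    '&fq=published:1'
    '&fq=-(catenttype_id_ntk_cs:ItemBean AND parentCatentry_id:[* TO *])'
)


def split_query(query):
    """Group a Solr query string into {parameter name: [values in order]}."""
    parts = defaultdict(list)
    for name, value in parse_qsl(query):
        parts[name].append(value)  # repeated names such as fq keep every value
    return dict(parts)


parts = split_query(FINAL_QUERY)
for name, values in parts.items():
    print(name, values)
```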
Once the final Solr query is built, it is executed by the SolrRESTSearchExpressionProcessor class. But who is responsible for building this query? The entire query is NOT built by a single Java class. Instead, the query is split into separate logical parts such as q, fq, fl and qf, and each part is built by a separate Java class known as a provider or preprocessor. SolrRESTSearchExpressionProcessor then assembles the parts and executes the query against the Solr server.
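To make this division of labour concrete, here is a Python analogy of the provider/preprocessor idea. Every name in it (QueryPartProvider, KeywordProvider, StoreCatalogFilterProvider, use_edismax, assemble) is hypothetical. This is not the WCS Java API and does not reproduce what SolrRESTSearchExpressionProcessor actually does. It only shows the shape: each provider contributes one logical part of the query, preprocessors adjust the assembled parameters, and the result is encoded for Solr.

```python
from typing import Callable, Dict, List, Protocol, Sequence
from urllib.parse import urlencode

Params = Dict[str, List[str]]  # parameter name -> one or more values


class QueryPartProvider(Protocol):
    """Anything that can contribute one logical part of the Solr query."""
    def contribute(self, search_term: str, params: Params) -> None: ...


class KeywordProvider:
    """Hypothetical provider for the q and qf parts."""
    def contribute(self, search_term: str, params: Params) -> None:
        params.setdefault("q", []).append(f'"{search_term}"')
        params.setdefault("qf", []).append("name^10.0 shortDescription^5.0")


class StoreCatalogFilterProvider:
    """Hypothetical provider for the store and catalog fq parts."""
    def __init__(self, store_id: str, catalog_id: str) -> None:
        self.store_id = store_id
        self.catalog_id = catalog_id

    def contribute(self, search_term: str, params: Params) -> None:
        params.setdefault("fq", []).extend([
            f'catalog_id:"{self.catalog_id}"',
            f'storeent_id:("{self.store_id}")',
        ])


def use_edismax(params: Params) -> None:
    """Hypothetical preprocessor: adjusts the assembled parameters before execution."""
    params["defType"] = ["edismax"]


def assemble(search_term: str,
             providers: Sequence[QueryPartProvider],
             preprocessors: Sequence[Callable[[Params], None]] = ()) -> Params:
    """Let each provider add its part, then let each preprocessor adjust the result."""
    params: Params = {}
    for provider in providers:
        provider.contribute(search_term, params)
    for preprocess in preprocessors:
        preprocess(params)
    return params


params = assemble("apple",
                  [KeywordProvider(), StoreCatalogFilterProvider("10001", "10052")],
                  [use_edismax])
print(urlencode(params, doseq=True))  # doseq=True turns each list into repeated pairs
```

Keeping each part in its own small class means a customization (for example, an extra filter) can be added as one more provider without touching the others.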