Thursday, 7 February 2019

SOLR framework in WCS



Starting with FEP-7, WCS provides REST services that interact with the Solr server directly, without going through the BOD / Component Services layer. This post explores the new search framework in detail and touches on the customization points the framework provides for implementing customer requirements.

The best way to start learning the WCS Solr framework is to first familiarize yourself with the basics of Solr. Working with Solr mainly consists of two parts:
1. Indexing the data.
2. Querying the index to get the desired results.

In this article we will look at how to query the index, without worrying too much about the steps involved in building it. It therefore helps to get familiar with the query syntax of eDisMax, the Solr query parser used by WCS. At a minimum you should understand the usage of the q, fq, fl, qf, facet and bq parameters. (You can use the Solr Admin UI, which provides an intuitive interface for querying the index and learning more about the query syntax.)

Once you are comfortable with basic Solr query syntax, you are ready to start with the WCS Solr framework. Let us consider a simple example: searching for the keyword 'Apple'.

How would you write a DB query to select products whose name or short description contains the keyword 'Apple'?

Let's start with a basic query:
Select * from catentry where catentry_id in (select catentry_id from catentdesc where language_id = -1 and ( name like '%Apple%' or shortdescription like '%Apple%'))

Then add the store constraint:
and member_id = (select member_id from storeent where storeent_id = 10001)


To return only product beans and ignore item beans, add the constraint below:
and catenttype_id = 'ProductBean'

To return only published catalog entries, add the constraint below:
and catentdesc.published = 1

To return products belonging to the current sales catalog, the constraint will be:
join catgpenrel where catentry_id in ( ) and catalog_id = ( )

What happens if we want to return products within a certain price range, belonging to a certain category, and want the result set sorted by price or relevance? The query gets more and more complicated and difficult to manage.
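
As a rough illustration of where the SQL approach ends up, the constraints above (minus the catalog join, whose details are left open here) can be combined into one statement. The sketch below runs it with Python's sqlite3 against a tiny, simplified stand-in for the WCS tables. The sample rows and the find_products helper are made up for illustration and are not part of WCS.

```python
import sqlite3

# Toy, heavily simplified versions of the tables named in the text.
# Column names follow the queries above; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE storeent   (storeent_id INTEGER, member_id INTEGER);
    CREATE TABLE catentry   (catentry_id INTEGER, member_id INTEGER,
                             catenttype_id TEXT);
    CREATE TABLE catentdesc (catentry_id INTEGER, language_id INTEGER,
                             name TEXT, shortdescription TEXT,
                             published INTEGER);

    INSERT INTO storeent VALUES (10001, 7000);
    INSERT INTO catentry VALUES (1, 7000, 'ProductBean'),
                                (2, 7000, 'ItemBean'),
                                (3, 7000, 'ProductBean'),
                                (4, 7000, 'ProductBean');
    INSERT INTO catentdesc VALUES
        (1, -1, 'Apple juice',    'Fresh juice',     1),
        (2, -1, 'Apple juice 1L', 'Apple item',      1),
        (3, -1, 'Cider',          'Made from apple', 0),
        (4, -1, 'Orange juice',   'Fresh juice',     1);
""")


def find_products(conn, keyword, store_id):
    """Return ids of published product beans matching the keyword."""
    pattern = "%" + keyword + "%"
    rows = conn.execute(
        """
        SELECT c.catentry_id
        FROM catentry c
        WHERE c.catentry_id IN (
                SELECT d.catentry_id
                FROM catentdesc d
                WHERE d.language_id = -1
                  AND d.published = 1
                  AND (d.name LIKE ? OR d.shortdescription LIKE ?))
          AND c.member_id = (SELECT member_id FROM storeent
                             WHERE storeent_id = ?)
          AND c.catenttype_id = 'ProductBean'
        ORDER BY c.catentry_id
        """,
        (pattern, pattern, store_id),
    ).fetchall()
    return [row[0] for row in rows]


product_ids = find_products(conn, "Apple", 10001)
```

Even in this cut-down form, every new requirement means another subquery or join, which is the maintenance problem described above.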

Let us see how the Solr query looks in this case.

Start with a simple query using the q parameter:
q="Apple"

Then tell Solr to search in the name and short description fields. Products where the search term appears in the name are more relevant than products where it appears only in the shortDescription field, so name gets the higher boost.
qf=name^10.0 shortDescription^5.0

Add filter queries to return only published products (ignoring item beans) belonging to the current catalog and store.
fq=catalog_id:"10052"
fq=storeent_id:("10001")
fq=published:1
fq=-catenttype_id_ntk_cs:ItemBean

Return only a few selected fields.
fl=catentry_id,storeent_id,buyable,partNumber_ntk

Also return facets based on category and brand name 
facet=true
facet.field=parentCatgroup_id_search
facet.field=mfName_ntk_cs 
f.mfName_ntk_cs.facet.limit=21 
f.mfName_ntk_cs.facet.mincount=1
f.mfName_ntk_cs.facet.sort=count

Add spell check 
spellcheck=true
spellcheck.count=5 
spellcheck.onlyMorePopular=false
spellcheck.accuracy=0.3
spellcheck.alternativeTermCount=5 
spellcheck.maxResultsForSuggest=3
spellcheck.q=apple

Add some metadata for pagination and debugging:
start=0
rows=50
timeAllowed=15000 
defType=edismax
echoHandler=true
echoParams=all
debug=true

As you can see, Solr provides a simple yet powerful syntax to query the index and organize the results the way you want. You can quite easily set the fields to query and the fields to retrieve, boost results based on field names, add sorting, spell check and highlighting, get stats about the result set, control the facets, and filter on various conditions.
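
To make the pieces above concrete, here is a minimal sketch of how such parameters could be put together into a single request URL using only Python's standard library. The host, port and core name in SOLR_SELECT are placeholders, and build_select_url is a helper invented for this example; nothing here is WCS code.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical Solr select endpoint; replace with your own host and core.
SOLR_SELECT = "http://localhost:8983/solr/example_core/select"

# A list of (name, value) pairs rather than a dict, because Solr allows
# parameters such as fq and facet.field to appear more than once.
params = [
    ("q", '"Apple"'),
    ("defType", "edismax"),
    ("qf", "name^10.0 shortDescription^5.0"),
    ("fq", 'catalog_id:"10052"'),
    ("fq", 'storeent_id:("10001")'),
    ("fq", "published:1"),
    ("fq", "-catenttype_id_ntk_cs:ItemBean"),
    ("fl", "catentry_id,storeent_id,buyable,partNumber_ntk"),
    ("facet", "true"),
    ("facet.field", "parentCatgroup_id_search"),
    ("facet.field", "mfName_ntk_cs"),
    ("start", "0"),
    ("rows", "50"),
]


def build_select_url(base_url, params):
    """URL-encode the parameter pairs and append them to the base URL."""
    return base_url + "?" + urlencode(params)


url = build_select_url(SOLR_SELECT, params)
print(url)
```

Keeping the parameters as ordered pairs preserves the repeated fq entries, which Solr applies as separate filters.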

The final Solr query will look something like this:
fl=catentry_id,partNumber_ntk,name,shortDescription,thumbnail,storeent_id,childCatentry_id,catentry_id,partNumber_ntk,name,shortDescription,thumbnail,storeent_id,childCatentry_id&start=0&rows=4&timeAllowed=15000&defType=edismax&qf=name^10.0 defaultSearch^1.0 categoryname^100.0 shortDescription^5.0 name_suggest^1.0 shortDesc_suggest^1.0&pf=name^10.0 defaultSearch^1.0 categoryname^100.0 shortDescription^5.0 name_suggest^1.0 shortDesc_suggest^1.0&ps=100&mm=1&tie=0.1&tie=0.1&wt=json&json.nl=map&q="apple"&fq=catalog_id:"10052"&fq=storeent_id:("10001")&fq=published:1&fq=-(catenttype_id_ntk_cs:ItemBean AND parentCatentry_id:[* TO *])

Once the final Solr query is built, it is executed by the SolrRESTSearchExpressionProcessor class. But who is responsible for building this query? The entire query is NOT built by a single Java class. Instead, the query is split into separate logical parts such as q, fq, fl and qf, and each part is built by a separate Java class. These classes are known as Providers and PreProcessors. The parts of the query are then assembled by SolrRESTSearchExpressionProcessor and executed against the Solr server.
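
The idea of splitting query construction across small classes can be sketched as follows. This is only a conceptual illustration in Python: the class names, the contribute method and the criteria dictionary are all hypothetical, and the real WCS providers and preprocessors are Java classes whose interfaces are not shown in this post.

```python
class KeywordProvider:
    """Hypothetical part-builder: contributes the q parameter."""

    def contribute(self, criteria, params):
        params.append(("q", '"%s"' % criteria["searchTerm"]))


class RelevancyProvider:
    """Hypothetical part-builder: contributes the qf field boosts."""

    def contribute(self, criteria, params):
        params.append(("qf", "name^10.0 shortDescription^5.0"))


class CatalogStoreFilterProvider:
    """Hypothetical part-builder: contributes catalog and store filters."""

    def contribute(self, criteria, params):
        params.append(("fq", 'catalog_id:"%s"' % criteria["catalogId"]))
        params.append(("fq", 'storeent_id:("%s")' % criteria["storeId"]))


class PublishedFilterProvider:
    """Hypothetical part-builder: keeps only published entries."""

    def contribute(self, criteria, params):
        params.append(("fq", "published:1"))


def assemble(criteria, providers):
    """Let each provider add its own part, then return the combined list.

    This stands in for the assembling step the text attributes to
    SolrRESTSearchExpressionProcessor; it does not execute anything.
    """
    params = []
    for provider in providers:
        provider.contribute(criteria, params)
    return params


criteria = {"searchTerm": "apple", "catalogId": "10052", "storeId": "10001"}
providers = [
    KeywordProvider(),
    RelevancyProvider(),
    CatalogStoreFilterProvider(),
    PublishedFilterProvider(),
]
query_params = assemble(criteria, providers)
```

With this kind of design, a customization usually means adding or replacing one small part-builder and leaving the rest of the query untouched.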
