Elasticsearch can add real value to any relational database management system in use today, but it should be adopted with a clear understanding of why and how to use it before pairing it with SQL Server.
SQL Server may store data in either relational or multi-dimensional form. For multi-dimensional data, SQL Server Analysis Services (SSAS) can be used. SQL Server's built-in search, the Full-Text Search service, provides workable solutions but proves ineffective when query definitions become complex or when an exhaustive search has to be made over huge data sets. Elasticsearch is a search service, but it also offers features such as facets and aggregations that bring dynamism to data-analysis problems. Many e-commerce sites, for example, dynamically create categories as the user sifts through a range of products; Elasticsearch is well suited to building such features.
Now the real question: how do you use Elasticsearch with SQL Server? As of today, the best means of importing data from SQL Server into an Elasticsearch index is the Elasticsearch JDBC river. In Elasticsearch, a river is a plugin that pulls data from an external source into the cluster.
The installation involves adding the JDBC river plugin to Elasticsearch, placing the Microsoft JDBC driver where the plugin can find it, and restarting the server. After that comes the main task:
Configuring the JDBC River
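A river is created by sending its configuration to the `_river` endpoint. The following is a minimal sketch for an Elasticsearch 1.x-era setup; the plugin download URL, river name (`my_jdbc_river`), database name, and credentials are placeholders you would replace with your own.

```shell
# Install the JDBC river plugin (substitute the real plugin zip URL
# for your Elasticsearch version):
./bin/plugin --install jdbc --url <jdbc-river-plugin-zip-url>

# Copy the Microsoft JDBC driver jar into the plugin directory,
# then restart Elasticsearch so the plugin is picked up:
cp sqljdbc4.jar plugins/jdbc/

# Register the river. "type": "jdbc" selects the JDBC river,
# and the "jdbc" block describes the SQL Server connection.
curl -XPUT 'http://localhost:9200/_river/my_jdbc_river/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url": "jdbc:sqlserver://localhost:1433;databaseName=Shop",
    "user": "es_user",
    "password": "secret",
    "sql": "SELECT * FROM Products"
  }
}'
```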
Once that is done, the "sql" parameter of the river configuration accepts anything that is valid SQL: you can add joins, aliases, and conditions, and work out the ordering as well.
The "poll" parameter defines the time interval at which data is fetched from the database.
The "index" parameter allows the user to configure and customize the metadata for the target index.
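Putting the three parameters together, a fuller river configuration might look like the sketch below. The table names, the 60-second poll interval, and the `shop`/`product` index names are illustrative assumptions, and the exact placement of the "index" block varies between plugin versions.

```shell
# Register a river that joins two tables, polls every 60 seconds,
# and writes documents into the "shop" index under type "product".
# Aliasing a column to _id lets the river use database IDs as
# Elasticsearch document IDs.
curl -XPUT 'http://localhost:9200/_river/my_jdbc_river/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url": "jdbc:sqlserver://localhost:1433;databaseName=Shop",
    "user": "es_user",
    "password": "secret",
    "sql": "SELECT p.Id AS _id, p.Name, c.Name AS Category
            FROM Products p
            JOIN Categories c ON p.CategoryId = c.Id
            ORDER BY p.Id",
    "poll": "60s"
  },
  "index": {
    "index": "shop",
    "type": "product"
  }
}'
```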
Every time a PUT request is sent for an existing document, its "_version" field increments automatically.
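This versioning is easy to observe directly with the index API; the index and type names below are just examples.

```shell
# First PUT creates the document; the response includes "_version": 1.
curl -XPUT 'http://localhost:9200/shop/product/1' -d '{"name": "Chair"}'

# A second PUT to the same ID overwrites the document and the
# response now shows "_version": 2.
curl -XPUT 'http://localhost:9200/shop/product/1' -d '{"name": "Armchair"}'
```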
The simplest way to start indexing is to restart the server, since the river starts immediately after a restart. But there is a downside: this can reload the entire data set into the index. If the database IDs are being brought in, all existing records simply receive an update; but if Elasticsearch generates IDs automatically, a new set of documents is created that duplicates the data already present. So the river has to be used carefully. Once the data is loaded into the index, the river may be deleted if no further data migration is intended.
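Deleting a river is a single DELETE request against the `_river` endpoint; `my_jdbc_river` here is the example river name used above.

```shell
# Remove the river so it stops polling the database.
# The data already indexed remains untouched.
curl -XDELETE 'http://localhost:9200/_river/my_jdbc_river'
```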