Sphinx is an open source full-text search engine. It is designed to search large volumes of data quickly and efficiently. The data to be indexed can come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.
Some of Sphinx's main features include high indexing and searching performance, advanced indexing and querying tools, and easy integration with SQL and XML data sources through the SphinxQL, SphinxAPI, and SphinxSE search interfaces.
In this tutorial, we will set up Sphinx with a MySQL server using the sample SQL file included in the distribution package. This will give you the basic knowledge you need to use Sphinx in your own projects.
Before starting this guide, you will need an Ubuntu server with MySQL installed and a user with sudo privileges.
Installing Sphinx on Ubuntu is simple because it is available in the native package repository. Install it with apt-get:
sudo apt-get install sphinxsearch
Now you have successfully installed Sphinx on the server. Before starting the Sphinx daemon, let's configure it.
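If you want to confirm which version the repository installed (on Ubuntu 16.04 this is typically Sphinx 2.2.x), you can check with apt:
apt-cache policy sphinxsearch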
Next, we will set up the database using the sample data in the SQL file provided with the package. This will let us test later that Sphinx search is working.
Let's import the sample SQL file into the database. First, log in to the MySQL server shell.
mysql -u root -p
Enter the password for the MySQL root user when asked. Your prompt will change to mysql>.
Create a test database. Here we call it test, but you can name it whatever you want.
CREATE DATABASE test;
Import the sample SQL file.
SOURCE /etc/sphinxsearch/example.sql;
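Optionally, before leaving the shell, you can confirm that the sample data was imported. The sample file populates a documents table in the test database with a few rows:
SELECT id, title FROM test.documents;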
Then leave the MySQL shell.
quit
Now you have a database populated with sample data. Next, we will customize the configuration of Sphinx.
Sphinx's configuration lives in a file named sphinx.conf in /etc/sphinxsearch. The configuration consists of 3 main blocks that are essential to run: source, index, and searchd. We will provide a sample configuration file for you to use and explain each block so that you can customize it later.
First, create the sphinx.conf file:
sudo nano /etc/sphinxsearch/sphinx.conf
The source, index, and searchd blocks are described in turn below. At the end of this step, the complete contents of sphinx.conf will be provided for you to copy and paste into the file.
The **source** block contains the type of source, the username and password for the MySQL server, and the query to run. The first column of the sql_query must be a unique id. The SQL query runs on every index operation and dumps the data into the Sphinx index file. Below is a description of each field, followed by the source block itself.

* type: The type of data source to index. In our example, this is mysql. Other supported types include pgsql, mssql, xmlpipe2, odbc, and more.
* sql_host: The hostname of the MySQL host. In our example, this is localhost. This can be a domain or an IP address.
* sql_user: The username for the MySQL login. In our example, this is root.
* sql_pass: The password for the MySQL user. In our example, this is the root MySQL user's password.
* sql_db: The name of the database that stores the data. In our example, this is test.
* sql_query: The query that dumps data from the database into the index.

This is the source block:
source src1
{
  type     = mysql

  # SQL settings (for 'mysql' and 'pgsql' types)
  sql_host = localhost
  sql_user = root
  sql_pass = password
  sql_db   = test
  sql_port = 3306 # optional, default is 3306

  sql_query = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    FROM documents

  sql_attr_uint      = group_id
  sql_attr_timestamp = date_added
}
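The sql_query can be any SELECT statement, as long as its first column is a unique document id. As a hypothetical illustration (not part of this tutorial's setup), you could index only one group of documents by adding a WHERE clause:
sql_query = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    FROM documents \
    WHERE group_id = 1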
The **index** block contains the data source and the path where the index data is stored.

* source: The name of the source block. In our example, this is src1.
* path: The path where the index is saved.

index test1
{
  source  = src1
  path    = /var/lib/sphinxsearch/data/test1
  docinfo = extern
}
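The directory referenced by path must exist before the indexer runs. The Ubuntu package normally creates /var/lib/sphinxsearch/data for you, but if it is missing on your system you can create it yourself:
sudo mkdir -p /var/lib/sphinxsearch/data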
The **searchd** block contains the port and other variables for running the Sphinx daemon.

* listen: The port the Sphinx daemon will listen on, followed by the protocol. In our example, this is 9306:mysql41. Known protocols are :sphinx (SphinxAPI) and :mysql41 (SphinxQL).
* query_log: The path where the query log is saved.
* pid_file: The path of the Sphinx daemon's PID file.
* seamless_rotate: Prevents searchd stalls while rotating indexes with large amounts of data to precache.
* preopen_indexes: Whether to forcibly preopen all indexes at startup.
* unlink_old: Whether to delete old index copies on successful rotation.

searchd
{
  listen          = 9312:sphinx   # SphinxAPI port
  listen          = 9306:mysql41  # SphinxQL port
  log             = /var/log/sphinxsearch/searchd.log
  query_log       = /var/log/sphinxsearch/query.log
  read_timeout    = 5
  max_children    = 30
  pid_file        = /var/run/sphinxsearch/searchd.pid
  seamless_rotate = 1
  preopen_indexes = 1
  unlink_old      = 1
  binlog_path     = /var/lib/sphinxsearch/data
}
The complete configuration to copy and paste is below. The only value you need to change is the sql_pass variable in the source block, where your_root_mysql_password is a placeholder for your MySQL root password:
source src1
{
  type     = mysql

  sql_host = localhost
  sql_user = root
  sql_pass = your_root_mysql_password
  sql_db   = test
  sql_port = 3306

  sql_query = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    FROM documents

  sql_attr_uint      = group_id
  sql_attr_timestamp = date_added
}

index test1
{
  source  = src1
  path    = /var/lib/sphinxsearch/data/test1
  docinfo = extern
}

searchd
{
  listen          = 9306:mysql41
  log             = /var/log/sphinxsearch/searchd.log
  query_log       = /var/log/sphinxsearch/query.log
  read_timeout    = 5
  max_children    = 30
  pid_file        = /var/run/sphinxsearch/searchd.pid
  seamless_rotate = 1
  preopen_indexes = 1
  unlink_old      = 1
  binlog_path     = /var/lib/sphinxsearch/data
}
To explore more configuration options, check the /etc/sphinxsearch/sphinx.conf.sample file, which explains all the variables in more detail.
In this step, we will add data to the Sphinx index and use cron to keep the index up to date.
First, add data to the index using the configuration we created earlier.
sudo indexer --all
You should get something like the following.
Sphinx 2.2.9-id64-release (rel22-r5006)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.010 sec, 18552 bytes/sec, 384.50 docs/sec
total 4 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
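Note that a plain indexer --all like the one above only works while the Sphinx daemon is stopped. Once searchd is running (we start it later in this tutorial), re-index with the --rotate flag so the running daemon switches to the fresh index; this is also the flag the cron job below uses:
sudo indexer --all --rotate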
In a production environment, the index must be kept up to date. To do this, let's create a cronjob. First, open crontab.
crontab -e
You may be asked which text editor you want to use. Choose whichever you like; in this tutorial, we used nano.
The following cronjob will run every hour and use the configuration file we created earlier to add new data to the index. Copy and paste it to the end of the file, then save and close the file.
@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all
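To double-check that the entry was saved, you can list your crontab:
crontab -l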
Now that Sphinx is fully set up and configured, we can start the service and try it out.
By default, the Sphinx daemon is turned off. First, enable it by changing the line START=no in /etc/default/sphinxsearch to START=yes.
sudo sed -i 's/START=no/START=yes/g' /etc/default/sphinxsearch
Then, use systemctl
to restart the Sphinx daemon.
sudo systemctl restart sphinxsearch.service
To check whether the Sphinx daemon is running properly, run:
sudo systemctl status sphinxsearch.service
You should get something like the following.
● sphinxsearch.service - LSB: Fast standalone full-text SQL search engine
   Loaded: loaded (/etc/init.d/sphinxsearch; bad; vendor preset: enabled)
   Active: active (running) since Tue 2016-07-26 01:50:00 EDT; 15s ago
...
This will also ensure that the Sphinx daemon will start even if the server is restarted.
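As an extra, optional check, you can verify that searchd is listening on the SphinxQL port from the configuration above (9306):
sudo ss -ltnp | grep searchd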
Now that everything is set up, let's test the search functionality. Connect to SphinxQL (on port 9306) using the MySQL interface. Your prompt will change to mysql>.
mysql -h0 -P9306
Let's search for a phrase.
SELECT * FROM test1 WHERE MATCH('test document'); SHOW META;
You should get something like the following.
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1465979047 |
|    2 |        1 | 1465979047 |
+------+----------+------------+
2 rows in set (0.00 sec)

+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 2        |
| total_found   | 2        |
| time          | 0.000    |
| keyword[0]    | test     |
| docs[0]       | 3        |
| hits[0]       | 5        |
| keyword[1]    | document |
| docs[1]       | 2        |
| hits[1]       | 2        |
+---------------+----------+
9 rows in set (0.00 sec)
In the results above, you can see that Sphinx found 2 matches for our search phrase in the test1 index. The SHOW META; command also shows the number of hits per keyword in the phrase.
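SphinxQL also supports Sphinx's extended query syntax inside MATCH(). As an illustration that goes beyond the original tutorial, you could restrict the match to the title field and sort the results by relevance using WEIGHT():
SELECT id, group_id, WEIGHT() FROM test1 WHERE MATCH('@title test') ORDER BY WEIGHT() DESC;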
Let's also check some keywords against the index.
CALL KEYWORDS('test one three','test1',1);
You should get something like the following.
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
|    1 | test      | test       |    3 |    5 |
|    2 | one       | one        |    1 |    2 |
|    3 | three     | three      |    0 |    0 |
+------+-----------+------------+------+------+
3 rows in set (0.00 sec)
In the results above, you can see that in the test1 index, Sphinx found the keyword "test" in 3 documents (5 hits in total), the keyword "one" in 1 document (2 hits), and the keyword "three" in 0 documents (0 hits).
Now you can leave the MySQL shell.
quit
In this tutorial, we showed you how to install Sphinx and use SphinxQL and MySQL for simple searches.
By using Sphinx, you can easily add custom searches to your website.
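Because searchd speaks the MySQL protocol on port 9306, any MySQL client library can run these queries from your application code. As a minimal sketch using the test1 index from this tutorial, even a non-interactive mysql call from a script works:
mysql -h0 -P9306 -e "SELECT id, group_id FROM test1 WHERE MATCH('test');"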
For more tutorials on installing and configuring Sphinx, visit [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680).
Reference: "How To Install and Configure Sphinx on Ubuntu 16.04"