How to install and configure Sphinx on Ubuntu 16.04

Introduction

Sphinx is an open source search engine that allows full text search. As we all know, it can search big data very effectively. The data to be indexed usually comes from very different sources: SQL databases, plain text files, HTML files, mailboxes, etc.

Some of the main features of Sphinx include:

In this tutorial, we will use the sample SQL files included in the distribution package to set up Sphinx and MySQL servers. It will provide you with basic knowledge on how to use Sphinx for your project.

Preparation

Before starting this guide, you need to:

Step 1-Install Sphinx

Installing Sphinx on Ubuntu is very simple because it is located in the native package repository. Use apt-get to install it.

sudo apt-get install sphinxsearch

Now you have successfully installed Sphinx on the server. Before starting the Sphinx daemon, let's configure it.

Step 2-Create test database

Next, we will use the sample data in the SQL file provided with the package to set up the database. This will allow us to test whether Sphinx search works in the future.

Let's import the sample SQL file into the database. First, log in to the MySQL server shell.

mysql -u root -p

Enter the password of the MySQL root user when asked. Your prompt will change to mysql>.

Create a virtual database. Here, we call it test, but you can name it as you want.

CREATE DATABASE test;

Import the sample SQL file.

SOURCE /etc/sphinxsearch/example.sql;

Then leave the MySQL shell.

quit

Now you have a database populated with sample data. Next, we will customize the configuration of Sphinx.

Step 3-Configure Sphinx

The configuration of Sphinx should be in a file named sphinx.conf in /etc/sphinxsearch. This configuration contains 3 main blocks that are essential for operation: Index, Search, and Source. We will provide a sample configuration file for you to use and explain each part so that you can customize it later.

First, create the sphinx.conf file.

sudo nano /etc/sphinxsearch/sphinx.conf

The following describes the index, search and source blocks in sequence. Then, at the end of this step, the entire content including sphinx.conf will be available for you to copy and paste into the file.

**The source ** block contains the source, username and password of the MySQL server type. The first column of sql_query should be a unique id. The SQL query will run on each index and dump the data to the Sphinx index file. The following is a description of each field and the source block itself.

This is the source block:

source src1
{
 type          = mysql
​
 # SQL settings(for ‘mysql’ and ‘pgsql’ types)
​
 sql_host      = localhost
 sql_user      = root
 sql_pass      = password
 sql_db        = test
 sql_port      =3306 # optional,default is 3306
​
 sql_query     = \
 SELECT id, group_id,UNIX_TIMESTAMP(date_added) AS date_added, title, content \
 FROM documents
​
 sql_attr_uint         = group_id
 sql_attr_timestamp    = date_added
}

The index component contains the source and storage data path. in

index test1
{
 source        = src1
 path          =/var/lib/sphinxsearch/data/test1
 docinfo       = extern
}

The searchd component contains ports and other variables to run the Sphinx daemon.

searchd
{
 listen            =9312:sphinx       #SphinxAPI port
 listen            =9306:mysql41      #SphinxQL port
 log               =/var/log/sphinxsearch/searchd.log
 query_log         =/var/log/sphinxsearch/query.log
 read_timeout      =5
 max_children      =30
 pid_file          =/var/run/sphinxsearch/searchd.pid
 seamless_rotate   =1
 preopen_indexes   =1
 unlink_old        =1
 binlog_path       =/var/lib/sphinxsearch/data
}

The complete configuration to be copied and pasted is as follows. The only variable you need to change below is the sql_pass variable in the source block, which looks like this:

source src1
{
 type          = mysql
​
 sql_host      = localhost
 sql_user      = root
 sql_pass      = your_root_mysql_password
 sql_db        = test
 sql_port      =3306
​
 sql_query     = \
 SELECT id, group_id,UNIX_TIMESTAMP(date_added) AS date_added, title, content \
 FROM documents
​
 sql_attr_uint         = group_id
 sql_attr_timestamp    = date_added
}
index test1
{
 source            = src1
 path              =/var/lib/sphinxsearch/data/test1
 docinfo           = extern
}
searchd
{
 listen            =9306:mysql41
 log               =/var/log/sphinxsearch/searchd.log
 query_log         =/var/log/sphinxsearch/query.log
 read_timeout      =5
 max_children      =30
 pid_file          =/var/run/sphinxsearch/searchd.pid
 seamless_rotate   =1
 preopen_indexes   =1
 unlink_old        =1
 binlog_path       =/var/lib/sphinxsearch/data
}

To explore more configurations, you can check the /etc/sphinxsearch/sphinx.conf.sample file, which contains all the variables explained in more detail.

Step 4-Manage Index

In this step, we will add data to the Sphinx index and use cron to ensure that the index keeps the latest data.

First, add data to the index using the configuration we created earlier.

sudo indexer --all

You should get something like the following.

Sphinx 2.2.9-id64-release(rel22-r5006)Copyright(c)2001-2015, Andrew Aksyonoff
Copyright(c)2008-2015, Sphinx Technologies Inc(http://sphinxsearch.com)
​
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
collected 4 docs,0.0 MB
sorted 0.0 Mhits,100.0% done
total 4 docs,193 bytes
total 0.010 sec,18552 bytes/sec,384.50 docs/sec
total 4 reads,0.000 sec,0.1 kb/call avg,0.0 msec/call avg
total 12 writes,0.000 sec,0.1 kb/call avg,0.0 msec/call avg

In a production environment, the index must be kept up to date. To do this, let's create a cronjob. First, open crontab.

crontab -e

You may be asked which text editor you want to use. Choose any one you like; in this tutorial, we have used nano.

The following cronjob will run every hour and use the configuration file we created earlier to add new data to the index. Copy and paste it to the end of the file, then save and close the file.

@ hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all

Now that Sphinx is fully set up and configured, we can start the service and try it out.

Step 5-Start Sphinx

By default, the Sphinx daemon is turned off. First, we will enable it by changing the line START=no in /etc/default/sphinxsearch to START=yesin.

sudo sed -i 's/START=no/START=yes/g'/etc/default/sphinxsearch

Then, use systemctl to restart the Sphinx daemon.

sudo systemctl restart sphinxsearch.service

To check whether the Sphinx daemon is running properly, run it.

sudo systemctl status sphinxsearch.service

You should get something like the following.

● sphinxsearch.service - LSB: Fast standalone full-text SQL search engine
 Loaded:loaded(/etc/init.d/sphinxsearch; bad; vendor preset: enabled)
 Active:active(running) since Tue 2016-07-2601:50:00 EDT; 15s ago
   ...

This will also ensure that the Sphinx daemon will start even if the server is restarted.

Step 6-Testing

Now that everything is set up, let's test the search function. Connect to SphinxQL (on port 9306) using the MySQL interface. Your prompt will change to mysql>.

mysql -h0 -P9306

Let's search for a sentence.

SELECT * FROM test1 WHERE MATCH('test document'); SHOW META;

You should get something like the following.

+- - - - - - +- - - - - - - - - - +- - - - - - - - - - - - +| id   | group_id | date_added |+------+----------+------------+|1|1|1465979047||2|1|1465979047|+------+----------+------------+2 rows inset(0.00 sec)
​
+- - - - - - - - - - - - - - - +- - - - - - - - - - +| Variable_name | Value    |+---------------+----------+| total         |2|| total_found   |2|| time          |0.000|| keyword[0]| test     || docs[0]|3|| hits[0]|5|| keyword[1]| document || docs[1]|2|| hits[1]|2|+---------------+----------+9 rows inset(0.00 sec)

In the above results, you can see that Sphinx found 2 matches in the index test1 of our test sentence. The SHOW META; command also displays the number of hits for each keyword in the sentence.

Let's search some keywords.

CALL KEYWORDS('test one three','test1',1);

You should get something like the following.

+- - - - - - +- - - - - - - - - - - +- - - - - - - - - - - - +- - - - - - +- - - - - - +| qpos | tokenized | normalized | docs | hits |+------+-----------+------------+------+------+|1| test      | test       |3|5||2| one       | one        |1|2||3| three     | three      |0|0|+------+-----------+------------+------+------+3 rows inset(0.00 sec)

In the above results, you can see that in the test1 index, Sphinx found:

Now you can leave the MySQL shell.

quit

in conclusion

In this tutorial, we showed you how to install Sphinx and use SphinxQL and MySQL for simple searches.

By using Sphinx, you can easily add custom searches to your website.

To learn more about the related tutorials on installing and configuring Sphinx, please go to [Tencent Cloud + Community] (https://cloud.tencent.com/developer?from=10680) to learn more.


Reference: "How To Install and Configure Sphinx on Ubuntu 16.04"

Recommended Posts

How to install and configure Sphinx on Ubuntu 16.04
How to install and configure NATS on Ubuntu 16.04
How to install and configure Gogs on Ubuntu 18.04
How to install and configure Cyberpanel on Ubuntu 18.04
How to install and configure ownCloud on Ubuntu 16.04
How to install and configure GitLab on Ubuntu 18.04
How to install and configure Ansible on Ubuntu 18.04
How to install and configure Elasticsearch on Ubuntu 16.04
How to install and configure PostGIS on Ubuntu 14.04
How to install and configure VNC on Ubuntu 18.04
How to install and configure OrientDB on Ubuntu 14.04
How to install and configure AppScale on Ubuntu 12.04
How to install and configure PostGIS on Ubuntu 14.04
How to install and configure Elasticsearch on CentOS 7
How to install and secure phpMyAdmin on Ubuntu 16.04
How to install and use Docker on Ubuntu 20.04
How to install and configure VNC on CentOS 8
How to install and use Curl on Ubuntu 18.04
How to install and use Composer on Ubuntu 18.04
How to install and use Wine on Ubuntu 18.04
How to install and secure phpMyAdmin on Ubuntu 16.04
How to install and configure Redis on CentOS 8
How to install and use Composer on Ubuntu 20.04
How to install and use BaasBox on Ubuntu 14.04
How to install and use PostgreSQL on Ubuntu 16.04
How to install and configure phpMyAdmin on CentOS 6
How to install and configure Owncloud on CentOS 8
How to install and use Docker on Ubuntu 16.04
How to install and configure Redmine on CentOS 8
How to install Ruby on Ubuntu 20.04
How to install Memcached on Ubuntu 20.04
How to install Java on Ubuntu 20.04
How to install MySQL on Ubuntu 20.04
How to install VirtualBox on Ubuntu 20.04
How to install Elasticsearch on Ubuntu 20.04
How to install Protobuf 3 on Ubuntu
How to install Nginx on Ubuntu 20.04
How to install Apache on Ubuntu 20.04
How to install Git on Ubuntu 20.04
How to install Node.js on Ubuntu 16.04
How to install MySQL on Ubuntu 20.04
Install and configure MySQL on Ubuntu
How to install Vagrant on Ubuntu 20.04
How to install Bacula-Web on Ubuntu 14.04
How to install PostgreSQL on Ubuntu 16.04
How to install Git on Ubuntu 20.04
How to install Anaconda3 on Ubuntu 18.04
How to install Memcached on Ubuntu 18.04
How to install Jenkins on Ubuntu 16.04
How to install MemSQL on Ubuntu 14.04
How to install Go on Ubuntu 20.04
How to install MongoDB on Ubuntu 16.04
How to install Mailpile on Ubuntu 14.04
How to install PrestaShop on Ubuntu 16.04
How to install Skype on Ubuntu 20.04
How to install Jenkins on Ubuntu 20.04
How to install Python 3.8 on Ubuntu 18.04
How to install KVM on Ubuntu 20.04
How to install opencv3.0.0 on ubuntu14.04
How to install Anaconda on Ubuntu 20.04
How to install Prometheus on Ubuntu 16.04