Sphinx is an open source full-text search engine. It is designed to search large volumes of data quickly and efficiently. The data to be indexed can come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.
Some of Sphinx's main features include high indexing and searching performance, advanced indexing and querying tools, and easy integration with SQL and XML data sources through the SphinxQL, SphinxAPI, and SphinxSE search interfaces.
In this tutorial, we will set up Sphinx with a MySQL server using the sample SQL file included in the distribution package. This will give you the basic knowledge you need to use Sphinx in your own projects.
Before starting this guide, you will need an Ubuntu server with MySQL installed and a user with sudo privileges.
Installing Sphinx on Ubuntu is simple because it is available in the native package repository. Install it with apt-get:
sudo apt-get install sphinxsearch
Now you have successfully installed Sphinx on the server. Before starting the Sphinx daemon, let's configure it.
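If you want to confirm which version the repository installed (on Ubuntu 16.04 this is typically Sphinx 2.2.x), you can check with apt:
apt-cache policy sphinxsearch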
Next, we will set up the database using the sample data in the SQL file provided with the package. This will let us test later that Sphinx search is working.
Let's import the sample SQL file into the database. First, log in to the MySQL server shell.
mysql -u root -p
Enter the password for the MySQL root user when asked. Your prompt will change to mysql>.
Create a test database. Here we call it test, but you can name it whatever you want.
CREATE DATABASE test;
Import the sample SQL file.
SOURCE /etc/sphinxsearch/example.sql;
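Optionally, before leaving the shell, you can confirm that the sample data was imported. The sample file populates a documents table in the test database with a few rows:
SELECT id, title FROM test.documents;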
Then leave the MySQL shell.
quit
Now you have a database populated with sample data. Next, we will customize the configuration of Sphinx.
Sphinx's configuration lives in a file named sphinx.conf in /etc/sphinxsearch. The configuration consists of 3 main blocks that are essential to run: source, index, and searchd. We will provide a sample configuration file for you to use and explain each block so that you can customize it later.
First, create the sphinx.conf file:
sudo nano /etc/sphinxsearch/sphinx.conf
The source, index, and searchd blocks are described in turn below. At the end of this step, the complete contents of sphinx.conf will be provided for you to copy and paste into the file.
The **source** block contains the type of source, the username and password for the MySQL server, and the query to run. The first column of the sql_query must be a unique id. The SQL query runs on every index operation and dumps the data into the Sphinx index file. Below is a description of each field, followed by the source block itself.

* type: The type of data source to index. In our example, this is mysql. Other supported types include pgsql, mssql, xmlpipe2, odbc, and more.
* sql_host: The hostname of the MySQL host. In our example, this is localhost. This can be a domain or an IP address.
* sql_user: The username for the MySQL login. In our example, this is root.
* sql_pass: The password for the MySQL user. In our example, this is the root MySQL user's password.
* sql_db: The name of the database that stores the data. In our example, this is test.
* sql_query: The query that dumps data from the database into the index.

This is the source block:
source src1
{
  type     = mysql

  # SQL settings (for 'mysql' and 'pgsql' types)
  sql_host = localhost
  sql_user = root
  sql_pass = password
  sql_db   = test
  sql_port = 3306 # optional, default is 3306

  sql_query = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    FROM documents

  sql_attr_uint      = group_id
  sql_attr_timestamp = date_added
}
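The sql_query can be any SELECT statement, as long as its first column is a unique document id. As a hypothetical illustration (not part of this tutorial's setup), you could index only one group of documents by adding a WHERE clause:
sql_query = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    FROM documents \
    WHERE group_id = 1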
The **index** block contains the data source and the path where the index data is stored.

* source: The name of the source block. In our example, this is src1.
* path: The path where the index is saved.

index test1
{
  source  = src1
  path    = /var/lib/sphinxsearch/data/test1
  docinfo = extern
}
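The directory referenced by path must exist before the indexer runs. The Ubuntu package normally creates /var/lib/sphinxsearch/data for you, but if it is missing on your system you can create it yourself:
sudo mkdir -p /var/lib/sphinxsearch/data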
The **searchd** block contains the port and other variables for running the Sphinx daemon.

* listen: The port the Sphinx daemon will listen on, followed by the protocol. In our example, this is 9306:mysql41. Known protocols are :sphinx (SphinxAPI) and :mysql41 (SphinxQL).
* query_log: The path where the query log is saved.
* pid_file: The path of the Sphinx daemon's PID file.
* seamless_rotate: Prevents searchd stalls while rotating indexes with large amounts of data to precache.
* preopen_indexes: Whether to forcibly preopen all indexes at startup.
* unlink_old: Whether to delete old index copies on successful rotation.

searchd
{
  listen          = 9312:sphinx   # SphinxAPI port
  listen          = 9306:mysql41  # SphinxQL port
  log             = /var/log/sphinxsearch/searchd.log
  query_log       = /var/log/sphinxsearch/query.log
  read_timeout    = 5
  max_children    = 30
  pid_file        = /var/run/sphinxsearch/searchd.pid
  seamless_rotate = 1
  preopen_indexes = 1
  unlink_old      = 1
  binlog_path     = /var/lib/sphinxsearch/data
}
The complete configuration to copy and paste is below. The only value you need to change is the sql_pass variable in the source block, where your_root_mysql_password is a placeholder for your MySQL root password:
source src1
{
  type     = mysql

  sql_host = localhost
  sql_user = root
  sql_pass = your_root_mysql_password
  sql_db   = test
  sql_port = 3306

  sql_query = \
    SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    FROM documents

  sql_attr_uint      = group_id
  sql_attr_timestamp = date_added
}

index test1
{
  source  = src1
  path    = /var/lib/sphinxsearch/data/test1
  docinfo = extern
}

searchd
{
  listen          = 9306:mysql41
  log             = /var/log/sphinxsearch/searchd.log
  query_log       = /var/log/sphinxsearch/query.log
  read_timeout    = 5
  max_children    = 30
  pid_file        = /var/run/sphinxsearch/searchd.pid
  seamless_rotate = 1
  preopen_indexes = 1
  unlink_old      = 1
  binlog_path     = /var/lib/sphinxsearch/data
}
To explore more configuration options, check the /etc/sphinxsearch/sphinx.conf.sample file, which explains all the variables in more detail.
In this step, we will add data to the Sphinx index and use cron to keep the index up to date.
First, add data to the index using the configuration we created earlier.
sudo indexer --all
You should get something like the following.
Sphinx 2.2.9-id64-release (rel22-r5006)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.010 sec, 18552 bytes/sec, 384.50 docs/sec
total 4 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
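Note that a plain indexer --all like the one above only works while the Sphinx daemon is stopped. Once searchd is running (we start it later in this tutorial), re-index with the --rotate flag so the running daemon switches to the fresh index; this is also the flag the cron job below uses:
sudo indexer --all --rotate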
In a production environment, the index must be kept up to date. To do this, let's create a cronjob. First, open crontab.
crontab -e
You may be asked which text editor you want to use. Choose whichever you like; in this tutorial, we used nano.
The following cronjob will run every hour and use the configuration file we created earlier to add new data to the index. Copy and paste it to the end of the file, then save and close the file.
@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all
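To double-check that the entry was saved, you can list your crontab:
crontab -l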
Now that Sphinx is fully set up and configured, we can start the service and try it out.
By default, the Sphinx daemon is turned off. First, enable it by changing the line START=no in /etc/default/sphinxsearch to START=yes.
sudo sed -i 's/START=no/START=yes/g' /etc/default/sphinxsearch
Then, use systemctl
to restart the Sphinx daemon.
sudo systemctl restart sphinxsearch.service
To check whether the Sphinx daemon is running properly, run:
sudo systemctl status sphinxsearch.service
You should get something like the following.
● sphinxsearch.service - LSB: Fast standalone full-text SQL search engine
   Loaded: loaded (/etc/init.d/sphinxsearch; bad; vendor preset: enabled)
   Active: active (running) since Tue 2016-07-26 01:50:00 EDT; 15s ago
...
This will also ensure that the Sphinx daemon will start even if the server is restarted.
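As an extra, optional check, you can verify that searchd is listening on the SphinxQL port from the configuration above (9306):
sudo ss -ltnp | grep searchd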
Now that everything is set up, let's test the search functionality. Connect to SphinxQL (on port 9306) using the MySQL interface. Your prompt will change to mysql>.
mysql -h0 -P9306
Let's search for a phrase.
SELECT * FROM test1 WHERE MATCH('test document'); SHOW META;
You should get something like the following.
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1465979047 |
|    2 |        1 | 1465979047 |
+------+----------+------------+
2 rows in set (0.00 sec)

+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 2        |
| total_found   | 2        |
| time          | 0.000    |
| keyword[0]    | test     |
| docs[0]       | 3        |
| hits[0]       | 5        |
| keyword[1]    | document |
| docs[1]       | 2        |
| hits[1]       | 2        |
+---------------+----------+
9 rows in set (0.00 sec)
In the results above, you can see that Sphinx found 2 matches for our search phrase in the test1 index. The SHOW META; command also shows the number of hits per keyword in the phrase.
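SphinxQL also supports Sphinx's extended query syntax inside MATCH(). As an illustration that goes beyond the original tutorial, you could restrict the match to the title field and sort the results by relevance using WEIGHT():
SELECT id, group_id, WEIGHT() FROM test1 WHERE MATCH('@title test') ORDER BY WEIGHT() DESC;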
Let's also check some keywords against the index.
CALL KEYWORDS('test one three','test1',1);
You should get something like the following.
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
|    1 | test      | test       |    3 |    5 |
|    2 | one       | one        |    1 |    2 |
|    3 | three     | three      |    0 |    0 |
+------+-----------+------------+------+------+
3 rows in set (0.00 sec)
In the results above, you can see that in the test1 index, Sphinx found the keyword "test" in 3 documents (5 hits in total), the keyword "one" in 1 document (2 hits), and the keyword "three" in 0 documents (0 hits).
Now you can leave the MySQL shell.
quit
In this tutorial, we showed you how to install Sphinx and use SphinxQL and MySQL for simple searches.
By using Sphinx, you can easily add custom searches to your website.
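Because searchd speaks the MySQL protocol on port 9306, any MySQL client library can run these queries from your application code. As a minimal sketch using the test1 index from this tutorial, even a non-interactive mysql call from a script works:
mysql -h0 -P9306 -e "SELECT id, group_id FROM test1 WHERE MATCH('test');"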
For more tutorials on installing and configuring Sphinx, visit [Tencent Cloud + Community](https://cloud.tencent.com/developer?from=10680).
Reference: "How To Install and Configure Sphinx on Ubuntu 16.04"