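PHPWind Tutorial: Installing Sphinx Full-Text Indexing
Start with some background material on Sphinx full-text indexing:
Official site: http://www.sphinxsearch.com/
Official documentation: http://www.sphinxsearch.com/docs/
Chinese-language support: http://www.coreseek.cn/
Chinese manual download: http://www.coreseek.cn/uploads/pdf/sphinx_doc_zhcn_0.9.pdf
The official documentation and the Chinese manual above cover most of what you need to install and use Sphinx full-text indexing. Some details still take repeated searching on Google and Baidu, so to save everyone time, this is a complete walkthrough of installing Sphinx and using it with PHPWind (supported in PHPWind 7.5).
Let's get started.
Given how Sphinx full-text indexing is used in practice, this tutorial focuses on its Chinese-language support.
Thanks go to Li Monan (李沫南) for his contributions to Chinese support in Sphinx full-text indexing.
I. Installing Sphinx on Windows
1. Preparation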
Source: http://www.coreseek.cn/products/ft_down/
Download csft3.1: http://www.coreseek.cn/uploads/csft/3.1/win32/csft3.1.bin.zip
Download the standard dictionary: http://www.coreseek.cn/uploads/csft/3.1/data.zip
Extract csft3.1.bin.zip into C:\csft3.1
Extract data.zip into C:\csft3.1\data [the word-segmentation package]

(1) Copy C:\csft3.1\conf\csft.conf.in to C:\csft3.1\bin\ and rename it to csft.conf
In csft.conf, look for lines such as: path = @CONFDIR@/data/test1
Replace @CONFDIR@ with C:\csft3.1\ so that the line above becomes: path = C:\csft3.1\data\test1
(2) Import the test data C:\csft3.1\conf\example.sql into the database
(3) Build the index by running the following at the DOS prompt: indexer.exe --all

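Set the database parameters in csft.conf to match your database:
sql_host = localhost # database host
sql_user = test # database user with full privileges on the database
sql_pass =
sql_db = test # database name
sql_port = 3306 # port; usually no need to change
Leave the other settings at their defaults for now and try out Sphinx full-text indexing first.
(4) Check that searching works by running: search.exe test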

If everything is working, the command returns the search results.
(5) Start the search daemon by running: searchd.exe
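Step (1) above comes down to a search-and-replace on the config template. As a minimal sketch, assuming a Unix-like shell with sed is available (for example Cygwin on Windows; at a plain DOS prompt you would edit the file by hand), the substitution could look like this. The helper name fill_confdir is hypothetical and not part of csft, and the forward slashes in the prefix are an assumption made so that sed needs no extra escaping.

```shell
#!/bin/sh
# fill_confdir TEMPLATE PREFIX
# Print TEMPLATE with every @CONFDIR@ placeholder replaced by PREFIX.
# Hypothetical helper; csft itself ships no such script.
fill_confdir() {
    template=$1
    prefix=$2
    # Use '|' as the sed delimiter so slashes in PREFIX need no escaping.
    sed "s|@CONFDIR@|${prefix}|g" "$template"
}

# Example: render a one-line template into a real config line.
sample=$(mktemp)
printf 'path = @CONFDIR@/data/test1\n' > "$sample"
fill_confdir "$sample" "C:/csft3.1"
rm -f "$sample"
```

Redirect the output to C:\csft3.1\bin\csft.conf (or the equivalent path in your shell) to produce the working config.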

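See also: the configuration that lets PHPWind use Sphinx full-text indexing (an example is given in the appendix).
Installing Sphinx on Windows with csft is very simple. If you are interested, you can also download and install Sphinx from the official site [www.sphinxsearch.com]; that route is somewhat more involved and is not covered here.
II. Installing Sphinx Full-Text Indexing on Linux (CentOS 5.3 as the example)
The Windows installation is really only for trying Sphinx out; Linux is the proper platform for running it.
To walk through the whole process on CentOS, I reinstalled the system and installed Sphinx from scratch.
The Coreseek full-text search server already bundles Sphinx with the Chinese word-segmentation patch, so you only need to download MMSeg and Coreseek Fulltext Server (source code) to get a working Sphinx service.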
Download page: http://www.coreseek.cn/products/ft_down/
Installing from source is recommended.
1. Preparation [skip anything already installed; if you need something that is not listed here, install it as well]
1) Install mysql
2) Install php
3) Install apache
4) Install python
5) Install libiconv
6) Install gcc-c++
7) Download Coreseek Fulltext Server (source code): http://www.coreseek.cn/uploads/csft/3.1/Source/csft-3.1.tar.gz
8) Download Coreseek Mmseg (source code): http://www.coreseek.cn/uploads/csft/3.1/Source/mmseg-3.1.tar.gz
Run the following command:
yum install python python-devel
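Before building, it can help to see at a glance which prerequisites are still missing. The following is a minimal sketch, not part of the Coreseek distribution; the helper name missing_tools is hypothetical, and the command names passed to it are assumptions about how each package shows up on PATH (for example httpd for apache).

```shell
#!/bin/sh
# missing_tools NAME...
# Print each NAME that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names are assumptions; adjust them to your system.
missing=$(missing_tools mysql php httpd python iconv g++ wget)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found."
fi
```

A tool being on PATH does not prove that its development headers are installed, so packages such as mysql-devel still need to be checked with yum.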
2. Installation steps
(1) Download CSFT and MMseg
# wget http://www.coreseek.cn/uploads/csft/3.1/Source/mmseg-3.1.tar.gz
# wget http://www.coreseek.cn/uploads/csft/3.1/Source/csft-3.1.tar.gz
(2) Install the MMseg Chinese word segmenter
# pwd
/usr/local [confirm the directory you are working in]
# wget http://www.coreseek.cn/uploads/csft/3.1/Source/mmseg-3.1.tar.gz
# tar xzvf mmseg-3.1.tar.gz
# mkdir /usr/local/mmseg
# cd mmseg-3.1
# ./configure --prefix=/usr/local/mmseg
# make
# make install
Run the following to check that mmseg was installed successfully:
# /usr/local/mmseg/bin/mmseg
Coreseek COS(tm) MM Segment 1.0
Copyright By Coreseek.com All Right Reserved.
Usage: /usr/local/mmseg/bin/mmseg
-u
-r Combine with -u, used a plain text build Unigram Dictionary, default Off
-b
-h print this help and exit
(3) Install csft-3.1
# pwd
/usr/local
# wget http://www.coreseek.cn/uploads/csft/3.1/Source/csft-3.1.tar.gz
# tar xzvf csft-3.1.tar.gz
# mkdir /usr/local/csft
# cd csft-3.1
# ./configure --prefix=/usr/local/csft \
    --with-mmseg=/usr/local/mmseg/bin/mmseg \
    --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ \
    --with-mmseg-libs=/usr/local/mmseg/lib/
# make
# make install
make may fail at this point. To resolve it:
1. Check that the following packages are installed
# yum install mysql mysql-devel php-mysql qt4-mysql [the MySQL environment must be installed first]
# yum install python python-devel
2. Check that libiconv is installed
Download: http://savannah.gnu.org/projects/libiconv/
3. If errors remain, edit src/Makefile
# vi src/Makefile [go to line 182]

Change
LIBS = -lm -lz -lexpat -L/usr/local/lib -lpthread
to
LIBS = -lm -lz -lexpat -liconv -L/usr/local/lib -lpthread
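The Makefile change in item 3 can also be scripted, which avoids depending on the exact line number. This is a minimal sketch under the assumption that the LIBS line looks like the one shown above; the helper name add_liconv is hypothetical. It is written so that a second run leaves the file unchanged.

```shell
#!/bin/sh
# add_liconv MAKEFILE
# Insert -liconv after -lexpat on the "LIBS =" line, unless it is already there.
# Hypothetical helper; edits through a temporary file so any POSIX sed works.
add_liconv() {
    makefile=$1
    tmp=$(mktemp) || return 1
    sed '/^LIBS = /{
/-liconv/!s/-lexpat/-lexpat -liconv/
}' "$makefile" > "$tmp" && cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demonstration on a throwaway file rather than the real src/Makefile.
demo=$(mktemp)
printf 'LIBS = -lm -lz -lexpat -L/usr/local/lib -lpthread\n' > "$demo"
add_liconv "$demo"
add_liconv "$demo"   # the second run changes nothing
cat "$demo"
rm -f "$demo"
```

On a real source tree the call would be add_liconv src/Makefile, run from the csft-3.1 directory before make.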
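If all went well, you can now configure your Sphinx full-text index server. [If you run into installation problems, feel free to ask on the official PHPWind site.]
3. Configuration
# cp /usr/local/csft/etc/sphinx-min.conf.dist /usr/local/csft/etc/sphinx.conf
Edit the database parameters in sphinx.conf, the same way as on Windows: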
sql_host = localhost
sql_user = root
sql_pass =
sql_db = test
4. Import the sample data /usr/local/csft/etc/example.sql into the database
5. Build the index
# /usr/local/csft/bin/indexer --all
6. Test searching
# /usr/local/csft/bin/search test
If the test returns results, your Sphinx full-text index server is configured correctly.
7. Configuring Chinese support
UTF-8 example [if you already have a utf8 database there is no need to create a new one; this is only an illustration]
1) Create a new database, for example phpwind, with the collation utf8_general_ci
2) Import some existing GBK data, for example pw_threads
3) Configure csft.conf as follows
Source section:
sql_host = localhost
sql_user = root
sql_pass =
sql_db = phpwind
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type=OFF
sql_query = SELECT tid,fid,authorid,subject FROM pw_threads
sql_attr_uint = fid
sql_attr_uint = authorid
Index section:
charset_type = zh_cn.utf-8
charset_dictpath = /usr/local/csft/
min_prefix_len = 0
min_infix_len = 0
min_word_len = 2
4) Build the dictionary
# pwd
/usr/local/mmseg-3.1/data [the data directory under the location where you extracted mmseg]
Run the following command:
# /usr/local/mmseg/bin/mmseg -u unigram.txt
# ll
total 10152
-rwxr-xr-x 1 root root 715 06-06 18:40 build_unigram.py
-rwxr-xr-x 1 root root 32674 06-06 18:40 char.stat.txt
-rwxr-xr-x 1 root root 1051268 06-06 18:40 Lexicon_full_words.txt
-rwxr-xr-x 1 root root 1826251 06-06 18:40 unigram.txt
-rw-r--r-- 1 root root 3729280 09-16 20:20 unigram.txt.uni
This generates the file unigram.txt.uni.
# mv unigram.txt.uni uni.lib
# cp uni.lib /usr/local/csft/ [this is the charset_dictpath set in the index configuration above]
Leave everything else at the defaults and build the index as before:
# /usr/local/csft/bin/indexer --all
Test that it works:
# /usr/local/csft/bin/search 测试
That completes the full-text index setup for UTF-8.
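The rename-and-copy part of step 4) is easy to get wrong if mmseg did not actually produce its output file. The following minimal sketch wraps those two commands with a check; the helper name install_dict is hypothetical, and the paths in the usage line are the ones used in this tutorial.

```shell
#!/bin/sh
# install_dict DATA_DIR DICT_DIR
# Rename the generated unigram.txt.uni to uni.lib and copy it into DICT_DIR
# (the directory named by charset_dictpath). Fails early if the file is missing.
install_dict() {
    data_dir=$1
    dict_dir=$2
    if [ ! -s "$data_dir/unigram.txt.uni" ]; then
        echo "unigram.txt.uni not found in $data_dir; run mmseg -u unigram.txt first" >&2
        return 1
    fi
    mv "$data_dir/unigram.txt.uni" "$data_dir/uni.lib" &&
        cp "$data_dir/uni.lib" "$dict_dir/"
}

# Usage with the paths from this tutorial (uncomment on a real install):
# install_dict /usr/local/mmseg-3.1/data /usr/local/csft
```

If indexer later complains about the dictionary, checking that uni.lib exists in the charset_dictpath directory is a reasonable first step.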
GBK example
This works the same way as UTF-8, except that the database and tables use the gbk encoding.
Only the following parts of the configuration [csft.conf] need to change.
Source section:
sql_query_pre = SET NAMES gbk
Index section:
charset_type = zh_cn.gbk
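Since only two lines differ between the UTF-8 and GBK setups, they can be summarized in a small helper. This is an illustrative sketch only; the function name charset_settings is hypothetical, and the values it prints are the ones given in this tutorial.

```shell
#!/bin/sh
# charset_settings ENCODING
# Print the two csft.conf lines that depend on the encoding (utf8 or gbk).
# The first line belongs in the source section, the second in the index section.
charset_settings() {
    case $1 in
        utf8) names=utf8; ctype=zh_cn.utf-8 ;;
        gbk)  names=gbk;  ctype=zh_cn.gbk ;;
        *)    echo "unsupported encoding: $1" >&2; return 1 ;;
    esac
    printf 'sql_query_pre = SET NAMES %s\n' "$names"
    printf 'charset_type = %s\n' "$ctype"
}

charset_settings utf8
charset_settings gbk
```

The database encoding, the SET NAMES value and charset_type all have to agree; mixing them is a likely cause of searches that return nothing for Chinese terms.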
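Note that to test GBK support you can write a PHP file that calls the API bundled with Sphinx. The searchd process must be running:
# /usr/local/csft/bin/searchd
Write the following code [it must be saved in the same directory as sphinxapi.php]
sphinxapi.php is located in /usr/local/csft-3.1/api/
You can also test directly with test.php from the api directory.
<?php
// Load the PHP client shipped in the api directory.
require_once 'sphinxapi.php';
$sc = new SphinxClient();
// Host and port where searchd listens.
$sc->SetServer('127.0.0.1', 3312);
$sc->SetConnectTimeout(1);
$sc->SetWeights(array(100, 1));
// Match only documents that contain all query words.
$sc->SetMatchMode(SPH_MATCH_ALL);
// Return matches as a plain array rather than a hash keyed by document ID.
$sc->SetArrayResult(true);
$res = $sc->Query("简单");
// Query() returns false on failure; otherwise this dumps the result set.
var_dump($res);
?>
You can also run the search tool directly [UTF-8 version], as follows: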

[root@localhost ~]# /usr/local/csft/bin/search 便宜
Coreseek Full Text Server 3.1
Copyright (c) 2006-2008 coreseek.com
using config file '/usr/local/csft/etc/csft.conf'...
index 'test1': query '便宜 ': returned 4 matches of 4 total in 0.015 sec
displaying matches:
1. document=3, weight=1, fid=7, authorid=1
2. document=97, weight=1, fid=35, authorid=1
3. document=108, weight=1, fid=32, authorid=1
4. document=146, weight=1, fid=7, authorid=1
words:
1. '便宜': 4 documents, 4 hits
If the query returns false, check that the searchd process is running. If it returns results, congratulations: you are now a Sphinx user and ready to move on to more advanced use.
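One way to check whether searchd is running is to look at its pid file. The sketch below assumes the pid_file location used in the appendix configuration (/usr/local/csft/var/log/searchd.pid) and that you run it as root or as the user that owns the searchd process; the helper name searchd_running is hypothetical.

```shell
#!/bin/sh
# searchd_running PID_FILE
# Succeed if PID_FILE exists and names a live process.
searchd_running() {
    pid_file=$1
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # Reject empty or non-numeric contents before calling kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only tests for existence; it does not affect the process.
    kill -0 "$pid" 2>/dev/null
}

# The path is the pid_file value from the appendix configuration.
if searchd_running /usr/local/csft/var/log/searchd.pid; then
    echo "searchd appears to be running"
else
    echo "searchd does not appear to be running"
fi
```

A stale pid file whose number has been reused by another process would produce a false positive, so treat the result as a hint rather than proof.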
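III. Afterword
I would have liked to make a video tutorial of the installation, but time is limited. Some details are bound to cause trouble along the way, but if you follow the steps above one at a time you should be able to get Sphinx working. If you run into problems, see http://www.sphinxsearch.com/ and http://www.coreseek.cn/ for more help, and consult the Chinese manual.
You can also ask questions and share your installation experience on the official PHPWind site, www.phpwind.net; spelling out every detail helps others as well as yourself. BY [email protected] 2009-9-17
Other links
Build a custom search engine with PHP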
http://www.ibm.com/developerworks/cn/opensource/os-php-sphinxsearch/index.html
MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm
http://technology.chtsai.org/mmseg/
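Appendix: PHPWind configuration example [GBK version]
A Sphinx configuration example for PHPWind search [adjust a few parameters and it can be used directly with PHPWind]
Notes:
The full-text index below uses a main index plus a delta (incremental) index; see the manual for the background.
A counter table must be created [choose the encoding yourself; the example below uses gbk]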
CREATE TABLE IF NOT EXISTS `search_counter` (
`counterid` int(11) NOT NULL DEFAULT '0',
`max_doc_id` int(11) NOT NULL DEFAULT '0',
`min_doc_id` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`counterid`)
) ENGINE=MyISAM DEFAULT CHARSET=gbk;
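The counter table feeds sql_query_range, and sql_range_step then splits that range into chunks that are substituted for $start and $end in sql_query. The sketch below illustrates the arithmetic as I understand it from the Sphinx documentation: both bounds are inclusive and the last chunk is clamped to the maximum ID. The function name range_steps is hypothetical.

```shell
#!/bin/sh
# range_steps MIN MAX STEP
# Print the "$start $end" pairs a ranged query would be run with,
# assuming inclusive bounds and a final range clamped to MAX.
range_steps() {
    start=$1
    max=$2
    step=$3
    while [ "$start" -le "$max" ]; do
        end=$((start + step - 1))
        [ "$end" -gt "$max" ] && end=$max
        echo "$start $end"
        start=$((end + 1))
    done
}

# IDs 1..2500 with a step of 1000 give three chunks.
range_steps 1 2500 1000
```

Because consecutive chunks do not overlap, a WHERE clause of the form tid >= $start AND tid <= $end covers every ID exactly once, whereas tid > $start would skip the first ID of each chunk.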
The csft.conf configuration file:
source tmsgs
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass = xxxx
sql_db = phpwind
sql_port = 3307 # optional, default is 3306
sql_sock = /tmp/mysql3307.sock
sql_query_pre = SET NAMES gbk
sql_query_pre = SET SESSION query_cache_type=OFF
sql_query_pre = REPLACE INTO search_counter SELECT 1,MAX(tid),MIN(tid) FROM pw_tmsgs
sql_query_range = SELECT min_doc_id, max_doc_id FROM search_counter WHERE counterid = 1
sql_range_step = 1000
# $start and $end are inclusive, so use >= to avoid skipping the first tid of each range
sql_query = SELECT \
    th.tid,th.subject,th.authorid,th.postdate,th.lastpost,th.fid,th.digest,th.hits,th.replies,t.content \
    FROM pw_threads th LEFT JOIN pw_tmsgs t USING(tid) \
    WHERE th.tid >= $start AND th.tid <= $end
sql_attr_uint = authorid
sql_attr_uint = hits
sql_attr_uint = replies
sql_attr_uint = fid
sql_attr_timestamp = postdate
sql_attr_timestamp = lastpost
sql_attr_uint = digest
}
source addtmsgs
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass = xxxx
sql_db = phpwind
sql_port = 3307 # optional, default is 3306
sql_sock = /tmp/mysql3307.sock
sql_query_pre = SET NAMES gbk
sql_query_pre = SET SESSION query_cache_type=OFF
sql_query_range = SELECT max_doc_id, max_doc_id+100000 FROM search_counter WHERE counterid = 1
sql_range_step = 100000
sql_query = SELECT \
    th.tid,th.subject,th.authorid,th.postdate,th.lastpost,th.fid,th.digest,th.hits,th.replies,t.content \
    FROM pw_threads th LEFT JOIN pw_tmsgs t USING(tid) \
    WHERE th.tid > $start AND th.tid <= $end
sql_attr_uint = authorid
sql_attr_uint = hits
sql_attr_uint = replies
sql_attr_uint = fid
sql_attr_timestamp = postdate
sql_attr_timestamp = lastpost
sql_attr_uint = digest
sql_query_post = REPLACE INTO search_counter SELECT 1,MAX(tid),MIN(tid) FROM pw_tmsgs
#sql_attr_uint = tid
}
source threads
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass = xxxxxxx
sql_db = phpwind
sql_port = 3307 # optional, default is 3306
sql_sock = /tmp/mysql3307.sock
sql_query_pre = SET NAMES gbk
sql_query_pre = SET SESSION query_cache_type=OFF
sql_query_pre = REPLACE INTO search_counter SELECT 3,MAX(tid),MIN(tid) FROM pw_threads
sql_query_range = SELECT min_doc_id, max_doc_id FROM search_counter WHERE counterid = 3
sql_range_step = 1000
# $start and $end are inclusive, so use >= to avoid skipping the first tid of each range
sql_query = SELECT \
    th.tid,th.subject,th.authorid,th.postdate,th.lastpost,th.fid,th.digest,th.hits,th.replies \
    FROM pw_threads th WHERE th.tid >= $start AND th.tid <= $end
sql_attr_uint = authorid
sql_attr_uint = hits
sql_attr_uint = replies
sql_attr_uint = fid
sql_attr_timestamp = postdate
sql_attr_timestamp = lastpost
sql_attr_uint = digest
}
source addthreads
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass = xxx
sql_db = phpwind
sql_port = 3307 # optional, default is 3306
sql_sock = /tmp/mysql3307.sock
sql_query_pre = SET NAMES gbk
sql_query_pre = SET SESSION query_cache_type=OFF
sql_query_range = SELECT max_doc_id, max_doc_id+100000 FROM search_counter WHERE counterid = 3
sql_range_step = 100000
sql_query = SELECT \
    th.tid,th.subject,th.authorid,th.postdate,th.lastpost,th.fid,th.digest,th.hits,th.replies \
    FROM pw_threads th WHERE th.tid > $start AND th.tid <= $end
sql_attr_uint = authorid
sql_attr_uint = hits
sql_attr_uint = replies
sql_attr_uint = fid
sql_attr_timestamp = postdate
sql_attr_timestamp = lastpost
sql_attr_uint = digest
sql_query_post = REPLACE INTO search_counter SELECT 3,MAX(tid),MIN(tid) FROM pw_threads
#sql_attr_uint = tid
}
index tmsgsindex
{
source = tmsgs
path = /usr/local/csft/var/data/tmsgs
docinfo = extern
charset_type = zh_cn.gbk
#min_prefix_len = 0
#min_infix_len = 2
#ngram_len = 2
charset_dictpath = /usr/local/csft/
min_prefix_len = 0
min_infix_len = 0
min_word_len = 2
}
index addtmsgsindex
{
source = addtmsgs
path = /usr/local/csft/var/data/addtmsgs
docinfo = extern
charset_type = zh_cn.gbk
#min_infix_len = 2
#ngram_len = 2
charset_dictpath = /usr/local/csft/
min_prefix_len = 0
min_infix_len = 0
min_word_len = 2
}
index threadsindex
{
source = threads
path = /usr/local/csft/var/data/threads
docinfo = extern
charset_type = zh_cn.gbk
#min_prefix_len = 0
#min_infix_len = 2
#ngram_len = 2
charset_dictpath = /usr/local/csft/
min_prefix_len = 0
min_infix_len = 0
min_word_len = 2
}
index addthreadsindex
{
source = addthreads
path = /usr/local/csft/var/data/addthreads
docinfo = extern
charset_type = zh_cn.gbk
#min_infix_len = 2
#ngram_len = 2
charset_dictpath = /usr/local/csft/
min_prefix_len = 0
min_infix_len = 0
min_word_len = 2
}
indexer
{
mem_limit = 128M
}
searchd
{
port = 3312
log = /usr/local/csft/var/log/searchd.log
query_log = /usr/local/csft/var/log/query.log
read_timeout = 5
max_children = 30
pid_file = /usr/local/csft/var/log/searchd.pid
max_matches = 1000
seamless_rotate = 1
preopen_indexes = 0
unlink_old = 1
}
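With a main-plus-delta layout like the one above, the delta indexes have to be rebuilt periodically, and because each delta source advances search_counter in its sql_query_post, the delta presumably needs to be merged into its main index before the next delta run. The following is only a sketch of one possible schedule, assuming the bundled indexer supports --rotate and --merge the way upstream Sphinx does. INDEXER, DRY_RUN and the helper names are illustrative, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a periodic delta refresh for the index names used above.
# INDEXER and DRY_RUN are illustrative variables, not csft settings.
INDEXER=${INDEXER:-/usr/local/csft/bin/indexer}
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# refresh_delta MAIN DELTA: rebuild DELTA, then merge it into MAIN.
refresh_delta() {
    run "$INDEXER" --rotate "$2" &&
        run "$INDEXER" --merge "$1" "$2" --rotate
}

refresh_delta tmsgsindex addtmsgsindex
refresh_delta threadsindex addthreadsindex
```

On a live server searchd needs a moment to pick up a rotated index, so a real cron job would probably want a pause or a check between the two steps; that is left out here.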