Sphinx Incremental Indexing

Category: Other  2023-02-18 20:19:37

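Once Sphinx has built an index, adding even a single row to the database means the index has to be rebuilt. When the data set is very large, rebuilding the whole index every time is clearly impractical.

What we want instead is to index only the newly added data each time.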

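How incremental indexing works

Suppose the database currently holds three rows with ids 1, 2, and 3.

Use the indexer command to build an index over these three rows, and record max_doc_id=3 in a table (sphinx_counter) to indicate that the main index now covers every row with id<=3.

Suppose new rows with ids 4 and 5 are then inserted. Build an index over just these two rows, called the incremental (delta) index, and update sphinx_counter to max_doc_id=5.
Finally, use indexer --merge to merge the delta index into the main index.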
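
The bookkeeping in these steps can be sketched as a toy shell model. It does not talk to Sphinx or MySQL; the variable max_doc_id stands in for the sphinx_counter row, and the function names delta_ids and advance_counter are hypothetical helpers introduced only for illustration.

```shell
#!/bin/bash
# Toy model of the counter bookkeeping only; no Sphinx or MySQL involved.
# max_doc_id stands in for sphinx_counter.max_doc_id (counter_id = 1).
max_doc_id=0

# Print the ids (taken from the arguments) that a delta query would select,
# i.e. those with id > max_doc_id.
delta_ids() {
    selected=""
    for id in "$@"; do
        if [ "$id" -gt "$max_doc_id" ]; then
            selected="$selected $id"
        fi
    done
    # Strip the leading space before printing.
    echo "${selected# }"
}

# Move the counter up to the largest id among the arguments,
# mirroring "replace into sphinx_counter select 1,max(id) from sphinx_data".
advance_counter() {
    for id in "$@"; do
        if [ "$id" -gt "$max_doc_id" ]; then
            max_doc_id=$id
        fi
    done
}

advance_counter 1 2 3      # main index built over ids 1..3
delta_ids 1 2 3 4 5        # prints: 4 5
advance_counter 1 2 3 4 5  # counter moves to 5 once the delta is accounted for
delta_ids 1 2 3 4 5        # prints an empty line: nothing new to index
```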

The sphinx_counter table is defined as follows:

DROP TABLE IF EXISTS `sphinx_counter`;
CREATE TABLE `sphinx_counter` (
  `counter_id` int(11) NOT NULL COMMENT 'identifies the source data table',
  `max_doc_id` int(11) NOT NULL COMMENT 'largest document id already indexed',
  PRIMARY KEY (`counter_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The sphinx_data table is defined as follows:

CREATE TABLE `sphinx_data` (
	`id` INT(11) NOT NULL AUTO_INCREMENT,
	`group_id` INT(11) NOT NULL,
	`group_id2` INT(11) NOT NULL,
	`date_added` DATETIME NOT NULL,
	`title` VARCHAR(255) NOT NULL,
	`content` TEXT NOT NULL,
	PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Incremental indexing configuration

# Main index data source
source article_src
{
    type            = mysql

    sql_host        = localhost
    sql_user        = root
    sql_pass        = root
    sql_db          = test
    sql_port        = 3306  # optional, default is 3306


    sql_query_pre      = SET NAMES utf8
    sql_query_pre      = SET SESSION query_cache_type=OFF
    # When building the main index, store the largest document id in sphinx_counter
    sql_query_pre      = replace into sphinx_counter select 1,max(id) from sphinx_data
    # A multi-line value needs a trailing backslash; DATETIME is converted for sql_attr_timestamp
    sql_query          = SELECT id, id as user_id, group_id, UNIX_TIMESTAMP(date_added) as date_added, title, content FROM sphinx_data \
                        where id <= (select max_doc_id from sphinx_counter where counter_id = 1)

    sql_attr_uint		= group_id
    sql_attr_uint		= user_id
    sql_attr_timestamp	= date_added
}

# Main index
index article_index
{
    source          = article_src
    path            = E:\sphinx\data\article
}

# Delta index data source
source article_delta_src : article_src
{
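    # Overriding sql_query_pre drops the inherited replace into, so building the delta does not advance the counter
    sql_query_pre = SET NAMES utf8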

    sql_query_range               =
    sql_range_step                = 10000
    # The delta index only fetches newly added rows
    sql_query   = SELECT id, id as user_id, group_id, UNIX_TIMESTAMP(date_added) as date_added, title, content FROM sphinx_data \
                 where id > (select max_doc_id from sphinx_counter where counter_id = 1)

 }

# Delta index
index article_delta_index : article_index
{
    source   = article_delta_src
    path     = E:\sphinx\data\article_delta
}

Scheduled indexing jobs
We need a script that builds the delta index, and a scheduler that runs it periodically.

#!/bin/bash
# Build the delta index
/usr/local/sphinx/bin/indexer article_delta_index --rotate
# Merge the delta index into the main index
/usr/local/sphinx/bin/indexer --merge article_index article_delta_index --rotate

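--rotate makes indexer build the new index files alongside the live ones and then signal searchd to switch over to them. Without this option, searchd has to be stopped before the index can be rebuilt.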
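
One possible refinement of the delta script is to skip the merge when the delta build itself fails. The sketch below is only an illustration under that assumption: the function name run_delta_cycle and the INDEXER override variable are hypothetical, while the indexer path and arguments are the same ones used in the script above.

```shell
#!/bin/bash
# INDEXER defaults to the path used in the delta script; the override exists
# only so the function can be exercised without a Sphinx installation.
INDEXER="${INDEXER:-/usr/local/sphinx/bin/indexer}"

# Build the delta index, then merge it into the main index.
# Returns 1 if the delta build fails (the merge is skipped),
# 2 if the merge fails, and 0 if both steps succeed.
run_delta_cycle() {
    if ! "$INDEXER" article_delta_index --rotate; then
        echo "delta build failed; skipping merge" >&2
        return 1
    fi
    if ! "$INDEXER" --merge article_index article_delta_index --rotate; then
        echo "merge failed" >&2
        return 2
    fi
    return 0
}
```

A scheduled script would define the function and then end with a line such as `run_delta_cycle || exit $?`, so that the scheduler sees a non-zero exit status when either step fails.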

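We usually also create a main-index script and run it once a day in the middle of the night, when traffic is low, to rebuild the entire index from scratch.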

#!/bin/bash

/usr/local/sphinx/bin/indexer article_index --rotate
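
Since the delta job and the nightly rebuild are both run on a schedule, they could overlap. One way to guard against that, sketched here as an assumption rather than something the original setup includes, is a simple lock directory shared by both scripts. The function name with_index_lock, the LOCK_DIR path, and the exit code 75 are all hypothetical choices.

```shell
#!/bin/bash
# LOCK_DIR is a hypothetical location. mkdir either creates the directory or
# fails atomically, so only one job at a time can hold the lock.
LOCK_DIR="${LOCK_DIR:-/tmp/sphinx_indexer.lock}"

# Run the given command only if no other indexing job holds the lock.
# Returns 75 when the lock is already taken, otherwise the command's exit status.
with_index_lock() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "another indexing job is running; skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$LOCK_DIR"
    return "$status"
}
```

The nightly script could then call `with_index_lock /usr/local/sphinx/bin/indexer article_index --rotate`, and the delta script could wrap its own commands the same way.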

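Notes
1. If the delta index is not merged into the main index, each new delta build overwrites the previous one, so documents that were covered only by the previous delta are no longer indexed.
2. Merging costs roughly twice the combined size of the index files in IO, because both indexes are read once and the result is written once. For a 100 GB main index and a 2 GB delta index, that is (100+2)*2 = 204 GB of IO. This is substantial, but still far less than rebuilding the entire index.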
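
The estimate in note 2 can be written as a small helper for trying other index sizes. The function name merge_io_gb is hypothetical, and sizes are assumed to be whole gigabytes.

```shell
#!/bin/bash
# Estimate merge IO in GB: both indexes are read once and the merged result
# is written once, so the total is roughly (main + delta) * 2.
merge_io_gb() {
    echo $(( ($1 + $2) * 2 ))
}

merge_io_gb 100 2   # prints 204
```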
