What is ft in MySQL

Published: 2023-06-28 11:30:43 | Category: MySQL tutorials | Source: unknown

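　　This article explains in detail what "mysql ft" is, with clear steps and careful handling of the details. I hope it clears up your questions; follow along step by step and learn something new.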

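　　In MySQL, ft stands for FullText, i.e. the full-text index. Full-text indexes are designed for similarity-based queries rather than exact value comparisons, and on large data sets they can be many times faster than LIKE; the difference can exceed an order of magnitude.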

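　　MySQL Full-Text Indexes (FullText)

　　1. Introduction

　　Basic concepts

　　Full-text indexes are designed for similarity-based queries rather than exact value comparisons.

　　LIKE with % wildcards can also do fuzzy matching, but it is impractical for searching large amounts of text: a pattern with a leading %, such as '%keyword%', cannot use an ordinary B-tree index, so every row has to be scanned. On large data sets a full-text index can be many times faster than LIKE, often by more than an order of magnitude.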
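　　The speed difference comes from the data structure. The Python sketch below is a toy model, not InnoDB's actual on-disk format, and its sample rows and function names are hypothetical: a LIKE '%word%' search must inspect every row's text, while a full-text index is an inverted index that maps each token to the rows containing it, so a search becomes a lookup.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a TEXT column.
rows = {
    1: "red grape juice",
    2: "green apple",
    3: "grape seed oil",
    4: "apple pie",
    5: "grapefruit tea",
}

def like_scan(rows, word):
    """LIKE '%word%' style: check every row for the substring."""
    return sorted(rid for rid, text in rows.items() if word in text)

def build_inverted_index(rows):
    """Map each whitespace-separated token to the set of row ids that contain it."""
    index = defaultdict(set)
    for rid, text in rows.items():
        for token in text.split():
            index[token].add(rid)
    return index

def index_lookup(index, word):
    """Full-text style: one dictionary lookup returns the matching row ids."""
    return sorted(index.get(word, set()))

index = build_inverted_index(rows)
print(like_scan(rows, "grape"))      # [1, 3, 5]
print(index_lookup(index, "grape"))  # [1, 3]
```

　　The two approaches can return different rows: the substring scan also matches "grapefruit", while the token lookup does not. This is one way full-text results can diverge from LIKE results.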

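　　Version support

　　Before MySQL 5.6, only the MyISAM storage engine supported full-text indexes.

　　From MySQL 5.6 onward, both MyISAM and InnoDB support full-text indexes.

　　MySQL 5.7.6 added a built-in ngram full-text parser that supports Chinese, Japanese, and Korean (CJK), plus an installable MeCab full-text parser plugin for Japanese.

　　Full-text indexes can only be used with InnoDB or MyISAM tables, and can only be created on CHAR, VARCHAR, and TEXT columns.

　　For large data sets, loading the data into a table without a full-text index and then creating the index is much faster than loading the data into a table that already has one.

　　RDS MySQL 5.6 also supports Chinese full-text search, but its implementation has bugs.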
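　　The ngram parser splits text into overlapping runs of ngram_token_size characters (2 by default). The Python function below is a simplified model of that tokenization, following the examples in the MySQL manual: whitespace is never part of a token, and a segment shorter than the token size yields no tokens. The manual also states that in boolean mode a term is searched as a phrase made of its ngrams, which boolean_phrase() mimics. Both function names are hypothetical, and stopword handling is omitted.

```python
def ngram_tokens(text, n=2):
    """Split text into overlapping n-character tokens, ngram-parser style.

    Whitespace separates segments and never appears inside a token;
    segments shorter than n produce no tokens.
    """
    tokens = []
    for segment in text.split():
        for i in range(len(segment) - n + 1):
            tokens.append(segment[i:i + n])
    return tokens

def boolean_phrase(term, n=2):
    """Rewrite a boolean-mode search term as the ngram phrase it is matched as."""
    return '"' + " ".join(ngram_tokens(term, n)) + '"'

print(ngram_tokens("葡萄酒"))    # ['葡萄', '萄酒']
print(ngram_tokens("a bc"))      # ['bc']
print(boolean_phrase("葡萄酒"))  # "葡萄 萄酒"
```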

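　　Limitations and drawbacks

　　They consume a lot of disk space. A full-text index trades disk space for speed; it is large because the text is tokenized into words according to the rules of a particular language.

　　Building a full-text index is slow, and every data modification on a table that has one is also slower.

　　Full-text indexes are not transparent to the application. To use one, you must rewrite your queries: existing queries cannot use a full-text index until they are changed to the MATCH() ... AGAINST() syntax.

　　Searches are case-insensitive by default (a case-sensitive or binary collation on the indexed columns changes this).

　　Partitioned tables do not support full-text search.

　　A full-text index over multiple columns requires all of them to use the same character set and collation.

　　Full-text results can differ from LIKE results: the rows found through the full-text index may not be the same rows LIKE finds.

　　The columns in MATCH() must exactly match the columns of a FULLTEXT index definition, except for IN BOOLEAN MODE searches on MyISAM tables (which can search unindexed columns, but very slowly).

　　If each column has its own single-column full-text index, a MATCH() over several columns cannot use them.

　　A single MATCH() cannot span full-text indexes from different tables; write a separate MATCH() for each table and combine them with OR.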
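　　The column-list rule can be expressed as a small check. This Python sketch uses a hypothetical helper that treats the column list as unordered, which is an assumption of the sketch. It looks for a FULLTEXT index whose columns equal those in MATCH(), using the salebilldetail index names from the test section of this article.

```python
def find_fulltext_index(match_columns, fulltext_indexes):
    """Return the FULLTEXT index whose columns equal the MATCH() columns, else None.

    Assumption of this sketch: column order is ignored (lists compared as sets).
    """
    wanted = frozenset(match_columns)
    for name, columns in fulltext_indexes.items():
        if frozenset(columns) == wanted:
            return name
    return None

# Single-column indexes only, as with remarks_fulltext and goodsremarks_fulltext.
indexes = {
    "remarks_fulltext": ["remarks"],
    "goodsremarks_fulltext": ["goodsremarks"],
}
print(find_fulltext_index(["remarks", "goodsremarks"], indexes))  # None
print(find_fulltext_index(["remarks"], indexes))                  # remarks_fulltext

# Adding the combined index makes the two-column MATCH() usable.
indexes["remarks_goodsremarks_fulltext"] = ["remarks", "goodsremarks"]
print(find_fulltext_index(["remarks", "goodsremarks"], indexes))  # remarks_goodsremarks_fulltext
```

　　With only the two single-column indexes, MATCH(remarks, goodsremarks) finds no usable index; on InnoDB, MySQL rejects such a query with the error "Can't find FULLTEXT index matching the column list".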

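　　2. Working with full-text indexes

　　2.1 Configuring the minimum search length

　　You can view the currently configured minimum search length (token length) with the following SQL command. The ft_* variables apply to MyISAM; the InnoDB equivalents are named innodb_ft_* and can be listed with SHOW VARIABLES LIKE 'innodb_ft%'.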

　　SHOW VARIABLES LIKE 'ft%';

　　Variable_name Value

　　ft_boolean_syntax + -><()~*:""&|

　　ft_max_word_len 84

　　ft_min_word_len 1

　　ft_query_expansion_limit 20

　　ft_stopword_file (built-in)

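　　None of the full-text parameters can be changed at runtime; they must be set in the MySQL configuration file. To set the minimum search length to 1, open the configuration file /etc/my.cnf and add the following under [mysqld]: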

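　　[mysqld]

　　# Minimum indexed token length for InnoDB (default 3); the ngram parser uses ngram_token_size instead

　　innodb_ft_min_token_size = 1

　　# Minimum indexed word length for MyISAM (default 4)

　　ft_min_word_len = 1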

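　　After editing the file, restart the MySQL server and rebuild the full-text indexes for the change to take effect.

　　For a MyISAM table, the index can be rebuilt with the command below. REPAIR TABLE does not apply to InnoDB; for an InnoDB table, drop and re-create each FULLTEXT index instead.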

　　repair table test quick;
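　　Why the minimum length matters: words shorter than the minimum are never written to the index, so searching for them finds nothing. The Python sketch below is a rough model of the built-in (non-ngram) parser, assuming words are runs of letters, digits, and underscores and ignoring stopwords; the defaults 4 and 84 mirror ft_min_word_len and ft_max_word_len for MyISAM. The function name is hypothetical.

```python
import re

def index_words(text, min_len=4, max_len=84):
    """Return the words that would be indexed: length within [min_len, max_len].

    Rough model: words are runs of \\w characters; stopwords are not handled.
    """
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if min_len <= len(w) <= max_len]

print(index_words("MySQL is a big database"))             # ['mysql', 'database']
print(index_words("MySQL is a big database", min_len=1))  # ['mysql', 'is', 'a', 'big', 'database']
```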

　　2.2 Creating an index

　　Creating a full-text index when creating the table

　　CREATE TABLE fulltext_test (

　　 id int(11) NOT NULL AUTO_INCREMENT,

　　 content TEXT NOT NULL,

　　 tag VARCHAR(255),

　　 PRIMARY KEY (id),

　　 FULLTEXT KEY content_tag_fulltext(content, tag) WITH PARSER ngram

　　) ENGINE = InnoDB DEFAULT CHARSET=utf8mb4;

　　Creating a full-text index on an existing table

　　CREATE FULLTEXT INDEX content_fulltext ON fulltext_test(content) with parser ngram;

　　Creating a full-text index with ALTER TABLE

　　ALTER TABLE fulltext_test ADD FULLTEXT INDEX content_fulltext(content) with parser ngram;

　　2.3 Dropping an index

　　Dropping a full-text index with DROP INDEX

　　DROP INDEX content_fulltext ON fulltext_test;

　　Dropping a full-text index with ALTER TABLE

　　ALTER TABLE fulltext_test DROP INDEX content_fulltext;

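　　3. Searching data

　　3.1 Natural language full-text search

　　By default, or with the IN NATURAL LANGUAGE MODE modifier, MATCH() performs a natural language search over the text collection.

　　SELECT * FROM table_name WHERE MATCH(col1, col2) AGAINST ('term1 term2');

　　Separate search terms with spaces; no commas are needed.

　　A natural language search computes a relevance value for each row against the query. Relevance is based on the number of matching words and how often they occur in the row; the fewer rows in the whole index a word appears in, the more it weighs when matched. Conversely, very common words are effectively ignored: on MyISAM tables, a word that appears in half or more of the rows is not searched at all in natural language mode.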
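　　For InnoDB, the MySQL manual describes a simplified relevance formula: IDF = log10(total rows / rows containing the word) and rank = TF × IDF × IDF, summed over the query words. The Python sketch below applies that formula to whitespace-tokenized toy documents; it is an approximation, not the engine's exact computation, and the function name is hypothetical. It shows why rare words weigh more and why a word present in every row contributes nothing.

```python
import math
from collections import Counter

def relevance(docs, query_words):
    """Score each document as the sum of TF * IDF * IDF over the query words.

    IDF = log10(total_docs / docs_containing_word), so a word present in
    every document has IDF 0 and adds nothing to any score.
    """
    tokenized = [doc.lower().split() for doc in docs]
    total = len(tokenized)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for word in query_words:
            matching = sum(1 for t in tokenized if word in t)
            if matching == 0:
                continue  # word not in any document: no contribution
            idf = math.log10(total / matching)
            score += tf[word] * idf * idf
        scores.append(score)
    return scores

docs = ["mysql fulltext index", "mysql like scan", "mysql ngram parser ngram"]
print(relevance(docs, ["mysql", "ngram"]))  # only the third document scores above 0
```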

　　3.2 Boolean full-text search

　　In a boolean search you control the relevance of individual words in the query by prefixing them with operators:

　　(no operator, the default): the word is optional; rows that contain it rank higher.

　　+ the word must be present.

　　- the word must be absent.

　　> the word increases relevance when present, so those rows rank higher.

　　< the word decreases relevance when present, so those rows rank lower.

　　* wildcard (truncation operator); it can only be appended to the end of a word.

　　~ the word is allowed but contributes negative relevance: rows containing it rank lower, but unlike - they are not excluded.

　　"" double quotes mark a phrase that must match exactly as written, without being split apart; roughly similar to LIKE '%keyword%'.

　　() parentheses group words into subexpressions, for example:

　　+aaa +(>bbb <ccc)    -- rows must contain aaa and at least one of bbb or ccc; rows with bbb rank above rows with ccc

　　... AGAINST('李秀琴 <练习册 <不是人>是个鬼' IN BOOLEAN MODE);
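　　To make the operator semantics concrete, here is a toy Python evaluator that supports only +, -, plain optional words, and a trailing * for prefix matching; >, <, ~, phrases, and parentheses are left out. It returns None when a row is filtered out, and otherwise the number of optional words matched as a stand-in for relevance. As in MySQL, a query made up only of - terms matches nothing. The function name is hypothetical.

```python
def boolean_match(text, query):
    """Evaluate a simplified IN BOOLEAN MODE query against one row's text."""
    words = set(text.lower().split())
    terms = query.lower().split()

    def present(term):
        # A trailing * turns the term into a prefix match (truncation operator).
        if term.endswith("*"):
            return any(w.startswith(term[:-1]) for w in words)
        return term in words

    # A query consisting only of excluded (-) terms matches nothing.
    if all(t.startswith("-") for t in terms):
        return None

    optional_hits = 0
    for token in terms:
        if token[0] in "+-":
            op, term = token[0], token[1:]
        else:
            op, term = "", token
        if op == "+" and not present(term):
            return None          # required word missing
        if op == "-" and present(term):
            return None          # excluded word found
        if op == "" and present(term):
            optional_hits += 1   # optional word raises the ranking
    return optional_hits

print(boolean_match("mysql fulltext search", "+mysql -like"))        # 0
print(boolean_match("mysql fulltext search", "+mysql full* index"))  # 1
print(boolean_match("mysql like search", "+mysql -like"))            # None
```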

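　　4. Test results

　　Test environment: local machine with 4 cores and 16 GB RAM, Windows 10, MySQL 8.0

　　Test data: salebilldetail table 12.76 million rows, salebill table 2.69 million rows, customer table 300,000 rows, goods table 750,000 rows.

　　For the SQL statements under test, the following full-text indexes were added: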

　　CREATE FULLTEXT INDEX billno_fulltext ON salebill(billno) WITH PARSER ngram;

　　CREATE FULLTEXT INDEX remarks_fulltext ON salebill(remarks) WITH PARSER ngram;

　　CREATE FULLTEXT INDEX remarks_fulltext ON salebilldetail(remarks) WITH PARSER ngram;

　　CREATE FULLTEXT INDEX goodsremarks_fulltext ON salebilldetail(goodsremarks) WITH PARSER ngram;

　　CREATE FULLTEXT INDEX remarks_goodsremarks_fulltext ON salebilldetail(remarks, goodsremarks) WITH PARSER ngram;

　　CREATE FULLTEXT INDEX custname_fulltext ON customer(custname) WITH PARSER ngram;

　　CREATE FULLTEXT INDEX goodsname_fulltext ON goods(goodsname) WITH PARSER ngram;

　　CREATE FULLTEXT INDEX goodscode_fulltext ON goods(goodscode) WITH PARSER ngram;

　　Overall, the results were baffling.

　　Here is why; look at the following statements:

　　test_1

　　-- Test 1: plain LIKE query, 0.765 s

　　select 1 from salebilldetail d where d.tid=260434 and ((d.remarks like concat('%','葡萄','%')) or (d.goodsremarks like concat('%','葡萄','%')));

　　test_2

　　-- Test 2: full-text indexes remarks_fulltext and goodsremarks_fulltext, 0.834 s

　　select 1 from salebilldetail d where d.tid=260434 and ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

　　test_3

　　-- Test 3: full-text index remarks_goodsremarks_fulltext, 0.242 s

　　select 1 from salebilldetail d where d.tid=260434 and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

　　test_4

　　-- Test 4: plain LIKE query without the tid filter, 22.654 s

　　select 1 from salebilldetail d where ((d.remarks like concat('%','葡萄','%')) or (d.goodsremarks like concat('%','葡萄','%')));

　　test_5

　　-- Test 5: full-text indexes remarks_fulltext and goodsremarks_fulltext, without the tid filter, 24.855 s

　　select 1 from salebilldetail d where ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

　　test_6

　　-- Test 6: full-text index remarks_goodsremarks_fulltext, without the tid filter, 0.213 s

　　select 1 from salebilldetail d where ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

　　test_7

　　-- Test 7: full-text index remarks_goodsremarks_fulltext, 0.22 s

　　select count(1) from salebilldetail d where d.tid=260434 and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

　　test_8

　　-- Test 8: full-text index remarks_goodsremarks_fulltext, without the tid filter, 0.007 s

　　select count(1) from salebilldetail d where ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode)));

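　　These statements show that the more data there is and the simpler the query, the better the full-text index performs. They also show that one combined index (remarks_goodsremarks_fulltext) is far faster than two single-column indexes joined with OR.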
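　　To put the timings in proportion, the ratios can be computed directly from the numbers reported in tests 1 to 6 (each treated as the single run reported; no other data is assumed).

```python
def speedup(like_seconds, fulltext_seconds):
    """How many times faster the full-text query was than the LIKE query."""
    return like_seconds / fulltext_seconds

# Timings in seconds, as reported in tests 1-6.
pairs = {
    "test 1 vs test 2 (tid filter, two indexes with OR)": (0.765, 0.834),
    "test 1 vs test 3 (tid filter, combined index)": (0.765, 0.242),
    "test 4 vs test 5 (no tid filter, two indexes with OR)": (22.654, 24.855),
    "test 4 vs test 6 (no tid filter, combined index)": (22.654, 0.213),
}

for label, (like_s, ft_s) in pairs.items():
    print(f"{label}: {speedup(like_s, ft_s):.2f}x")
```

　　A ratio below 1 means the full-text version was slower. The two-index OR form comes out at about 0.92x and 0.91x, while the combined index reaches about 3.16x with the tid filter and about 106x without it.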

　　Now look at our business test SQL:

　　test_9

　　-- Test 9

　　select

　　    i.billid

　　    ,if(0,0,i.qty) as qty

　　    ,if(0,0,i.goodstotal) as total

　　    ,if(0,0,i.chktotal) as selfchktotal

　　    ,if(0,0,i.distotal) as distotal

　　    ,if(0,0,i.otherpay) as feetotal

　　    ,if(0,0,ifnull(d.costtotal,0)) as costtotal

　　    ,if(0,0,ifnull(d.maoli,0)) as maoli

　　    ,i.billno

　　    ,from_unixtime(i.billdate,'%Y-%m-%d') as billdate /* bill date */

　　    ,from_unixtime(i.createdate,'%Y-%m-%d %H:%i:%s') as createdate /* creation date */

　　    ,if(i.sdate=0,'',from_unixtime(i.sdate,'%Y-%m-%d %H:%i:%s')) as sdate /* posting date */

　　    ,from_unixtime(i.udate,'%Y-%m-%d %H:%i:%s') as udate /* last modified */

　　    ,i.custid ,c.custname

　　    ,i.storeid ,k.storename

　　    ,i.empid ,e.empname

　　    ,i.userid ,u.username

　　    ,i.remarks                               /* bill remarks */

　　    ,i.effect,i.settle,i.redold,i.rednew     /* bill status flags */

　　    ,i.printtimes /* print count */

　　    ,(case when i.rednew=1 then 1 when i.redold=1 then 2 when i.settle=1 then 3 when i.effect=1 then 4 else 9 end) as state /* bill status */

　　    ,(case when i.rednew=1 then '红冲单' when i.redold=1 then '已红冲' when i.settle=1 then '已结算' when i.effect=1 then '已过账' else '草稿' end) as statetext /* 红冲单 = reversal bill, 已红冲 = reversed, 已结算 = settled, 已过账 = posted, 草稿 = draft */

　　    ,'' as susername /* operator */

　　    ,'' as accname /* account */

　　from salebill i

　　left join coursecentersale d on d.tid=i.tid and d.billid=i.billid

　　left join customer c on c.tid=i.tid and c.custid=i.custid

　　left join store k on k.tid=i.tid and k.storeid=i.storeid

　　left join employee e on e.tid=i.tid and e.empid=i.empid

　　left join user u on u.tid=i.tid and u.userid=i.userid

　　where i.tid=260434 and (i.billtype = 5 or i.effect = 1)

　　    and ('_billdate_f_'!='')

　　    and ('_billdate_t_'!='')

　　    and ('_sdate_f_'!='')

　　    and ('_sdate_t_'!='')

　　    and ('_udate_f_'!='')

　　    and ('_udate_t_'!='')

　　    and ('_cdate_f_'!='')

　　    and ('_cdate_t_'!='')

　　    and ('_billid_'!='')      /* bill ID */

　　    and ('_custid_'!='')      /* customer ID */

　　    and ('_storeid_'!='')     /* store/warehouse ID */

　　    and ('_empid_'!='')       /* salesperson ID */

　　    and ('_custstop_'!='')       /* customer disabled */

　　    and (

　　        (i.billno like concat('%','葡萄','%'))

　　        or (i.remarks like concat('%','葡萄','%'))

　　        or exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((d.remarks like concat('%','葡萄','%')) or (d.goodsremarks like concat('%','葡萄','%'))))

　　        or exists(select 1 from customer c where c.tid=260434 and c.custid=i.custid and (c.custname like concat('%','葡萄','%')))

　　        or exists(select 1 from goods g join salebilldetail d on d.tid=g.tid and d.goodsid=g.goodsid where d.tid=260434 and d.billid=i.billid and ((g.goodsname like concat('%','葡萄','%')) or (g.goodscode like concat('%','葡萄','%'))))

　　    )

　　    and i.rednew=0 /* bill list excludes reversal bills */

　　    and i.billid not in (select billid from coursecenter_del t where t.tid=260434)

　　    and ((i.settle=1 and i.effect=1 and i.redold=0 and i.rednew=0)) /* settled */

　　order by udate desc,billno desc

　　limit 0,100;

　　Execution time is about 1.6 seconds, using LIKE.

　　Rewritten to use full-text indexes:

　　test_10

　　-- Test 10

　　select

　　    i.billid

　　    ,if(0,0,i.qty) as qty

　　    ,if(0,0,i.goodstotal) as total

　　    ,if(0,0,i.chktotal) as selfchktotal

　　    ,if(0,0,i.distotal) as distotal

　　    ,if(0,0,i.otherpay) as feetotal

　　    ,if(0,0,ifnull(d.costtotal,0)) as costtotal

　　    ,if(0,0,ifnull(d.maoli,0)) as maoli

　　    ,i.billno

　　    ,from_unixtime(i.billdate,'%Y-%m-%d') as billdate /* bill date */

　　    ,from_unixtime(i.createdate,'%Y-%m-%d %H:%i:%s') as createdate /* creation date */

　　    ,if(i.sdate=0,'',from_unixtime(i.sdate,'%Y-%m-%d %H:%i:%s')) as sdate /* posting date */

　　    ,from_unixtime(i.udate,'%Y-%m-%d %H:%i:%s') as udate /* last modified */

　　    ,i.custid ,c.custname

　　    ,i.storeid ,k.storename

　　    ,i.empid ,e.empname

　　    ,i.userid ,u.username

　　    ,i.remarks                               /* bill remarks */

　　    ,i.effect,i.settle,i.redold,i.rednew     /* bill status flags */

　　    ,i.printtimes /* print count */

　　    ,(case when i.rednew=1 then 1 when i.redold=1 then 2 when i.settle=1 then 3 when i.effect=1 then 4 else 9 end) as state /* bill status */

　　    ,(case when i.rednew=1 then '红冲单' when i.redold=1 then '已红冲' when i.settle=1 then '已结算' when i.effect=1 then '已过账' else '草稿' end) as statetext /* 红冲单 = reversal bill, 已红冲 = reversed, 已结算 = settled, 已过账 = posted, 草稿 = draft */

　　    ,'' as susername /* operator */

　　    ,'' as accname /* account */

　　from salebill i

　　left join coursecentersale d on d.tid=i.tid and d.billid=i.billid

　　left join customer c on c.tid=i.tid and c.custid=i.custid

　　left join store k on k.tid=i.tid and k.storeid=i.storeid

　　left join employee e on e.tid=i.tid and e.empid=i.empid

　　left join user u on u.tid=i.tid and u.userid=i.userid

　　where i.tid=260434 and (i.billtype = 5 or i.effect = 1)

　　    and ('_billdate_f_'!='')

　　    and ('_billdate_t_'!='')

　　    and ('_sdate_f_'!='')

　　    and ('_sdate_t_'!='')

　　    and ('_udate_f_'!='')

　　    and ('_udate_t_'!='')

　　    and ('_cdate_f_'!='')

　　    and ('_cdate_t_'!='')

　　    and ('_billid_'!='')      /* bill ID */

　　    and ('_custid_'!='')      /* customer ID */

　　    and ('_storeid_'!='')     /* store/warehouse ID */

　　    and ('_empid_'!='')       /* salesperson ID */

　　    and ('_custstop_'!='')       /* customer disabled */

　　    and (

　　        (match(i.billno) against(concat('"','葡萄','"') in boolean mode))

　　        or (match(i.remarks) against(concat('"','葡萄','"') in boolean mode))

　　        or exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode))))

　　        or exists(select 1 from customer c where c.tid=260434 and c.custid=i.custid and (match(c.custname) Against(concat('"','葡萄','"') in boolean mode)))

　　        or exists(select 1 from goods g join salebilldetail d on d.tid=g.tid and d.goodsid=g.goodsid where d.tid=260434 and d.billid=i.billid

　　     and ((match(g.goodsname) Against(concat('"','葡萄','"') in boolean mode))

　　     or (match(g.goodscode) Against(concat('"','葡萄','"') in boolean mode))))

　　    )

　　    and i.rednew=0 /* bill list excludes reversal bills */

　　    and i.billid not in (select billid from coursecenter_del t where t.tid=260434)

　　    and ((i.settle=1 and i.effect=1 and i.redold=0 and i.rednew=0)) /* settled */

　　order by udate desc,billno desc

　　limit 0,100;

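　　Execution time is about 1.6 seconds, roughly the same as the LIKE version.

　　Now for the most baffling part. In the SQL above, take the condition where salebilldetail uses the full-text indexes remarks_fulltext and goodsremarks_fulltext: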

　　exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks) Against(concat('"','葡萄','"') in boolean mode)) or (match(d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode))))

　　test_11

　　and change it to use the full-text index remarks_goodsremarks_fulltext:

　　-- Test 11

　　exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode))))

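　　The execution time then becomes effectively unbounded (it ran for a long time without finishing).

　　Analysis showed that this happens when a single condition in the WHERE clause contains more than one MATCH. That is:

　　-- Normal when the AND condition contains only one full-text search: 0.2 s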

　　select xxx from xxx

　　...

　　and (

　　 exists(select 1 from salebilldetail d where d.tid=260434 and d.billid=i.billid and ((match(d.remarks,d.goodsremarks) Against(concat('"','葡萄','"') in boolean mode))))

　　)

　　...

　　-- The following is abnormal: hundreds to thousands of times slower (160 s), and the more MATCH calls it contains, the slower it gets

　　select xxx from xxx

　　.

(Editor: 吕梁站长网)
