[Vector Search] Troubleshooting Notes: Result Order Mismatch Between OB and chroma

This write-up comes from 弥鸢; I am only reposting it.

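Background

  • OceanBase Database provides vector database capabilities: dense Float vectors of up to 16,000 dimensions, sparse vectors, several vector distance metrics (Manhattan distance, Euclidean distance, inner product, and cosine distance), HNSW/IVF-based vector indexes, and incremental updates and deletes that do not affect recall.

  • Chroma is an open-source vector database designed for AI applications. It uses the HNSW algorithm for fast similarity search over high-dimensional data and supports storage and retrieval of large-scale embedding vectors.

  • Faiss (Facebook AI Similarity Search) is an open-source library, not a full database, designed for similarity search and clustering over large sets of high-dimensional vectors.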

Customer Scenario

Customer issue: for some queries, OceanBase returns results in a different order than chroma does, while chroma's results match those returned by faiss.

Table schema:

CREATE TABLE `eb_collection_chunked` (
  `id` varchar(4096) NOT NULL,
  `embedding` VECTOR(1024) DEFAULT NULL,
  `chunk_content` longtext DEFAULT NULL,
  `metadata` json DEFAULT NULL,
  `collection_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  VECTOR KEY `vidx` (`embedding`) WITH (DISTANCE=L2,M=16,EF_CONSTRUCTION=256,LIB=VSAG,TYPE=HNSW, EF_SEARCH=64)
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC;

Query SQL in OceanBase

SELECT id,
       metadata
FROM eb_collection_chunked
ORDER BY l2_distance(embedding, '(query vector goes here)') approximate LIMIT 3;

Query result in OB:

+--------------------------------------+------------------------------------------------------------------------------------+
| id                                   | metadata                                                                           |
+--------------------------------------+------------------------------------------------------------------------------------+
| 3262a790-85bb-4930-a0de-03b947d9bbac | {"id": 5549, "name": "爱茉莉ec订单至入库_已暂停"}                                     |
| 38cf3b0e-460a-4454-a883-b0f100307242 | {"id": 617, "name": "OMS_库存快照表_商品维度(待删)"}                                 |
| ebefdd0c-27aa-42cb-988f-a3bac8082ff2 | {"id": 5454, "name": "唯品会_JIT绩效考核_普通JITX发货超时明细_月"}                      |
+--------------------------------------+------------------------------------------------------------------------------------+

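Query result in chroma, as provided by the customer:

Query statement: 查询仓库J今日出库的货品数量和金额
Professional query or not (是否专业): (blank in the customer's sheet)
Core keyword: 货品出库流水监控
Vector database: chroma

+----------+--------------------------------------------------------------------------+------------+
| Result   | metadata                                                                 | distance   |
+----------+--------------------------------------------------------------------------+------------+
| Result 1 | {'id': 5549, 'name': '爱茉莉ec订单至入库_已暂停'}                          | 0.58376309 |
| Result 2 | {'id': 5454, 'name': '唯品会_JIT绩效考核_普通JITX发货超时明细_月'}          | 0.76973236 |
| Result 3 | {'id': 617, 'name': 'OMS_库存快照表_商品维度(待删)'}                      | 0.77507573 |
+----------+--------------------------------------------------------------------------+------------+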

Analysis

Comparing the OB and chroma results above, the discrepancy lies in the relative order of records 617 and 5454.
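The comparison can be made mechanical. The following is a minimal sketch, with the id lists copied from the two result listings above; the helper name `first_divergence` is hypothetical and only serves the illustration.

```python
# Result ids in returned order, copied from the OB and chroma listings above.
ob_ids = [5549, 617, 5454]
chroma_ids = [5549, 5454, 617]


def first_divergence(a, b):
    """Return the first position where two rankings differ, or None if they agree."""
    for pos, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return pos
    return None


pos = first_divergence(ob_ids, chroma_ids)
same_members = set(ob_ids) == set(chroma_ids)

print("first differing position:", pos)
print("same result set:", same_members)
```

Both systems return the same three records; only the second and third positions are swapped, which points at the distance values rather than at recall.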

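  1. First, the OB result was checked. A brute-force search in OB (removing the APPROXIMATE keyword from the SQL gives exact results) returned the same result as the vector-index search, and the l2_distance values computed by OB were as expected.

  2. Next, the L2 distances reported by chroma were checked. For row 617, chroma's L2 distance differed noticeably from the one OB computed, so a script was written to compute the Euclidean (L2) distance between row 617 and the query vector. OB's value was the one that best matched the script's result.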
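The original hand-check script is not included in the post. A minimal sketch of such a check might look like the following; the function names and the toy 3-dimensional vectors are assumptions for illustration, not the customer's data. It computes both the Euclidean distance and its square, because some libraries report the squared value (Faiss's IndexFlatL2 does, for instance), and distances from different systems should only be compared under the same convention.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_l2_distance(a, b):
    """Squared L2 distance: same ordering as L2, different magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors for illustration only (hypothetical, not the 1024-dim customer data).
query = [0.0, 3.0, 0.0]
row = [4.0, 0.0, 0.0]

d = l2_distance(query, row)
d2 = squared_l2_distance(query, row)
print("L2:", d, "squared L2:", d2)
```

Since the square root is monotonic, either convention yields the same ranking; only the printed magnitudes differ.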

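After these findings were reported to the customer, it was clear that a hand-computed Euclidean distance would not be convincing enough. The customer had said earlier, however, that chroma's results matched faiss. chroma cannot run a brute-force search, but faiss can, so a faiss brute-force result that agreed with OB would be far more persuasive to the customer.

So we set up a faiss environment and ran the corresponding test. The test script: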

import faiss
import numpy as np

def parse_vector(vec_str):
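    # Remove spaces and the surrounding square brackets
    cleaned = vec_str.replace(' ', '').strip('[]')
    # Split on commas and convert to a list of floats
    elements = [float(x) for x in cleaned.split(',')]
    # Validate the dimension before converting to a NumPy array
    if len(elements) != 1024:
        raise ValueError("vector dimension must be 1024")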
    return np.array(elements, dtype=np.float32)

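# Load as float32 (the dtype Faiss works with); ndmin=2 keeps a one-line file as shape (1, 1024)
vectors = np.loadtxt('data.txt', delimiter=',', dtype=np.float32, ndmin=2)
query_vector = np.loadtxt('query.txt', delimiter=',', dtype=np.float32, ndmin=2)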
#query_vector = parse_vector('[-0.035386957228183746, 0.012596512213349342, -0.06964558362960815, 0.01639978401362896, -0.030406715348362923, -0.0033509640488773584, -0.04163171723484993, 0.06198067590594292, -0.04272114485502243, 0.024084143340587616, 0.010057755745947361......]')

index = faiss.IndexFlatL2(1024)
index.add(vectors)
k = 3
print(query_vector.shape)
distances, indices = index.search(query_vector, k)

print("distances:", distances)
print("indices:", indices)

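The test script also needs two data files: data.txt (the set of embeddings) and query.txt (the query vectors). In testing, query.txt had to contain two or more vectors, otherwise the script raised an error: with a single line, a plain np.loadtxt call returns a 1-D array of shape (1024,), whereas index.search expects a 2-D array of shape (n, 1024). Passing ndmin=2 to np.loadtxt, or reshaping the array with reshape(1, -1), avoids this. Since the script was a stopgap for the customer's issue, it was not polished any further.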

-0.035386957228183746, 0.012596512213349342, -0.06964558362960815, 0.01639978401362896, -0.030406715348362923, -0.0033509640488773584, -0.04163171723484993, 0.06198067590594292, -0.04272114485502243......
-0.0590525,-0.00187155,-0.0810449,-0.00470622,-0.0533451,0.0042306,-0.0341682,0.09421,-0.0391907,0.039419,0.0276047,0.0390005,0.0144873,0.0444415,0.0638847,-0.00629715,0.0286321,-0.00696777,-0.030896,-0.0606505,-0.00527458,-0.0082234,-0.0249603,0.0441752,0.0565032,0.059471,-0.0191198,-0.0364702,0.0565412,-0.0184254,0.0682223,-0.0337687,-0.0114528,-0.0171602,-0.00185728,-0.00442798,0.005988,-0.0256833,-0.09421,0.005988,0.00089713,-0.00379541,0.0118143,-0.0346248,0.0611832,-0.0250745,-0.015857,-0.0609169,-0.0404464,0.00407365,-0.00144825,-0.0426532,0.0470289,-0.0528504,0.0288413,0.0460016,-0.0259686,-0.0080284,-0.00168606,-0.0180068,-0.0547148,-0.0134314,0.00470622,0.0338829,0.0204515,0.0833279,0.0364892,-0.0161709,-0.0444415,-0.00238997,-0.00821864,0.0289935,-0.0205656,0.0273955,-0.0209651,0.0330268,0.0332741,-0.0281565,-0.00793803,-0.00663008,0.0365463,0.0320946,-0.0364892,-0.00165277,0.00172886,0.0261018,0.0146299,0.00808071,-0.0151912,0.0152387,0.00703911,-0.0323038,0.054068,-0.0148678,-0.0350244,-0.00838035,-0.0269199,0.00975964,0.0331599,0.0089606,0.000919722,0.00477043,0.0294501,-0.0278901,0.0168273,0.0189866,0.0443274,0.0148868,-0.0390386,-0.0247891,0.00169557,0.0546007,-0.00298687,0.0151531,-0.0202422,-0.0228676,0.00943622,-0.0320755,-0.024732,-0.00205466,0.0166085,0.0130414,-0.00417829,-0.0366985,0.0160948,-0.0484748,0.0334453,-0.00717229,0.0239901,0.00774303,-0.0194527,0.0894919,0.00560275,0.00514616,-0.0708858,-0.0331219,0.0393049,0.0279662,0.0497684,-0.0111675,0.0722175,0.0598895,-0.0535353,-0.00542202,0.011586,-0.016599,0.0377068,-0.00173243,-0.0128511,-0.0234574,0.00646838,0.00572166,0.0380302,0.0167036,-0.0315809,0.0103018,-0.000478886,0.00561702,-0.0169224,0.0331029,-0.00368127,-0.0555139,-0.0104445,0.013146,-0.0213837,-0.0304965,0.0292979,-0.0190437,-0.0238949,0.00694399,0.0155622,-0.0755278,0.0325511,-0.0114433,-0.00924597,0.0159902,0.0945905,0.00709143,-0.00852779,-0.0504533,-0.0218974,-0.00445652,-0.012737,0.047866,0.0230579,0.0323419,0.0425771,0.
0183778,0.00404511,0.030877,0.00161709,-0.0168653,0.0157524,-0.053307,0.0433381,0.0189485,0.038582,0.061792,-0.0200139,-0.035519,-0.0118809,0.0166846,0.00636374,0.0171888,-0.0132887,0.0081901,0.0312004,-0.0464582,0.0142495,-0.0423488,-0.0276047,0.0361658,0.0291267,-0.0026468,-0.0330648,-0.00750997,0.0390005,0.0184539,-0.0140021,-0.0485128,-0.0443274,0.0449742,0.0011177,0.0534973,0.00223777,-0.00860389,-0.0351385,0.0265964,0.0175978,-0.0321897,-0.0422727,-0.0107204,0.00662533,-0.0174361,0.0334643,-0.00830901,-0.0163041,0.0205086,0.0319043,-0.00348151,-0.00854206,-0.0397234,-0.0108821,-0.059433,-0.0573783,0.028613,0.0108345,0.0633901,0.00572166,0.00993086,0.0136977,-0.0260828,-0.0114814,0.010026,0.0218213,-0.00472049,0.0223349,-0.0148012,-0.0153243,0.0207749,0.0200139,-0.0369268,-0.00286559,0.012756,0.0523177,0.0189961,-0.0320946,-0.0495021,0.0317521,-0.0195954,-0.0418922,-0.0373454,-0.0165324,0.0236476,-0.0287082,0.0309531,-0.020851,0.0134219,0.020052,0.00151484,0.0197666,-0.0492358,0.0106253,-0.0167417,0.0291077,-0.0136216,-0.00261589,-0.0685648,-0.0107775,-0.0387912,-0.0335785,0.00153862,0.0521275,0.0493119,0.0129938,0.0217832,0.0115099,-0.166504,-0.00687741,-0.0081901,0.0236857,0.0118809,0.00964549,-0.0156382,-0.0479421,-0.0455069,0.00929353,0.00283943,-0.0411313,0.0106918,-0.0274716,0.033217,-0.00271814,0.0205276,0.0105206,0.00660155,-0.0114053,-0.0315999,-0.00353145,0.0060213,-0.0146109,-0.0318663,-0.00948378,0.0662818,0.0311053,0.00371694,-0.0120806,0.0387532,-0.0291457,-0.00943622,0.018511,0.0245989,0.0118619,0.0259686,-0.0165705,0.012718,0.00555044,0.0397995,-0.0193861,0.0460777,0.033978,0.00491787,-0.0542963,0.0241803,-0.0109392,-0.00704863,0.0318663,-0.0282706,0.0100926,-0.00972159,0.00524604,-0.0404464,-0.0283848,0.0184539,0.0179307,-0.009655,0.0212125,-0.0303253,-0.100222,0.0317141,0.0149058,0.0153053,-0.0528124,0.0176168,0.0286891,-0.00701058,-0.0401039,0.0485509,-0.0088417,-0.027814,-0.0304965,-0.0281374,0.0248652,-0.0252076,-0.00231863,-0.0230579,-0.12
0616,-0.0472191,0.0265774,-0.00955988,0.0278901,-0.0156858,-0.00473476,-0.0213076,0.0448601,-0.0132792,0.208206,-0.0588622,-0.0241803,0.0472191,0.0669287,0.0185871,-0.0150675,0.0368507,-0.0214978,-0.0618301,0.010045,-0.0113672,-0.0263681,-0.0069535,-0.011976,0.012718,-0.0302111,-0.00580727,0.0399137,-0.00928402,-0.00328175,0.0129843,-0.000234241,-0.032418,-0.0464962,-0.0050558,-0.00113256,0.000758013,-0.00679179,0.0373644,-0.0307628,0.0112626,-0.0118714,0.0264633,-0.043985,-0.0385629,0.00930305,-0.0203754,0.0155336,0.00256833,-0.0419683,-0.0247701,-0.00946476,-0.0288223,0.00963122,-0.00396664,0.0461538,0.00848023,-0.0168939,-0.0132316,0.0003255,-0.0499587,-0.00495592,0.00301065,0.0157048,-0.0313526,0.0103304,0.00317474,-0.00846596,0.0475616,0.0182732,0.0170936,-0.0174075,0.0196334,-0.00776205,0.0128416,0.0136026,-0.0456972,0.0035909,-0.00618301,-0.0437947,0.0100735,-0.0288033,0.00230555,0.0376688,-0.00274668,0.01658,0.00424249,-0.0462299,0.000015699,-0.0735873,-0.013165,0.023952,0.0422347,-0.00165395,-0.0320755,-0.0497684,0.0472952,-0.0266916,-0.00326035,-0.00978817,-0.000701533,0.0275857,-0.0331409,0.0281184,-0.0163041,-0.00258497,-0.0011171,-0.0402181,0.0310672,-0.030877,-0.00346011,-0.000820437,-0.00993086,0.015039,-0.0213076,-0.0121948,0.0106348,-0.00981671,-0.0462299,-0.0334643,0.0222398,0.0292218,0.0495021,-0.0167131,0.0205466,-0.00255406,0.0230769,0.0627052,0.0296594,-0.0154861,-0.0183397,-0.00603556,-0.0261589,0.00406889,-0.0405225,-0.0244467,-0.0404083,0.00344108,0.0732068,0.00341254,-0.0191388,-0.0105967,0.00661582,0.0233242,0.0152958,0.0183588,0.0225822,0.0127465,0.0179117,-0.017379,0.0217452,-0.0391146,0.0169129,0.0084945,0.0314287,0.0154861,-0.000162453,0.00501775,0.0364131,-0.0220115,0.0363941,0.0611832,-0.0329126,0.000608194,-0.0311433,0.0156287,0.000382871,-0.00247558,-0.0124611,-0.0156953,-0.0382205,0.0313526,0.0582534,0.0150485,-0.00320327,-0.000414083,-0.0457352,0.0227725,-0.0195193,0.00645886,-0.0774683,-0.0783815,-0.00121758,0.0307819,-0.0479421
,-0.0797513,-0.0160663,-0.0235715,-0.0282135,0.0294882,-0.0365463,-0.0175312,0.0481323,0.0187012,-0.00758607,0.00613069,-0.0291648,-0.0407508,-0.000980958,-0.0454308,0.128759,-0.0210032,-0.0385059,0.00880365,-0.019329,-0.0197476,-0.00137929,0.0145634,-0.0317331,0.00296309,0.0115575,0.014278,-0.0541441,0.0270911,0.018901,0.00143279,0.00458494,-0.0152958,0.00892731,0.0211744,-0.0137738,0.0547909,-0.0304394,0.0392288,-0.0313526,0.0183302,-0.0152197,-0.00598325,-0.0709999,0.00511763,0.00643984,0.0323799,-0.0297355,-0.00910329,0.0189105,0.0443274,0.00162185,-0.0115194,0.00917939,0.000820437,-0.00960744,0.011957,-0.0386581,0.00586434,-0.018549,0.0260828,0.0132887,-0.00947902,-0.0576446,0.0470669,-0.00986427,-0.0319233,0.0295833,0.0381063,0.0456591,-0.023952,-0.00862767,0.0199378,-0.000147738,-0.00321041,0.0281184,0.00774778,0.0161139,0.0321707,0.0566934,-0.0242754,0.0249603,-0.0247701,-0.0392668,-0.0166275,0.0194146,-0.0598515,0.00355523,0.0028537,-0.0196715,-0.0106157,-0.0325511,-0.00066824,-0.0158,0.00115931,-0.00854206,-0.0337307,0.010806,0.0180258,0.00642557,-0.0336356,0.000230079,-0.0416259,0.016989,-0.0293931,-0.036299,-0.00341968,0.0487411,-0.0274145,-0.0375737,-0.0228866,0.0174646,-0.0225632,-0.0370029,0.0179022,0.00363608,-0.0496923,0.014668,0.0112245,-0.0192149,0.00821864,0.0167131,-0.0512904,0.0197476,-0.0147346,-0.0193671,-0.0226393,-0.0307058,0.0152102,-0.0144873,-0.00669191,-0.0523938,0.00512238,0.0140212,0.00210817,-0.0196524,0.0116145,0.0929163,0.0321897,0.0275286,0.0491597,-0.0254169,-0.000970256,0.0176453,-0.0225822,0.00812828,-0.000740772,0.0202993,-0.0258164,-0.0164563,-0.00815206,0.00258497,0.0079523,0.00482512,-0.0174551,-0.00199521,-0.0423869,-0.0406747,0.0185395,-0.00516994,0.0194717,-0.0311053,0.0191007,0.0533451,0.00928878,0.00248985,0.0629335,-0.0117762,0.0407127,-0.0290316,-0.00904621,-0.00487506,0.0226583,0.00467293,0.010064,-0.00703911,-0.0308389,-0.0382585,-0.0042092,-0.00962646,0.00740058,0.0317902,-0.0107109,0.0186156,-0.0132982,0.0357283,
-0.00994037,-0.0134219,0.0263872,-0.0162851,0.0330077,0.000553498,0.0250935,0.0222017,-0.0079951,-0.0284038,-0.00726265,-0.0203564,-0.00379779,0.0193385,0.0281565,-0.00170746,0.0242754,-0.0327224,-0.00538873,0.00103446,0.0119855,-0.00830425,0.0267106,-0.00569788,-0.044822,-0.0168748,0.0312194,-0.0334263,0.0252267,0.0252837,-0.0253979,0.0466865,-0.141391,0.00985476,0.0018656,-0.0140497,-0.00519848,-0.0302492,0.03862,-0.0462679,0.0035148,0.0105301,0.0415118,-0.0213647,0.000545175,-0.0304014,0.0309721,-0.00263729,-0.0157429,-0.0107489,0.00907475,0.0145634,0.00854682,0.0186441,0.0189771,0.00372883,0.0552095,-0.0306867,0.00131983,0.0557041,-0.0115575,-0.0132126,-0.00750046,-0.0332931,0.0428435,0.0666623,0.0152863,-0.00555995,-0.00634947,-0.028556,-0.0192244,-0.0397615,-0.0157239,0.0157524,0.00125563,0.0137453,0.0166656,0.0428054,0.0662818,-0.0275286,0.000989876,-0.0122899,-0.0336736,-0.0125658,-0.0541441,0.00215811,0.0289365,-0.0338448,-0.0354048,-0.0078429,0.0107299,0.0272433,0.00400706,-0.00407127,-0.0347961,-0.0990803,-0.0041545,-0.00112602,-0.007724,0.0170461,-0.020851,-0.0271482,0.00210222,-0.0098833,0.0245037,-0.00767168,-0.0368888,0.0105682,0.009265,-0.0217642,-0.0266535,-0.0137358,-0.00703436,0.011605,-0.0281184,0.0760985,0.0308389,-0.0273574,0.0166751,0.0150485,-0.0636184,-0.0283277,-0.0140592,-0.0404083,0.0189771,0.0353858,0.0026801,0.0369458,-0.0089939,0.0232671,0.0213456,0.0415118,0.00236976,-0.0442893,0.0213456,0.0171127,0.0104635,0.049388,0.0312955,-0.0353668,-0.0129368,0.0144397,0.0240091,0.00897487,-0.00276095,0.0394571,-0.0357853,-0.0255501,0.00993086,-0.00191673,-0.00904146,-0.0183017,0.0059452,-0.0374595,-0.00142209,-0.0224871,0.00302967,-0.021593,0.0232671,0.00557422,-0.0168939,-0.0597374,0.0305726,-0.0418542,0.00157904,0.0103399,0.0467625,0.0259116,-0.00422109,0.0421966,-0.0457733,0.000363252,0.038582,0.00309864,0.0237237,-0.0242564,0.0352907,-0.0412835,-0.0151151,-0.0129653,0.00258735,-0.0439089,-0.0531929,-0.0259306,0.00078001,0.0492358,-0.0211934,
-0.0144873,-0.0267677,0.0863718,-0.00768595,0.0254169,0.0663579,0.0315238,0.00158261,0.0205276,-0.0379351,-0.00511287,-0.0375927,-0.00891304,-0.0147441,-0.0182256,-0.00613544,-0.0319614,-0.0159426,0.0595852,-0.00655399,-0.0144302,-0.0113387,-0.00273003,0.0169224,-0.00719607,-0.00314382,0.0172268,-0.000817465,0.0220495,0.0161139,0.0179307,-0.00648264,0.0278711,-0.030896,0.0110058,0.00278235,0.00389054,-0.000117715,0.00754327,0.0218783,0.0455069,0.0175312,-0.0424249,0.00803791,0.074919,-0.0506436,0.00111591,-0.0149534,-0.0195193,-0.0381254,-0.0205847,0.0367936,0.0221637,-0.0117953,0.0196144,-0.0117477,0.0104445,0.0701248,-0.0266345,-0.0085373,-0.0459255,-0.0148582,0.0431098,-0.015467,0.0226393,-0.0144777,0.0665862,-0.0117858,0.0373263,-0.063314,-0.00221399,-0.00469908,-0.00770973,0.0298116,0.040142,-0.0365273,0.0428435,-0.0228676,0.0290126,0.000967284,0.00264442,-0.0294311,-0.0143826,-0.00442085,-0.0275857,-0.0038192,0.00893682,0.0286891,-0.0166561......
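As a cross-check that does not depend on Faiss, the same exhaustive search can be sketched in plain Python. This is an illustration under stated assumptions: `parse_line` and `exact_top_k` are hypothetical helpers, the toy lines below merely mimic the comma-separated layout of data.txt and query.txt, and the distances are squared L2 values, matching what IndexFlatL2 reports.

```python
import heapq


def parse_line(line):
    """Parse one comma-separated line into a list of floats."""
    return [float(x) for x in line.split(",") if x.strip()]


def exact_top_k(query, rows, k):
    """Return the k nearest rows as (squared_l2, row_index), closest first."""
    scored = []
    for i, row in enumerate(rows):
        if len(row) != len(query):
            raise ValueError(f"row {i} has dimension {len(row)}, expected {len(query)}")
        scored.append((sum((q - v) ** 2 for q, v in zip(query, row)), i))
    return heapq.nsmallest(k, scored)


# Toy 2-dim lines for illustration only (hypothetical, not the customer vectors).
data_lines = ["1.0, 0.0", "0.0, 2.0", "3.0, 3.0"]
query_line = "0.0, 0.0"

rows = [parse_line(line) for line in data_lines]
query = parse_line(query_line)
top = exact_top_k(query, rows, k=2)
print(top)
```

Every row is scored, so there is no index and no approximation involved; on a handful of rows this is fast enough and easy to audit by hand.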

Test results:

First, here are the vectors for the three rows in the customer scenario (they can be compared against data.txt above):

5549: [-0.0590525,-0.00187155,-0.0810449,-0.00470622,-0.0533451,0.0042306,-0.0341682,0.09421,-0.0391907,0.039419,0.0276047,0.0390005,0.0144873,0.0444415,0.0638847,-0.00629715,0.0286321,-0.00696777,-0.030896,-0.0606505,-0.00527458......]
617: [-0.0625166,-0.0181,-0.0735626,0.0277313,-0.036665,0.0115111,-0.0352504,0.0863527,-0.025464,0.0356379,-0.0219176,0.0210843,-0.0305994,0.0746478,0.0482924,0.0199797,0.0373821,-0.0118212,-0.0228478,-0.0479824,-0.0149896,-0.0266461......]
5454: [-0.0389573,0.00282359,-0.0618927,-0.000736588,-0.0808803,0.0216259,-0.016022,0.110613,-0.0338927,0.0410564,-0.00894498,0.0110248,0.0251692,0.0568858,0.0851939,0.0245144,0.0596974,-0.0306575,-0.0170811,-0.0529573,-0.012469......]

Script output:

(screenshot of the script output; image not preserved)

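Conclusion

The faiss brute-force (exact) search returns 5549, 617, 5454 (0, 1, 2 are the positions of the vectors in data.txt; mapped to the collection_id values in the customer scenario, they are 5549, 617, 5454).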
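The index-to-id mapping described above can be written down as a small sketch. The list `row_ids` assumes data.txt holds the three rows in the order 5549, 617, 5454, as the text states, and `indices_to_ids` is a hypothetical helper. Faiss pads its result with -1 when fewer than k neighbours are found, so those entries are skipped.

```python
# Ids of the rows in data.txt, in file order (positions 0, 1, 2 per the text above).
row_ids = [5549, 617, 5454]


def indices_to_ids(indices, row_ids):
    """Translate Faiss row positions (one list per query) into record ids."""
    return [[row_ids[i] for i in row if i >= 0] for row in indices]


# Row positions as described in the text: one query, nearest rows 0, 1, 2.
indices = [[0, 1, 2]]
ranked = indices_to_ids(indices, row_ids)
print(ranked)
```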

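This matches the OB query result and is enough to show that, in the customer's scenario, OB's result is more accurate than chroma's.

The distances printed by the faiss brute-force search also show that the discrepancy does come from a large error in the L2 distance chroma computed for row 617, which confirms the earlier analysis.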
