通过 DB-GPT x OceanBase 构建 chat data 应用
实验介绍
这一部分的内容,只是个人对这个实验的理解,欢迎大佬们来纠错。
这一小节标题中的 chat data 应用,含义是可以使用自然语言告知大模型生成对应的查询 SQL,不仅可以直接在数据库中执行该 SQL 获取结果,还能够将得到的查询结果数据进行可视化展示。
为了简单阐述我对这个实验的理解,以及为方便大家理解 OceanBase 数据库在这次实验中的作用,随手画了下面这张图(灵魂画手,大家轻喷)。
图中展示的这个 OceanBase 租户里有三类 database,分别为:
- 用于存储用户数据的库(图中的 User Data 库);
- 用于存向量数据的库(图中的 Vector 库);
- 以及其他库(图中的 Others 库)。
chat data 应用的服务对象是数据库,那这个被服务数据库自然就是 OceanBase,对应图中的 User Data 库。
chat data 应用需要对用户输入的自然语言,将数据库对象的元数据拿出来,进行相似性检查。所以也需要一个服务于 chat data 应用的向量数据库,那这个提供服务的数据库,自然也落到了 OceanBase 头上,对应图中的 Vector 库。
也就是说,这次实验,不需要专门去另外搭建一个向量数据库,通过 DB-GPT,利用 OceanBase 的向量能力,对在 OceanBase 中存储的用户数据进行服务,自产自销、自给自足。
上面这张图画的实在太宽了,大家可以切成两半,分开来看。图中 OceanBase 库的左侧,是生产向量的过程。
如上图所示,DB-GPT 在搭建 chat data 应用的过程中:
- 首先需要创建一个 User Data 库的连接,在创建这个连接的时候,就会把连接中对应库(例如这个库的真名叫 dbgpt_test_db)中用户数据的元信息(表名、列名等)拿出来;
- 然后把这些元信息转成向量的形式;
- 最后存入 Vector 库中的一张叫做 dbgpt_test_db_profile 的表内。
DB-GPT 每创建一个新的 User Data 库的连接,就会在 OceanBase 的 Vector 库内创建一张叫做 database_name_profile 的表,表中有一个 document 列,用于存储元数据的文本信息;还有一个 embedding 列,用于存储将 document 列转换成的 1024 维向量。
obclient [dbgpt_vec_db]> desc dbgpt_test_db_profile;
+-----------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------+------+-----+---------+-------+
| id | varchar(4096) | NO | PRI | NULL | |
| embedding | VECTOR(1024) | YES | | NULL | |
| document | longtext | YES | | NULL | |
| metadata | json | YES | | NULL | |
+-----------+---------------+------+-----+---------+-------+
4 rows in set (0.003 sec)
obclient [dbgpt_vec_db]> select document from dbgpt_test_db_profile;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| document |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |
| plant_and_animal_table(id, name, name_embedding) |
| lineitem(l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment) |
| t1(c1, c2, c3, c4) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.003 sec)
obclient [dbgpt_vec_db]> select embedding from dbgpt_test_db_profile limit 1\G
*************************** 1. row ***************************
embedding: [-0.0513874,0.000697833,-0.102775,-0.0497347,-0.0157103,-0.0107329,0.0206203,0.0187754,0.0106561,0.0389345,0.0656468,0.0567683,-0.0221961,-0.00616399,0.0408178,-0.0155181,0.0164501,-0.0248097,-0.0175071,-0.0595356,-0.0593819,0.0459297,-0.0351679,0.0226189,0.0173149,0.0114632,-0.0527711,-0.0593819,-0.00333663,-0.0568452,0.00423985,-0.00812417,0.044546,-0.0516949,-0.0157487,-0.0108675,0.0104255,0.0232531,-0.10585,0.0384541,0.016969,-0.0193808,0.0441617,-0.0598815,0.000535986,-0.10093,0.0264816,-0.044546,0.0286148,0.0269236,-0.0222346,0.0150088,-0.00247665,-0.0367245,0.0186505,0.0361096,-0.0278653,-0.021389,-0.0364939,0.0295756,-0.0236374,-0.0082683,0.0217349,0.047813,0.0134618,0.0860173,0.039857,-0.0203897,-0.027577,-0.0603428,-0.0315935,0.015816,-0.0129622,-0.00887846,-0.0944729,-0.00964235,0.0505419,0.0249442,-0.0161523,-0.0521561,0.0411638,0.00171035,0.034649,-0.0128853,0.0461603,0.0481205,-0.0361672,0.0744484,-0.0117899,0.0119821,-0.0108579,-0.0530401,0.0298639,-0.0359366,-0.0123664,-0.00706241,-0.026251,0.0128469,0.0286532,0.0503881,0.00205026,0.0502344,0.0279229,-0.0545391,0.0137693,0.0219271,-0.00136083,0.00645706,0.0418171,-0.00944057,0.0251556,0.0504266,-0.0140383,-0.00412935,-0.00361288,-0.0144707,-0.0108867,-0.00345914,-0.0216388,0.0337074,0.00217878,0.00368494,-0.00970481,0.0149223,-0.00225445,-0.0133849,0.0128372,-0.00510223,-0.043662,0.0237528,0.050734,0.0173341,-0.0025343,-0.0108867,-0.0475824,-0.00122331,0.0273464,0.0113479,-0.0102141,-0.0526942,0.0372434,0.00913789,-0.0231954,0.00786954,-0.0148166,-0.00404527,0.0190925,0.0080473,-0.019131,-0.0298255,0.0324198,-0.0273656,0.0558459,-0.00638019,-0.0267315,-0.0659158,-0.0124913,0.0436236,0.00781669,0.0108771,0.0349181,0.0212737,-0.0210239,-0.00966637,-0.0323429,0.00598143,0.0335152,-0.0170363,-0.0249827,-0.0180644,0.0360711,-0.0171227,-0.02996,-0.027308,0.0713351,0.0340917,0.107387,0.0300561,0.00941654,-0.0621492,-0.0364747,0.00725458,-0.0430855,-0.0232339,-0.0204666,0.0565761,0.0278076,-0.010166,0.03438,0.039857,-0.0137309,-0.013702,-0.0268275,-0.0263279,0.0348604,-0.00645706,0.0118572,0.0200822,-0.016969,-0.0235414,-0.0817894,0.0167768,0.00241179,-0.0265969,0.00168393,0.0159216,-0.00533284,-0.00671169,0.0160754,-0.0370512,-0.0146149,-0.00293787,-0.0108579,-0.0258282,-0.0520408,-0.0126162,-0.040741,0.0262895,-0.00633695,-0.0158832,-0.00205747,0.0273849,-0.0244254,0.0549619,0.042509,-0.0193231,0.0481973,0.0483895,-0.00582288,-0.00549619,-0.0254823,-0.0374356,-0.0242716,-0.00371617,0.0390691,0.0192463,-0.0073795,-0.0421246,0.0170843,-0.0000144318,0.00335585,0.0261357,0.040126,0.0279614,-0.037032,0.00482598,-0.00425426,0.0680297,-0.01547,0.0122319,-0.00172597,-0.017632,-0.00600065,0.0170651,-0.0359943,0.00524636,0.00296189,0.00167912,0.0287109,-0.00185809,0.00368735,-0.00884002,-0.0191886,0.0230801,0.0209662,-0.00214274,-0.0209086,0.0178146,0.0245599,0.031113,-0.00976726,-0.0290568,0.0136252,0.013212,0.0342839,0.0377431,-0.0259435,-0.0352448,0.0745637,-0.00756206,0.0146341,0.0379544,0.0117419,-0.00557786,0.0726419,-0.0380121,-0.0244446,-0.00902259,0.00736989,-0.0656083,-0.0203705,0.0461987,0.0744484,0.00908985,-0.0152394,-0.0104735,0.0216965,-0.172803,0.00589975,0.0101564,0.0493888,0.0188139,0.00830193,-0.0923974,0.00597662,-0.033669,-0.0121358,-0.0266162,-0.0669536,-0.0518486,-0.0139615,-0.0355907,-0.0118091,-0.04985,0.0324583,0.0346106,-0.0487354,0.00689426,-0.0277692,0.0152683,0.00349277,0.00480436,0.00267122,0.0330348,0.00889767,-0.0240795,-0.0332462,-0.0124817,-0.0507725,0.00258234,0.0361672,-0.00707202,-0.0200246,0.0288838,-0.0270966,0.0306518,-0.0458528,-0.0196979,0.0279422,0.0238873,0.0024214,-0.00681258,-0.0400107,-0.00597182,-0.0119148,-0.0551925,-0.0248866,-0.036417,-0.0320739,-0.00907544,-0.00951263,-0.0810976,0.00270966,-0.044546,0.0468521,0.0023181,0.00146653,-0.0292874,-0.0163636,0.030229,-0.0457759,-0.00612556,-0.0111942,-0.00677415,0.00810976,0.0296909,-0.00978168,0.053386,-0.00300993,-0.0245791,-0.0221193,-0.0292489,0.00562591,-0.0204281,-0.0152298,-0.0446229,-0.105619,-0.0452379,-0.011665,-0.0335536,0.0109155,-0.0402029,-0.0641478,0.0286148,0.0477746,0.0128853,0.210008,0.0278653,-0.0140864,-0.0037522,0.0511953,0.0088208,-0.0475824,-0.0252902,-0.00325495,-0.0374932,-0.0327273,-0.00734106,-0.00682219,-0.0215812,-0.0219463,0.0356676,-0.0591897,-0.0223499,0.0649165,-0.00948381,-0.000689426,-0.0112518,0.0059478,-0.0220424,-0.0408563,-0.0400876,0.0241948,-0.0279422,-0.0548081,0.0789452,0.0230609,0.0561918,0.0145956,-0.022004,-0.0148839,-0.0291913,0.00960391,0.0133369,-0.0214467,0.0541547,-0.037032,-0.0299408,-0.0132696,-0.0145764,0.00146413,0.000667806,0.0303828,0.042509,-0.00133081,0.0323814,-0.000711646,-0.0114344,-0.011444,0.0119725,0.0340917,-0.0118187,0.00825869,0.00711045,-0.022446,0.0184776,0.0586132,-0.00718252,-0.0123184,0.0269621,-0.00775904,0.0448151,0.00818182,0.019448,-0.00646186,0.014365,-0.0253862,0.0134234,0.00257994,0.00335585,0.021389,0.0329003,0.0115977,-0.0146917,0.0511184,0.0620723,0.0135003,-0.000102018,0.0143939,0.0196402,0.0277308,-0.0580367,-0.03659,0.0325351,-0.014144,-0.0212545,-0.0438158,-0.0228495,0.0211584,0.0176608,-0.0143939,-0.0173053,0.0205627,0.0371665,-0.0151145,-0.0183334,-0.0298062,-0.0338996,0.0252325,-0.00315166,-0.000766295,0.0159409,0.0313629,-0.00393958,-0.00293546,-0.0130871,-0.0192078,0.0208894,-0.00347595,0.025982,-0.0126643,0.0166327,0.00214875,0.0310362,0.056384,0.0186025,0.00562591,-0.0320739,-0.0245215,-0.0251748,-0.00640421,-0.0254631,-0.0261165,0.00251028,-0.0271158,0.0264048,0.0120205,-0.0279422,0.017584,0.0370897,0.035264,-0.00645706,-0.0237335,0.00351199,-0.00321412,-0.0222346,-0.0151433,-0.0306518,-0.0273272,-0.00458336,-0.00679336,0.00131519,-0.0177953,0.058421,0.00516469,0.00997385,0.0189676,0.0349757,-0.0348028,-0.059036,0.0308248,-0.00433353,0.027135,-0.00417018,0.0163636,-0.0493119,-0.00536647,-0.0298447,0.0388,0.0794065,-0.034207,-0.0320547,-0.0087103,0.0219079,-0.0157967,-0.00572199,0.0585748,-0.0199477,-0.0579214,0.0599968,0.00114764,-0.00159144,-0.0234261,-0.00372818,-0.0411638,-0.0368014,0.0102045,-0.00333903,-0.0254823,-0.0238873,0.00625047,0.0277115,0.0212545,-0.0325351,0.0288838,-0.0222538,-0.0337074,0.0871703,-0.000805931,-0.0272311,0.00569797,0.00741793,-0.036417,0.0108771,0.010118,-0.00163949,-0.0208317,0.0410484,0.0226381,-0.0188043,-0.000978888,-0.0360711,-0.014586,0.0473133,0.0127027,0.00589495,-0.0154604,0.0707586,-0.033227,-0.00934928,0.0288069,0.0220808,0.0516949,0.0151722,-0.031728,-0.0417787,-0.0230033,0.00832115,-0.00904661,-0.0460065,-0.0345337,-0.0330156,0.019006,-0.0253862,0.0161907,-0.000805331,-0.00144611,0.0128276,-0.0310746,-0.0348028,0.0463909,-0.0274425,0.00368014,0.0431624,0.0144323,-0.0242332,0.0191598,-0.0477746,0.0485048,0.0574217,0.0277115,0.00368014,-0.0642247,-0.000802328,-0.00952705,-0.0117034,-0.000449208,0.0023109,0.0497347,-0.0370512,-0.00344713,0.0438542,0.012991,0.0448535,-0.0286148,0.0175743,-0.0100507,0.0373203,-0.0329771,-0.031728,-0.0234453,-0.0388,0.0536935,-0.0644937,-0.0223114,-0.019006,0.0139134,-0.00516949,-0.0532707,-0.0310938,0.0179203,-0.00197459,0.00822987,0.0135675,-0.00421102,0.00768698,-0.0560381,0.00568356,-0.00909465,-0.0135483,0.0195634,0.0108482,0.0242332,0.00714408,0.0063898,-0.0456222,-0.0424321,-0.0117515,-0.0692212,0.0261357,-0.0322276,0.0145764,-0.0038603,0.0246944,-0.00605349,0.0309401,0.00933968,0.0169978,-0.0431239,-0.033227,0.0285187,0.0294988,0.00252709,-0.0334768,0.00656276,0.0194673,0.0108579,-0.00819143,-0.015643,-0.00469866,0.0172573,0.00202023,0.0131639,0.03438,-0.00695671,0.0127892,0.010608,-0.004629,-0.00406689,0.0224652,-0.0214082,0.00350718,0.014096,-0.0260973,0.014365,-0.000807132,-0.00978168,0.033054,-0.0203897,-0.029518,-0.0385117,0.0157775,0.032612,-0.0404719,-0.00106116,0.025367,0.0114824,-0.00314686,0.00748039,-0.0237143,0.0244446,0.0196402,-0.0287109,-0.00856617,-0.00333423,-0.0123664,0.0265008,0.0180163,-0.0115016,-0.0435083,0.049581,-0.00377142,0.0109251,0.0204666,0.00391796,-0.0286724,0.00906102,0.056384,-0.020678,-0.00802809,0.0548081,0.000265291,-0.009479,0.0141633,-0.00453051,-0.0316703,0.000943456,0.0301137,-0.0181124,0.0770235,-0.0660696,0.0175647,0.0221577,0.00155901,0.0370897,-0.054616,-0.0111845,-0.0148743,-0.0014377,-0.0143458,0.0221577,0.0226574,-0.00138486,-0.0256361,0.0270005,0.0260204,0.0312283,0.00761971,0.0313052,-0.0230225,-0.134753,0.00486922,0.0105696,0.00130919,-0.0495426,0.0478899,-0.00567875,-0.0174206,0.0171708,0.0059478,-0.0225997,0.0197171,0.0360327,-0.00557306,0.0285571,0.00150857,0.0270966,-0.0533092,-0.0343223,-0.00382187,0.00495329,0.0234261,0.0517333,0.0146917,0.0348989,0.000468425,-0.0104735,0.0503881,-0.0371665,-0.0268468,0.0338611,-0.00337026,0.0362057,0.00248866,0.0357637,0.0107714,0.00746117,0.0426243,-0.0382235,-0.000316787,0.0128565,-0.00701917,0.00811456,-0.0484664,0.0763317,0.0313821,-0.0263663,-0.0255976,-0.0206203,0.0044128,0.0470443,0.0237335,-0.0312668,0.00723056,0.0445845,-0.00385069,-0.0330732,0.0231762,-0.00895052,0.035437,0.000472028,0.0294027,-0.0283265,-0.0136156,-0.0233876,0.0298639,-0.0383388,-0.0192751,0.00844126,0.0137693,0.0120974,-0.0278269,0.0215812,-0.0353793,0.0349181,-0.0063946,0.0335921,0.000581027,-0.0330732,-0.00557306,0.0146245,0.0268468,-0.00780228,0.0596125,0.00445604,0.0333999,0.018737,-0.015864,-0.0503113,-0.0109539,-0.0189388,-0.00527038,0.016479,-0.0234837,-0.0126739,0.00749961,-0.0234068,0.0120974,0.0198708,-0.00701917,-0.009258,-0.0554615,0.00828272,0.0271735,-0.00616399,-0.0125009,0.0404719,-0.00423024,-0.00977687,0.00232891,-0.037301,0.0124625,-0.0243677,0.0051791,-0.0393957,-0.0284418,-0.00994502,0.0313052,-0.0115881,0.0233108,0.0120782,0.0014377,-0.0263279,-0.0324967,0.0205819,0.0403182,0.0358213,0.0154508,-0.00830193,-0.00961833,0.031113,0.0099162,0.0102525,0.0235221,0.0531554,-0.00775424,-0.0210431,0.026693,-0.0225613,-0.0592281,-0.0108771,0.00173678,-0.0120686,-0.00459057,-0.0353024,0.0245599,-0.013212,0.00196018,-0.00167192,-0.0321123,0.0228688,0.0365131,0.00295228,-0.00869109,-0.0618033,-0.0138846,-0.0118379,0.0532323,0.0097,-0.0263279,-0.0530786,-0.016085,-0.0145188,-0.0121646,0.0069471,-0.00858539,-0.0270966,-0.016921,0.0459297,-0.00492927,0.00928202,-0.00408611,-0.0236567,0.0105215,0.048082,-0.00917152,0.0172669,0.00185448,0.0435467,-0.0144035,0.0212929,0.0148166,-0.0140479,-0.00855656,0.0242332,0.0120589,0.0181701,-0.00252469,0.0131543,-0.0379737,-0.0458912,-0.00235294,0.00921957,-0.026866,0.0458912,-0.00790317,-0.0348797,0.0288262,0.00254631,0.0404335,-0.0364363,-0.00645706,-0.0181124,-0.0247905,0.00669728,0.0403566,-0.0561534,0.0560765,-0.00116025,0.0485433,0.0183719,0.0133946,-0.0417018,-0.00701437,0.00632734,-0.00184968,-0.0142305,0.0284995,0.00176921,0.0204666,0.0286916,-0.0175167,0.00338948,-0.0262702,-0.00534725,-0.00160105,-0.0139038,-0.0124721,0.0156334,0.0166423,-0.0147686,-0.0295564,0.0192751,0.00454012,-0.036321,-0.0150761,0.0124337,-0.0126451,0.0052944,-0.0490813,0.0329195,0.0187274,-0.0213506,-0.0204858]
1 row in set (0.001 sec)
最上面那张图中 OceanBase 库的右侧,是和用户交互,然后通过大模型消费向量数据,产生答案的过程。
如上图所示,用户在和 chat data 应用交互的过程中:
步骤 1 - 3:首先会把用户的自然语言请求,通过模型转换为向量,并在 dbgpt_test_db_profile 表内查询相似度最高的向量。
步骤 4 - 5:大语言模型会基于 Vector 库返回的元数据信息,把自然语言转换为对应的 SQL,并在 User Data 库中执行 SQL 和收集结果数据。还可以根据用户需求将结果数据生成适合的图表。
chat data 应用在网上看到的效果是这样,可以直接对根据自然语言对数据库进行查询,并生成适合展示结果数据的图表: