Problem setting up an intelligent RAG application

https://open.oceanbase.com/blog/15622679120

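I set things up following the steps in the post above, and got an error when I reached this step:

The error output is as follows: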

Traceback (most recent call last):
  File "/data/yyai/ai-workshop-2024/utils/extract.py", line 54, in <module>
    cur = client.perform_raw_text_sql(f"SELECT COUNT(*) FROM {args.table_name}")
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pyobvector/client/milvus_like_client.py", line 684, in perform_raw_text_sql
    return super().perform_raw_text_sql(text_sql)
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pyobvector/client/ob_vec_client.py", line 736, in perform_raw_text_sql
    return conn.execute(text(text_sql))
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/cursors.py", line 153, in execute
    result = self._query(query)
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/connections.py", line 563, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/connections.py", line 825, in _read_query_result
    result.read()
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/connections.py", line 1199, in read
    first_packet = self.connection._read_packet()
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/connections.py", line 775, in _read_packet
    packet.raise_for_error()
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/protocol.py", line 219, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/pymysql/err.py", line 150, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1146, "Table 'ailive.corpus' doesn't exist")
[SQL: SELECT COUNT(*) FROM corpus]
(Background on this error at: https://sqlalche.me/e/20/f405)

The corresponding database:

mysql> show tables;
+------------------+
| Tables_in_ailive |
+------------------+
| t1               |
+------------------+
1 row in set (0.01 sec)

Error code:
1146, "Table 'ailive.corpus' doesn't exist"

I followed the steps exactly and can't tell which step went wrong.
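One way to narrow this down is to check whether the table exists before counting its rows. The sketch below is only an illustration: table_exists is a hypothetical helper, not part of pyobvector or the workshop repository, and it assumes a DB-API cursor that returns rows as tuples (pymysql's default). FakeCursor is a stand-in that replays the SHOW TABLES output from this thread so the example runs without a database.

```python
def table_exists(cursor, table_name):
    """Return True if table_name is listed by SHOW TABLES in the current database."""
    cursor.execute("SHOW TABLES")
    # Each row is assumed to be a 1-tuple holding one table name.
    return any(row[0] == table_name for row in cursor.fetchall())


class FakeCursor:
    """Stand-in for a DB-API cursor; replays the table list shown in this thread."""

    def execute(self, sql):
        self.rows = [("t1",)]

    def fetchall(self):
        return self.rows


cur = FakeCursor()
print(table_exists(cur, "corpus"))  # corpus is missing, as in the 1146 error
print(table_exists(cur, "t1"))
```

If the table is missing, the step that was supposed to create and fill it (here, the embed_docs.py run) is the place to look, rather than extract.py itself.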

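Hello, the content of that blog post is not kept up to date. We have made some updates in the code repository, so please refer to the latest documentation there: ai-workshop-2024, the OceanBase 2024 product launch AI hands-on workshop project.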


https://open.oceanbase.com/blog/15622679120

@ubuntu:~/ai-workshop-2024$ poetry run python embed_docs.py --doc_base docs/mongodb.md --component mongodb
args Namespace(doc_base='docs/mongodb.md', table_name='corpus', skip_patterns=['oracle'], batch_size=4, component='mongodb', limit=300, echo=False)
Using BGEEmbedding
Fetching 30 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 21087.50it/s]
0it [00:00, ?it/s]

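The above is the step right before the one that fails.

I saw that this article was last updated just 3 days ago, so it should be current as well. Why does one of the steps in the middle fail for no apparent reason when I follow them one by one?

You don't need to specify component, and you don't need to point at a single document either. Just use doc_base to specify a directory, like this: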

poetry run python embed_docs.py --doc_base docs

poetry run python embed_docs.py --doc_base doc_repos/black-myth-wukong-portraits/docs

Error:


args Namespace(doc_base='doc_repos/black-myth-wukong-portraits/docs', table_name='corpus', skip_patterns=['oracle'], batch_size=4, component='observer', limit=300, echo=False)
Using RemoteOpenAI
  1%|█▋                                                                                                                                                                      | 2/203 [00:00<00:14, 13.95it/s]
Traceback (most recent call last):
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/yyai/ai-workshop-2024/embed_docs.py", line 128, in <module>
    insert_batch(batch, comp=args.component)
  File "/data/yyai/ai-workshop-2024/embed_docs.py", line 112, in insert_batch
    vs.add_documents(
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/langchain_core/vectorstores/base.py", line 287, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/langchain_community/vectorstores/oceanbase.py", line 300, in add_texts
    embeddings = self.embedding_function.embed_documents(texts)
  File "/data/yyai/ai-workshop-2024/rag/embeddings.py", line 88, in embed_documents
    data = res.json()
  File "/data/yyai/.cache/pypoetry/virtualenvs/ai-workshop-MjiVjgAq-py3.10/lib/python3.10/site-packages/requests/models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

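An encoding error. These are so hard to deal with, you can never guard against all of them!

Have you filled in both API_KEY entries in your .env?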

[screenshot]
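A quick way to double-check the .env entries is to scan for keys that are empty or padded with whitespace. This is a minimal sketch under stated assumptions: check_api_keys is a hypothetical helper, the sample text is made up, and only the name OPENAI_EMBEDDING_API_KEY comes from this thread. It does not handle quoted values or inline comments.

```python
def check_api_keys(env_text):
    """Return (name, problem) pairs for *_API_KEY entries that look suspicious."""
    problems = []
    for line in env_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and lines that are not KEY=VALUE.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        if not name.endswith("API_KEY"):
            continue
        if value.strip() == "":
            problems.append((name, "empty"))
        elif value != value.strip():
            problems.append((name, "surrounding whitespace"))
    return problems


# Made-up sample; the key values are placeholders, not real credentials.
sample = (
    "API_KEY=sk-placeholder\n"
    "OPENAI_EMBEDDING_API_KEY= sk-placeholder\n"
    "LLM_BASE_URL=\n"
)
print(check_api_keys(sample))
```

To run it against a real file, pass the file contents, for example check_api_keys(open(".env").read()).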

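Yes, I have, and they match what is shown here.

But the error above is clearly an encoding error: something is wrong in the conversion between unicode and str.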

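Could you try removing the space after OPENAI_EMBEDDING_API_KEY=? The request sent with requests probably did not get a normal response, which is why calling res.json() raised an error.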
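To see what the server actually returned, the status code and raw body can be inspected before calling res.json(). The following is a rough sketch, not the code in rag/embeddings.py: parse_json_response is a hypothetical helper, and FakeResponse is a stand-in exposing the same status_code, text, and json() members as a requests response so the example runs offline.

```python
import json


def parse_json_response(res):
    """Return res.json(), or raise an error that shows what the server sent."""
    body = res.text or ""
    if res.status_code != 200:
        raise RuntimeError(f"HTTP {res.status_code}: {body[:200]!r}")
    try:
        return res.json()
    except ValueError as exc:  # JSON decode errors are ValueError subclasses
        raise RuntimeError(f"Response is not JSON: {body[:200]!r}") from exc


class FakeResponse:
    """Stand-in for a requests response object, for illustration only."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)


for res in (FakeResponse(200, '{"data": []}'), FakeResponse(200, "")):
    try:
        print(parse_json_response(res))
    except RuntimeError as err:
        print("failed:", err)
```

With a wrapper like this, an empty body or an HTML error page shows up in the error message instead of the bare "Expecting value" text.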

[screenshot]

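This shows the progress at 11%, so doesn't that suggest the api_key settings are fine?

Also, I removed the space as you suggested, and the error is still the same.

The error 'Expecting value: line 1 column 1 (char 0)' usually means the data received is not valid JSON. The problem stems from Python 3 strings being Unicode, so the received bytes need to be converted to a string. The fix is to add # -*- coding: utf-8 -*- at the top of the program to set the encoding, or to call .info.decode('utf-8') before parsing to decode the bytes into a string.

OK, did this approach solve it for you? Thank you for sharing this solution!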
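For comparison, here is a small standard-library experiment showing which inputs produce this exact message. It is only a sketch (decode_error is a hypothetical helper), but it suggests that an empty or non-JSON body triggers the error, while UTF-8 encoded bytes containing Chinese text parse without any manual decoding.

```python
import json


def decode_error(payload):
    """Return the JSONDecodeError message for payload, or None if it parses."""
    try:
        json.loads(payload)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


print(decode_error(""))                              # empty body
print(decode_error("<html>502 Bad Gateway</html>"))  # HTML error page
print(decode_error('{"text": "中文"}'.encode("utf-8")))  # UTF-8 bytes parse fine
```

The first two calls report "Expecting value: line 1 column 1 (char 0)", and the third returns None.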

Could it be related to this:
[screenshot]

It didn't solve it, but I'm not sure whether I enabled the model invocation service key correctly.
[screenshot]

This is how I enabled it.

But the main page of Alibaba Cloud Bailian (阿里云百炼) shows it as not enabled, as below:
[screenshot]

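If Bailian is in this state, you probably can't call the text embedding model. Did this account enable Bailian a long time ago? It may have no free quota left. You could top up 1 yuan in the Alibaba Cloud console and then enable the model invocation service (模型调用服务) again.

I only enabled it at noon today.

So you can't enable the model invocation service?

Yes.

That's odd. New users should normally get a generous free quota. Would you like to try topping up 1 yuan in the Alibaba Cloud console?

I topped up 1 yuan and enabled it, but I still get the same error. It looks like this has nothing to do with the model invocation service.

Is the current error message the same as before? Could you try creating a new API key?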