【 Environment 】Test environment
【 OB or other component 】OceanBase
【 Version 】v4.3.3
【Problem description】Running poetry run python embed_docs.py --doc_base doc_repos/oceanbase-doc/zh-CN/640.ob-vector-search
fails with the error shown in the logs below.
【Steps to reproduce】
root@oceanbase:~# cd ai-workshop-2024/
root@oceanbase:~/ai-workshop-2024# pwd
/root/ai-workshop-2024
root@oceanbase:~/ai-workshop-2024# poetry run python embed_docs.py --doc_base doc_repos/oceanbase-doc/zh-CN/640.ob-vector-search/
【Attachments and logs】
root@oceanbase:~/ai-workshop-2024# poetry run python embed_docs.py --doc_base doc_repos/oceanbase-doc/zh-CN/640.ob-vector-search/
args Namespace(doc_base='doc_repos/oceanbase-doc/zh-CN/640.ob-vector-search/', table_name='corpus', skip_patterns=['oracle'], batch_size=4, component='observer', limit=300, echo=False)
Using RemoteOpenAI
0%| | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/ai-workshop-2024/embed_docs.py", line 128, in <module>
    insert_batch(batch, comp=args.component)
  File "/root/ai-workshop-2024/embed_docs.py", line 112, in insert_batch
    vs.add_documents(
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/langchain_core/vectorstores/base.py", line 287, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/langchain_community/vectorstores/oceanbase.py", line 300, in add_texts
    embeddings = self.embedding_function.embed_documents(texts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ai-workshop-2024/rag/embeddings.py", line 84, in embed_documents
    res = requests.post(
          ^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/urllib3/connectionpool.py", line 495, in _make_request
    conn.request(
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/urllib3/connection.py", line 440, in request
    self.putheader(header, value)
  File "/root/.cache/pypoetry/virtualenvs/ai-workshop-aLQYZfdO-py3.12/lib/python3.12/site-packages/urllib3/connection.py", line 354, in putheader
    super().putheader(header, *values)
  File "/usr/lib/python3.12/http/client.py", line 1309, in putheader
    values[i] = one_value.encode('latin-1')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-15: ordinal not in range(256)
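Lab environment:
【Operating system】 Ubuntu 24.04.1 LTS
【Kernel】Linux 6.8.0-1016-aws
【Architecture】x86-64
【Resolved】
① Finding:
In the .env configuration file, the values of "API_KEY" and "OPENAI_EMBEDDING_API_KEY" must be identical; otherwise the error above occurs.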
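The traceback ends in http.client's putheader, which encodes every header value as latin-1, so the failure suggests that some request header contained non-Latin-1 characters. A minimal sketch of that mechanism follows. It is only an illustration: the assumption that the affected header was an "Authorization: Bearer ..." value built from a key in .env, the helper name latin1_error_span, and both sample values are hypothetical and not taken from the workshop code.

```python
def latin1_error_span(header_value: str):
    """Return the inclusive (start, end) of the first run of characters
    that latin-1 cannot encode, or None if the value encodes cleanly."""
    try:
        header_value.encode("latin-1")
    except UnicodeEncodeError as exc:
        # exc.end is exclusive; the error message prints an inclusive range.
        return exc.start, exc.end - 1
    return None


# Hypothetical values: a real key is plain ASCII, a leftover placeholder may not be.
ascii_header = "Bearer sk-abc123"
placeholder_header = "Bearer sk" + "请填写你的密钥"  # 7 non-ASCII characters

print(latin1_error_span(ascii_header))
print(latin1_error_span(placeholder_header))
```

With these made-up values the second call reports the span 9-15, the same range as in the error above, which is consistent with (but does not prove) a non-ASCII placeholder sitting right after a short ASCII prefix in the key.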
② Solution:
Update the values in the .env configuration file, save it, and run the command above again.
root@oceanbase:~/ai-workshop-2024# poetry run python embed_docs.py --doc_base doc_repos/oceanbase-doc/zh-CN/640.ob-vector-search/
args Namespace(doc_base='doc_repos/oceanbase-doc/zh-CN/640.ob-vector-search/', table_name='corpus', skip_patterns=['oracle'], batch_size=4, component='observer', limit=300, echo=False)
Using RemoteOpenAI
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:25<00:00, 2.87s/it]
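To catch this kind of misconfiguration before running embed_docs.py, a small pre-flight check along the following lines may help. It is a sketch under stated assumptions: only the variable names API_KEY and OPENAI_EMBEDDING_API_KEY come from this post, while parse_env and check_api_keys are hypothetical helpers, and the parser handles only plain KEY=VALUE lines rather than the full .env syntax.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_api_keys(env: dict) -> list:
    """Return a list of human-readable problems found in the two key settings."""
    names = ("API_KEY", "OPENAI_EMBEDDING_API_KEY")
    problems = []
    for name in names:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is empty or missing")
        elif not value.isascii():
            problems.append(f"{name} contains non-ASCII characters")
    if env.get(names[0], "") != env.get(names[1], ""):
        problems.append("API_KEY and OPENAI_EMBEDDING_API_KEY differ")
    return problems


# Made-up sample content; in practice the text would be read from the .env file.
sample = 'API_KEY=sk-abc123\nOPENAI_EMBEDDING_API_KEY="请填写"\n'
for problem in check_api_keys(parse_env(sample)):
    print(problem)
```

An empty result means both values are present, ASCII-only, and identical, which matches the condition described in the finding above.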
Special thanks to powerfool, maintainer of the ai-workshop-2024 GitHub repository, for the generous support and answers on this issue!