Question about client-side embedding

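I set up a client/server architecture. seekdb is installed on the server and works fine. When I run the client (the sample program from the official site), I have a question: I can see that the client (Windows) downloaded the all-MiniLM-L6-v2 model. Does that mean that, because of this model's limits, my chunks can be at most 256 tokens long?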
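To make the chunk-size concern concrete, here is a minimal sketch of splitting text before it is embedded. As far as I understand from the model card, all-MiniLM-L6-v2 truncates input beyond 256 word pieces by default, so longer chunks would be silently cut rather than rejected. The helper `chunk_words` and its parameters are hypothetical names of mine, not part of pyseekdb, and whitespace-separated words are only a rough proxy for the model's tokens, so the default budget is kept well below 256.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping chunks of at most max_words words.

    Words are a rough stand-in for model tokens (assumption): one word
    can become several word pieces, so keep max_words below the limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each returned chunk could then be passed as one entry of `documents` in the sample program. An exact limit would need the model's own tokenizer instead of a word count.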

Also, since my seekdb is installed on the server (Linux), why can't the embedding be done on the server side, which is the more powerful machine?

import pyseekdb

# Alternative: Server mode (connecting to remote SeekDB server)
client = pyseekdb.Client(
    host="__my_server_ip__",
    port=2881,
    database="test",
    user="root",
    password=""
)

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

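When I run the program, I can clearly see that the client (running on Windows) uses all-MiniLM-L6-v2 and that the vectors have 384 dimensions, which feels like it is holding performance back.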
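For illustration, this is roughly what I have in mind by letting the server do the embedding: a small callable on the client that only forwards text and receives vectors back. Everything here is hypothetical: `RemoteEmbedder`, the JSON shape with `inputs` and `embeddings`, and the `send` transport are my own assumptions, and I have not checked whether pyseekdb accepts a callable like this as its embedding function.

```python
import json
from typing import Callable, List, Sequence


class RemoteEmbedder:
    """Hypothetical callable that delegates embedding to a remote service."""

    def __init__(self, send: Callable[[str], str], dimension: int) -> None:
        # send takes a JSON request string and returns a JSON reply string;
        # in practice it might wrap an HTTP call to the server (assumption).
        self._send = send
        self.dimension = dimension

    def __call__(self, texts: Sequence[str]) -> List[List[float]]:
        request = json.dumps({"inputs": list(texts)})
        reply = json.loads(self._send(request))
        vectors = reply["embeddings"]
        if len(vectors) != len(texts):
            raise ValueError("server returned a different number of vectors")
        for vector in vectors:
            if len(vector) != self.dimension:
                raise ValueError("unexpected embedding dimension")
        return [[float(x) for x in vector] for vector in vectors]


def fake_send(request: str) -> str:
    """Stand-in transport for local testing; returns 4-d zero vectors."""
    texts = json.loads(request)["inputs"]
    return json.dumps({"embeddings": [[0.0] * 4 for _ in texts]})
```

With a real transport in place of `fake_send`, the model would be loaded once on the Linux server and the Windows client would not need to download it.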
