Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And   a model file bert-bge-small.gguf
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
    # the bert-bge-small model has a context size of 512
    # since the generated prompts are as big as the batch size, the batch size must be <= 512
    # ref: https://huggingface.co/BAAI/bge-small-en-v1.5/blob/5c38ec7c405ec4b44b94cc5a9bb96e735b38267a/config.json#L20
    And   128 as batch size
    And   128 as ubatch size
    And   512 KV cache size
    And   enable embeddings endpoint
    Then  the server is starting
    Then  the server is healthy

  Scenario: Embedding
    When embeddings are computed for:
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated

  Scenario: Embedding (error: prompt too long)
    When embeddings are computed for:
      """
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      """
    And embeddings request with 500 api error

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
      """
      What is the capital of Spain ?
      """
    Then embeddings are generated

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    And a prompt:
      """
      What is the biggest US city ?
      """
    And a prompt:
      """
      What is the capital of Bulgaria ?
      """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same