OpenHands Message Format and litellm Integration
Overview
OpenHands uses its own Message class (openhands/core/message.py), which provides rich content support while maintaining compatibility with litellm's message handling system.
Class Structure
Our Message class (openhands/core/message.py):
class Message(BaseModel):
    role: Literal['user', 'system', 'assistant', 'tool']
    content: list[TextContent | ImageContent] = Field(default_factory=list)
    cache_enabled: bool = False
    vision_enabled: bool = False
    condensable: bool = True
    function_calling_enabled: bool = False
    tool_calls: list[ChatCompletionMessageToolCall] | None = None
    tool_call_id: str | None = None
    name: str | None = None
    event_id: int = -1
litellm's Message class (litellm/types/utils.py):
class Message(OpenAIObject):
    content: Optional[str]
    role: Literal["assistant", "user", "system", "tool", "function"]
    tool_calls: Optional[List[ChatCompletionMessageToolCall]]
    function_call: Optional[FunctionCall]
    audio: Optional[ChatCompletionAudioResponse] = None
How It Works
Message Creation: Our Message class is a Pydantic model that supports rich content (text and images) through its content field.
Serialization: The class uses Pydantic's @model_serializer to convert messages into dictionaries that litellm can understand. We have two serialization methods:
def _string_serializer(self) -> dict:
    # convert content to a single string
    content = '\n'.join(item.text for item in self.content if isinstance(item, TextContent))
    message_dict: dict = {'content': content, 'role': self.role}
    return self._add_tool_call_keys(message_dict)

def _list_serializer(self) -> dict:
    content: list[dict] = []
    for item in self.content:
        d = item.model_dump()
        if isinstance(item, TextContent):
            content.append(d)
        elif isinstance(item, ImageContent) and self.vision_enabled:
            content.extend(d)
    return {'content': content, 'role': self.role}
The appropriate serializer is chosen based on the message's capabilities:
@model_serializer
def serialize_model(self) -> dict:
    if self.cache_enabled or self.vision_enabled or self.function_calling_enabled:
        return self._list_serializer()
    return self._string_serializer()
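The selection logic above can be illustrated with a small, dependency-free sketch. It uses plain dataclasses instead of Pydantic so that it runs on its own; `SketchMessage`, `TextPart`, and `ImagePart` are hypothetical stand-ins rather than the OpenHands classes, and the `image_url` dictionary shape is assumed from the OpenAI-style content format.

```python
from dataclasses import dataclass, field


@dataclass
class TextPart:
    text: str

    def dump(self) -> dict:
        return {'type': 'text', 'text': self.text}


@dataclass
class ImagePart:
    image_urls: list

    def dump(self) -> list:
        # One dict per image URL, which is why the caller uses extend().
        return [{'type': 'image_url', 'image_url': {'url': u}} for u in self.image_urls]


@dataclass
class SketchMessage:
    role: str
    content: list = field(default_factory=list)
    cache_enabled: bool = False
    vision_enabled: bool = False
    function_calling_enabled: bool = False

    def serialize(self) -> dict:
        # Any "rich" capability switches the output to the list form.
        if self.cache_enabled or self.vision_enabled or self.function_calling_enabled:
            return self._as_list()
        return self._as_string()

    def _as_string(self) -> dict:
        # Join all text parts into one string; images are dropped.
        text = '\n'.join(p.text for p in self.content if isinstance(p, TextPart))
        return {'content': text, 'role': self.role}

    def _as_list(self) -> dict:
        parts = []
        for p in self.content:
            if isinstance(p, TextPart):
                parts.append(p.dump())
            elif isinstance(p, ImagePart) and self.vision_enabled:
                parts.extend(p.dump())
        return {'content': parts, 'role': self.role}


plain = SketchMessage(role='user', content=[TextPart('hello'), TextPart('world')])
rich = SketchMessage(
    role='user',
    content=[TextPart('describe this'), ImagePart(['https://example.com/a.png'])],
    vision_enabled=True,
)
print(plain.serialize())
print(rich.serialize())
```

The same message content therefore yields either a single string or a list of typed items, depending only on the capability flags.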
Tool Call Handling: Tool calls require special attention in serialization because:
- They need to work with litellm's API calls (which accept both dicts and objects)
- They need to be properly serialized for token counting
- They need to maintain compatibility with different LLM providers' formats
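As a rough illustration of these requirements, the sketch below attaches tool-related keys to an already serialized message and converts tool call objects into plain dictionaries. The body of `_add_tool_call_keys` is not shown in this document, so `add_tool_call_keys`, `SketchToolCall`, and `SketchFunction` are hypothetical names, and the field layout is assumed from the OpenAI-style tool call format.

```python
from dataclasses import asdict, dataclass


@dataclass
class SketchFunction:
    name: str
    arguments: str  # JSON-encoded argument string


@dataclass
class SketchToolCall:
    id: str
    function: SketchFunction
    type: str = 'function'


def add_tool_call_keys(message_dict, tool_calls=None, tool_call_id=None, name=None):
    """Attach tool-related keys to a serialized message (hypothetical helper)."""
    if tool_calls is not None:
        # Assistant message requesting tool calls: emit plain dicts only,
        # so downstream code never has to deal with custom objects.
        message_dict['tool_calls'] = [asdict(tc) for tc in tool_calls]
    if tool_call_id is not None:
        # Tool-role message answering an earlier call.
        message_dict['tool_call_id'] = tool_call_id
        message_dict['name'] = name
    return message_dict


call = SketchToolCall(id='call_1', function=SketchFunction('run', '{"cmd": "ls"}'))
assistant_msg = add_tool_call_keys({'role': 'assistant', 'content': ''}, tool_calls=[call])
tool_msg = add_tool_call_keys(
    {'role': 'tool', 'content': 'a.txt'}, tool_call_id='call_1', name='run'
)
print(assistant_msg)
print(tool_msg)
```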
litellm Integration: When we pass our messages to litellm.completion(), litellm doesn't care about the message class type - it works with the dictionary representation. This works because:
- litellm's transformation code (e.g., litellm/llms/anthropic/chat/transformation.py) processes messages based on their structure, not their type
- our serialization produces dictionaries that match litellm's expected format
- litellm handles rich content by looking at the message structure, supporting both simple string content and lists of content items
Provider-Specific Handling: litellm then transforms these messages into provider-specific formats (e.g., Anthropic, OpenAI) through its transformation layers, which know how to handle both simple and rich content structures.
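To make "structure, not type" concrete, here is a minimal sketch of a function that accepts both content shapes by inspecting the dictionary alone. This is not litellm's code; `extract_text` is a hypothetical helper that only illustrates the general idea of dispatching on message structure.

```python
def extract_text(message: dict) -> str:
    """Return the text of a message whose content is a string or a list of items."""
    content = message.get('content')
    if content is None:
        return ''
    if isinstance(content, str):
        # Simple form: content is already a single string.
        return content
    # Rich form: keep only the text items and join them.
    return '\n'.join(
        item['text'] for item in content if item.get('type') == 'text'
    )


simple = {'role': 'user', 'content': 'hello'}
rich = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'describe this'},
        {'type': 'image_url', 'image_url': {'url': 'https://example.com/a.png'}},
    ],
}
print(extract_text(simple))
print(extract_text(rich))
```

Because the function never checks which class produced the dictionary, any serializer that emits one of these two shapes is compatible with it.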
Token Counting
To use litellm's token counter, we need to make sure that all message components (including tool calls) are properly serialized to dictionaries. This is because:
- litellm's token counter expects dictionary structures
- Tool calls need to be included in the token count
- Different providers may count tokens differently for structured content
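One way to check that a message is fully serialized before counting tokens is to convert it recursively and confirm that the result is JSON-serializable. The sketch below is an assumption about how such a check could look; `to_plain` and `SketchToolCall` are hypothetical names. The resulting list of dictionaries is the kind of input one would then hand to litellm's token counter (for example via its `messages` argument), which is not called here.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


def to_plain(value):
    """Recursively convert nested objects into dicts, lists, and scalars."""
    if is_dataclass(value) and not isinstance(value, type):
        return to_plain(asdict(value))
    if hasattr(value, 'model_dump'):
        # Pydantic v2 models expose model_dump().
        return to_plain(value.model_dump())
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


@dataclass
class SketchToolCall:
    id: str
    name: str
    arguments: str


messages = [
    {'role': 'user', 'content': 'list the files'},
    {
        'role': 'assistant',
        'content': '',
        # A tool call left as an object would break dict-based consumers.
        'tool_calls': [SketchToolCall('call_1', 'run', '{"cmd": "ls"}')],
    },
]

plain_messages = to_plain(messages)
# json.dumps raises TypeError if any non-serializable object remains.
encoded = json.dumps(plain_messages)
print(encoded)
```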
Note
- We don't need to inherit from litellm's Message class because litellm works with dictionary representations, not class types
- Our rich content model is more sophisticated than litellm's basic string content, but litellm handles it correctly through its transformation layers
- The compatibility is maintained through proper serialization rather than inheritance