OpenHands Message Format and litellm Integration

Overview

OpenHands uses its own Message class (openhands/core/message.py), which provides rich content support while remaining compatible with litellm's message handling system.

Class Structure

Our Message class (openhands/core/message.py):

class Message(BaseModel):
    role: Literal['user', 'system', 'assistant', 'tool']
    content: list[TextContent | ImageContent] = Field(default_factory=list)
    cache_enabled: bool = False
    vision_enabled: bool = False
    condensable: bool = True
    function_calling_enabled: bool = False
    tool_calls: list[ChatCompletionMessageToolCall] | None = None
    tool_call_id: str | None = None
    name: str | None = None
    event_id: int = -1

litellm's Message class (litellm/types/utils.py):

class Message(OpenAIObject):
    content: Optional[str]
    role: Literal["assistant", "user", "system", "tool", "function"]
    tool_calls: Optional[List[ChatCompletionMessageToolCall]]
    function_call: Optional[FunctionCall]
    audio: Optional[ChatCompletionAudioResponse] = None

How It Works

  1. Message Creation: Our Message class is a Pydantic model that supports rich content (text and images) through its content field.

  2. Serialization: The class uses Pydantic's @model_serializer to convert messages into dictionaries that litellm can understand. We have two serialization methods:

    def _string_serializer(self) -> dict:
        # flatten all text items into a single string; image content is
        # dropped here, since the plain-string form cannot carry it
        content = '\n'.join(item.text for item in self.content if isinstance(item, TextContent))
        message_dict: dict = {'content': content, 'role': self.role}
        return self._add_tool_call_keys(message_dict)
    
    def _list_serializer(self) -> dict:
        content: list[dict] = []
        for item in self.content:
            d = item.model_dump()
            if isinstance(item, TextContent):
                content.append(d)
            elif isinstance(item, ImageContent) and self.vision_enabled:
                # ImageContent serializes to a list of image_url dicts
                # (one per image), so extend rather than append
                content.extend(d)
        return {'content': content, 'role': self.role}
    

    The appropriate serializer is chosen based on the message's capabilities:

    @model_serializer
    def serialize_model(self) -> dict:
        if self.cache_enabled or self.vision_enabled or self.function_calling_enabled:
            return self._list_serializer()
        return self._string_serializer()
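
    For illustration, here is a hedged sketch of how a message flows through these serializers (the ImageContent field name and the exact dumped shapes are assumptions, not taken from the code above):

    # A plain text message: no cache/vision/function-calling flags, so
    # serialize_model() falls through to _string_serializer().
    text_msg = Message(role='user', content=[TextContent(text='Hello')])
    text_msg.model_dump()
    # -> {'content': 'Hello', 'role': 'user'} (plus any tool-call keys)

    # A vision message: vision_enabled switches serialization to
    # _list_serializer(), keeping each content item as its own entry.
    vision_msg = Message(
        role='user',
        vision_enabled=True,
        content=[
            TextContent(text='Describe this image'),
            ImageContent(image_urls=['https://example.com/cat.png']),  # assumed field name
        ],
    )
    vision_msg.model_dump()
    # -> {'content': [<text dict>, <image_url dict(s)>], 'role': 'user'}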
    
  3. Tool Call Handling: Tool calls require special attention in serialization because:

    • They need to work with litellm's API calls (which accept both dicts and objects)
    • They need to be properly serialized for token counting
    • They need to maintain compatibility with different LLM providers' formats
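
    A minimal, hedged sketch of what a helper like _add_tool_call_keys could do when building message_dict; the actual helper in openhands/core/message.py may differ:

    def _add_tool_call_keys(self, message_dict: dict) -> dict:
        # Sketch only: copy tool-call fields onto the serialized dict so that
        # litellm (and its token counter) receive plain dictionaries, not objects.
        if self.tool_call_id is not None:
            message_dict['tool_call_id'] = self.tool_call_id
        if self.name is not None:
            message_dict['name'] = self.name
        if self.tool_calls:
            message_dict['tool_calls'] = [
                {
                    'id': tool_call.id,
                    'type': 'function',
                    'function': {
                        'name': tool_call.function.name,
                        'arguments': tool_call.function.arguments,
                    },
                }
                for tool_call in self.tool_calls
            ]
        return message_dict
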
  4. litellm Integration: When we pass our messages to litellm.completion(), litellm does not care about the message class type; it works with the dictionary representation (see the completion sketch after this list). This works because:

    • litellm's transformation code (e.g., litellm/llms/anthropic/chat/transformation.py) processes messages based on their structure, not their type
    • our serialization produces dictionaries that match litellm's expected format
    • litellm handles rich content by looking at the message structure, supporting both simple string content and lists of content items
  5. Provider-Specific Handling: litellm then transforms these messages into provider-specific formats (e.g., Anthropic, OpenAI) through its transformation layers, which know how to handle both simple and rich content structures.
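
Putting this together, a hedged sketch of sending serialized messages through litellm (the import path for TextContent and the model name are assumptions):

import litellm

from openhands.core.message import Message, TextContent  # assumed import path

messages = [
    Message(role='system', content=[TextContent(text='You are a helpful assistant.')]),
    Message(role='user', content=[TextContent(text='Summarize the repository layout.')]),
]

# litellm only ever sees the dictionary form produced by our serializers;
# it then applies its provider-specific transformations to these dicts.
response = litellm.completion(
    model='claude-3-5-sonnet-20241022',  # placeholder; any litellm-supported model
    messages=[message.model_dump() for message in messages],
)
print(response.choices[0].message.content)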

Token Counting

To use litellm's token counter, we need to make sure that all message components (including tool calls) are properly serialized to dictionaries. This is because:

  • litellm's token counter expects dictionary structures
  • Tool calls need to be included in the token count
  • Different providers may count tokens differently for structured content
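
For example, a hedged sketch of counting tokens with litellm's counter, reusing the messages list from the completion sketch above (the model name is a placeholder):

import litellm

# Serialize first so tool calls and rich content become plain dictionaries
# that litellm's counter can walk.
message_dicts = [message.model_dump() for message in messages]

token_count = litellm.token_counter(
    model='gpt-4o',  # placeholder; the counter picks a tokenizer based on this
    messages=message_dicts,
)
print(f'Prompt would use roughly {token_count} tokens')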

Note

  • We don't need to inherit from litellm's Message class because litellm works with dictionary representations, not class types
  • Our rich content model is more sophisticated than litellm's basic string content, but litellm handles it correctly through its transformation layers
  • The compatibility is maintained through proper serialization rather than inheritance