@@ -0,0 +1,89 @@
+# OpenHands Message Format and litellm Integration
+
+## Overview
+
+OpenHands uses its own `Message` class (`openhands/core/message.py`), which supports rich content (text and images) while remaining compatible with litellm's message handling.
+
+## Class Structure
+
+Our `Message` class (`openhands/core/message.py`):
+```python
+class Message(BaseModel):
+    role: Literal['user', 'system', 'assistant', 'tool']
+    content: list[TextContent | ImageContent] = Field(default_factory=list)
+    cache_enabled: bool = False
+    vision_enabled: bool = False
+    condensable: bool = True
+    function_calling_enabled: bool = False
+    tool_calls: list[ChatCompletionMessageToolCall] | None = None
+    tool_call_id: str | None = None
+    name: str | None = None
+    event_id: int = -1
+```
+
+litellm's `Message` class (`litellm/types/utils.py`):
+```python
+class Message(OpenAIObject):
+    content: Optional[str]
+    role: Literal["assistant", "user", "system", "tool", "function"]
+    tool_calls: Optional[List[ChatCompletionMessageToolCall]]
+    function_call: Optional[FunctionCall]
+    audio: Optional[ChatCompletionAudioResponse] = None
+```
+
+## How It Works
+
+1. **Message Creation**: Our `Message` class is a Pydantic model that supports rich content (text and images) through its `content` field.
+
+2. **Serialization**: The class uses Pydantic's `@model_serializer` to convert messages into dictionaries that litellm can understand. We have two serialization methods:
+   ```python
+   def _string_serializer(self) -> dict:
+       # convert content to a single string; image items are skipped
+       content = '\n'.join(item.text for item in self.content if isinstance(item, TextContent))
+       message_dict: dict = {'content': content, 'role': self.role}
+       return self._add_tool_call_keys(message_dict)
+
+   def _list_serializer(self) -> dict:
+       content: list[dict] = []
+       for item in self.content:
+           d = item.model_dump()
+           if isinstance(item, TextContent):
+               content.append(d)
+           elif isinstance(item, ImageContent) and self.vision_enabled:
+               # extend, not append: an ImageContent dump is expected to be a list of dicts
+               content.extend(d)
+       return {'content': content, 'role': self.role}
+   ```
+
+   The serializer is selected from the message's feature flags: if any of `cache_enabled`, `vision_enabled`, or `function_calling_enabled` is set, the list form is used; otherwise the content is flattened to a single string:
+   ```python
+   @model_serializer
+   def serialize_model(self) -> dict:
+       if self.cache_enabled or self.vision_enabled or self.function_calling_enabled:
+           return self._list_serializer()
+       return self._string_serializer()
+   ```
+
+3. **Tool Call Handling**: Tool calls require special attention in serialization because:
+   - They need to work with litellm's API calls (which accept both dicts and objects)
+   - They need to be properly serialized for token counting
+   - They need to maintain compatibility with different LLM providers' formats
+
+4. **litellm Integration**: When we pass our messages to `litellm.completion()`, litellm doesn't care about the message class type; it works with the dictionary representation. This works because:
+   - litellm's transformation code (e.g., `litellm/llms/anthropic/chat/transformation.py`) processes messages based on their structure, not their type
+   - Our serialization produces dictionaries that match litellm's expected format
+   - litellm handles rich content by looking at the message structure, supporting both simple string content and lists of content items
+
+5. **Provider-Specific Handling**: litellm then transforms these messages into provider-specific formats (e.g., Anthropic, OpenAI) through its transformation layers, which know how to handle both simple and rich content structures.
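
The two serialized shapes described in step 2 can be illustrated with a small standalone sketch. It uses only the standard library rather than Pydantic, and `TextPart`, `ImagePart`, and `SketchMessage` are hypothetical stand-ins for the real classes; the exact dictionary keys are assumptions modeled on the OpenAI-style content format. Only `vision_enabled` is modeled here, whereas the real class also checks the caching and function-calling flags.

```python
from dataclasses import dataclass, field


@dataclass
class TextPart:
    """Hypothetical stand-in for TextContent."""
    text: str

    def dump(self) -> dict:
        return {'type': 'text', 'text': self.text}


@dataclass
class ImagePart:
    """Hypothetical stand-in for ImageContent; dumps to one dict per URL."""
    image_urls: list[str]

    def dump(self) -> list[dict]:
        return [{'type': 'image_url', 'image_url': {'url': u}} for u in self.image_urls]


@dataclass
class SketchMessage:
    """Simplified message: only the vision flag selects the list form."""
    role: str
    content: list = field(default_factory=list)
    vision_enabled: bool = False

    def to_dict(self) -> dict:
        if self.vision_enabled:
            parts: list[dict] = []
            for item in self.content:
                if isinstance(item, TextPart):
                    parts.append(item.dump())   # one dict per text item
                elif isinstance(item, ImagePart):
                    parts.extend(item.dump())   # a list of dicts, one per image
            return {'role': self.role, 'content': parts}
        # String form: join the text items and drop everything else.
        text = '\n'.join(i.text for i in self.content if isinstance(i, TextPart))
        return {'role': self.role, 'content': text}


parts = [TextPart('Describe this image.'), ImagePart(['https://example.com/cat.png'])]
as_string = SketchMessage('user', parts).to_dict()
as_list = SketchMessage('user', parts, vision_enabled=True).to_dict()
```

In this sketch the same message yields a plain string when no flag is set (the image is silently dropped) and a list of typed content items when `vision_enabled` is true.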
+
+### Token Counting
+
+To use litellm's token counter, every message component, including tool calls, must be serialized to plain dictionaries, because:
+- litellm's token counter expects dictionary structures
+- Tool calls need to be included in the token count
+- Different providers may count tokens differently for structured content
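
One way to meet the "everything is a dictionary" requirement is a recursive normalization pass before counting. The sketch below is an illustration under assumptions, not the OpenHands implementation: `to_plain` and `FakeToolCall` are hypothetical names, and the only thing assumed about tool-call objects is that they expose a Pydantic-style `model_dump()` method.

```python
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively convert objects into dicts, lists, and primitives."""
    if hasattr(value, 'model_dump'):  # Pydantic-style objects
        return to_plain(value.model_dump())
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


class FakeToolCall:
    """Hypothetical stand-in for a tool-call object with a model_dump() method."""

    def __init__(self, call_id: str, name: str, arguments: str) -> None:
        self.call_id = call_id
        self.name = name
        self.arguments = arguments

    def model_dump(self) -> dict:
        return {
            'id': self.call_id,
            'type': 'function',
            'function': {'name': self.name, 'arguments': self.arguments},
        }


message = {
    'role': 'assistant',
    'content': '',
    'tool_calls': [FakeToolCall('call_1', 'run', '{"command": "ls"}')],
}
plain_message = to_plain(message)
```

A list of messages normalized this way contains only dictionaries, lists, and primitives, which is the shape a dictionary-based token counter can walk; the call to litellm itself is deliberately left out of the sketch.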
+
+## Notes
+
+- We don't need to inherit from litellm's `Message` class because litellm works with dictionary representations, not class types
+- Our rich content model is more expressive than litellm's plain string `content`, but litellm handles it correctly through its transformation layers
+- Compatibility is maintained through serialization rather than inheritance
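
To make the "structure, not type" point concrete, here is a minimal sketch of a consumer that inspects only the shape of `content`. It is not litellm's code; `text_of` is a hypothetical helper, and the `type`/`text` keys are assumed to follow the OpenAI-style content-item format.

```python
def text_of(message: dict) -> str:
    """Return the text of a message, whatever shape its content has."""
    content = message.get('content')
    if isinstance(content, str):
        # Simple form: content is already a string.
        return content
    if isinstance(content, list):
        # Rich form: keep the text items and ignore images and other parts.
        return '\n'.join(
            part['text']
            for part in content
            if isinstance(part, dict) and part.get('type') == 'text'
        )
    # Missing or None content (e.g. an assistant turn with only tool calls).
    return ''


simple = {'role': 'user', 'content': 'hello'}
rich = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'hello'},
        {'type': 'image_url', 'image_url': {'url': 'https://example.com/cat.png'}},
        {'type': 'text', 'text': 'world'},
    ],
}
```

Because the function never checks which class produced the dictionary, any object that serializes to one of these two shapes would work with it, which is the same reason inheritance from litellm's `Message` is unnecessary.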