---
license: mit
---

# Table of Contents

* [VisionAtomicFlow](#VisionAtomicFlow)
  * [VisionAtomicFlow](#VisionAtomicFlow.VisionAtomicFlow)
    * [get\_image](#VisionAtomicFlow.VisionAtomicFlow.get_image)
    * [get\_video](#VisionAtomicFlow.VisionAtomicFlow.get_video)
    * [get\_user\_message](#VisionAtomicFlow.VisionAtomicFlow.get_user_message)

# VisionAtomicFlow

## VisionAtomicFlow Objects

```python
class VisionAtomicFlow(ChatAtomicFlow)
```

This class implements the atomic flow for the VisionFlowModule. It is a flow that, given a textual input and a set of images and/or videos, generates a textual output. It uses the litellm library as a backend. See https://docs.litellm.ai/docs/providers for supported models and APIs.

*Configuration Parameters*:

- `name` (str): The name of the flow. Default: "VisionAtomicFlow"
- `description` (str): A description of the flow, used to generate its help message. Default: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
- `enable_cache` (bool): If True, the flow will use the cache. Default: True
- `n_api_retries` (int): The number of times to retry the API call in case of failure. Default: 6
- `wait_time_between_api_retries` (int): The time to wait between API retries, in seconds. Default: 20
- `system_name` (str): The name of the system. Default: "system"
- `user_name` (str): The name of the user. Default: "user"
- `assistant_name` (str): The name of the assistant. Default: "assistant"
- `backend` (Dict[str, Any]): The configuration of the backend, which is used to fetch API keys. Default: LiteLLMBackend with the default parameters of ChatAtomicFlow (see the Flow card of ChatAtomicFlowModule), except for the following parameters, whose default values are overwritten:
  - `api_infos` (List[Dict[str, Any]]): The list of API infos. Default: no default value; this parameter is required.
  - `model_name` (Union[Dict[str, str], str]): The name of the model to use.
When using multiple API providers, `model_name` can be a dictionary of the form `{"provider_name": "model_name"}`. Default: "gpt-4-vision-preview" (the name must follow litellm's model naming; see https://docs.litellm.ai/docs/providers).
  - `n` (int): The number of answers to generate. Default: 1
  - `max_tokens` (int): The maximum number of tokens to generate. Default: 2000
  - `temperature` (float): The temperature to use. Default: 0.3
  - `top_p` (float): An alternative to sampling with temperature. The model considers only the tokens comprising the top_p probability mass. Default: 0.2
  - `frequency_penalty` (float): The higher this value, the less likely the model is to repeat itself. Default: 0.0
  - `presence_penalty` (float): The higher this value, the more likely the model is to talk about a new topic. Default: 0.0
- `system_message_prompt_template` (Dict[str, Any]): The template of the system message. It is used to generate the system message. By default it is of type aiflows.prompt_template.JinjaPrompt. None of the parameters of the prompt are defined by default, so they need to be defined if one wants to use the system prompt. Default parameters are defined in aiflows.prompt_template.jinja2_prompts.JinjaPrompt.
- `init_human_message_prompt_template` (Dict[str, Any]): The prompt template of the human/user message used to initialize the conversation (the first time into the flow). It is used to generate the human message and is passed as the user message to the LLM. By default it is of type aiflows.prompt_template.JinjaPrompt. None of the parameters of the prompt are defined by default, so they need to be defined if one wants to use the init_human_message_prompt_template. Default parameters are defined in aiflows.prompt_template.jinja2_prompts.JinjaPrompt.
- `previous_messages` (Dict[str, Any]): Defines which previous messages to include in the input of the LLM.
Note that if `first_k` and `last_k` are both None, all the messages of the flow's history are added to the input of the LLM. Default:
  - `first_k` (int): If defined, adds the `first_k` earliest messages of the flow's chat history to the input of the LLM. Default: None
  - `last_k` (int): If defined, adds the `last_k` latest messages of the flow's chat history to the input of the LLM. Default: None
- Other parameters are inherited from the default configuration of ChatAtomicFlow (see the Flow card of ChatAtomicFlowModule).

*Input Interface Initialized (expected input the first time into the flow)*:

- `query` (str): The textual query to run the model on.
- `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
  - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary that contains the following keys:
    - `type` (str): The type of the image. It can be "local_path" or "url".
    - `image` (str): The image. If `type` is "local_path", it is a local path to the image; if `type` is "url", it is a URL to the image.
  - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary that contains the following keys:
    - `video_path` (str): The path to the video.
    - `resize` (int): The resize to apply to the frames of the video.
    - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
    - `start_frame` (int): The first frame of the video (to send to the model).
    - `end_frame` (int): The last frame of the video (to send to the model).

*Input Interface (expected input after the first time into the flow)*:

- `query` (str): The textual query to run the model on.
- `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
  - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary that contains the following keys:
    - `type` (str): The type of the image.
It can be "local_path" or "url".
    - `image` (str): The image. If `type` is "local_path", it is a local path to the image; if `type` is "url", it is a URL to the image.
  - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary that contains the following keys:
    - `video_path` (str): The path to the video.
    - `resize` (int): The resize to apply to the frames of the video.
    - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
    - `start_frame` (int): The first frame of the video (to send to the model).
    - `end_frame` (int): The last frame of the video (to send to the model).

*Output Interface*:

- `api_output` (str): The API output of the flow for the given query and data.

#### get\_image

```python
@staticmethod
def get_image(image)
```

This method returns an image in the appropriate format for the API.

**Arguments**:

- `image` (`Dict[str, Any]`): The image dictionary.

**Returns**:

`Dict[str, Any]`: The image URL.

#### get\_video

```python
@staticmethod
def get_video(video)
```

This method returns the video in the appropriate format for the API.

**Arguments**:

- `video` (`Dict[str, Any]`): The video dictionary.

**Returns**:

`Dict[str, Any]`: The video URL.

#### get\_user\_message

```python
@staticmethod
def get_user_message(prompt_template, input_data: Dict[str, Any])
```

This method constructs the user message to be passed to the API.

**Arguments**:

- `prompt_template` (`PromptTemplate`): The prompt template to use.
- `input_data` (`Dict[str, Any]`): The input data.

**Returns**:

`Dict[str, Any]`: The constructed user message (images, videos and text).
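As an illustration of the configuration parameters above, an override config for this flow might look like the following sketch. This is an assumption-laden example, not a canonical config: the exact override mechanism follows ChatAtomicFlow's Flow card, and the `api_infos` value is a placeholder to be filled at instantiation (e.g. from environment variables).

```yaml
name: "VisionAtomicFlow"
description: "Answers questions about images and videos."
enable_cache: True
backend:
  api_infos: ???            # required, no default; typically injected at instantiation
  model_name: "gpt-4-vision-preview"
  n: 1
  max_tokens: 2000
  temperature: 0.3
previous_messages:
  last_k: 5                 # only the 5 latest history messages are sent to the LLM
```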
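For concreteness, an input dictionary matching the input interface described above could be assembled as follows. The query text, file paths, URL, and frame values are all hypothetical:

```python
# Example payload for VisionAtomicFlow's input interface:
# a textual query plus one image given by URL, one local image, and a video.
input_data = {
    "query": "What is happening in these images and this video?",
    "data": {
        "images": [
            {"type": "url", "image": "https://example.com/cat.jpg"},
            {"type": "local_path", "image": "/tmp/dog.png"},
        ],
        "video": {
            "video_path": "/tmp/clip.mp4",
            "resize": 512,          # resize applied to each frame
            "frame_step_size": 10,  # send every 10th frame to the model
            "start_frame": 0,
            "end_frame": 100,
        },
    },
}
```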
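To give an idea of the kind of conversion `get_image` performs, here is a minimal sketch, assuming the OpenAI-style vision message format that litellm accepts. This is not the library's actual implementation; the helper name `image_to_message_entry` is hypothetical:

```python
import base64
import mimetypes


def image_to_message_entry(image: dict) -> dict:
    """Hypothetical sketch: convert an image dict from the input interface
    into an OpenAI-style vision message entry."""
    if image["type"] == "url":
        url = image["image"]
    elif image["type"] == "local_path":
        # Local files are inlined as base64 data URLs so the API can read them.
        mime = mimetypes.guess_type(image["image"])[0] or "image/jpeg"
        with open(image["image"], "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        url = f"data:{mime};base64,{encoded}"
    else:
        raise ValueError(f"Unknown image type: {image['type']}")
    return {"type": "image_url", "image_url": {"url": url}}
```

Entries of this shape would then be combined with a `{"type": "text", ...}` entry (the rendered prompt) into a single user message, which is the role `get_user_message` plays.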