GPT-4V(ision) system card

GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4, and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.
