GPT-4V(ision) system card

GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4, and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.
