Agent guide: Image analyzer

Who can use this feature

Available on Starter, Team and Enterprise plans

Palmyra-Vision, our model for visual and language understanding, answers questions and generates text based on images. With the image analyzer agent, you can ask questions about your image files and generate content based on them.

How to use the image analyzer agent

From the Dashboard, select All agents to open up the Agent Library. Search for "image analyzer" to highlight it.

Select Preview to understand what the image analyzer agent needs for inputs, and what kind of outputs it can produce. Select Open to open the agent.

Upload your image

The image analyzer agent accepts jpg, jpeg, and png file formats.

You can upload different types of visuals, including images, handwritten text, charts, graphs, infographics, flowcharts, and handwritten flowcharts.

Once your image has been uploaded and processed, you'll see a preview in the input field. Select the eye icon to preview the image you've uploaded, or the x button to delete the image.

Submit your question or request

Enter your question or request in the input field and select Generate text.

After a few moments, WRITER will produce an output. If you want to try again, you can select Generate text to create another draft using the same inputs.

The new drafts will be available in the left sidebar. You can generate as many versions as you'd like.

What kinds of questions or requests can I submit?

Describing the image

Prompt: Extract the text in this image.

WRITER can extract copy from advertisements, candid photographs, and more. You can use this prompt to transcribe a photograph of handwritten content (e.g. a doctor's note or an old letter written in messy handwriting).

Prompt: Describe this image.

WRITER can help generate easy captions and alt text to make your content more accessible.

Analyzing the image

Prompt: Describe this chart.

WRITER can interpret charts, graphics, infographics and more, and analyze the contents of what it finds.

Prompt: In this chart, how many companies spent the same amount in 2019 and 2020?

WRITER can retrieve answers from the data in a chart. Note that WRITER cannot perform mathematical calculations on its own, but it can identify the relevant statistic or figure from an image when asked.

Prompt: What are some takeaways and learnings from this infographic?

WRITER can understand the content within an image, then use that to generate summaries, analysis, and more.

Prompt: Describe this flowchart.

Getting lost in a complex flowchart? Let WRITER connect the arrows for you. Whether you're looking at a complex diagram from a textbook or a hastily photographed whiteboard session from your last meeting, WRITER can transcribe the flowchart into an easy-to-read summary.

Asking questions about an image

Prompt:

Is there a person in this image?
Is there [object] in this image?
Is there [color] in this image?

WRITER can tell you when something is in an image, or when something is not in an image, and do so with considerable nuance. For example, if you upload a photograph of a bear and ask "Is there a real bear in this image?" WRITER will say yes. If you upload a photograph of a teddy bear and ask "Is there a real bear in this image?" WRITER will say no. If you upload a picture of children playing on a playground and ask, "Are there children in this image?" WRITER will say yes. If you ask "Are there babies in this image?" WRITER will say no. This is especially useful if your images must abide by specific compliance standards.

Generating content about an image

Prompt: Generate a LinkedIn post about this image.

Your graphic design team just produced a beautiful new infographic, but you've got writer's block when it comes time to share it on social media. WRITER can understand what the infographic is sharing, then generate a social media post for you – hashtags and emojis included.

Prompt: Generate a product description for this product.

Save time generating product descriptions by using the text on the product itself - labeling, packaging, and more - as a foundation for its description.

Prompt: Generate a script for an ad using the facts from this graphic.

Get creative! Once WRITER understands what your image is, it can generate just about any kind of related content, from ad scripts to presentation notes to email subject lines and more.

Use cases

We've described the kinds of questions you can ask in the section above, but how does that translate into your day-to-day work?

Upload images for social media and draft captions and alt text
Generate product descriptions based on product images
Review images against brand and compliance standards (retail, healthcare industry)
Troubleshoot in the field from photos
Extract handwritten notes from police reports for auto claims
Turn team whiteboarding sessions into detailed notes
Draft key takeaways from graphs and charts for a report
Describe how something works based on infographics, instructional diagrams, and flowcharts

Frequently asked questions

What technology are you using to support the image analyzer agent?

Palmyra-Vision, our model for visual and language understanding, answers questions and generates text based on images. It achieves top scores on Visual Question Answering v2, a key benchmark for visual understanding, outperforming GPT-4v and Gemini 1.0 Ultra. Palmyra, our family of LLMs, offers state-of-the-art multimodal capabilities.

Can the image analyzer agent handle images with text in a foreign language?

Palmyra-Vision may be able to extract text in foreign languages and translate it into English, but we recommend verifying your outputs.

Can the image analyzer agent handle video?

No, our model focuses on still images.

Do we support PDFs?

Not within the image analyzer agent. However, our other generative AI models already support extracting text from images in PDFs via OCR. You can try this by uploading sources to Ask WRITER.

Which tool works best for interpreting tables and graphs?

The Palmyra-X model, which supports our Ask WRITER tool, already handles tables and graphs by extracting text from them, so it has access to table titles, legends, axis titles, and any other information that's provided verbatim in text. However, it cannot understand the visual context in the table or graph.

The Palmyra-Vision model, which supports the image analyzer agent, can interpret the visual components as well, so it should do a better job with charts and graphs where the visual components carry context that the text alone doesn't.

For example, suppose you have an unlabeled pie chart and want to know "Which English dialect has the largest speaking population?" Palmyra-X (Ask WRITER) would not be able to interpret the chart because there are no percentages or written answers. The only way to interpret the pie chart is to compare the size of each slice with the others, so Palmyra-Vision (the image analyzer agent) would be the better tool.

How high-quality do the images need to be?

From our testing, even images that are fuzzy, small, or hardly legible to the human eye can still be processed by the image analyzer agent.

Can the image analyzer agent be used to assess medical images?

No, this model is for general-purpose use and shouldn't be relied upon for medical assessment or interpretation.

Does the image analyzer agent store images that are uploaded?

No. When you upload an image, the image is only stored locally. We send the data to the model, but do not store or retain the image.

Can you upload more than one image at a time?

At this time, the image analyzer agent only supports one image at a time.

What is the size limit for a file uploaded to the image analyzer agent?

Currently, uploads to the image analyzer agent must be no larger than 20 MB.
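
If you prepare images in bulk, you can check them against the limits in this guide (jpg, jpeg, or png, and no larger than 20 MB) before uploading. The following is a minimal local sketch in Python, not part of WRITER. The function and constant names are hypothetical, and it assumes that 1 MB means 1,048,576 bytes, which this guide does not specify.

```python
from pathlib import Path
from typing import List

# Limits taken from this guide. The names here are illustrative only.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def check_image(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    file = Path(path)

    # Compare the extension case-insensitively, so "photo.PNG" is accepted.
    if file.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {file.suffix or '(none)'}")

    # Only check the size when the file actually exists on disk.
    if not file.is_file():
        problems.append("file not found")
    elif file.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 20 MB")

    return problems
```

Calling check_image("whiteboard.png") returns an empty list when the file exists, has a supported extension, and is within the size limit. The check looks only at the file extension and size, so it won't catch a file that has been renamed from another format.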

Does the image analyzer agent take into account file names or metadata?

No.