Understanding our generative AI models

Who can use this feature

Available on Starter, Team and Enterprise plans

WRITER agents enable you to ideate with ease and generate content in seconds. To get the best results from WRITER agents, it helps to understand how our generative AI models are trained, how they process your input, and how they generate output.

How our generative AI is trained

All of our agents, including our open-ended Ask WRITER agent, are powered by Palmyra, our family of open-source generative LLMs. Palmyra models have demonstrated strong performance across a wide range of natural language processing tasks, including top scores on Stanford HELM and PubMedQA.

“Training” in an AI context is the process of feeding data to a model so that it can identify patterns and follow those patterns in the future. Large language models are trained on tokens, which are small units of text such as words, parts of words, or punctuation marks.
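To make the idea of a token concrete, here is a minimal Python sketch that splits a sentence into word and punctuation tokens. It is an illustration only: the function name toy_tokenize is hypothetical, and production LLMs typically use learned subword tokenizers rather than a simple rule like this one. It should not be read as a description of how Palmyra tokenizes text.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens.

    Assumption: this regex rule is a stand-in for illustration.
    Real LLM tokenizers usually learn subword units from data.
    """
    # \w+ matches a run of letters, digits, or underscores (a "word");
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = toy_tokenize("Models learn patterns, not facts.")
print(tokens)
# ['Models', 'learn', 'patterns', ',', 'not', 'facts', '.']
```

Each item in the resulting list is one unit the model would see during training in this simplified picture.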

Where does the training data come from?

Palmyra LLMs are trained on a mix of publicly available data from the Internet and data licensed from third parties. We do not train our LLMs with customer data. We also make available a 1-billion-token sample of the datasets we use on Hugging Face.

On Enterprise plans, you can also further train our models with your own best-in-class examples, so that output is high-quality, adheres to your rules, and stays consistent with your style and tone.

How our models generate content

Based on their training data, our models identify subtle patterns in how language is constructed. They use this pattern recognition to predict what might come next given a question or input.
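The prediction step described above can be sketched with a toy example. The Python code below counts which word follows which in a tiny sample text, then predicts the most frequent follower. This is a deliberately simplified stand-in (a bigram count model), not how Palmyra or any modern LLM is implemented; the names train_bigrams and predict_next and the sample corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for each token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, split on whitespace for simplicity.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # cat ("cat" follows "the" twice, "mat" once)
print(predict_next(model, "dog"))  # None ("dog" never appears in the corpus)
```

Note that the sketch picks the continuation that was most common in its training text. It has no notion of whether that continuation is true, and it returns nothing useful for words it has never seen.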

There are a few things that you should keep in mind:

What our models know is limited by their training data, so by default, they won't know anything that's proprietary about your company. However, you can pair our LLMs with Knowledge Graph, which integrates with your business data sources, to deliver accurate outputs and insights. Knowledge Graph is currently in beta and you can learn more here.
Our models can't confirm the truthfulness of any of the content they produce. Large language models are trained to recognize linguistic patterns and to use those patterns to produce new content. They aren't trained to know whether the content they generate is true. When WRITER suggests a quote or a statistic, our language models are suggesting a statement that fits the context, not one that has been verified. A human editor should verify any facts or statistics before publication. Our claim detection feature makes it easy to identify which statements need to be checked.
Additional custom training can have a significant impact on the output. For any given use case, provide us with your real-world examples and we can train our models to produce results that are high-quality and consistent with your best work.