Generative AI Annotation Services
Reinforcement Learning from Human Feedback
With the advent of Large Language Models in high-profile products, as well as other generative AI products, new needs arise for the builders of these products. Besides training data for specific use cases, it's essential to properly and continuously assess the output of these models to improve their quality, in both objective and subjective ways. Typically known as Reinforcement Learning from Human Feedback (RLHF), this essential part of generative AI development is deeply entrenched in Defined.ai's DNA, as we have specialized in human data annotation since our inception in 2015.
After all, you want to provide a product that not only gives the correct answer, but does so in a way that's agreeable to the end user. With the right people in our crowd, both in terms of area of specialization and language knowledge, Defined.ai is perfectly placed to provide any kind of annotation at scale and with unparalleled quality, whether you want to eliminate toxic or offensive content, weed out LLM hallucinations and wrongly learned answers (truthfulness), or assess how appropriate the answer is for your use case, such as customer service.
Evaluating Correctness — Hallucinations and Truthfulness
Defined.ai's specialized contributors can annotate any question-answer pair for the correctness of the answer, flagging hallucinations and factually wrong responses (truthfulness).
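As an illustration of what such an annotation can look like, here is a minimal sketch of a correctness-labeled question-answer record; the label set, field names, and example values are assumptions made for illustration, not Defined.ai's production schema.

```python
from dataclasses import dataclass
from enum import Enum


class CorrectnessLabel(Enum):
    """Hypothetical label set for answer correctness."""
    CORRECT = "correct"                      # factually accurate, fully answers the question
    PARTIALLY_CORRECT = "partially_correct"  # accurate but incomplete
    HALLUCINATION = "hallucination"          # fabricated facts or unsupported claims
    WRONG = "wrong"                          # confidently stated but factually incorrect


@dataclass
class CorrectnessAnnotation:
    """One annotated question-answer pair (illustrative structure only)."""
    question: str
    model_answer: str
    label: CorrectnessLabel
    annotator_id: str
    rationale: str = ""  # optional free-text justification from the contributor


example = CorrectnessAnnotation(
    question="When was the Eiffel Tower completed?",
    model_answer="The Eiffel Tower was completed in 1899.",
    label=CorrectnessLabel.WRONG,  # actual completion year: 1889
    annotator_id="contributor-042",
    rationale="Off by ten years; the tower was finished in 1889.",
)
```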
Evaluating Appropriateness — Toxicity
We can assess toxicity on a binary or Likert scale, but we can also make use of our highly adaptable platform to capture toxicity in whatever way makes the most sense for your workflow and pipeline.
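As a rough sketch of how a binary or Likert toxicity judgment might flow into a downstream pipeline, the snippet below shows a hypothetical record and a simple flagging rule; the scale wording and threshold are illustrative assumptions, not Defined.ai's rubric.

```python
from dataclasses import dataclass
from typing import Literal

# A 5-point Likert scale for toxicity; the exact points and wording are an
# assumption for illustration only.
LIKERT_TOXICITY = {
    1: "not toxic",
    2: "slightly toxic",
    3: "moderately toxic",
    4: "very toxic",
    5: "extremely toxic",
}


@dataclass
class ToxicityJudgment:
    """One toxicity judgment on a model response (illustrative only)."""
    response_id: str
    scheme: Literal["binary", "likert"]
    score: int  # 0/1 for binary, 1-5 for likert


def is_flagged(judgment: ToxicityJudgment, likert_threshold: int = 3) -> bool:
    """Map either scheme onto a single keep/flag decision for a filtering pipeline."""
    if judgment.scheme == "binary":
        return judgment.score == 1
    return judgment.score >= likert_threshold
```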
Evaluating Appropriateness — Answer Comparison
Defined.ai's large crowd will compare answers to the same question from different iterations of your model.
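A minimal sketch of how such side-by-side judgments could be represented and summarized is shown below; the record structure and the win-rate calculation are illustrative assumptions, not a description of Defined.ai's internal tooling.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Literal


@dataclass
class AnswerComparison:
    """One side-by-side judgment between two model iterations (illustrative)."""
    question: str
    answer_a: str  # e.g. produced by the previous model checkpoint
    answer_b: str  # e.g. produced by the new checkpoint
    preference: Literal["a", "b", "tie"]


def win_rate(comparisons: list[AnswerComparison]) -> float:
    """Fraction of non-tie judgments in which the new checkpoint (B) wins."""
    counts = Counter(c.preference for c in comparisons)
    decided = counts["a"] + counts["b"]
    return counts["b"] / decided if decided else 0.0
```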
Evaluating Appropriateness — Likert Scale
If you really want to make your product stand out, you will also have to deliver the answer in the right way. If the tone of voice or the verbosity of the answer doesn't suit the use case, your users are much more likely to drop off.
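As a rough illustration, hypothetical Likert ratings of answer appropriateness could be aggregated per response as follows; the 1-5 scale, response identifiers, and scores are invented for the example.

```python
from statistics import mean, median

# Hypothetical 1-5 Likert ratings of how well a customer-service answer's tone
# and verbosity fit the use case, collected from several contributors.
ratings = {
    "response-001": [4, 5, 4],
    "response-002": [2, 3, 2, 2],  # likely too verbose or off-tone for the use case
}

for response_id, scores in ratings.items():
    print(f"{response_id}: mean={mean(scores):.2f}, median={median(scores)}")
```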