General Knowledge Prompt and Response Data for LLMs
About this Dataset
Defined.ai has added one of the most valuable data assets for natural language understanding and LLM training! This dataset contains spontaneous, user-initiated prompts from one million unique users interacting with a generic digital assistant. The data is cleansed of personally identifiable information (PII), and each prompt has intent and entity annotations. Queries cover hundreds of intents and subintents, such as asking about the weather, searching for businesses, playing music, asking knowledge questions, and more.
This dataset is covered by our standard Data License Agreement. The license is perpetual and allows for the commercialization of all models built on the data.