
Find the right datasets for you


Tamil Spontaneous Dialogue, insurance

83 hours of Tamil simulated call center conversations between an agent and a client, recorded over telephony in the insurance domain.

Domain: Insurance
Type: Conversational
Locale: ta-IN
Amount: 83 hours

Tamil Spontaneous Dialogue, retail

40 hours of Tamil simulated call center conversations between an agent and a client, recorded over telephony in the retail domain.

Domain: Retail
Type: Conversational
Locale: ta-IN
Amount: 40 hours

Tamil Spontaneous Dialogue, telco

28 hours of Tamil simulated call center conversations between an agent and a client, recorded over telephony in the telco domain.

Domain: Telco
Type: Conversational
Locale: ta-IN
Amount: 28 hours

Tamil Spontaneous Dialogue, banking

38 hours of Tamil simulated call center conversations between an agent and a client, recorded over telephony in the banking domain.

Domain: Banking
Type: Conversational
Locale: ta-IN
Amount: 38 hours

Tamil Podcasts

522 hours of Tamil live, non-simulated podcasts, recorded by real podcasters in our partner network.

Domain: Various
Type: Podcast
Locale: ta-IN
Amount: 522 hours

Tamil Call Center Speech Dataset — 10,420 Hours of Live Telephony Audio for ASR Training

Tamil call center dataset with 10,420 hours of live agent–client conversations recorded over telephony.

Domain: Various
Locale: ta-IN
Amount: 10,420 hours

Tamil Podcast Speech Dataset — 3,315 Hours of Conversational Audio for ASR and TTS Training

Tamil Podcast Speech Dataset including 3,315 hours of live, non-simulated podcasts recorded by real podcasters.

Domain: Various
Locale: ta-IN
Amount: 3,315 hours

Showing 7 of 7 datasets


New datasets

Medical Claims Data for AI Model Training

Healthcare

Longitudinal Data in Oncology for AI Model Development

Healthcare

Wearable Health Data for AI Model Training

Healthcare

Hot datasets

Live Spanish Call Center Audio Dataset

Call Center

DICOM Medical Imaging Dataset with Clinical Reports

Healthcare

Multimodal Dataset for Household Robotics

Robotics
3D and Lidar


© 2026 DefinedCrowd. All rights reserved.
