Speech Cloning: Text-To-Speech Using VITS

Journal Title: Engineering and Technology Journal - Year 2024, Vol 9, Issue 05

Abstract

Voice is one of the most common and natural means of human communication, and it is becoming the primary interface for AI voice assistants such as Amazon Alexa, as well as for cars and smart home devices. As human-machine communication becomes more common, researchers are exploring technology that mimics genuine speech. Speech cloning is the practice of copying or mimicking another person's speech, usually with the help of artificial intelligence (AI). It produces a synthetic version of someone's voice that sounds very similar to the actual speaker, with the objective of generating speech that is indistinguishable from the real person's in both tone and intonation. Instant Voice Cloning (IVC) in text-to-speech (TTS) synthesis refers to a TTS model's capacity to copy the voice of any reference speaker from a short audio sample, without additional speaker-specific training. This approach is also referred to as zero-shot TTS. IVC lets users flexibly tailor the generated voice, which offers significant value across diverse real-world applications, including media content creation, personalized chatbots, and multi-modal interaction between humans and computers or large language models.

Authors and Affiliations

Utkarsh Verma, Dr. Padmanaban R

  • EP ID EP735380
  • DOI 10.47191/etj/v9i05.10

How To Cite

Utkarsh Verma, Dr. Padmanaban R (2024). Speech Cloning: Text-To-Speech Using VITS. Engineering and Technology Journal, 9(05), -. https://europub.co.uk./articles/-A-735380