Voice Cloning

Feature Introduction

Voice cloning models the timbre of a specific speaker or character and converts text or voice input into highly realistic speech in that timbre. Typical applications include virtual assistants and intelligent broadcasting.

Interface Description

Request method: POST (HTTP)

Request address: https://service-mqk0mc83-1257411467.bj.apigw.tencentcs.com

Request header: Content-Type: application/json

Request process: The interface provides three operations: 'Create Task', 'Query Task', and 'Get Model List'. After creating a task, you can either query it actively to obtain the result, or supply a callback address (callback) when creating the task, in which case the platform calls that address automatically once the task finishes.

Create Task

URL Path: /release/job

Parameter Description

Parameter | Required | Type | Description
action | Yes | string | Common parameter; set to CreateJob
secretId | Yes | string | Common parameter; the user's secretId
secretKey | Yes | string | Common parameter; the user's secretKey
createJobRequest | Yes | object | Task creation parameters (fields below)
- inputs | Yes | Array of Input | Array of Input structures
- outputs | Yes | Array of Output | Array of Output structures
- callback | No | string | Callback address; default: not enabled
- customId | No | string | User-defined task ID; fewer than 64 characters
- timeout | No | int | Task timeout in seconds; when it elapses, the task state is set to ERROR

Input

Parameter | Required | Type | Description
sourceData | No | string | Text input; length limit 512. Provide exactly one of sourceData, url, and source
url | No | string | Voice file input. Provide exactly one of sourceData, url, and source
source | No | object | Repository source setting. Provide exactly one of sourceData, url, and source
- contentId | Yes | string | Repository ID
- path | Yes | string | Source path

Output

Parameter | Required | Type | Description
contentId | No | string | Repository ID; default: empty; required for audio generation tasks (type=1)
destination | No | string | Output directory; default: '/' (root directory)
inputSelectors | Yes | Array of int | Indexes of the inputs used as the source for this output
smartContentDescriptor | Yes | SmartContentDescriptor | Intelligent capability description; default: empty
- outputPrefix | No | string | Output file prefix; fewer than 20 characters; default: empty
- voiceCloning | Yes | object | Voice cloning settings
-- type | Yes | VoiceCloningType enum | Task type; see VoiceCloningType below
-- totalEpoch | No | int | Number of training epochs (may be set when type=2); at most 50; default: 30
-- modelName | Yes | string | Model name. When type is 2 (training model), this is the name of the model to be generated: only digits, uppercase and lowercase letters, and underscores are allowed; it must be fewer than 64 characters and unique. When type is 1 (generating audio), this is the model used for voice cloning: either the default model "default" or the name of a trained model. The model list can be stored on the user side or queried through the Get Model List interface at the end of this document

VoiceCloningType

Value | Meaning
1 | Generating audio
2 | Training model
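
The constraints on type, totalEpoch, and modelName can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of the service: the function name `validate_voice_cloning` is hypothetical, and the lower bound of 1 epoch is an assumption (the document only states the upper bound of 50).

```python
import re

# Digits, letters, and underscores only; "less than 64 characters" means at most 63.
MODEL_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,63}")


def validate_voice_cloning(job_type, model_name, total_epoch=None):
    """Return a list of problems found in a voiceCloning descriptor (empty if none)."""
    errors = []
    if job_type not in (1, 2):
        errors.append("type must be 1 (generating audio) or 2 (training model)")
    if not MODEL_NAME_RE.fullmatch(model_name):
        errors.append("modelName must be 1-63 characters of [A-Za-z0-9_]")
    if total_epoch is not None:
        if job_type != 2:
            errors.append("totalEpoch only applies when type=2")
        elif not 1 <= total_epoch <= 50:  # lower bound of 1 is an assumption
            errors.append("totalEpoch must be at most 50")
    return errors
```

Uniqueness of a new model name cannot be verified locally; it would have to be checked against the model list.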

Request example:

  • Generating audio
{
  "action": "CreateJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "createJobRequest": {
    "customId": "{customId}",
    "callback": "{callback}",
    "inputs": [
      {
        "url": "{url}"
      }
    ],
    "outputs": [
      {
        "contentId": "{contentId}",
        "destination": "/output",
        "inputSelectors": [0],
        "smartContentDescriptor": {
          "outputPrefix": "{outputPrefix}",
          "voiceCloning": {
            "type": 1,
            "modelName": "default"
          }
        }
      }
    ]
  }
}
  • Training model
{
  "action": "CreateJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "createJobRequest": {
    "customId": "{customId}",
    "callback": "{callback}",
    "inputs": [
      {
        "url": "{url}"
      }
    ],
    "outputs": [
      {
        "inputSelectors": [0],
        "smartContentDescriptor": {
          "voiceCloning": {
            "type": 2,
            "modelName": "demo"
          }
        }
      }
    ]
  }
}
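
The two request bodies above differ only in a few fields, so they can be assembled by one helper. The following Python sketch uses only the standard library; the names `build_create_job` and `post_json` are hypothetical, and the sketch assumes a single url input per task.

```python
import json
import urllib.request

BASE_URL = "https://service-mqk0mc83-1257411467.bj.apigw.tencentcs.com"


def build_create_job(secret_id, secret_key, input_url, model_name, job_type,
                     content_id=None, destination="/", callback=None,
                     custom_id=None):
    """Assemble a CreateJob body following the parameter tables above."""
    output = {
        "inputSelectors": [0],  # index into the inputs array
        "smartContentDescriptor": {
            "voiceCloning": {"type": job_type, "modelName": model_name}
        },
    }
    if job_type == 1:
        # Audio generation writes a file, so a repository ID is required.
        if not content_id:
            raise ValueError("contentId is required when type=1")
        output["contentId"] = content_id
        output["destination"] = destination
    request = {"inputs": [{"url": input_url}], "outputs": [output]}
    if callback:
        request["callback"] = callback
    if custom_id:
        request["customId"] = custom_id
    return {
        "action": "CreateJob",
        "secretId": secret_id,
        "secretKey": secret_key,
        "createJobRequest": request,
    }


def post_json(path, payload, timeout=30):
    """POST a JSON body to the service and return the decoded response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A training task could then be submitted with `post_json("/release/job", build_create_job(secret_id, secret_key, url, "demo", 2))`.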

Response example:

  • Generating audio
{
  "requestId": "ac004192-110b-46e3-ade8-4e449df84d60",
  "createJobResponse": {
    "job": {
      "id": "13f342e4-6866-450e-b44e-3151431c578b",
      "state": 1,
      "customId": "{customId}",
      "callback": "{callback}",
      "inputs": [
        {
          "url": "{url}"
        }
      ],
      "outputs": [
        {
          "contentId": "{contentId}",
          "destination": "{destination}",
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "outputPrefix": "{outputPrefix}",
            "voiceCloning": {
              "type": 1,
              "modelName": "default"
            }
          }
        }
      ],
      "timing": {
        "createdAt": "1603432763000",
        "startedAt": "0",
        "completedAt": "0"
      }
    }
  }
}
  • Training model
{
  "requestId": "ac004192-110b-46e3-ade8-4e449df84d60",
  "createJobResponse": {
    "job": {
      "id": "13f342e4-6866-450e-b44e-3151431c578b",
      "state": 1,
      "customId": "{customId}",
      "callback": "{callback}",
      "inputs": [
        {
          "url": "{url}"
        }
      ],
      "outputs": [
        {
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "voiceCloning": {
              "type": 2,
              "modelName": "demo"
            }
          }
        }
      ],
      "timing": {
        "createdAt": "1603432763000",
        "startedAt": "0",
        "completedAt": "0"
      }
    }
  }
}

State

Value | Meaning
1 | SUBMITTED
2 | PROCESSING
3 | COMPLETED
4 | ERROR
5 | CANCELED
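
Client code is easier to read when the numeric state values are given names. The following is a minimal Python sketch; `JobState` and `is_finished` are hypothetical names, and treating CANCELED as a finished state is an assumption (the document names only COMPLETED and ERROR as completed states).

```python
from enum import IntEnum


class JobState(IntEnum):
    """Numeric task states as listed in the State table."""
    SUBMITTED = 1
    PROCESSING = 2
    COMPLETED = 3
    ERROR = 4
    CANCELED = 5


# Assumption: a canceled task will not change state again.
FINISHED_STATES = {JobState.COMPLETED, JobState.ERROR, JobState.CANCELED}


def is_finished(state):
    """Return True when the task will make no further progress."""
    return JobState(state) in FINISHED_STATES
```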

Query Task

URL Path: /release/job

Task results can be obtained in two ways: active query and passive callback.

  • Active query offers two interfaces, distinguished by the id used. The first queries by the user-defined id (customId); because the platform cannot guarantee that this id is unique, it returns an array of Jobs (see 1). The second queries by the id returned in the response to task creation (see 2).
  • For passive callbacks, fill in the callback field when creating the task. Once the task reaches a finished state (COMPLETED/ERROR), the platform sends the Job structure to the address given in callback (see 3). The platform recommends passive callbacks for obtaining task results.

In the voice cloning capability, if the queried task succeeded (state=3), each Output of the task carries a smartContentResult structure whose voiceCloning field (a VoiceCloningResult) holds the output file name information. For an audio generation task, the COS path of the result file can be assembled from the COS and destination information in the Output.

VoiceCloningResult

Parameter | Type | Description
modelName | string | Model name; output when the input type is 2 (training model)
voiceName | string | Generated audio file; output when the input type is 1 (generating audio)
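
Reading the result out of a finished Output might look like the Python sketch below. The helper name `result_from_output` is hypothetical. The path composition (destination joined with voiceName) is an assumption; the document does not spell out the exact COS path format, and the bucket or host part is not included here.

```python
import posixpath


def result_from_output(output):
    """Extract the voice cloning result from one Output of a completed task."""
    job_type = output["smartContentDescriptor"]["voiceCloning"]["type"]
    result = output.get("smartContentResult", {}).get("voiceCloning", {})
    if job_type == 2:
        # Training tasks report the name of the trained model.
        return {"modelName": result.get("modelName")}
    voice_name = result.get("voiceName")
    if voice_name is None:
        return {"voiceName": None, "path": None}
    # Assumption: the file sits directly under the Output's destination directory.
    destination = output.get("destination", "/")
    return {
        "voiceName": voice_name,
        "path": posixpath.join("/", destination, voice_name),
    }
```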

1. Active query by the customId supplied by the user when creating the task

Request example:

{
  "action": "ListJobs",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "listJobsRequest": {
    "customId": "{customId}"
  }
}

Response example:

  • Generating audio
{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "listJobsResponse": {
    "jobs": [
      {
        "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
        "state": 3,
        "customId": "{customId}",
        "callback": "{callback}",
        "timing": {
          "createdAt": "1610513575000",
          "startedAt": "1610513575000",
          "completedAt": "1610513618000"
        },
        "inputs": [{ "url": "{url}" }],
        "outputs": [
          {
            "contentId": "{contentId}",
            "destination": "{destination}",
            "inputSelectors": [0],
            "smartContentDescriptor": {
              "outputPrefix": "{outputPrefix}",
              "voiceCloning": {
                "type": 1,
                "modelName": "default"
              }
            },
            "smartContentResult": {
              "voiceCloning": {
                "voiceName": "out.wav"
              }
            }
          }
        ]
      }
    ],
    "total": 1
  }
}
  • Training model
{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "listJobsResponse": {
    "jobs": [
      {
        "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
        "state": 3,
        "customId": "{customId}",
        "callback": "{callback}",
        "timing": {
          "createdAt": "1610513575000",
          "startedAt": "1610513575000",
          "completedAt": "1610513618000"
        },
        "inputs": [{ "url": "{url}" }],
        "outputs": [
          {
            "inputSelectors": [0],
            "smartContentDescriptor": {
              "outputPrefix": "{outputPrefix}",
              "voiceCloning": {
                "type": 2,
                "modelName": "demo"
              }
            },
            "smartContentResult": {
              "voiceCloning": {
                "modelName": "demo"
              }
            }
          }
        ]
      }
    ],
    "total": 1
  }
}
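
Because customId is not guaranteed to be unique, a ListJobs response may contain several jobs. One way to handle this, sketched below in Python, is to pick the most recently created one. The helper name `latest_job` is hypothetical, and the sketch assumes createdAt is a millisecond timestamp encoded as a string, as it appears in the examples.

```python
def latest_job(list_jobs_response):
    """Return the most recently created job from a ListJobs response, or None."""
    jobs = list_jobs_response.get("listJobsResponse", {}).get("jobs", [])
    if not jobs:
        return None
    # createdAt arrives as a string, so convert it before comparing.
    return max(jobs, key=lambda job: int(job["timing"]["createdAt"]))
```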

2. Active query by the id included in the response to task creation

Request example:

{
  "action": "GetJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "getJobRequest": {
    "id": "{id}"
  }
}

Response example:

  • Generating audio
{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "getJobResponse": {
    "job": {
      "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
      "state": 3,
      "customId": "{customId}",
      "callback": "{callback}",
      "timing": {
        "createdAt": "1610513575000",
        "startedAt": "1610513575000",
        "completedAt": "1610513618000"
      },
      "inputs": [{ "url": "{url}" }],
      "outputs": [
        {
          "contentId": "{contentId}",
          "destination": "{destination}",
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "outputPrefix": "{outputPrefix}",
            "voiceCloning": {
              "type": 1,
              "modelName": "default"
            }
          },
          "smartContentResult": {
            "voiceCloning": {
              "voiceName": "out.wav"
            }
          }
        }
      ]
    }
  }
}
  • Training model
{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "getJobResponse": {
    "job": {
      "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
      "state": 3,
      "customId": "{customId}",
      "callback": "{callback}",
      "timing": {
        "createdAt": "1610513575000",
        "startedAt": "1610513575000",
        "completedAt": "1610513618000"
      },
      "inputs": [{ "url": "{url}" }],
      "outputs": [
        {
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "outputPrefix": "{outputPrefix}",
            "voiceCloning": {
              "type": 2,
              "modelName": "demo"
            }
          },
          "smartContentResult": {
            "voiceCloning": {
              "modelName": "demo"
            }
          }
        }
      ]
    }
  }
}
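
When a callback address is not available, GetJob can be polled until the task finishes. The Python sketch below is one possible loop, not a recommended client: `wait_for_job` is a hypothetical name, `send` stands for any function that POSTs a JSON body to a path and returns the decoded response, and the interval and attempt limits are arbitrary. State 5 (CANCELED) is treated as final by assumption.

```python
import time


def wait_for_job(send, secret_id, secret_key, job_id,
                 interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll GetJob until the task leaves SUBMITTED/PROCESSING, then return the Job."""
    payload = {
        "action": "GetJob",
        "secretId": secret_id,
        "secretKey": secret_key,
        "getJobRequest": {"id": job_id},
    }
    for _ in range(max_attempts):
        job = send("/release/job", payload)["getJobResponse"]["job"]
        if job["state"] in (3, 4, 5):  # COMPLETED, ERROR, CANCELED
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.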

3. Passive callback

When a task reaches a finished state (COMPLETED/ERROR), the platform sends its entire Job structure to the address given in the callback field at task creation. The Job structure is the same as in the active query examples.
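
A callback receiver could be sketched as follows using Python's standard http.server module. Several points here are assumptions, since the document does not describe the callback request in detail: that the platform sends an HTTP POST with a JSON body, that the body is the Job object itself (a wrapping "job" key is tolerated just in case), and that a 200 response is an acceptable acknowledgement. The names `summarize_callback`, `CallbackHandler`, and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body):
    """Parse a callback body and report whether the task succeeded."""
    payload = json.loads(body)
    # Assumption: the body is the Job itself; accept {"job": {...}} as well.
    job = payload.get("job", payload)
    return {
        "id": job.get("id"),
        "customId": job.get("customId"),
        "succeeded": job.get("state") == 3,  # 3 = COMPLETED
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        summary = summarize_callback(self.rfile.read(length))
        print("callback received:", summary)
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```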

Get Model List

URL Path: /release/music_model

Parameter Description

Parameter | Required | Type | Description
action | Yes | string | Common parameter; set to ListModels
secretId | Yes | string | Common parameter; the user's secretId
secretKey | Yes | string | Common parameter; the user's secretKey
listModelsRequest | Yes | object | Model list query parameters (fields below)
- offset | Yes | int | Offset
- limit | Yes | int | Maximum number of records fetched per request; up to 100

Request example:

{
  "action": "ListModels",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "listModelsRequest": {
    "offset": 0,
    "limit": 10
  }
}

Response example:

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "listModelsResponse": {
    "total": 1,
    "models": [
      {
        "name": "demo",
        "createdAt": "1610513575000"
      }
    ]
  }
}
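
Since limit is capped at 100, retrieving every model may take several requests. The Python sketch below pages through ListModels. It assumes that offset counts records rather than pages and that total is the number of models across all pages; neither is stated explicitly in this document. `list_all_models` is a hypothetical name, and `send` stands for any function that POSTs a JSON body to a path and returns the decoded response.

```python
def list_all_models(send, secret_id, secret_key, page_size=100):
    """Collect every model by paging through ListModels."""
    limit = min(page_size, 100)  # the interface allows at most 100 per request
    models = []
    offset = 0
    while True:
        payload = {
            "action": "ListModels",
            "secretId": secret_id,
            "secretKey": secret_key,
            "listModelsRequest": {"offset": offset, "limit": limit},
        }
        response = send("/release/music_model", payload)["listModelsResponse"]
        page = response.get("models", [])
        models.extend(page)
        offset += len(page)
        # Stop on an empty page or once everything reported by total is fetched.
        if not page or offset >= int(response.get("total", 0)):
            return models
```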
Tencent Media Lab