Voice Cloning

Feature Introduction

Voice cloning models the timbre of a specific speaker and converts text or voice input into highly realistic speech in that timbre. Typical applications include virtual assistants and intelligent broadcasting.

Interface Description

Request method: POST (HTTP)

Request address: https://service-mqk0mc83-1257411467.bj.apigw.tencentcs.com

Request header: Content-Type: application/json

Request process: The interface consists of three operations: Create Task, Query Task, and Get Model List. After creating a task, you can either query it actively to obtain the result, or supply a callback address (callback) when creating the task; the platform then calls that address automatically once the task finishes.
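Every operation is an HTTP POST with a JSON body sent to the base address plus a URL path, with credentials carried inside the body. A minimal sketch using only the Python standard library; the helper names `build_request` and `call` are hypothetical, while the base URL and Content-Type header come from this section.

```python
import json
import urllib.request

# Base address from the interface description; each operation appends its URL path.
BASE_URL = "https://service-mqk0mc83-1257411467.bj.apigw.tencentcs.com"


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying `body` as JSON (no network I/O)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Use `call("/release/job", body)` for Create Task and Query Task, and `call("/release/music_model", body)` for Get Model List. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so production code would catch that.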

Create Task

URL Path: /release/job

Parameter Description

Parameter	Required	Type	Description
action	Yes	string	Common parameter; set to CreateJob
secretId	Yes	string	Common parameter; the user's secretId
secretKey	Yes	string	Common parameter; the user's secretKey
createJobRequest	Yes	object	Task creation parameters
- inputs	Yes	Array of Input	Input structure array
- outputs	Yes	Array of Output	Output structure array
- callback	No	string	Callback address; default: not enabled
- customId	No	string	User-defined task ID, fewer than 64 characters
- timeout	No	int	Task timeout in seconds; when it expires, the task state is set to ERROR
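The table above maps onto a small body builder. This is a sketch under stated assumptions: `create_job_body` is a hypothetical helper, it checks only the customId length limit listed above, and it leaves the contents of inputs and outputs to the caller.

```python
from typing import List, Optional


def create_job_body(
    secret_id: str,
    secret_key: str,
    inputs: List[dict],
    outputs: List[dict],
    callback: Optional[str] = None,
    custom_id: Optional[str] = None,
    timeout: Optional[int] = None,
) -> dict:
    """Assemble a CreateJob body; optional fields are omitted when not given."""
    if not inputs or not outputs:
        raise ValueError("inputs and outputs are both required")
    if custom_id is not None and len(custom_id) >= 64:
        raise ValueError("customId must be fewer than 64 characters")
    request: dict = {"inputs": inputs, "outputs": outputs}
    # Include only the optional fields the caller actually supplied.
    for key, value in (("callback", callback), ("customId", custom_id), ("timeout", timeout)):
        if value is not None:
            request[key] = value
    return {
        "action": "CreateJob",
        "secretId": secret_id,
        "secretKey": secret_key,
        "createJobRequest": request,
    }
```

Omitting unset optional fields, rather than sending nulls, keeps the body close to the request examples in this section.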

Input

Parameter	Required	Type	Description
sourceData	No	string	Text input, length limit 512 (set one of sourceData, url, or source)
url	No	string	Voice file URL (set one of sourceData, url, or source)
source	No	object	Repository source (set one of sourceData, url, or source)
- contentId	Yes	string	Repository ID
- path	Yes	string	Source path
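A client-side check of an Input entry can catch mistakes before submission. A minimal sketch, assuming that "one of" means exactly one field is set and that the 512 limit counts characters (the unit is not stated); `validate_input` is a hypothetical name.

```python
def validate_input(item: dict) -> None:
    """Raise ValueError if an Input entry breaks the documented rules."""
    # Assumption: exactly one of the three source fields may be set.
    present = [k for k in ("sourceData", "url", "source") if item.get(k) is not None]
    if len(present) != 1:
        raise ValueError(f"set exactly one of sourceData, url, source; got {present}")
    # Assumption: the 512 limit is measured in characters.
    if present[0] == "sourceData" and len(item["sourceData"]) > 512:
        raise ValueError("sourceData exceeds the 512 length limit")
    if present[0] == "source":
        missing = [k for k in ("contentId", "path") if not item["source"].get(k)]
        if missing:
            raise ValueError(f"source is missing {missing}")
```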

Output

Parameter	Required	Type	Description
contentId	No	string	Repository ID; default: empty; required for audio generation tasks (type=1)
destination	No	string	Output directory; default: '/' (root directory)
inputSelectors	Yes	Array of int	The input source for this output
smartContentDescriptor	Yes	SmartContentDescriptor	Intelligent capability description, default: empty
- outputPrefix	No	string	Output file prefix, fewer than 20 characters; default: empty
- voiceCloning	Yes	object	Voice cloning settings
-- type	Yes	VoiceCloningType enum	Task type; see VoiceCloningType below
-- totalEpoch	No	int	Number of training epochs; applies only when type=2; at most 50; default: 30
-- modelName	Yes	string	Model name. When type is 2 (training model), the name of the model to create: digits, uppercase and lowercase letters, and underscores only, fewer than 64 characters, and unique. When type is 1 (generating audio), the model used for voice cloning: either the default model "default" or the name of a trained model. Keep track of model names on your side, or query them through the Get Model List interface at the end of this document.

VoiceCloningType

Value	Meaning
1	Generating audio
2	Training model
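The Output constraints and the VoiceCloningType values can be encoded together. A sketch under assumptions: `validate_output` is hypothetical, the positive lower bound on totalEpoch is our assumption (only the upper bound of 50 is documented), and model-name uniqueness can only be enforced by the service.

```python
import re
from enum import IntEnum


class VoiceCloningType(IntEnum):
    GENERATE_AUDIO = 1
    TRAIN_MODEL = 2


# Digits, letters and underscores, fewer than 64 characters (1 to 63).
MODEL_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,63}")


def validate_output(output: dict) -> None:
    """Raise ValueError if an Output entry breaks the documented constraints."""
    if not output.get("inputSelectors"):
        raise ValueError("inputSelectors is required")
    descriptor = output.get("smartContentDescriptor") or {}
    if len(descriptor.get("outputPrefix", "")) >= 20:
        raise ValueError("outputPrefix must be fewer than 20 characters")
    vc = descriptor.get("voiceCloning")
    if vc is None:
        raise ValueError("smartContentDescriptor.voiceCloning is required")
    task_type = VoiceCloningType(vc.get("type"))  # ValueError for unknown types
    name = vc.get("modelName")
    if not name:
        raise ValueError("modelName is required")
    if task_type is VoiceCloningType.TRAIN_MODEL:
        if not MODEL_NAME_RE.fullmatch(name):
            raise ValueError("modelName: letters, digits, underscores, < 64 chars")
        epochs = vc.get("totalEpoch", 30)
        if not 0 < epochs <= 50:  # lower bound of 1 is an assumption
            raise ValueError("totalEpoch must be between 1 and 50")
    elif not output.get("contentId"):
        raise ValueError("contentId is required when type=1")
```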

Request example:

Generating audio

{
  "action": "CreateJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "createJobRequest": {
    "customId": "{customId}",
    "callback": "{callback}",
    "inputs": [
      {
        "url": "{url}"
      }
    ],
    "outputs": [
      {
        "contentId": "{contentId}",
        "destination": "/output",
        "inputSelectors": [0],
        "smartContentDescriptor": {
          "outputPrefix": "{outputPrefix}",
          "voiceCloning": {
            "type": 1,
            "modelName": "default"
          }
        }
      }
    ]
  }
}

Training model

{
  "action": "CreateJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "createJobRequest": {
    "customId": "{customId}",
    "callback": "{callback}",
    "inputs": [
      {
        "url": "{url}"
      }
    ],
    "outputs": [
      {
        "inputSelectors": [0],
        "smartContentDescriptor": {
          "voiceCloning": {
            "type": 2,
            "modelName": "demo"
          }
        }
      }
    ]
  }
}

Response example:

Generating audio

{
  "requestId": "ac004192-110b-46e3-ade8-4e449df84d60",
  "createJobResponse": {
    "job": {
      "id": "13f342e4-6866-450e-b44e-3151431c578b",
      "state": 1,
      "customId": "{customId}",
      "callback": "{callback}",
      "inputs": [
        {
          "url": "{url}"
        }
      ],
      "outputs": [
        {
          "contentId": "{contentId}",
          "destination": "{destination}",
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "outputPrefix": "{outputPrefix}",
            "voiceCloning": {
              "type": 1,
              "modelName": "default"
            }
          }
        }
      ],
      "timing": {
        "createdAt": "1603432763000",
        "startedAt": "0",
        "completedAt": "0"
      }
    }
  }
}

Training model

{
  "requestId": "ac004192-110b-46e3-ade8-4e449df84d60",
  "createJobResponse": {
    "job": {
      "id": "13f342e4-6866-450e-b44e-3151431c578b",
      "state": 1,
      "customId": "{customId}",
      "callback": "{callback}",
      "inputs": [
        {
          "url": "{url}"
        }
      ],
      "outputs": [
        {
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "voiceCloning": {
              "type": 2,
              "modelName": "demo"
            }
          }
        }
      ],
      "timing": {
        "createdAt": "1603432763000",
        "startedAt": "0",
        "completedAt": "0"
      }
    }
  }
}

State

Value	Meaning
1	SUBMITTED
2	PROCESSING
3	COMPLETED
4	ERROR
5	CANCELED
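The state codes and the Create Task response can be handled with an enum. A minimal sketch: `JobState` and `job_from_create_response` are hypothetical names, and counting CANCELED among the final states is an assumption (this document names only COMPLETED and ERROR as completed states).

```python
from enum import IntEnum
from typing import Tuple


class JobState(IntEnum):
    SUBMITTED = 1
    PROCESSING = 2
    COMPLETED = 3
    ERROR = 4
    CANCELED = 5


# COMPLETED and ERROR are documented as final; CANCELED is assumed final too.
FINAL_STATES = frozenset({JobState.COMPLETED, JobState.ERROR, JobState.CANCELED})


def job_from_create_response(resp: dict) -> Tuple[str, JobState]:
    """Extract the platform job id and its state from a CreateJob response."""
    job = resp["createJobResponse"]["job"]
    return job["id"], JobState(job["state"])
```

Keep the returned id: it is the key for the GetJob query described below.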

Query Task

URL Path: /release/job

Retrieval methods: task results can be obtained by active query or by passive callback.

Active query by id comes in two forms. The first queries by the user-defined customId; because the platform cannot guarantee that this ID is unique, it returns an array of Jobs (see 1). The second queries by the id returned in the response when the task was created (see 2).
For passive callback, fill in the callback field when creating the task. Once the task reaches a completed state (COMPLETED or ERROR), the platform sends the Job structure to the callback address (see 3). The platform recommends passive callback for obtaining task results.
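For active query by id, a client typically polls GetJob until the task leaves SUBMITTED/PROCESSING. A minimal sketch under assumptions: `send` is a hypothetical callable that POSTs a body to /release/job and returns the decoded JSON; the 5-second interval and 120-attempt cap are arbitrary choices, not platform limits; treating CANCELED (5) as final is an assumption.

```python
import time
from typing import Callable

# COMPLETED, ERROR and (by assumption) CANCELED end polling.
FINAL_STATES = {3, 4, 5}


def get_job_body(secret_id: str, secret_key: str, job_id: str) -> dict:
    return {
        "action": "GetJob",
        "secretId": secret_id,
        "secretKey": secret_key,
        "getJobRequest": {"id": job_id},
    }


def wait_for_job(
    send: Callable[[dict], dict],
    body: dict,
    interval: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll GetJob until the job reaches a final state; return the Job."""
    for _ in range(max_attempts):
        job = send(body)["getJobResponse"]["job"]
        if job["state"] in FINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"job {body['getJobRequest']['id']} still running after {max_attempts} polls")
```

Injecting `sleep` keeps the loop testable without real delays; the callback route avoids polling altogether, which is why the platform recommends it.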

For voice cloning, when a queried task has succeeded (state=3), each Output carries a smartContentResult structure whose voiceCloning field (VoiceCloningResult) holds the output name. For audio generation tasks, build the COS path of the result file from the COS and destination information in the Output.

VoiceCloningResult

Parameter	Type	Description
modelName	string	Model name; returned when the input type is 2 (training model)
voiceName	string	Generated audio file; returned when the input type is 1 (generating audio)
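Reading VoiceCloningResult depends on the task type. A sketch with hypothetical helpers: `voice_cloning_result` picks voiceName or modelName, and `audio_path` joins destination and voiceName into a repository-relative path. Whether voiceName already includes outputPrefix, and how the repository maps to a COS bucket, are not documented here, so the join is an assumption.

```python
import posixpath
from typing import Optional


def voice_cloning_result(output: dict) -> Optional[str]:
    """Return voiceName (type=1) or modelName (type=2); None if no result yet."""
    result = (output.get("smartContentResult") or {}).get("voiceCloning")
    if result is None:
        return None
    task_type = output["smartContentDescriptor"]["voiceCloning"]["type"]
    return result.get("voiceName") if task_type == 1 else result.get("modelName")


def audio_path(output: dict) -> str:
    """Hypothetical: join destination and voiceName (assumed layout)."""
    if output["smartContentDescriptor"]["voiceCloning"]["type"] != 1:
        raise ValueError("only audio generation tasks (type=1) produce a file")
    name = voice_cloning_result(output)
    if name is None:
        raise ValueError("task has no voiceCloning result yet")
    return posixpath.join(output.get("destination", "/"), name)
```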

1. Active query, based on the customId entered by the user when creating a new task

Request example:

{
  "action": "ListJobs",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "listJobsRequest": {
    "customId": "{customId}"
  }
}

Response example:

Generating audio

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "listJobsResponse": {
    "jobs": [
      {
        "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
        "state": 3,
        "customId": "{customId}",
        "callback": "{callback}",
        "timing": {
          "createdAt": "1610513575000",
          "startedAt": "1610513575000",
          "completedAt": "1610513618000"
        },
        "inputs": [{ "url": "{url}" }],
        "outputs": [
          {
            "contentId": "{contentId}",
            "destination": "{destination}",
            "inputSelectors": [0],
            "smartContentDescriptor": {
              "outputPrefix": "{outputPrefix}",
              "voiceCloning": {
                "type": 1,
                "modelName": "default"
              }
            },
            "smartContentResult": {
              "voiceCloning": {
                "voiceName": "out.wav"
              }
            }
          }
        ]
      }
    ],
    "total": 1
  }
}

Training model

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "listJobsResponse": {
    "jobs": [
      {
        "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
        "state": 3,
        "customId": "{customId}",
        "callback": "{callback}",
        "timing": {
          "createdAt": "1610513575000",
          "startedAt": "1610513575000",
          "completedAt": "1610513618000"
        },
        "inputs": [{ "url": "{url}" }],
        "outputs": [
          {
            "inputSelectors": [0],
            "smartContentDescriptor": {
              "outputPrefix": "{outputPrefix}",
              "voiceCloning": {
                "type": 2,
                "modelName": "demo"
              }
            },
            "smartContentResult": {
              "voiceCloning": {
                "modelName": "demo"
              }
            }
          }
        ]
      }
    ],
    "total": 1
  }
}
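Because customId is not guaranteed to be unique, ListJobs can return several jobs. One reasonable policy, shown here as a sketch, is to take the most recently created job; the helper names are hypothetical and "most recent wins" is our choice, not a platform rule.

```python
from typing import Optional


def list_jobs_body(secret_id: str, secret_key: str, custom_id: str) -> dict:
    return {
        "action": "ListJobs",
        "secretId": secret_id,
        "secretKey": secret_key,
        "listJobsRequest": {"customId": custom_id},
    }


def latest_job(resp: dict) -> Optional[dict]:
    """Pick the most recently created Job from a ListJobs response, if any."""
    jobs = resp.get("listJobsResponse", {}).get("jobs", [])
    if not jobs:
        return None
    # createdAt is a numeric string (milliseconds in the examples); compare as int.
    return max(jobs, key=lambda job: int(job["timing"]["createdAt"]))
```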

2. Active query, based on the id included in the return package when creating a new task

Request example:

{
  "action": "GetJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "getJobRequest": {
    "id": "{id}"
  }
}

Response example:

Generating audio

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "getJobResponse": {
    "job": {
      "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
      "state": 3,
      "customId": "{customId}",
      "callback": "{callback}",
      "timing": {
        "createdAt": "1610513575000",
        "startedAt": "1610513575000",
        "completedAt": "1610513618000"
      },
      "inputs": [{ "url": "{url}" }],
      "outputs": [
        {
          "contentId": "{contentId}",
          "destination": "{destination}",
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "outputPrefix": "{outputPrefix}",
            "voiceCloning": {
              "type": 1,
              "modelName": "default"
            }
          },
          "smartContentResult": {
            "voiceCloning": {
              "voiceName": "out.wav"
            }
          }
        }
      ]
    }
  }
}

Training model

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "getJobResponse": {
    "job": {
      "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
      "state": 3,
      "customId": "{customId}",
      "callback": "{callback}",
      "timing": {
        "createdAt": "1610513575000",
        "startedAt": "1610513575000",
        "completedAt": "1610513618000"
      },
      "inputs": [{ "url": "{url}" }],
      "outputs": [
        {
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "outputPrefix": "{outputPrefix}",
            "voiceCloning": {
              "type": 2,
              "modelName": "demo"
            }
          },
          "smartContentResult": {
            "voiceCloning": {
              "modelName": "demo"
            }
          }
        }
      ]
    }
  }
}

3. Passive callback

When a task reaches a completed state (COMPLETED or ERROR), the platform sends its entire Job structure to the callback address specified when the task was created. The Job structure is the same as in the active query examples.
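A callback receiver only needs to accept the Job JSON and acknowledge it. A minimal sketch with the standard library, assuming the callback arrives as an HTTP POST whose body is the bare Job object and that a 200 reply is sufficient; neither the method nor the expected reply is specified in this document, and `parse_callback` and `CallbackHandler` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler


def parse_callback(raw: bytes) -> dict:
    """Summarize a callback payload, assumed to be the bare Job object as JSON."""
    job = json.loads(raw)
    results = [
        (output.get("smartContentResult") or {}).get("voiceCloning")
        for output in job.get("outputs", [])
    ]
    return {"id": job.get("id"), "customId": job.get("customId"),
            "state": job.get("state"), "results": results}


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts callback POSTs and acknowledges them with HTTP 200."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        summary = parse_callback(self.rfile.read(length))
        # Replace with real handling, e.g. persisting results or notifying a queue.
        print("job", summary["id"], "state", summary["state"], summary["results"])
        self.send_response(200)
        self.end_headers()
```

To run it, start `HTTPServer(("0.0.0.0", 8080), CallbackHandler).serve_forever()` (from `http.server`) on a publicly reachable host, and pass that host's URL as callback when creating the task.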

Get Model List

URL Path: /release/music_model

Parameter Description

Parameter	Required	Type	Description
action	Yes	string	Common parameter; set to ListModels
secretId	Yes	string	Common parameter; the user's secretId
secretKey	Yes	string	Common parameter; the user's secretKey
listModelsRequest	Yes	object	Model list query parameters
- offset	Yes	int	Pagination offset
- limit	Yes	int	Maximum number of models returned per request, up to 100

Request example:

{
  "action": "ListModels",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "listModelsRequest": {
    "offset": 0,
    "limit": 10
  }
}

Response example:

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "listModelsResponse": {
    "total": 1,
    "models": [
      {
        "name": "demo",
        "createdAt": "1610513575000"
      }
    ]
  }
}
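With limit capped at 100, listing every model requires paging. A sketch under assumptions: `send` is a hypothetical callable that POSTs a body to /release/music_model and returns the decoded JSON, and offset is assumed to count models (not pages).

```python
from typing import Callable, Iterator


def iter_models(
    send: Callable[[dict], dict],
    secret_id: str,
    secret_key: str,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every model, fetching pages of at most `page_size` (max 100)."""
    if not 1 <= page_size <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        body = {
            "action": "ListModels",
            "secretId": secret_id,
            "secretKey": secret_key,
            "listModelsRequest": {"offset": offset, "limit": page_size},
        }
        page = send(body)["listModelsResponse"]
        models = page.get("models", [])
        yield from models
        offset += len(models)
        # Stop on an empty page or once every reported model has been seen.
        if not models or offset >= page.get("total", 0):
            return
```

Stopping on an empty page as well as on total guards against an endless loop if total changes between requests.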