Music Tag

Feature Introduction

The music tag capability analyzes a piece of music along multiple dimensions and automatically generates a classification tag and a probability value for each dimension. Seven dimensions are currently supported:

Vocal
Mood
Language
Genre
Scene
Background Music
Intensity

API Description

Request Method:

POST (HTTP)

Request URL:

https://api.mediax.tencent.com/job

Request Header:

Content-Type: application/json

Request Flow:

The API consists of two operations, 'Create Task' and 'Query Task'. After creating a task, users can either query it actively to obtain the result, or provide a callback address when creating the task; the platform then sends the result to that address once the task completes.

Other Requirements:

File format: common audio formats such as mp3 and wav are recommended.

Create Task

Parameter Description

Parameter	Required	Type	Description
action	Yes	string	Common parameter, here is CreateJob
secretId	Yes	string	Common parameter, user SecretId
secretKey	Yes	string	Common parameter, user SecretKey
createJobRequest	Yes	object
- inputs	Yes	Array of Input	Input, input structure array
- outputs	Yes	Array of Output	Output, output structure array
- callback	No	string	Callback address, default: callback disabled
- customId	No	string	User-defined task ID, less than 64 characters
- timeout	No	int	Task timeout, in seconds. Task will be set to ERROR after timeout

Input

Parameter	Required	Type	Description
url	No	string	Source URL address; mutually exclusive with the source field
source	No	object	Repository source settings; mutually exclusive with the url field
- contentId	Yes	string	Repository ID
- path	Yes	string	Source path

Note: Only one of the source and url fields can be present
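
The one-of rule for url and source can be checked on the client before a task is submitted. The following is a minimal Python sketch; the function name validate_input is hypothetical and not part of the API, and treating an Input with neither field as invalid is an assumption based on the note above.

```python
def validate_input(item: dict) -> dict:
    """Check that an Input object sets exactly one of `url` and `source`."""
    has_url = bool(item.get("url"))
    has_source = item.get("source") is not None
    if has_url == has_source:
        # Either both fields are present or neither is.
        raise ValueError("Input must set exactly one of 'url' and 'source'")
    if has_source:
        # contentId and path are both marked as required in the Input table.
        missing = [key for key in ("contentId", "path") if not item["source"].get(key)]
        if missing:
            raise ValueError(f"source is missing required field(s): {missing}")
    return item
```

Calling validate_input on each entry of inputs before sending CreateJob catches malformed requests locally.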

Output

Parameter	Required	Type	Description
inputSelectors	Yes	Array of int	Input source for this output
smartContentDescriptor	Yes	object	Smart capability description, default: empty
- musicTagV2	Yes	object
-- tagType	Yes	Array of string	Tag dimensions to return. Input range: ["vocal", "language", "mood", "genre", "scene", "bgm", "intensity", "all"]

Input description for tagType:

- all means calling all types of tags
- Dimension names can be combined to call the tags of each sub-dimension
- The all option and the sub-options cannot appear simultaneously

Tags returned for each value:

- vocal, 2 types of vocal tags
- language, 21 types of language tags
- mood, 11 types of mood tags
- genre, 37 types of genre tags
- scene, 36 types of scene tags
- bgm, 2 types of background music tags
- intensity, 2 types of intensity tags
- all, get all 7 dimensions of tags
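
The tagType rules above can be enforced before a request is sent. This is a minimal Python sketch; validate_tag_types is a hypothetical helper, and rejecting an empty list is an assumption (the table only says the field is required).

```python
from __future__ import annotations

# Dimension names taken from the tagType input range.
DIMENSIONS = ("vocal", "language", "mood", "genre", "scene", "bgm", "intensity")


def validate_tag_types(tag_types: list[str]) -> list[str]:
    """Return a copy of tag_types if it follows the documented rules."""
    if not tag_types:
        # Assumption: an empty list is not a meaningful request.
        raise ValueError("tagType must contain at least one value")
    unknown = [t for t in tag_types if t != "all" and t not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown tagType value(s): {unknown}")
    if "all" in tag_types and len(tag_types) > 1:
        # "all" and individual dimensions cannot appear together.
        raise ValueError('"all" cannot be combined with other values')
    return list(tag_types)
```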

Request Example:

{
  "action": "CreateJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "createJobRequest": {
    "customId": "{customId}",
    "callback": "{callback}",
    "inputs": [
      {
        "url": "{url}"
      }
    ],
    "outputs": [
      {
        "inputSelectors": [0],
        "smartContentDescriptor": {
          "musicTagV2": {
            "tagType": ["mood", "vocal", "language", "scene", "genre", "bgm"]
          }
        }
      }
    ]
  }
}
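
The request body above can also be assembled programmatically. The sketch below builds the same structure in Python; build_create_job and its argument names are hypothetical, and it only covers the url form of Input with a single output.

```python
def build_create_job(secret_id, secret_key, url, tag_types,
                     custom_id=None, callback=None, timeout=None):
    """Build a CreateJob request body for one URL input and one musicTagV2 output."""
    if custom_id is not None and len(custom_id) >= 64:
        # The parameter table limits customId to less than 64 characters.
        raise ValueError("customId must be less than 64 characters")
    request = {
        "inputs": [{"url": url}],
        "outputs": [{
            "inputSelectors": [0],
            "smartContentDescriptor": {
                "musicTagV2": {"tagType": list(tag_types)},
            },
        }],
    }
    # Optional fields are only included when the caller sets them.
    for key, value in (("customId", custom_id), ("callback", callback), ("timeout", timeout)):
        if value is not None:
            request[key] = value
    return {
        "action": "CreateJob",
        "secretId": secret_id,
        "secretKey": secret_key,
        "createJobRequest": request,
    }
```

Serializing the returned dictionary with json.dumps yields a body in the same shape as the request example.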

Response Example (the values of state are listed in the State table below):

{
  "requestId": "ac004192-110b-46e3-ade8-4e449df84d60",
  "createJobResponse": {
    "job": {
      "id": "13f342e4-6866-450e-b44e-3151431c578b",
      "state": 1,
      "customId": "{customId}",
      "callback": "{callback}",
      "inputs": [
        {
          "url": "{url}"
        }
      ],
      "outputs": [
        {
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "musicTagV2": {
              "tagType": ["mood", "vocal", "language", "scene", "genre", "bgm"]
            }
          }
        }
      ],
      "timing": {
        "createdAt": "1603432763000",
        "startedAt": "0",
        "completedAt": "0"
      }
    }
  }
}
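
The timing values in the response are strings. The Python sketch below converts them to datetimes, assuming they are milliseconds since the Unix epoch and that "0" means the stage has not been reached yet; both points are inferred from the example values, not stated by the API. The name parse_timing is hypothetical.

```python
from datetime import datetime, timezone


def parse_timing(timing: dict) -> dict:
    """Convert the timing strings of a Job to UTC datetimes ("0" becomes None)."""
    parsed = {}
    for key in ("createdAt", "startedAt", "completedAt"):
        millis = int(timing.get(key, "0"))
        # Assumption: values are epoch milliseconds; 0 marks "not yet reached".
        parsed[key] = (
            datetime.fromtimestamp(millis / 1000, tz=timezone.utc) if millis else None
        )
    return parsed
```

For a finished job, subtracting startedAt from completedAt gives the processing time.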

State

Value	Meaning
1	SUBMITTED
2	PROCESSING
3	COMPLETED
4	ERROR
5	CANCELED
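
Client code can mirror the state table with an enumeration. This is a minimal Python sketch; JobState and is_terminal are hypothetical names, and treating CANCELED as a final state alongside COMPLETED and ERROR is an assumption.

```python
from enum import IntEnum


class JobState(IntEnum):
    """Numeric job states as listed in the State table."""
    SUBMITTED = 1
    PROCESSING = 2
    COMPLETED = 3
    ERROR = 4
    CANCELED = 5


# Assumption: a job in any of these states will not change again.
TERMINAL_STATES = frozenset({JobState.COMPLETED, JobState.ERROR, JobState.CANCELED})


def is_terminal(state: int) -> bool:
    """Return True if the numeric state is a final one."""
    return JobState(state) in TERMINAL_STATES
```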

Get Task Information

There are two ways to get information: active query and passive callback.

Active query offers two interfaces, depending on which id is used:

- Query by the user-defined customId. Because the platform cannot guarantee that this id is unique, the response contains an array of Jobs (see "Active Query, Based on User-Defined customId Passed When Creating Task").
- Query by the id returned in the response when the task was created (see "Active Query, Based on id in Response After Creating Task").

Passive callback requires the callback field to be set when creating the task. After the task enters a completion state (COMPLETED/ERROR), the platform sends the Job structure to the address given in callback (see "Passive Callback"). The platform recommends passive callback for getting task results.

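In the music tag capability, if the queried task succeeded (state=3), the task's Output carries a smartContentResult structure. As the response examples below show, the results are keyed by dimension name (vocal, language, mood, and so on), and each dimension object holds the tag name (label) and its probability (prob). In the examples, a dimension that was not requested (intensity) appears as an empty object.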

Active Query, Based on User-Defined customId Passed When Creating Task

Request Example:

{
  "action": "ListJobs",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "listJobsRequest": {
    "customId": "{customId}"
  }
}

Note: the jobs field in listJobsResponse is an array.

Response Example:

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "listJobsResponse": {
    "jobs": [
      {
        "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
        "state": 3,
        "customId": "{customId}",
        "callback": "{callback}",
        "timing": {
          "createdAt": "1610513575000",
          "startedAt": "1610513575000",
          "completedAt": "1610513618000"
        },
        "inputs": [
          {
            "url": "{url}"
          }
        ],
        "outputs": [
          {
            "inputSelectors": [0],
            "smartContentDescriptor": {
              "musicTagV2": {
                "tagType": ["mood", "vocal", "language", "scene", "genre", "bgm"]
              }
            },
            "smartContentResult": {
              "musigTagV2": {
                "musicTagV2": {
                  "vocal": {
                    "label": "Vocal",
                    "prob": "1.0000"
                  },
                  "language": {
                    "label": "Other",
                    "prob": "0.0817"
                  },
                  "mood": {
                    "label": "Exciting",
                    "prob": "0.2953"
                  },
                  "genre": {
                    "label": "Zhuang",
                    "prob": "0.7710"
                  },
                  "scene": {
                    "label": "Prayer",
                    "prob": "0.8886"
                  },
                  "intensity": {}
                }
              }
            }
          }
        ]
      }
    ],
    "total": 1
  }
}
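
Once a job is in state 3, the tags can be pulled out of its outputs. The Python sketch below follows the nesting shown in the response example, including the outer key spelled musigTagV2 there; extract_tags is a hypothetical helper, and skipping empty dimension objects is a design choice of this sketch.

```python
def extract_tags(job: dict) -> dict:
    """Map each dimension with a result to a (label, probability) tuple."""
    tags = {}
    for output in job.get("outputs", []):
        result = output.get("smartContentResult", {})
        # Key names are copied from the response example, spelling included.
        dimensions = result.get("musigTagV2", {}).get("musicTagV2", {})
        for dimension, entry in dimensions.items():
            if entry:  # skip dimensions returned as {}
                # prob is delivered as a string in the example.
                tags[dimension] = (entry["label"], float(entry["prob"]))
    return tags
```

For a ListJobs response, the helper would be applied to each element of the jobs array.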

Active Query, Based on id in Response After Creating Task

Request Example:

{
  "action": "GetJob",
  "secretId": "{secretId}",
  "secretKey": "{secretKey}",
  "getJobRequest": {
    "id": "{id}"
  }
}

Response Example:

{
  "requestId": "c9845a99-34e3-4b0f-80f5-f0a2a0ee8896",
  "getJobResponse": {
    "job": {
      "id": "a95e9d74-6602-4405-a3fc-6408a76bcc98",
      "state": 3,
      "customId": "{customId}",
      "callback": "{callback}",
      "timing": {
        "createdAt": "1610513575000",
        "startedAt": "1610513575000",
        "completedAt": "1610513618000"
      },
      "inputs": [
        {
          "url": "{url}"
        }
      ],
      "outputs": [
        {
          "inputSelectors": [0],
          "smartContentDescriptor": {
            "musicTagV2": {
              "tagType": ["mood", "vocal", "language", "scene", "genre", "bgm"]
            }
          },
          "smartContentResult": {
            "musigTagV2": {
              "musicTagV2": {
                "vocal": {
                  "label": "Vocal",
                  "prob": "1.0000"
                },
                "language": {
                  "label": "Other",
                  "prob": "0.0817"
                },
                "mood": {
                  "label": "Exciting",
                  "prob": "0.2953"
                },
                "genre": {
                  "label": "Zhuang",
                  "prob": "0.7710"
                },
                "scene": {
                  "label": "Prayer",
                  "prob": "0.8886"
                },
                "intensity": {}
              }
            }
          }
        }
      ]
    }
  }
}
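
Active query usually means polling GetJob until the job reaches a final state. The following Python sketch shows one way to structure that loop. The function wait_for_job is hypothetical; get_job is a caller-supplied function that performs the GetJob request and returns the job object, and the interval and attempt limit are arbitrary defaults, not values recommended by the platform.

```python
import time

# COMPLETED, ERROR, CANCELED; treating CANCELED as final is an assumption.
FINAL_STATES = (3, 4, 5)


def wait_for_job(get_job, job_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll get_job(job_id) until the job reaches a final state, then return it."""
    for attempt in range(max_attempts):
        job = get_job(job_id)
        if job["state"] in FINAL_STATES:
            return job
        if attempt < max_attempts - 1:
            # Wait before the next query; no wait is needed after the last one.
            sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} queries")
```

Passing sleep as a parameter keeps the loop testable without real delays.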

Passive Callback

When a task enters a completion state (COMPLETED/ERROR), its entire Job structure is sent to the address given in the callback field when the task was created. For the Job structure, see the job object under getJobResponse in the active query example.
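
A callback receiver can be prototyped with the Python standard library. The sketch below assumes the platform delivers the Job structure as the JSON body of an HTTP POST and that the body is the job object itself; neither detail is specified in this document, so verify both against real callbacks. All names (summarize_callback, CallbackHandler, serve) and the port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body: bytes) -> tuple:
    """Parse a callback body and return (job id, state)."""
    job = json.loads(body)
    return job["id"], job["state"]


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the Job structure arrives as a JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        try:
            job_id, state = summarize_callback(self.rfile.read(length))
        except (ValueError, KeyError):
            # Malformed JSON or a body without the expected fields.
            self.send_response(400)
            self.end_headers()
            return
        print(f"job {job_id} finished with state {state}")
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

The address passed in the callback field would then point at the host and port where serve is running.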

cURL Example

Create Task

curl --location --request POST 'https://api.mediax.tencent.com/job' \
--header 'Content-Type: application/json' \
--data-raw '{
    "action": "CreateJob",
    "secretId": "{secretId}",
    "secretKey": "{secretKey}",
    "createJobRequest": {
        "customId": "{customId}",
        "callback": "{callback}",
        "inputs": [{
                "url": "{url}"
            }],
        "outputs": [
            {
                "inputSelectors": [0],
                "smartContentDescriptor": {
                    "musicTagV2": {
                        "tagType": ["mood", "vocal", "language", "scene", "genre", "bgm"]
                    }
                }
            }]
    }
}'

Query Task

# Based on customId
curl --location --request POST 'https://api.mediax.tencent.com/job' \
--header 'Content-Type: application/json' \
--data-raw '{
    "action": "ListJobs",
    "secretId": "{secretId}",
    "secretKey": "{secretKey}",
    "listJobsRequest": {
        "customId": "{customId}"
    }
}'

# Based on id
curl --location --request POST 'https://api.mediax.tencent.com/job' \
--header 'Content-Type: application/json' \
--data-raw '{
    "action": "GetJob",
    "secretId": "{secretId}",
    "secretKey": "{secretKey}",
    "getJobRequest": {
        "id": "{id}"
    }
}'
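
The same requests can be sent from Python with the standard library. This is a minimal sketch that uses the Request URL given in the API Description section; build_request and call_api are hypothetical names, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request

# Request URL as stated in the API Description section.
API_URL = "https://api.mediax.tencent.com/job"


def build_request(payload: dict) -> urllib.request.Request:
    """Wrap a request body (CreateJob, ListJobs or GetJob) in an HTTP POST."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call_api({"action": "GetJob", "secretId": "...", "secretKey": "...", "getJobRequest": {"id": "..."}}) corresponds to the "Based on id" cURL command.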

Appendix: Currently Supported Tags

Music Tag Model List
Tag Model	Types	Tag Field
vocal	2	Vocal
		Instrumental
language	21	Russian
		Indonesian
		Mandarin
		German
		Italian
		Japanese
		French
		Thai
		Javanese
		Swedish
		Cantonese
		Instrumental
		Burmese
		English
		Portuguese
		Spanish
		Vietnamese
		Arabic
		Korean
		Malay
		Other
mood	11	Sad
		Exciting
		Quiet
		Catharsis
		Romantic
		Happy
		Nostalgic
		Horror
		Magnificent
		Empty
		Funny
genre	37	8bit Pixel Style
		ACG
		Bossa
		Chinese Christian Songs
		Chinese Zen
		Ancient Style
		Country
		Buddhism
		Children
		Caribbean Style
		Indian
		Classical
		Costume Original Soundtrack
		Alternative
		Epic
		Hip Hop
		Zhuang
		Brazilian
		Blues
		Yi
		Latin
		Rock
		Punk
		Folk
		Thai
		Jazz
		Electronic
		Hardcore
		Dance
		R&B
		Mongolian
		Tibetan
		Easy Listening
		Metal
		Northern Shaanxi Folk Songs
		Reggae
		K-Pop
scene	25	Eastern Zen
		Chinese Style
		ACG
		Fitness
		Pixel Sound Effects
		Nursery Rhymes
		Music Box
		Coffee Shop
		Festive
		Nightclub
		Grand Chorus
		Pets
		Love Song
		Travel
		Campus
		Opera
		Game
		Yoga
		Before Sleep
		Prayer
		Pure Nature
		Aerial Photography
		Figure Skating
		Grassland
		Running
		Piano Pure Tone
bgm	2	With Background Music
		No Background Music
intensity	2	Exciting
		Soothing