Finally getting a proper understanding of Alexa's Audio Player, part 1

Apologies for a topic that is several laps behind everyone else.

In Voiceflow, the Stream Block makes it very easy to build a music skill, that is, an Audio Player skill. It was also covered in the Advent Calendar.

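Playing a single audio file is easy enough that even beginners build and publish skills that do it. But as soon as people try something a bit more elaborate, such as using multiple tracks or combining the Stream Block with other blocks, I often get questions about things not working.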

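Audio Player looks simple at first glance, and plenty of people get started with it, partly thanks to the story from a while back about someone who bought a Tesla and a house with an ambient sound skill combined with in-skill purchases. But it is fundamentally different from playing a sound with the SSML audio tag, and it is fairly complex, which is hard to appreciate at first.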

My own understanding is rather fuzzy too, so this is an attempt to write some code and understand Alexa's Audio Player properly. These are notes for myself, so please don't rely on them too much.

Preparation

This time I'll take the easy route with Alexa-hosted and build a skill that plays just one track. First, the setup.

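Create a new skill
- Skill name: 「お手軽ミュージック」 ("Easy Music")
- Language: Japanese
- Choose the Custom model
- Choose Alexa-hosted (Node.js)
- Choose the "Hello World Skill" template
Enable "Audio Player" under Interfaces
Create the interaction model
- Rename HelloWorldIntent to PlayAudioIntent
- Sample utterances can be anything along these lines:
  - 音楽をかけて
  - 音楽を再生して
  - 音楽をスタートして
Don't forget to save and build the model.
From the code editor, upload two or three mp3 files to the Alexa-hosted S3 storage. Files that are at least four minutes long are preferable.
Add ask-sdk-s3-persistence-adapter to package.json, although it may end up unused. Also bump the other packages to the latest versions for now.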

  "dependencies": {
    "ask-sdk-core": "^2.7.0",
    "ask-sdk-model": "^1.28.0",
    "aws-sdk": "^2.649.0",
    "ask-sdk-s3-persistence-adapter": "^2.7.0"
  }

Edit index.js to load ask-sdk-s3-persistence-adapter and util.js. util.js is required because the audio files are read from the hosted S3 storage.

const persistenceAdapter = require('ask-sdk-s3-persistence-adapter');
const Util = require('./util.js')

Since HelloWorldIntent was renamed to PlayAudioIntent, rename HelloWorldIntentHandler to PlayAudioIntentHandler to match.

const PlayAudioIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'PlayAudioIntent';
    },
    // handle() is left as it is for now; it is rewritten in the "Code" section.
};

In the handler registration, replace HelloWorldIntentHandler with PlayAudioIntentHandler as well, and add the persistence adapter configuration.

exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        PlayAudioIntentHandler,
        HelpIntentHandler,
        CancelAndStopIntentHandler,
        SessionEndedRequestHandler,
        IntentReflectorHandler, // make sure IntentReflectorHandler is last so it doesn't override your custom intent handlers
    )
    .withPersistenceAdapter(
        new persistenceAdapter.S3PersistenceAdapter({ bucketName: process.env.S3_PERSISTENCE_BUCKET })
    )
    .addErrorHandlers(
        ErrorHandler,
    )
    .lambda();

Code

From launch to starting playback

First, LaunchRequest. This one just speaks as usual.

const LaunchRequestHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
    },
    async handle(handlerInput) {
        const speech   = 'お手軽ミュージックプレイヤーにようこそ。音楽を聞くには「音楽を再生」と言ってください。';
        const reprompt = '音楽を聞くには「音楽を再生」と言ってください。'; 
        return handlerInput.responseBuilder
            .speak(speech)
            .reprompt(reprompt)
            .getResponse();
    }
};

Next, PlayAudioIntentHandler, which handles 「音楽をかけて」 ("play some music").

const PlayAudioIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'PlayAudioIntent';
    },
    async handle(handlerInput) {
        const speech = 'わかりました。楽曲を再生します。';
        const url    = Util.getS3PreSignedUrl('Media/hiking.mp3');
        const token  = '1';
        return handlerInput.responseBuilder
            .speak(speech)
            .addAudioPlayerPlayDirective('REPLACE_ALL', url, token, 0, null) // playBehavior, url, token, offsetInMilliseconds, expectedPreviousToken
            .getResponse();
    }
};

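Playback is started with addAudioPlayerPlayDirective. Its arguments are as follows.

- playBehavior: AudioPlayer is basically queue-driven. This is the first playback, so REPLACE_ALL is used, which clears the queue before setting the track. You can also add to the queue (ENQUEUE) or replace what is already queued (REPLACE_ENQUEUED).
- url: The audio file is specified by URL. Here it lives in the Alexa-hosted S3 storage, so it has to be a pre-signed URL.
- token: Identifies the track that is currently playing. There is only one track this time, so it is "1".
- offsetInMilliseconds: The offset to use when starting a track partway through. For the first playback it starts from the beginning, so it is 0.
- expectedPreviousToken: When adding a track to the queue, this is the token of the preceding track and is used to control the order.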

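This ends the session with the skill for the time being and hands control to AudioPlayer on the Alexa side. Incidentally, adding .speak lets Alexa say something before the track starts playing.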
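To make the five arguments more concrete, here is a minimal sketch of the JSON directive that ends up in the response. buildPlayDirective is a hypothetical helper written for this note, not part of the SDK, and the example.com URLs are placeholders; the SDK's own implementation may differ in detail.

```javascript
// Hypothetical helper: builds the raw AudioPlayer.Play directive object
// from the same five arguments that addAudioPlayerPlayDirective takes.
function buildPlayDirective(playBehavior, url, token, offsetInMilliseconds, expectedPreviousToken) {
    const stream = { url, token, offsetInMilliseconds };
    // expectedPreviousToken only makes sense when enqueueing behind a known track,
    // so it is left out for REPLACE_ALL and REPLACE_ENQUEUED.
    if (playBehavior === 'ENQUEUE' && expectedPreviousToken) {
        stream.expectedPreviousToken = expectedPreviousToken;
    }
    return {
        type: 'AudioPlayer.Play',
        playBehavior,
        audioItem: { stream },
    };
}

// First playback: clear the queue and start the track from the beginning.
const first = buildPlayDirective('REPLACE_ALL', 'https://example.com/hiking.mp3', '1', 0, null);
console.log(JSON.stringify(first, null, 2));

// Queue a second (placeholder) track behind the track whose token is "1".
const next = buildPlayDirective('ENQUEUE', 'https://example.com/second.mp3', '2', 0, '1');
console.log(JSON.stringify(next, null, 2));
```

Seeing the object laid out like this makes it clearer that the token, not the URL, is what ties later requests back to a track.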

Pause

When AudioPlayer receives the user utterance 「一時停止」 ("pause"), it sends an AMAZON.PauseIntent request to the skill.

const PauseIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.PauseIntent';
    },
    async handle(handlerInput) {
        const speech = 'わかりました。楽曲を停止します。';
        return handlerInput.responseBuilder
            .speak(speech)
            .addAudioPlayerStopDirective()
            .getResponse();
    }
};

When the skill returns addAudioPlayerStopDirective, AudioPlayer stops playing the track.

Resume

While paused, when AudioPlayer receives the user utterance 「再開」 ("resume"), it sends an AMAZON.ResumeIntent request to the skill.

const ResumeIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.ResumeIntent';
    },
    async handle(handlerInput) {
        const speech = 'わかりました。楽曲再生を再開します。';
        const url    = Util.getS3PreSignedUrl('Media/hiking.mp3');
        const AudioPlayer = handlerInput.requestEnvelope.context.AudioPlayer;
        const token       = AudioPlayer.token;
        const offset      = AudioPlayer.offsetInMilliseconds;
        return handlerInput.responseBuilder
            .speak(speech)
            .addAudioPlayerPlayDirective('REPLACE_ALL', url, token, offset, null) // playBehavior, url, token, offsetInMilliseconds, expectedPreviousToken
            .getResponse();
    }
};

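At this point, the request from Alexa includes the AudioPlayer context. It contains the token that identifies the paused track and the position where playback stopped (the offset), so returning addAudioPlayerPlayDirective again with those values resumes the interrupted track from where it left off. Note that the track can only be identified by its token; the URL is not returned. When handling multiple tracks, the skill itself has to keep the token-to-URL pairs.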
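As a rough sketch of keeping token-to-file pairs on the skill side, something like the following could work. TRACKS and resolveResume are hypothetical names, the second track is a placeholder, and the sample context only includes the two fields used above.

```javascript
// Hypothetical track table: token -> S3 object key.
// 'Media/another.mp3' is a placeholder for a second uploaded file.
const TRACKS = new Map([
    ['1', 'Media/hiking.mp3'],
    ['2', 'Media/another.mp3'],
]);

// Works out which file to replay, and from where, given the AudioPlayer
// context of a request. Returns null when the token is missing or unknown.
function resolveResume(audioPlayerContext) {
    const { token, offsetInMilliseconds } = audioPlayerContext || {};
    const key = TRACKS.get(token);
    if (!key) {
        return null;
    }
    return { key, token, offset: offsetInMilliseconds || 0 };
}

// Sample context, shaped like handlerInput.requestEnvelope.context.AudioPlayer.
const pausedContext = { token: '1', offsetInMilliseconds: 42000 };
console.log(resolveResume(pausedContext));
```

In ResumeIntentHandler, the returned key would be passed to Util.getS3PreSignedUrl, and the token and offset handed to addAudioPlayerPlayDirective as before.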

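Other intents

That covers the required intents (Help/Cancel/Stop are also required, but I'll skip them here). For a single track this is functionally enough, but there are various other intents as well.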

AMAZON.LoopOffIntent
AMAZON.LoopOnIntent
AMAZON.NextIntent
AMAZON.PreviousIntent
AMAZON.RepeatIntent
AMAZON.ShuffleOffIntent
AMAZON.ShuffleOnIntent
AMAZON.StartOverIntent

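These are sent to the skill when the user invokes them, even if you haven't registered them in the interaction model. So it seems better to change the code to either ignore them or reply that they are not supported.

If you choose to ignore them and the skill is based on the Hello World template, the debugging handler below, which is included by default, will speak in response to any intent that isn't handled, so I think it's better to take it out. Commenting it out in addRequestHandlers should be enough.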

const IntentReflectorHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest';
    },
    handle(handlerInput) {
        const intentName = Alexa.getIntentName(handlerInput.requestEnvelope);
        const speakOutput = `${intentName} が呼び出されました。`;

        return handlerInput.responseBuilder
            .speak(speakOutput)
            //.reprompt('add a reprompt if you want to keep the session open for the user to respond')
            .getResponse();
    }
};
----- (snip) -----
exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
----- (snip) -----
        IntentReflectorHandler, // make sure IntentReflectorHandler...   // comment this line out
    )

If you'd rather reply that the feature is not supported, it might look like this. It may be a bit overkill, though.

const AudioFallbackIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && ( Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.ShuffleOffIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.ShuffleOnIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.LoopOnIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.LoopOffIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.NextIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.PreviousIntent' );
    },
    async handle(handlerInput) {
        const speech = 'その機能には対応していません。楽曲を最初から再生します。';
        const url    = Util.getS3PreSignedUrl('Media/hiking.mp3');
        const token  = String(Math.random());
        return handlerInput.responseBuilder
            .speak(speech)
            .addAudioPlayerPlayDirective('REPLACE_ALL', url, token, 0, null) // playBehavior, url, token, offsetInMilliseconds, expectedPreviousToken
            .getResponse();
    }
};

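There is a reason, of sorts, for restarting the track from the beginning here. My memory is hazy, so please don't take my word for it, but:

- If NextIntent, PreviousIntent, and the like are not supported, the skill should say that they are not supported
- And rather than letting playback continue from that point, it should restart the track from the beginning

I seem to recall hearing somewhere that something like this was pointed out during certification. It may have been about a marketplace outside Japan, though. Apologies if I'm wrong. Anyway, take it as one possible way to write it.

Incidentally, I think AMAZON.StartOverIntent would be handled the same way. Writing yet another handler is a hassle, so PlayAudioIntentHandler can simply accept both.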

const PlayAudioIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && ( Alexa.getIntentName(handlerInput.requestEnvelope) === 'PlayAudioIntent' ||
                     Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.StartOverIntent' );
    },
    async handle(handlerInput) {
        const intent = Alexa.getIntentName(handlerInput.requestEnvelope);
        let speech;
        if (intent === "PlayAudioIntent") {
          speech = 'わかりました。楽曲を再生します。';
        } else {
          speech = 'わかりました。楽曲を最初から再生します。';
        }
        const url    = Util.getS3PreSignedUrl('Media/hiking.mp3');
        const token  = String(Math.random());
        return handlerInput.responseBuilder
            .speak(speech)
            .addAudioPlayerPlayDirective('REPLACE_ALL', url, token, 0, null) // playBehavior, url, token, offsetInMilliseconds, expectedPreviousToken
            .getResponse();
    }
};

Which suggests you could just fold everything into a single handler. I think this is probably going too far.

const PlayAudioIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && ( Alexa.getIntentName(handlerInput.requestEnvelope) === 'PlayAudioIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.ShuffleOffIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.ShuffleOnIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.LoopOnIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.LoopOffIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.NextIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.PreviousIntent'
                 || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.StartOverIntent' );
    },
    async handle(handlerInput) {
        const intent = Alexa.getIntentName(handlerInput.requestEnvelope);
        let speech;
        if (intent === "PlayAudioIntent") {
          speech = 'わかりました。楽曲を再生します。';
        } else if (intent === "AMAZON.StartOverIntent") {
          speech = 'わかりました。楽曲を最初から再生します。';
        } else {
          speech = 'ごめんなさい、その操作には対応していません。楽曲を最初から再生します。';
        }
        const url    = Util.getS3PreSignedUrl('Media/hiking.mp3');
        const token  = String(Math.random());
        return handlerInput.responseBuilder
            .speak(speech)
            .addAudioPlayerPlayDirective('REPLACE_ALL', url, token, 0, null) // playBehavior, url, token, offsetInMilliseconds, expectedPreviousToken
            .getResponse();
    }
};

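Resume only needs to pick up the offset in addition, so it could be folded in as well. It's too much bother, so I won't.

Also, according to Alexa-SDK Ver2（その8) AudioPlayer | Developers.IO, requests that are not intent requests are sent as well.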
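For what it's worth, here is a sketch of how folding everything together, Resume included, might look if the decision logic is pulled out into a plain function. UNSUPPORTED and planPlayback are hypothetical names; the handler would still have to build the URL and token and call addAudioPlayerPlayDirective with the returned offset.

```javascript
// Intents this skill does not support; they restart the track with an apology.
const UNSUPPORTED = new Set([
    'AMAZON.ShuffleOnIntent', 'AMAZON.ShuffleOffIntent',
    'AMAZON.LoopOnIntent', 'AMAZON.LoopOffIntent',
    'AMAZON.NextIntent', 'AMAZON.PreviousIntent',
]);

// Hypothetical helper: decides the speech and the start offset for one intent.
// audioPlayerContext is requestEnvelope.context.AudioPlayer (may be undefined).
function planPlayback(intentName, audioPlayerContext) {
    if (intentName === 'AMAZON.ResumeIntent') {
        // Resume is the only case that continues from the stored offset.
        const offset = (audioPlayerContext && audioPlayerContext.offsetInMilliseconds) || 0;
        return { speech: 'わかりました。楽曲再生を再開します。', offset };
    }
    if (intentName === 'AMAZON.StartOverIntent') {
        return { speech: 'わかりました。楽曲を最初から再生します。', offset: 0 };
    }
    if (UNSUPPORTED.has(intentName)) {
        return { speech: 'ごめんなさい、その操作には対応していません。楽曲を最初から再生します。', offset: 0 };
    }
    // PlayAudioIntent and anything else start from the beginning.
    return { speech: 'わかりました。楽曲を再生します。', offset: 0 };
}

console.log(planPlayback('AMAZON.ResumeIntent', { token: '1', offsetInMilliseconds: 1500 }));
console.log(planPlayback('AMAZON.NextIntent', { token: '1', offsetInMilliseconds: 1500 }));
```

A Set lookup also replaces the long chain of || comparisons in canHandle, which is where mistakes tend to creep in.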

const AudioPlayerEventHandler = { // placeholder name; the quoted excerpt starts at canHandle
  canHandle(h) {
      const type = h.requestEnvelope.request.type;
      return (type === 'AudioPlayer.PlaybackStarted' || // playback started
              type === 'AudioPlayer.PlaybackFinished' || // playback finished
              type === 'AudioPlayer.PlaybackStopped' || // playback stopped
              type === 'AudioPlayer.PlaybackNearlyFinished' || // playback about to finish
              type === 'AudioPlayer.PlaybackFailed'); // playback failed
  },
  async handle(handlerInput) {
      return handlerInput.responseBuilder
      .getResponse();
  }
};

Abbreviated, it would look something like this.

const PlaybackHandler = {
    canHandle(handlerInput) {
        // Matches every request type in the AudioPlayer namespace.
        return Alexa.getRequestType(handlerInput.requestEnvelope).startsWith('AudioPlayer.');
    },
    async handle(handlerInput) {
        // Nothing to do yet, so return an empty response.
        return handlerInput.responseBuilder
            .getResponse();
    }
};
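The prefix check itself can be tried without the SDK. Below is a small sketch; isAudioPlayerRequest is a hypothetical helper and the request envelopes are trimmed-down samples containing only the type field.

```javascript
// Hypothetical helper: true when the request type is in the AudioPlayer namespace.
function isAudioPlayerRequest(requestEnvelope) {
    const type = requestEnvelope && requestEnvelope.request && requestEnvelope.request.type;
    return typeof type === 'string' && type.startsWith('AudioPlayer.');
}

// Trimmed-down sample envelopes (real ones carry many more fields).
const samples = [
    { request: { type: 'AudioPlayer.PlaybackStarted' } },
    { request: { type: 'AudioPlayer.PlaybackNearlyFinished' } },
    { request: { type: 'IntentRequest' } },
    { request: { type: 'LaunchRequest' } },
];

for (const envelope of samples) {
    console.log(envelope.request.type, isAudioPlayerRequest(envelope));
}
```

Note that startsWith is a method and has to be called; comparing the method itself to a string is always false, so a handler written that way would never match.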

These will need to be taken into account when playing multiple tracks, so I'll leave them for next time.