PENGUINITIS - Reading Text Aloud with Amazon Polly

Reading Text Aloud with Amazon Polly (May 14, 2020)
Introduction

This post uses Amazon Polly to read text aloud via streaming.

Preparation

Open Amazon Polly in the AWS Management Console so that Polly is ready to use. If all you want is to have some text read aloud, you can do that right there in the console. The rest of this post has the speech streamed to a web page instead.

Obtaining permissions

To use Polly from a website, use the AWS SDK for JavaScript. That requires obtaining permission to use Polly on AWS. One option is an access key ID and secret access key, but JavaScript code is fully visible to anyone, so that approach is only suitable for testing. Cognito is used here instead.

Assuming Cognito is already set up (the setup is covered elsewhere), attach the policy "AmazonPollyReadOnlyAccess" or "AmazonPollyFullAccess" to the "authenticated role".

Reading text aloud with Polly

Pressing the button reads the entered text aloud.

index.html

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Polly test</title>
</head>
<body>
  <script src="https://sdk.amazonaws.com/js/aws-sdk-2.651.0.min.js"></script>
  <script src="login.js"></script>

  <label for="textInput">Text: </label>
  <input type="text" id="textInput">
  <input type="submit" value="Speak" id="pollySubmit">
  <input type="submit" value="Log out" id="logoutSubmit">

  <script>
    const textInput = document.getElementById("textInput");
    const pollySubmit = document.getElementById("pollySubmit");
    const logoutSubmit = document.getElementById("logoutSubmit");

    function playText() {
      const polly = new AWS.Polly({ apiVersion: "2016-06-10" });
      const params = {
        OutputFormat: "mp3",
        SampleRate: "22050", // "8000", "16000", "22050", "24000"
        Text: textInput.value,
        TextType: "text", // "text", "ssml"
        VoiceId: "Mizuki", // "Mizuki", "Takumi"
      };

      polly.synthesizeSpeech(params).promise()
        .then(data => {
          // Wrap the returned audio bytes in a Blob and play it via an object URL
          const audio = document.createElement("audio");
          document.body.appendChild(audio);
          const stream = new Uint8Array(data.AudioStream);
          const blob = new Blob([stream.buffer]);
          audio.src = URL.createObjectURL(blob);
          audio.play();
        })
        .catch(err => {
          console.log(err);
        });
    }

    pollySubmit.addEventListener("click", playText);
    logoutSubmit.addEventListener("click", logout);
  </script>
</body>
</html>
```

"login.js" is the authentication script.

login.js

```javascript
const awsRegion = 'ap-northeast-1';
const bucketName = '(bucket name)';
const domainPrefix = bucketName;
const cognitoUserPoolId = '(user pool ID)';
const cognitoAppClientId = '(app client ID)';
const cognitoIdentityPoolId = '(identity pool ID)';

const thisUrl = "http://localhost:8000/index.html"; // for test
//const thisUrl = `https://${bucketName}.s3-${awsRegion}.amazonaws.com/index.html`;

const cognitoDomainUrl = `https://${domainPrefix}.auth.${awsRegion}.amazoncognito.com`;
const cognitoLoginUrl = `${cognitoDomainUrl}/login?response_type=token&client_id=${cognitoAppClientId}&redirect_uri=${thisUrl}`;
const cognitoLogoutUrl = `https://${domainPrefix}.auth.${awsRegion}.amazoncognito.com/logout?client_id=${cognitoAppClientId}&logout_uri=${thisUrl}`;

const logout = () => {
  location.href = cognitoLogoutUrl;
}

// After login, Cognito redirects back with the ID token in the URL fragment
const idToken = (() => {
  const params = new URLSearchParams(location.hash.slice(1));
  return params.get("id_token");
})();

let username = '';

if (idToken === null) {
  location.href = cognitoLoginUrl;
}

// Initialize the Amazon Cognito credentials provider
AWS.config.region = awsRegion;
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: cognitoIdentityPoolId,
  Logins: {
    [`cognito-idp.${awsRegion}.amazonaws.com/${cognitoUserPoolId}`]: idToken,
  }
});

// If no credentials could be obtained, send the user back to the login page
AWS.config.credentials.get(function() {
  const secretAccessKey = AWS.config.credentials.secretAccessKey;
  if (secretAccessKey === undefined) {
    location.href = cognitoLoginUrl;
  }
});

if (idToken !== null) {
  username = (() => {
    const tokens = idToken.split('.');
    // JWT payloads are base64url-encoded; map to standard base64 before atob()
    const payload = tokens[1].replace(/-/g, '+').replace(/_/g, '/');
    const obj = JSON.parse(atob(payload));
    return obj['cognito:username'];
  })();
}
```

The script is written so that the production version is hosted on S3, but for now a local web server is used for trying things out (see the value of thisUrl). The bucket name and the domain prefix are assumed to be the same. The local web server is started with Python (run this from a command prompt in the directory that contains index.html).

```
> python -m http.server
```

Opening index.html should take you to the Cognito login screen; log in as the user you created and you will arrive at the read-aloud page.

The iOS Safari problem

Safari on iOS places restrictions on audio playback, so the page above does not work there. Change it as follows.

Setting the src of the audio element to a stream (blob) does not play (pointing directly at an mp3 file is fine), so use a source element instead.
Playback driven by a user click works, but playback from callback functions and the like appears to be blocked, so make sure playback is always click-driven.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Polly test</title>
</head>
<body>
  <script src="https://sdk.amazonaws.com/js/aws-sdk-2.651.0.min.js"></script>
  <script src="login.js"></script>

  <label for="textInput">Text: </label>
  <input type="text" id="textInput">
  <input type="submit" value="Speak" id="pollySubmit">
  <input type="submit" value="Log out" id="logoutSubmit">
  <audio id="polly"><source id="pollySource" src="" type="audio/mp3"></audio>

  <script>
    const textInput = document.getElementById("textInput");
    const pollySubmit = document.getElementById("pollySubmit");
    const logoutSubmit = document.getElementById("logoutSubmit");

    async function playText() {
      const audio = document.getElementById("polly");
      const audioSrc = document.getElementById("pollySource");

      const polly = new AWS.Polly({ apiVersion: "2016-06-10" });
      const params = {
        OutputFormat: "mp3",
        SampleRate: "22050", // "8000", "16000", "22050", "24000"
        Text: textInput.value,
        TextType: "text", // "text", "ssml"
        VoiceId: "Mizuki", // "Mizuki", "Takumi"
      };

      // Only set the source inside the callback; playback is started below
      await polly.synthesizeSpeech(params).promise()
        .then(data => {
          const stream = new Uint8Array(data.AudioStream);
          const blob = new Blob([stream.buffer]);
          audioSrc.src = URL.createObjectURL(blob);
        })
        .catch(err => {
          console.log(err);
        });

      audio.load();
      audio.play();
    }

    pollySubmit.addEventListener("click", playText);
    logoutSubmit.addEventListener("click", logout);
  </script>
</body>
</html>
```

audio.load() is needed for desktop browsers.

The autoplay problem

Also, when playback cannot be started at the moment of the click (for example, when a click triggers fetching a string from somewhere, which is then synthesized by Polly and played), it is treated as autoplay and is blocked. In that case, play a silent clip once, at click time, on the audio element that will later be used for playback. A silent file can be created with Audacity on Windows (GarageBand would probably work on a Mac).

Reference
"Audacityを使った無音ファイルの作成" (Creating a silent file with Audacity), ペンタのブログ

Debugging

Debugging Safari on iOS devices such as the iPhone requires a Mac (it does not seem impossible on Windows, though). Enable [Web Inspector] under [Advanced] in the iOS Safari settings, connect the device to the Mac, and then choose the device and the web page from Safari's [Develop] menu.

Using SSML

To use SSML, set the TextType parameter to "ssml" and write Text in the form "<speak>...</speak>" (Polly expects speak as the SSML root element).
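
As a rough illustration of the SSML usage just described, the sketch below builds the parameter object for an SSML request. Note that Polly expects the SSML document to be wrapped in a speak root element. The helper names escapeXml and buildSsmlParams, and the default pause length, are assumptions made up for this example and are not part of the AWS SDK; the parameter values simply mirror the ones used in index.html.

```javascript
// Escape XML special characters so user-entered text cannot break the SSML markup.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: joins sentences with a pause between them and
// returns parameters suitable for polly.synthesizeSpeech().
function buildSsmlParams(sentences, pauseMs = 500) {
  const body = sentences
    .map(escapeXml)
    .join(`<break time="${pauseMs}ms"/>`);
  return {
    OutputFormat: "mp3",
    SampleRate: "22050",
    Text: `<speak>${body}</speak>`, // SSML root element
    TextType: "ssml",
    VoiceId: "Mizuki",
  };
}
```

In playText, the result could presumably be passed in place of the plain-text params, for example polly.synthesizeSpeech(buildSsmlParams([textInput.value])). Escaping matters because an unescaped "&" or "<" in the input would make the SSML invalid.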
