If you use AWS Transcribe to convert audio or video into a text transcription, you might be underwhelmed to find that its output is a JSON file. You can't really show that to a human end user without processing it first.
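For reference, here is an abridged sketch of the part of a Transcribe output file this tutorial cares about. The values are invented for illustration, and real files contain more fields (such as start and end times and confidence scores). It assumes speaker labels were turned on for the job, so each item carries a speaker_label. The listSpeakers helper is hypothetical and only shows how to walk the items array once the file has been parsed.

```javascript
// Abridged, illustrative Transcribe output with speaker labels enabled.
// Real files contain more fields; these values are made up for the example.
const sampleTranscription = {
  jobName: "my-interview",
  results: {
    transcripts: [{ transcript: "Hello there. Hi, how are you?" }],
    items: [
      { type: "pronunciation", speaker_label: "spk_0", alternatives: [{ content: "Hello" }] },
      { type: "pronunciation", speaker_label: "spk_0", alternatives: [{ content: "there" }] },
      { type: "punctuation", speaker_label: "spk_0", alternatives: [{ content: "." }] },
      { type: "pronunciation", speaker_label: "spk_1", alternatives: [{ content: "Hi" }] },
      { type: "punctuation", speaker_label: "spk_1", alternatives: [{ content: "," }] },
      { type: "pronunciation", speaker_label: "spk_1", alternatives: [{ content: "how" }] },
      { type: "pronunciation", speaker_label: "spk_1", alternatives: [{ content: "are" }] },
      { type: "pronunciation", speaker_label: "spk_1", alternatives: [{ content: "you" }] },
      { type: "punctuation", speaker_label: "spk_1", alternatives: [{ content: "?" }] }
    ]
  }
};

// Hypothetical helper: list the distinct speaker labels in order of first appearance
function listSpeakers(transcription) {
  const speakers = [];
  for (const item of transcription.results.items) {
    if (item.speaker_label && !speakers.includes(item.speaker_label)) {
      speakers.push(item.speaker_label);
    }
  }
  return speakers;
}

console.log(listSpeakers(sampleTranscription)); // [ 'spk_0', 'spk_1' ]
```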
Here’s how to import a Transcribe-generated .json file into AWS Lambda from an S3 bucket, convert it into a readable text file, and then export it to a different S3 bucket:
Create Two S3 Buckets
If you don't already have two buckets in AWS S3, create them now. Bucket names are globally unique, so you will need to pick your own. For this tutorial I will name them:
- my-source-bucket
- my-destination-bucket
Create A Lambda Function
Select:
- Author from scratch
- Name it something like transcribeJSONtoTXT.
- Use Node.js 18.x (the most recent version of Node at the time this tutorial was written).
Unless you have good reason, leave everything else the same.

Add Bucket Permissions To The Lambda Function
Once the Lambda function is created, open its Configuration tab, choose Permissions, and click the role name that was created for you.

You will be taken to the IAM permissions policy page for your role. Select Add permissions >> Create inline policy.

In the JSON tab of the Edit Policy page add this code, replacing the ARNs with those of your own buckets:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetFromS3",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-source-bucket/*"
    },
    {
      "Sid": "PutToS3",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-destination-bucket/*"
    }
  ]
}
Review the policy, give it a relevant name (e.g. Lambda-S3-Read-Write), and save it. This policy lets your Lambda function read objects from your source bucket and write objects to your destination bucket, and nothing more.
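The only parts of this policy that change from one setup to the next are the bucket names inside the ARNs. Object-level actions such as s3:GetObject and s3:PutObject apply to object ARNs of the form arn:aws:s3:::BUCKET/KEY, which is why each Resource ends in /*. If you create this policy for several bucket pairs, a small generator like the one below keeps the ARNs in sync with your bucket names. The function name is hypothetical and not part of any AWS tool; it simply prints JSON you could paste into the JSON tab.

```javascript
// Hypothetical helper: build the read/write inline policy for a pair of buckets.
// "/*" after the bucket name matches every object key in that bucket.
function buildTranscriptPolicy(sourceBucket, destinationBucket) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "GetFromS3",
        Effect: "Allow",
        Action: "s3:GetObject",
        Resource: `arn:aws:s3:::${sourceBucket}/*`
      },
      {
        Sid: "PutToS3",
        Effect: "Allow",
        Action: "s3:PutObject",
        Resource: `arn:aws:s3:::${destinationBucket}/*`
      }
    ]
  };
}

// Pretty-print the policy so it can be pasted into the policy editor
console.log(JSON.stringify(buildTranscriptPolicy("my-source-bucket", "my-destination-bucket"), null, 2));
```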
Add Code To Lambda Function
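Back in your Lambda function's Code tab, you will probably find a file named index.mjs. The .mjs extension tells Node.js to treat the file as an ES module, which is why the code below uses import rather than require.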

Go ahead and delete all of the default code. I’ll walk you step by step through the code that you’ll need to replace it with.
First add:
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
const s3 = new S3Client({});
NOTE: The Node.js 18 runtime comes with version 3 of the AWS SDK for JavaScript (the modular @aws-sdk/* packages) already installed, but not the older version 2 aws-sdk package. That is why this import uses @aws-sdk/client-s3: importing from "aws-sdk" would fail because the package can't be found, unless you bundled v2 with your function or added it as a Lambda Layer yourself.
We will keep the name “handler” for our main export, because that is the default in the runtime settings (the Handler value index.handler means “look in the index file for an exported function named handler”).

So at this point we will have:
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
const s3 = new S3Client({});
export const handler = async (event, context) => {
};
All of the rest of our code goes inside this handler. Here is the full code, with comments explaining each step:
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
// Create the S3 client once, outside the handler, so warm invocations reuse it
const s3 = new S3Client({});
export const handler = async (event, context) => {
  // Declared outside the try block so the error message can include it
  let sourceKey = "";
  try {
    // Get the bucket name and object key of the uploaded file from the S3 event.
    // Keys arrive URL-encoded (e.g. spaces become "+"), so decode them first.
    const sourceBucket = event.Records[0].s3.bucket.name;
    sourceKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));
    // Retrieve the file from the source S3 bucket
    const sourceFile = await s3.send(
      new GetObjectCommand({ Bucket: sourceBucket, Key: sourceKey })
    );
    // In SDK v3 the Body is a stream: read it as a string, then parse the JSON
    const transcription = JSON.parse(await sourceFile.Body.transformToString());
    // Get the Transcribe job name - this will be your output file name later on
    const jobName = transcription.jobName;
    // Get all of the words and punctuation from the items array within the JSON file
    const items = transcription.results.items;
    // Some variables you will need
    let speakerLabel = "";
    let content = "";
    // Start looping through the words and punctuation
    for (const item of items) {
      // Consecutive items often belong to the same speaker,
      // so only show a label when the speaker changes
      const showNewSpeakerLabel = item.speaker_label !== speakerLabel;
      // Change the speaker label to the current item's speaker
      speakerLabel = item.speaker_label;
      // Get the item's word or punctuation mark
      let itemContent = item.alternatives[0].content;
      if (showNewSpeakerLabel) {
        // New speaker: add two line breaks, then the label
        itemContent = "\n\n" + speakerLabel + ": " + itemContent;
      } else if (item.type !== "punctuation") {
        // Words get a leading space; punctuation (. , ? !) attaches to the previous word
        itemContent = " " + itemContent;
      }
      // Add this item's content to any previous content
      content += itemContent;
    }
    // Put the text file into the destination S3 bucket
    // Change Bucket to your destination bucket's name
    await s3.send(
      new PutObjectCommand({
        Bucket: "my-destination-bucket",
        Key: jobName + "_transcript.txt",
        Body: content.trim(), // drop the line breaks added before the first label
        ContentType: "text/plain"
      })
    );
    console.log(`Successfully processed and uploaded file: ${sourceKey}`);
    return {
      statusCode: 200,
      body: JSON.stringify({
        message: `Successfully processed and uploaded file: ${sourceKey}`
      })
    };
  } catch (err) {
    console.error(`Error processing file ${sourceKey}: ${err}`);
    return {
      statusCode: 500,
      body: JSON.stringify({
        message: `Error processing file ${sourceKey}: ${err}`
      })
    };
  }
};
Click Deploy to save this file.
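Because every change means another deploy, it can help to check the formatting logic on your own machine first. The sketch below is the speaker-label loop from the handler pulled out into a standalone function with no AWS calls, so you can run it with plain Node against a few sample items. The name formatTranscript and the sample items are hypothetical, and it assumes each item has type, speaker_label and alternatives[0].content, as in a Transcribe job with speaker labels turned on.

```javascript
// Same speaker-label/punctuation logic as the handler, without any AWS calls.
function formatTranscript(items) {
  let speakerLabel = "";
  let content = "";
  for (const item of items) {
    const word = item.alternatives[0].content;
    if (item.speaker_label !== speakerLabel) {
      // Speaker changed: start a new paragraph with the label
      speakerLabel = item.speaker_label;
      content += "\n\n" + speakerLabel + ": " + word;
    } else if (item.type === "punctuation") {
      // Punctuation attaches directly to the previous word
      content += word;
    } else {
      // Ordinary words are separated by a space
      content += " " + word;
    }
  }
  // Drop the line breaks added before the very first label
  return content.trim();
}

const items = [
  { type: "pronunciation", speaker_label: "spk_0", alternatives: [{ content: "Hello" }] },
  { type: "punctuation", speaker_label: "spk_0", alternatives: [{ content: "." }] },
  { type: "pronunciation", speaker_label: "spk_1", alternatives: [{ content: "Hi" }] },
  { type: "pronunciation", speaker_label: "spk_1", alternatives: [{ content: "there" }] },
  { type: "punctuation", speaker_label: "spk_1", alternatives: [{ content: "!" }] }
];

console.log(formatTranscript(items));
// spk_0: Hello.
//
// spk_1: Hi there!
```

Keying on item.type rather than on specific characters means question marks and exclamation points attach to the preceding word just like periods and commas.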

Add Trigger
Finally, we want to create a trigger so that this function is called automatically whenever a .json file is copied, moved, or uploaded to your source bucket. On the main page for your Lambda function, select + Add trigger.

On the Add trigger page, choose S3 as the source and select your source bucket.

Make the trigger happen for All Object Create Events.

In the Suffix field, enter .json so that it only triggers when a .json file is added.

Add the trigger, and you should be done! Upload your Transcribe JSON output file to your source bucket, then check whether the TXT file shows up in your destination bucket. If it doesn't, check the CloudWatch logs from the Monitor tab of your Lambda function.
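When S3 invokes the function, it passes an event whose Records array describes the new object, and the handler reads the bucket name and object key from it. One detail worth knowing: S3 URL-encodes the key in the event, so a file uploaded as my interview.json arrives as my+interview.json, and reading it with the raw key fails. The sketch below shows one way to pull out and decode both values. getSourceLocation and the trimmed-down sample event are hypothetical, for illustration; real notifications carry many more fields.

```javascript
// Hypothetical helper: get the bucket and decoded key from an S3 event notification.
// S3 URL-encodes object keys in events (spaces become "+"), so undo that first.
function getSourceLocation(event) {
  const record = event.Records[0];
  return {
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " "))
  };
}

// A trimmed-down example event with only the fields used above
const sampleEvent = {
  Records: [
    {
      eventName: "ObjectCreated:Put",
      s3: {
        bucket: { name: "my-source-bucket" },
        object: { key: "my+interview.json" }
      }
    }
  ]
};

console.log(getSourceLocation(sampleEvent)); // { bucket: 'my-source-bucket', key: 'my interview.json' }
```

The "+" replacement has to happen before decodeURIComponent, because decodeURIComponent leaves "+" untouched while a literal plus sign in a key arrives encoded as %2B.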
Note: this tutorial is for transcriptions that identify multiple speakers (jobs run with speaker labels turned on, the ShowSpeakerLabels setting). If you only need the transcription as one block of text without speaker labels, then after the line that parses the file into transcription, you can get it with:
const transcript = transcription.results.transcripts[0].transcript;
and upload transcript as the Body of your text file instead of content.
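If some of your jobs run without speaker labels, their items have no speaker_label field, and the speaker loop would label the whole transcript as undefined. One way to handle both kinds of file in a single function is to check for labels first and fall back to the plain transcript. This is a sketch under that assumption; hasSpeakerLabels, plainTranscript and the sample data are hypothetical names for illustration.

```javascript
// True if at least one item carries a speaker label (speaker labels were enabled)
function hasSpeakerLabels(transcription) {
  return transcription.results.items.some((item) => "speaker_label" in item);
}

// The whole transcript as one block of text, with no speaker labels
function plainTranscript(transcription) {
  return transcription.results.transcripts[0].transcript;
}

// Illustrative output of a job run without speaker labels
const noSpeakers = {
  jobName: "my-lecture",
  results: {
    transcripts: [{ transcript: "Welcome to the lecture." }],
    items: [
      { type: "pronunciation", alternatives: [{ content: "Welcome" }] },
      { type: "pronunciation", alternatives: [{ content: "to" }] },
      { type: "pronunciation", alternatives: [{ content: "the" }] },
      { type: "pronunciation", alternatives: [{ content: "lecture" }] },
      { type: "punctuation", alternatives: [{ content: "." }] }
    ]
  }
};

console.log(hasSpeakerLabels(noSpeakers)); // false
console.log(plainTranscript(noSpeakers)); // Welcome to the lecture.
```

In the handler you could then use plainTranscript(transcription) as the Body whenever hasSpeakerLabels(transcription) is false, and run the speaker loop otherwise.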