How to create an Amazon S3 bucket

Let’s create an Amazon S3 bucket that will be used by the Amazon EMR Serverless job.

Before you begin, download two files: dataset_en_dev.json, which is the source data to be processed, and reviews.py, which is the Spark job script. These files can be downloaded here.

To create an Amazon S3 bucket, navigate to the AWS Management Console, enter S3 in the search bar, and then choose S3 from the provided search results.

From the Amazon S3 Buckets page, choose Create bucket. This will redirect you to the Create bucket page.

Next, configure the S3 bucket. First, provide a unique Bucket name. Because bucket names must be globally unique, you can include the date (for example, year, month, and day) and the last four digits of your account number in the name. Then, under AWS Region, choose the Region where you want the bucket to reside.
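The naming suggestion above can be sketched in Python. This is a minimal illustration, not part of the original walkthrough: the helper names, the prefix, and the exact layout of the name are assumptions, and the validity check covers only the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; a letter or digit at each end).

```python
import re
from datetime import date

# Hypothetical helper: the prefix and the "<prefix>-<YYYYMMDD>-<last4>" layout
# are illustrative choices, not something the console requires.
def build_bucket_name(prefix: str, account_id: str, on: date) -> str:
    """Combine a prefix, a date, and the last four digits of the account ID."""
    return f"{prefix}-{on:%Y%m%d}-{account_id[-4:]}"

# Basic naming rules only: 3-63 characters, lowercase letters, digits, dots,
# and hyphens, beginning and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")

def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the basic checks above."""
    return _BUCKET_NAME.fullmatch(name) is not None and ".." not in name
```

For example, build_bucket_name("emr-serverless-demo", "123456789012", date(2024, 1, 15)) returns emr-serverless-demo-20240115-9012, where the prefix and account number are placeholders.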

Clear the Block all public access check box, and then select the acknowledgment check box that appears at the bottom of that section. Leave all other options at their default values. Then, scroll down and choose Create bucket.
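For readers who prefer to script this step, the bucket creation can be sketched with the AWS SDK for Python. This is a sketch under assumptions: s3_client is assumed to be a boto3 S3 client (for example, boto3.client("s3", region_name=region)), the function names are hypothetical, and the code does not change the public access settings described above.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build the keyword arguments for the S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and is rejected if passed as a
    # LocationConstraint, so it is only set for other Regions.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

def create_bucket(s3_client, bucket_name: str, region: str):
    """Create the bucket; s3_client is assumed to be a boto3 S3 client."""
    return s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

Splitting the argument construction from the call keeps the Region handling easy to inspect before anything is sent to AWS.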

From the Account snapshot screen, select the bucket that you created, and then choose Create folder.

For Folder name, enter the name input. On this page, leave all other options at their default values.

Scroll down and choose Create folder.
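Amazon S3 has no real directories; the console represents a folder as a zero-byte object whose key ends with a forward slash. A minimal sketch of the same step in Python follows, assuming s3_client is a boto3 S3 client and using hypothetical function names.

```python
def folder_key(folder_name: str) -> str:
    """Return the object key that represents a folder, e.g. 'input/'."""
    return folder_name.strip("/") + "/"

def create_folder(s3_client, bucket_name: str, folder_name: str):
    """Create an empty object whose key ends in '/', as the console does.

    s3_client is assumed to be a boto3 S3 client.
    """
    return s3_client.put_object(Bucket=bucket_name, Key=folder_key(folder_name))
```

Calling create_folder(s3_client, bucket_name, "input") would create the key input/ in the bucket.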

Now select the input folder, and then choose Upload.

Choose Add files. Select the dataset_en_dev.json and reviews.py files that you downloaded as a prerequisite for this demonstration.

After the files have been selected, scroll down and choose Upload. Finally, wait for the upload to complete.
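The upload can also be scripted. The sketch below assumes s3_client is a boto3 S3 client and that both files sit in the current working directory; the function name is hypothetical. The boto3 upload_file call returns only after the transfer finishes, which corresponds to waiting for the upload to complete in the console.

```python
from pathlib import Path

def upload_to_folder(s3_client, bucket_name: str, folder_name: str, local_paths):
    """Upload each local file under '<folder_name>/' and return the keys used.

    s3_client is assumed to be a boto3 S3 client.
    """
    uploaded = []
    for path in map(Path, local_paths):
        # The object key is the folder prefix plus the file's base name.
        key = f"{folder_name.strip('/')}/{path.name}"
        s3_client.upload_file(str(path), bucket_name, key)
        uploaded.append(key)
    return uploaded
```

With the two files from this walkthrough, upload_to_folder(s3_client, bucket_name, "input", ["dataset_en_dev.json", "reviews.py"]) would write the keys input/dataset_en_dev.json and input/reviews.py.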

Note the S3 bucket name, because you will need it in the next step, when you create the IAM policy.