TIL: Effortless parallelization with Python 3

September 21, 2018

Earlier this week I had a task that required sending about 5,000 AWS S3 copy requests through the boto3 API. Each request took roughly 1 second, so running them one after another would have taken about an hour and twenty minutes. The requests didn't depend on each other, so I decided to run them in parallel using threads.
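As a rough back-of-the-envelope check of those numbers, the sketch below estimates the sequential running time and the best case for a pool of threads. The 20-worker figure matches the pool size used later in this post. The function name is made up for illustration, and the estimate assumes every request takes the same time and that S3 does not throttle.

```python
import math


def estimated_minutes(num_requests, seconds_per_request, workers=1):
    """Idealized wall-clock time: requests are spread evenly over workers."""
    # Each worker handles ceil(num_requests / workers) requests back to back.
    rounds = math.ceil(num_requests / workers)
    return rounds * seconds_per_request / 60


sequential = estimated_minutes(5000, 1.0)            # one request at a time
parallel = estimated_minutes(5000, 1.0, workers=20)  # ideal 20-thread pool

print(f"sequential: ~{sequential:.0f} min, 20 workers: ~{parallel:.1f} min")
```

In practice the speedup is likely to be somewhat lower, since request latency varies and the threads share one network connection pool.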

I’ve had some experience with threads in the past, but I decided to try something new: ThreadPoolExecutor, from the standard library's concurrent.futures module.

Here’s how it looks:

from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client('s3')
keys = [...]  # A long list of things to copy


def copy(key):
    s3.copy(...)  # Cut for brevity


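# Leaving the with block waits for every submitted task to finish.
with ThreadPoolExecutor(max_workers=20) as executor:
    for key in keys:
        # submit() returns a Future; an exception raised inside copy() is
        # stored on it and only surfaces if result() is called.
        executor.submit(copy, key)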

Voilà, instant parallelization.
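One thing the snippet above does not show is what happens when a copy fails: submit() returns a Future, and an exception raised in the worker stays on that Future until result() is called. Here is a minimal sketch of collecting failures with as_completed. The fake_copy function, the key names, and the simulated failure are stand-ins invented for illustration, not the real boto3 call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def fake_copy(key):
    """Hypothetical stand-in for the S3 copy; fails for one key on purpose."""
    time.sleep(0.01)  # pretend to wait on the network
    if key == "key-3":
        raise RuntimeError(f"could not copy {key}")
    return key


def copy_all(keys, max_workers=20):
    """Run fake_copy for every key and return (succeeded, failed)."""
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each Future back to its key so failures can be reported.
        future_to_key = {executor.submit(fake_copy, key): key for key in keys}
        for future in as_completed(future_to_key):
            key = future_to_key[future]
            try:
                succeeded.append(future.result())  # re-raises worker errors
            except Exception as exc:
                failed[key] = exc
    return succeeded, failed


succeeded, failed = copy_all([f"key-{i}" for i in range(10)])
print(f"{len(succeeded)} copied, {len(failed)} failed: {sorted(failed)}")
```

The failed keys can then be logged or retried, rather than the job finishing quietly with some objects missing.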
