Building a Python Selenium Lambda function that works

My goal is to build an AWS Lambda function that captures a screenshot of a Grafana dashboard and uploads it to a Slack channel.

This works with the latest AWS Lambda (runtime Python 3.6, amzn-ami-hvm-2018.03.0.20181129-x86_64-gp2) as of 2020-04-04.

I built this based on a blog post (https://robertorocha.info/setting-up-a-selenium-web-scraper-on-aws-lambda-with-python/), but the two static libraries are not required with the latest image.

First of all, we need a working combination of Chromium, chromedriver and Selenium. There are a couple of blog posts describing the steps for building a tailored version of Chromium, but unfortunately none of them worked for me. In the end, I launched a t2.micro instance with the official Lambda image and tried different pre-built Chromium binaries with different options until one ran standalone. By invoking the binary in a shell, I found that adding the disable-dev-shm-usage option is the key to making it work. Also, according to another blog post (https://swizec.com/blog/serverless-chrome-on-aws-lambda-the-guide-works-in-2019-beyond/swizec/9024), the v=99 option has to be removed.

Here is the final combination I use:

serverless-chrome v1.0.0-55 (https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip)

Options:

    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--verbose')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--ignore-gpu-blacklist')
    chrome_options.add_argument('--prerender-from-omnibox=disabled')
    chrome_options.add_argument('--window-size=1280x1024')
    chrome_options.add_argument('--user-data-dir={}'.format(self._tmp_folder + '/user-data'))
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    #chrome_options.add_argument('--log-level=0')
    #chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--data-path={}'.format(self._tmp_folder + '/data-path'))
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--homedir={}'.format(self._tmp_folder))
    chrome_options.add_argument('--disk-cache-dir={}'.format(self._tmp_folder + '/cache-dir'))
    chrome_options.add_argument(
        'user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')
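For context, here is a minimal sketch of how flags like these might be wired into a driver, assuming Selenium 3.141.0 and that both binaries are already unpacked under /tmp. The function names `build_chrome_args` and `make_driver` and the paths `/tmp/headless-chromium` and `/tmp/chromedriver` are hypothetical, and only a subset of the flags is repeated.

```python
def build_chrome_args(tmp_folder):
    """Return a subset of the Chromium flags listed above for a scratch folder."""
    return [
        '--headless',
        '--no-sandbox',
        '--disable-dev-shm-usage',  # the flag that made the binary run standalone
        '--disable-gpu',
        '--single-process',
        '--hide-scrollbars',
        '--user-data-dir={}'.format(tmp_folder + '/user-data'),
        '--data-path={}'.format(tmp_folder + '/data-path'),
        '--homedir={}'.format(tmp_folder),
        '--disk-cache-dir={}'.format(tmp_folder + '/cache-dir'),
    ]


def make_driver(tmp_folder='/tmp',
                chromium_path='/tmp/headless-chromium',  # hypothetical path
                chromedriver_path='/tmp/chromedriver'):  # hypothetical path
    # Imported lazily so the flag list can be inspected without Selenium installed.
    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    for arg in build_chrome_args(tmp_folder):
        chrome_options.add_argument(arg)
    chrome_options.binary_location = chromium_path
    # Selenium 3.x keyword arguments; Selenium 4 changed this constructor.
    return webdriver.Chrome(executable_path=chromedriver_path,
                            options=chrome_options)
```

With a driver from `make_driver()`, something like `driver.get(dashboard_url)` followed by `driver.save_screenshot('/tmp/grafana.png')` would capture the page, where `dashboard_url` is a placeholder for the Grafana URL.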

With a working Chromium binary in hand, the next step is to find compatible versions of chromedriver and Selenium. These are the versions that work:

chromedriver: 2.43(https://chromedriver.storage.googleapis.com/2.43/chromedriver_linux64.zip)

selenium 3.141.0.

Instead of including the Chromium binary in the deployment package, I chose to download it from S3 to /tmp at runtime. This way the 50MB upload limit can be bypassed, which matters because I also need to add requests and requests_toolbelt as dependencies to make the Slack API call.
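The download step could look roughly like the sketch below. It assumes the zip has been uploaded to an S3 bucket of your own and that it unpacks to a single file named `headless-chromium`. The function names, the `dest_dir` default and the bucket and key arguments are all hypothetical.

```python
import os
import zipfile


def fetch_from_s3(bucket, key, dest_path):
    """Download one object from S3 to a local path."""
    import boto3  # imported lazily so the caching logic can be tested offline
    boto3.client('s3').download_file(bucket, key, dest_path)


def ensure_chromium(bucket, key, dest_dir='/tmp/bin', fetch=fetch_from_s3):
    """Download and unpack the Chromium zip only if the binary is not already there."""
    binary_path = os.path.join(dest_dir, 'headless-chromium')  # assumed file name
    if os.path.exists(binary_path):
        # /tmp may persist while the same container is reused, so skip the download.
        return binary_path

    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, 'chromium.zip')
    fetch(bucket, key, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    os.remove(zip_path)           # free space in /tmp
    os.chmod(binary_path, 0o755)  # extractall does not restore the executable bit
    return binary_path
```

Calling `ensure_chromium('my-bucket', 'stable-headless-chromium-amazonlinux-2017-03.zip')` at the start of the handler would then return the path to pass to the driver. The bucket name here is a placeholder.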

The final part is sending the captured screenshot to Slack by calling the files.upload API. As it is the only API I need, I used the requests module instead of a full Slack client. The trick is to use MultipartEncoder to build a proper HTTP request with content type multipart/form-data.

    import requests
    from requests_toolbelt import MultipartEncoder
    with open('/tmp/grafana.png', 'rb') as f:  # close the file once the upload finishes
        m = MultipartEncoder(fields={'channels': 'lchen-test', 'file': ('grafana.png', f)})
        header = {'Authorization': 'Bearer <slack token>', 'Content-Type': m.content_type}
        r = requests.post('https://slack.com/api/files.upload', data=m, headers=header)
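Slack Web API methods reply with a JSON body containing a boolean `ok` field, and a failed call can still come back with HTTP 200, so it is worth checking the body and not only the status code. A minimal sketch follows. The helper name `check_slack_response` is hypothetical.

```python
def check_slack_response(payload):
    """Raise if a decoded Slack Web API response reports failure."""
    if not payload.get('ok'):
        error = payload.get('error', 'unknown error')
        raise RuntimeError('Slack API call failed: {}'.format(error))
    return payload
```

After the upload, `check_slack_response(r.json())` would surface problems such as a bad token or a missing channel.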
