Aarfah Ahmad
8 min read · Jun 10, 2020


Introduction to AWS Lambda

Hola!

In this article we will talk about AWS Lambda, steps to create one and how we can implement backup functionality using the service.

AWS Lambda

AWS Lambda is a serverless computing service provided by Amazon Web Services (AWS). You can also think of it as an example of Function as a Service (FaaS).

As mentioned above, Lambda is a serverless technology, so we don't need to worry about which server will run the code. That does not mean no server is needed; it only means that the administration is done by AWS itself. In serverless computing, a function essentially spins up a server in response to an event/trigger and cleans up the resources once the request/task is completed. There is a subtle difference between PaaS and serverless (FaaS): in PaaS you have to plan and define an auto-scaling strategy, whereas in serverless (FaaS) you simply don't need to know how or what is happening behind the scenes, i.e. complete abstraction!

Users of AWS Lambda create functions, self-contained applications written in one of the supported languages and runtimes, and upload them to AWS Lambda, which executes those functions in an efficient and flexible manner. We can configure triggers to invoke a function in response to resource lifecycle events, respond to incoming HTTP requests, consume events from a queue, or run on a schedule.

The Lambda functions can perform any kind of computing task, from serving web pages and processing streams of data to calling APIs and integrating with other AWS services.
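
For a sense of the shape, a minimal Python Lambda function is just a handler that receives an event (the trigger payload) and a context object (runtime metadata); the names below are illustrative:

import json

def lambda_handler(event, context):
    # 'event' carries the trigger payload (e.g. an HTTP request or queue message);
    # 'context' exposes runtime metadata such as the remaining execution time.
    print("Received event:", json.dumps(event))
    return {"statusCode": 200, "body": "Hello from Lambda!"}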

Serverless computing

The concept of “serverless” computing helps you develop and deploy your AWS Lambda functions, along with the AWS infrastructure resources they require. Serverless computing allows you to build and run applications and services without thinking about servers: your application still runs on servers, but all the server management is done by AWS. AWS Lambda is a fully managed service that takes care of all the infrastructure for you. So “serverless” doesn’t mean that there are no servers involved; it just means that the servers, the operating systems, the network layer, and the rest of the infrastructure have already been taken care of, so that you can focus on writing application code.

You just need to write the code, and all the scaling up and down of resources is done by AWS. You are charged only for the time your code executes.

How does AWS Lambda work?

Let us first understand a container.

A container holds the code the user has provided to handle the request.

Whenever a Lambda function is invoked, a new container with the appropriate resources is created to execute it, and the code for the function is loaded into the container.

Clients send data to Lambda. A client could be anything that sends requests to AWS Lambda, such as an application or another Amazon service.

Lambda receives requests from clients and, depending on the size, amount, or volume of the data, runs the required number of containers. Each request is then handed to a container to handle.

With an increased number of requests, more containers are created; if the number of requests drops, the number of containers is reduced as well. AWS Lambda functions execute in a container (sandbox) that isolates them from other functions and provides the resources, such as memory, specified in the function’s configuration.

Example

Consider an application like Instagram on which people upload pictures/videos.

On a normal day, suppose only one user is uploading; all the processing is done, the photo is stored in an S3 bucket and then saved into the database. (Just an assumption, it never happens :P)

Image 1

Now consider the day of a festival (say Diwali, when a lot of people upload pictures/videos); there will be so much load on the application to serve the requests.

You cannot restrict the function to run on only a specific number of servers; the application needs to scale up the storage capacity of the S3 bucket, provision and scale more servers, and apply security patches.

If we use a Lambda function, it will create containers as per the demand and all the requests will be served without any delay. With Lambda, you just need to write the code and you are billed only for the execution time. All the capacity scaling, patching, and administration is handled by AWS.

Image 2

Lambda Function

I worked on the implementation of a Lambda function to take incremental backups of RDS (MySQL) tables and store them in an S3 bucket.

Here is the code snippet and the steps to create and invoke the Lambda function, which might be useful for others.

Steps to create and configure the function in your AWS account.

To create a Lambda function

1. Open the AWS Lambda console and choose Create function.
Step 1

2. For Function name, enter your function name.

Lambda creates a Node.js function and an execution role that grants the function permission to upload logs. Lambda assumes the execution role when you invoke your function, and uses it to create credentials for the AWS SDK and to read data from event sources.

3. In my case, I will type backup_incremental. Choose an existing role or create a new one, and make sure it has access to S3 and RDS so the function can connect to the RDS database and dump data to S3 (a minimal sketch of such a policy is shown after this step).

For language/runtime, I will choose Python in our case.

Then, click on Create function.

Step 3
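
For reference, a minimal sketch of what such a policy could look like when attached with boto3; the role name, policy name, and bucket are placeholders, and the CloudWatch Logs permissions Lambda creates by default are assumed to be in place:

import json
import boto3

iam = boto3.client("iam")

# Hypothetical inline policy granting write access to the backup bucket
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::your-backup-bucket/*",  # placeholder bucket
        }
    ],
}

iam.put_role_policy(
    RoleName="backup_incremental_role",  # placeholder role name
    PolicyName="s3-backup-write",
    PolicyDocument=json.dumps(policy),
)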

4. Now, make a zip file of the code and the libraries that you have used in your code, and upload it.

You can also use the inline code editing feature.

In my case, I am using the Upload a .zip file option, wherein I create a zip containing the libraries that I import in the code together with a Python file holding the code.

i. Upload the Zip file

ii. Make sure the file name of your Python code matches the one mentioned in the Handler field. The handler follows the file_name.function_name format; for example, if your file is lambda_function.py and your function is incremental_backup, the handler is lambda_function.incremental_backup.

iii. Click on Upload

Step 4 i.
Step 4 ii.
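
As an aside, the same upload can be done outside the console; a small sketch with boto3, assuming the bundle is named function.zip (an illustrative name):

import boto3

lambda_client = boto3.client("lambda")

# Push the zipped code bundle to the existing function
with open("function.zip", "rb") as f:
    lambda_client.update_function_code(
        FunctionName="backup_incremental",
        ZipFile=f.read(),
    )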

5. Invoke your Lambda function using the sample event data provided in the console.

To invoke a function

i. In the upper right corner, choose Test.

ii. On the Configure test event page, choose Create new test event and, for Event template, leave the default Hello World option. Enter an Event name and note the following sample event template:

{
  "key1": "value1",
  "key2": "value2",
  "key3": "value3"
}

You can change the keys and values in the sample JSON, but don’t change the event structure. If you do change any keys or values, you must update the sample code accordingly.

iii. Choose Create and then choose Test. Each user can create up to 10 test events per function. Those test events are not available to other users.

iv. AWS Lambda executes your function on your behalf. The handler in your Lambda function receives and then processes the sample event.

v. Upon successful execution, view results in the console.
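
Beyond the console, you can also invoke the function programmatically. Here is a minimal sketch using boto3, reusing the function name created above and the sample event structure (the payload keys are just the template’s placeholders):

import json
import boto3

lambda_client = boto3.client("lambda")

# Synchronously invoke the function with a sample event payload
response = lambda_client.invoke(
    FunctionName="backup_incremental",
    InvocationType="RequestResponse",  # wait for the function to finish
    Payload=json.dumps({"key1": "value1", "key2": "value2", "key3": "value3"}),
)
print(response["Payload"].read().decode())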

Libraries and their usage

i. Boto3: It is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.

As we need to connect with the S3 service of AWS to store the files, we will need the Boto3 library.

ii. StringIO: Read and write strings as files, i.e. an in-memory text buffer.

We need a new file for each backup of the data modified that day, identified by the last-modified timestamp column in our table. This helps us keep separate files for each day’s data, so if there is an error in entering the records we can go back to the backup files and recheck.

iii. PyMySQL: PyMySQL is an interface for connecting to a MySQL database server from Python.

iv. Datetime: The datetime module supplies classes for manipulating dates and times in both simple and complex ways.

To attach the timestamp to each of the files generated for backup.

v. Pandas: To perform data manipulation and analysis.

To read the SQL data and convert it into a tabular format so that we can save it as a CSV.

Here we have extracted data from the RDS table and stored it in a pandas DataFrame using the read_sql function.

Code

import sys
import logging
import datetime
from io import StringIO

import boto3
import pymysql
import pandas as pd

# S3 bucket name where the CSV files will be saved
DESTINATION = 'UAT'

def incremental_backup(event, context):
    # Connect to the RDS MySQL instance (replace the placeholders with your values)
    conn = pymysql.connect(host='rds_end_point', port=rds_port_number,
                           user='rds_user_name', password='rds_password',
                           database='rds_db_name')

    # Fetch only the rows modified today; read_sql executes the query for us
    query = 'select * from db_name.table_name where date(last_modified_date)=date(NOW())'
    df1 = pd.read_sql(query, conn)

    # Timestamp the backup file so each run produces a separate CSV
    timestr = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    _write_dataframe_to_csv_on_s3(df1, 'Project_name/table_name_inc_backup' + timestr + '.csv')
    return "Hello"

def _write_dataframe_to_csv_on_s3(dataframe, filename):
    """ Write a dataframe to a CSV on S3 """
    print("Writing {} records to {}".format(len(dataframe), filename))
    # Create buffer
    csv_buffer = StringIO()
    # Write dataframe to buffer
    dataframe.to_csv(csv_buffer, sep=",", index=False)
    # Create S3 object
    s3_resource = boto3.resource("s3")
    # Write buffer to S3 object
    s3_resource.Object(DESTINATION, filename).put(Body=csv_buffer.getvalue())

We have used a zip file to upload our code, but we could also use the Layers functionality of AWS Lambda, wherein you create a layer of all the libraries you need and save it as a reusable entity.

A Lambda Layer consists of a file structure in which you store your libraries; you load it to AWS Lambda independently of any function and use it in your code whenever needed. Once you create a Lambda Layer, it can be reused by other Lambda functions too.
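
As a rough sketch with boto3 (the layer name and zip file are placeholders; for Python, the libraries must sit under a python/ directory inside the zip):

import boto3

lambda_client = boto3.client("lambda")

# Publish the zipped libraries as a reusable layer
with open("layer.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="backup-dependencies",  # placeholder name
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.8"],
    )

# Attach the new layer version to the function
lambda_client.update_function_configuration(
    FunctionName="backup_incremental",
    Layers=[layer["LayerVersionArn"]],
)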

After creating and testing the function you can schedule or trigger the Lambda function.

Here, we will schedule the function to run every day at 12 AM (midnight UTC) using a cron expression.

The steps for creating such a schedule are described in the AWS documentation on scheduled events.

Cron expression: cron(0 0 * * ? *)

Pattern: cron(minutes hours day-of-month month day-of-week year)

You can also use a rate expression, for example rate(1 day), to schedule the function.
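
For completeness, here is a hedged sketch of wiring up the schedule programmatically with boto3; the rule name, statement ID, and function ARN are placeholders, and note that EventBridge cron expressions run in UTC:

import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:backup_incremental"  # placeholder

# Create (or update) a rule that fires every day at midnight UTC
rule = events.put_rule(
    Name="daily-backup-trigger",
    ScheduleExpression="cron(0 0 * * ? *)",
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName="backup_incremental",
    StatementId="daily-backup-trigger-permission",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# Point the rule at the Lambda function
events.put_targets(
    Rule="daily-backup-trigger",
    Targets=[{"Id": "backup-target", "Arn": FUNCTION_ARN}],
)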

Thanks for reading! Please do share the article, if you liked it. Any comments or suggestions are welcome!

