Let Lambda do Feature Engineering

A lambda function can take any number of arguments but can contain only one expression. It is one of the best-known one-liner constructs in Python.
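As a quick refresher, here is a minimal sketch of that rule. The names `add` and `square` are made up for illustration.

```python
# A lambda accepts any number of arguments, but its body is one expression.
add = lambda x, y: x + y      # two arguments
square = lambda x: x ** 2     # one argument

print(add(2, 3))
print(square(4))
```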
By using lambda functions with pandas, we can perform many feature engineering operations. In this article, I am going to walk through some of the use cases.
I am using the students’ performance dataset from Kaggle. For reproducibility, you can download the dataset from the link here.
Let’s import the libraries and load the dataset:
import pandas as pd
import numpy as np

df = pd.read_csv('students_performance.csv')
df
Output:

1. Discretization
Here I discretize the writing score values into two categories (Outstanding and Satisfactory) based on the score. To do that, I apply a lambda with a ternary conditional expression.
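A minimal sketch of this step might look like the following. The sample rows, the `writing grade` column name, and the cutoff of 80 are assumptions for illustration; the article does not state the threshold it used.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has many more rows and columns.
df = pd.DataFrame({"writing score": [74, 88, 93, 44, 80]})

# Assumed cutoff (not from the article): 80 and above is "Outstanding".
df["writing grade"] = df["writing score"].apply(
    lambda score: "Outstanding" if score >= 80 else "Satisfactory"
)
print(df)
```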

2. Data Aggregation
A convenient way to aggregate existing columns into new features is a lambda function applied row by row.
Here we can calculate the total score from multiple features such as math score, reading score, and writing score.
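One way to sketch the total score calculation is shown here, on a small hypothetical sample. The `total score` column name is an assumption.

```python
import pandas as pd

# Hypothetical sample rows using the three score columns named above.
df = pd.DataFrame({
    "math score": [72, 69, 90],
    "reading score": [72, 90, 95],
    "writing score": [74, 88, 93],
})

# axis=1 passes each row to the lambda as a Series.
df["total score"] = df.apply(
    lambda row: row["math score"] + row["reading score"] + row["writing score"],
    axis=1,
)
print(df)
```

On large frames, the vectorized form `df[["math score", "reading score", "writing score"]].sum(axis=1)` gives the same result and is usually faster.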

3. Filter Rows
Sometimes we need to apply a lambda expression to filter the dataset. Suppose we need to filter the rows based on a percentage value (e.g. percentage > 80%). The dataset has no percentage column, so we cannot filter on it directly. A lambda lets us compute the percentage and test it in a single step.
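A rough sketch of such a filter follows. It assumes each of the three tests is scored out of 100, so the maximum total is 300. The sample rows and the names `score_cols` and `high_scorers` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "math score": [72, 69, 90, 47],
    "reading score": [72, 90, 95, 57],
    "writing score": [74, 88, 93, 44],
})
score_cols = ["math score", "reading score", "writing score"]

# Assumption: each test is out of 100, so percentage = total / 300 * 100.
high_scorers = df[
    df.apply(lambda row: row[score_cols].sum() / 300 * 100 > 80, axis=1)
]
print(high_scorers)
```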

4. Binary Encoding
Let’s combine a lambda with map to encode the lunch feature into binary values (standard = 1, free/reduced = 0).
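This encoding can be sketched as below, on a few hypothetical rows. The `lunch encoded` column name is an assumption.

```python
import pandas as pd

# Hypothetical sample rows for the lunch feature.
df = pd.DataFrame({"lunch": ["standard", "free/reduced", "standard", "free/reduced"]})

# Series.map applies the lambda to every value: standard -> 1, anything else -> 0.
df["lunch encoded"] = df["lunch"].map(lambda value: 1 if value == "standard" else 0)
print(df)
```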

5. Categorical Encoding
Let’s convert the categorical feature ‘race/ethnicity’ into numeric values through the lambda function.
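Here is a minimal sketch of one way to do this. The group labels, the integer codes in `group_codes`, and the new column name are assumptions for illustration; the article does not say which numbers it assigned.

```python
import pandas as pd

# Hypothetical sample rows; the labels "group A" to "group E" are assumed.
df = pd.DataFrame(
    {"race/ethnicity": ["group B", "group C", "group A", "group E", "group D"]}
)

# Assumed coding (not from the article): group A -> 1 ... group E -> 5.
group_codes = {"group A": 1, "group B": 2, "group C": 3, "group D": 4, "group E": 5}

df["race/ethnicity code"] = df["race/ethnicity"].apply(lambda group: group_codes[group])
print(df)
```

Keep in mind that integer codes imply an ordering between groups, which may or may not suit the model you train afterwards.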

6. Normalization
Normalization rescales real-valued numeric attributes onto a common scale. Min-Max maps values into the 0 to 1 range, while the Z-score centers them on zero with unit standard deviation.
- Z-score
- Min-Max
Z-score
The z-score measures how many standard deviations a data point is away from the mean.
Here I compute the z-score for math_score using a lambda.
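A small sketch of the z-score step is shown here. The column is named `math score` in this sample, which is an assumption about how the column is spelled in your copy of the dataset, and the `math zscore` column name is made up.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({"math score": [40, 50, 60, 70, 80]})

# Compute the statistics once, outside the lambda, so they are not recomputed per row.
mean = df["math score"].mean()
std = df["math score"].std()  # pandas uses the sample standard deviation (ddof=1)

df["math zscore"] = df["math score"].apply(lambda x: (x - mean) / std)
print(df)
```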

Min-Max Normalization
In Min-Max Normalization, the minimum value of the feature is transformed into 0, the maximum value is transformed into 1, and every other value is transformed into a decimal between 0 and 1.
Here I apply this transformation to the reading_score feature using a lambda.
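The transformation can be sketched as follows, with the column spelled `reading score` in this hypothetical sample. The names `lo`, `hi`, and `reading minmax` are assumptions.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({"reading score": [50, 60, 75, 100]})

lo = df["reading score"].min()
hi = df["reading score"].max()

# (x - min) / (max - min) maps the minimum to 0 and the maximum to 1.
df["reading minmax"] = df["reading score"].apply(lambda x: (x - lo) / (hi - lo))
print(df)
```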

7. IQR
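Using the interquartile rule, we can find the outliers.
Once we find the values of the first quartile (Q1) and third quartile (Q3), the interquartile range is simply IQR = Q3 - Q1. Under the usual convention, values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as outliers.
Here I use a lambda to find outliers in the writing scores.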
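A minimal sketch of the outlier check follows. It assumes the conventional 1.5 × IQR fences; the sample values and the `is outlier` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with one very low and one very high score.
df = pd.DataFrame({"writing score": [10, 60, 62, 65, 68, 70, 72, 75, 99]})

q1 = df["writing score"].quantile(0.25)
q3 = df["writing score"].quantile(0.75)
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

df["is outlier"] = df["writing score"].apply(lambda x: x < lower or x > upper)
print(df[df["is outlier"]])
```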

So this is how lambda, combined with pandas, can reduce most feature engineering operations to one line. However, be cautious when using a lambda for a sequence of operations, since it can make the code harder to read.
For more data science thoughts, follow me on LinkedIn.