Feature Engineering

Images from Unsplash. Disclaimer: This article is my learning note from the courses I took on Kaggle. In this course, we will learn how to: determine which features are the most important with mutual information; invent new features in several real-world problem domains; encode high-cardinality categoricals with a target encoding; create segmentation features with k-means clustering; and decompose a dataset’s variation into features with principal component analysis. 1. Introduction: The reason we perform feature engineering is that we want to make our data better suited to the problem at hand....

August 20, 2023 · 20 min · Kean Teng Blog

Data Cleaning

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. It is a key part of data science, and it can be deeply frustrating. What should we do about missing values? Why are the dates not in the correct format? How do we clean up inconsistent data entry? These are some of the problems that we will learn to tackle in this course.

August 19, 2023 · 10 min · Kean Teng Blog

Pandas

In this course, we will explore pandas, a popular Python library for data analysis. With pandas, we can create data as well as work with and manipulate existing data.

August 18, 2023 · 11 min · Kean Teng Blog