Images from Unsplash
Disclaimer: This article is my learning note from the courses I took from Kaggle.
In this course, we will explore on data visualization using seaborn
, a Python package to visualize data with a variety of plot types. The package is powerful yet easy to use, check out the below images on the plot types that seaborn
is able to generate. You can also scroll to the bottom to see the table summary:
Plots That seaborn Can Create
Let’s explore the Python code to create different plot type with seaborn
1. Lineplot
In the code below, we use sns.lineplot()
where it tells Python that we want to produce a line chart with the specified datasets. We only need to change the data
parameters if we would like to plot for a different dataset.
Moreover, we can also set the size of the plot by calling plt.figure(figsize = (w,h))
. By adjusting the height and width of the plot we can set our plot to the desired size.
# setting the plot size
plt.figure(figsize = (16,6)) # width and height
# plot line plot
sns.lineplot(data = fifa_data)
Line Plot
Now, let’s consider that we are using a dataset about the number of steams per day for songs such as:
Shape of You
Despacito
Something Just Like This
We would like to compare the streams between “Shape of You” and “Despacito”. But we do not want the plot to include the song “Something Just Like This”. Here’s how we can do it:
plt.title("Comparing Two Songs Streams")
sns.lineplot(data = spotify['Shape of You'], label = 'Shape of You')
sns.lineplot(data = spotify['Despacito'], label = 'Shape of You')
plt.xlabel("Date")
Comparing Two Songs Streams
2. Bar Charts & Heatmaps
For bar chart, we can plot it with sns.barplot()
. Let’s say we want to visualize the average arrival delay for an airline service by month starting from January to December:
data = pd.read_csv(file_name, index_cols = 'Month')
sns.barplot(x = data.index, y =data['delay'])
plt.title("Average Arrival Delay By Month")
plt.ylabel('Arrival Delay (in minute)')
Bar Plot
Let’s also look at heatmap where it can be used to illustrate patterns in our dataset by color-coding each cell to its corresponding value:
sns.heatmap(data = data, annot = True)
plt.title('Average Airline Delay for Each Airline')
plt.xlabel('Airline')
Heatmap
The annot
parameter ensures all value appears on the chart. By setting to False
, we would have no number for each cell.
3. Scatter Plot
Scatter plot is used to show the relationship between two variables. It is a useful plot to understand the relationship between two variables. Here’s how we can plot it using Python:
sns.scatterplot(x = insurance_data['bmi'], y = insurance_data['charge'])
Scatter Plot
From the above plot, it seems that BMI are positive correlated with the insurance costs. Now, let’s do a double-checking by adding a regression line to our plot:
sns.regplot(x = insurance_data['bmi'], y = insurance_data['charge'])
Scatter Plot With Regression Line
In fact we can also perform some color-coding to our plot by adding the hue
parameter. Let’s say we want to color-code the plot by separating people that smoke and do not smoke:
sns.scatterplot(x = insurance_data['bmi'], y = insurance_data['charge'], hue = insurance_data['smoker'])
Color-Coded Scatter Plot
It seems that smoker will tend to pay more than non-smoker. Let’s check it again by adding two regression lines:
sns.lmplot(x = 'bmi', y = 'charge', hue = 'smoker', data = insurance_data)
Multiple Regression Lines Plot
Another interesting plot to look into is known as the categorical scatter plot. The plot can be produced with the sns.swarmplot()
command:
sns.swarmplot(x = insurance_data['smoker'], y = insurance_data['charges'])
Swarm Plot
Here’re some insights from the plot:
- Non-smokers are charge less than smoker on average
- People that pay the most are smokers, while those that pay the least are non-smoker.
4. Distributions
In this section, we will explore about histograms as well as the density plots. Histogram is a graph that shows the frequency of numerical data using rectangles while for a density plot, it represents the distribution of a numeric variable.
Here’s an example for histogram:
sns.histplot(irisi_data['Petal Length'])
Histogram Plot
For density plot, we will use the kernel density estimate plot, it looks like a smoothed histogram. By changing the shade
parameter, we can turn the display of the shaded region on and off.
sns.kdeplot(data = iris_data['Petal Length'], shade = True)
KDE Plot
Here’s the code to produce two-dimensional kernel density plot:
sns.jointplot(x = irid_data['Petal Length'], y = ['Sepal Width'], kind = 'kde')
2D KDE Plot
Furthermore, to color-code the histogram of the kernel density estimate plot, simply add a hue
parameter as below:
sns.histplot(data = iris_data, x = 'Petal Length', hue = 'Species')
sns.kdeplot(data = iris_data, x = 'Petal Length', hue 'Species', shade = True)
Color-Coded Histogram Plot
Color-Coded KDE Plot
5. Plot Style
There are several themes available in the seaborn
module, you can set the style or theme of your plot before you start plotting:
sns.set_style('dark')
# your plot here
# try other themes with: drakgrid, whitegrid, dark or white
6. Summary
Here’s a summary of all the plot you have learned:
Type | Category | Code | Remarks |
---|---|---|---|
Line Chart | Trend | sns.lineplot() |
Show trends over time, multiple lines can be used to show trend in more than a group |
Bar Chart | Relationship | sns.barplot() |
Compare quantities with respect to different groups |
Heatmap | Relationship | sns.heatmap() |
Find color-coded patterns in tables of numbers |
Scatter Plot | Relationship | sns.scatterplot() |
Show relationship between two continuous variables |
Regression Line | Relationship | sns.regplot() |
See linear relationship between two variables |
Multi-Regression Line | Relationship | sns.lmplot() |
See linear relationship between two variables involving group |
Categorical Scatter Plot | Relationship | sns.swarmplot() |
Observe the relationship between continuous variable and categorical variable |
Histogram | Distribution | sns.histplot() |
Show distribution of single numerical variable |
KDE Plot | Distribution | sns.kdeplot() |
Show a smooth distribution of a single or more numerical variable |
2D KDE Plot | Distribution | sns.jointplot(kind = 'kde') |
Display a 2D KDE plot with each KDE correspond to each variable |