*Images from Unsplash*

Disclaimer: This article contains my learning notes from the courses I took on Kaggle.

In this article, we will explore data visualization using `seaborn`, a Python package for visualizing data with a variety of plot types. The package is powerful yet easy to use; check out the images below to see the plot types that `seaborn` can generate. You can also scroll to the bottom for a summary table:

*Plots That seaborn Can Create*

Let’s explore the Python code for creating different plot types with `seaborn`.

### 1. Lineplot

In the code below, we call `sns.lineplot()`, which tells Python that we want to produce a line chart from the specified dataset. To plot a different dataset, we only need to change the `data` parameter.

We can also set the size of the plot by calling `plt.figure(figsize = (w,h))` before plotting. Adjusting the width `w` and height `h` (in inches) resizes the plot to the desired dimensions.

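```
import matplotlib.pyplot as plt
import seaborn as sns

# fifa_data: a pandas DataFrame loaded earlier; each numeric column becomes one line
plt.figure(figsize=(16, 6))  # width and height in inches
sns.lineplot(data=fifa_data)  # plot the line plot
```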

*Line Plot*

Now, let’s say we are using a dataset of the number of streams per day for songs such as:

- Shape of You
- Despacito
- Something Just Like This

We would like to compare the streams of “Shape of You” and “Despacito”, but we do not want the plot to include “Something Just Like This”. Here’s how we can do it:

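```
import matplotlib.pyplot as plt
import seaborn as sns

plt.title("Comparing Two Songs Streams")
# plot each song's column separately; label sets its legend entry
sns.lineplot(data=spotify['Shape of You'], label='Shape of You')
sns.lineplot(data=spotify['Despacito'], label='Despacito')
plt.xlabel("Date")
```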

*Comparing Two Songs Streams*

### 2. Bar Charts & Heatmaps

For a bar chart, we can use `sns.barplot()`. Let’s say we want to visualize the average arrival delay of an airline for each month from January to December:

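```
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.read_csv(file_name, index_col='Month')  # 'Month' becomes the row index
sns.barplot(x=data.index, y=data['delay'])
plt.title("Average Arrival Delay By Month")
plt.ylabel('Arrival Delay (in minutes)')
```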

*Bar Plot*
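To see `index_col` at work without the course file, the sketch below reads a small made-up CSV from a string with `io.StringIO` in place of `file_name`. The `delay` values are invented; the checks confirm that `Month` becomes the index and that one bar is drawn per month.

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up CSV text standing in for file_name; the delay values are hypothetical
monthly_delay = [6, 2, 8, 5, 3, 9, 12, 10, 1, 4, 7, 11]  # minutes
csv_text = "Month,delay\n" + "\n".join(
    f"{month},{delay}" for month, delay in zip(range(1, 13), monthly_delay)
)
data = pd.read_csv(io.StringIO(csv_text), index_col="Month")  # 'Month' becomes the index

plt.figure(figsize=(10, 4))
ax = sns.barplot(x=data.index, y=data["delay"])  # one bar per month
ax.set_title("Average Arrival Delay By Month")
ax.set_ylabel("Arrival Delay (in minutes)")
```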

Let’s also look at the heatmap, which illustrates patterns in a dataset by color-coding each cell according to its value:

```
import matplotlib.pyplot as plt
import seaborn as sns

sns.heatmap(data=data, annot=True)  # annot=True writes each cell's value
plt.title('Average Airline Delay for Each Airline')
plt.xlabel('Airline')
```

*Heatmap*

The `annot` parameter controls whether each cell’s value is written on the chart. Setting it to `False`, or leaving it out, produces a heatmap with no numbers in the cells.
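A quick way to see the effect of `annot` is to draw the same made-up table twice, side by side. The airline codes and delay values below are hypothetical; `fmt` is a standard `sns.heatmap()` parameter that sets the number format of the annotations.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical average delays (minutes): rows are months, columns are airlines
delays = pd.DataFrame(
    {"AA": [1.5, -2.0, 4.25], "DL": [0.5, 3.0, -1.0]},
    index=pd.Index([1, 2, 3], name="Month"),
)

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
sns.heatmap(data=delays, annot=True, fmt=".1f", ax=left)  # numbers with one decimal
sns.heatmap(data=delays, annot=False, ax=right)  # same colors, no numbers
left.set_xlabel("Airline")
right.set_xlabel("Airline")

annotations = [t.get_text() for t in left.texts]  # one text label per cell
```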

### 3. Scatter Plot

A scatter plot shows the relationship between two continuous variables, which makes it useful for spotting trends and correlations. Here’s how we can plot it using Python:

```
import seaborn as sns
sns.scatterplot(x = insurance_data['bmi'], y = insurance_data['charges'])
```

*Scatter Plot*

From the plot above, it seems that BMI is positively correlated with insurance charges. Let’s double-check by adding a regression line to the plot:

```
import seaborn as sns
sns.regplot(x = insurance_data['bmi'], y = insurance_data['charges'])
```

*Scatter Plot With Regression Line*
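With the default settings, `sns.regplot()` fits an ordinary least-squares straight line. The sketch below, using synthetic BMI and charge values (not the course’s dataset), checks that the slope of the plotted line matches `np.polyfit` and computes the Pearson correlation as a numeric double-check. Passing `ci=None` only turns off the bootstrapped confidence band to keep things fast.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical insurance-like data with a built-in positive trend
rng = np.random.default_rng(2)
bmi = rng.uniform(18, 45, size=200)
charges = 400 * bmi + rng.normal(0, 3000, size=200)
insurance_data = pd.DataFrame({"bmi": bmi, "charges": charges})

# ci=None skips the bootstrapped confidence band
ax = sns.regplot(x=insurance_data["bmi"], y=insurance_data["charges"], ci=None)

# The fitted line is a Line2D; recover its slope from its two end points
line = ax.get_lines()[0]
x_line, y_line = line.get_xdata(), line.get_ydata()
slope_from_plot = (y_line[-1] - y_line[0]) / (x_line[-1] - x_line[0])

# Ordinary least-squares slope computed directly with NumPy
slope_ols, intercept_ols = np.polyfit(insurance_data["bmi"], insurance_data["charges"], deg=1)
correlation = insurance_data["bmi"].corr(insurance_data["charges"])  # Pearson r
print(round(slope_ols, 1), round(correlation, 3))
```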

We can also color-code the points by adding the `hue` parameter. Let’s say we want to separate people who smoke from those who do not:

```
import seaborn as sns
sns.scatterplot(x = insurance_data['bmi'], y = insurance_data['charges'], hue = insurance_data['smoker'])
```

*Color-Coded Scatter Plot*
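`seaborn` also accepts a whole DataFrame through `data` with the column names passed as strings, the same style that `sns.lmplot()` uses in the next example. Here is that form on a small synthetic dataset (the values and the flat smoker surcharge are made up), with a check that `hue` produces exactly two point colors.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical insurance-like data: smokers get an extra flat charge
rng = np.random.default_rng(3)
smoker = np.array(["yes", "no"] * 50)
bmi = rng.uniform(18, 45, size=100)
charges = 250 * bmi + np.where(smoker == "yes", 20_000, 0) + rng.normal(0, 2000, size=100)
insurance_data = pd.DataFrame({"bmi": bmi, "charges": charges, "smoker": smoker})

# Same plot as above, passing the DataFrame once and naming columns as strings
ax = sns.scatterplot(data=insurance_data, x="bmi", y="charges", hue="smoker")

# All points live in one collection; hue gives each group its own face color
point_colors = ax.collections[0].get_facecolors()
n_colors = len(np.unique(point_colors, axis=0))
```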

It seems that smokers tend to pay more than non-smokers. Let’s check again by adding one regression line for each group:

```
import seaborn as sns
sns.lmplot(x = 'bmi', y = 'charges', hue = 'smoker', data = insurance_data)
```

*Multiple Regression Lines Plot*
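The switch to `sns.lmplot()` is needed because `sns.regplot()` has no `hue` parameter. `lmplot` is a figure-level function: it creates its own figure, returns a `FacetGrid`, and fits one regression line per hue level. The sketch uses noise-free synthetic data with hypothetical slopes of 250 and 900, so the fitted lines can be checked exactly.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: charges grow faster with BMI for smokers
bmi = np.tile(np.linspace(18, 45, 50), 2)
smoker = np.repeat(["yes", "no"], 50)
slope = np.where(smoker == "yes", 900, 250)
insurance_data = pd.DataFrame({"bmi": bmi, "charges": slope * bmi, "smoker": smoker})

# lmplot builds its own figure and draws a scatter plus a fitted line per hue level
g = sns.lmplot(data=insurance_data, x="bmi", y="charges", hue="smoker", ci=None)

fitted_lines = g.ax.get_lines()
slopes = sorted(
    (line.get_ydata()[-1] - line.get_ydata()[0]) / (line.get_xdata()[-1] - line.get_xdata()[0])
    for line in fitted_lines
)
```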

Another interesting plot is the categorical scatter plot, which can be produced with the `sns.swarmplot()` command:

```
import seaborn as sns
sns.swarmplot(x = insurance_data['smoker'], y = insurance_data['charges'])
```

*Swarm Plot*

Here are some insights from the plot:

- On average, non-smokers are charged less than smokers.
- The people who pay the most are smokers, while those who pay the least are non-smokers.
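The two insights can also be checked numerically with pandas. The six rows below are made up for illustration; on the real dataset you would run the same `groupby` on `insurance_data`.

```python
import pandas as pd

# Hypothetical charges for a handful of people (not the course dataset)
insurance_data = pd.DataFrame(
    {
        "smoker": ["yes", "no", "yes", "no", "no", "yes"],
        "charges": [34000.0, 4200.0, 39500.0, 2100.0, 8800.0, 21000.0],
    }
)

# Average, lowest and highest charge per group, mirroring what the swarm plot shows
summary = insurance_data.groupby("smoker")["charges"].agg(["mean", "min", "max"])
print(summary)

# Which group contains the single highest and the single lowest charge?
top_payer = insurance_data.loc[insurance_data["charges"].idxmax(), "smoker"]
lowest_payer = insurance_data.loc[insurance_data["charges"].idxmin(), "smoker"]
```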

### 4. Distributions

In this section, we will explore histograms and density plots. A histogram shows the frequency of numerical data using rectangles (bars), one per bin, while a density plot shows the distribution of a numeric variable as a smooth curve.

Here’s an example of a histogram:

```
import seaborn as sns
sns.histplot(iris_data['Petal Length'])
```

*Histogram Plot*

For the density plot, we will use the kernel density estimate (KDE) plot, which looks like a smoothed histogram. The `fill` parameter turns the shaded region under the curve on and off; older `seaborn` versions called it `shade`, which has been deprecated since version 0.11.

```
import seaborn as sns
sns.kdeplot(data = iris_data['Petal Length'], fill = True)
```

*KDE Plot*
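To make “smoothed histogram” concrete, here is a minimal Gaussian KDE written with NumPy: every observation contributes a small normal curve, and the curves are averaged. The bandwidth uses Scott’s rule, which is also the default `bw_method` of `sns.kdeplot()`, but this is an illustration and not seaborn’s own implementation; the function name `gaussian_kde_1d` and the sample data are hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Minimal Gaussian KDE: the average of one normal 'bump' per sample."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth: h = std * n**(-1/5)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Distance from every grid point to every sample, scaled by the bandwidth
    z = (grid[:, None] - samples[None, :]) / h
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.sum(axis=1) / (n * h)


rng = np.random.default_rng(5)
samples = rng.normal(4.0, 1.5, size=100)  # hypothetical petal lengths
grid = np.linspace(-6.0, 14.0, 2001)
density = gaussian_kde_1d(samples, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann-sum approximation of the integral
```

Because each bump integrates to one and they are averaged, the curve encloses an area of about one, which is what makes it a density rather than a count.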

Here’s the code to produce a two-dimensional KDE plot:

```
import seaborn as sns
sns.jointplot(x = iris_data['Petal Length'], y = iris_data['Sepal Width'], kind = 'kde')
```

*2D KDE Plot*

To color-code the histogram or the KDE plot, simply add the `hue` parameter as below:

```
import seaborn as sns
sns.histplot(data = iris_data, x = 'Petal Length', hue = 'Species')
# run the next call in its own cell (or new figure) to get a separate plot
sns.kdeplot(data = iris_data, x = 'Petal Length', hue = 'Species', fill = True)
```

*Color-Coded Histogram Plot*

*Color-Coded KDE Plot*

### 5. Plot Style

The `seaborn` module comes with several built-in themes. You can set the style of your plot before you start plotting:

```
import seaborn as sns

sns.set_style('dark')
# your plot here
# try the other styles: 'darkgrid', 'whitegrid', 'white' or 'ticks'
```
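`sns.set_style()` changes the style for every later plot. For a one-off style, `sns.axes_style()` also works as a context manager; the sketch below checks matplotlib’s `axes.grid` setting before, inside and after the `with` block.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")  # global default for every later plot
grid_before = plt.rcParams["axes.grid"]

# axes_style() as a context manager applies a style only inside the block
with sns.axes_style("whitegrid"):
    grid_inside = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()  # these axes are created with grid lines on
    ax.plot([1, 2, 3], [2, 4, 3])

grid_after = plt.rcParams["axes.grid"]  # back to the 'white' setting
```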

### 6. Summary

Here’s a summary of all the plots you have learned:

| Type | Category | Code | Remarks |
|---|---|---|---|
| Line Chart | Trend | `sns.lineplot()` | Show trends over time; multiple lines can show trends for more than one group |
| Bar Chart | Relationship | `sns.barplot()` | Compare quantities across different groups |
| Heatmap | Relationship | `sns.heatmap()` | Find color-coded patterns in tables of numbers |
| Scatter Plot | Relationship | `sns.scatterplot()` | Show the relationship between two continuous variables |
| Regression Line | Relationship | `sns.regplot()` | See the linear relationship between two variables |
| Multi-Regression Line | Relationship | `sns.lmplot()` | See the linear relationship between two variables across groups |
| Categorical Scatter Plot | Relationship | `sns.swarmplot()` | Observe the relationship between a continuous variable and a categorical variable |
| Histogram | Distribution | `sns.histplot()` | Show the distribution of a single numerical variable |
| KDE Plot | Distribution | `sns.kdeplot()` | Show a smoothed distribution of one or more numerical variables |
| 2D KDE Plot | Distribution | `sns.jointplot(kind = 'kde')` | Display a 2D KDE plot, with a 1D KDE of each variable along the margins |