SQL Query into Pandas DataFrame - Part 3

After playing some aggregation and grouping in the last part, now we will play harder with table joins.

The Playground Database

We will be using the same SQLite database, but now we are going to use some tables. So get all the required csv files here

Preparing the DataFrame

import pandas as pd

albums_df = pd.read_csv("albums.csv")
artists_df = pd.read_csv("artists.csv")

Join Queries into Pandas DataFrame

INNER JOIN:

SQL:

SELECT
    *
FROM albums
JOIN artists ON albums.ArtistId = artists.ArtistId

or

SELECT
    *
FROM albums
INNER JOIN artists ON albums.ArtistId = artists.ArtistId

Pandas:

# For the exact same column name on both table
albums_df.merge(artists_df, on='ArtistId')

# Defining the join column of each tables
albums_df.merge(artists_df, left_on='ArtistId', right_on='ArtistId')

# To make sure we use the INNER one
albums_df.merge(artists_df, left_on='ArtistId', right_on='ArtistId', how='inner')

LEFT JOIN

SQL:

SELECT
    *
FROM albums
LEFT JOIN artists ON albums.ArtistId = artists.ArtistId

Pandas:

albums_df.merge(artists_df, on='ArtistId', how='left')

RIGHT JOIN

SQL:

SELECT
    *
FROM albums
RIGHT JOIN artists ON albums.ArtistId = artists.ArtistId

Pandas:

albums_df.merge(artists_df, on='ArtistId', how='right')

24