25
SQL Query into Pandas DataFrame - Part 2
Continuing the last part, we are going deep into intermediate SQL translated into Pandas DataFrame.
We will be using the same SQLite database from the previous post, but now we will use the invoice
table and the csv file of the table.
import pandas as pd
invoices_df = pd.read_csv("invoices.csv")
We will cover some aggregation function and groupby function in both sql and pandas.
SUM
:
SELECT SUM(Total)
FROM invoices
invoices_df['Total'].sum()
COUNT()
:
SELECT COUNT(BillingState)
FROM invoices
invoices_df['BillingState'].count()
AVG()
:
SELECT AVG(Total)
FROM invoices
invoices_df['Total'].mean()
MAX()
:
SELECT MAX(Total)
FROM invoices
invoices_df['Total'].max()
MIN()
:
SELECT MIN(Total)
FROM invoices
invoices_df['Total'].min()
SELECT
CustomerId,
SUM(Total) AS Total
FROM invoices
GROUP BY CustomerId
## grouping with all (number) columns aggregated
invoices_df.groupby(['CustomerId']).sum()
## the same as the sql result
invoices_df.groupby(['CustomerId']).sum().reset_index()[['CustomerId', 'Total']]
25