Flatten Pandas dataframe MultiIndex

Context

A few days ago, I had to extract a data from Oracle Database using SQL, and then PIVOT a long set of values in a column, to multiple columns. The sintax to do this, requires to write every single value you want to be as a column, not one but twice to avoid the default quotes.

See reference
https://www.oracle.com/technical-resources/articles/database/sql-11g-pivot.html

There are a few options like pivoting using PIVOT XML, or even more recent to build a JSON column using JSON_ARRAYAGG and JSON_OBJECT to get dynamic columns or any value as an attribute, but still not so straightforward.

First try:

Lets try with a python notebook (I use VSCode notebook).
First import pandas:

import pandas as pd

We will use a random free dataset:

# source file from https://www.kaggle.com/gregorut/videogamesales
df = pd.read_csv('./vgsales.csv')
df

Using the function pivot_table we will transponse the values of the column Genre to be a column for every value in the dataset.

pivot_df = df.pivot_table(index=['Year', 'Publisher'], columns=['Genre'], values=['Global_Sales'], aggfunc=np.sum, fill_value=0)
pivot_df

Now let's reset_index() to "try" flattening the indexes.

mi_pivot_df = pivot_df.reset_index()
mi_pivot_df

pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)

Second try

I searched the solution to flat the most, but no answer was full, complete enough, and free of errors. I collected all, and by try and error, got here, a working solution. I also works after using pivot function:

flat_index_pivot_df = pivot_df.copy()
flat_index_pivot_df.columns = flat_index_pivot_df.columns.droplevel(0)
flat_index_pivot_df.reset_index(inplace=True)
flat_index_pivot_df.columns.name = None
flat_index_pivot_df

Now you can get a clean Excel Sheet, free of MultiIndex.

flat_index_pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)

Do you know another solution?

Categories:

Python Sql

Tags:

Pandas Python Sql Datascience

NodeJS vs Python

App Development Practices To Follow For Your Business In 2021

Arctype connects to YugabyteDB

ETL: Extract Data with Node.js

ETL: Load Data to Destination with Node.js

ETL: Transform Data with Node.js

SQL query builders

SQL Server Recovery Models: A Quick Guide

Flatten Pandas dataframe MultiIndex

Categories:

Tags:

Related Posts