Power BI's Power Query (M) for Python Programmers

Overview

Since April 2019, Python has been a part of Power BI. This is great in theory. Python is a great language known for its prominence in the data science community. It has a wide array of tools to perform data wrangling, calculations, visualizations, and a whole lot more!

This is also great in practice. Analysts who have pre-existing Python code would have a relatively easy time onboarding to Power BI, not to mention the ease of development time that Python brings. However, putting together two different stack often has a lot of barriers. It could be technical, organizational, cultural, infrastructure-related, or whatever else you'd like to categorize. Point being, there is always going to be an inevitable scenario for users to pick up a software's native tools. (Yes Microsoft, a lot of us still code in VBA).

In this post, I will attempt to put together a "cheat sheet" of Power Query (M) features that Python programmers (e.g., me) would take interest in. (This serves as a disclaimer that I am also new at this).

Power Query (M) and DAX

PowerBI comes with two built-in languages: Power Query and DAX (Data Analysis Expressions).

DAX is available in Home > Calculations ribbon. It is preferrable to do it on Data view (at the left side of the window, along with Report View and Model View) to unlock the New Column option. This is a sample DAX:

Revenue = [Sales] * [PricePerUnit]

This DAX expression creates a new column called Revenue which is a product of the Sales column and the ProductPerUnit column. This operation should be fairly familiar to Excel users.

On the other hand, Power Query (M) is available in the Power Query Editor which you can launch at Home > Queries > Transform Data. In the Power Query Editor, you may also launch the Advance Editor via Home > Query > Advance Editor.

M is the formula language behind Power Query (i.e., let ... in). The actions you do in the Power Query GUI is translated to M behind the scenes. Fortunately, both can be used pretty much interchangeably with regards to search engine.

This is a sample M code:

let
    src = Table.FromList({
        "Foo", "Spam", "Bar", "Ham", "Baz"
    }),
    output = Table.SelectRows(
        src,
        each ([Column1] = "Spam") 
    )
in 
    output

The snippet shows two steps: step src to initiate a table and step output to filter a desired value. This step-by-step data transformation is something most programmers would be familiar with.

There's definitely going to be an overlap between what Power Query and DAX are capable of. Of course, one feature is going to be easier in the other and vice versa. But ulitimately, it will all boil down to preference.

These two videos are both wonderul resources for introduction to M. Both resource persons are from the Power BI Customer Advisory team. I urge every one to at least watch the first 5 or 10 minutes of either videos just so we can get some of the basics out of the way.

Introduction to M in Power BI - Chris Webb

Power BI Dev Camp March Intro to M Programming

Anyway, without further ado, let's jump into the content!

Cheat Sheet: Common Types and Operations

Python 3 M Remarks
# Comments //single-line, and /* multi-line*/
42 42
-42 -42
+42 +42
--42 --42
not 42 # False no truthy values for M
not 0 # True no falsy values for M
42 == 42 42 = 42
5 + 2 5 + 2
5 - 2 5 - 2
5 * 2 5 * 2
4 / 3 4 / 3 both are 1.3333333333333333
2 // 3 no native floor division in M, but DAX has FLOOR
'Hello World' "Hello World" M string literal is strictly quotation mark "
'Hello ' + 'World!' "Hello " & "World!"
'HeLLo wOrLd'.upper() Text.Upper("HeLLo wOrLd") result is HELLO WORLD
'HeLLo wOrLd'.lower() Text.Lower("HeLLo wOrLd") result is hello world
'Wo' in 'Hello World' Text.Contains("Hello World", "Wo")
'len('Hello World') Text.Length("Hello World")
['Foo', 'Bar'] { "Foo", "Bar" }
['Foo', 'Bar'] + ['Spam', 'Ham'] { "Foo", "Bar" } & { "Spam", "Ham" }
['Foo', 'Bar', 'Baz'].pop() List.RemoveLastN({ "Foo", "Bar", "Baz" }) Python's list.pop modifies the list but returns the element removed. M's List.RemoveLastN returns the resulting list.
range(10) { 0 .. 9} Python's range is a generator, not a list.
range(1,11) { 1 .. 10}

This is not a complete list but hopefully, it should be enough for intermediate Python users to pick up the nuances between Python's data model vs M's.

Cheat Sheet: is

Python's is operator evaluates if two objects refer to the same object.

Example Python code:

a = {}
b = {}
c = a
a == b # True, both are empty dictionaries
a is b # False, both are separately initialized dictionary.
c is a # True, c refers to the same dictionary as a

There are nuances to it but this should illustrate the the point.

M's is operator is completely different. In M, is is used to check for type conformance.

Example M code:

42 is number //TRUE

Cheat Sheet: Records and Tables

Python does not have a native implementation of a Record or a Table.

Record is most likely inspired by the database term. It goes in the form of [Name="Seinfeld", Age=33].

At first glance, Record looks like a Python dict (in that it has key-value pairs) but keys in a Python dict need to be something that's already defined. It also looks similar to namedtuple but a schema for a namedtuple has to be specified first (note though that defaults were added since Python 3.7); a Record does not have such restriction.

A Record implemented in Python would definitely look like this:

class Record:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


jerry = Record(Name='Seinfeld', Age='33')
print(jerry.Name) # Seinfeld
print(jerry.Age) # 33

Thanks to libraries like pandas, Python users does have options for dealing with 2-Dimensional data.

Table is definitely the most prominent data type in Power Query.

We're not gonna go through all of the available methods but one good feature to note is that is has quite a number of class methods which can be used to instantiate a table. Here's an example.

let
    output = Table.FromRecords({
        [Name="Jeff Winger", Age=27],
        [Name="Britta Perry", Age=28]
    })
in
    output

Cheat Sheet: Functions

Functions in Python can created (generally) with the def keyword, or anonymously with lambda.

For example:

def add(addend1, addend2):
    return addend1 + addend2

print(add(4,2)) # 6

add_lambda = lambda addend1, addend2: addend1 + addend2
print(add_lambda(4,2)) # 6

Defining functions in M is similar to JS (welp, there's an arrowhead). Functions in M can be assigned in a separate query, or in a variable.

(addend1, addend2) => addend1 + addend2

As far as I know, M does not have list comprehensions or map but native types should have sufficient methods for transformation.

Example: return a list containing the squares of numbers from 0 to 9.

Python:

numbers = list(range(10))

def square(n):
    return n ** 2

# via comprehension
squares_comprehension = [ square(i) for i in numbers ]

# via map
squares_map = list(map(square, numbers))

In M, it is translated as:

let 
    numbers = { 0 .. 9 },
    square = (n) => Number.Power(n,2),
    squares_list = List.Transform(numbers, square)
in 
    squares_list

Cheat Sheet: underscore (_) and each

The both underscore (_) and each are syntactic sugar for unary functions (functions that only has one arguments. Take a snippet the code example above.

square = (n) => Number.Power(n,2)

square is a function which takes a parameter n and returns the square if n. It is possible to substitute n with an underscore. This is usually common when functions are passed anonymously.

For example, the code above can be re-written as:

List.Transform(
    { 0 .. 9},
    (_) => Number.Power(_,2)
)

It is also further possible to simplify using the each keyword

List.Transform(
    { 0 .. 9},
    each Number.Power(_,2)
)

Do take note that underscore can represent any data type.

let
    src = Table.FromRecords({
        [Name="Foo"],
        [Name="Bar"],
        [Name="Baz"],
        [Name="Spam"],
        [Name="Ham"]
    }),
    out = Table.SelectRows(
        src,
        each Text.StartsWith([Name], "B") // (_) => Text.StartsWith(_[Name], "B")
    )
in
    out

Cheat Sheet: typing in M

M can be strongly typed. Take below code for example.

let
    add = (addend1, addend2) => try addend1 + addend2 otherwise addend1 & addend2,
    output = add("1", "2")
in
    output

In this example, we passed Text-types "1" and "2" to our custom function add. The function add will try to add the two parameters if possible, or else, it will attempt to concatenate.. M will not type case "1" and "2" into number types. Hence, the result will be"12" (Text).

One can modify the function by specifying the types the function is expected to get.

let
    add = (addend1 as number, addend2 as number) => try addend1 + addend2 otherwise addend1 & addend2,
    output = add("1", "2")
in
    output

The code above will throw an Error.

It is also possible to define some sort of a schema when creating custom tables.

let
    schema = type [Name=text, Age=number],
    my_table = type table schema,
    output = #table(
        my_table,
        {
            {"Jeff Winger", 27},
            {"Britta Perry", 28}
        }
    )
in
    output

Unlike in the earlier example where we used Table.FromRecords to instantiate the table, this method does not require users to type the attributes of each item.

Conclusion

Power BI is an awesome software that Python programmers can add to their arsenal. One of its tools, M, is a mature query language which can go toe to toe with Python packages like pandas.

Edit

Community Contribution:
Would like to thank u/MonkeyNin for contributing some topics on r/PowerBI. You may refer to their gist where they cover additional topics such as Selection and Projection Operators and more!

21