22
Power BI's Power Query (M) for Python Programmers
Since April 2019, Python has been a part of Power BI. This is great in theory. Python is a great language known for its prominence in the data science community. It has a wide array of tools to perform data wrangling, calculations, visualizations, and a whole lot more!
This is also great in practice. Analysts who have pre-existing Python code would have a relatively easy time onboarding to Power BI, not to mention the ease of development time that Python brings. However, putting together two different stack often has a lot of barriers. It could be technical, organizational, cultural, infrastructure-related, or whatever else you'd like to categorize. Point being, there is always going to be an inevitable scenario for users to pick up a software's native tools. (Yes Microsoft, a lot of us still code in VBA).
In this post, I will attempt to put together a "cheat sheet" of Power Query (M) features that Python programmers (e.g., me) would take interest in. (This serves as a disclaimer that I am also new at this).
PowerBI comes with two built-in languages: Power Query and DAX (Data Analysis Expressions).
DAX is available in Home > Calculations ribbon. It is preferrable to do it on Data view (at the left side of the window, along with Report View and Model View) to unlock the New Column option. This is a sample DAX:
Revenue = [Sales] * [PricePerUnit]
This DAX expression creates a new column called Revenue
which is a product of the Sales
column and the ProductPerUnit
column. This operation should be fairly familiar to Excel users.
On the other hand, Power Query (M) is available in the Power Query Editor which you can launch at Home > Queries > Transform Data. In the Power Query Editor, you may also launch the Advance Editor via Home > Query > Advance Editor.
M
is the formula language behind Power Query (i.e., let
... in
). The actions you do in the Power Query GUI is translated to M
behind the scenes. Fortunately, both can be used pretty much interchangeably with regards to search engine.
This is a sample M
code:
let
src = Table.FromList({
"Foo", "Spam", "Bar", "Ham", "Baz"
}),
output = Table.SelectRows(
src,
each ([Column1] = "Spam")
)
in
output
The snippet shows two steps: step src
to initiate a table and step output
to filter a desired value. This step-by-step data transformation is something most programmers would be familiar with.
There's definitely going to be an overlap between what Power Query and DAX are capable of. Of course, one feature is going to be easier in the other and vice versa. But ulitimately, it will all boil down to preference.
These two videos are both wonderul resources for introduction to M
. Both resource persons are from the Power BI Customer Advisory team. I urge every one to at least watch the first 5 or 10 minutes of either videos just so we can get some of the basics out of the way.
Introduction to M in Power BI - Chris Webb
Power BI Dev Camp March Intro to M Programming
Anyway, without further ado, let's jump into the content!
Python 3 | M | Remarks |
---|---|---|
# Comments |
//single-line , and /* multi-line*/
|
|
42 | 42 | |
-42 | -42 | |
+42 | +42 | |
--42 | --42 | |
not 42 # False | no truthy values for M | |
not 0 # True | no falsy values for M | |
42 == 42 | 42 = 42 | |
5 + 2 | 5 + 2 | |
5 - 2 | 5 - 2 | |
5 * 2 | 5 * 2 | |
4 / 3 | 4 / 3 | both are 1.3333333333333333 |
2 // 3 | no native floor division in M , but DAX has FLOOR
|
|
'Hello World' |
"Hello World" |
M string literal is strictly quotation mark "
|
'Hello ' + 'World!' |
"Hello " & "World!" |
|
'HeLLo wOrLd'.upper() |
Text.Upper("HeLLo wOrLd") |
result is HELLO WORLD
|
'HeLLo wOrLd'.lower() |
Text.Lower("HeLLo wOrLd") |
result is hello world
|
'Wo' in 'Hello World' |
Text.Contains("Hello World", "Wo") |
|
'len('Hello World') |
Text.Length("Hello World") |
|
['Foo', 'Bar'] |
{ "Foo", "Bar" } |
|
['Foo', 'Bar'] + ['Spam', 'Ham'] |
{ "Foo", "Bar" } & { "Spam", "Ham" } |
|
['Foo', 'Bar', 'Baz'].pop() |
List.RemoveLastN({ "Foo", "Bar", "Baz" }) |
Python's list.pop modifies the list but returns the element removed. M's List.RemoveLastN returns the resulting list. |
range(10) |
{ 0 .. 9} |
Python's range is a generator, not a list. |
range(1,11) |
{ 1 .. 10} |
This is not a complete list but hopefully, it should be enough for intermediate Python users to pick up the nuances between Python's data model vs M's.
Python's is
operator evaluates if two objects refer to the same object.
Example Python code:
a = {}
b = {}
c = a
a == b # True, both are empty dictionaries
a is b # False, both are separately initialized dictionary.
c is a # True, c refers to the same dictionary as a
There are nuances to it but this should illustrate the the point.
M's is
operator is completely different. In M, is
is used to check for type conformance.
Example M code:
42 is number //TRUE
Python does not have a native implementation of a Record or a Table.
Record
is most likely inspired by the database term. It goes in the form of [Name="Seinfeld", Age=33]
.
At first glance, Record
looks like a Python dict
(in that it has key-value pairs) but keys in a Python dict need to be something that's already defined. It also looks similar to namedtuple
but a schema for a namedtuple has to be specified first (note though that defaults were added since Python 3.7); a Record
does not have such restriction.
A Record implemented in Python would definitely look like this:
class Record:
def __init__(self, **kwargs):
for key, value in kwargs.items():
setattr(self, key, value)
jerry = Record(Name='Seinfeld', Age='33')
print(jerry.Name) # Seinfeld
print(jerry.Age) # 33
Thanks to libraries like pandas
, Python users does have options for dealing with 2-Dimensional data.
Table is definitely the most prominent data type in Power Query.
We're not gonna go through all of the available methods but one good feature to note is that is has quite a number of class methods which can be used to instantiate a table. Here's an example.
let
output = Table.FromRecords({
[Name="Jeff Winger", Age=27],
[Name="Britta Perry", Age=28]
})
in
output
Functions in Python can created (generally) with the def
keyword, or anonymously with lambda
.
For example:
def add(addend1, addend2):
return addend1 + addend2
print(add(4,2)) # 6
add_lambda = lambda addend1, addend2: addend1 + addend2
print(add_lambda(4,2)) # 6
Defining functions in M
is similar to JS (welp, there's an arrowhead). Functions in M
can be assigned in a separate query, or in a variable.
(addend1, addend2) => addend1 + addend2
As far as I know, M
does not have list comprehensions or map
but native types should have sufficient methods for transformation.
Example: return a list containing the squares of numbers from 0 to 9.
Python:
numbers = list(range(10))
def square(n):
return n ** 2
# via comprehension
squares_comprehension = [ square(i) for i in numbers ]
# via map
squares_map = list(map(square, numbers))
In M
, it is translated as:
let
numbers = { 0 .. 9 },
square = (n) => Number.Power(n,2),
squares_list = List.Transform(numbers, square)
in
squares_list
The both underscore (_) and each
are syntactic sugar for unary functions (functions that only has one arguments. Take a snippet the code example above.
square = (n) => Number.Power(n,2)
square
is a function which takes a parameter n
and returns the square if n
. It is possible to substitute n
with an underscore. This is usually common when functions are passed anonymously.
For example, the code above can be re-written as:
List.Transform(
{ 0 .. 9},
(_) => Number.Power(_,2)
)
It is also further possible to simplify using the each
keyword
List.Transform(
{ 0 .. 9},
each Number.Power(_,2)
)
Do take note that underscore can represent any data type.
let
src = Table.FromRecords({
[Name="Foo"],
[Name="Bar"],
[Name="Baz"],
[Name="Spam"],
[Name="Ham"]
}),
out = Table.SelectRows(
src,
each Text.StartsWith([Name], "B") // (_) => Text.StartsWith(_[Name], "B")
)
in
out
M
can be strongly typed. Take below code for example.
let
add = (addend1, addend2) => try addend1 + addend2 otherwise addend1 & addend2,
output = add("1", "2")
in
output
In this example, we passed Text-types "1"
and "2"
to our custom function add
. The function add
will try to add the two parameters if possible, or else, it will attempt to concatenate.. M
will not type case "1"
and "2"
into number types. Hence, the result will be"12"
(Text).
One can modify the function by specifying the types the function is expected to get.
let
add = (addend1 as number, addend2 as number) => try addend1 + addend2 otherwise addend1 & addend2,
output = add("1", "2")
in
output
The code above will throw an Error.
It is also possible to define some sort of a schema when creating custom tables.
let
schema = type [Name=text, Age=number],
my_table = type table schema,
output = #table(
my_table,
{
{"Jeff Winger", 27},
{"Britta Perry", 28}
}
)
in
output
Unlike in the earlier example where we used Table.FromRecords
to instantiate the table, this method does not require users to type the attributes of each item.
Power BI is an awesome software that Python programmers can add to their arsenal. One of its tools, M
, is a mature query language which can go toe to toe with Python packages like pandas
.
Edit
Community Contribution:
Would like to thank u/MonkeyNin for contributing some topics on r/PowerBI. You may refer to their gist where they cover additional topics such as Selection and Projection Operators and more!
22