From HTML To The Screen: How Browsers Render Web Pages

The most basic job of a web browser is to fetch an HTML file, along with any
CSS it references, interpret them, and display a page to the user. It is a complex
process, but it rests on basic principles that any developer can understand.

In this post we will learn the basics of HTML rendering and discuss how we can
improve our HTML and CSS based on what we learn.

Let's get right to it!

Constructing the DOM

After downloading the HTML file from the web, the first step taken by the browser
is to construct the DOM based on this file.

The DOM (Document Object Model) is the internal browser representation of a page
and it is represented as a tree.

To better understand how the DOM is built let's use the following HTML as an
example and go over each step of the DOM construction for this file.

<html>
  <head>
    <link href="main.css" rel="stylesheet" >
    <title>From HTML To The Screen: How Browsers Render Web Pages</title>
  </head>
  <body>
    <h1>Lorem Ipsum</h1>
    <p>Dolor sit amet.</p>
  </body>
</html>

To construct the DOM tree, the browser first has to read each character of the
HTML text file and transform this text into a sequence of tokens.

A token is a representation of a piece of text that has a special meaning and
specific rules about how to handle it.

The example HTML above will produce the following sequence of tokens:

HTML > HEAD > LINK > TITLE > TEXT > /TITLE > /HEAD > BODY > H1 > TEXT > /H1 >
P > TEXT > /P > /BODY > /HTML

Note that tokens are not limited to HTML tags. The text content of each of the
title, h1 and p elements produces a TEXT token as well.
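
The tokenization step can be sketched in a few lines of Python. This is only an
illustration, not how a real browser is implemented: it leans on the standard
library's html.parser module, and the names TokenCollector and tokenize are
made up for this post. Whitespace between tags is skipped so the result lines up
with the simplified sequence above.

```python
from html.parser import HTMLParser

HTML = """<html>
  <head>
    <link href="main.css" rel="stylesheet" >
    <title>From HTML To The Screen: How Browsers Render Web Pages</title>
  </head>
  <body>
    <h1>Lorem Ipsum</h1>
    <p>Dolor sit amet.</p>
  </body>
</html>"""


class TokenCollector(HTMLParser):
    """Hypothetical helper: records a simplified token name per piece of markup."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag.upper())

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag.upper())

    def handle_data(self, data):
        # Skip whitespace-only text and merge adjacent text chunks into one TEXT.
        if data.strip() and (not self.tokens or self.tokens[-1] != "TEXT"):
            self.tokens.append("TEXT")


def tokenize(html):
    """Return the simplified token sequence for an HTML string."""
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens


if __name__ == "__main__":
    print(" > ".join(tokenize(HTML)))
```

For the example page, this should reproduce the token sequence listed above.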

After transforming the HTML text into tokens, the browser scans each token and
arranges them in the DOM tree structure. The browser knows how to create
this structure based on the order of these tokens and the rules for each token.

Each node of the DOM tree also stores additional information about its element,
such as tag attributes.
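
Arranging tokens into a tree can be sketched with a stack of open elements: a
start tag adds a child to the element on top of the stack, and an end tag pops
back to its matching element. The sketch below is a simplification under stated
assumptions. Node, TreeBuilder, build_dom and outline are hypothetical names,
the list of void elements is deliberately incomplete, and real browsers apply
many more rules (error recovery in particular).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag. Only a few are listed (assumption).
VOID_ELEMENTS = {"link", "meta", "br", "img", "input", "hr"}


class Node:
    """A DOM-like node: a name, its attributes, optional text, and children."""

    def __init__(self, name, attrs=None, text=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.open_elements = [self.root]  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.open_elements[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].name == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        if data.strip():
            text_node = Node("#text", text=data.strip())
            self.open_elements[-1].children.append(text_node)


def build_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.name if node.text is None else f'#text "{node.text}"'
    if node.attrs:
        label += f" {node.attrs}"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    page = "<html><body><h1>Lorem Ipsum</h1><p>Dolor sit amet.</p></body></html>"
    print("\n".join(outline(build_dom(page))))
```

Note how the attributes of the link tag would end up stored on its node, which is
the "additional information" mentioned above.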

The entire process can be summarized in three steps: the characters of the file
are read, grouped into tokens, and the tokens are arranged into the DOM tree.

The CSSOM

The CSSOM (CSS Object Model) is a structure similar to the DOM that is constructed
by the browser to store the CSS information for a page.

Like the DOM, the CSSOM is constructed as a tree, and the steps required
to generate the CSSOM from a CSS file are the same as for the DOM.

The file is read character by character to generate tokens that are later
arranged in a tree structure.

Let's look at how the browser constructs the CSSOM for the following CSS.

body { font-size: 14px; }
p { line-height: 25px; }
span { color: red; }
p span { color: green; }

The tree structure of the CSSOM is constructed from the most generic to the most
specific rule.

Each node of the CSSOM contains the rules declared for it in the CSS in addition
to the rules passed down from its parent nodes. Styles flowing down the tree like
this, with conflicts between rules resolved in a defined order, is what "cascading"
refers to in the name Cascading Style Sheets.

In the example above, the font-size rule defined for body is passed down to
p and span.

Another interesting thing to note in this example is rule overriding. The CSS
declares that all span elements should be red, but every span inside a p
element will be rendered green because the more specific rule p span overrides
the initial rule.
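
To make the inheritance and overriding behavior concrete, here is a small sketch
that resolves the styles for an element from the four rules above. It is a toy
model under loud assumptions: only tag selectors and descendant selectors are
supported, specificity is simply the number of tag names in the selector, and
the names RULES, INHERITED, matches and computed_style are invented for this
example.

```python
# The example stylesheet as (selector, declarations) pairs, in source order.
RULES = [
    ("body", {"font-size": "14px"}),
    ("p", {"line-height": "25px"}),
    ("span", {"color": "red"}),
    ("p span", {"color": "green"}),
]

# Properties a child takes from its parent when no rule sets them on the child.
INHERITED = {"font-size", "line-height", "color"}


def matches(selector, path):
    """path lists tag names from the root down to the element being styled."""
    parts = selector.split()
    if not path or parts[-1] != path[-1]:
        return False
    # The remaining parts must appear, in order, among the ancestors.
    ancestors = iter(path[:-1])
    return all(part in ancestors for part in parts[:-1])


def specificity(selector):
    # Assumption: only tag selectors exist, so specificity is their count.
    return len(selector.split())


def computed_style(path):
    style = {}
    if len(path) > 1:
        # Start from the values inherited from the parent element.
        parent = computed_style(path[:-1])
        style = {k: v for k, v in parent.items() if k in INHERITED}
    matching = [
        (specificity(selector), order, declarations)
        for order, (selector, declarations) in enumerate(RULES)
        if matches(selector, path)
    ]
    # Apply less specific rules first so more specific (or later) rules win.
    for _, _, declarations in sorted(matching, key=lambda m: (m[0], m[1])):
        style.update(declarations)
    return style


if __name__ == "__main__":
    print(computed_style(["html", "body", "p", "span"]))
    print(computed_style(["html", "body", "span"]))
```

With this model, a span inside a p ends up green and keeps the 14px font size
from body, while a span placed directly in body stays red.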

The Render Tree

After creating the DOM tree and the CSSOM tree the browser combines these trees
into a new tree called the "render tree".

The render tree contains only the nodes that need to be rendered by the browser.
DOM nodes for tags like <script> or <link> won't be present in the render tree.
Nodes hidden with CSS rules like display: none won't be present either.
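
The filtering described above can be sketched as a recursive walk over a DOM
whose nodes already carry their styles. The dictionary shape used for nodes, the
NON_RENDERED_TAGS set and the build_render_tree function are assumptions made
for this illustration, not a real browser data structure.

```python
# Tags that never produce anything on screen (a partial list, by assumption).
NON_RENDERED_TAGS = {"head", "script", "link", "title", "meta", "style"}


def build_render_tree(node):
    """Return a copy of the node without non-rendered nodes, or None."""
    if node["tag"] in NON_RENDERED_TAGS:
        return None
    style = node.get("style", {})
    if style.get("display") == "none":
        # The node and its whole subtree are left out of the render tree.
        return None
    children = (build_render_tree(child) for child in node.get("children", []))
    return {
        "tag": node["tag"],
        "style": dict(style),
        "children": [child for child in children if child is not None],
    }


DOM = {"tag": "html", "children": [
    {"tag": "head", "children": [{"tag": "link"}, {"tag": "title"}]},
    {"tag": "body", "children": [
        {"tag": "h1"},
        {"tag": "p"},
        {"tag": "div", "style": {"display": "none"},
         "children": [{"tag": "span"}]},
        {"tag": "script"},
    ]},
]}

if __name__ == "__main__":
    body = build_render_tree(DOM)["children"][0]
    print([child["tag"] for child in body["children"]])
```

In this sketch the hidden div disappears together with the span inside it, which
is why hidden markup still costs parsing work but never reaches layout.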

Layout Calculation

In this phase the browser reads the render tree, starting at the root node,
and traverses each node of the tree to calculate the exact size and position,
in pixels, of every HTML element on the page.

This step is necessary because positions, widths and heights can be defined in
the CSS using relative units like % or em, which must be converted to absolute
pixel values based on things like the browser viewport size and the font size.
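
As a rough illustration of that conversion, the sketch below resolves widths
given in px, % or em into pixels while walking down a tree of boxes. It assumes
a percentage is relative to the width of the containing box and an em is
relative to a single font size shared by the whole page. The to_pixels and
layout_widths helpers and the box dictionaries are hypothetical, and real
layout handles far more than widths.

```python
def to_pixels(value, containing_px, font_size_px):
    """Convert a CSS length such as '50%', '20em' or '300px' to pixels."""
    if value.endswith("px"):
        return float(value[:-2])
    if value.endswith("%"):
        return containing_px * float(value[:-1]) / 100
    if value.endswith("em"):
        return font_size_px * float(value[:-2])
    raise ValueError(f"unsupported unit in {value!r}")


def layout_widths(box, containing_px, font_size_px=16.0):
    """Return {box name: width in pixels} for a box and its descendants."""
    # Assumption: a box without an explicit width fills its container.
    width = to_pixels(box.get("width", "100%"), containing_px, font_size_px)
    widths = {box["name"]: width}
    for child in box.get("children", []):
        # Each child is sized relative to the width just computed.
        widths.update(layout_widths(child, width, font_size_px))
    return widths


PAGE = {"name": "body", "width": "100%", "children": [
    {"name": "main", "width": "50%", "children": [
        {"name": "p", "width": "20em"},
    ]},
]}

if __name__ == "__main__":
    # A 1200px-wide viewport and the 14px font size from the CSS example.
    print(layout_widths(PAGE, 1200.0, font_size_px=14.0))
```

Changing the viewport width passed to layout_widths changes every percentage
based width below it, which is why a resize forces the browser to redo layout.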

Painting

And finally, the last step. With every size and position defined in exact pixel
units, the browser can start painting the pixels of the screen to render the page.

Optimizing The Rendering Process

The page rendering process is long and complex. Based on what we have learned so
far, there are 3 easy optimizations we can make to help the browser go through
this entire process and display the page to our users as fast as possible.

First, remove all unnecessary HTML and CSS from your pages. HTML and CSS that
will never be displayed make the browser do a lot of useless work.

That's because the browser parses all of the HTML and CSS and creates the DOM
and CSSOM trees for it, and only detects what should be displayed or hidden
during the render tree construction step.

Second, improve your server response time. By getting the HTML and CSS files
earlier, the browser can start the whole rendering process earlier too.

Third, reduce the HTML and CSS file sizes. That can be done by removing unused
HTML/CSS or by enabling file compression on the server. This reduces the
time the browser needs to download all the necessary files and lets it
start parsing earlier.
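
To get a feel for what compression buys, the following sketch compresses a
string of repetitive markup with gzip from the Python standard library and
compares the sizes. The compressed_sizes helper and the sample page are made up
for this post. In practice compression is usually switched on in the web server
or CDN configuration rather than written by hand.

```python
import gzip


def compressed_sizes(text):
    """Return (raw size, gzip size) in bytes for a piece of text."""
    raw = text.encode("utf-8")
    packed = gzip.compress(raw)
    return len(raw), len(packed)


# A hypothetical page with a lot of repeated markup, which compresses well.
SAMPLE_PAGE = (
    "<html><body>\n"
    + "<p>Dolor sit amet.</p>\n" * 200
    + "</body></html>\n"
)

if __name__ == "__main__":
    raw_size, gzip_size = compressed_sizes(SAMPLE_PAGE)
    print(f"raw: {raw_size} bytes, gzip: {gzip_size} bytes")
```

Markup tends to be repetitive, so the compressed size is typically a fraction of
the original. The exact ratio depends on the content.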

In Conclusion

Browsers are really complex pieces of software.

But some basic knowledge of how browsers work, especially how they render
web pages, is helpful for creating better pages.
