Create A Terminal Media Player

So, media players are nothing new, right? We have Windows Media Player, Apple QuickTime, and the absolute champion of media playing, VLC. So why would anyone want to create another?

It's because it is in ASCII!

We programmers have always been fond of TUIs (Text User Interfaces). From setting our screensavers to the Matrix code, to using FIGlet to convert text to ASCII art, it just seems that ASCII art in the terminal makes anything cool.

So in this post, I will describe how I created a quick and dirty media player that displays MPEG video and JPEG image files as character art, and we will discuss how we can improve upon the program in small, meaningful ways.

📝 The Basics

So I made this program in Python (version 3.9), primarily because it has excellent libraries to handle media files and images, of which I will be using Pillow and OpenCV. However, you can use any language you like and are comfortable with. After all, OpenCV has been ported to several languages, and for images you can use a binding of ImageMagick.

Furthermore, to handle '.srt' (subtitle) files, I have used the pysrt module. However, you may parse subtitle files using regex, or some hacked-up string algorithm. You may also opt not to include subtitles at all.

We also need a POSIX terminal with true-colour support, like GNOME Terminal (the default on GNOME systems), Konsole (KDE-based systems), iTerm (macOS), or Windows Terminal (using WSL). Almost all terminals support this, the exceptions being Terminal.app and Command Prompt. What this means is that we can use full 24-bit RGB colours in the terminal.

✈️ Taking Off

So the first thing to do is to read pixel data off a picture. We can deal with videos later, but first we need a test picture. Choose any picture on your computer that is simple enough to check the program's output against. To read the pixels, we first load the picture into Pillow and simply store the data in a list.

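from PIL import Image

filename = "<your_filename>"  # path to your test picture
try:
    with Image.open(filename) as img:
        img = img.convert("RGB")
        cols, rows = img.size  # Pillow reports size as (width, height)
        pixels = list(img.getdata())  # flat list of (r, g, b) tuples
except OSError:
    print("Could not open file!")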

So in the above code we simply take the image and read its dimensions and its pixel data, which contains the RGB values (from 0 to 255) of each pixel in the image. Now we map these RGB values to our ASCII characters, which we arrange in increasing order of brightness.

ASCII_CHARS = "`^\",:;Il!i~+_-?][}{1)(|\\/tfjrxnuvczXYUJCLQ0OZmwqpdbkhao*#MW&8%B@$"
MAX_PIXEL_VALUE = 255

symbol_index = int(intensity / MAX_PIXEL_VALUE * len(ASCII_CHARS)) - 1

The variable intensity is the brightness we will later compute from the RGB values (for the time being, let intensity = 10). The mapping is a simple linear scaling: we divide the intensity by its maximum value and multiply by the number of characters. We subtract 1 so that the index never goes past the end of the character string.

However, that -1 causes a problem when the intensity is zero (or very close to it), because then the index becomes -1, which in Python refers to the last character. That means we map our most intense character to a zero intensity 😱

So we simply include a check against it, and solve the problem 😉

symbol_index = symbol_index + 1 if symbol_index < 0 else symbol_index

We can then print the character onto the screen, adding a newline character whenever we finish a row.
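Putting the pieces so far together, here is a minimal sketch of the mapping and printing step. The helper names char_for and render are made up for illustration, and it clamps the index with max() instead of the conditional shown earlier, so treat it as one possible variant rather than the exact code of the program.

```python
ASCII_CHARS = "`^\",:;Il!i~+_-?][}{1)(|\\/tfjrxnuvczXYUJCLQ0OZmwqpdbkhao*#MW&8%B@$"
MAX_PIXEL_VALUE = 255


def char_for(intensity):
    """Map a 0-255 intensity to a character, dimmest character first."""
    index = int(intensity / MAX_PIXEL_VALUE * len(ASCII_CHARS)) - 1
    # Clamp at 0 so very dark pixels do not wrap around to index -1.
    return ASCII_CHARS[max(index, 0)]


def render(intensities, width):
    """Turn a flat list of intensities into newline-separated rows of text."""
    rows = []
    for start in range(0, len(intensities), width):
        row = intensities[start:start + width]
        rows.append("".join(char_for(value) for value in row))
    return "\n".join(rows)


# A tiny 3 x 2 "image": dark on the top row, bright on the bottom row.
print(render([0, 10, 40, 200, 230, 255], width=3))
```

The width argument is needed because the pixel data comes as one flat list, so we have to cut it into rows ourselves.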

Now that this is good to go, we should run and test it. Since intensity is fixed at 10 for now, this should output the same low-intensity character for every pixel in the image.

🧨 Solving Problems

Running the program will show some outright problems, the most important being that the image is too big! Even a small image is 500 x 300 pixels, but a terminal typically has only around 159 columns and 36 rows, even in full screen. So we need to resize the image first. This is done with the method img.resize((<width>, <height>)), which takes the new size as a single tuple. However, with a standard library call and a clever line of code, the image can be resized based upon the terminal size.

import os

size = os.get_terminal_size()  # size.columns and size.lines
# new_rows, new_cols defined using image's aspect ratio and the terminal size
img = img.resize((new_cols, new_rows))
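One way new_cols and new_rows could be computed is sketched below. The function name fit_to_terminal is hypothetical, and it only keeps the pixel aspect ratio; it does not yet account for the shape of the font's characters.

```python
def fit_to_terminal(img_cols, img_rows, term_cols, term_rows):
    """Largest (cols, rows) that fits the terminal and keeps the aspect ratio."""
    # Integer cross-multiplication compares term_cols / img_cols against
    # term_rows / img_rows without any floating point rounding.
    if term_cols * img_rows <= term_rows * img_cols:
        # The width is the limiting side.
        new_cols = term_cols
        new_rows = img_rows * term_cols // img_cols
    else:
        # The height is the limiting side.
        new_rows = term_rows
        new_cols = img_cols * term_rows // img_rows
    return max(1, new_cols), max(1, new_rows)


# The 500 x 300 image and the 159 x 36 terminal from the text.
new_cols, new_rows = fit_to_terminal(500, 300, 159, 36)
print(new_cols, new_rows)
```

The result is returned as (new_cols, new_rows), the same order that img.resize() expects in its tuple.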

With that solved, another thing we haven't yet implemented is an intensity measure. Though intensity = (pixel_red + pixel_blue + pixel_green) / 3 works, the human eye is most sensitive to green light, and least to blue. So a better intensity formula would be

intensity = (0.299 * pixel_red ** 2 + 0.587 * pixel_green ** 2 + 0.114 * pixel_blue ** 2) ** 0.5
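As a small sketch, the formula can be wrapped in a function and tried on the three pure colours. The name perceived_intensity is just an illustrative choice.

```python
def perceived_intensity(red, green, blue):
    """Weighted root-mean-square brightness of one RGB pixel (0-255 scale)."""
    return (0.299 * red ** 2 + 0.587 * green ** 2 + 0.114 * blue ** 2) ** 0.5


# Pure green should come out brightest and pure blue darkest.
for name, rgb in [("green", (0, 255, 0)), ("red", (255, 0, 0)), ("blue", (0, 0, 255))]:
    print(name, round(perceived_intensity(*rgb)))
```

Because the three weights add up to 1, a pure white pixel still maps to (very nearly) 255, so the result can be fed straight into the character lookup.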

Another thing, as you must have already noticed, is that the ASCII render appears squished. This is because font characters aren't squares (like pixels), but rectangles that are taller than they are wide. So to preserve the image's proportions, we have to print each character twice, or thrice. This also means tweaking the resizing code to account for the extra width.
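A rough sketch of that idea follows. Both helper names are hypothetical, and the repeat factor of 2 is a guess that depends on your terminal font.

```python
def widen(row, repeat=2):
    """Repeat every character in a rendered row `repeat` times."""
    return "".join(char * repeat for char in row)


def usable_columns(term_cols, repeat=2):
    """Image columns that fit once each one takes `repeat` terminal cells."""
    return term_cols // repeat


print(widen(".:#"))
print(usable_columns(159))
```

The resizing step would then use usable_columns(size.columns) instead of size.columns as its width limit.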

🎥 Video Time!

Adding video support isn't that hard. All we have to do is to politely ask OpenCV to split the video into image frames for us, and it will comply.

import os

import cv2

vidcap = cv2.VideoCapture("<filename>")  # path to your video file
i = 0 # a frame counter
frame_skip = 0 # to control the choppiness/frame rate
os.system("clear")
while vidcap.isOpened():
    success, image = vidcap.read()
    if not success:
        break
    if i > frame_skip - 1:
        cv2.imwrite("frame.jpg", image)
        i = 0
        # call the earlier function to ASCII render the image 'frame.jpg'
        continue
    i += 1
vidcap.release()

Since vidcap can also open image files, you can use the same code to read images too, and then call our old methods. You can also retouch the video/frames using OpenCV. For example, you could use image = cv2.convertScaleAbs(image, alpha=1.5, beta=50), which multiplies every pixel value by 1.5 (more contrast) and then adds 50 to it (more brightness).
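To get a feel for the numbers, here is a plain-Python approximation of what cv2.convertScaleAbs does to a single 0-255 channel value: it takes the absolute value of alpha * value + beta and caps the result at 255. The function scale_abs is a made-up stand-in for illustration, not part of OpenCV, and its rounding may differ from OpenCV's in edge cases.

```python
def scale_abs(value, alpha=1.0, beta=0.0):
    """Approximate convertScaleAbs for one channel value in the 0-255 range."""
    return min(255, round(abs(alpha * value + beta)))


# alpha stretches the values apart, beta shifts them all up.
for value in (0, 100, 200):
    print(value, "->", scale_abs(value, alpha=1.5, beta=50))
```

Note how bright values hit the 255 ceiling quickly, so large alpha and beta values will wash out the highlights.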

🖍 Showing Colour

Adding colour is by far the easiest. We are going to use the ultimate tool (apart from curses) to create TUIs in POSIX compliant systems... ANSI escape code 🥁

So the ANSI escape code to change the foreground colour in 24-bit RGB is (spaces added for readability)

"ESC[ 38;2;<r>;<g>;<b> m"

Here "ESC" is actually "\033" when used in code (since the ASCII code of ESC is 033 in octal), and r, g, b are values in the 0-255 range. So to print something in red, say, we could have the following code

print("\033[38;2;255;0;0m Hello \033[0m")

We can output this escape code before printing the corresponding character. Just do not forget to reset all colours/settings by printing ESC[0m when you are done.
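A minimal sketch of how this could be wired into the renderer is below. The helper names coloured and coloured_row are hypothetical; only the escape sequences themselves come from the text above.

```python
RESET = "\033[0m"


def coloured(char, red, green, blue):
    """Prefix one character with a 24-bit foreground colour escape code."""
    return f"\033[38;2;{red};{green};{blue}m{char}"


def coloured_row(chars, pixels):
    """Colour each character with its pixel's RGB value, then reset once."""
    return "".join(coloured(c, *rgb) for c, rgb in zip(chars, pixels)) + RESET


print(coloured_row("RGB", [(255, 0, 0), (0, 255, 0), (0, 0, 255)]))
```

Resetting once per row instead of after every character keeps the output shorter, which matters when a whole frame is printed many times per second.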

🛠 Fine Tuning

Experimenting up to now, you must have observed that a playing video sometimes shows the rolling effect found in old TVs. This is because the program prints all the characters of the current frame below the ASCII render of the previous frame. The correct way to do things is to overwrite the characters of the previous render in place. This means we have to move the cursor back to the first position in the terminal and then print every character over the previous one, which gives us a smoother, glitch-free transition between frames.

This is also done using ANSI escape codes, by simply going to the first position in the terminal (top left corner) every time we start printing. The ANSI code to move the cursor is (spaces for readability)

ESC[ <line_number>;<column_number> H

Since this is 1-indexed, the code to move to start will be ESC[1;1H
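Here is a small sketch of drawing frames this way. The names move_cursor and draw_frame are made up for illustration, and the out parameter is only there so the function can write somewhere other than the real terminal.

```python
import sys

CURSOR_HOME = "\033[1;1H"  # line 1, column 1: the top left corner


def move_cursor(line, column):
    """Build the escape code that moves the cursor to a 1-indexed position."""
    return f"\033[{line};{column}H"


def draw_frame(frame_text, out=sys.stdout):
    """Jump to the top left corner, then print the frame over the old one."""
    out.write(CURSOR_HOME + frame_text)
    out.flush()


draw_frame("first frame \n")
draw_frame("second frame\n")  # overwrites the first frame in a real terminal
```

Writing the whole frame with a single write() call, rather than character by character, also tends to reduce flicker.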

🎙 Subtitles

To add subtitles, we first need a .srt file. This file contains everything heard in the video, each entry with a sequence number and a pair of timestamps, as shown below

1
00:00:00,000 --> 00:00:01,000
Hey, want to party?

2
00:00:01,001 --> 00:01:00,123
(Tense music plays)

So, according to the above file, from 0 to 1 second the subtitle Hey, want to party? should display, and then from 1.001 seconds (1 second, 1 millisecond) to 1 minute and 0.123 seconds (1 minute, 123 milliseconds) we should show the subtitle (Tense music plays)

The timestamps are in the format Hours:Minutes:Seconds,Milliseconds
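If you would rather go the regex route than use pysrt, converting these timestamps into milliseconds could look something like the sketch below. The names to_milliseconds and parse_range are hypothetical, and it only handles well-formed lines.

```python
import re

# Hours:Minutes:Seconds,Milliseconds
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_milliseconds(stamp):
    """Convert one 'HH:MM:SS,mmm' timestamp into milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an .srt timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_range(line):
    """Split a 'start --> end' line into a (start_ms, end_ms) pair."""
    start, end = line.split("-->")
    return to_milliseconds(start), to_milliseconds(end)


print(parse_range("00:00:01,001 --> 00:01:00,123"))
```

Working in plain milliseconds is convenient because that is also the unit OpenCV reports the video position in.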

However, since we traverse the video frame by frame, we first get our frame's timestamp by adding the following commands to the previous frame splitting code

import os

import cv2
import pysrt

vidcap = cv2.VideoCapture("<filename>")  # path to your video file
sub_file = pysrt.open("<subtitle_srt_file>")  # path to your .srt file
i = 0 # a frame counter
frame_skip = 0 # to control the choppiness/frame rate
os.system("clear")
while vidcap.isOpened():
    success, image = vidcap.read()
    if not success:
        break
    if i > frame_skip - 1:
        cv2.imwrite("frame.jpg", image)
        i = 0
        timestamp = int(vidcap.get(cv2.CAP_PROP_POS_MSEC))
        # call the earlier function to ASCII render the image 'frame.jpg'
        # send timestamp & sub_file to a subtitle printing function
        continue
    i += 1
vidcap.release()

cv2.CAP_PROP_POS_MSEC is a property identifier: passing it to vidcap.get() returns the current position in the video, in milliseconds since the start. We can use this directly in our subtitle printing function, as pysrt provides a slice feature that selects the subtitles for a specific timestamp.

subs = sub_file.slice(starts_before={'milliseconds': timestamp}, ends_after={'milliseconds': timestamp})
for line in subs:
    print(line.text)  # line.text holds the subtitle as a string

The slice function is pretty straightforward to use, and also supports minutes, hours and seconds attributes.
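For those who skipped pysrt, the same selection can be sketched in plain Python over a list of (start, end, text) tuples with times in milliseconds. The function active_subtitles is a hypothetical helper, and this sketch treats both boundaries as inclusive, which may not match pysrt's behaviour exactly.

```python
def active_subtitles(cues, timestamp):
    """Return the text of every cue whose time range contains `timestamp`."""
    return [text for start, end, text in cues if start <= timestamp <= end]


# The two cues from the sample .srt file, with times in milliseconds.
cues = [
    (0, 1000, "Hey, want to party?"),
    (1001, 60123, "(Tense music plays)"),
]

print(active_subtitles(cues, 500))
print(active_subtitles(cues, 30000))
```

A linear scan like this is fine for a few hundred cues; a long film might deserve something smarter, but that is an optimisation for later.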

🤔 Final Thoughts And Improvements

So there you have it, a pretty basic in-terminal media player. This is far from a finished program, however, with features like playback speed, pause and play, and seek still missing. I encourage you to implement these, experiment around, and most of all, marvel at what you just made!
