Handling of symlinks on Windows (Perl, MSYS2, Cygwin)

After reading about the Perl language server module for VS Code, I was eager to test it, since I had been missing a way to debug Perl code from inside VS Code. Emacs has been my primary editor for a long time, but I am gradually using VS Code more and more. VS Code has built-in debugging support for the Node.js runtime and can debug JavaScript and TypeScript. To debug other languages, one can install debugger extensions from the VS Code Marketplace.

The Language Server and Debugger for Perl VS Code extension allows debugging of Perl scripts; see the link above for an extensive list of features. To use the extension you first need to install the Perl module Perl::LanguageServer. I first tried to install it on Ubuntu 21.04. The module installed fine there, but the extension still did not work afterwards. The reason turned out to be that I was using a custom perl installed with perlbrew. Fortunately, this was easy to fix by going into the extension settings and setting an absolute path to the perl binary.

Next, I was curious to see whether it would also install on Windows. Even though my preferred platform is Linux, I occasionally use Windows, for example when trying to answer questions on Stack Overflow that are related to Perl and Windows. I am using Windows 10 Home edition, 21H1 (run from Ubuntu through KVM), and Strawberry Perl version 5.32.1. As anticipated, I quickly ran into issues with installing dependent modules; IO::AIO in particular was difficult.

I downloaded the source and tried to install IO::AIO manually (not using cpanm) first:

>perl Makefile.PL
[...]
*** It seems you are running perl version 5.032001, likely the "official" or
*** "standard" version. While there is nothing wrong with doing that,
*** standard perl versions 5.022 and up are not supported by IO::AIO.
*** While this might be fatal, it might also be all right - if you run into
*** problems, you might want to downgrade your perl or switch to the
*** stability branch.
***
*** If everything works fine, you can ignore this message.
***
***
*** Stability canary mini-FAQ:
***
*** Do I need to do anything?
***    With luck, no. While some distributions are known to fail
***    already, most should probably work. This message is here
***    to alert you that your perl is not supported by IO::AIO,
***    and if things go wrong, you either need to downgrade, or
***    sidegrade to the stability variant of your perl version,
***    or simply live with the consequences.
***
[...]

*** Your platform is not standards compliant. To get this module working, you need to
*** download and install win32 pthread (http://sourceware.org/pthreads-win32/).
***

Generating a gmake-style Makefile
Writing Makefile for IO::AIO
Writing MYMETA.yml and MYMETA.json

Apparently it wants me to install POSIX Threads for Windows. I also noticed that Makefile.PL calls a GNU Autotools configure script which, as far as I know, cannot be run without a POSIX-like environment such as Cygwin or MSYS2.

Since I already had Cygwin and MSYS2 installed, I decided to give MSYS2 a try. According to Wikipedia, MSYS2 ("minimal system 2") is a software distribution and a development platform for Windows, based on Mingw-w64 and Cygwin, that helps to deploy code from the Unix world on Windows. Instead of providing a full environment like Cygwin does, MSYS2 focuses on being a development and deployment platform.

From the MSYS2 terminal window, also using perl version 5.32.1 (but still a different binary than the perl used by the CMD prompt):

$ perl Makefile.PL
[...]
*** The stability canary says: (nothing, it was driven away by harsh weather)
***
*** It seems you are running perl version 5.032001, likely the "official" or
*** "standard" version. While there is nothing wrong with doing that,
*** standard perl versions 5.022 and up are not supported by IO::AIO.
*** While this might be fatal, it might also be all right - if you run into
*** problems, you might want to downgrade your perl or switch to the
*** stability branch.
***
*** If everything works fine, you can ignore this message.
***
[...]
Continue anyways?  [y]
configure: loading site script /etc/config.site
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.exe
checking for suffix of executables... .exe
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
[...]
checking for st_birthtimespec... yes
checking for st_gen... no
checking for statx... no
checking for accept4... yes
configure: creating ./config.status
config.status: creating config.h
Generating a Unix-style Makefile
Writing Makefile for IO::AIO
Writing MYMETA.yml and MYMETA.json
$ make
$ make test
[...]
t/03_errors.t ... 1/12 # Failed test 9 in t/03_errors.t at line 57
#  t/03_errors.t line 57 is:       ok (!$_[0]);
# Failed test 10 in t/03_errors.t at line 58
#  t/03_errors.t line 58 is:       ok ("\\test\\" eq readlink $some_link);
t/03_errors.t ... Failed 2/12 subtests
[...]

And more failed tests were to come. Most failures were due to unexpected behavior of symlinks on Windows and MSYS2. Even if I were able to fix the issue with IO::AIO, there were issues with other modules that are also needed by Perl::LanguageServer.

Despite this setback, I saw it as an opportunity to learn more about symlinks on Windows. Apparently there were similar issues with other modules that also needed to be fixed. As I was working on a pull request for file-copy-recursive-reduced, I was encouraged to write this blog post.

Symlinks on Windows (MSYS2 and Cygwin)

A symlink is a file that contains a reference to another file or directory in the form of an absolute or relative path. The referenced text string is automatically followed by the operating system as a path to another file or directory. This other file or directory is called the "target". So the symlink is a second file that exists independently of its target. If a symlink is deleted, its target remains unaffected. If a symlink points to a target, and sometime later that target is moved, renamed or deleted, the symlink is not automatically updated or deleted, but continues to exist and still points to the now non-existing target. Symlinks pointing to non-existing targets are called broken symlinks.

On Unix-like operating systems, the target does not have to exist when a symlink is created, so broken symlinks can even be created initially. The ability to create broken symlinks is particularly useful when copying directories. Usually, the algorithm used to copy a directory just copies the files in the order returned by the readdir() system call and since it is legal to create broken symlinks it does not need to copy the target of a relative symlink before the symlink itself is copied.
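To make the notion of a broken symlink concrete, here is a minimal Perl sketch. It assumes a platform where broken symlinks can be created (for example Linux). The sub name and the file names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a symlink whose target does not exist inside a scratch directory.
# Returns a hash ref describing what Perl sees afterwards, or undef if
# symlink() failed or is not implemented on this platform.
sub make_broken_symlink {
    my $dir    = tempdir(CLEANUP => 1);
    my $target = "missing-target.txt";    # hypothetical name, deliberately absent
    my $link   = "$dir/link";

    # eval guards against perls where symlink() is unimplemented and dies
    my $ok = eval { symlink($target, $link) };
    return unless $ok;

    return {
        is_link       => (-l $link ? 1 : 0),    # -l uses lstat: the link itself
        target_exists => (-e $link ? 1 : 0),    # -e follows the link
        points_to     => readlink($link),
    };
}

my $info = make_broken_symlink();
if ($info) {
    printf "is_link=%d target_exists=%d points_to=%s\n",
        @{$info}{qw(is_link target_exists points_to)};
}
else {
    print "symlink failed or is not available here\n";
}
```

On Linux the link exists (-l is true) while its target does not (-e is false), which is exactly the state a directory copier passes through when it copies a relative symlink before its target.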

So on Unix-like operating systems symlinks, including broken ones, can always be created, whereas on Windows this is not always the case. Early versions of Windows did not have symlinks of any kind. Windows 95 introduced file shortcuts, and Windows XP introduced native symlinks (only enabled by default for kernel mode programs). Starting with Windows 10 Insider build 14972, native symlinks can be created without elevating the console as administrator. To enable this, go to the Windows Settings app, choose "Update & Security" -> "For developers", and turn on "Developer mode".

Even if the Windows shortcut file is just a metafile used by the Windows File Explorer, it has been used by Cygwin to emulate symlinks. However, the Cygwin shortcut file cannot be read properly by the File Explorer since it lacks many of the expected header fields, whereas a shortcut file created in the File Explorer can be read by Cygwin.

In addition to this, in Cygwin and MSYS2 there is a further complication to the creation of symlinks. In Cygwin, creation of symlinks depends on an environment variable called CYGWIN. Depending on the content of this environment variable, the creation of broken symlinks may fail, or the creation of non-broken symlinks may fail if developer mode (see discussion above) is not activated. The environment variable also determines whether the symlink will be created as a shortcut file or as a native symlink.

Behavior of ln -s on Cygwin

The behavior of the ln --symbolic <target> <destination> command in Cygwin depends on the environment variable CYGWIN which is used to configure many global settings for the Cygwin runtime system. It contains options separated by blank characters. The option that is important for the ln -s command is called winsymlinks. According to the Cygwin documentation, there are four cases for the winsymlinks option to consider:

  1. winsymlinks is not defined. (Note: this behavior differs from that of MSYS2, see below). This is called the default behavior for Cygwin.
    a) If native symlinks are enabled (see discussion above), then this is equivalent to setting winsymlinks to native (e.g. CYGWIN=winsymlinks:native), see 3) below.
    b) If native symlinks are not enabled, this is equivalent to setting winsymlinks to lnk, see 2) below.

  2. winsymlinks is empty (CYGWIN=winsymlinks) or winsymlinks is set to lnk (e.g. CYGWIN=winsymlinks:lnk)
    Whether <target> exists or not, ln -s creates <destination> as a Windows shortcut file.

  3. winsymlinks:native
    a) If native symlinks are enabled, and whether <target> exists or not, creates <destination> as a native Windows symlink. Note, this is most similar to the behavior of ln -s on *nix.
    b) If native symlinks are not enabled, it is equivalent to setting winsymlinks to lnk, see 2) above.

  4. winsymlinks:nativestrict
    a) If native symlinks are enabled and <target> exists, creates <destination> as a native Windows symlink,
    b) else if native symlinks are not enabled or if <target> does not exist, ln -s fails.
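A script can at least inspect which of these cases applies before it creates symlinks. The following is a small sketch, not an official API: the sub name winsymlinks_mode is made up, and letting the last occurrence of the option win is my assumption about how the runtime treats repeated options.

```perl
use strict;
use warnings;

# Extract the winsymlinks setting from a CYGWIN-style option string.
# Returns undef when the option is absent, 'lnk' for a bare "winsymlinks",
# and otherwise the text after the colon (e.g. 'native', 'nativestrict').
sub winsymlinks_mode {
    my ($value) = @_;
    return undef unless defined $value;

    my $mode;
    for my $opt (split ' ', $value) {    # options are separated by blanks
        next unless $opt =~ /^winsymlinks(?::(.*))?$/;
        # Assumption: if the option is repeated, the last one wins
        $mode = (defined $1 && length $1) ? $1 : 'lnk';
    }
    return $mode;
}

my $mode = winsymlinks_mode($ENV{CYGWIN});
print "winsymlinks: ", (defined $mode ? $mode : "(not set)"), "\n";
```

Passing $ENV{MSYS} instead of $ENV{CYGWIN} gives the corresponding value under MSYS2.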

Behavior of ln -s on MSYS2

Similarly to the CYGWIN environment variable, the MSYS environment variable is used to configure global settings for the MSYS2 runtime system (since MSYS2 is based on Cygwin). The four cases for the winsymlinks option to consider are:

  1. winsymlinks is not defined. The default behavior for MSYS2. (Note: this is not similar to Cygwin)
    a) If <target> exists, <target> is (surprise!!) copied to <destination>. So <destination> does not become a symlink but simply a copy of <target>. This happens whether <target> is a file or a directory, and whether native symlinks are enabled or not.
    b) If <target> does not exist, ln -s fails.

  2. winsymlinks or winsymlinks:lnk : (Similar to Cygwin, see above)

  3. winsymlinks:native : (Similar to Cygwin, see above)

  4. winsymlinks:nativestrict : (Similar to Cygwin, see above)
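The cases for Cygwin and MSYS2 can be condensed into one small decision function. This is only a sketch that encodes the lists in this post. It does not query the system, and the sub name, argument names and result strings are made up for illustration.

```perl
use strict;
use warnings;

# Predict what "ln -s <target> <destination>" does, following the cases
# listed in this post. Arguments:
#   $runtime         'cygwin' or 'msys2'
#   $mode            undef (winsymlinks not set), 'lnk', 'native' or 'nativestrict'
#   $native_enabled  true if native symlinks can be created (developer mode)
#   $target_exists   true if <target> exists
# Returns 'native', 'lnk', 'copy' or 'fail'.
sub predict_ln_s {
    my ($runtime, $mode, $native_enabled, $target_exists) = @_;

    if (!defined $mode) {
        # MSYS2 default: copy the target, or fail if it does not exist
        return $target_exists ? 'copy' : 'fail' if $runtime eq 'msys2';
        # Cygwin default: behaves like winsymlinks:native
        $mode = 'native';
    }

    return 'lnk' if $mode eq 'lnk';
    return $native_enabled ? 'native' : 'lnk' if $mode eq 'native';
    if ($mode eq 'nativestrict') {
        return ($native_enabled && $target_exists) ? 'native' : 'fail';
    }
    die "unknown winsymlinks mode '$mode'\n";
}

for my $case (
    [ 'cygwin', undef,          0, 0 ],
    [ 'msys2',  undef,          1, 1 ],
    [ 'msys2',  'nativestrict', 1, 0 ],
) {
    my ($runtime, $mode, $native, $exists) = @$case;
    printf "%-6s %-12s native=%d target=%d -> %s\n",
        $runtime, $mode // '(unset)', $native, $exists,
        predict_ln_s(@$case);
}
```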

Behavior of symlinks in Perl on Windows (MSYS2 and Cygwin)

In both MSYS2 and Cygwin,

perl -MConfig -E'say $Config{d_symlink}'

prints define (meaning the symlink call is implemented), whereas on regular Windows (CMD prompt and $^O eq "MSWin32") with e.g. Strawberry Perl, $Config{d_symlink} is only defined for perl versions >= 5.33.5, see perldelta.
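The same check can be done inside a script, so that code using symlinks can skip or fall back gracefully. The sketch below combines the %Config lookup with the runtime probe suggested in perldoc -f symlink. The two sub names are made up for illustration.

```perl
use strict;
use warnings;
use Config;

# True if this perl was built with symlink support.
sub perl_has_symlink {
    return $Config{d_symlink} ? 1 : 0;
}

# Runtime probe from "perldoc -f symlink": where symlink() is not
# implemented it raises an exception; elsewhere the call with empty
# arguments simply fails without creating anything.
sub symlink_callable {
    return eval { symlink("", ""); 1 } ? 1 : 0;
}

printf "OS=%s d_symlink=%s callable=%d\n",
    $^O, $Config{d_symlink} // 'undef', symlink_callable();
```

Note that both checks only say whether symlink can be called at all. They say nothing about whether the call creates a native symlink, a shortcut file, or a copy.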

So the perl symlink function "works" on MSYS2 and Cygwin, and for newer versions of MSWin32. However, since the newest Strawberry Perl release is currently at 5.32.1, I was not able to test how symlinks behave with MSWin32. In the following I will therefore focus on Cygwin and MSYS2.

It would be nice if one could easily check from within a Perl script if developer mode was on and thus native symlinks were enabled. However, the only way I found was to use XS (a C extension) to check the value of the registry key SOFTWARE\Microsoft\Windows\CurrentVersion\AppModelUnlock.

For the further discussion below, consider the Perl statement:

symlink $target, $dest;

If $dest exists, the symlink function always fails (returning 0 and setting $!). So consider the case where $dest does not exist. As for the ln -s command above, there are four cases to consider for the winsymlinks option in the MSYS or CYGWIN environment variable. It turns out that symlink behaves identically to ln -s: whenever ln -s fails, symlink also fails, returns 0, and sets $! (errno).
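Because a successful call does not guarantee that $dest really is a symlink (think of the MSYS2 default, where the target is copied), it can be worth checking the result afterwards. A minimal sketch, with a made-up sub name and made-up return values:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create $dest as a symlink to $target and report what actually happened.
# Returns 'symlink' if $dest passes the -l test afterwards, or 'not-a-link'
# if the call succeeded but produced something else (such as a copy).
# Dies if $dest already exists or if symlink() fails.
sub symlink_checked {
    my ($target, $dest) = @_;
    die "refusing to overwrite existing '$dest'\n" if -e $dest || -l $dest;
    symlink($target, $dest)
        or die "symlink('$target', '$dest') failed: $!\n";
    my $is_link = -l $dest;
    return $is_link ? 'symlink' : 'not-a-link';
}

# Usage: link to an existing file inside a scratch directory
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/target.txt" or die "cannot create target: $!\n";
close $fh;
my $kind = eval { symlink_checked('target.txt', "$dir/link") };
print defined $kind ? "result: $kind\n" : "error: $@";
```

Given that -l does not distinguish shortcut files from native symlinks on Cygwin and MSYS2, 'symlink' here only means "something Perl treats as a symlink".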

Also note that the environment variables MSYS and CYGWIN should not be changed from within the Perl script itself. I am not sure why this does not work, but when I tested it the behavior was unpredictable. So the variables should be set before perl is run, e.g. on the command line:

MSYS=winsymlinks:native perl p.pl

What about Perl's -l file test operator? Tests show that it does not differentiate between a Windows shortcut file and a native symlink. So

say "symlink" if -l "foobar";

prints "symlink" for both file types. Further, there seems to be no tool available to determine which of the two types a given symlink is. This means that when copying a symlink, it is difficult to decide whether the destination should be a native symlink or a Windows shortcut. Hence, copying a symlink can silently convert a native symlink to a shortcut file, depending on the setting of the CYGWIN or MSYS environment variable. This is also how the cp command in MSYS2 (or Cygwin)

cp -a source destination

works.
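In Perl, copying a symlink usually comes down to reading its target and creating a new link with the same target. The sketch below (the sub name is made up) shows why the link type cannot be preserved: the code never learns what kind of link the source was, and the kind of link created is decided by the runtime.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Recreate the symlink $src at $dst by reading its target and calling
# symlink() again. Whether $dst becomes a native symlink or a shortcut
# file depends on the CYGWIN/MSYS setting, not on this code.
sub copy_symlink {
    my ($src, $dst) = @_;
    my $target = readlink $src;
    die "readlink('$src') failed: $!\n" unless defined $target;
    symlink($target, $dst)
        or die "symlink('$target', '$dst') failed: $!\n";
    return $target;
}

# Usage: copy a (broken) relative symlink inside a scratch directory
my $dir = tempdir(CLEANUP => 1);
my $copied = eval {
    symlink('data.txt', "$dir/a") or die "symlink failed: $!\n";
    copy_symlink("$dir/a", "$dir/b");
};
print defined $copied ? "copied link to '$copied'\n" : "error: $@";
```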

Summary

The behavior of symlinks is more complicated on Windows than on Linux. This is mainly a problem for programs that need to copy directories. These programs may fail unexpectedly if they do not handle the different options in the CYGWIN or MSYS environment variables. On Cygwin, the default behavior (meaning that the user did not set winsymlinks in the CYGWIN variable) is such that creating a symlink does not fail when the target does not exist or when native symlinks are not enabled. The default behavior on MSYS2 is different: ln -s does not create a symlink at all but a copy of the target file, and if the target file does not exist, ln -s fails.

Epilogue

In the end I was able to install Perl::LanguageServer on MSYS2, but when VS Code tried to use it, it crashed. I believe this was due to the module IO::AIO, but I have not looked further into the issue. I also discovered a new Perl module called PLS. This module is currently under active development and implements features like auto-completion. However, it currently does not implement the Debug Adapter Protocol, so debugging is not available. This module installed fine from my Windows CMD prompt (it uses IO::Async instead of IO::AIO), but it did not run from VS Code when I tested it. Judging from the discussion on Reddit, this problem will hopefully be fixed soon.
