Ninja for .NET with F# and Visual Studio Code

Next post of “Programming XeonPHI using .NET and F#” series

 

We recently received a XeonPhi Ninja developer platform (a liquid-cooled pedestal unit) and started to play with it (if you are unfamiliar with the XeonPhi processor, see the short summary at the end of this post). I was thrilled to test the second iteration of the XeonPhi processor, codenamed Knights Landing, KNL for friends.

KNL is a significant improvement over the first iteration of the platform, codenamed Knights Corner (KNC), and it is the first iteration capable of running standalone rather than as an accelerator inside a server or desktop machine.

Intel's ambitious goal was to deliver a manycore processor capable of supporting GPGPU-style workloads while preserving the x86 instruction set in its cores. This is a very interesting challenge, since part of the massive parallelism GPUs achieve comes from the simplicity of their cores, so it may be difficult to match the same performance with more complex cores.

Benchmarks will tell, and debates will take place between GPU and XeonPhi fans, though in my humble opinion the comparison is irrelevant. A GPU is hardware shaped mostly for 3D graphics and then reused for problems similar to it; its strength is also its weakness: a lack of flexibility. So over the last 10 years programmers have started classifying algorithms as GPU-friendly or not. XeonPhi takes a different angle by packing a large number of general-purpose cores featuring the x86 instruction set. Even though the XeonPhi platform is younger, it is time to start a similar exercise and classify algorithms and problems as XeonPhi-friendly or not. KNL in particular is the first iteration that is fully x86 compatible, in the sense that the processor can execute any x86 code without rewriting or even recompiling it (KNC implemented a reduced x86 instruction set and a different ABI, making recompilation necessary).

Something that is now possible with XeonPhi and still impossible on a GPU is running a full program, not just accelerating its computational portion. This allows for a very different kind of programming, where more traditional programming tools and models can benefit from the large number of cores and vector processing units. In fact, the Ninja development platform ships without a traditional CPU: the XeonPhi itself runs CentOS 7.2 (with the appropriate drivers to make the Linux kernel aware of the platform's differences).
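Since the XeonPhi is the host processor, the operating system sees it directly, and you can inspect it from the shell like any other x86 CPU. The following is a minimal sketch, assuming a standard Linux `/proc/cpuinfo` layout; `avx512_flags` is a hypothetical helper name, and the optional argument only exists so a saved copy of the file can be inspected instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the AVX-512 feature flags reported for the
# first CPU in a cpuinfo file (defaults to /proc/cpuinfo).
avx512_flags() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  # Take the first "flags" line, split it into one flag per line,
  # keep only the avx512* entries and print them sorted and deduplicated.
  grep -m1 '^flags' "$cpuinfo" | tr ' ' '\n' | grep '^avx512' | sort -u
}
```

On a KNL host, running `avx512_flags` together with `nproc` should show the AVX-512 extensions and the number of logical CPUs the kernel exposes (each KNL core runs up to four hardware threads).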

In this new era of manycore programming I was really curious to see how standard enterprise runtimes such as .NET perform on this architecture, to understand whether the platform could help broaden the population of programmers writing parallelism-aware code. As you may know, I'm not only a .NET fan but also an F# fan, so I decided to start a series of blog posts on my experiments programming the XeonPhi using F# and Mono.

In this first post of the series I will simply discuss how to set up a standalone KNL XeonPhi with .NET, F# and… Visual Studio Code… that was thrilling!

Installing .NET and F#

The first thing you notice when playing with KNL is that it truly is a full x86 machine. This means you can simply browse to the Mono project website, go to the Download section and follow the instructions to add the Mono repository to your yum configuration. Once you have completed the yum configuration and added the EPEL repository by installing the epel-release rpm, you can install Mono and F#:

# yum install fsharp
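If you need to repeat these steps on several machines, they can be wrapped in a small script. This is a sketch only: `setup_mono_fsharp` is a hypothetical name, `MONO_REPO_URL` is a placeholder you must fill in with the repository URL from the Mono Download page (it is deliberately not hard-coded), and it assumes you have already imported the repository signing key as that page instructs and that `yum-config-manager` (from yum-utils) is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper wrapping the yum steps described above.
# MONO_REPO_URL is a placeholder: copy it from the Mono project's Download page.
# Assumes the repository signing key has already been imported.
setup_mono_fsharp() {
  if [ -z "${MONO_REPO_URL:-}" ]; then
    echo "set MONO_REPO_URL to the repository URL from the Mono download page" >&2
    return 1
  fi
  if [ "$(id -u)" -ne 0 ]; then
    echo "run this as root" >&2
    return 1
  fi
  # Register the Mono repository, enable EPEL, then install F# (which pulls in Mono).
  yum-config-manager --add-repo "$MONO_REPO_URL" &&
    yum install -y epel-release &&
    yum install -y fsharp
}
```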

You can quickly test your installation by starting F# Interactive, the F# REPL:

# fsharpi

F# Interactive for F# 4.0 (Open Source Edition)
Freely distributed under the Apache 2.0 Open Source License

For help type #help;;

> 2 + 2 ;;
val it : int = 4
> #quit;;

- Exit...
#
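The same check can be run non-interactively, for example from a provisioning script. A minimal sketch, assuming `fsharpi` runs a `.fsx` file passed on the command line and exits afterwards; `smoke_test_fsharp` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: non-interactive version of the "2 + 2" check above.
# Return codes: 0 = F# works, 1 = unexpected output, 2 = fsharpi not on PATH.
smoke_test_fsharp() {
  if ! command -v fsharpi >/dev/null 2>&1; then
    echo "fsharpi not found; skipping" >&2
    return 2
  fi
  local dir out
  dir=$(mktemp -d)
  # Write a one-line script that prints the result of 2 + 2.
  printf 'printfn "%%d" (2 + 2)\n' > "$dir/smoke.fsx"
  out=$(fsharpi "$dir/smoke.fsx")
  rm -rf "$dir"
  # Look for a line that is exactly "4", ignoring any other output.
  if printf '%s\n' "$out" | grep -qx '4'; then
    echo "F# OK"
    return 0
  fi
  echo "unexpected output: $out" >&2
  return 1
}
```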

You may think everything is fine, but there is a glitch in the current Mono distribution that you have to fix manually: in some cases libraries and tools (such as the popular Paket dependency manager for F#) need the libMonoPosixHelper.so dynamic library, which is not present in the default LD paths. To fix this, simply link it:

# ln -s /usr/lib64/libMonoPosixHelper.so /usr/lib
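Running `ln -s` a second time fails because the link already exists, so a setup script may prefer an idempotent version. A minimal sketch; `link_posix_helper` is a hypothetical name, the defaults are the paths used above, and the two optional arguments only exist to make it easy to try on scratch directories.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the libMonoPosixHelper.so link only when missing.
# Defaults match the ln -s command above; override them to test elsewhere.
link_posix_helper() {
  local src="${1:-/usr/lib64/libMonoPosixHelper.so}"
  local dst_dir="${2:-/usr/lib}"
  local dst
  dst="$dst_dir/$(basename "$src")"
  if [ ! -e "$src" ]; then
    echo "source library not found: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    # Re-running is harmless: nothing to do if the file or link is there.
    echo "already present: $dst"
    return 0
  fi
  ln -s "$src" "$dst" && echo "linked $dst -> $src"
}
```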

Installing Visual Studio Code

Visual Studio Code is a modern, lightweight editor that makes it easy to program in F# (I could also have used the Atom editor, but running Visual Studio Code and .NET on Linux is clearly more intriguing). You can download the rpm from the Download section of the Visual Studio Code website and install it:

# rpm -i code-1.5.2-1473686317.el7.x86_64.rpm
# code
#
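Because `rpm -i` refuses to install a package that is already present, a re-runnable setup script can check first. A sketch only, assuming the package installed by the rpm is named `code`; `install_vscode_rpm` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install the VS Code rpm only if the "code" package
# is not already installed, so the step can be repeated safely.
install_vscode_rpm() {
  local rpm_file="$1"
  if rpm -q code >/dev/null 2>&1; then
    echo "already installed: $(rpm -q code)"
    return 0
  fi
  if [ ! -f "$rpm_file" ]; then
    echo "rpm file not found: $rpm_file" >&2
    return 1
  fi
  rpm -i "$rpm_file"
}
```

Usage: `install_vscode_rpm code-1.5.2-1473686317.el7.x86_64.rpm`.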

Note that Visual Studio Code requires a graphical display, so I used my favorite X server on Windows, MobaXterm.
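When working over SSH it is easy to forget the X forwarding, and the editor then fails to open a window. A small sketch of a guard, with `launch_code` as a hypothetical helper name; it only checks that `DISPLAY` is set and that `code` is on the PATH before forwarding its arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: refuse to start VS Code when no X display is available.
launch_code() {
  if [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is not set: connect with X11 forwarding (for example ssh -X) first" >&2
    return 1
  fi
  if ! command -v code >/dev/null 2>&1; then
    echo "code is not on PATH" >&2
    return 1
  fi
  # Pass any arguments (files, folders) straight through to the editor.
  code "$@"
}
```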

Finally, install the Ionide extension to enable F# support inside Visual Studio Code.
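The extension can also be installed from the command line instead of the Extensions panel. A sketch, assuming the `code --list-extensions` and `code --install-extension` options and the marketplace identifier `Ionide.Ionide-fsharp`; `install_ionide` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install the Ionide F# extension unless already present.
install_ionide() {
  local ext="Ionide.Ionide-fsharp"
  if ! command -v code >/dev/null 2>&1; then
    echo "code is not on PATH" >&2
    return 1
  fi
  # Extension IDs may be listed in lower case, so match case-insensitively.
  if code --list-extensions | grep -qix "$ext"; then
    echo "$ext already installed"
    return 0
  fi
  code --install-extension "$ext"
}
```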

Installing FSLab

For some of our activities we will use the FSLab package, so that we get interactive chart display and can take a more modern, data-science-oriented approach to using the XeonPhi. FSLab is just a tarball available on the project website: download and extract it, then run the following from the extracted folder:

# mono .paket/paket.bootstrapper.exe
# mono .paket/paket.exe install
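The two commands can be bundled so they always run from the right folder. A minimal sketch; `restore_fslab` is a hypothetical helper name, and it assumes the `.paket` layout used in the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restore FSLab's packages from the extracted folder.
restore_fslab() {
  local dir="${1:-.}"
  if [ ! -f "$dir/.paket/paket.bootstrapper.exe" ]; then
    echo "no .paket/paket.bootstrapper.exe in $dir" >&2
    return 1
  fi
  # The bootstrapper downloads paket.exe; "paket install" then resolves and
  # downloads the packages the template depends on. The subshell keeps the
  # caller's working directory unchanged.
  ( cd "$dir" &&
    mono .paket/paket.bootstrapper.exe &&
    mono .paket/paket.exe install )
}
```

Usage: `restore_fslab ~/FsLab` (the folder name is only an example).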

You can open the folder in Visual Studio Code and play with Tutorial.fsx (select code in the editor and evaluate it by pressing Alt+Enter) to get some beautiful output.

Now everything is set, and we can move on to the first experiments on KNL using .NET and F#.

XeonPhi short background

In 2008 Intel publicly presented a project codenamed Larrabee, whose goal was to define a manycore architecture similar to a GPGPU but built from more flexible x86 cores. The project took years to deliver and changed over time, resurfacing as the XeonPhi architecture. The first iteration of the processor, codenamed Knights Corner (KNC), shipped in a PCIe card form factor as an accelerator for numerically intensive computations. Although mostly x86, the platform was not fully x86 compatible, which limited its scope since x86 programs had to be recompiled. The second iteration of the platform fixed this by implementing the full x86 instruction set. The current system features up to 72 cores organized in 36 tiles, plus a high-bandwidth memory called MCDRAM. Every core has two Vector Processing Units driven by AVX-512 instructions.
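A quick back-of-the-envelope calculation shows how much data parallelism these figures imply, which is one way to think about whether an algorithm is "XeonPhi friendly". This sketch only uses the numbers above (72 cores, 36 tiles, two 512-bit VPUs per core) and counts SIMD lanes; it ignores clock speed, FMA and hyperthreading, so it is not a peak-FLOPS figure.

```bash
#!/usr/bin/env bash
# Count SIMD lanes on a 72-core KNL from the figures in the text.
knl_lanes() {
  local cores=72 tiles=36 vpus_per_core=2 vector_bits=512
  local dp=$(( vector_bits / 64 ))   # 64-bit doubles per 512-bit register: 8
  local sp=$(( vector_bits / 32 ))   # 32-bit floats per 512-bit register: 16
  echo "cores per tile: $(( cores / tiles ))"
  echo "double-precision lanes: $(( cores * vpus_per_core * dp ))"
  echo "single-precision lanes: $(( cores * vpus_per_core * sp ))"
}
knl_lanes
# prints:
# cores per tile: 2
# double-precision lanes: 1152
# single-precision lanes: 2304
```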
