High-throughput disk access through the network

Over the last six months or so we have compared Intel SSD drives with NVMe and SATA interfaces. NVMe (Non-Volatile Memory Express) is a standard protocol for accessing block devices, such as SSDs, directly over PCIe. NVMe SSDs benefit from direct access to the PCIe bus, since on modern systems the disk controller has become the bottleneck of disk access. In three technical reports [1, 2, 3] we compared Intel SSDs with SATA and NVMe interfaces, and the tests clearly show how the same SSD technology performs much better when the disk controller is bypassed.

When testing software-defined RAID with Microsoft Storage Spaces and Intel RSTe, we recorded 9GB/s of bandwidth when aggregating four NVMe drives in a RAID0-equivalent configuration. Since 9GB/s corresponds to 72Gbps, it is natural to wonder how much of this bandwidth remains available when the drives are accessed over a network connection, assuming a NIC with the appropriate bandwidth is available.

The test setup

The test has been performed on a Dell R630 server with four Intel P3600 NVMe SSD drives (1.6TB each). The server runs Windows Server 2012 R2, and Microsoft Storage Spaces is used to aggregate the four drives with the simple option (equivalent to RAID0). A single NTFS volume has been created on the virtual drive. A second Dell R630 server, running Windows Server 2016, has been used to access the virtual drive through a network share using the SMB3 protocol. The two servers are connected to a private LAN using two 100Gbps Mellanox NICs and a Dell Z9100 switch.
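For reference, the following is a minimal sketch of how such a configuration can be assembled with the Storage Spaces and SMB PowerShell cmdlets; the pool, disk, share, and server names are placeholders and not the exact commands used in our setup.

# On the storage server: pool the four NVMe drives and create a simple (striped) virtual disk
$nvme = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "NVMePool" `
    -StorageSubSystemFriendlyName (Get-StorageSubSystem).FriendlyName `
    -PhysicalDisks $nvme
New-VirtualDisk -StoragePoolFriendlyName "NVMePool" -FriendlyName "NVMeSimple" `
    -ResiliencySettingName Simple -NumberOfColumns 4 -UseMaximumSize

# Create a single NTFS volume on the virtual disk and expose it through an SMB share
Get-VirtualDisk -FriendlyName "NVMeSimple" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter D -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "NVMeTest"
New-SmbShare -Name "NVMeTest" -Path "D:\" -FullAccess "Everyone"

# On the second server: map the share to a local drive letter
New-SmbMapping -LocalPath "N:" -RemotePath "\\nvme-server\NVMeTest"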

[Figure: NVMeThroughNet, the test setup]

The test setup is shown in the figure above.

Testing the four drives locally

Disk performance has been measured using the DiskSpd utility. We used PowerShell to automate the test through the script included at the end of the post; a sample invocation is shown right after the list below. The test essentially executes DiskSpd with the following options:

  • Sequential read and write, 128KB block, 56 queues, single thread, 2TB file
  • Random read and write, 4KB block, 56 queues, single thread, 2TB file
  • Sequential read and write, 1MB block, 1 queue, single thread, 2TB file
  • Random read and write, 4KB block, 1 queue, single thread, 2TB file
  • Sequential read, 4MB block, 56 queues, single thread, 2TB file
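As an illustration, the first configuration (sequential read, 128KB blocks, 56 outstanding I/Os, one thread) corresponds roughly to the DiskSpd invocation below; the drive letter and the 10-second duration are just examples, and the complete script is reported at the end of the post.

# Sequential read: 128KB blocks (-b128K), 56 outstanding I/Os (-o56), 1 thread (-t1),
# 2TB test file created by -c2048G, caching disabled (-h), 0% writes (-w0)
C:\Diskspd-v2.0.15\amd64fre\diskspd.exe -c2048G -b128K -d10 -o56 -t1 -W -h -w0 N:\disktest.dat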

We executed the eight tests twice: once with the empty disk, and then with an additional file whose size, combined with the 2TB test file, left only 500MB free. We refer to the tests executed with the disk full as preconditioned; they are meant to verify the performance of the SSDs when nearly full (SSD drives may degrade performance significantly with less than 10% of free space).

The following table shows the recorded bandwidth for reads with 56 queues:

Access pattern   Block size   Bandwidth MB/sec
Random           4KB          403.98
Sequential       128KB        8,787.47
Sequential       4MB          11,294.8

As you may notice, we recorded a maximum of about 11GB/s of bandwidth when reading 4MB blocks sequentially. Measuring bandwidth in bits instead of bytes, this corresponds to 90.36Gbps, a figure that requires a 100Gbps link to avoid a network bottleneck when accessing the volume through a file share.
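As a quick check, the conversion from the measured figure to line rate is just a change of units (decimal units, as used throughout the post):

# Convert the peak measured bandwidth from MB/s to Gbps
$mbPerSec = 11294.8
$gbps = $mbPerSec * 8 / 1000      # 8 bits per byte, 1000 Mbit per Gbit
"{0:N2} Gbps" -f $gbps            # prints 90.36 Gbps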

File share access throughput

We repeated the test (with small changes due to the network drive mapping) on the Windows Server 2016 machine, connected through a 100Gbps fabric built on the multi-rate Dell Z9100 switch. The following table shows the bandwidth measured when accessing the volume through an SMB3 file share over the 100Gbps connection, compared with the results obtained on the locally connected drives:

Access pattern   Block size (queues)   Local MB/sec   Remote MB/sec
Random           4KB                   403.98         449.52
Sequential       128KB (56)            8,787.47       9,078.48
Sequential       4MB (56)              11,294.8       10,975.53

As we can see, the performance recorded when accessing the volume remotely is often better than when accessing it locally. Our explanation of this phenomenon is that the network filesystem driver uses a larger amount of RAM, which probably optimizes the disk operations. The only exception in the table is the highest-throughput case with 4MB blocks, where we observe a small slowdown of the remote throughput compared to the local one. In this case the network bandwidth used is about 88Gbps, which is remarkable given that the I/O is performed across multiple queues, requiring synchronization and some overhead.

Conclusions

When we started this test we were curious to see how much of the disk bandwidth available with NVMe would remain available when accessing the drives through the network. The results have been very impressive, given that the recorded bandwidth is close to the maximum available bandwidth of a PCIe x16 bus. It is our opinion that these numbers will influence how hyperconverged systems are designed once NVMe drive adoption becomes dominant. Currently most servers offer at most four NVMe drives, since each drive uses four PCIe lanes, for a total of 16 PCIe lanes. We expect this limitation to be relaxed by dedicating more PCIe lanes to drives, doubling or tripling the peak bandwidth potentially generated by NVMe drives. Under this assumption it seems almost impossible to push that much data through the network, given that a 100Gbps NIC already requires a PCIe x16 slot.

Test script

# Run a DiskSpd read pass (and optionally a write pass) on a 2TB test file on the given drive
function runTest ([char] $letter, [int] $queues = 32,  [string] $block="128K", [bool] $seq = $false, [bool] $rdonly = $false, [int] $threads = 1, [int] $duration = 10)
{
  # Build the DiskSpd arguments from the function parameters
  $fn = $letter + ":\disktest.dat"
  $bsz = "-b" + $block
  $q = "-o" + $queues
  $sq = ""
  $t = "-t" + $threads
  $d = "-d" + $duration
  if ($seq -eq $false) { $sq = "-r" }
  # Read pass: 0% writes (-w0)
  C:\Diskspd-v2.0.15\amd64fre\diskspd.exe -c2048G $bsz $d $q $t $sq -W -h -w0 $fn
  rm $fn
  if ($rdonly -eq $false) {
    # Write pass: 100% writes (-w100) on the same file
    C:\Diskspd-v2.0.15\amd64fre\diskspd.exe -c2048G $bsz $d $q $t $sq -W -h -w100 $fn
    rm $fn
  }
}

# Fill the volume with a reserve file so that the subsequent 2TB test file leaves almost no free space (preconditioned runs)
function makeReserve ([char] $letter)
{
  $vol = Get-Volume -DriveLetter $letter
  $gig = 1024*1024*1024
  # Create a reserve file that leaves roughly 2TB free, so the 2TB test file later fills the volume almost completely
  $sz = "-c" + ([math]::Truncate(($vol.SizeRemaining - 2048*$gig) / $gig)) + "G"
  C:\Users\cisterni\Desktop\Diskspd-v2.0.15\amd64fre\diskspd.exe $sz -o56 -t1 -b256K "${letter}:\reserve.dat"
}

# Run the four read/write configurations described above on the given volume
function testVolume ([char] $letter)
{
    echo "Test sequential 128K block queues 56..."
    runTest -letter $letter -seq $true -block 128K -queues 56
    echo "Test random 4K block queues 56..."
    runTest -letter $letter -seq $false -block 4K -queues 56
    echo "Test sequential 1M block queues 1..."
    runTest -letter $letter -seq $true -block 1M -queues 1
    echo "Test random 4K block queues 1..."
    runTest -letter $letter -seq $false -block 4K -queues 1
}

# Test the empty volume first, then precondition it (fill it) and test again
rm N:\*
testVolume -letter N > C:\Users\cisterni\Desktop\NetTestLocalOutNoReserve.txt
rm N:\*
makeReserve -letter N
testVolume -letter N > C:\Users\cisterni\Desktop\NetTestLocalOutReserve.txt
rm N:\*

# Peak test: sequential read, 4MB blocks, 56 queues, read only, 20 seconds
runTest -letter N -seq $true -block 4M -queues 56 -rdonly $true -duration 20 > C:\Users\cisterni\Desktop\NetTestPeak.txt
