How to hurt a couple of NetApp filers but not worry about it

So what’s a good way to really pound the hell out of a NetApp R150 NearStore and a FAS960 filer? First, start an ndmpcopy of a 1.4TB volume with 23M files from the FAS960 to the R150 (I’m migrating a mostly finished project to the lower $/GB NearStore). Then, forget that there’s a full backup scheduled via NetBackup, and let a remote NDMP backup of a 3.2TB volume on the R150 start, heading for an FC LTO2 drive attached to the FAS960. Let that (unwittingly) chug away overnight, then check in the morning and wonder why only 700GB of the volume migration has been transferred to the R150. Check the NearStore and see this:

[root@serv01 root]# rsh filer02 sysstat -u 2
 CPU   Total    Net kB/s    Disk kB/s    Tape kB/s Cache Cache  CP  CP Disk
       ops/s    in   out   read  write  read write   age   hit time ty util
100%       0 30074 47207  45228  37827     0     0   >60  100% 100%  :  72%
100%       0 30362 46322  43137  31095     0     0   >60  100%  80%  F  58%
 97%       0 28955 44519  41400  39493     0     0   >60  100% 100%  :  76%
100%       0 35919 46019  43414  42398     0     0   >60  100%  90%  F  69%
100%       0 33374 45894  42749  49657     0     0   >60  100% 100%  :  87%

Oh! Damn! It’s doing a backup and a restore at the same time! Doh!
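To put rough numbers on that, here is a minimal Python sketch that averages the network and disk columns from the sysstat rows pasted above. The function names and the column positions it relies on (net in, net out, disk read, disk write as the third through sixth fields) are my own assumptions based on the header shown here, not anything NetApp ships. My reading is that net in and disk write are mostly the ndmpcopy landing on the R150, while disk read and net out are the backup streaming toward the tape drive on the FAS960, but treat that split as a guess.

```python
# Sample rows copied from the sysstat output above (values are kB/s).
SAMPLE = """\
100%       0 30074 47207  45228  37827     0     0   >60  100% 100%  :  72%
100%       0 30362 46322  43137  31095     0     0   >60  100%  80%  F  58%
 97%       0 28955 44519  41400  39493     0     0   >60  100% 100%  :  76%
100%       0 35919 46019  43414  42398     0     0   >60  100%  90%  F  69%
100%       0 33374 45894  42749  49657     0     0   >60  100% 100%  :  87%
"""

# Assumed column order, taken from the header lines in the post.
FIELDS = ("net_in", "net_out", "disk_read", "disk_write")


def parse_rows(text):
    """Return one dict of kB/s counters per sysstat data row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a CPU percentage; skip headers and blanks.
        if len(parts) < 6 or not parts[0].endswith("%"):
            continue
        # Fields 2..5 are net in, net out, disk read, disk write.
        rows.append(dict(zip(FIELDS, map(int, parts[2:6]))))
    return rows


def mean_mb_per_sec(rows):
    """Average each counter over all rows, converted to MB/s (1 MB = 1000 kB)."""
    return {f: sum(r[f] for r in rows) / len(rows) / 1000 for f in FIELDS}


if __name__ == "__main__":
    for name, value in mean_mb_per_sec(parse_rows(SAMPLE)).items():
        print(f"{name:>10}: {value:5.1f} MB/s")
```

On these five samples that works out to roughly 32MB/sec coming in and 46MB/sec going out over the network, which is why the NearStore's CPU is pinned at 100%.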

Then, for good measure, notice that someone has unleashed a cluster job that’s reading some human-genome-related files off the FAS960 at an aggregate rate of 220MB/sec. So the filers are moving close to 400MB/sec aggregate right now and are at or near their maximum throughput! Oh no! System meltdown?! Not really.

This explosion of activity isn’t actually a problem, believe it or not. The cluster NFS traffic is automatically rate-limited to 220MB/sec before it enters the 4506 core switches, where it’s then given a lower priority than NFS traffic from other networks. That leaves the FAS960 enough bandwidth to do the backups at close to full speed and still answer NFS requests from other hosts without feeling too sluggish. The NearStore is pegged, but that’s OK. The clients mount the R150 volumes through a C6100 DNFS cache, so they don’t even notice it being slow. The ndmpcopy will still finish fast enough for me to complete the migration later today.

I love the NAS concept simply because it lets me use so many of the tools in my IP toolkit: QoS policies, security ACLs, protocol proxying/caching, even WAN routing of our storage resources to collaborating organisations in other provinces. Not to mention IP/Ethernet’s excellent cost per port. This sort of control and flexibility would be very complex and difficult, if not impossible, to pull off on a traditional SAN. iSCSI and its SCSI over IP over Ethernet approach will bring a lot of these advantages back to block-level storage, but I still really dig network filesystems.

