This week I spent some time reading about Microsoft Azure Stack, and one of the things that caught my attention was the Storage Spaces Direct (S2D) technology. At first glance, it looks like a simple software-defined storage solution, but when I dug deeper, I found that it has a good range of resiliency and efficiency features. Microsoft introduced Storage Spaces Direct in Windows Server 2016, building on the Storage Spaces feature originally introduced in Windows Server 2012. Storage Spaces Direct is a software-defined storage offering that can be used in both converged and hyper-converged infrastructures; in fact, S2D is used for the storage layer in Microsoft Azure Stack.
Storage Spaces Direct is based on a software-defined, shared-nothing architecture that uses industry-standard x86 servers with internal drives. S2D doesn't have any complex networking requirements; it uses standard Ethernet (minimum 10 GbE) as the backbone. All you need to do is install Windows Server 2016 on the servers and create a Failover Cluster. Once the failover cluster is set up, you can enable the S2D feature. S2D will then automatically recognize all the drives present on each server and create a software-defined storage pool, at which point you can start carving out volumes. You can either expose these volumes to other hosts over the SMB3 protocol, or enable Hyper-V on the local hosts and start spinning up VMs on the Cluster Shared Volumes (CSVs), making it a hyper-converged system.

You can have 2 to 16 physical servers in the failover cluster. S2D supports a maximum of 400 drives per cluster and, depending on the capacity of those drives, can seamlessly support up to 1 PB of storage per cluster. Each server needs at least 2 cache drives and 4 capacity drives, and the drives can be SATA, SAS, or NVMe. S2D automatically picks the fastest drives as the cache layer and assigns the remaining drives to the capacity tier. The total capacity of the drives in the capacity tier forms the total available storage in the cluster; the cache drives contribute no usable capacity, because everything in the cache is, or eventually will be, present in the capacity tier. To scale out the cluster, you add more servers (up to 16); to scale up, you simply add drives to the existing servers. It is highly recommended that every server in the cluster has the same number of drives. S2D is intelligent enough to automatically rebalance data across the pool when you scale up or scale out.
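The automatic tiering described above can be reasoned about with a small sketch: rank the media types by speed, promote the fastest type present to cache, and leave the rest as capacity. This is a hypothetical model for intuition only, not S2D's actual implementation; all names here are my own.

```python
# Illustrative model of S2D's automatic drive classification.
# Lower rank = faster media. These names are assumptions for the sketch.
SPEED_RANK = {"NVMe": 0, "SSD": 1, "HDD": 2}

def classify_drives(drives):
    """Split a server's drives into (cache, capacity) tiers by media speed."""
    fastest = min(SPEED_RANK[d] for d in drives)
    cache = [d for d in drives if SPEED_RANK[d] == fastest]
    capacity = [d for d in drives if SPEED_RANK[d] != fastest]
    # If every drive is the same media type (e.g. all SSD), there is no
    # faster tier to promote, so nothing becomes cache automatically.
    if not capacity:
        return [], cache
    return cache, capacity

cache, capacity = classify_drives(["NVMe", "NVMe", "SSD", "SSD", "SSD", "SSD"])
print(cache)     # ['NVMe', 'NVMe']
print(capacity)  # ['SSD', 'SSD', 'SSD', 'SSD']
```

Note how the single-media case falls out naturally: with an all-SSD server there is no automatic cache tier, which matches the manual-assignment behavior discussed in the all-flash section below.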
Storage Spaces Direct supports two disk configuration options:
- Hybrid Configuration: Here you have a mixture of HDDs and either SSDs or NVMe drives in the cluster. The following are the possible combinations for a hybrid configuration:
As you can see, S2D automatically picks the fastest drives for the cache tier. In a hybrid configuration, S2D uses the cache tier for caching both reads and writes coming into the cluster, so you get the highest performance in both directions.
- All-Flash Configuration: Here you can have all SSDs, all NVMe drives or a mixture of both. The following are the possible combinations for an All-Flash configuration:
In the second configuration, S2D chose the NVMe drives for the cache tier because they are faster than SATA/SAS SSDs. In the first and third scenarios, where all drives are the same media type, you have to manually assign the drives you want in the cache tier. In all-flash configurations, S2D only uses the cache tier for caching writes; all read I/O is served directly from the capacity tier.
In both the hybrid and all-flash configurations, the cache tier absorbs the random I/O from all the servers and lays it out sequentially before writing it to the capacity tier. Each drive in the cache tier automatically binds to multiple drives in the capacity tier, and for best performance, the number of capacity drives should be a multiple of the number of cache drives so the bindings are symmetric. For example, if you have 2 cache SSDs, use 4, 6, or 8 capacity drives, not 5 or 7, so that each cache drive serves an equal share of the capacity drives.
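That sizing rule is just divisibility, and it can be made concrete with a few lines. A minimal sketch with an illustrative function name of my own choosing:

```python
def capacity_per_cache(cache_drives: int, capacity_drives: int) -> int:
    """Return how many capacity drives bind to each cache drive,
    raising if the split would be uneven (the configuration S2D's
    sizing guidance tells you to avoid)."""
    if capacity_drives % cache_drives != 0:
        raise ValueError(
            f"{capacity_drives} capacity drives do not divide evenly "
            f"across {cache_drives} cache drives"
        )
    return capacity_drives // cache_drives

print(capacity_per_cache(2, 8))   # 4 capacity drives behind each cache SSD
# capacity_per_cache(2, 7) raises ValueError: the bindings would be lopsided
```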
All this sounds good, but what about resiliency? What happens to my data when a single drive, a single server, or even a whole rack goes down? To handle such scenarios, S2D offers multiple fault tolerance options to choose from:
- Mirroring: This means that S2D stores multiple complete copies of your data in the cluster. You can choose to store two or three copies based on your requirements.
- Two-Way Mirroring: In this scenario, S2D stores two copies of your data on drives in two different servers, so if one server goes down, you still have a copy of your data. Microsoft recommends using this only when you have just 2 servers in your cluster; if you have more than 2, use the next option.
- Three-Way Mirroring: In this scenario, S2D stores three copies of your data, making it resilient to two simultaneous server failures. This sounds perfect from a data protection point of view, but it requires three times the raw capacity: to store 1 TB of data, you need 3 TB of capacity in your cluster.
- Erasure Coding: Since mirroring requires a lot of additional storage for resiliency, S2D has another option that uses parity encoding to provide fault tolerance. The two erasure coding options are single parity and dual parity. Single parity reserves the equivalent of one disk for parity information and tolerates one failure; dual parity reserves the equivalent of two disks and tolerates two failures. You can compare single and dual parity to RAID 5 and RAID 6 respectively. With only 4 disks, you might think this isn't offering any better capacity efficiency than mirroring, but once you have more than 7 disks, the capacity efficiency crosses 66.7%.
- Mirror-Accelerated Parity: In my opinion, this is the best of both worlds: a capacity-efficient yet performance-sensitive option. Each volume of this type has a mirror tier and a parity tier. All writes land in the mirror tier first, so you get the best performance, and as the mirror tier fills up, the data is aggregated and destaged to the parity tier. The parity calculations are done in real time.
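The capacity math behind these resiliency options reduces to two standard formulas: 1/copies for mirroring, and (n - p)/n for parity across n disks with p parity disks. A quick back-of-the-envelope sketch (real S2D efficiency also depends on drive counts and volume layout):

```python
def mirror_efficiency(copies: int) -> float:
    """Usable fraction of raw capacity when storing `copies` full copies."""
    return 1 / copies

def parity_efficiency(total_disks: int, parity_disks: int) -> float:
    """Usable fraction of raw capacity with parity erasure coding."""
    return (total_disks - parity_disks) / total_disks

print(f"two-way mirror      : {mirror_efficiency(2):.1%}")      # 50.0%
print(f"three-way mirror    : {mirror_efficiency(3):.1%}")      # 33.3%
print(f"dual parity, 4 disks: {parity_efficiency(4, 2):.1%}")   # 50.0%
print(f"dual parity, 7 disks: {parity_efficiency(7, 2):.1%}")   # 71.4%
```

This makes the trade-off in the erasure coding bullet concrete: at 4 disks, dual parity is no more efficient than a two-way mirror, but by 7 disks it is well past the 66.7% mark while still tolerating two failures.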
Great, you made it past the caching and fault tolerance sections of this blog. Now let's quickly look at the storage efficiency and security features:
- Deduplication and Compression: Industry-standard deduplication and compression algorithms are applied in near-real time to all incoming data, in both the hybrid and all-flash configurations. Dedupe is opportunistic and doesn't get in the way of essential I/O, but with mirror-accelerated parity volumes, dedupe and parity calculations are always done inline.
- File Integrity Checksums: This enables checksums on user data. You can configure a periodic background scrubber that validates the checksums, and if bit rot has occurred and a checksum doesn't match, S2D performs a corrective write to fix the corrupted data.
- Encryption: S2D offers both at-rest and in-transit encryption, using BitLocker for data at rest and SMB encryption for data in transit.
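The scrub-and-repair idea behind integrity checksums can be sketched with a toy model: store a checksum alongside each block, and have a background pass re-verify blocks, repairing any mismatch from a healthy mirror copy. This is purely conceptual (using CRC32 for brevity), not how S2D/ReFS actually implements integrity streams:

```python
import zlib

def store(data: bytes) -> dict:
    """Store a data block together with its checksum."""
    return {"data": bytearray(data), "checksum": zlib.crc32(data)}

def scrub(block: dict, healthy_copy: dict) -> str:
    """Verify a block's checksum; on mismatch, rewrite from the healthy copy."""
    if zlib.crc32(bytes(block["data"])) == block["checksum"]:
        return "ok"
    block["data"] = bytearray(healthy_copy["data"])
    block["checksum"] = healthy_copy["checksum"]
    return "repaired"

primary = store(b"hello world")
mirror = store(b"hello world")
primary["data"][0] ^= 0xFF        # simulate bit rot on the primary copy
print(scrub(primary, mirror))     # repaired
print(scrub(primary, mirror))     # ok
```

The key point the sketch captures is that checksums only detect corruption; the resiliency layer (mirror or parity) is what supplies the known-good data for the corrective write.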
S2D volumes can be managed using Project Honolulu or Windows PowerShell cmdlets; I am planning to write another blog post on how to do that. Thank you for making it all the way through, I know it has been a long one. If you want to learn more, check out the following links: