Thursday, November 19, 2009

No More Cloud Servers - Think Racks and Containers

I just read a very nice post on the profile for a cloud server by Ernest de Leon, the Silicon Whisperer. Here is the opening paragraph:

"With the massive push toward cloud computing in the enterprise, there are some considerations that hardware vendors will have to come to terms with in the long run. Unlike the old infrastructure model with hardware bearing the brunt of fault tolerance, the new infrastructure model places all fault tolerance concerns within the software layer itself. I won’t say that this is a new concept as Google has been doing exactly this for a very long time (in IT time at least.) This accomplishes many things, but two particular benefits are that load balancing can now be more intelligent overall, and hardware can be reduced to the absolute most commodity parts available to cut cost."

I'm on board in a big way with this message until Ernest starts talking about the steps that are taken at failure:

"When there is a failure of a component or an entire cloud server, the Cloud Software Layer can notify system administrators. Replacement is as simple as unplugging the bad server and plugging in a new one. The server will auto provision itself into the resource pool and it’s ready to go. Management and maintenance are simplified greatly."


And I think to myself that there is no way we can operate at cloud scale if we continue to think about racking and plugging servers. If we really want to lower the cost of operational management, which is a big part of the appeal of cloud, we have to start thinking about the innovations that should happen throughout the supply chain.

Commodity parts are great, but I want commodity assembly, shipping, and handling costs as well. The innovations in cloud hardware will be packaging and supply chain innovations. I want to buy a rack of pre-networked systems with a simple interface for hooking up power and network and good mobility associated with the rack itself (i.e. roll it into place, lock it down, roll it out at end of life). Maybe I even want to buy a container with similar properties. And when a system fails, it is powered down remotely and no one even thinks about trying to find it in the rack to replace it. It is dead in the rack/container until the container is unplugged and removed from the datacenter and sent back to the supplier for refurb and salvage.

Let's use "cloud" as the excuse to get crazy for efficiency around datacenter operations. I agree with Ernest that the craziness for efficiency with netbooks has led to a great outcome, but let's think crazy at real operating scale. No more hands on servers. No more endless cardboard, tape, staples, and styrofoam packaging. No more lugging a server under each arm through the datacenter and tripping and dropping the servers and falling into a rack and disconnecting half the systems from the network. My cloud server is a rack or a container that eliminates all this junk.

Monday, November 9, 2009

Virtualization is not Cloud

After spending the early part of last week at the Cloud Computing Expo, which is now co-located with the Virtualization Conference and Expo, I feel compelled to proclaim that virtualization is not cloud. Nor does virtualization qualify for the moniker of IaaS. If virtualization was cloud/IaaS, there would not be so much industry hubbub surrounding Amazon's EC2 offering. Nor would Amazon be able to grow the EC2 service so quickly because the market would be full of competitors offering the same thing. Cloud/IaaS goes beyond virtualization by providing extra services for dynamically allocating infrastructure resources to match the peaks and valleys of application demand.

Virtualization is certainly a valuable first step in the move to cloud/IaaS, but it only provides a static re-configuration of workloads to consume fewer host resources. After going P2V, you have basically re-mapped your static physical configuration onto a virtualization layer – isolating applications inside VMs for stability while stacking multiple VMs on a single machine for higher utilization. Instead of physical machines idling away on the network, you now have virtual machines idling away, but on fewer hosts.

To transform virtual infrastructure to IaaS, you need, at a minimum, the following:

Self Service API – if an application owner needs to call a system owner to gain access to infrastructure resources, you do not have a cloud. Upon presentation of qualified, valid credentials, the infrastructure should give up the resources to the user.

Resource Metering and Accountability – if the infrastructure cannot measure the resources consumed by a set of application services belonging to a particular user, and if it cannot hold that user accountable for some form of payment (whether currency exchange or internal charge-back), you do not have a cloud. When users are charged based upon the value consumed, they will behave in a manner that more closely aligns consumption with actual demand. They will only be wasteful when wasting system resources is the only way to avoid wasting time, which leads us to our next cloud attribute:

Application Image Management – if there is no mechanism for developing and configuring the application offline and then uploading a ready-to-run image onto the infrastructure at the moment demand arises, you do not have a IaaS cloud. Loading a standard OS template that does not reflect the configuration of the application and the OS combination used for testing and development prevents rapid application scaling because configuration can cost the owner hours/weeks/days/months of runtime cycles. Too much latency associated with setup results in over-allocation of resources in order to be responsive to demand (i.e. no one takes down running images even with slack demand because getting the capacity back is too slow). See my post on Single Minute Exchange of Applications.

Network Policy Enforcement – if an application owner cannot allow or deny network access to their virtual machine images without involving a network administrator or being constrained to a particular subset of infrastructure systems, you do not have a cloud. This requirement is related to the Self Service API and it also speaks to the requirement for low latency in application setup in order to be dynamic in meeting application demand fluctuations. True clouds provide unrestricted multi-tenancy (and therefore higher utilization) without compromising compliance policies that mandate network isolation for certain types of workloads

There maybe other requirements that I have missed, but in my mind, this is the minimum set. Any lesser implementation leads to the poor outcomes that are currently the bane of IT existence – VM Sprawl and Rogue Cloud Deployments.

VM sprawl results from lack of accountability for resource consumption (or over-consumption), and it also results from inefficient setup and configuration capabilities. If I don't have to pay for over-consumption, or if I cannot respond quickly to demand, I am not going to bother with giving back resources for others to use.

Rogue cloud deployments, unsanctioned by IT, to Amazon or other service providers result from lack of self service or high latency in requests for network or system resource configuration. Getting infrastructure administrators involved in every resource transaction discourages the type of dynamic changes that should occur based upon the fluctuations in application demand. People give up on the internal process because Amazon has figured it out.

True clouds go beyond virtualization by providing the necessary services to transform static virtual infrastructure to dynamic cloud capacity. These extra services eliminate the friction and latency between the demand signal and the supply response. The result is a much more efficient market for matching application demand with infrastructure supply.