Thursday, November 19, 2009

No More Cloud Servers - Think Racks and Containers

I just read a very nice post on the profile for a cloud server by Ernest de Leon, the Silicon Whisperer. Here is the opening paragraph:

"With the massive push toward cloud computing in the enterprise, there are some considerations that hardware vendors will have to come to terms with in the long run. Unlike the old infrastructure model with hardware bearing the brunt of fault tolerance, the new infrastructure model places all fault tolerance concerns within the software layer itself. I won’t say that this is a new concept as Google has been doing exactly this for a very long time (in IT time at least.) This accomplishes many things, but two particular benefits are that load balancing can now be more intelligent overall, and hardware can be reduced to the absolute most commodity parts available to cut cost."

I'm on board in a big way with this message until Ernest starts talking about the steps taken when a failure occurs:

"When there is a failure of a component or an entire cloud server, the Cloud Software Layer can notify system administrators. Replacement is as simple as unplugging the bad server and plugging in a new one. The server will auto provision itself into the resource pool and it’s ready to go. Management and maintenance are simplified greatly."


And I think to myself that there is no way we can operate at cloud scale if we continue to think about racking and plugging servers. If we really want to lower the cost of operational management, which is a big part of the appeal of cloud, we have to start thinking about the innovations that should happen throughout the supply chain.

Commodity parts are great, but I want commodity assembly, shipping, and handling costs as well. The innovations in cloud hardware will be packaging and supply chain innovations. I want to buy a rack of pre-networked systems with a simple interface for hooking up power and network and good mobility associated with the rack itself (i.e. roll it into place, lock it down, roll it out at end of life). Maybe I even want to buy a container with similar properties. And when a system fails, it is powered down remotely and no one even thinks about trying to find it in the rack to replace it. It is dead in the rack/container until the container is unplugged and removed from the datacenter and sent back to the supplier for refurb and salvage.

Let's use "cloud" as the excuse to get crazy for efficiency around datacenter operations. I agree with Ernest that the craziness for efficiency with netbooks has led to a great outcome, but let's think crazy at real operating scale. No more hands on servers. No more endless cardboard, tape, staples, and styrofoam packaging. No more lugging a server under each arm through the datacenter and tripping and dropping the servers and falling into a rack and disconnecting half the systems from the network. My cloud server is a rack or a container that eliminates all this junk.

Monday, November 9, 2009

Virtualization is not Cloud

After spending the early part of last week at the Cloud Computing Expo, which is now co-located with the Virtualization Conference and Expo, I feel compelled to proclaim that virtualization is not cloud. Nor does virtualization qualify for the moniker of IaaS. If virtualization were cloud/IaaS, there would not be so much industry hubbub surrounding Amazon's EC2 offering. Nor would Amazon be able to grow the EC2 service so quickly because the market would be full of competitors offering the same thing. Cloud/IaaS goes beyond virtualization by providing extra services for dynamically allocating infrastructure resources to match the peaks and valleys of application demand.

Virtualization is certainly a valuable first step in the move to cloud/IaaS, but it only provides a static re-configuration of workloads to consume fewer host resources. After going P2V, you have basically re-mapped your static physical configuration onto a virtualization layer – isolating applications inside VMs for stability while stacking multiple VMs on a single machine for higher utilization. Instead of physical machines idling away on the network, you now have virtual machines idling away, but on fewer hosts.

To transform virtual infrastructure to IaaS, you need, at a minimum, the following:

Self Service API – if an application owner needs to call a system owner to gain access to infrastructure resources, you do not have a cloud. Upon presentation of qualified, valid credentials, the infrastructure should give up the resources to the user (a minimal sketch of such a call appears after this list).

Resource Metering and Accountability – if the infrastructure cannot measure the resources consumed by a set of application services belonging to a particular user, and if it cannot hold that user accountable for some form of payment (whether currency exchange or internal charge-back), you do not have a cloud. When users are charged based upon the value consumed, they will behave in a manner that more closely aligns consumption with actual demand. They will only be wasteful when wasting system resources is the only way to avoid wasting time, which leads us to our next cloud attribute:

Application Image Management – if there is no mechanism for developing and configuring the application offline and then uploading a ready-to-run image onto the infrastructure at the moment demand arises, you do not have an IaaS cloud. Loading a standard OS template that does not reflect the application and OS combination used for testing and development prevents rapid application scaling, because configuration can cost the owner hours/days/weeks/months of runtime cycles. Too much latency associated with setup results in over-allocation of resources in order to be responsive to demand (i.e. no one takes down running images even with slack demand because getting the capacity back is too slow). See my post on Single Minute Exchange of Applications.

Network Policy Enforcement – if an application owner cannot allow or deny network access to their virtual machine images without involving a network administrator or being constrained to a particular subset of infrastructure systems, you do not have a cloud. This requirement is related to the Self Service API, and it also speaks to the requirement for low latency in application setup in order to be dynamic in meeting application demand fluctuations. True clouds provide unrestricted multi-tenancy (and therefore higher utilization) without compromising compliance policies that mandate network isolation for certain types of workloads.
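
To make the first requirement concrete, here is a minimal sketch of what a self-service provisioning call might look like. The endpoint, token, and response fields are hypothetical placeholders, not any vendor's actual API; the point is only that valid credentials plus an API call should be all it takes to obtain resources.

```python
# A minimal sketch of a self-service provisioning request, assuming a
# hypothetical IaaS endpoint (https://cloud.example.com/api) and token-based
# credentials. Names and fields are illustrative, not any vendor's real API.
import requests

API = "https://cloud.example.com/api"
TOKEN = "user-credential-token"  # qualified, valid credentials

def provision(image_id: str, instance_count: int) -> list:
    """Ask the infrastructure for resources; no human system owner involved."""
    resp = requests.post(
        f"{API}/instances",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"image": image_id, "count": instance_count},
        timeout=30,
    )
    resp.raise_for_status()
    # The infrastructure answers with the identifiers of the running instances.
    return [inst["id"] for inst in resp.json()["instances"]]

if __name__ == "__main__":
    print(provision("app-image-2009-11", 3))
```

Everything the sketch does could equally be a command-line call; what matters is that no human system owner sits in the request path.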

There may be other requirements that I have missed, but in my mind, this is the minimum set. Any lesser implementation leads to the poor outcomes that are currently the bane of IT existence – VM Sprawl and Rogue Cloud Deployments.

VM sprawl results from lack of accountability for resource consumption (or over-consumption), and it also results from inefficient setup and configuration capabilities. If I don't have to pay for over-consumption, or if I cannot respond quickly to demand, I am not going to bother with giving back resources for others to use.

Rogue cloud deployments to Amazon or other service providers, unsanctioned by IT, result from a lack of self service or from high latency in requests for network or system resource configuration. Getting infrastructure administrators involved in every resource transaction discourages the type of dynamic changes that should occur based upon the fluctuations in application demand. People give up on the internal process because Amazon has figured it out.

True clouds go beyond virtualization by providing the necessary services to transform static virtual infrastructure to dynamic cloud capacity. These extra services eliminate the friction and latency between the demand signal and the supply response. The result is a much more efficient market for matching application demand with infrastructure supply.

Thursday, October 22, 2009

EC2 Value Shines at Amazon

On the heels of Randy Bias' excellent analysis of the market adoption of EC2 (well, three weeks later, but I only read it this week), I thought I would publish the findings of the survey that we conducted on AWS value. While we do not have a huge sample size for the survey (24 responses), I do believe the answers provide some insight into the terrific uptake that Randy describes.

The large majority of respondents (92%) identified themselves as providers of some type of technology application as opposed to enterprise users. I think this mostly reflects these folks' friendliness to survey requests relative to enterprise users – not a lack of enterprise consumption of AWS. Those in the market with services are more likely to answer surveys due to their empathy for the pursuit of market information. Enterprise users typically have little empathy for the pursuit of information that makes them easier targets for marketers such as myself.

A majority identified themselves as senior management in their organizations (61%), with 9% claiming middle management and the remaining 30% “breaking their back for the man.” Interestingly, the distribution of the AWS experience curve was not as skewed to the near term as I would have expected. Typically, for a hot, new service, you would expect the majority of the users to be early in their experiences. I consider early to be 6 months to a year. For our respondents, 50% had been using the service for more than a year, with the remaining 50% split 10/20/20 at the 3 month, 6 month, and 12 month experience intervals. I would have anticipated a curve more skewed to 3 to 6 months.

The most popular service of the five we surveyed is S3 (92%), with EC2 (88%) just behind. EBS trailed at 58%, with SimpleDB and SQS bringing up the rear at 17% and 21%, respectively. Since every EC2 user must use S3, I find the popularity of EC2 to be the most interesting, but not surprising, finding in the survey. It supports Randy's analysis, and it reflects the market generally. The amount of compute cycles sold in the form of hardware on a dollar value basis far outstrips the amount of storage sold on a dollar basis. Also, while there are several storage offerings in the market comparable to S3 in functionality, few providers have cracked the code on providing compute cycles via a web API with hourly granularity and massive scale in the manner of EC2. EC2 is delivering the big value to Amazon within the AWS portfolio.

To summarize the rest of the responses, scalability was the most important competitive feature, followed closely by low cost and pay-as-you-go pricing. Content delivery applications were the most popular workload, with no clear-cut number two coming close. Users are spending between $100 and $1000 per month, and two-thirds (67%) plan to add more workloads in the future. Many would like to see an API-compatible AWS capability running on other networks ranging from “my laptop” to their private network to service providers that might be competitive with Amazon. Check out the details for yourself.

My bottom line on this and other indicators in the market is that Amazon's approach to IaaS is effectively the modern datacenter architecture. The market growth of cloud services for compute, storage, messaging, database, etc. will largely reflect the current market for those capabilities as represented by respective sales of existing hardware and licensed solutions. But the availability of these lower friction “virtual” versions of hardware and licensed software will dramatically increase the total market for technology by eliminating the hidden, but real, costs of procurement inefficiencies. When services similar to EC2 run on every corporate and service provider network, we will have more computing and more value from computing. And the world will be a better place.

Tuesday, September 8, 2009

Cloud Attraction Survey

I labeled this blog The Cloud Option because of my belief that the best reason to build an application with a cloud architecture is to manage application demand risk. The cloud option allows you to align application demand with infrastructure supply to protect against a demand forecast that is certain to be inaccurate. While I believe my value hypothesis will prove to be correct in the long term, I think there may be far more basic attractions associated with the near term value driving cloud demand.

With that in mind, I have built a short (12 question) survey that I offer to AWS users (as I believe AWS represents by far the most successful, high-demand implementation of cloud computing to date). If you are a current AWS consumer, please take 2 minutes to fill out the survey. I'll post the results on the blog in a week or two. Thanks for helping me assign value to the cloud option!

Wednesday, September 2, 2009

Latest Gmail Outage Again Fuels Cloud Computing Luddites

By Steve Bobrowski

What's a Luddite? I remember looking this up once, and here's one of the definitions I found at Webster's:



Change scares people, making them feel uncomfortable and uneasy for many reasons. But in the world of technology, we must embrace change, for it is inevitable and fast-paced. And IT change usually happens for the better, not the worse.

In this context, my news wires on cloud computing today have been flooded with countless stories written by a bunch of, well, Luddites, all insisting that the latest Gmail outage is proof that system outages threaten the cloud computing paradigm shift. Really?

The truth is that system outages are a fact of life. We all hear about the public system outages, but we rarely hear about those that occur behind the firewall. Rather than trouncing cloud services when they go down, shouldn't the focus be on how long it takes them to return to service, and on how the impact compares to similar on-premise outages?

For example, when was the last time that your organization's email system went down? Did your IT department have the training, staff, and resources to quickly identify the problem and then fix it? My personal recollection of an internal email system outage: lots of squabbles and finger-pointing among the parties involved, all leading to two days without email and some lost messages. Never mind the ripple effects of lost time and money while our IT staff needed to suspend work on other internal projects.

Google's team of highly specialized administrators took less than two hours to fix things, and I didn't lose any of my mail! In my experience, that's outstanding service. Furthermore, fixing the outage did not require work from any of my company's in-house resources, which were free to continue being productive on internal projects that lead to revenue generation.

Should your organization worry about relying on a cloud application or cloud platform? The answer is simple—applications should reside where it makes the most sense. In some cases, cloud wins, in others, data center wins. But the trend is undeniable that more and more enterprises are outsourcing common business applications such as email and CRM to the cloud because it provides their workers with a better service and more time to work on core business functions.

In summary, don't let the Luddites scare you—the inevitable world of utility-based computing will improve the enterprise's standard of living in many cases, not the opposite, as they would have you believe.

Thursday, August 27, 2009

Amazon Aims for Enterprises - Poo Poos Internal Clouds

Amazon's announcement yesterday regarding an enterprise feature for linking existing datacenter operations to Amazon's AWS via a Virtual Private Network feature did not surprise me. It is an obvious extension of their value proposition, and folks had already been accomplishing a similar capability with work-arounds that were simply a bit more cumbersome than Amazon's integrated approach. The more surprising piece of news, in my opinion, is the subtle ratcheting up of the rhetoric by Amazon regarding their disdain for the notion of “internal” cloud. Werner Vogels' blog post explaining the rationale for the new VPN features is a case in point. Here are a few tasty excerpts:

Private Cloud is not the Cloud

These CIOs know that what is sometimes dubbed "private [internal] cloud" does not meet their goal as it does not give them the benefits of the cloud: true elasticity and capex elimination. Virtualization and increased automation may give them some improvements in utilization, but they would still be holding the capital, and the operational cost would still be significantly higher. . . .

What are called private [internal] clouds have little of these benefits and as such, I don't think of them as true clouds. . .

[Cloud benefits are]

* Eliminates Cost. The cloud changes capital expense to variable expense and lowers operating costs. The utility-based pricing model of the cloud combined with its on-demand access to resources eliminates the needs for capital investments in IT Infrastructure. And because resources can be released when no longer needed, effective utilization rises dramatically and our customers see a significant reduction in operational costs.

* Is Elastic. The ready access to vast cloud resources eliminates the need for complex procurement cycles, improving the time-to-market for its users. Many organizations have deployment cycles that are counted in weeks or months, while cloud resources such as Amazon EC2 only take minutes to deploy. The scalability of the cloud no longer forces designers and architects to think in resource-constrained ways and they can now pursue opportunities without having to worry how to grow their infrastructure if their product becomes successful.

* Removes Undifferentiated "Heavy Lifting." The cloud let its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure. Over time Amazon has invested over $2B in developing technologies that could deliver security, reliability and performance at tremendous scale and at low cost. Our teams have created a culture of operational excellence that power some of the world's largest distributed systems. All of this expertise is instantly available to customers through the AWS services.

Elasticity is one of the fundamental properties of the cloud that drives many of its benefits. While virtualization has tremendous benefits to the enterprise, certainly as an important tool in server consolidation, it by itself is not sufficient to give the benefits of the cloud. To achieve true cloud-like elasticity in a private cloud, such that you can rapidly scale up and down in your own datacenter, will require you to allocate significant hardware capacity. While to your internal customers it may appear that they have increased efficiency, at the company level you still own all the capital expense of the IT infrastructure. Without the diversity and heterogeneity of the large number of AWS cloud customers to drive a high utilization level, it can never be a cost-effective solution.


OK. Let's examine Werner's sales proposition without the pressure to sell anything (as I am not currently trying to sell anyone anything). Clearly, Amazon is now attacking the vendors such as VMware that seem intent on attacking them by proclaiming that Amazon cannot give you enterprise features. Not only is Amazon delivering features targeted at the enterprise, but they are also scaling up the war of words by poo pooing the value proposition of these classic vendors – namely the notion of an internal cloud. Werner makes two assertions in dissing internal clouds:

First, he asserts that an internal cloud is not elastic. Well, why not? Just because your IT department has historically been labeled the NO department doesn't mean that it always must be that way. Indeed, the very pressure of Amazon providing the terrific services they provide without the mind-numbing procurement and deployment friction of your IT department is going to lead to massive changes on the part of IT. They are going to virtualize, provide self provisioning tools, and more closely align business application chargebacks to actual application usage. If the application owners are thoughtful about their architecture, they will be able to scale up and scale back based upon the realities of demand, and their IT transfer costs will reflect their thoughtfulness. Other business units will benefit from the release of resources, and server hoarding will be a thing of the past. All this is not to say that an IT department should “own” every bit of compute capacity they use. They don't. They won't. And there will probably be an increasing shift toward owning less.

But Werner claims that ownership is generally a bad thing in his second assertion that capex is bad and opex is good. Werner writes that cloud eliminates costs by eliminating capital spending. Well, it might - depending on the scenario. But his insinuation that capex is bad and opex is good is silliness. They are simply different, and the measurement that any enterprise must take is one relating to risk of demand and cost of capital. For a capital constrained startup with high risk associated with application demand, laying out precious capital for a high demand scenario in the face of potential demand failure makes no sense at all. However, for a cash rich bank with years of operating history relative to the transaction processing needs associated with servicing customer accounts, transferring this burden from capital expense to operating expense is equally senseless. Paying a premium for Amazon's gross profit margin when demand is fairly deterministic and your cost of capital is low is certainly a losing proposition.
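
To put rough numbers on that trade-off, here is a toy back-of-the-envelope comparison. Every figure (purchase price, cost of capital, hourly rate) is an assumption for illustration, not a quote of anyone's actual pricing.

```python
# Toy break-even arithmetic for the capex-versus-opex question. Every number
# here is an assumption for illustration, not a quote of real 2009 prices.
HOURS_PER_YEAR = 8760

def owned_cost_per_server_year(purchase_price=3000.0, years_of_life=3,
                               cost_of_capital=0.08, ops_cost_per_year=600.0):
    """Annualized cost of a server you own: straight-line capex plus a
    capital charge plus operations."""
    capex = purchase_price / years_of_life
    capital_charge = purchase_price * cost_of_capital
    return capex + capital_charge + ops_cost_per_year

def rented_cost_per_server_year(hourly_rate=0.40, utilization=1.0):
    """Annual cost of renting the equivalent capacity by the hour."""
    return hourly_rate * HOURS_PER_YEAR * utilization

if __name__ == "__main__":
    own = owned_cost_per_server_year()
    for u in (1.0, 0.5, 0.2):
        rent = rented_cost_per_server_year(utilization=u)
        print(f"utilization {u:>4.0%}: own ${own:,.0f}/yr vs rent ${rent:,.0f}/yr")
```

With these made-up numbers, capacity that stays busy is cheaper to own, while capacity needed only a fraction of the time is cheaper to rent, which is exactly the demand-risk and cost-of-capital calculus argued for above.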

The challenge and the opportunity of cloud for any enterprise is moving applications to an architecture that can exercise the cloud option for managing demand risk while simultaneously striking the right balance between capex and opex relative to the cost of capital. I find it funny that Amazon's new VPN feature is designed to make this opportunity a reality, while the blog post of their CTO announcing the feature proclaims that internal operations are too costly. Maybe they are viewing the VPN as a temporary bridge that will be burned when capex to opex nirvana is attained. Personally, I see it as the first of many permanent linkages that will be built to exercise the cloud option for managing demand risk. Lower costs associated with a proper portfolio balance of capex and opex is just icing on the cake.

Monday, August 24, 2009

VMware Springs Big for SpringSource

In a blog post back in May, I described why I believed a SpringSource and Hyperic combination was a good thing. In the new world of virtualized infrastructure and cloud computing, the application delivery and management approach is going to be lightweight and lean. At the time, however, I never imagined lightweight and lean would be worth $420M to VMware. While I have no doubt that a lightweight and agile approach to application delivery and management is going to replace the outdated heavy approach of J2EE and EJB, I am not quite convinced that VMware is getting in this deal what they want us to believe they are getting – general purpose operating system irrelevance.

VMware has done an incredible job abstracting the hardware away from the general purpose operating system. Now they have moved to the other end of the stack in an attempt to abstract the application away from the operating system. If the operating system is not responsible for hardware support and it is likewise not responsible for application support, then it is irrelevant, right? It is a good theory, but it is not quite true.

While the majority of application code will certainly be written in languages that can be supported by SpringSource (Java, Grails), there will remain lots and lots of application utilities and services that are provided by various programs that are not, and will never be, written in Java or the related languages supported by SpringSource. All of these various programs will still need to be assembled into the system images that represent a working application. And while I absolutely believe the general purpose operating system should die an ugly death in the face of virtualized infrastructure and cloud computing, I do not believe that operating systems can be rendered irrelevant to the application. I simply believe they become lighter and more application specific. I also believe that we are going to see a proliferation of application language approaches, not a consolidation to Java alone.

Acquiring SpringSource puts VMware on the path to providing not only Infrastructure as a Service technology, but also Platform as a Service technology. From what I have seen to date in the market, PaaS lags far, far behind IaaS in acceptance and growth. I have written multiple posts praising the Amazon approach and decrying the Google and Salesforce approach for cloud because the latter requires developers to conform to the preferences of the platform provider while the former allows developers to exercise creativity in the choice of languages, libraries, data structures, etc. That's not to say that PaaS cannot be a valuable part of the application developer toolkit. It's just that the market will be much more limited in size due to the limitations in the degrees of freedom that can be exercised. And if developers love one thing more than anything else, it is freedom.

VMware's acquisition of SpringSource moves them into the very unfamiliar territory of developer tools and runtimes. It is a different sale to a different audience. Developers are notoriously fickle, and it will be interesting to see how a famously insular company like VMware manages to maintain the developer momentum built by the SpringSource team.

Thursday, August 13, 2009

The Cloud Option

A few months back, I participated in a panel on the evolution of cloud computing that was hosted by Union Square Advisors. Alongside me on the panel were executives from Amazon, Adobe, Cisco, and NetApp. Someone in the audience claimed that their economic analysis of running an application on Amazon AWS indicated the services were not cost competitive relative to an internal deployment. My response was that the analysis was likely based upon a simple, non-volatile application demand scenario. I said that the analysis should have instead considered the option value of Amazon's services subject to some level of demand volatility. What is an option worth that allows you to quickly scale up and scale down with application demand with costs scaling (or descaling) proportionately? How many applications in your portfolio could benefit from this type of risk management hedge? What type of premium should you be willing to pay for a cost profile that is correlated more closely to your demand profile? To capture without big capital outlays the benefits of terrific demand while simultaneously avoiding the costs of over-provisioning when demand fails?

My response to the simplistic Amazon cost analysis struck a chord with the audience, and I have since been thinking quite a bit about the metaphor of financial options as applied to the value of cloud computing. A financial option gives its holder the right to participate in the market for a particular asset at some future date, in exchange for a defined price (the premium) paid today. Aside from their value as a tool for market speculation, options provide a low cost way to manage the risk associated with sudden and significant swings in the market for important portfolio assets. The cloud option provides just this risk management function for the portfolio of applications that any given enterprise must execute and manage in the course of delivering on the promises of its business. In exchange for a cloud architecture premium, the owner of the application gets both upside and downside protection associated with a demand forecast (and its related budget) that is almost certain to be inaccurate.
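
As a rough sketch of the option framing, the following toy simulation compares provisioning for a volatile demand peak against paying only for the hours actually consumed. The demand distribution, the on-demand rate, and the owned-capacity cost are all invented for illustration.

```python
# A sketch of the "cloud option" framing: under volatile demand, compare
# provisioning for the peak against paying only for what is used.
# Demand numbers and prices are invented for illustration.
import random

random.seed(1)
HOURLY_RATE = 0.40                 # assumed on-demand price per server-hour
OWNED_COST_PER_SERVER_HOUR = 0.21  # assumed fully loaded cost of owned capacity

def simulate_month(mean_servers=40, volatility=0.6, hours=720):
    """Lognormal-ish hourly demand for server capacity."""
    return [max(1, int(random.lognormvariate(0, volatility) * mean_servers))
            for _ in range(hours)]

def fixed_cost(demand):
    peak = max(demand)                     # provision for the peak, always on
    return peak * len(demand) * OWNED_COST_PER_SERVER_HOUR

def on_demand_cost(demand):
    return sum(demand) * HOURLY_RATE       # scale with actual demand

demand = simulate_month()
print(f"peak {max(demand)} servers, mean {sum(demand)/len(demand):.0f}")
print(f"provision-for-peak: ${fixed_cost(demand):,.0f}")
print(f"pay-as-you-go:      ${on_demand_cost(demand):,.0f}")
```

The gap between the two totals is, loosely, the value of the cloud option; the higher per-hour rate relative to owned capacity is the premium paid for it.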

The objective of this blog, The Cloud Option, is to discover the various costs and benefits associated with the premium of a cloud architecture. By analyzing the structure of the various cloud offerings and the technologies which underpin them (i.e. virtualization, programming APIs, etc), we will provide application owners with a context for evaluating which cloud services and technology might provide the best option for managing their demand risks. At the level of the enterprise, IT planners will be able to more effectively undertake an analysis of their application portfolio in order to lay out a broad demand-risk management strategy based upon cloud technology and services.

Contributing to this blog alongside me will be Steve Bobrowski. Steve is the former CTO of SaaS at Computer Sciences Corporation, former Director of SaaS Technology at BEA Systems, and currently freelances as a technical, strategic, and marketing consultant to prominent cloud vendors. Because of the variety and breadth of our experiences, we should be able to cover the material fairly broadly and with a compelling level of depth. To provide some context on my historical perspective of cloud, I have posted below the cloud related entries from my open source blog dating back to November of 2006.

Cloud technology and services are certainly going to change the landscape of enterprise computing. I believe they can substantially lower the risk-adjusted cost of delivering applications. We hope to help elucidate the cloud option – ensuring that the premium paid to adopt the architecture truly helps manage cost and risk instead of simply making a technology fashion statement.

Tuesday, August 11, 2009

IBM Cloud Fizzles

From June 30, 2009

Based on my positive review below of IBM's CloudBurst technology for building internal clouds, I tuned into the IBM webinar for the external cloud companion product with high hopes. I was hoping to hear about a consistent architecture across the two products that would allow an enterprise to federate workloads seamlessly between the internal and external cloud. Boy, was I disappointed.

It seems the IBM external cloud is nothing more than an IBM hosted capability for running virtual appliances of IBM Rational Software products. Among my many disappointments:

- no ability to run virtual appliances defined by me. They don't even publish a specification.

- no federation between internal and external. They are not even the same architecture because one runs Xen and the other runs VMware, and they do not provide a conversion utility.

- private beta (alpha maybe?) for invited customers only. Why make an announcement?

- no timetable for general availability of a product. Why make an announcement?

This announcement was a terrible showing by IBM to say the least. It is obvious to me that the CloudBurst appliance folks (call them “left hand”) and the Smart Business cloud folks (call them “right hand”) were two totally different teams. And the left hand had no idea what the right hand was doing. But each was intent not to be outdone by the other in announcing “something” with cloud in the title. And they were told to “cooperate” by some well meaning marketing and PR person from corporate. And this mess of a situation is the outcome. Good grief!

IBM CloudBurst Hits the Mark

From June 29, 2009

IBM rolled out a new infrastructure offering called CloudBurst last week. Aimed at development and test workloads, it is essentially a rack of x86 systems pre-integrated with VMware’s virtualization technology along with IBM software technology for provisioning, management, metering, and chargeback. I believe IBM, unlike Verizon, has hit the cloud computing mark with this new offering.

First, IBM is targeting the offering at a perfect application workload for cloud – development and test. The transient nature of development and test workloads means that an elastic computing infrastructure with built-in virtualization and chargeback will be attractive to IT staff currently struggling to be responsive to line of business application owners. The line of business application owners are holding the threat of Amazon EC2 over the head of the IT staff if they cannot get their act together with frictionless, elastic compute services for their applications. By responding with a development and test infrastructure that enables self-service, elasticity, and pay-as-you-go chargeback capability, the IT staff will take a step in the right direction to head off the Amazon threat. Moving these dev/test workloads to production with the same infrastructure will be a simple flick of the switch when the line of business owners who have become spoiled by CloudBurst for dev/test complain that the production infrastructure is not flexible, responsive, or cost competitive.

Second, IBM embraced virtualization to enable greater self-service and elasticity. While they do not detail the use of VMware’s technology on their website (likely to preserve the ability to switch it out for KVM or Xen at some future date), IBM has clearly taken an architectural hint from Amazon by building virtualization into the CloudBurst platform. Virtualization allows the owners of the application to put the infrastructure to work quickly via virtual appliances, instead of slogging through the tedious process of configuring some standard template from IT (which is never right) to meet the needs of their application – paying for infrastructure charges while they fight through incompatibilities, dependency resolution, and policy exception bureaucracy. CloudBurst represents a key shift in the way IT will buy server hardware in the future. Instead of either a bare-metal unit or pre-loaded with a bloated general purpose OS (see the complaint about tedious configuration above), the systems will instead come pre-configured with virtualization and self-service deployment capability for the application owners - a cloud-computing infrastructure appliance if you will. Cisco has designs on the same type of capability with their newly announced Unified Computing System.

Third, it appears that IBM is going to announce a companion service to the CloudBurst internal capability tomorrow. From the little information that is available today, I surmise that IBM is likely going to provide a capability through their Rational product to enable application owners to “federate” the deployment of their applications across local and remote CloudBurst infrastructure. With this federated capability across local (fixed capital behind the firewall) and remote sites (variable cost operating expense from infrastructure hosted by IBM), the IBM story on cloud will be nearly complete.

The only real negatives I saw in this announcement were that IBM did not include an option for an object storage array for storing and cataloging the virtual appliances, nor did they include any utilities for taking advantage of existing catalogs of virtual appliances from VMware and Amazon. While it probably hurt IBM’s teeth to include VMware in the offering, perhaps they could have gone just a bit further and included another EMC cloud technology for the object store. Atmos would be a perfect complement to this well considered IBM cloud offering. And including a simple utility for accessing/converting existing virtual appliances really would not be that difficult. Maybe we’ll see these shortcomings addressed in the next version. All negatives aside, I think IBM made a good first showing with CloudBurst.

Verizon Misses with Cloud Offering

From June 18, 2009

About two weeks back, I was excited to see a headline about Verizon partnering with Red Hat to offer their customers a “new” cloud computing offering. I was hopeful that the details would reveal a KVM hypervisor based elastic compute capability coupled with an OVF based specification for virtual appliances to run on the service. I was also hoping to discover some details on storage as a service, with all of the services accessible via a management capability exposed via RESTful APIs. Boy, was I disappointed. Turns out the new Verizon cloud offering is just the old Verizon hosting offering with a new name.

Why is it so difficult for all of these old school infrastructure providers to understand the new path being blazed by Amazon AWS? Why can't they offer even a reasonable facsimile of the capability provided by Amazon? Surely it is the threat of Amazon that is leading them to re-name the old hosting stuff as the new cloud stuff. Why not go all the way and actually offer something that is competitive? Here is a recipe for any that are interested:

First, provide an x86 hypervisor-based, virtualized compute service that allows the customer to bring their applications with them as pre-packaged, pre-configured virtual machines (virtual appliances). Don't ask them to boot a “standard OS” and then spend hours, days, weeks, months configuring it to work for them (because what you specified as the “standard” is certainly not what they have tested with their applications, and the whole purpose of elasticity is defeated if you can't quickly put images to work on the network in response to application demand). Better yet, let them boot existing Amazon Machine Images and VMware virtual appliances. Providing this capability is not rocket science. It is just work.

Second, provide a simple storage service (see Amazon S3 for what it should do) for storing unstructured data as well as for storing their virtual appliances that boot on the virtualized, elastic compute service. If you don't want to take the time to develop your own, follow AT&T's lead and go buy the capability EMC offers as part of the Atmos product line. You don't even have to think, you just need to write a check and voilà – an Amazon S3 type capability running on your network. What could be easier? (A client-side sketch of such a storage service appears after this recipe.)

Third, provide a block storage capability for attaching to virtual appliance images that must store state, such as database images. Most of the hosting companies already provide this type of SAN offering, so this part should be a no-brainer. Just price it with a very fine grained, variable cost approach (think megabyte-days, not months).

Fourth, provide access to the infrastructure management services via simple, RESTful APIs. You don't have to go overboard with capability at first, just make certain the basics are available in a manner that allows the services to be run effectively over the Internet without any funky protocols that are specific to your network implementation.

Finally, go sign up partners like rPath and RightScale to offer the next level of manageability and support for the virtual machines that will run on the network. These are the final touches that indicate to your customers that you are serious about providing a terrific capability for the complete lifecycle of your cloud computing offering, instead of asking them to be patient with you while you re-name your hosting offering as a cloud offering in the hope that it will assuage their bitterness that Amazon-like capability is not available on your network.
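
Here is the client-side sketch promised above for step two: a simple storage service should need nothing more exotic than authenticated HTTP PUT and GET. The endpoint, paths, and token below are hypothetical, not Amazon's, EMC's, or anyone else's actual interface.

```python
# Client-side sketch of the "simple storage service" in step two: store and
# fetch unstructured objects (including virtual appliance images) over plain
# HTTP. The endpoint and paths are hypothetical, not any provider's real API.
import requests

STORE = "https://storage.example.net/v1"
TOKEN = "customer-api-token"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def put_object(bucket: str, key: str, path: str) -> None:
    """Upload a local file as an object; the service handles placement."""
    with open(path, "rb") as f:
        requests.put(f"{STORE}/{bucket}/{key}", headers=HEADERS,
                     data=f, timeout=300).raise_for_status()

def get_object(bucket: str, key: str, path: str) -> None:
    """Download an object back to a local file."""
    resp = requests.get(f"{STORE}/{bucket}/{key}", headers=HEADERS, timeout=300)
    resp.raise_for_status()
    with open(path, "wb") as f:
        f.write(resp.content)

if __name__ == "__main__":
    put_object("appliances", "webapp-v1.img", "webapp-v1.img")
    get_object("appliances", "webapp-v1.img", "webapp-v1-copy.img")
```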

Federation - The Enterprise Cloud Objective

From June 2, 2009

I know the title to this blog post sounds a bit like a Star Trek episode, but I believe I have a useful point to make with the term federation - even at the risk of sounding a bit corny. I have been watching with interest the lexicon of terms that are emerging to describe the architecture and value of cloud computing. VMware uses the terms Internal/External/Private to describe the distribution of application workloads across multiple networks in a coordinated fashion. Sun uses the terms Private/Public/Hybrid, respectively, to describe the same architecture (although they would argue for Sun branded components in lieu of VMware/EMC branded components). I think both of these term sets as descriptors for a cloud architecture that distributes workloads across multiple networks are flawed and confusing. Rather than simply complaining, however, I am willing to offer a solution.

The term Federation describes the end state of an effective cloud architecture perfectly, and I think we should all begin using it when we attempt to sell our respective goods and services to enable the enterprise cloud. Whether part of an Internal/External/Federation combination or a Private/Public/Federation combination or Network1/Network2/Networkn/Federation, the common term accurately describes the end objective of cloud computing.

First, some attribution. This term was presented to me as a descriptor for cloud value during my work with the cloud infrastructure group at EMC (the folks that own the Atmos product line) over a year ago. It is now my turn to put some greater structure on this enviable original thought that belongs to EMC.

A good general definition for Federation (independent of an IT context) is a union of member entities that preserves the integrity of the policies of the individual members. Members get the benefits of the union while retaining control over their internal affairs.

In the case of a technology infrastructure federation (aka a cloud architecture), the primary benefit of the union is the lower cost and risk associated with a pool of technology assets which are available across a diversified set of independent networks. In other words, application workloads should be distributed to the network with the lowest risk adjusted cost of execution – i.e. based upon the risk policies of the enterprise. If the risk of running a mission critical, enterprise workload on Amazon's AWS network is deemed high (for whatever reason, real or perceived), that workload might stay on a proprietary network owned by the enterprise. Likewise, a low risk workload that is constantly being deferred due to capacity or complexity constraints on the enterprise network might in fact be run for the lowest cost at Amazon or a comparable provider. For a startup, the risk of depleting capital to purchase equipment may dictate that all workloads run on a third party network that offers a variable cost model for infrastructure (Infrastructure as a Service, IaaS).

Independent of the proprietary calculus for risk that must be undertaken by every enterprise relative to their unique situation, it should become clear to all that the distribution of application workloads across multiple networks based upon the cost/capability metrics of those networks will lower the risk adjusted cost of enterprise computing. The same diversification theories that apply to managing financial portfolio risk also apply to managing the distributed execution of application workloads. The historical challenge to this notion of application workload federation is the lack of an efficient market – the transaction costs associated with obtaining capacity for any given application on any given network were too high due to complexity and a lack of standards for application packaging (de facto or otherwise). Now, with virtualization as the underpinning of the network market, virtual appliances as the packaging for workloads, high bandwidth network transit, and webscale APIs for data placement/access, the time is coming for an efficient market where infrastructure capacity is available to applications across multiple networks. And Federation is the perfect word to describe a cloud architecture that lowers the risk adjusted cost of computing to the enterprise. Enterprise. Federation. Clouds. Star Trek.
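
A toy illustration of federation as placement, with invented costs and a single risk multiplier standing in for the enterprise's proprietary risk calculus:

```python
# Toy illustration of federation as workload placement: send each workload to
# the member network with the lowest risk-adjusted cost. Costs and risk
# premiums are invented numbers, and "risk" is reduced to a single multiplier.
NETWORKS = {
    "internal":  {"cost_per_hour": 0.55, "risk_premium": {"low": 1.0, "high": 1.0}},
    "amazon":    {"cost_per_hour": 0.40, "risk_premium": {"low": 1.0, "high": 1.8}},
    "provider2": {"cost_per_hour": 0.45, "risk_premium": {"low": 1.0, "high": 1.5}},
}

def place(workload_name: str, hours: int, risk: str) -> str:
    """Pick the network with the lowest risk-adjusted cost for this workload."""
    def adjusted_cost(net: dict) -> float:
        return net["cost_per_hour"] * hours * net["risk_premium"][risk]
    return min(NETWORKS, key=lambda name: adjusted_cost(NETWORKS[name]))

if __name__ == "__main__":
    print(place("batch-report", hours=200, risk="low"))   # cheapest network wins
    print(place("core-banking", hours=720, risk="high"))  # risk keeps it internal
```

However crude, the sketch captures the point: the policy stays with the member (the risk premiums), while the union supplies the cheaper pool of capacity.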

Cloud Application Management - Agile, Lean, Lightweight

From May 6, 2009

The acquisition of Hyperic by SpringSource got me thinking about the next generation of application delivery and management for cloud applications. At first, I was cynical about this combination – two small companies with common investors combining resources to soldier on in a tough capital environment. While this cynical thinking probably has a kernel of truth to it, the more I thought about the combination the more I thought that it makes sense beyond the balance sheet implications. Indeed, I believe the future of application delivery and management will combine agile development with lean resource allocation and lightweight management. This new approach to application delivery and management is one that complements the emerging cloud architecture for infrastructure.

Agile development, with its focus on rapid releases of new application functionality, requires a programming approach that is not overly burdened with the structure of J2EE and EJB. Spring, Rails, Grails, Groovy, and Python all represent the new approach – placing a premium on quick delivery of new application functionality. Application functionality takes center stage, displacing the IT infrastructure dominance of the legacy application server oriented approach. Developers will use what works to deliver the application functionality instead of using what works for the IT organization's management framework. The new approach does have implications for scalability, but we will get to that issue in a moment.

Lean is one of the newer terms emerging to describe the future of application delivery. I first referenced lean as an IT concept by relating it to the lean approach for manufacturing operations in a blog post about a year ago. With lean application delivery, applications scale horizontally to consume the infrastructure resources that they require based upon the actual demand that they are experiencing. The corollary is that they also contract to release resources to other applications as demand subsides. This “lean” approach to resource allocation with dynamic scaling and de-scaling is what a cloud architecture is all about – elasticity. Rather than optimizing the code to “scale up” on an ever bigger host, the code remains un-optimized but simple – scaling out with cheap, variable cost compute cycles when the peaks in demand require more capacity. Giving back the capacity when the peaks subside.

With the lean approach for resource allocation, a lightweight management approach that measures only a few things replaces the old frameworks that attempt to measure and optimize every layer in an ever more complex infrastructure stack. If the service is under stress due to demand, add more instances until the stress level subsides. If the service is under extremely light load, eliminate resources until a more economical balance is struck between supply and demand. If an instance of a service disappears, start a new one. In most cases, you don't even bother figuring out what went wrong. It costs too much to know everything. This lightweight approach for management makes sense when you have architected your applications and data to be loosely coupled to the physical infrastructure. Managing application availability is dramatically simplified. Managing the physical hosts becomes a separate matter, unrelated to the applications, and is handled by the emerging datacenter OS as described by VMware or the cloud provider in the case of services like those provided by Amazon AWS.
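
A sketch of what such a lightweight management loop might look like in practice. The cloud client object and its methods are hypothetical stand-ins for whatever IaaS API is in use; the point is how little the loop needs to know.

```python
# A sketch of the lightweight management loop described above: watch a couple
# of coarse signals and react by adding, removing, or replacing instances.
# The `cloud` object and its methods are hypothetical; any IaaS client with
# launch/terminate calls and a load metric would do.
import time

HIGH_LOAD, LOW_LOAD = 0.80, 0.20   # assumed stress thresholds
MIN_INSTANCES = 2

def manage(cloud, service_name: str):
    while True:
        instances = cloud.list_instances(service_name)
        load = cloud.average_load(service_name)   # e.g. requests vs capacity

        # An instance disappeared? Don't diagnose it, just start another.
        missing = cloud.desired_count(service_name) - len(instances)
        for _ in range(max(0, missing)):
            cloud.launch(service_name)

        if load > HIGH_LOAD:                       # under stress: add capacity
            cloud.launch(service_name)
        elif load < LOW_LOAD and len(instances) > MIN_INSTANCES:
            cloud.terminate(instances[-1])         # light load: give capacity back

        time.sleep(60)                             # coarse-grained, by design
```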

Take a look at the rPath video on this topic. I think it reinforces the logic behind the SpringSource and Hyperic combination. It rings true regarding the new approach that will be taken for rapid application delivery and management in a cloud infrastructure environment. Applications and data will be loosely coupled to the underlying infrastructure, and agile development, lean resource allocation, and lightweight management will emerge as the preferred approach for application delivery and management.

McKinsey Recommends Virtualization as First Step to Cloud

From April 20, 2009

In a study released last week, the storied consulting company, McKinsey & Company, suggested that moving datacenter applications wholesale to the cloud probably doesn't make sense – it's too expensive to re-configure and the cloud is no bargain if simply substituted for equipment procurement and maintenance costs. I think this conclusion is obvious. They go on to suggest that companies adopt virtualization technology in order to improve the utilization of datacenter servers from the current miserable average of ten percent (10%). I think this is obvious too. The leap that they hesitated to make explicitly, but which was called out tacitly in the slides, was that perhaps virtualization offers the first step to cloud computing, and a blend of internal plus external resources probably offers the best value to the enterprise. In other words cloud should not be viewed as an IT alternative, but instead it should be considered as an emerging IT architecture.

With virtualization as an underpinning, not only do enterprises get the benefit of increased asset utilization on their captive equipment, they also take the first step toward cloud by defining their applications independent from their physical infrastructure (virtual appliances for lack of a better term). The applications are then portable to cloud offerings such as Amazon's EC2, which is based on virtual infrastructure (the Xen hypervisor). In this scenario, cloud is not an alternative to IT. Instead, cloud is an architecture that should be embraced by IT to maximize financial and functional capability while simultaneously preserving corporate policies for managing information technology risk.

Virtualization as a step to cloud computing should also be viewed in the context of data, not simply application and server host resources. Not only do applications need compute capacity, they also need access to the data that defines the relationship of the application to the user. In addition to technologies such as VMware and Citrix's Xen, enterprises also need to consider how they are going to abstract their data from the native protocols of their preferred storage and networking equipment.

For static data, I think this abstraction will take the form of storage and related services with RESTful interfaces that enable web-scale availability to the data objects instead of local network availability associated with file system interfaces like NFS. With RESTful interfaces, objects become abstracted from any particular network resource, making them available to the network where they are needed. Structured data (frequently updated information typically managed by a database server technology) is a bit trickier, and I believe solving the problem of web-scale availability of structured data will represent the “last mile” of cloud evolution. It will often be the case that the requirement for structured data sharing among applications will be the ultimate arbiter of whether an application moves to the cloud or remains on an internal network.

The company that I founded, rPath, has been talking about the virtualization path to cloud computing for the past three years. Cloud is an architecture for more flexible consumption of computing resources – independent of whether they are captive equipment or offered by a service provider for a variable consumption charge. About nine months ago, rPath published the Cloud Computing Adoption Model that defined this approach in detail with a corresponding webinar to offer color commentary. In the late fall of last year, rPath published a humorous video cartoon that likewise offered some color on this approach to cloud computing. With McKinsey chiming in with a similar message, albeit incomplete, I am hopeful that the market is maturing to the point where cloud becomes more than a controversial sound bite for replacing the IT function and instead evolves into an architecture that provides everyone more value from IT.

Outsourcing Gives Way to Now-Sourcing via Cloud

From April 13, 2009

The theory behind the value of outsourcing, aside from labor arbitrage, was that the outsourcer could deliver IT resources to the business units in a more cost effective manner than the internal IT staff due to a more highly optimized resource management system. The big problem with outsourcing, however, was the enormous hurdle the IT organization faced in transitioning to the “optimized” management approach of the outsourcer. In many cases this expensive hurdle had to be crossed twice – once when the applications were “outsourced” and then again when the applications were subsequently “in-sourced” after the outsourcer failed to live up to service level expectations set during the sales pitch. Fortunately, the new architecture of cloud computing enables outsourcing to be replaced with “now sourcing” by eliminating the barriers to application delivery on third party networks.

The key to “now sourcing” is the ability to de-couple applications and data from the underlying system definitions of the internal network while simultaneously adopting a management approach that is lightweight and fault tolerant. Historically, applications were expensive to “outsource” because they were tightly coupled to the underlying systems and data of the internal network. The management systems also pre-supposed deep access to the lowest level of system structure on the network in order to hasten recovery from system faults. The internal IT staff had “preferences” for hardware, operating systems, application servers, storage arrays, etc., as did the outsourcer. And they were inevitably miles apart in both brands and structure, not to mention differences in versions, release levels, and the management system itself. Even with protocols that should be a “standard,” each implementation still had peculiarities based upon vendor and release level. NFS is a great example. Sun's implementation of NFS on Solaris was different than NetApp's implementation on their filers, leading to expensive testing and porting cycles in order to attain the benefits of “outsourcing.”

I believe a by-product of the “cloud” craze will be new technology, protocols, and standards that are designed from the beginning to enable applications to run across multiple networks with a much simpler management approach. A great example is server virtualization coupled with application delivery standards like OVF. With x86 as a de facto machine standard and virtualization as implemented by hypervisor technology like Xen and VMware, applications can be “now sourced” to providers like Amazon and RackSpace with very little cost associated with the “migration.”

Some will argue that we are simply trading one protocol trap for another. For example, Amazon does not implement Xen with OVF in mind as an application delivery standard. Similarly, VMware has special kernel requirements for the virtual machines defined within OVF in order to validate your support agreement. Amazon's S3 cloud storage protocol is different than a similar REST protocol associated with EMC's new Atmos cloud storage platform. And the list of “exceptions” goes on and on.

Even in the face of these obvious market splinters, I still believe we are heading to a better place. I am optimistic because all of these protocols and emerging standards are sufficiently abstracted from the hardware that translations can be done on the fly – as with translations between Amazon's S3 and EMC's Atmos. Or the penalty of non-conformance is so trivial it can be ignored – as with VMware's kernel support requirements which do not impact actual run-time performance.
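
A sketch of what "translation on the fly" can look like at the application level: code to one small object-store interface and let thin adapters map it onto whichever service a given network offers. Both backends below are hypothetical stand-ins, not the real client libraries for S3 or Atmos.

```python
# Sketch of protocol translation via adapters: applications code to one small
# interface, and thin adapters map it onto whichever object store a given
# network offers. Both backends are stand-ins, not real SDKs.
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class ProviderAStore(ObjectStore):
    def __init__(self, client):        # client: whatever SDK provider A ships
        self.client = client
    def put(self, key, data):
        self.client.upload(key, data)  # hypothetical SDK call
    def get(self, key):
        return self.client.download(key)

class ProviderBStore(ObjectStore):
    def __init__(self, client):
        self.client = client
    def put(self, key, data):
        self.client.write_object(name=key, body=data)  # hypothetical SDK call
    def get(self, key):
        return self.client.read_object(name=key)

def migrate(src: ObjectStore, dst: ObjectStore, keys):
    """Copy objects between providers without the application caring which is which."""
    for key in keys:
        dst.put(key, src.get(key))
```

A migrate helper like the one above is also all it takes to move data between members of the federation.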

The other requirement for “now sourcing” that I mentioned above was a fault tolerant, lightweight approach to application management. The system administrators need to be able to deliver and manage the applications without getting into the low level guts of the systems themselves. As with any “new” approach that requires even the slightest amount of “change” or re-factoring, this requirement to re-think the packaging and management of the applications will initially be an excuse for the IT staff to “do nothing.” In the face of so many competing priorities, even subtle application packaging and management changes become the last item on the ever lengthening IT “to do” list – even when the longer term savings are significant. But, since “now sourcing” is clearly more palatable to IT than “outsourcing” (and more effective too), perhaps there is some hope that these new cloud architectures will find a home inside the IT department sooner rather than later.

Will Agile Drive a Hybrid Cloud Approach?

From March 4, 2009

Some workloads are perfectly suited for cloud deployment. Generally, these are workloads with transient or fluctuating demand, relatively static data (lots of reads, few writes), and no regulated data compliance issues (e.g. patient healthcare records). Test fits this description perfectly – especially with the growing popularity of Agile methods. With its focus on rapid iteration and feedback to achieve faster innovation and lower costs, Agile demands a flexible and low cost approach for testing cycles. I have no doubt that developers will begin using variable-cost compute cycles from services like Amazon EC2 because of their flexibility and pay-for-what-you-use capability. But I am also willing to bet that testing with Amazon will put further pressure on the IT organization to respond with a similar, self-service IT capability. I think a hybrid-cloud architecture with complementary internal and external capability will emerge as a productive response to the demand for true end-to-end agility.

Some time ago, I authored a blog post titled “When Agile Becomes Fragile” that outlined the challenge of implementing Agile development methods while attempting to preserve the legacy IT approach. What good is rapid development when the process for promoting an application to production takes several months to absorb even a few weeks of new development? If developers take their Agile methods to the cloud for testing (which they will), it becomes a slippery slope that ultimately leads to using the cloud for production. Rather than the typical, dysfunctional IT response of “don't do that – it's against policy,” I think the IT organization should instead consider implementing production capacity that mimics and complements cloud capability such as that offered by Amazon.

Along with all of the cool technology that is emerging to support Agile methods, new technology and standards are also emerging to support the notion of a hybrid cloud. The new Atmos storage technology from EMC and the OVF standard for packaging virtual appliances are two good examples of hybrid-cloud technology. Atmos gives you the ability to describe your data in a manner that automatically promotes/replicates it to the cloud if it has been approved for cloud storage/availability. Whether applications run on an external cloud or on your “internal cloud,” the supporting data will be available. Similarly, OVF has the potential to let virtualized applications run effectively either externally on the cloud or internally – without significant manual (and error-prone) intervention by system administrators (or those developers who play a sysadmin on TV). In both cases, the goal is to enable greater flexibility for applications to run both internally and on the cloud – depending on the profile of the application and the availability of resources.

Agile is yet another important change that is going to pressure IT to evolve, and rPath is sponsoring a webinar series that dives into this topic in some detail. Whether you are a developer, an architect, or a system administrator, these webinars should be interesting to you. For the IT staff, the series may offer a helpful glimpse at an approach for IT evolution. In the face of Agile and cloud pressure, the alternative to evolution – extinction – is much less appealing.

Is the Cloud Game Already Over?

From January 25, 2009

This is the thought that crossed my mind a few weeks back as I pondered Amazon's beta release of the Amazon Web Services Console. The reason the game might be over is that Amazon is apparently so far ahead of the competition that they can now divert their engineering attention to the management console instead of core platform functionality. To me, this signals a competitive lead so vast that, absent a quick and significant redirection of resources and potentially some strategic acquisitions of capability, Amazon's competitors are doomed in the cloud space.

I saw this dynamic once before during my time at Red Hat. Red Hat had such a lead in the market, with almost total mindshare for the platform (Red Hat Linux, now Red Hat Enterprise Linux), that the company could launch a strategic management technology, Red Hat Network, while others were grasping for relevance on the core platform. Note that in the case of Red Hat, no one else has come close to their lead in the Linux market. And no one else has really gotten around to building out the management technology that Red Hat Network offered 8 years ago.

Consider these other challenges facing Amazon's competitors:

1. Lack of machine image definitions - Amazon published the AMI spec for EC2 about 2 years ago. To my knowledge, all of the competitors that use virtualization (Amazon uses Xen) still require customers to boot a limited set of approved "templates," which must then be configured manually and which lose their state when retired.

2. Proprietary versus open - when you require the customer to program in a specific language environment that is somewhat unique to a particular "cloud" platform (à la Google with Python and Salesforce with Apex), you dramatically limit your market to virtual irrelevance out of the gate. Amazon doesn't care what you run, so long as you can build to an x86 virtual machine.

3. Elastic billing model - until you have a platform for billing based upon the on-demand usage of resources, you don't have a cloud with the key value proposition of elasticity. You simply have hosting. To my knowledge, most competitors still require a monthly payment. Hourly billing is still a long way off for these folks (a rough comparison is sketched below).
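Here is the rough comparison I mean. The rates are assumptions for the sake of arithmetic (on the order of the published per-hour EC2 price of the day versus a hypothetical flat monthly hosting fee), not a quote from any provider:

```python
# Illustrative arithmetic only: hourly, usage-based billing versus a flat
# monthly hosting fee for a bursty workload. Both rates are assumptions.

HOURLY_RATE = 0.10        # assumed on-demand price per instance-hour
MONTHLY_HOSTING = 150.00  # assumed flat monthly fee per dedicated server


def elastic_cost(instance_hours: float) -> float:
    """Pay only for the hours actually consumed."""
    return instance_hours * HOURLY_RATE


def hosting_cost(servers: int) -> float:
    """Pay for the month whether the servers are busy or idle."""
    return servers * MONTHLY_HOSTING


if __name__ == "__main__":
    # A burst of 10 instances used 6 hours a day for a 2-week project.
    burst_hours = 10 * 6 * 14
    print("elastic:", elastic_cost(burst_hours))  # 840 hours -> $84.00
    print("hosting:", hosting_cost(10))           # 10 servers -> $1500.00
```

For a machine that runs flat out all month the comparison is much closer, but that is exactly the point of elasticity: you pay for the shape of your demand, not for the peak.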

Perhaps I am wrong, but I bet I am not. If I am right, the day will come in the not-too-distant future (after the equity markets recover) when Amazon spins out AWS as a tracking stock (similar to the EMC strategy with VMware) with a monster valuation – keeping this asset tied to an Amazon revenue multiple makes no sense – and the valuations on the technology assets that help others respond to Amazon go nutty (witness the XenSource valuation on the day VMware went public). I say "Go, Amazon, Go!"

Cloud in Plain English

From December 23, 2008

I must take my hat off to Jake Sorofman, who runs marketing for rPath. Jake has done an incredible job distilling a bunch of complex stuff into a consumable and entertaining video. Do yourself a favor, and check out his Cloud Computing in Plain English video. Al Gore never looked so good.

And Happy Holidays!

Will Managing VM Sprawl Lead to Rogue Cloud Deployments?

From December 5, 2008

I just read an interesting article regarding the potential cost pitfalls associated with VM sprawl. Jett Thompson, an enterprise computing architect from Boeing, has developed a cost model that captures the benefits of virtualization and the related pitfalls of VM sprawl. It seems that virtualization is easy to justify, so long as you don't give the users everything that they want. Here is the money quote from the article:

However, all of those savings [from virtualization] can be eliminated if sprawl isn't controlled. With virtual servers easy to spin up, users may ask for large numbers of new virtual machines and it's up to IT to hold the line, Thompson says.

"If you don't have demand management and good governance in place you're actually going to cost your company money," he says. "Virtual server sprawl can wipe out any savings."

Gartner analyst Thomas Bittman also says virtual server sprawl can be tough to control and is harder to measure than physical server sprawl. "Fundamentally, we believe virtualization sprawl can be a much bigger problem than physical sprawl," Bittman said.


I believe that the unintended consequence of "IT hold[ing] the line" will be rogue cloud deployments. "Rogue cloud deployments" describes the phenomenon of business unit developers taking matters into their own hands when IT "holds the line" on making computing resources available. Once the business units understand that resources can be made available on-demand, either internally or via services such as Amazon EC2, they are simply not going to take "no" for an answer. Deploying applications as virtual machines, or virtual appliances in the case of an ISV application, removes all of the friction from the deployment process. This same friction was formerly the tonic that IT sprinkled about in order to "hold the line" on the availability (and the subsequent management costs) of computing resources. The instant gratification culture that we are cultivating with SaaS and cloud will not be held in check if IT "holds the line" by saying "no" to requests for capacity/capability.

I have a recommendation for Jett and the folks at Boeing and elsewhere who are fearing the unintended consequences of frictionless system capacity brought about by virtualization. Push the control point for deployment policy upstream via automated build, release, and management processes for applications released as virtual machines, manage the scale problem by going vertical with a JeOS architecture, and build a seamless bridge managed by IT to cloud offerings like Amazon EC2. Charge the users with the costs for deployment and management, but give them the technology to do it the right way. Check out our cloud computing adoption model and the webinar that accompanies it. Rogue cloud deployments can be avoided, even in the face of VM sprawl control measures, when you say "yes" to your users while holding them accountable for building manageable system images.

Can You See the Clouds from Windows?

From October 24, 2008

During the course of our webinar entitled "The Pragmatist's Guide to Cloud Computing: 5 Steps to Real ROI," several of the attendees submitted questions regarding the status of Windows as an environment for cloud applications. In a partial answer to the question, Jeff Barr, a speaker during the webinar and a member of the Amazon Web Services team, responded that a beta implementation of Windows for EC2 was now available. The problem with the notion of “Windows for EC2” is that it perpetuates the broken, legacy model of tying your application to the infrastructure upon which it runs.

In the legacy model, applications became artificially tied to the physical server upon which they ran, and server utilization was low because it is very difficult to run multiple applications on a single instance of a general purpose operating system – each application has unique needs that conflict or compete with the needs of the others. Virtualization technology, such as that provided by VMware or Citrix with XenServer, breaks the bond of the application to a physical server by placing a layer of software, called a hypervisor, on the physical hardware beneath the operating system instances that support each application. The applications are “isolated” from one another inside virtual machines, and this isolation eliminates the conflicts.

Amazon embraces this virtualization model by using Xen to enable their Elastic Compute Cloud (EC2) service. So what's the problem? If the OS instances are not tied to the physical servers any longer (indeed you do not even know which physical system is running your application on EC2, nor do you need to know), why am I raising a hullabaloo over a “broken model?” The reason this new model of Windows for EC2 is broken is that your application is now artificially coupled to EC2. When you begin with a Windows Amazon Machine Image (AMI), install your application on top, configure-test, configure-test, configure-test, configure-test, configure-test to get it right, and then save the tested configuration as a new AMI, the only place you can run this tested configuration of your application is on Amazon's EC2. If you want to run the application on another virtualized cloud – say one provided by RackSpace, or Terremark, or GoGrid, or even your own internal virtualized cloud of systems – you have to install the application yet again, repeat the configure-test cycle until you get it right again, and then save the tested configuration on the other cloud service. Why don't we just stop the madness and admit that binding the OS to the physical infrastructure upon which it runs is a flawed approach when applications run as virtual machine images (or virtual appliances) atop a hypervisor or a virtualized cloud of systems like EC2?
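For contrast, here is the kind of approach I would rather see: a single, cloud-neutral definition of the tested application that gets rendered into whatever image format a given target expects. The manifest fields and render functions below are hypothetical illustrations, not rBuilder's or any vendor's actual format:

```python
# Hypothetical sketch: define the tested application image once, then render
# that one definition into provider-specific formats. Names are illustrative.

APPLIANCE = {
    "name": "order-entry",
    "version": "1.4.2",
    "jeos": "linux-minimal",   # just enough OS to support the app
    "packages": ["python", "order-entry-app"],
    "config": {"db_host": "db.internal", "listen_port": 8080},
}

# Each target cloud gets a renderer for its image format; the tested
# definition above never changes.
RENDERERS = {
    "ec2":      lambda a: f"{a['name']}-{a['version']}.ami",
    "vmware":   lambda a: f"{a['name']}-{a['version']}.ovf",
    "internal": lambda a: f"{a['name']}-{a['version']}.raw",
}


def render(appliance: dict, target: str) -> str:
    """Produce a deployable image name for the requested cloud target."""
    return RENDERERS[target](appliance)


if __name__ == "__main__":
    for target in RENDERERS:
        print(target, "->", render(APPLIANCE, target))
```

The configure-test cycle happens once, against the definition, instead of once per cloud.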

The reason that we are continuing the madness is that madness is all we have ever known. Everyone knows that you bind an operating system to a physical host. Operating systems are useless unless they bind to something, and until the emergence of the hypervisor as the layer that binds to the physical host, the only sensible approach for operating system distribution was to bind it to the physical host. When you buy hardware, you make it useful by installing an operating system as step one. But if the operating system that you install as step one in the new virtualized world is a hypervisor in lieu of a general purpose operating system, how do we get applications to be supported on this new type of host? Here's your answer – what we previously knew as the general purpose operating system now needs to be transformed into just enough operating system (JeOS or “juice”) to support the application, and it should bind to the application NOT THE INFRASTRUCTURE.

Virtualization enables the separation of the application from the infrastructure upon which it runs – making possible a level of business agility and dynamism previously unthinkable. Imagine being able to run your applications on-demand in any datacenter around the world that exposes the hypervisor (any hypervisor) as the runtime environment. Privacy laws prevent an application supporting medical records in Switzerland from running in an Amazon datacenter in Belgium? No problem, run the application in Switzerland. Need to run the same application in Belgium in support of a new service being offered there next month? No problem, run it on Amazon's infrastructure in Belgium. The application has to support the covert operations associated with homeland security and it cannot be accessed via any Internet connection? No problem, provide it as a virtual appliance for the NSA to run on their private network. Just signed a strategic deal with RackSpace that provides an extraordinary level of service that Amazon is not willing to embrace at this time? No problem, shut down the instances running on EC2 and spin them up at RackSpace. All of this dynamic capability is possible without the tedious cycle of configure-test – if we will simply bind the operating system to the application in order to free it from the infrastructure and let it fly into the clouds.
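All of the scenarios above reduce to a placement decision that should be expressible as policy rather than as a re-install. A minimal sketch, with made-up region names and rules rather than any real provider catalog:

```python
# Hypothetical sketch of policy-driven placement: the same appliance image is
# sent to whichever datacenter satisfies its constraints. Regions and rules
# are illustrative, not a real provider catalog.

REGIONS = [
    {"name": "amazon-eu-belgium", "country": "BE", "internet_facing": True},
    {"name": "private-ch-zurich", "country": "CH", "internet_facing": True},
    {"name": "nsa-private-net",   "country": "US", "internet_facing": False},
]


def place(app: dict) -> str:
    """Pick the first region that satisfies the application's constraints."""
    for region in REGIONS:
        if app.get("must_run_in") and region["country"] != app["must_run_in"]:
            continue
        if app.get("no_internet") and region["internet_facing"]:
            continue
        return region["name"]
    raise RuntimeError("no region satisfies the policy")


if __name__ == "__main__":
    print(place({"name": "swiss-medical-records", "must_run_in": "CH"}))
    print(place({"name": "covert-ops-app", "no_internet": True}))
    print(place({"name": "belgium-service", "must_run_in": "BE"}))
```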

So why doesn't Microsoft simply allow Windows to become an application support infrastructure, aka JeOS, instead of a general purpose operating system that is bound to the infrastructure? Because JeOS disrupts their licensing and distribution model. Turning a ship as big as the Microsoft Windows licensing vessel might require a figurative body of water bigger than the Atlantic, Pacific, and Indian oceans combined. But if they don't find a way to turn the ship, they may find that their intransigence becomes the catalyst for ever increasing deployments of Linux and related open source technology that is unfettered by the momentum of a mighty business model. Folks with valuable .Net application assets might begin to consider technology such as Novell's mono project as a bridge to span their applications into the clouds via Linux.

I can tell you that there are lots of folks asking lots of questions about how to enable Windows applications in the “cloud.” I do not believe the answer is “Windows for EC2” plus “Windows for GoGrid” plus “Windows for RackSpace” plus “Windows for [insert your data-center cloud name here].” If Microsoft does not find a way to turn the licensing ship and embrace JeOS, the market will eventually embrace alternatives that provide the business agility that virtualization and cloud computing promise.

Will the Credit Crunch Accelerate the Cloud Punch?

From October 14, 2008

It's no secret that the days of cheap capital might be over. While it is obvious that startups with lean capital structures are already embracing cloud offerings such as Amazon EC2 for computing and S3 for storage, it seems to me that this trend might accelerate further for both startups and even enterprise customers.

Cloud consumption in the startup segment is poised to accelerate as investors like Sequoia Capital warn their portfolio companies to “tighten up” in the face of this credit crunch. Even the well-capitalized SaaS providers might begin reconsidering the “ridiculous” expense of building out their offerings based upon the classic salesforce.com model of large-scale, proprietary datacenters with complex and expensive approaches to multi-tenancy. They might be better served by a KnowledgeTree model where on-demand application value is delivered via virtual appliances. In this model, the customer can deploy the software on existing gear (no dedicated server required) because the virtualization model makes for a seamless, easy path to value without setup hassles. Or they can receive the value of the application as a SaaS offering when KnowledgeTree spins up their instance of the technology on Amazon's elastic compute cloud. In both cases, the customer and KnowledgeTree both avoid the capital cost of acquiring dedicated gear to run the application.

Large enterprises, too, will be reconsidering large-scale datacenter projects. When credit is tight, everyone from municipal governments to the best-capitalized financial institutions must find ways to avoid outlays of precious capital ahead of the reality of customer collections. More and more of these customers will be sifting through their application portfolios in search of workloads that can be offloaded to the cloud in order to free up existing resources and avoid outlays for new capacity to support high-priority projects. Just as the 9/11 meltdown was a catalyst for the adoption of Linux (I witnessed this phenomenon as the head of enterprise sales at Red Hat), a similar phenomenon might emerge for incremental adoption of cloud driven by the credit crunch of 2008. All new projects will be further scrutinized to determine: “Is there a better way forward than the status quo?”

As enterprises of all sizes evaluate new approaches to minimize capital outlays while accelerating competitive advantage via new applications, rPath is offering a novel adoption model for cloud computing that might serve as a convenient bridge across the credit-crunch capital gap. For those that are interested in exploring this new model, please join us in a webinar along with the good folks at Forrester, Amazon, and Momentum SI on October 23rd. If necessity is the mother of invention, we might be poised for some truly terrific innovations in the cloud space . . . and we will owe a debt of gratitude to the credit crunch for driving the new architecture forward.

Larry Rains on the Cloud Parade

From September 30, 2008

At Oracle OpenWorld last week, Larry Ellison derided the current “cloud” craze, likening the technology industry's obsession with “fashion” to the women's apparel industry. In a sense, he is right. Everything is being labeled cloud these days. New datacenters from IBM – cloud. New browser from Google – cloud. New strategy from VMware – cloud. I myself commented to Ben Worthen of the Wall Street Journal that I too feel the cloud craze is a bit “nutty.” At the same time, I believe there is some real change underfoot in the industry, and I believe that Amazon's Elastic Compute Cloud (EC2) is leading the way in capturing the imagination about what is possible with a new approach.

The reason EC2 has captured the imagination of so many people in the industry is that it offers the possibility of closing the painful gap that exists between application development and production operations. Promoting applications from development to production has typically been a contentious negotiation between the line-of-business application developers and the IT production operations crew. It is a difficult process because the objectives of apps and ops are orthogonal to one another. Apps is about new features to quickly respond to market demand, and ops is about compliance, stringent change control, and standardization to assure stability.

With EC2, developers don't negotiate with operations at all. They simply package up the innovations they want inside a coordinated set of virtual machines (virtual appliances in the case of the ISV vernacular), and deploy, scale, and retire based upon the true workload demands of the market. No requisitions for hardware. No laborious setup of operating environments for new servers. No filling out waivers for using new software components that are not production approved yet. No replacement of components that fail the waiver process and re-coding when the production components don't work with the new application features. No re-testing. No re-coding. No internal chargebacks for servers that are not really being used because the demand for the application has waned. No painful system updates that break the application – even when the system function is irrelevant to the workload. No. No. No.
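The deploy-scale-retire loop I am describing is not exotic. Here is an illustrative sketch – the launch and retire functions stand in for calls to a provider API such as EC2's, which I am not reproducing here – of a fleet that grows and shrinks to match measured demand:

```python
# Illustrative sketch of scaling a fleet of virtual appliances to match true
# workload demand. launch() and retire() stand in for real provider API calls.

running = []           # identifiers of currently running appliance instances
CAPACITY_PER_VM = 100  # assumed requests/sec each instance can serve


def launch():
    vm_id = f"vm-{len(running) + 1}"
    running.append(vm_id)   # a real implementation would call the cloud API
    return vm_id


def retire():
    return running.pop()    # likewise, this would terminate a cloud instance


def reconcile(current_demand: int):
    """Scale the fleet out or in until it matches the measured demand."""
    needed = max(1, -(-current_demand // CAPACITY_PER_VM))  # ceiling division
    while len(running) < needed:
        print("scale out:", launch())
    while len(running) > needed:
        print("retire:", retire())


if __name__ == "__main__":
    for demand in (250, 900, 120):   # requests/sec observed over time
        reconcile(demand)
        print("demand", demand, "->", len(running), "instances")
```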

The on-demand, self-service datacenter architecture of Amazon's EC2 is going to put huge pressure on the operations organization to respond with an internal “cloud” architecture – or lose the business of the developers who would rather “go to the cloud” than negotiate with ops. Here at rPath, we believe that the ops folks are going to need to provide the apps folks with a release (rBuilder) and lifecycle management system (rPath Lifecycle Management Platform) that enables the self-service capability and rapid promotion of EC2 while preserving compliance with operating policies that assure stability and security. And, if an application really takes off, you don't have to build a new datacenter to respond to the demand. Just scale out the workload onto Amazon, or another provider with a similar cloud architecture. IT operations now has a way to say “yes we can” instead of “no you can't.” Getting to “yes” from your IT ops provider by closing the gap between apps and ops is what the excitement of cloud is all about.

Single Minute Exchange of Applications - The Cure for Server Hoarding

From August 17, 2008

I recently had an interesting conversation with an IT executive who has built a self-service datacenter capability based upon virtualization. He described for me a system whereby business units can request “virtual server hosts” with a pre-set system environment (e.g. Linux and Java), and within an hour or two they receive an email notification informing them of the availability of the “virtual machines.” The goal of this system, as it was explained to me, is to “cure server hoarding” by the business units.

The theory is that if the business units are confident that they can get new capacity “on-demand,” then they will not request more systems than they really need. And since they are billed based upon the actual amount of capacity deployed, they have incentive to “give back” any systems that are not necessary to meet production demands. I asked how it was working:

IT Exec – Great. We have over 1500 virtual machines actively deployed in production in support of business unit demand.

Billy – Wow! That's terrific. What do the statistics look like for server returns?

IT Exec – What do you mean?

Billy – I mean how many systems have the business units returned to the pool of available systems because their demand was transitory?

IT Exec – No one has ever given back a single machine ever. They have the economic incentive to do so, but so far not one machine has ever been given back to the pool.

And therein lies the problem. The reason no one gives systems back is that the setup costs associated with getting them productive are simply too high. Even in this case, when the setup of the operating environment is accomplished within an hour or two of the request, the process of “fiddling around with the system” to get the application installed, configured, and stable is so expensive that no one ever gives a productive system back when demand falls. This situation leads to tons of waste in the form of over-deployed capital and over-consumption of resources such as power. I am reminded of the early days of the lean production revolution in the world of manufacturing.
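Before the manufacturing story, a little back-of-the-envelope arithmetic shows why the chargeback incentive fails. The figures below are assumptions for illustration only – a modest daily chargeback weighed against days of skilled labor to re-install and re-stabilize an application – but the shape of the result is what matters:

```python
# Illustrative arithmetic only: a business unit gives back a virtual machine
# only if the chargeback savings during the idle period exceed the cost of
# setting the application up again later. All figures are assumptions.

DAILY_CHARGEBACK = 5.00   # assumed daily charge for one virtual machine
SETUP_COST = 2400.00      # assumed cost to re-install and re-stabilize the
                          # application later (e.g. 3 days of skilled labor)


def worth_returning(idle_days: int) -> bool:
    return idle_days * DAILY_CHARGEBACK > SETUP_COST


if __name__ == "__main__":
    for idle_days in (30, 180, 365, 600):
        verdict = "give it back" if worth_returning(idle_days) else "hoard it"
        print(idle_days, "idle days ->", verdict)
    # With these numbers the break-even point is 480 idle days, which is why
    # "no one has ever given back a single machine."
```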

In the late eighties, Toyota was whipping Detroit's fanny because they had implemented a system that the folks in Detroit did not think was possible. The folks at Toyota got much higher utilization out of their capital investment with much lower levels of waste and work in process because they had implemented a system that assured the expensive production equipment was always engaged in producing parts and vehicles that closely reflected true demand. A big part of this system was a capability known as the Single Minute Exchange of Dies, or the SMED system, which was pioneered by Toyota and evangelized by the legendary manufacturing engineer, Shigeo Shingo.

With SMED, expensive body-stamping machines (or any machine for that matter) are kept productively engaged building the exact parts that are required to meet true demand by reducing the setup time for a “changeover” to less than 10 minutes. This is accomplished primarily by precisely defining the interface between the machine and the stamping dies such that the dies can be prepared for production “off-line.” While a machine is productively engaged building Part A, the dies for Part B are set up for production in a manner that does not require interfacing with the production machine. When it is time for a changeover from Part A to Part B, the machine stops, the Part A dies are quickly released and pulled from the machine, and the Part B dies are quickly engaged using a highly standardized interface. No fiddling around to get it right. The machine starts up again in less than 10 minutes, and down the line rolls the perfect output for Part B.

Contrast this approach with the standard approach in Detroit in the late eighties. The economy-of-scale theory in Detroit was to set up the line for long runs of a single part type and build inventory, because changing over the line carried heavy setup costs. Fiddling around with the dies to get the parts to come off according to specification might take a day or even a week. So instead of building for true demand, Detroit over-deployed resources, both capital equipment and work in process, in an attempt to compensate for poor setup engineering. We all know how this story ends. The Toyota system is still the envy of the manufacturing world.

Now is the time for the technology world to take a lesson from Toyota. Virtualization will provide the standard interface for production, but it is almost worthless without “setup” technology that enables the applications to be defined independently of the production machine. The resources of the datacenter should reflect “true demand” for production output instead of idling away – suffering from a miserable case of server hoarding because setup is so expensive and error prone. The time has come for SMEA – Single Minute Exchange of Applications.
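To carry the analogy into the datacenter: the “dies” are the application images, and the “press” is the production host. A minimal, hypothetical sketch – the timings are assumptions, not measurements – of what the changeover looks like when images are prepared off-line versus fiddled into place on the host:

```python
# Hypothetical sketch of "Single Minute Exchange of Applications": images are
# prepared and tested off-line, so the production host only swaps and boots.
# All timings are illustrative assumptions.

PREP_OFFLINE_HOURS = 6      # build and test the image without touching production
SWAP_MINUTES = 5            # stop app A, attach image B, boot (the changeover)
FIDDLE_ON_HOST_HOURS = 48   # legacy approach: install, configure, test in place


def changeover_downtime(offline_prep: bool) -> float:
    """Minutes of production capacity lost when switching from app A to app B."""
    if offline_prep:
        return SWAP_MINUTES            # SMEA: the press barely stops
    return FIDDLE_ON_HOST_HOURS * 60   # Detroit-style: the press sits idle


if __name__ == "__main__":
    print("SMEA changeover:  ", changeover_downtime(True), "minutes of lost capacity")
    print("Fiddle-in-place:  ", changeover_downtime(False), "minutes of lost capacity")
```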

At rPath, we are working towards SMEA every day. We have high hopes that the complementary trends of virtualization and cloud computing will highlight the possibility of an entirely new, and more efficient, approach to consumption of server production capacity. An approach where applications are readied for production without consuming machine cycles “fiddling around” to get the application stable. An approach where expensive machines running application A are given back for production of application B when true demand indicates that B needs the resources instead of A. The Department of Energy and CERN are already on board with this approach, but it will be interesting to observe who in the technology world emerges as “Toyota” – and how long it takes the status quo of “Detroit” to wake up and smell the coffee.

VMware Accelerates Cloud with Free ESX

From July 29, 2008

The new CEO of VMware, Paul Maritz, seems committed to establishing VMware technology as the basis for emerging compute cloud offerings that enable shared, scalable infrastructure as a service via hypervisor virtualization. With Amazon EC2 – the poster child for the successful compute cloud offering – based upon the competing Xen technology from Citrix, Maritz is losing no time staking a claim to other potential providers by meeting the Xen price requirement: zero, zilch, nada, zip. I love it. Low cost drives adoption, and free is as good as it gets when it comes to low cost and adoption.

As the economics of servers tilt more and more toward larger systems with multi-core CPUs, the hypervisor is going to become a requirement for getting value from these newer, larger systems. Developers simply do not write code that scales effectively across lots of CPUs on a single system. The coding trend is toward service-oriented architectures that implement functions as small, atomic applications running on one or two CPUs, with multiple units deployed to achieve scalability. Couple the bigger-server trend with the SOA trend with the virtualization trend with the cloud trend, and you have a pretty big set of table stakes that VMware does not want to miss. If a hypervisor is a requirement, why not use VMware's hypervisor if it is free?

The only challenge with free in the case of VMware is going to be the lack of freedom. Xen currently offers both a free price and freedom because of its open source heritage. If I run into a problem with VMware's ESX, my only recourse is to depend on the goodwill of VMware to fix it. With Xen, I have the option of fixing my own problem if I am so inclined and capable. It will be interesting to watch the hypervisor choices people make as they build their cloud infrastructures, both internally and for commercial consumption, based upon the successful Amazon EC2 architecture.