Thursday, August 27, 2009

Amazon Aims for Enterprises - Pooh-Poohs Internal Clouds

Amazon's announcement yesterday regarding an enterprise feature for linking existing datacenter operations to Amazon's AWS via a Virtual Private Network did not surprise me. It is an obvious extension of their value proposition, and folks had already been achieving a similar capability with workarounds that were simply a bit more cumbersome than Amazon's integrated approach. The more surprising piece of news, in my opinion, is the subtle ratcheting up of the rhetoric by Amazon regarding their disdain for the notion of “internal” cloud. Werner Vogels' blog post explaining the rationale for the new VPN features is a case in point. Here are a few tasty excerpts:

Private Cloud is not the Cloud

These CIOs know that what is sometimes dubbed "private [internal] cloud" does not meet their goal as it does not give them the benefits of the cloud: true elasticity and capex elimination. Virtualization and increased automation may give them some improvements in utilization, but they would still be holding the capital, and the operational cost would still be significantly higher. . . .

What are called private [internal] clouds have little of these benefits and as such, I don't think of them as true clouds. . .

[Cloud benefits are]

* Eliminates Cost. The cloud changes capital expense to variable expense and lowers operating costs. The utility-based pricing model of the cloud combined with its on-demand access to resources eliminates the needs for capital investments in IT Infrastructure. And because resources can be released when no longer needed, effective utilization rises dramatically and our customers see a significant reduction in operational costs.

* Is Elastic. The ready access to vast cloud resources eliminates the need for complex procurement cycles, improving the time-to-market for its users. Many organizations have deployment cycles that are counted in weeks or months, while cloud resources such as Amazon EC2 only take minutes to deploy. The scalability of the cloud no longer forces designers and architects to think in resource-constrained ways and they can now pursue opportunities without having to worry how to grow their infrastructure if their product becomes successful.

* Removes Undifferentiated "Heavy Lifting." The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure. Over time Amazon has invested over $2B in developing technologies that could deliver security, reliability and performance at tremendous scale and at low cost. Our teams have created a culture of operational excellence that powers some of the world's largest distributed systems. All of this expertise is instantly available to customers through the AWS services.

Elasticity is one of the fundamental properties of the cloud that drives many of its benefits. While virtualization has tremendous benefits to the enterprise, certainly as an important tool in server consolidation, it by itself is not sufficient to give the benefits of the cloud. To achieve true cloud-like elasticity in a private cloud, such that you can rapidly scale up and down in your own datacenter, will require you to allocate significant hardware capacity. While to your internal customers it may appear that they have increased efficiency, at the company level you still own all the capital expense of the IT infrastructure. Without the diversity and heterogeneity of the large number of AWS cloud customers to drive a high utilization level, it can never be a cost-effective solution.


OK. Let's examine Werner's sales proposition without the pressure to sell anything (as I am not currently trying to sell anyone anything). Clearly, Amazon is now attacking vendors such as VMware that seem intent on attacking them by proclaiming that Amazon cannot give you enterprise features. Not only is Amazon delivering features targeted at the enterprise, but they are also scaling up the war of words by pooh-poohing the value proposition of these classic vendors – namely the notion of an internal cloud. Werner makes two assertions in dissing internal clouds:

First, he asserts that an internal cloud is not elastic. Well, why not? Just because your IT department has historically been labeled the NO department doesn't mean that it always must be that way. Indeed, the very pressure of Amazon delivering terrific services without the mind-numbing procurement and deployment friction of your IT department is going to lead to massive changes on the part of IT. They are going to virtualize, provide self-provisioning tools, and more closely align business application chargebacks to actual application usage. If the application owners are thoughtful about their architecture, they will be able to scale up and scale back based upon the realities of demand, and their IT transfer costs will reflect their thoughtfulness. Other business units will benefit from the release of resources, and server hoarding will be a thing of the past. All this is not to say that an IT department should “own” every bit of compute capacity they use. They don't. They won't. And there will probably be an increasing shift toward owning less.

Second, Werner claims that ownership itself is generally a bad thing: capex is bad and opex is good. Werner writes that cloud eliminates costs by eliminating capital spending. Well, it might - depending on the scenario. But his insinuation that capex is bad and opex is good is silliness. They are simply different, and the evaluation that any enterprise must make is one of demand risk and cost of capital. For a capital-constrained startup with high risk associated with application demand, laying out precious capital for a high-demand scenario in the face of potential demand failure makes no sense at all. However, for a cash-rich bank with years of operating history relative to the transaction processing needs associated with servicing customer accounts, transferring this burden from capital expense to operating expense is equally senseless. Paying a premium for Amazon's gross profit margin when demand is fairly deterministic and your cost of capital is low is certainly a losing proposition.
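
To make that point concrete, here is a toy back-of-the-envelope comparison in Python. Every figure in it (server prices, hourly rate, cost of capital, operating cost) is a made-up assumption, not anyone's actual pricing, but it shows why owning can win when demand is steady and capital is cheap:

```python
# Toy comparison: own vs. rent compute for a steady, fully utilized workload.
# Every figure below is an illustrative assumption, not vendor pricing.

def annualized_capex(price, years_of_life, cost_of_capital):
    """Depreciate the purchase over its life and charge for the capital tied up."""
    return price / years_of_life + price * cost_of_capital

def owned_cost(servers, price=3000, years=3, cost_of_capital=0.06, opex_per_server=1500):
    return servers * (annualized_capex(price, years, cost_of_capital) + opex_per_server)

def rented_cost(servers, hourly_rate=0.40, hours_per_year=8760):
    return servers * hourly_rate * hours_per_year

servers = 20  # steady, predictable demand, busy around the clock
print(f"Own:  ${owned_cost(servers):,.0f} per year")
print(f"Rent: ${rented_cost(servers):,.0f} per year")
```

Flip the utilization assumption or raise the cost of capital and the answer flips with it, which is exactly the point: the right mix is a portfolio decision, not a dogma.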

The challenge and the opportunity of cloud for any enterprise is moving applications to an architecture that can exercise the cloud option for managing demand risk while simultaneously striking the right balance between capex and opex relative to the cost of capital. I find it funny that Amazon's new VPN feature is designed to make this opportunity a reality, while the blog post of their CTO announcing the feature proclaims that internal operations are too costly. Maybe they are viewing the VPN as a temporary bridge that will be burned when capex-to-opex nirvana is attained. Personally, I see it as the first of many permanent linkages that will be built to exercise the cloud option for managing demand risk. Lower costs associated with a proper portfolio balance of capex and opex are just icing on the cake.

Monday, August 24, 2009

VMware Springs Big for SpringSource

In a blog post back in May, I described why I believed a SpringSource and Hyperic combination was a good thing. In the new world of virtualized infrastructure and cloud computing, the application delivery and management approach is going to be lightweight and lean. At the time, however, I never imagined lightweight and lean would be worth $420M to VMware. While I have no doubt that a lightweight and agile approach to application delivery and management is going to replace the outdated heavy approach of J2EE and EJB, I am not quite convinced that VMware is getting in this deal what they want us to believe they are getting – general purpose operating system irrelevance.

VMware has done an incredible job abstracting the hardware away from the general purpose operating system. Now they have moved to the other end of the stack in an attempt to abstract the application away from the operating system. If the operating system is not responsible for hardware support and it is likewise not responsible for application support, then it is irrelevant, right? It is a good theory, but it is not quite true.

While the majority of application code will certainly be written in languages that can be supported by SpringSource (Java, Grails), there will remain lots and lots of application utilities and services that are provided by various programs that are not, and will never be, written in Java or the related languages supported by SpringSource. All of these various programs will still need to be assembled into the system images that represent a working application. And while I absolutely believe the general purpose operating system should die an ugly death in the face of virtualized infrastructure and cloud computing, I do not believe that operating systems can be rendered irrelevant to the application. I simply believe they become lighter and more application-specific. I also believe that we are going to see a proliferation of application language approaches, not a consolidation to Java alone.

Acquiring SpringSource puts VMware on the path to providing not only Infrastructure as a Service technology, but also Platform as a Service technology. From what I have seen to date in the market, PaaS lags far, far behind IaaS in acceptance and growth. I have written multiple posts praising the Amazon approach and decrying the Google and Salesforce approach for cloud because the latter requires developers to conform to the preferences of the platform provider while the former allows developers to exercise creativity in the choice of languages, libraries, data structures, etc. That's not to say that PaaS cannot be a valuable part of the application developer toolkit. It's just that the market will be much more limited in size due to the limitations in the degrees of freedom that can be exercised. And if developers love one thing more than anything else, it is freedom.

VMware's acquisition of SpringSource moves them into the very unfamiliar territory of developer tools and runtimes. It is a different sale to a different audience. Developers are notoriously fickle, and it will be interesting to see how a famously insular company like VMware manages to maintain the developer momentum built by the SpringSource team.

Thursday, August 13, 2009

The Cloud Option

A few months back, I participated in a panel on the evolution of cloud computing that was hosted by Union Square Advisors. Alongside me on the panel were executives from Amazon, Adobe, Cisco, and NetApp. Someone in the audience claimed that their economic analysis of running an application on Amazon AWS indicated the services were not cost competitive relative to an internal deployment. My response was that the analysis was likely based upon a simple, non-volatile application demand scenario. I said that the analysis should have instead considered the option value of Amazon's services subject to some level of demand volatility. What is an option worth that allows you to quickly scale up and scale down with application demand, with costs scaling (or descaling) proportionately? How many applications in your portfolio could benefit from this type of risk management hedge? What type of premium should you be willing to pay for a cost profile that correlates more closely to your demand profile – one that captures the benefits of terrific demand without big capital outlays while avoiding the costs of over-provisioning when demand fails?

My response to the simplistic Amazon cost analysis struck a chord with the audience, and I have since been thinking quite a bit about the metaphor of financial options as applied to the value of cloud computing. A financial option basically gives its holder the right to participate in the market for a particular asset at a defined price at some future date, in exchange for a price (the premium) paid today. Aside from their value as a tool for market speculation, options provide a low-cost way to manage the risk associated with sudden and significant swings in the market for important portfolio assets. The cloud option provides just this risk management function for the portfolio of applications that any given enterprise must execute and manage in the course of delivering on the promises of its business. In exchange for a cloud architecture premium, the owner of the application gets both upside and downside protection associated with a demand forecast (and its related budget) that is almost certain to be inaccurate.
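
Here is a minimal sketch of that option-value arithmetic. The demand scenarios, probabilities, and unit costs are all illustrative assumptions; the point is the shape of the calculation, not the numbers:

```python
# Toy "cloud option" valuation: expected cost of fixed capacity sized for peak
# demand vs. elastic capacity that scales with actual demand.
# Scenario probabilities and unit costs are illustrative assumptions.

scenarios = [  # (probability, servers actually needed)
    (0.25,  10),   # demand fizzles
    (0.50,  40),   # demand meets forecast
    (0.25, 120),   # demand takes off
]

fixed_unit_cost   = 2700   # $/server-year for owned capacity
elastic_unit_cost = 3500   # $/server-year equivalent on demand (the premium)

peak = max(needed for _, needed in scenarios)
fixed_cost = peak * fixed_unit_cost              # you pay for the peak no matter what
expected_elastic = sum(p * needed * elastic_unit_cost for p, needed in scenarios)

print(f"Fixed capacity sized to peak: ${fixed_cost:,.0f}/year")
print(f"Expected elastic cost:        ${expected_elastic:,.0f}/year")
print(f"Option value of elasticity:   ${fixed_cost - expected_elastic:,.0f}/year")
```

When demand is volatile, the premium paid per unit of elastic capacity can be substantially higher than the owned-capacity rate and still leave the elastic approach cheaper in expectation.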

The objective of this blog, The Cloud Option, is to discover the various costs and benefits associated with the premium of a cloud architecture. By analyzing the structure of the various cloud offerings and the technologies which underpin them (e.g., virtualization, programming APIs, etc.), we will provide application owners with a context for evaluating which cloud services and technology might provide the best option for managing their demand risks. At the level of the enterprise, IT planners will be able to more effectively undertake an analysis of their application portfolio in order to lay out a broad demand-risk management strategy based upon cloud technology and services.

Contributing to this blog alongside me will be Steve Bobrowski. Steve is the former CTO of SaaS at Computer Sciences Corporation, former Director of SaaS Technology at BEA Systems, and currently freelances as a technical, strategic, and marketing consultant to prominent cloud vendors. Because of the variety and breadth of our experiences, we should be able to cover the material fairly broadly and with a compelling level of depth. To provide some context on my historical perspective of cloud, I have posted below the cloud related entries from my open source blog dating back to November of 2006.

Cloud technology and services are certainly going to change the landscape of enterprise computing. I believe they can substantially lower the risk-adjusted cost of delivering applications. We hope to help elucidate the cloud option – ensuring that the premium paid to adopt the architecture truly helps manage cost and risk instead of simply making a technology fashion statement.

Tuesday, August 11, 2009

IBM Cloud Fizzles

From June 30, 2009

Based on my positive review below of IBM's CloudBurst technology for building internal clouds, I tuned into the IBM webinar for the external cloud companion product with high hopes. I was hoping to hear about a consistent architecture across the two products that would allow an enterprise to federate workloads seamlessly between the internal and external cloud. Boy, was I disappointed.

It seems the IBM external cloud is nothing more than an IBM hosted capability for running virtual appliances of IBM Rational Software products. Among my many disappointments:

- no ability to run virtual appliances defined by me. They don't even publish a specification.

- no federation between internal and external. They are not even the same architecture because one runs Xen and the other runs VMware, and they do not provide a conversion utility.

- private beta (alpha maybe?) for invited customers only. Why make an announcement?

- no timetable for general availability of a product. Why make an announcement?

This announcement was a terrible showing by IBM, to say the least. It is obvious to me that the CloudBurst appliance folks (call them “left hand”) and the Smart Business cloud folks (call them “right hand”) were two totally different teams. And the left hand had no idea what the right hand was doing. But each was intent not to be outdone by the other in announcing “something” with cloud in the title. And they were told to “cooperate” by some well-meaning marketing and PR person from corporate. And this mess of a situation is the outcome. Good grief!

IBM CloudBurst Hits the Mark

From June 29, 2009

IBM rolled out a new infrastructure offering called CloudBurst last week. Aimed at development and test workloads, it is essentially a rack of x86 systems pre-integrated with VMware’s virtualization technology along with IBM software technology for provisioning, management, metering, and chargeback. I believe IBM, unlike Verizon, has hit the cloud computing mark with this new offering.

First, IBM is targeting the offering at a perfect application workload for cloud – development and test. The transient nature of development and test workloads means that an elastic computing infrastructure with built-in virtualization and chargeback will be attractive to IT staff currently struggling to be responsive to line of business application owners. Those application owners are holding the threat of Amazon EC2 over the heads of IT staff who cannot get their act together with frictionless, elastic compute services for their applications. By responding with a development and test infrastructure that enables self-service, elasticity, and pay-as-you-go chargeback capability, the IT staff will take a step in the right direction to head off the Amazon threat. Moving these dev/test workloads to production on the same infrastructure will be a simple flick of the switch once line of business owners, spoiled by CloudBurst for dev/test, complain that the production infrastructure is not flexible, responsive, or cost competitive.

Second, IBM embraced virtualization to enable greater self-service and elasticity. While they do not detail the use of VMware’s technology on their website (likely to preserve the ability to switch it out for KVM or Xen at some future date), IBM has clearly taken an architectural hint from Amazon by building virtualization into the CloudBurst platform. Virtualization allows the owners of the application to put the infrastructure to work quickly via virtual appliances, instead of slogging through the tedious process of configuring some standard template from IT (which is never right) to meet the needs of their application – paying infrastructure charges while they fight through incompatibilities, dependency resolution, and policy exception bureaucracy. CloudBurst represents a key shift in the way IT will buy server hardware in the future. Instead of arriving as either a bare-metal unit or pre-loaded with a bloated general purpose OS (see the complaint about tedious configuration above), systems will come pre-configured with virtualization and self-service deployment capability for the application owners - a cloud-computing infrastructure appliance, if you will. Cisco has designs on the same type of capability with their newly announced Unified Computing System.

Third, it appears that IBM is going to announce a companion service to the CloudBurst internal capability tomorrow. From the little information that is available today, I surmise that IBM is likely going to provide a capability through their Rational product to enable application owners to “federate” the deployment of their applications across local and remote CloudBurst infrastructure. With this federated capability across local (fixed capital behind the firewall) and remote sites (variable cost operating expense from infrastructure hosted by IBM), the IBM story on cloud will be nearly complete.

The only real negatives I saw in this announcement were that IBM did not include an option for an object storage array for storing and cataloging the virtual appliances, nor did they include any utilities for taking advantage of existing catalogs of virtual appliances from VMware and Amazon. While it probably hurt IBM’s teeth to include VMware in the offering, perhaps they could have gone just a bit further and included another EMC cloud technology for the object store. Atmos would be a perfect complement to this well considered IBM cloud offering. And including a simple utility for accessing/converting existing virtual appliances really would not be that difficult. Maybe we’ll see these shortcomings addressed in the next version. All negatives aside, I think IBM made a good first showing with CloudBurst.

Verizon Misses with Cloud Offering

From June 18, 2009

About two weeks back, I was excited to see a headline about Verizon partnering with Red Hat to offer their customers a “new” cloud computing offering. I was hopeful that the details would reveal a KVM hypervisor-based elastic compute capability coupled with an OVF-based specification for virtual appliances to run on the service. I was also hoping to discover some details on storage as a service, with all of the services accessible through a management capability exposed via RESTful APIs. Boy, was I disappointed. Turns out the new Verizon cloud offering is just the old Verizon hosting offering with a new name.

Why is it so difficult for all of these old school infrastructure providers to understand the new path being blazed by Amazon AWS? Why can't they offer even a reasonable facsimile of the capability provided by Amazon? Surely it is the threat of Amazon that is leading them to re-name the old hosting stuff as the new cloud stuff. Why not go all the way and actually offer something that is competitive? Here is a recipe for any that are interested:

First, provide an x86 hypervisor-based, virtualized compute service that allows the customer to bring their applications with them as pre-packaged, pre-configured virtual machines (virtual appliances). Don't ask them to boot a “standard OS” and then spend hours, days, weeks, months configuring it to work for them (because what you specified as the “standard” is certainly not what they have tested with their applications, and the whole purpose of elasticity is defeated if you can't quickly put images to work on the network in response to application demand). Better yet, let them boot existing Amazon Machine Images and VMware virtual appliances. Providing this capability is not rocket science. It is just work.

Second, provide a simple storage service (see Amazon S3 for what it should do) for storing unstructured data as well as for storing their virtual appliances that boot on the virtualized, elastic compute service. If you don't want to take the time to develop your own, follow AT&T's lead and go buy the capability EMC offers as part of the Atmos product line. You don't even have to think; you just need to write a check, and voilà – an Amazon S3-type capability running on your network. What could be easier?

Third, provide a block storage capability for attaching to virtual appliance images that must store state, such as database images. Most of the hosting companies already provide this type of SAN offering, so this part should be a no-brainer. Just price it with a very fine-grained, variable cost approach (think megabyte-days, not months – see the metering sketch after this recipe).

Fourth, provide access to the infrastructure management services via simple, RESTful APIs. You don't have to go overboard with capability at first, just make certain the basics are available in a manner that allows the services to be run effectively over the Internet without any funky protocols that are specific to your network implementation.

Finally, go sign up partners like rPath and RightScale to offer the next level of manageability and support for the virtual machines that will run on the network. These are the final touches that indicate to your customers that you are serious about providing a terrific capability for the complete lifecycle of your cloud computing offering – instead of asking them to be patient with you while you re-name your hosting offering as a cloud offering in the hopes that it will assuage their bitterness that Amazon-like capability is not available on your network.
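
As promised in the third step above, here is a toy metering sketch for the fine-grained pricing point. The rate and the usage figures are invented for illustration; the idea is simply that the meter should follow allocation day by day rather than billing a flat month:

```python
# Toy metering sketch for fine-grained, variable-cost block storage billing:
# charge by megabyte-day rather than a flat monthly allocation.
# The rate and the usage samples are illustrative assumptions.

RATE_PER_MB_DAY = 0.000005  # $ per megabyte-day (made-up rate)

def monthly_bill(daily_usage_mb):
    """Sum each day's allocated megabytes; storage released mid-month stops accruing."""
    megabyte_days = sum(daily_usage_mb)
    return megabyte_days * RATE_PER_MB_DAY

# A volume that grows for ten days, then is detached and deleted for the rest of the month.
usage = [50_000 + day * 10_000 for day in range(10)] + [0] * 20
print(f"Megabyte-days: {sum(usage):,}")
print(f"Bill:          ${monthly_bill(usage):.2f}")
```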

Federation - The Enterprise Cloud Objective

From June 2, 2009

I know the title to this blog post sounds a bit like a Star Trek episode, but I believe I have a useful point to make with the term federation - even at the risk of sounding a bit corny. I have been watching with interest the lexicon of terms that are emerging to describe the architecture and value of cloud computing. VMware uses the terms Internal/External/Private to describe the distribution of application workloads across multiple networks in a coordinated fashion. Sun uses the terms Private/Public/Hybrid, respectively, to describe the same architecture (although they would argue for Sun-branded components in lieu of VMware/EMC-branded components). I think both of these term sets are flawed and confusing as descriptors for a cloud architecture that distributes workloads across multiple networks. Rather than simply complaining, however, I am willing to offer a solution.

The term Federation describes the end state of an effective cloud architecture perfectly, and I think we should all begin using it when we attempt to sell our respective goods and services to enable the enterprise cloud. Whether part of an Internal/External/Federation combination, a Private/Public/Federation combination, or a Network1/Network2/.../NetworkN/Federation combination, the common term accurately describes the end objective of cloud computing.

First, some attribution. This term was presented to me as a descriptor for cloud value during my work with the cloud infrastructure group at EMC (the folks that own the Atmos product line) over a year ago. It is now my turn to put some greater structure on this enviable original thought that belongs to EMC.

A good general definition for Federation (independent of an IT context) is a union of member entities that preserves the integrity of the policies of the individual members. Members get the benefits of the union while retaining control over their internal affairs.

In the case of a technology infrastructure federation (aka a cloud architecture), the primary benefit of the union is the lower cost and risk associated with a pool of technology assets which are available across a diversified set of independent networks. In other words, application workloads should be distributed to the network with the lowest risk adjusted cost of execution – i.e. based upon the risk policies of the enterprise. If the risk of running a mission critical, enterprise workload on Amazon's AWS network is deemed high (for whatever reason, real or perceived), that workload might stay on a proprietary network owned by the enterprise. Likewise, a low risk workload that is constantly being deferred due to capacity or complexity constraints on the enterprise network might in fact be run for the lowest cost at Amazon or a comparable provider. For a startup, the risk of depleting capital to purchase equipment may dictate that all workloads run on a third party network that offers a variable cost model for infrastructure (Infrastructure as a Service, IaaS).
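
Here is a sketch of what that placement decision might look like in code. The networks, their costs, and the risk multipliers are all hypothetical stand-ins for an enterprise's own policy inputs:

```python
# Toy placement sketch for a federation: send each workload to the network with
# the lowest risk-adjusted cost of execution, per enterprise risk policy.
# Networks, costs, and risk multipliers are illustrative assumptions.

networks = {
    # name: (cost per compute-hour, risk multiplier from enterprise policy)
    "internal":       (0.55, 1.00),
    "amazon-ec2":     (0.40, 1.10),   # slight penalty for perceived external risk
    "hosted-partner": (0.48, 1.05),
}

def place(workload_risk_weight):
    """Pick the network minimizing cost * policy risk, scaled by workload sensitivity."""
    def score(item):
        _, (cost, risk) = item
        return cost * (risk ** workload_risk_weight)
    return min(networks.items(), key=score)[0]

print(place(workload_risk_weight=0.0))  # risk-insensitive batch job -> cheapest network
print(place(workload_risk_weight=8.0))  # sensitive workload -> risk dominates, stays internal
```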

Independent of the proprietary calculus for risk that must be undertaken by every enterprise relative to their unique situation, it should become clear to all that the distribution of application workloads across multiple networks based upon the cost/capability metrics of those networks will lower the risk-adjusted cost of enterprise computing. The same diversification theories that apply to managing financial portfolio risk also apply to managing the distributed execution of application workloads. The historical challenge to this notion of application workload federation has been the lack of an efficient market – the transaction costs associated with obtaining capacity for any given application on any given network were too high due to complexity and a lack of standards for application packaging (de facto or otherwise). Now, with virtualization as the underpinning of the network market, virtual appliances as the packaging for workloads, high-bandwidth network transit, and webscale APIs for data placement/access, the time is coming for an efficient market where infrastructure capacity is available to applications across multiple networks. And Federation is the perfect word to describe a cloud architecture that lowers the risk-adjusted cost of computing to the enterprise. Enterprise. Federation. Clouds. Star Trek.

Cloud Application Management - Agile, Lean, Lightweight

From May 6, 2009

The acquisition of Hyperic by SpringSource got me thinking about the next generation of application delivery and management for cloud applications. At first, I was cynical about this combination – two small companies with common investors combining resources to soldier on in a tough capital environment. While this cynical thinking probably has a kernel of truth to it, the more I considered the combination, the more I concluded that it makes sense beyond the balance sheet implications. Indeed, I believe the future of application delivery and management will combine agile development with lean resource allocation and lightweight management. This new approach to application delivery and management is one that complements the emerging cloud architecture for infrastructure.

Agile development, with its focus on rapid releases of new application functionality, requires a programming approach that is not overly burdened with the structure of J2EE and EJB. Spring, Rails, Grails, Groovy, Python all represent the new approach – placing a premium on quick delivery of new application functionality. Application functionality takes center stage, displacing the IT infrastructure dominance of the legacy application server oriented approach. Developers will use what works to deliver the application functionality instead of using what works for the IT organization's management framework. The new approach does have implications for scalability, but we will get to that issue in a moment.

Lean is one of the newer terms emerging to describe the future of application delivery. I first referenced lean as an IT concept by relating it to the lean approach for manufacturing operations in a blog post about a year ago. With lean application delivery, applications scale horizontally to consume the infrastructure resources that they require based upon the actual demand that they are experiencing. The corollary is that they also contract to release resources to other applications as demand subsides. This “lean” approach to resource allocation with dynamic scaling and de-scaling is what a cloud architecture is all about – elasticity. Rather than optimizing the code to “scale up” on an ever bigger host, the code remains un-optimized but simple – scaling out with cheap, variable cost compute cycles when the peaks in demand require more capacity. Giving back the capacity when the peaks subside.

With the lean approach for resource allocation, a lightweight management approach that measures only a few things replaces the old frameworks that attempt to measure and optimize every layer in an ever more complex infrastructure stack. If the service is under stress due to demand, add more instances until the stress level subsides. If the service is under extremely light load, eliminate resources until a more economical balance is struck between supply and demand. If an instance of a service disappears, start a new one. In most cases, you don't even bother figuring out what went wrong. It costs too much to know everything. This lightweight approach for management makes sense when you have architected your applications and data to be loosely coupled to the physical infrastructure. Managing application availability is dramatically simplified. Managing the physical hosts becomes a separate matter, unrelated to the applications, and is handled by the emerging datacenter OS as described by VMware or the cloud provider in the case of services like those provided by Amazon AWS.
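
For illustration, here is a minimal sketch of such a management loop. The provider object and its methods (instances, is_alive, launch, terminate, average_load) are hypothetical placeholders for whatever API your infrastructure exposes, and the thresholds are arbitrary assumptions:

```python
# Toy lightweight management loop: watch a couple of coarse signals and add,
# remove, or replace instances without ever diagnosing root causes.
import time

HIGH_LOAD, LOW_LOAD = 0.75, 0.20
MIN_INSTANCES = 2

def manage(service, provider, interval=60):
    """Keep a service healthy using only coarse signals from a hypothetical provider API."""
    while True:
        instances = provider.instances(service)
        healthy = [i for i in instances if provider.is_alive(i)]
        for _ in range(len(instances) - len(healthy)):
            provider.launch(service)          # replace whatever vanished; don't ask why
        load = provider.average_load(healthy)
        if load > HIGH_LOAD:
            provider.launch(service)          # scale out while under stress
        elif load < LOW_LOAD and len(healthy) > MIN_INSTANCES:
            provider.terminate(healthy[-1])   # give capacity back when demand subsides
        time.sleep(interval)
```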

Take a look at the rPath video on this topic. I think it reinforces the logic behind the SpringSource and Hyperic combination. It rings true regarding the new approach that will be taken for rapid application delivery and management in a cloud infrastructure environment. Applications and data will be loosely coupled to the underlying infrastructure, and agile development, lean resource allocation, and lightweight management will emerge as the preferred approach for application delivery and management.

McKinsey Recommends Virtualization as First Step to Cloud

From April 20, 2009

In a study released last week, the storied consulting company McKinsey & Company suggested that moving datacenter applications wholesale to the cloud probably doesn't make sense – it's too expensive to re-configure, and the cloud is no bargain if simply substituted for equipment procurement and maintenance costs. I think this conclusion is obvious. They go on to suggest that companies adopt virtualization technology in order to improve the utilization of datacenter servers from the current miserable average of ten percent (10%). I think this is obvious too. The leap that they hesitated to make explicitly, but which was called out tacitly in the slides, was that perhaps virtualization offers the first step to cloud computing, and a blend of internal plus external resources probably offers the best value to the enterprise. In other words, cloud should not be viewed as an IT alternative; instead it should be considered an emerging IT architecture.

With virtualization as an underpinning, not only do enterprises get the benefit of increased asset utilization on their captive equipment, they also take the first step toward cloud by defining their applications independent from their physical infrastructure (virtual appliances for lack of a better term). The applications are then portable to cloud offerings such as Amazon's EC2, which is based on virtual infrastructure (the Xen hypervisor). In this scenario, cloud is not an alternative to IT. Instead, cloud is an architecture that should be embraced by IT to maximize financial and functional capability while simultaneously preserving corporate policies for managing information technology risk.

Virtualization as a step to cloud computing should also be viewed in the context of data, not simply application and server host resources. Not only do applications need compute capacity, they also need access to the data that defines the relationship of the application to the user. In addition to server virtualization technology such as VMware and Citrix's Xen, enterprises also need to consider how they are going to abstract their data from the native protocols of their preferred storage and networking equipment.

For static data, I think this abstraction will take the form of storage and related services with RESTful interfaces that enable web-scale availability to the data objects instead of local network availability associated with file system interfaces like NFS. With RESTful interfaces, objects become abstracted from any particular network resource, making them available to the network where they are needed. Structured data (frequently updated information typically managed by a database server technology) is a bit trickier, and I believe solving the problem of web-scale availability of structured data will represent the “last mile” of cloud evolution. It will often be the case that the requirement for structured data sharing among applications will be the ultimate arbiter of whether an application moves to the cloud or remains on an internal network.
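
The practical difference is easy to see in code: with a RESTful interface, a stored object is just a URL that an application on any network can fetch over plain HTTP, with no mount point on one particular local network. A trivial sketch (the storage host and object path are hypothetical):

```python
# Toy illustration of the REST abstraction: an object is addressed by URL and
# fetched with a plain HTTP GET, independent of where the application runs.
from urllib import request

OBJECT_URL = "https://storage.example.com/buckets/reports/2009-q2.pdf"  # hypothetical

def fetch(url):
    with request.urlopen(url) as resp:  # standard HTTP GET; no NFS mount, no vendor client stack
        return resp.read()

data = fetch(OBJECT_URL)
print(f"Fetched {len(data)} bytes from {OBJECT_URL}")
```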

The company that I founded, rPath, has been talking about the virtualization path to cloud computing for the past three years. Cloud is an architecture for more flexible consumption of computing resources – independent of whether they are captive equipment or offered by a service provider for a variable consumption charge. About nine months ago, rPath published the Cloud Computing Adoption Model that defined this approach in detail with a corresponding webinar to offer color commentary. In the late fall of last year, rPath published a humorous video cartoon that likewise offered some color on this approach to cloud computing. With McKinsey chiming in with a similar message, albeit incomplete, I am hopeful that the market is maturing to the point where cloud becomes more than a controversial sound bite for replacing the IT function and instead evolves into an architecture that provides everyone more value from IT.

Outsourcing Gives Way to Now-Sourcing via Cloud

From April 13, 2009

The theory behind the value of outsourcing, aside from labor arbitrage, was that the outsourcer could deliver IT resources to the business units in a more cost effective manner than the internal IT staff due to a more highly optimized resource management system. The big problem with outsourcing, however, was the enormous hurdle the IT organization faced in transitioning to the “optimized” management approach of the outsourcer. In many cases this expensive hurdle had to be crossed twice – once when the applications were “outsourced” and then again when the applications were subsequently “in-sourced” after the outsourcer failed to live up to service level expectations set during the sales pitch. Fortunately, the new architecture of cloud computing enables outsourcing to be replaced with “now sourcing” by eliminating the barriers to application delivery on third party networks.

The key to “now sourcing” is the ability to de-couple applications and data from the underlying system definitions of the internal network while simultaneously adopting a management approach that is lightweight and fault tolerant. Historically, applications were expensive to “outsource” because they were tightly coupled to the underlying systems and data of the internal network. The management systems also pre-supposed deep access to the lowest level of system structure on the network in order to hasten recovery from system faults. The internal IT staff had “preferences” for hardware, operating systems, application servers, storage arrays, etc., as did the outsourcer. And they were inevitably miles apart in both brands and structure, not to mention differences in versions, release levels, and the management system itself. Even with protocols that should be a “standard,” each implementation still had peculiarities based upon vendor and release level. NFS is a great example. Sun's implementation of NFS on Solaris was different from NetApp's implementation on their filers, leading to expensive testing and porting cycles in order to attain the benefits of “outsourcing.”

I believe a by-product of the “cloud” craze will be new technology, protocols, and standards that are designed from the beginning to enable applications to run across multiple networks with a much simpler management approach. A great example is server virtualization coupled with application delivery standards like OVF. With x86 as a de facto machine standard and virtualization as implemented by hypervisor technology like Xen and VMware, applications can be “now sourced” to providers like Amazon and RackSpace with very little cost associated with the “migration.”

Some will argue that we are simply trading one protocol trap for another. For example, Amazon does not implement Xen with OVF in mind as an application delivery standard. Similarly, VMware has special kernel requirements for the virtual machines defined within OVF in order to validate your support agreement. Amazon's S3 cloud storage protocol is different than a similar REST protocol associated with EMC's new Atmos cloud storage platform. And the list of “exceptions” goes on and on.

Even in the face of these obvious market splinters, I still believe we are heading to a better place. I am optimistic because all of these protocols and emerging standards are sufficiently abstracted from the hardware that translations can be done on the fly – as with translations between Amazon's S3 and EMC's Atmos. Or the penalty of non-conformance is so trivial it can be ignored – as with VMware's kernel support requirements which do not impact actual run-time performance.

The other requirement for “now sourcing” that I mentioned above was a fault tolerant, lightweight approach to application management. The system administrators need to be able to deliver and manage the applications without getting into the low level guts of the systems themselves. As with any “new” approach that requires even the slightest amount of “change” or re-factoring, this requirement to re-think the packaging and management of the applications will initially be an excuse for the IT staff to “do nothing.” In the face of so many competing priorities, even subtle application packaging and management changes become the last item on the ever lengthening IT “to do” list – even when the longer term savings are significant. But, since “now sourcing” is clearly more palatable to IT than “outsourcing” (and more effective too), perhaps there is some hope that these new cloud architectures will find a home inside the IT department sooner rather than later.

Will Agile Drive a Hybrid Cloud Approach?

From March 4, 2009

Some workloads are perfectly suited for cloud deployment. Generally, these are workloads with transient or fluctuating demand, relatively static data (lots of reads, few writes), and no regulated data compliance issues (e.g., patient healthcare records). Testing fits this description perfectly – especially with the growing popularity of Agile methods. With its focus on rapid iteration and feedback to achieve faster innovation and lower costs, Agile demands a flexible and low-cost approach for testing cycles. I have no doubt that developers will begin using variable-cost compute cycles from services like Amazon EC2 because of their flexibility and pay-for-what-you-use pricing. But I am also willing to bet that testing with Amazon will put further pressure on the IT organization to respond with a similar, self-service IT capability. I think a hybrid-cloud architecture with complementary internal and external capability will emerge as a productive response to the demand for true end-to-end agility.

Some time ago, I authored a blog post titled “When Agile Becomes Fragile” that outlined the challenge of implementing Agile development methods while attempting to preserve the legacy IT approach. What good is rapid development when the process for promoting an application to production takes several months to absorb even a few weeks of new development? If developers take their Agile methods to the cloud for testing (which they will), it becomes a slippery slope that ultimately leads to using the cloud for production. Rather than the typical, dysfunctional IT response of “don't do that – it's against policy,” I think the IT organization should instead consider implementing production capacity that mimics and complements cloud capability such as that offered by Amazon.

Along with all of the cool technology that is emerging to support Agile methods, new technology and standards are also emerging to support the notion of a hybrid cloud. The new Atmos storage technology from EMC and the OVF standard for packaging virtualized applications are two good examples of hybrid-cloud technology. Atmos gives you the ability to describe your data in a manner that automatically promotes/replicates it to the cloud if it has been approved for cloud storage/availability. Whether applications run on an external cloud or on your “internal cloud,” the supporting data will be available. Similarly, OVF has the potential to enable virtualized applications to run effectively either externally on the cloud or internally – without significant manual (and error-prone) intervention by system administrators (or those developers that play a sysadmin on a TV show). In both cases, the goal is to enable greater flexibility for applications to run both internally and on the cloud – depending on the profile of the application and the availability of resources.

Agile is yet another important technology change that is going to pressure IT to evolve, and rPath is sponsoring a webinar series that dives into this topic in some detail. Whether you are a developer, an architect, or a system administrator, these webinars should be interesting to you. For the IT staff, the series may offer a glimpse at an approach for IT evolution that is helpful. In the face of Agile and cloud pressure, the alternative to evolution – extinction – is much less appealing.

Is the Cloud Game Already Over

From January 25, 2009

This is the thought that crossed my mind a few weeks back as I pondered Amazon's beta release of the Amazon Web Services Console. The reason the game might be over is that Amazon is apparently so far ahead of the competition that they can now divert their engineering attention to the management console instead of core platform functionality. To me, this signals a competitive lead so vast that, absent a quick and significant redirection of resources and potentially some strategic acquisitions of capability, Amazon's competitors are doomed in the cloud space.

I saw this dynamic once before during my time at Red Hat. Red Hat had such a lead in the market, with almost total mind share for the platform (Red Hat Linux, now Red Hat Enterprise Linux), that the company could launch a strategic management technology, Red Hat Network, while others were grasping for relevance on the core platform. Note that in the case of Red Hat, no one else has come close to their lead in the Linux market space. And no one else has really gotten around to building out the management technology that was offered by Red Hat Network eight years ago.

Consider these other challenges facing Amazon's competitors:

1. Lack of machine image definitions - Amazon published the AMI spec for EC2 about 2 years ago. To my knowledge, all of the competitors that use virtualization (Amazon uses Xen) are still requiring customers to boot a limited set of approved "templates" which must then be configured manually, and subsequently lose their state when retired.

2. Proprietary versus open - when you require the customer to program in a specific language environment that is somewhat unique to a particular "cloud" platform (à la Google with Python and Salesforce with Apex), you dramatically limit your market to virtual irrelevance out of the gate. Amazon doesn't care, so long as you can build to an x86 virtual machine.

3. Elastic billing model - until you have a platform for billing based upon the on-demand usage of resources, you don't have a cloud with the key value proposition of elasticity. You simply have hosting. To my knowledge, most competitors are still on a monthly payment requirement. Hourly is still a long way away for these folks.

Perhaps I am wrong, but I bet I am not. If I am right, the day will come in the not too distant future (after the equity markets recover) when Amazon spins out AWS as a tracking stock (similar to the EMC strategy with VMware) with a monster valuation (keeping this asset tied to an Amazon revenue multiple makes no sense), and the valuations on the technology assets that help others respond to Amazon go nutty (witness the XenSource valuation on the day VMware went public). I say "Go, Amazon, Go!"

Cloud in Plain English

From December 23, 2008

I must take my hat off to Jake Sorofman, who runs marketing for rPath. Jake has done an incredible job distilling a bunch of complex stuff into a consumable and entertaining video. Do yourself a favor, and check out his Cloud Computing in Plain English video. Al Gore never looked so good.

And Happy Holidays!

Will Managing VM Sprawl Lead to Rogue Cloud Deployments?

From December 5, 2008

I just read an interesting article regarding the potential cost pitfalls associated with VM sprawl. Jett Thompson, an enterprise computing architect from Boeing, has developed a cost model regarding the benefits of virtualization and the related pitfalls of VM sprawl. It seems that virtualization is easy to justify, so long as you don't give the users everything that they want. Here is the money quote from the article:

However, all of those savings [from virtualization] can be eliminated if sprawl isn't controlled. With virtual servers easy to spin up, users may ask for large numbers of new virtual machines and it's up to IT to hold the line, Thompson says.

"If you don't have demand management and good governance in place you're actually going to cost your company money," he says. "Virtual server sprawl can wipe out any savings."

Gartner analyst Thomas Bittman also says virtual server sprawl can be tough to control and is harder to measure than physical server sprawl. "Fundamentally, we believe virtualization sprawl can be a much bigger problem than physical sprawl," Bittman said.


I believe that the unintended consequence of "IT hold[ing] the line" will be rogue cloud deployments. Rogue cloud deployment describes the phenomenon of business unit developers taking matters into their own hands when IT "holds the line" on making computing resources available. Once the business units understand that resources can be made available on-demand, either internally or via services such as Amazon EC2, they are simply not going to take "no" for an answer. Deploying applications as virtual machines, or virtual appliances in the case of an ISV application, removes all of the friction from the deployment process. This same friction was formerly the tonic that IT sprinkled about in order to "hold the line" on availability (and the subsequent management costs) of computing resources. The instant gratification culture that we are cultivating with SaaS and cloud will not be held in check if IT "holds the line" by saying "no" to requests for capacity/capability.

I have a recommendation for Jett and the folks at Boeing and elsewhere who are fearing the unintended consequences of frictionless system capacity brought about by virtualization. Push the control point for deployment policy upstream via automated build, release, and management processes for applications released as virtual machines, manage the scale problem by going vertical with a JeOS architecture, and build a seamless bridge managed by IT to cloud offerings like Amazon EC2. Charge the users with the costs for deployment and management, but give them the technology to do it the right way. Check out our cloud computing adoption model and the webinar that accompanies it. Rogue cloud deployments can be avoided, even in the face of VM sprawl control measures, when you say "yes" to your users while holding them accountable for building manageable system images.

Can You See the Clouds from Windows

From October 24, 2008

During the course of our webinar entitled "The Pragmatist's Guide to Cloud Computing: 5 Steps to Real ROI," several of the attendees submitted questions regarding the status of Windows as an environment for cloud applications. In a partial answer to the question, Jeff Barr, a speaker during the webinar and a member of the Amazon Web Services team, responded that a beta implementation of Windows for EC2 was now available. The problem with the notion of “Windows for EC2” is that it perpetuates the broken, legacy model of tying your application to the infrastructure upon which it runs.

In the legacy model, applications became artificially tied to the physical server upon which they ran, and server utilization was low because it is very difficult to run multiple applications on a single instance of a general purpose operating system. The reason it is difficult to run multiple applications on a single instance of a general purpose operating system is because each application has unique needs which conflict or compete with the unique needs of other applications. Virtualization technology, such as that provided by VMware or Citrix with XenServer, breaks the bond of the application to a physical server by placing a layer of software, called a hypervisor, on the physical hardware beneath the operating system instances that support each application. The applications are “isolated” from one another inside virtual machines, and this isolation eliminates the conflicts.

Amazon embraces this virtualization model by using Xen to enable their Elastic Compute Cloud (EC2) service. So what's the problem? If the OS instances are not tied to the physical servers any longer (indeed you do not even know which physical system is running your application on EC2, nor do you need to know), why am I raising a hullabaloo over a “broken model?” The reason this new model of Windows for EC2 is broken is because your application is now artificially coupled to EC2. When you begin with a Windows Amazon Machine Image (AMI), install your application on top, configure-test, configure-test, configure-test, configure-test, configure-test to get it right, and then save the tested configuration as a new AMI, the only place you can run this tested configuration of your application is on Amazon's EC2. If you want to run the application on another virtualized cloud, say maybe one provided by RackSpace, or Terremark, or GoGrid, or even your own internal virtualized cloud of systems, you have to install the application yet again, configure-test, configure-test, configure-test, configure-test, configure-test to get it right again, and then save the tested configuration on the other cloud service. Why don't we just stop the madness and admit that binding the OS to the physical infrastructure upon which it runs is a flawed approach when applications run as virtual machine images (or virtual appliances) atop a hypervisor or virtualized cloud of systems like EC2?

The reason that we are continuing the madness is because madness is all we have ever known. Everyone knows that you bind an operating system to a physical host. Operating systems are useless unless they bind to something, and until the emergence of the hypervisor as the layer that binds to the physical host, the only sensible approach for operating system distribution was to bind it to the physical host. When you buy hardware, you make it useful by installing an operating system as step one. But if the operating system that you install as step one in the new virtualized world is a hypervisor in lieu of a general purpose operating system, how do we get applications to be supported on this new type of host? Here's your answer -- what we previously knew as the general purpose operating system now needs to be transformed to just enough operating system (JeOS or “juice”) to support the application, and it should bind to the application NOT THE INFRASTRUCTURE.

Virtualization enables the separation of the application from the infrastructure upon which it runs – making possible a level of business agility and dynamism previously unthinkable. Imagine being able to run your applications on-demand in any datacenter around the world that exposes the hypervisor (any hypervisor) as the runtime environment. Privacy laws prevent an application supporting medical records in Switzerland from running in an Amazon datacenter in Belgium? No problem, run the application in Switzerland. Need to run the same application in Belgium in support of a new service being offered there next month? No problem, run it on Amazon's infrastructure in Belgium. The application has to support the covert operations associated with homeland security and it cannot be accessed via any Internet connection? No problem, provide it as a virtual appliance for the NSA to run on their private network. Just signed a strategic deal with RackSpace that provides an extraordinary level of service that Amazon is not willing to embrace at this time? No problem, shut down the instances running on EC2 and spin them up at RackSpace. All of this dynamic capability is possible without the tedious cycle of configure-test -- if we will simply bind the operating system to the application in order to free it from the infrastructure and let it fly into the clouds.

So why doesn't Microsoft simply allow Windows to become an application support infrastructure, aka JeOS, instead of a general purpose operating system that is bound to the infrastructure? Because JeOS disrupts their licensing and distribution model. Turning a ship as big as the Microsoft Windows licensing vessel might require a figurative body of water bigger than the Atlantic, Pacific, and Indian oceans combined. But if they don't find a way to turn the ship, they may find that their intransigence becomes the catalyst for ever-increasing deployments of Linux and related open source technology that is unfettered by the momentum of a mighty business model. Folks with valuable .NET application assets might begin to consider technology such as Novell's Mono project as a bridge to span their applications into the clouds via Linux.

I can tell you that there are lots of folks asking lots of questions about how to enable Windows applications in the “cloud.” I do not believe the answer is “Windows for EC2” plus “Windows for GoGrid” plus “Windows for RackSpace” plus “Windows for [insert your data-center cloud name here].” If Microsoft does not find a way to turn the licensing ship and embrace JeOS, the market will eventually embrace alternatives that provide the business agility that virtualization and cloud computing promise.

Will the Credit Crunch Accelerate the Cloud Punch?

From October 14, 2008

It's no secret that the days of cheap capital might be over. While it is obvious that startups with lean capital structures are already embracing cloud offerings such as Amazon EC2 for computing and S3 for storage, it seems to me that this trend might accelerate further for startups and enterprise customers alike.

Cloud consumption in the startup segment is poised to accelerate as investors like Sequoia Capital warn their portfolio companies to “tighten up” in the face of this credit crunch. Even the well-capitalized SaaS software providers might begin reconsidering the “ridiculous” expense of building out their offerings based upon the classic salesforce.com model of large-scale, proprietary datacenters with complex and expensive approaches to multi-tenancy. They might be better served by a KnowledgeTree model where on-demand application value is delivered via virtual appliances. In this model, the customer can deploy the software on existing gear (no dedicated server required) because the virtualization model makes for a seamless, easy path to value without setup hassles. Or they can receive the value of the application as a SaaS offering when KnowledgeTree spins up their instance of the technology on Amazon's elastic compute cloud. In both cases, the customer and KnowledgeTree avoid the capital cost of acquiring dedicated gear to run the application.

Large enterprises, too, will be reconsidering large-scale datacenter projects. When credit is tight, everyone from municipal governments to the best-capitalized financial institutions must find ways to avoid outlays of precious capital ahead of the reality of customer collections. More and more of these customers will be sifting through their application portfolios in search of workloads that can be offloaded to the cloud in order to free up existing resources and avoid outlays for new capacity to support high-priority projects. Just as the 9/11 meltdown was a catalyst for the adoption of Linux (I witnessed this phenomenon as the head of enterprise sales at Red Hat), a similar phenomenon might emerge for incremental cloud adoption associated with the credit crunch of 2008. All new projects will be further scrutinized to determine “Is there a better way forward than the status quo?”

As enterprises of all sizes evaluate new approaches to minimize capital outlays while accelerating competitive advantage via new applications, rPath is offering a novel adoption model for cloud computing that might serve as a convenient bridge to close the credit crunch capital gap. For those who are interested in exploring this new model, please join us in a webinar along with the good folks at Forrester, Amazon, and Momentum SI on October 23rd. If necessity is the mother of invention, we might be poised for some truly terrific innovations in the cloud space . . . and we will owe a debt of gratitude to the credit crunch for driving the new architecture forward.

Larry Rains on the Cloud Parade

From September 30, 2008

At Oracle OpenWorld last week, Larry Ellison derided the current “cloud” craze, likening the technology industry's obsession with “fashion” to the women's apparel industry. In a sense, he is right. Everything is being labeled cloud these days. New datacenters from IBM – cloud. New browser from Google – cloud. New strategy from VMware – cloud. I myself commented to Ben Worthen of the Wall Street Journal that I too feel the cloud craze is a bit “nutty.” At the same time, I believe there is some real change underfoot in the industry, and I believe that Amazon's Elastic Compute Cloud (EC2) is leading the way in capturing the imagination about what is possible with a new approach.

The reason EC2 has captured the imagination of so many people in the industry is because it offers the possibility of closing the painful gap that exists between application development and production operations. Promoting applications from development to production has typically been a contentious negotiation between the line of business application developers and the IT production operations management crew. It is a difficult process because the objectives of apps and ops run orthogonal to one another. Apps is about new features to quickly respond to market demand, and ops is about compliance, stringent change control, and standardization to assure stability.

With EC2, developers don't negotiate with operations at all. They simply package up the innovations they want inside a coordinated set of virtual machines (virtual appliances in the case of the ISV vernacular), and deploy, scale, and retire based upon the true workload demands of the market. No requisitions for hardware. No laborious setup of operating environments for new servers. No filling out waivers for using new software components that are not production approved yet. No replacement of components that fail the waiver process and re-coding when the production components don't work with the new application features. No re-testing. No re-coding. No internal chargebacks for servers that are not really being used because the demand for the application has waned. No painful system updates that break the application – even when the system function is irrelevant to the workload. No. No. No.
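To make that contrast concrete, here is a rough sketch of the self-service loop using boto, a Python library for Amazon Web Services. The AMI ID is a placeholder, credentials are assumed to be in the environment, and the parameters are from memory rather than checked against current documentation, so treat it as an illustration of the workflow rather than production code.

# Rough sketch: a developer scales an application out and back in on EC2
# with a few API calls -- no hardware requisitions, no change tickets.
from boto.ec2.connection import EC2Connection

conn = EC2Connection()  # assumes AWS credentials are set in the environment

# Scale out: launch ten copies of the packaged virtual appliance.
reservation = conn.run_instances(
    "ami-12345678",        # placeholder image ID for the appliance
    min_count=10,
    max_count=10,
    instance_type="m1.small",
)
instance_ids = [i.id for i in reservation.instances]
print("running:", instance_ids)

# Retire: when demand wanes, give the capacity back instead of hoarding it.
conn.terminate_instances(instance_ids=instance_ids)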

The on-demand, self-service datacenter architecture of Amazon's EC2 is going to put huge pressure on the operations organization to respond with an internal “cloud” architecture – or lose the business of the developers who would rather “go to the cloud” than negotiate with ops. Here at rPath, we believe that the ops folks are going to need to provide the apps folks with a release (rBuilder) and lifecycle management system (rPath Lifecycle Management Platform) that enables the self-service capability and rapid promotion of EC2 while preserving compliance with operating policies that assure stability and security. And, if an application really takes off, you don't have to build a new datacenter to respond to the demand. Just scale out the workload onto Amazon, or another provider with a similar cloud architecture. IT operations now has a way to say “yes we can” instead of “no you can't.” Getting to “yes” from your IT ops provider by closing the gap between apps and ops is what the excitement of cloud is all about.

Single Minute Exchange of Applications - The Cure for Server Hoarding

From August 17, 2008

I recently had an interesting conversation with an IT executive who has built a self-service datacenter capability based upon virtualization. He described for me a system whereby business units can request “virtual server hosts” with a pre-set system environment (e.g., Linux and Java), and within an hour or two they receive an email notification informing them of the availability of the “virtual machines.” The goal of this system, as it was explained to me, is to “cure server hoarding” by the business units.

The theory is that if the business units are confident that they can get new capacity “on-demand,” then they will not request more systems than they really need. And since they are billed based upon the actual amount of capacity deployed, they have incentive to “give back” any systems that are not necessary to meet production demands. I asked how it was working:

IT Exec – Great. We have over 1500 virtual machines actively deployed in production in support of business unit demand.

Billy – Wow! That's terrific. What do the statistics look like for server returns?

IT Exec – What do you mean?

Billy – I mean how many systems have the business units returned to the pool of available systems because their demand was transitory?

IT Exec – No one has ever given back a single machine ever. They have the economic incentive to do so, but so far not one machine has ever been given back to the pool.

And therein lies the problem. The reason no one gives systems back is that the setup costs associated with getting them productive are simply too high. Even in this case, when the setup of the operating environment is accomplished within an hour or two of the request, the process of “fiddling around with the system” to get the application installed, configured, and stable is so expensive that no one ever gives a productive system back when demand falls. This situation leads to tons of waste in the form of over-deployed capital and over-consumption of resources such as power. I am reminded of the early days of the lean production revolution in the world of manufacturing.
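Before the manufacturing detour, the arithmetic behind that hoarding is worth sketching. The numbers below are purely my own illustrative assumptions, not measurements from this executive's shop, but they show why no one returns a machine when re-deployment is expensive.

# Illustrative arithmetic only -- all inputs are assumptions.
admin_day_cost = 800.0         # assumed fully loaded cost of one admin-day ($)
setup_days = 3                 # assumed "fiddling" time to re-deploy the app
redeploy_cost = admin_day_cost * setup_days

idle_chargeback_per_day = 5.0  # assumed daily chargeback for an idle VM ($)

# A business unit only wins by returning a server if it will sit idle
# longer than this break-even period.
break_even_days = redeploy_cost / idle_chargeback_per_day
print(f"Re-deploy cost: ${redeploy_cost:,.0f}")               # $2,400
print(f"Break-even idle period: {break_even_days:.0f} days")  # 480 days

With a break-even period of well over a year rather than a few weeks, the economically rational move is to hold on to the machine, which is exactly the behavior the executive described.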

In the late eighties, Toyota was whipping Detroit's fanny because they had implemented a system that the folks in Detroit did not think was possible. The folks at Toyota got much higher utilization out of their capital investment with much lower levels of waste and work in process because they had implemented a system that assured the expensive production equipment was always engaged in producing parts and vehicles that closely reflected true demand. A big part of this system was a capability known as the Single Minute Exchange of Dies, or the SMED system, which was pioneered by Toyota and evangelized by the legendary manufacturing engineer, Shigeo Shingo.

With SMED, expensive body stamping machines (or any machine for that matter) are kept productively engaged building the exact parts that are required to meet true demand by reducing the setup time for a “changeover” to less than 10 minutes. This is accomplished primarily by precisely defining the interface between the machine and the stamping dies such that the dies can be prepared for production “off-line.” While a machine is productively engaged building Part A, the dies for Part B are set up for production in a manner that does not require interfacing with the production machine. When it is time for a changeover from Part A to Part B, the machine stops, the Part A dies are quickly released and pulled from the machine, and the Part B dies are quickly engaged using a highly standardized interface. No fiddling around to get it right. The machine starts up again in less than 10 minutes and down the line rolls the perfect output for Part B.

Contrast this approach with the standard approach in Detroit in the late eighties. The economy of scale theory in Detroit was to set up the line for long runs of a single part type and build inventory because changing over the line was filled with setup costs. Fiddling around with the dies to get the parts to come off according to specification might take a day or even a week. So instead of building for true demand, Detroit over-deployed resources, both capital equipment and work in process, in an attempt to compensate for poor setup engineering. We all know how this story ends. The Toyota system is still the envy of the manufacturing world.

Now is the time for the technology world to take a lesson from Toyota. Virtualization will provide the standard interface for production, but it is almost worthless without “setup” technology that enables the applications to be defined independently of the production machine. The resources of the datacenter should reflect “true demand” for production output instead of idling away – suffering from a miserable case of server hoarding because setup is so expensive and error-prone. The time has come for SMEA – Single Minute Exchange of Applications.

At rPath, we are working towards SMEA every day. We have high hopes that the complementary trends of virtualization and cloud computing will highlight the possibility for an entirely new, and more efficient, approach for consumption of server production capacity. An approach where applications are readied for production without consuming machine cycles “fiddling around” to get the application stable. An approach where expensive machines running application A are given back for production of application B when true demand indicates that B needs the resources instead of A. The Department of Energy and CERN are already on board with this approach, but it will be interesting to observe who in the technology world emerges as “Toyota” and how long it takes the status quo of “Detroit” to wake up and smell the coffee.

VMware Accelerates Cloud with Free ESX

From July 29, 2008

The new CEO of VMware, Paul Maritz, seems to be committed to establishing VMware technology as the basis for emerging compute cloud offerings that enable shared, scalable infrastructure as a service via hypervisor virtualization. With Amazon EC2, the poster child for the successful compute cloud offering, being based upon the competing Xen technology from Citrix, Maritz is losing no time staking claim to other potential providers by meeting the Xen price requirement – zero, zilch, nada, zip. I love it. Low cost drives adoption, and free is as good as it gets when it comes to low cost and adoption.

As the economics of servers tilt more and more toward larger systems with multi-core CPUs, the hypervisor is going to become a requirement for getting value from the newer, larger systems. Developers simply do not write code that scales effectively across lots of CPUs on a single system. The coding trend is toward service oriented architectures that enable functions as small, atomic applications running on one or two CPUs, with multiple units deployed to achieve scalability. Couple the bigger server trend with the SOA trend with the virtualization trend with the cloud trend, and you have a pretty big set of table stakes that VMware does not want to miss. If a hypervisor is a requirement, why not use VMware's hypervisor if it is free?

The only challenge with free in the case of VMware is going to be lack of freedom. Xen currently offers both free price and freedom because of its open source heritage. If I run into a problem with VMware's ESX, my only recourse is to depend on the good will of VMware to fix problems. With Xen, I have the option of fixing my own problem if I am so inclined and capable. It will be interesting to watch the hypervisor choices people make as they build their cloud infrastructures, both internally and for commercial consumption, based upon the successful Amazon EC2 architecture.

The CIO is the Last to Know

From July 24, 2008

A recent Goldman Sachs survey of CIOs indicates that these executives do not plan to spend much money on cloud computing in the coming year. Indeed, most of their stated plans involve reducing the amount of consulting services and hardware that they are buying. I'm certain the predictions are accurate, and this scenario will lead to even more rapid growth in cloud computing. And the CIO will be the last to know.

How does this work? If Goldman has correctly measured the intentions of the CIOs, then they will not be spending money on cloud computing. Instead, the business units they are supposed to serve will be spending the money, because the service level of the IT department will not meet their needs. Recall the reduction in consultants and service personnel? When a fixed-income group at an investment banking house needs to stand up 50 servers to run a set of Monte Carlo simulations to test a hypothesis, the over-stressed IT department's response is going to be “we'll get to that request after we fill the 25 that are in line ahead of it. It will probably be next quarter.”

The “swoosh” sound you just heard is the developer of the simulation code swiping his credit card to set up his Amazon Web Services account. Three days later, he has 100 systems standing up on Amazon's Elastic Compute Cloud pumping back the information he needs to help his traders make money. The credit card bill is only about $5000 per month – much cheaper than the IT chargeback for similar capability. The head of fixed income hears about the profits due to the extra simulation capacity, and the developer gets a promotion and is encouraged to spin up another 100 to 200 machines to get even more aggressive with the strategy. Relative to the millions in profit, the cost is peanuts, and the IT department just can't respond to these types of requests anyway. The CIO is the last to know.
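The arithmetic behind that credit card bill is worth a quick check. The hourly rate below matches EC2 small-instance pricing of the era; the duty cycle is my own assumption, added only to reconcile the math with the roughly $5000 figure above.

# Illustrative only: the duty cycle is an assumption, not a reported number.
instances = 100
rate_per_hour = 0.10           # $/instance-hour, EC2 m1.small circa 2008
hours_per_month = 24 * 30

duty_cycle = 0.70              # assumed: simulations do not run around the clock
monthly_cost = instances * rate_per_hour * hours_per_month * duty_cycle
print(f"~${monthly_cost:,.0f} per month")  # roughly $5,000

flat_out = instances * rate_per_hour * hours_per_month
print(f"~${flat_out:,.0f} if run 24x7")    # about $7,200

Either way, the number is a rounding error next to the trading profits, and there is no capital requisition anywhere in the picture.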

It always happens this way with new technology. As the leader of North America sales for Red Hat in 2002, I remember calling on the CIO of a company in the financial services industry that processed millions of transactions daily in support of the equities market. I sat in his office while he explained to me that his operation was mission critical – the markets depend on this operation. He would never consider using Linux and open source. “Why don't we take a tour of the datacenter?” he asked. I was game, so I replied “Sure.”

As we walked the floor, I noticed several machine consoles indicating they were attached to Red Hat Linux 7.1 servers. Here is the conversation that ensued:

Billy: What's this?

CIO: Huh? I don't know. Steve, what's this all about?

Steve the Admin: Yeah, we're running Red Hat Linux for most of our network services.

CIO: What do you mean?

Steve the Admin: You know, Apache, BIND, SendMail, a few transaction servers and log crunchers mixed in here and there.

CIO: How many of these are we running in this datacenter?

Steve the Admin: About 25% of the machines, I would guess. About 800 servers in total.

Billy: Why don't we go back to your office and have another conversation about how much value you are getting out of Linux and open source and how Red Hat can help you.

The CIO is always the last to know about new technology. The head of engineering brought UNIX into the enterprise for CAD/CAM and analysis applications, and the CIO was the last to know. Department managers brought in PCs and Windows for personal productivity and desktop publishing, and the CIO was the last to know. System administrators brought in Linux for network services, and the CIO was the last to know. The sales force brought in salesforce.com and introduced the enterprise to SaaS, and the CIO was the last to know. Developers in the business units will use cloud computing, and the CIO will be the last to know.

The good news is that CIOs know where their bread is buttered, and eventually supporting the business units becomes the top priority. In this case, I would guess that all of that spending that Goldman noted as being earmarked for virtualization will pave the path for a hybrid approach to cloud computing. The enterprise IT function will begin to model the services that they provide after Amazon, with hypervisor virtualization as the basis of the compute capacity. Then, with a single, corporate architecture for cloud computing, applications will be able to scale seamlessly across the internal cloud infrastructure and also out into the external clouds when necessary for extra capacity. In this scenario, everyone gets what they want, and the CIO is a hero for reducing the fixed costs and operating budget associated with data center capacity. Being the last to know isn't necessarily a bad thing.

Shut Down the Datacenter

From July 7, 2008

Or at least power down significant pieces of it during periods of low demand. This message always draws funny looks from IT types when I suggest a seemingly simple answer to the problem of extreme costs for datacenter resources. I push on:

Billy – If utilization is around 20 – 30%, aren't there periods of time when you could just shut down about 50% of the systems? Or at least 25%?

IT – We can't just shut the systems down. . .

Billy – Why not? You aren't using them.

IT – You don't understand.

Billy – What am I missing?

IT – Well, it just doesn't work that way.

Billy – How does it work?

IT – It takes a long time to lay the application down atop a production server.

Billy – Why?

IT – Setup is complicated. Laying down the application and bringing it online can take anywhere from several days to several weeks, typically 2 to 4.

Billy – So part of the application definition is described by the physical system it runs on?

IT – Yes, that's right. If I shut down the physical system, I lose part of the definition and configuration of the application.

And therein lies the culprit. The “last mile” of application release engineering and deployment is a black art. Applications become tightly coupled to the physical hosts upon which they are deployed, and the physical hosts cannot be powered down without losing the definition of a stable application. Bringing the application back up is expensive due to the high costs of expert administration resources, and it is fraught with peril because the process is not repeatable. Enterprises are spending billions of dollars on datacenter operating costs because the risk of bringing applications back on-line is not worth the savings of taking them off-line.

Of course I blame most of this mess on the faulty architecture of the One Size Fits All General Purpose Operating System (OSFAGPOS). OSFAGPOS is typically deployed in unison with the physical hosts because OSFAGPOS provides the drivers that enable the applications to access the hardware resources. To get an application to run correctly on OSFAGPOS, the system administrators then need to “fiddle with it” to adjust it to the needs of any given application. This “fiddling” is where things run amok. It's hard to document “fiddling,” and it is therefore difficult to repeat “fiddling.” The “fiddle” period can last for up to 30 days, depending on the complexity of the “fiddling” required.

So how do we get away from all of this “fiddling” around, and deploy an architecture that allows the datacenter to scale up and down based on actual demand? Start with a bare-metal hypervisor as the layer that provides access to the hardware. Then extend release engineering discipline to include the OS by releasing applications as virtual machines with Just Enough OS (JeOS or “juice”) in lieu of OSFAGPOS, complete with all of the “metadata” required to access the appropriate resources (memory, CPU, data, network, authentication services, etc.). By decoupling the definition of the application from the physical hosts, a world of flexibility becomes possible for datacenter resources. Starting up applications becomes fast, cheap, and reliable. As an added bonus, embracing cloud capacity such as that provided by Amazon's EC2 becomes a reality. Instead of standing up application capacity in-house, certain peak demand workloads can be deployed “on-demand” with a variable cost model (in the case of Amazon it starts at about $0.10 per instance-hour).
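What that released artifact might carry with it is easy to imagine. The descriptor below is a hypothetical sketch – not rPath's actual manifest format or anyone else's – but it shows the idea: the application, its just-enough OS, and the resource metadata travel together in the release instead of living in an administrator's head on a particular physical host.

# Hypothetical appliance descriptor -- not any vendor's real manifest format.
# Everything needed to stand the application up again (or power it down to
# save energy) is captured in the release, not on a physical host.
appliance = {
    "name": "order-processing",
    "version": "2.4.1",
    "jeos": {                          # just enough OS, versioned with the app
        "kernel": "2.6.x",
        "packages": ["glibc", "openssl", "python", "httpd"],
    },
    "resources": {                     # the "metadata" described above
        "vcpus": 2,
        "memory_mb": 2048,
        "network": {"vlan": "prod-42", "ports": [443]},
        "storage": {"data_volume_gb": 50},
        "auth": {"directory": "ldap://ldap.example.internal"},  # placeholder
    },
    "targets": ["esx", "xen", "ec2"],  # any hypervisor, in-house or in the cloud
}

def safe_to_power_down(desc: dict) -> bool:
    # Because the definition lives in the descriptor, shutting off the physical
    # host loses nothing -- the appliance can be re-instantiated on demand.
    return bool(desc.get("jeos")) and bool(desc.get("resources"))

print(safe_to_power_down(appliance))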

With oil trading at around $140 per barrel, the cost of allowing datacenter resources to “idle” during slow demand periods is becoming a real burden. “Fiddling around” with applications to get them deployed on OSFAGPOS is no longer just good clean fun for system administrators. It is serious money.

Cloud Computing Casts Shadow on Walled Gardens

From April 17, 2008

As a technology provider that helps application companies embrace cloud computing by virtualizing the applications to run on any cloud, I was a bit disappointed with Google's appengine announcement. It appears that Google is embracing the “walled garden” approach of salesforce.com and Microsoft instead of the cloud approach of Amazon. I believe that walled gardens will ultimately be overshadowed by clouds because you cannot achieve webscale computing if every application has to run on a server owned by Google.

Historically, Google has been very good about providing APIs that enable applications to access its web services independent of the computer on which they run. This is an important concept because it is often the case that an application needs to run on a particular network or network segment in order to preserve some critical aspect of performance or security. It is also important because it provides developers with the broadest choice of system and programming tools when developing or maintaining their applications. If you must program the application in the Python implementation specified by Google and run it on a Google server in order to take advantage of services like BigTable and Sawzall, a huge segment of the application market has just been eliminated from consideration (note that it is unclear to me at this time if BigTable and Sawzall can be accessed independent of appengine).

Why not simply expose a virtual machine API (such as Amazon Machine Image) along with the API for the web services (such as Amazon's S3, SQS, etc.)? Application instances that require minimal latency to Google services are provisioned as virtualized appliances on a Google server. For applications that need to run on a different network, you can provision the same system definition to that network while accessing the web services over the Internet. Write the program in any language you choose. With any set of system components that you choose.
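Here is a hedged sketch of what that separation looks like from the application's side: the code binds to a storage service by a URL taken from configuration, so the identical virtual appliance can run on the provider's own servers when latency matters, or on any other network that can reach the service over the Internet. The endpoint is a placeholder, not a real Google or Amazon URL.

# Illustrative sketch: the application binds to services by endpoint, not by
# the machine it happens to run on. The endpoint value is a placeholder.
import os
import urllib.request

# Chosen at deploy time, not at build time: same code, different network.
STORAGE_ENDPOINT = os.environ.get(
    "STORAGE_ENDPOINT",
    "https://storage.example-cloud.test/buckets/app-data",  # placeholder URL
)

def fetch_app_data() -> bytes:
    # Read application data from whichever storage service the deployment
    # points at -- provider-hosted when co-located, over the Internet otherwise.
    with urllib.request.urlopen(STORAGE_ENDPOINT) as response:
        return response.read()

print(f"this deployment reads from {STORAGE_ENDPOINT}")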

The problem with walled gardens is that they ultimately restrict the growth of the market. While it is true that an attractive and well-manicured walled garden will result in asymmetrically large economic rent for the owner of the garden (witness Microsoft), the size of the market is nonetheless constrained. It seems to me that Google would reap the greatest benefit from maximizing the market for cloud applications quickly – independent of their ability to collect an asymmetrically large portion of the rent from that market. Even their marketing of the current implementation of appengine indicates this hypothesis is correct – it is free. Success with cloud computing will no doubt lead to a decline in the value of the Microsoft system software franchise (the ultimate walled garden). Why not accelerate that decline with broad market capability instead of yet another walled garden (YAWG)?

Let me provide a concrete example. rPath was approached by a SaaS application provider to help them release their on-demand application as an on-premise application – without sacrificing management control of the system software. They want on-premise capability in order to meet the data security requirements of a certain segment of the market which they have been unable to penetrate with their SaaS offering. Their current application runs on Microsoft server technology, but it is written in Java so skipping out of the Microsoft walled garden was pretty trivial. We provided them with a virtualized implementation of their application, and we demonstrated how it could run on a local network atop a hypervisor, or as a variable cost implementation on Amazon's elastic compute cloud (EC2). Their reaction was so positive that they are now planning to gradually migrate their entire infrastructure from Microsoft to virtual infrastructure in order to seamlessly deliver the application via SaaS, variable cost cloud (Amazon), and local network (virtual appliance). Without changing their preference for programming language. Without sacrificing control of the system software layer.

To be fair to Google, appengine is a beta service. I have no doubt that they made compromises in architecture in order to get the service out the door more quickly. I hope they follow Amazon's lead and expose all of their great services as true web services while enabling any application to run close to those services via a simple virtualization spec such as Amazon's AMI. The faster we take the market to cloud computing, the sooner we can kill off the walled gardens through webscale shadows that deprive them of economic sunlight.

A Big Switch or a Gradual Shift

From March 11, 2008

I just finished reading Nicholas Carr's new book, The Big Switch. I enjoyed the read, but I found the conclusions just a bit sensational. Not surprising, as all such books seek to be titillating and a bit controversial in order to hold our attention from cover to cover. The basic premise of the book is that there will be a “big switch” from internal application development, deployment, and management to external procurement of application services. The losers will be the skilled developers and IT staff that currently toil away inside the development centers and datacenters of corporations, and the winners will be the application providers such as Google and salesforce.com that provide applications on demand. I do not believe the "big switch" will be so black and white, but I do believe a gradual shift is underway.

The historical metaphor that Carr effectively uses to demonstrate the likelihood of this pending change is the switch from locally produced electrical power to regionally produced electrical power delivered via a high performing electrical grid infrastructure. In Carr's metaphor electricity is analogous to applications and the electrical grid is analogous to the Internet. There are clearly some parallels, but I believe the metaphor is flawed because information applications are more analogous to hair dryers, drill presses, and die stamping machines (i.e. applications that consume electricity) as opposed to the electricity itself.

Here is a simple example. Both a paper mill and a steel mill have a need for high-voltage electricity, but the paper mill applies that electricity to an application that involves digesting wood chips into a slurry suitable for making paper while the steel mill applies that electricity to the transformation of iron ore into various steel products. The paper mill has no use for an application that transforms iron ore, and the steel mill has no use for an application that digests wood chips. Their application requirements are very different, but they do use very similar electrical inputs.

It is true enough that all businesses have a need for certain applications that are somewhat universal. Salesforce.com has certainly demonstrated that a single implementation of a customer relationship management and sales force automation application can be applied across a variety of businesses and delivered effectively via the Internet. Perhaps Google will indeed accomplish the same result for basic professional productivity applications such as word processing and spreadsheet analysis. But what is the fate of proprietary applications? Is salesforce.com going to deliver chip design and analysis simulations to Intel? I doubt it. Is Google going to deliver portfolio and risk analysis applications to Goldman Sachs? Unlikely. If these applications are not candidates for the “big switch,” how might their delivery still be improved according to Carr's theory?

Carr identifies two key technology developments in the “big switch” from local to regionally produced power – alternating current (AC) and reliable transformers. Alternating current makes it possible to distribute high potential voltage over large distances while transformers reliably “step down” this voltage to levels where it can be safely and reliably consumed by a variety of applications (hair dryers, drill presses, etc.). Clearly fiber optics and broadband switching are the IT equivalent of alternating current, enabling efficient delivery over long distances. I believe that hypervisors coupled with virtual appliances are analogous to the transformer technology of the power system. When applications can reliably plug into a grid to receive “power” in a standardized and repeatable manner, it will be increasingly popular to let someone else deliver the power of the grid while the individual companies focus on the “design of the application” (i.e. the drill press, the chip digester, the ore smelter).

Currently, applications used by Goldman Sachs to perform portfolio and risk analysis are not easily portable to a bank of computers that Intel uses for chip design and simulation. The only way Goldman could reliably “move” these applications to another “power” provider would be to literally unbolt the racks of machines from their datacenter, truck them to another datacenter, rebolt them to the floor, and re-attach them to power and network. The definition of the applications is hard-coupled to the machines that run the applications because there was never any thought of running them on a different “grid.” It takes effort to design applications to be totally independent of the computers they run upon.

If, however, Goldman Sachs were to “transform” these applications into a coordinated group of virtual appliances, then they could literally “plug” the applications into any set of computers that exposed a standard hypervisor. As standards emerge for reliable “transformation” of applications to virtual appliances, opportunities will emerge for utility providers of variable cost datacenter capacity (aka cloud computers such as Amazon's EC2) to supply the “power” to these applications. I do not believe it will occur as a “big switch,” but I am convinced that we are witnessing the beginning of a gradual shift in the division of labor for application delivery. Companies will increasingly focus their scarce resources on the definition of the application, and the machines that provide “power” to the application will increasingly be purchased as variable cost computing cycles. But I have to agree with Carr that The Big Switch is a much better title for a book than The Gradual Shift.