Bill had a great post on VMworld 2010 earlier this week and mentioned a very useful session on hosting Java applications within a virtualized environment. Justin Murray's whitepaper "Java in Virtual Machines on VMware ESX: Best Practices" was released a while ago, but its suggestions are fundamental enough that they still apply today. The steps are pragmatic and apply to nearly any virtualized environment, and they matter more as the density of your cloud computing infrastructure increases.
One best practice that Justin cites is a perfect example of how infrastructure architecture changes in a cloud hosting environment: run one (and only one) JVM inside of a virtual machine. When thinking about density in a physical server environment this makes data center architects sweat - one JVM per server can easily lead to under-utilization. For cloud computing infrastructure this makes total sense... you can size your VM to wrap around your JVM precisely, so that you won't have any wasted resources. Say your JVM is set to have a maximum heap size of 768M and your fairly minimal JeOS only consumes 128M of memory - create a virtual machine with 1GB of RAM and you've got a server that is sized for its purpose, with the remaining 128M left as headroom for the memory the JVM uses outside of the heap. Need to expand your heap size? No problem... just expand your VM accordingly! Now that you don't have to worry about pulling pizza boxes out of a rack to upgrade them, adding and even subtracting server memory is easy.
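The sizing arithmetic above is simple enough to sketch in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and rounding up to a 256 MB step is an assumption made for the example, not something the whitepaper prescribes.

```java
// Hypothetical helper; names and the rounding granularity are illustrative assumptions.
class VmSizing {
    /** Rounds (max heap + guest OS footprint) up to the next multiple of granularityMb. */
    static long recommendedVmMegabytes(long maxHeapMb, long guestOsMb, long granularityMb) {
        long needed = maxHeapMb + guestOsMb;
        return ((needed + granularityMb - 1) / granularityMb) * granularityMb;
    }

    public static void main(String[] args) {
        // Figures from the example above: a 768 MB heap (-Xmx768m) plus a 128 MB JeOS.
        long vmMb = recommendedVmMegabytes(768, 128, 256);
        System.out.println("Suggested VM memory: " + vmMb + " MB");

        // The heap ceiling the running JVM actually sees, for comparison.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("This JVM's max heap: " + maxHeapMb + " MB");
    }
}
```

With the numbers from the example, 768 + 128 = 896 MB rounds up to 1024 MB, which is the 1GB virtual machine described above.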
Hypervisors think about memory much differently than an operating system on a physical server does, and often use transparent page sharing to make more efficient use of a host server's memory. For example, if you have 4096 kilobytes of nothing but 0's it doesn't make much sense to reserve all this physical memory for nothing more than a span of empty space. It would be more efficient to say "I have a 0 bit stored 33.5 million times in a row." The same goes for patterns in memory - say the eight bit pattern 10101010. We could instead assert "the pattern 10 is repeated four times." In this analogy the pattern 10 is a "page" that is stored once and shared four times. This works especially well within a consolidated cloud computing infrastructure; every VM running the same OS kernel will likely have similar memory patterns. This is one of the advantages of virtualization: you achieve superior resource allocation by sharing the page containing the same Ubuntu kernel rather than repeating its place in memory over and over again.
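To make the idea concrete, here is a toy Java sketch that counts how many distinct pages are needed to back a block of memory. It is a simplification under stated assumptions: the 4 KB page size and all names are chosen for the example, and it only counts identical pages. It is not how any particular hypervisor implements page sharing, which also has to handle what happens when a shared page is later written to.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Toy model of content-based page sharing; names and page size are assumptions.
class PageSharingSketch {
    static final int PAGE_SIZE = 4096; // bytes; assumed small-page size for this example

    /** Returns how many distinct pages are needed to back the given memory image. */
    static int countUniquePages(byte[] memory) {
        Set<ByteBuffer> uniquePages = new HashSet<>();
        for (int offset = 0; offset < memory.length; offset += PAGE_SIZE) {
            int length = Math.min(PAGE_SIZE, memory.length - offset);
            // ByteBuffer equality and hashing are content-based, so identical pages
            // collapse into a single set entry.
            uniquePages.add(ByteBuffer.wrap(memory, offset, length).slice());
        }
        return uniquePages.size();
    }

    public static void main(String[] args) {
        // 4096 KB of zeros, as in the example above: 1024 pages of 4 KB each.
        byte[] memory = new byte[PAGE_SIZE * 1024];
        System.out.println("Pages in image: " + memory.length / PAGE_SIZE);
        System.out.println("Unique pages:   " + countUniquePages(memory));

        // Writing one byte makes that page differ from the rest.
        memory[PAGE_SIZE * 10 + 5] = 1;
        System.out.println("Unique pages after one write: " + countUniquePages(memory));
    }
}
```

The all-zero image needs only one backing page; a single write to one page raises that to two, which hints at why memory that changes rapidly is harder to share.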
One hallmark of the Java virtual machine has been garbage collection. The practice and tuning of garbage collection is best detailed elsewhere, but it can be summarized as a series of batched operations that clean up and defragment working heap memory. On physical servers this usually isn't a huge issue - memory is fast and usually idle. Since we have a hypervisor that is sharing memory pages, however, this does become an issue when memory rapidly changes in large swaths. The solution is to use larger memory pages to avoid continuous swapping of shared pages. You can ask the JVM to use larger memory pages by adding the argument -XX:+UseLargePages to your Java runtime (or -Xlp in the IBM Java runtime), but you also need to enable large page support in your operating system of choice. See the VMware Performance Study "Large Page Performance" for details on how to tell your guest operating system that you wish to use large page sizes.
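A quick way to confirm that a running JVM was actually started with one of these switches is to inspect its startup arguments. The sketch below uses the standard RuntimeMXBean for that; the class and method names are hypothetical, and matching any argument that starts with -Xlp is an assumption made to cover variants of the IBM switch. Note that it only reports what was requested on the command line, not whether the guest operating system granted large pages.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical check; it reports only what was requested at JVM startup.
class LargePageCheck {
    /** True if the arguments contain the HotSpot or IBM large-page switch named above. */
    static boolean requestsLargePages(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+UseLargePages") || arg.startsWith("-Xlp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main), e.g. -Xmx768m.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Large pages requested: " + requestsLargePages(jvmArgs));
    }
}
```

Running it as `java -XX:+UseLargePages LargePageCheck` should report that large pages were requested, assuming a HotSpot-based runtime.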
Another optimization highlight, and one that seems contrary at first, is that fewer CPUs often means greater performance during high load times. It is important to emphasize that this applies during high CPU load, because the issue centers around context switching - the effective hand-off between parallel threads. Java itself does context switching very efficiently, especially after Java 6 Update 10 (or so I've noticed at least). Yet when a storm of hardware interrupts and thread thrashing comes around due to high load, sometimes just the effort of doling out workloads to processors can cause things to slow down. Ramesh Peddakotla from HP had some great empirical data to share, benchmarking workloads against ever increasing CPU counts. Performance gains topped off at 4 CPUs; 2 CPUs seemed like the sweet spot. At a certain point the cost of orchestrating multiple CPUs outweighs the throughput of parallel threads in the midst of high CPU load.
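On the application side, the JVM reports how many virtual CPUs the guest exposes, and worker pools are often sized from that number. The sketch below is only an illustration of keeping a pool in step with a small vCPU count: the names are hypothetical, and the cap of 4 simply borrows the point where gains topped off in the benchmark mentioned above. It is not a recommendation from the session itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical pool sizing; the cap is an assumption taken from the anecdote above.
class WorkerPoolSizing {
    /** Uses the visible CPU count, but never fewer than 1 or more than cap workers. */
    static int workerCount(int visibleCpus, int cap) {
        return Math.max(1, Math.min(visibleCpus, cap));
    }

    public static void main(String[] args) {
        // Number of (virtual) CPUs the guest OS exposes to this JVM.
        int visibleCpus = Runtime.getRuntime().availableProcessors();
        int workers = workerCount(visibleCpus, 4);
        System.out.println("Visible CPUs: " + visibleCpus + ", workers: " + workers);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> System.out.println("Running on " + Thread.currentThread().getName()));
        }
        pool.shutdown(); // lets the submitted tasks finish, then releases the threads
    }
}
```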
Linux machines have yet another tweak they can enjoy to reduce interrupts: lower the clock interrupt rate. VMware worked with kernel developers at SuSE Linux to reduce RTC polling intervals and the modification has since been adopted by Linux distributions at large. The most recent Linux distributions no longer require customization or kernel flags to set the appropriate clock behavior.
Ultimately Java running on a VMware cloud can be as fast as bare hardware. Thanks to the efficiencies you may gain from isolating one Java runtime environment within a single operating system, you may even find your Java apps running faster.