Forecast Cloudy – Profiling Azure Cloud Service

We can get an in-depth analysis of the computational aspects of how an Azure application runs by using the Visual Studio Profiler. Below I will show you how to use the Sampling collection method in the Visual Studio Profiler to profile a Cloud Service in Azure. If you need basic information on the Visual Studio Profiler, start with its introductory documentation.
Sampling is a statistical profiling method that shows you the functions that are doing most of the user-mode work in the application. Sampling is a good place to start looking for areas to speed up your application.

At specified intervals, the Sampling method collects information about the functions that are executing in your application. After you finish a profiling run, the Summary view of the profiling data shows the most active function call tree, called the Hot Path, where most of the work in the application was performed. The view also lists the functions that were performing the most individual work, and provides a timeline graph you can use to focus on specific segments of the sampling session.
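To make the idea of statistical sampling concrete, here is a minimal sketch written in Java purely for illustration. It is not how the Visual Studio Profiler is implemented, and the class name SamplingSketch and its methods are hypothetical. It only shows the principle: look at a thread's stack at intervals and count which function is on top.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of sampling: count which method is on top of a
// thread's stack each time we look. Not the Visual Studio Profiler's code.
final class SamplingSketch {
    private final Map<String, Integer> hits = new HashMap<>();

    // Record one sample: the top frame is the function doing work right now.
    void record(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return; // thread not started or already finished
        }
        String top = stack[0].getClassName() + "." + stack[0].getMethodName();
        hits.merge(top, 1, Integer::sum);
    }

    // Take `samples` snapshots of `target`, pausing `intervalMs` between them.
    void sample(Thread target, int samples, long intervalMs) throws InterruptedException {
        for (int i = 0; i < samples; i++) {
            record(target.getStackTrace());
            Thread.sleep(intervalMs);
        }
    }

    // The function seen most often is the best first candidate for tuning.
    String hottest() {
        return hits.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("<no samples>");
    }

    int count(String method) {
        return hits.getOrDefault(method, 0);
    }
}
```

A real profiler also aggregates whole call stacks to build the Hot Path; this sketch keeps only the top frame to stay short.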

Sampling is the most common method of application profiling. There are other methods as well, but they may come with more performance overhead or require more application instrumentation.
Make sure you enable the appropriate settings when publishing your application to Azure:

  • In Solution Explorer, open the shortcut menu for your Azure project, and then choose Publish.
  • On the Advanced Settings tab, select the Enable profiling check box.
  • Choose Settings, select Sampling and Enable Tier Interaction Profiling, and then click OK.
  • Click Next and publish the application.


Once the profiling session has run, you can view the profiling reports. To get them, do the following:

  • In Visual Studio Server Explorer, expand Azure -> Cloud Services -> Your_Cloud_Role -> Production (Profiling), right-click the instance, and click View Profiling Report.
  • It may take a couple of minutes for the profiling report to show up. Click the Save icon to save a local copy for analysis.
  • Once you are done with profiling, you may republish the application without that option checked.

Happy performance hunting.


Purgamentum Init Venari – Analyzing Java GC Using IBM Pattern Modeling Tool

Recently, while again looking at some GC issues on the IBM WebSphere platform for a buddy of mine, I got to learn a new tool: Pattern Modeling and Analysis Tool for IBM Java Garbage Collector (PMAT). This is the second post-mortem analysis tool from IBM that I have had the privilege to work with, as I previously profiled JCA, the Javacore Dump Analysis Tool.

The PMAT tool parses verbose GC trace, analyzes Java heap usage, and recommends key configurations based on pattern modeling of Java heap usage.  

Why do we need it?

When the JVM (Java virtual machine) cannot allocate an object from the current heap because of lack of space, a memory allocation fault occurs, and the Garbage Collector is invoked. The first task of the Garbage Collector is to collect all the garbage that is in the heap. This process starts when any thread calls the Garbage Collector either indirectly as a result of allocation failure or directly by a specific call to System.gc(). The first step is to get all the locks needed by the garbage collection process. This step ensures that other threads are not suspended while they are holding critical locks. All other threads are then suspended. Garbage collection can then begin. It occurs in three phases: Mark, Sweep, and Compaction (optional).

Sometimes you run into issues. The most common are either application performance problems caused by especially long-running, aka “violent”, GCs, or rooted objects on the heap not getting cleaned up until the application crashes with the dreaded OutOfMemory error. Either one means you have to analyze garbage collection with verbose GC turned on.

Verbose GC is a command-line option that one can supply to the JVM at start-up time. The format is: -verbose:gc or -verbosegc. This option switches on a substantial trace of every garbage collection cycle. The format of the generated information is not standardized and therefore varies across platforms and releases.

This trace should allow one to see the gross heap usage in every garbage collection cycle. For example, one could monitor the output to see the changes in the free heap space and the total heap space. This information can be used to determine whether garbage collections are taking too long to run; whether too many garbage collections are occurring; and whether the JVM crashed during garbage collection.
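If restarting the JVM with the verbose GC option is not possible right away, the standard java.lang.management API exposes coarse cumulative figures from inside the process. The sketch below is only a rough complement to a verbose GC trace, not a replacement for it; the class name GcSnapshot and its methods are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints coarse GC and heap figures from inside the JVM.
// It complements, but does not replace, a -verbose:gc trace.
final class GcSnapshot {
    // Free heap as a percentage of the currently committed heap.
    static double freeHeapPercent() {
        Runtime rt = Runtime.getRuntime();
        return 100.0 * rt.freeMemory() / rt.totalMemory();
    }

    // Total collections reported by all collectors; a collector may report -1
    // when the count is undefined, so negative values are treated as zero.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("free heap: %.1f%%%n", freeHeapPercent());
    }
}
```

Unlike the verbose GC trace, these counters do not show individual cycles, so they cannot tell you which collection was the slow one.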

How does it work?

PMAT analyzes verbose GC traces by parsing the traces and building pattern models. PMAT recommends key configurations by executing a diagnosis engine and pattern modeling algorithm. If there are any errors related to Java heap exhaustion or fragmentation in the verbose GC trace, PMAT can diagnose the root cause of failures. PMAT provides rich chart features that graphically display Java heap usage.

The following features are included:

Where do I get it?

You can download this tool from –

Running the tool

      Run gaNNN.jar with the Java Run-time Environment. (NNN is the version number).

      You will see the following initial screen.

Select and open the verbosegc log


Process Log and View Summary\Reports


More detailed information on the tool can be found in presentation here – 

Jinwoo Hwang, a technical leader at IBM WebSphere Application Server Technical Support, created this tool as well as JCA. I recommend that everyone read his articles on JVM internals.

Hope this short note helps

Carpe Noctem – Using SQL Server Diagnostics (Preview) to analyze SQL Server minidump

Microsoft just released the SQL Server Diagnostics (Preview) extension within SQL Server Management Studio and Developer APIs to empower SQL Server customers to achieve more through a variety of offerings to self-resolve SQL Server issues.


So what can it do:

  • Analyze SQL Server dump.

Customers should be able to debug and self-resolve memory dump issues from their SQL Server instances and receive recommended Knowledge Base (KB) article(s) from Microsoft, which may be applicable for the fix.

  • Review recommendations to keep SQL Server instances up to date.


Customers will be able to keep their SQL Server instances up-to-date by easily reviewing the recommendations for their SQL Server instances. Customers can filter by product version or by feature area (e.g. Always On, Backup/Restore, Column Store, etc.) and view the latest Cumulative Updates (CU) and the underlying hotfixes addressed in the CU.

  • Developers who want to discover and learn about Microsoft APIs can view developer portal and then use APIs in their custom applications. Developers can log and discuss issues and even submit their applications to the application gallery.


So I installed this extension and decided to give it a go with one of the SQL Server minidumps I have.


It correctly identified the issue and offered suggestions. So, while this may not work on every dump, it should be worth trying before you start up WinDbg.

Give it a try. Hope this helps.

Let Me Count The Ways – Various methods of generating stack dump for JVM in production

As I profiled previously, thread dumps in Java are essential in diagnosing production issues with high CPU, locking, threading deadlocks, etc. There are great online thread dump analysis tools that can analyze them and spot problems, but you need to provide those tools with proper thread dumps as input. I already blogged about many tools to do so in the past, like jstack, JVisualVM and Java Mission Control. Here I will try to summarize all of the ways to capture usable thread dumps in a production Java application:

  • JStack

JStack remains one of the most common ways to capture thread dumps. It’s a command-line utility bundled with the JDK and shipped in the JDK_HOME\bin folder. Here is the command that you need to issue to capture a thread dump:

jstack -l <pid> > <file-path>

pid: is the Process Id of the application whose thread dump should be captured

file-path: is the file path where the thread dump will be written.

Example here:

jstack -l 37321 > /opt/tmp/threadDump.txt

As per the example, the thread dump of the process would be generated in the /opt/tmp/threadDump.txt file.

  • kill -3

At many customers only JREs are installed on production machines. Since jstack and other tools are only part of the JDK, you wouldn’t be able to use jstack. In such circumstances, the ‘kill -3’ option can be used.

kill -3 <pid>

pid: is the Process Id of the application whose thread dump should be captured

Example here:

kill -3 37321

When the ‘kill -3’ option is used, the thread dump is sent to the standard error stream. For example, in apps running under Tomcat it will go to the <TOMCAT_HOME>/logs/catalina.out file.

  • VisualVM

Java VisualVM is a graphical user interface tool that provides detailed information about applications while they are running on a specified Java Virtual Machine (JVM). It’s located in JDK_HOME\bin\jvisualvm.exe and has been part of the Sun\Oracle JDK distribution since JDK 6 update 7.

Launch jvisualvm. On the left panel, you will notice all the Java applications that are running on your machine. Select your application from the list. This tool can also capture thread dumps from Java processes that are running on a remote host.

In order to generate a thread dump, go to the Threads tab and click the Thread Dump button.

  • Java Mission Control

Java Mission Control (JMC) is a tool that collects and analyzes data from Java applications running locally or deployed in production environments. It has been packaged with the JDK since Oracle JDK 7 Update 40 and also provides an option to take thread dumps from the JVM. The JMC tool is located in JDK_HOME\bin\jmc.exe. Once you launch the tool, you will see all the Java processes that are running on your local host. When you use the Flight Recorder feature on one of these processes, you can select the interval at which you want to capture thread dumps in the “Thread Dump” field.

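  • ThreadMXBean

Introduced in JDK 1.5, ThreadMXBean is a management interface for the thread system in the JVM and allows you to create a thread dump in a few lines of application code, like below:

// Requires java.lang.management.ManagementFactory, ThreadInfo and ThreadMXBean
public void dumpThreadDump() {
    ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
    // true, true: include locked monitors and locked ownable synchronizers
    for (ThreadInfo ti : threadMxBean.dumpAllThreads(true, true)) {
        System.out.print(ti.toString());
    }
}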

  • JCMD

The jcmd tool was introduced with Oracle’s Java 7. It’s useful in troubleshooting issues with JVM applications. It has various capabilities such as identifying Java process Ids, acquiring heap dumps, acquiring thread dumps, and acquiring garbage collection statistics.

Using the below jcmd command you can generate a thread dump:

jcmd <pid> Thread.print > <file-path>

pid: is the Process Id of the application whose thread dump should be captured

file-path: is the file path where the thread dump will be written.


jcmd 37321 Thread.print > /opt/tmp/threadDump.txt


Taking Care of the Garbage – Generational is default GC policy on newer WebSphere AS

Imagine you have a legacy Java application running in IBM WebSphere that you have finally upgraded to a newer version. Yet the customer is reporting a serious performance regression. Why would that be? Well, one reason may be a change in default JVM behavior between WebSphere versions, something that one of my customers discovered the “hard way”.

Garbage collection (GC) is an integral part of the Java Virtual Machine (JVM) as it collects unused Java heap memory so that the application can continue allocating new objects. The effectiveness and performance of the GC play an important role in application performance and determinism. The IBM JVM provided with IBM WebSphere Application Server provides four different GC policy algorithms:

  • -Xgcpolicy:optthruput
  • -Xgcpolicy:optavgpause
  • -Xgcpolicy:gencon
  • -Xgcpolicy:balanced

Each of these algorithms provides different performance and deterministic qualities. In addition, the default policy in WebSphere Application Server V8 has changed from -Xgcpolicy:optthruput to -Xgcpolicy:gencon. So let’s dive a bit into what this really means.
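When comparing behavior before and after an upgrade, it helps to know whether a policy was set explicitly or whether the JVM default is in effect. The sketch below checks the JVM input arguments through the standard RuntimeMXBean. The class GcPolicyCheck is hypothetical, and treating the last occurrence of the option as the winning one is my assumption, not something confirmed here for the IBM JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: reports an explicitly configured -Xgcpolicy, if any.
final class GcPolicyCheck {
    private static final String PREFIX = "-Xgcpolicy:";

    // Returns the last -Xgcpolicy value in the argument list, if one is present.
    // Assumption: when the option is repeated, the last occurrence wins.
    static Optional<String> findGcPolicy(List<String> jvmArgs) {
        Optional<String> policy = Optional.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(PREFIX)) {
                policy = Optional.of(arg.substring(PREFIX.length()));
            }
        }
        return policy;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(findGcPolicy(jvmArgs)
                .map(p -> "GC policy set explicitly: " + p)
                .orElse("No -Xgcpolicy argument; the JVM default applies"));
    }
}
```

If no argument is found, the effective policy is whatever the installed JVM version defaults to, which is exactly the detail that changed between WebSphere versions.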

The garbage collector

Different applications naturally have different memory usage patterns. A computationally intensive number crunching workload will not use the Java heap in the same way as a highly transactional customer-facing interface. To optimally handle these different sorts of workloads, different garbage collection strategies are required. The IBM JVM supports several garbage collection policies to enable you to choose the strategy that best fits your application.

The parallel mark-sweep-compact collector: optthruput, formerly default

The simplest possible garbage collection technique is to continue allocating until free memory has been exhausted, then stop the application and process the entire heap. While this results in a very efficient garbage collector, it means that the user program must be able to tolerate the pauses introduced by the collector. Workloads that are only concerned about overall throughput might benefit from this strategy.

The optthruput policy (-Xgcpolicy:optthruput) implements this strategy. This collector uses a parallel mark-sweep algorithm. In a nutshell, this means that the collector first walks through the set of reachable objects, marking them as live data. A second pass then sweeps away the unmarked objects, leaving behind free memory that can be used for new allocations. The majority of this work can be done in parallel, so the collector uses additional threads (up to the number of CPUs by default) to get the job done faster, reducing the time the application remains paused.
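The two passes can be pictured with a toy example over a hand-built object graph. This is only a single-threaded sketch of the mark and sweep idea, not the IBM JVM's parallel collector, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy mark-sweep over a hand-built object graph; names are hypothetical and
// this is single-threaded, unlike the parallel collector described above.
final class MarkSweepSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots is live.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        List<Node> pending = new ArrayList<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.remove(pending.size() - 1);
            if (live.add(n)) {          // first visit only, so cycles terminate
                pending.addAll(n.refs);
            }
        }
        return live;
    }

    // Sweep phase: drop every heap object that was not marked.
    static List<Node> sweep(List<Node> heap, Set<Node> live) {
        List<Node> survivors = new ArrayList<>();
        for (Node n : heap) {
            if (live.contains(n)) {
                survivors.add(n);
            }
        }
        return survivors;
    }
}
```

Note that two objects referencing each other are still collected if nothing reachable from a root points at them, because marking starts from the roots only.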


The problem with a mark-sweep algorithm is that it can lead to fragmentation. There might be lots of free memory, but if it is in small slices interspersed with live objects then no individual piece might be large enough to satisfy a particular allocation.

The solution to this is compaction. In theory, the compactor slides all the live objects together to one end of the heap, leaving a single contiguous block of free space. This is an expensive operation because every live object might be moved, and every pointer to a moved object must be updated to the new location. As a result, compaction is generally only done when it appears to be necessary. Compaction can also be done in parallel, but it results in a less efficient packing of the live objects — instead of a single block of free space, several smaller ones might be created.
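A tiny model makes the difference between total free memory and usable free memory visible. In the sketch below the heap is an array of equal-sized slots, which is a simplification I am assuming for illustration; the class CompactionSketch is hypothetical and models only the simple sliding case, not parallel compaction.

```java
// Toy heap of fixed-size slots: true = live object, false = free slot.
// Hypothetical sketch of sliding compaction, not the IBM JVM's implementation.
final class CompactionSketch {
    // Longest run of adjacent free slots, i.e. the biggest allocation that fits.
    static int largestFreeBlock(boolean[] heap) {
        int best = 0;
        int run = 0;
        for (boolean live : heap) {
            run = live ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    static int totalFree(boolean[] heap) {
        int free = 0;
        for (boolean live : heap) {
            if (!live) {
                free++;
            }
        }
        return free;
    }

    // Slide all live slots to the start, leaving one contiguous free block.
    static boolean[] compact(boolean[] heap) {
        boolean[] compacted = new boolean[heap.length];
        int liveCount = heap.length - totalFree(heap);
        for (int i = 0; i < liveCount; i++) {
            compacted[i] = true;
        }
        return compacted;
    }
}
```

For a heap laid out as live, free, live, free, live, free, free, live, four slots are free but the largest allocation that fits is two slots. After compact(), the same four free slots form a single block.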


The concurrent collector: optavgpause

For applications that are willing to trade some overall throughput for shorter pauses, a different policy is available. The optavgpause policy (-Xgcpolicy:optavgpause) attempts to do as much GC work as possible before stopping the application, leading to shorter pauses. The same mark-sweep-compact collector is used, but much of the mark and sweep phases can be done as the application runs. Based on the program’s allocation rate, the system attempts to predict when the next garbage collection will be required. When this threshold approaches, a concurrent GC begins. As application threads allocate objects, they will occasionally be asked to do a small amount of GC work before their allocation is fulfilled. The more allocations a thread does, the more it will be asked to help out. Meanwhile, one or more background GC threads will use idle cycles to get additional work done. Once all the concurrent work is done, or if free memory is exhausted ahead of schedule, the application is halted and the collection is completed. This pause is generally short, unless a compaction is required. Because compaction requires moving and updating live objects, it cannot be done concurrently.



The generational collection: gencon

It has long been observed that the majority of objects created are only used for a short period of time. This is the result of both programming techniques and the type of application. Many common Java idioms create helper objects that are quickly discarded; for example StringBuffer/StringBuilder objects, or Iterator objects. These are allocated to accomplish a specific task, and are rarely needed afterwards. On a larger scale, applications that are transactional in nature also tend to create groups of objects that are used and discarded together. Once a reply to a database query has been returned, then the reply, the intermediate state, and the query itself are no longer needed.

This observation led to the development of generational garbage collectors. The idea is to divide the heap up into different areas, and collect these areas at different rates. New objects are allocated out of one such area, called the nursery (or newspace). Since most objects in this area will become garbage quickly, collecting it offers the best chance to recover memory. Once an object has survived for a while, it is moved into a different area, called tenure (or oldspace). These objects are less likely to become garbage, so the collector examines them much less frequently. For the right sort of workload the result is collections that are faster and more efficient since less memory is examined, and a higher percentage of examined objects are reclaimed. Faster collections mean shorter pauses, and thus better application responsiveness.

IBM’s gencon policy (-Xgcpolicy:gencon) offers a generational GC (“gen-”) on top of the concurrent one described above (“-con”). The tenure space is collected as described above, while the nursery space uses a copying collector. This algorithm works by further subdividing the nursery area into allocate and survivor spaces. New objects are placed in allocate space until its free space has been exhausted. The application is then halted, and any live objects in allocate are copied into survivor. The two spaces then swap roles; that is, survivor becomes allocate, and the application is resumed. If an object has survived for a number of these copies, it is moved into the tenure area instead.
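The copy-and-swap cycle of the nursery can be sketched with a toy model. Everything here is hypothetical, including the TENURE_AGE threshold of 3, which I picked only for illustration; the real JVM decides on its own how many copies an object must survive.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a nursery with allocate/survivor spaces. TENURE_AGE and all
// names are hypothetical; the real JVM chooses its tenure age itself.
final class NurserySketch {
    static final int TENURE_AGE = 3; // assumed threshold, for illustration only

    static final class Obj {
        final String name;
        boolean reachable = true;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    List<Obj> allocate = new ArrayList<>();
    List<Obj> survivor = new ArrayList<>();
    final List<Obj> tenure = new ArrayList<>();

    Obj newObject(String name) {
        Obj o = new Obj(name);
        allocate.add(o);
        return o;
    }

    // Copy live objects out of allocate space, then swap the two spaces.
    void scavenge() {
        for (Obj o : allocate) {
            if (!o.reachable) {
                continue; // garbage is simply never copied
            }
            o.age++;
            if (o.age >= TENURE_AGE) {
                tenure.add(o);
            } else {
                survivor.add(o);
            }
        }
        allocate.clear();
        List<Obj> swap = allocate;
        allocate = survivor; // survivors now sit in the new allocate space
        survivor = swap;     // the emptied space becomes the next survivor
    }
}
```

The cost of a scavenge in this model depends on the number of live objects copied, not on the amount of garbage, which is why the approach suits workloads where most new objects die young.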



The region-based collector: balanced

A new garbage collection policy has been added in WebSphere Application Server V8. This policy, called balanced (-Xgcpolicy:balanced), expands on the notion of having different areas of the heap. It divides the heap into a large number of regions, which can be dealt with individually. Frankly I haven’t seen it used by any customer I worked with yet.



Hope this helps

Javacore Dump Analysis using JCA – IBM Thread and Monitor Dump Analyzer for Java

In my previous blog posts I spent some time illustrating tools for thread analysis for the Oracle\Sun HotSpot JVM. However, recently I had to analyze stack\javacore dumps from IBM WebSphere for a hang condition and therefore had to research equivalent tools to analyze dumps from that JVM.

As we all know, during the run time of a Java process, some Java Virtual Machines (JVMs) may not respond predictably and oftentimes seem to hang up for a long time or until JVM shutdown occurs. It is not easy to determine the root cause of these sorts of problems.

By triggering a javacore when a Java process does not respond, it is possible to collect diagnostic information related to the JVM and a Java application captured at a particular point during execution. For example, the information can be about the operating system, the application environment, threads, native stack, locks, and memory. The exact contents are dependent on the platform on which the application is running.

On non-IBM platforms, and in most cases, javacore is known as “javadump.” Check out my previous post on how to analyze dumps on the Oracle Sun JVM via the jstack utility. The code that creates javacore is part of the JVM. One can control it by using environment variables and run-time switches. By default, a javacore occurs when the JVM terminates unexpectedly. A javacore can also be triggered by sending specific signals to the JVM. Although javacore or javadump is present in Sun JVMs, much of the content of the javacore is added by IBM and, therefore, is present only in IBM JVMs.

The tool analyzes the information for each thread and provides diagnostic information, such as current thread information, the signal that caused the javacore, Java heap information (maximum Java heap size, initial Java heap size, garbage collector counter, allocation failure counter, free Java heap size, and allocated Java heap size), number of runnable threads, total number of threads, number of monitors locked, and deadlock information.

You can download tool from here –

Once the tool is downloaded, you can run jca.jar with the Java Run-time Environment


This will open up the tool


Let’s now use File-Open and open a javacore\dump file


The tool will show a screen with a progress bar while it loads the javacore. Once it is loaded, click on the javacore in the Thread Dump List.


As you can see, you get a huge amount of detail on JVM settings. In addition, if the tool detected a deadlock, it will be displayed in the lower section of the Thread Dump List.





Selecting the Compare Monitors option from Analysis Menu will show deadlocked threads


Another useful screen is Thread Status Screen. Here you can see RUNNABLE vs. PARKED vs. BLOCKED threads and associated stacks. This screen can be very useful in resolving slow response and hang conditions


A complete explanation of all thread states for Java 7 is available in the Java documentation. Once you locate the RUNNABLE threads that are executing your application code, find out which method is being executed by following the stack trace. You may get assistance from the development team if needed. Also note the Thread ID.

The thread in the example below is in the BLOCKED state, which typically means it is waiting to acquire a lock on an object monitor. You will need to search the earlier section of the javacore to determine which thread is holding the lock so you can pinpoint the root cause.

3XMTHREADINFO      "[STUCK] ExecuteThread: '162' for queue: 'weblogic.kernel.Default (self-tuning)'" J9VMThread:0x000000013ACF0800, j9thread_t:0x000000013AC88B20, java/lang/Thread:0x070000001F945798, state:B, prio=1

3XMTHREADINFO1            (native thread ID:0x1AD0F3, native priority:0x1, native policy:UNKNOWN)

3XMTHREADINFO3           Java callstack:

4XESTACKTRACE                at org/springframework/jms/connection/SingleConnectionFactory.createConnection( Code))

4XESTACKTRACE                at org/springframework/jms/connection/SingleConnectionFactory.createQueueConnection( Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate102.createConnection( Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.execute( Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.send( Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.send( Code))
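When a javacore contains hundreds of threads, it can help to pull the state out of each 3XMTHREADINFO header line programmatically before opening the tool. The sketch below is a hypothetical parser whose regular expression is fitted only to the excerpt above; other javacore versions may lay the line out differently, so treat the pattern as an assumption.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the 3XMTHREADINFO header line shown above. The
// regex is fitted to that one excerpt; other javacore versions may differ.
final class JavacoreThreadLine {
    private static final Pattern HEADER = Pattern.compile(
            "^3XMTHREADINFO\\s+\"(.*)\"\\s.*\\bstate:(\\w+),\\s*prio=(\\d+)");

    final String name;
    final String state;
    final int priority;

    private JavacoreThreadLine(String name, String state, int priority) {
        this.name = name;
        this.state = state;
        this.priority = priority;
    }

    // Returns the parsed header, or empty if the line is not a thread header.
    static Optional<JavacoreThreadLine> parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new JavacoreThreadLine(
                m.group(1), m.group(2), Integer.parseInt(m.group(3))));
    }

    // In the excerpt above, state:B marks the blocked thread.
    boolean isBlocked() {
        return "B".equals(state);
    }
}
```

Feeding every line of the javacore through parse() and keeping the entries where isBlocked() is true gives a quick list of threads to look up in the monitor section.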



Hope this helps.

Resolving Java Threading Issues with ThreadLogic Tool

The ThreadLogic utility is a free tool you can download to assist in analyzing thread dumps taken from a JVM. ThreadLogic can digest the log file containing the thread dump output. This utility does a fair amount of the initial analysis for you, like finding locks, the holders of locks, and fatal conditions. If you’ve ever read a raw thread dump from a log file then you know it can be daunting, especially if you don’t know exactly what you are looking for. ThreadLogic helps by recognizing the type of each thread and categorizing them to help you understand which threads are JVM threads, WLS threads, and “application” threads. In addition, ThreadLogic can process a series of thread dumps and perform a “diff” operation between them. This is helpful in determining what threads are doing over a period of time.

ThreadLogic comes in the form of a jar file, which we need to run manually using “java -jar threadLogic.jar”.


Opening the dump tree and selecting the Advisory Map shows a map with information about the health of the system under investigation. Each advisory has a health level indicating the severity of the issue found, a pattern, a name, a keyword and related advice.

ThreadLogic is able to parse Sun, JRockit, and IBM thread dumps and provide advice based on predefined and externally defined patterns.

The health levels (in descending order of severity) are FATAL (meant for deadlocks, STUCK threads, a blocked Finalizer, etc.), WARNING, WATCH (worth watching), NORMAL and IGNORE.
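One way to picture this ordering, and how a group of threads can take on the worst level found among its members, is a small enum. This is a hypothetical model for illustration, not ThreadLogic's own code, and the NORMAL fallback for an empty group is my assumption.

```java
import java.util.Collection;

// Hypothetical model of the health levels; declared from least to most
// severe so that the enum's natural ordering matches severity.
enum HealthLevel {
    IGNORE, NORMAL, WATCH, WARNING, FATAL;

    // The most severe level among the members; NORMAL is an assumed default
    // for an empty collection.
    static HealthLevel mostSevere(Collection<HealthLevel> levels) {
        HealthLevel worst = null;
        for (HealthLevel level : levels) {
            if (worst == null || level.compareTo(worst) > 0) {
                worst = level;
            }
        }
        return worst == null ? NORMAL : worst;
    }
}
```

Applying mostSevere() first to the threads of a group and then to the groups of a dump mirrors the idea that a single FATAL thread is enough to flag the whole dump.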

Based on the highest criticality of threads within a group, that health level gets promoted to the Thread Group’s health level and same is repeated at the thread dump level. There can be multiple advisories tagged to a Thread, Thread Group and Thread Dump. This is a typical advisory map I see:


The threads are associated with thread groups based on their functionality or thread names. Additional patterns exist to tag other threads (like iWay Adapter, SAP, Tibco threads) and group them:

