Dancing with the Elephants and Flying With The Bees – Apache Hive Performance Tuning

hive_thumb

Tuning Hive and other Apache components that run in the background to support processing of HiveQL is particularly important as the scale of your workload and database volume increases. When your applications query data sets that constitute a large-scale enterprise data warehouse (EDW), tuning the environment and optimizing Hive queries are often part of an ongoing effort for your team.

Increasingly, enterprises require that Hive queries run against the data warehouse
with low-latency analytical processing, often referred to as LLAP by Hortonworks. LLAP over real-time data can be further enhanced by integrating the EDW with the Druid real-time analytics engine.

Hive2 Architecture and Internals

Before we begin, I am assuming you are running Hive on the Tez execution engine. Hive LLAP with Apache Tez uses newer technology available in Hive 2.x and is an increasingly needed alternative to other execution engines like MapReduce and earlier implementations of Hive on Tez. Tez runs in conjunction with Hive LLAP to form a newer execution-engine architecture that can support faster queries.

The architecture of Hive2 is shown below:

hive2

  • HiveServer2: provides the JDBC and ODBC interface, and query compilation (see the JDBC sketch just after this list)
  • Query coordinators: coordinate the execution of a single query
  • LLAP daemon: persistent server, typically one per node. This is the main differentiating component of the architecture, which enables faster query runtimes than earlier execution engines.
  • Query executors: threads running inside the LLAP daemon
  • In-memory cache: cache inside the LLAP daemon that is shared across all users
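
Since HiveServer2 exposes a JDBC interface, you can submit HiveQL to it from plain Java. Below is a minimal sketch, not from the original material: it assumes the Hive JDBC driver is on the classpath, a hypothetical host name, a made-up table, and HiveServer2 listening on its default port 10000:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcClient {
    public static void main(String[] args) throws Exception {
        // jdbc:hive2:// is the HiveServer2 URL scheme; host, user and table are made up
        String url = "jdbc:hive2://hiveserver2.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // Session-level setting selecting the Tez execution engine discussed above
            stmt.execute("SET hive.execution.engine=tez");
            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sample_table")) {
                while (rs.next()) {
                    System.out.println("Row count: " + rs.getLong(1));
                }
            }
        }
    }
}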

Tuning Hive cluster memory

After spending some time hearing from customers about slow-running Hive queries or, worse yet, heap issues precluding successful query execution, and after doing some research, I learned about some important tuning parameters that may be useful for folks supporting Hive workloads. Some of these I will get into below.

To maximize performance of your Apache Hive query workloads, you need to optimize cluster configurations, queries, and underlying Hive table
design. This includes the following:

  • Configure CDH clusters for the maximum allowed heap memory size, load-balance concurrent connections across your CDH Hive
    components, and allocate adequate memory to support HiveServer2 and Hive metastore operations.
  • Review your Hive query workloads to make sure queries are not overly complex, that they do not access large numbers of Hive table partitions,
    or that they force the system to materialize all columns of accessed Hive tables when only a subset is necessary.
  • Review the underlying Hive table design, which is crucial to maximizing the throughput of Hive query workloads. Do not create thousands of
    table partitions that might cause queries containing JOINs to overtax HiveServer2 and the Hive metastore. Limit column width, and keep the
    number of columns under 1,000.

Memory Recommendations

HiveServer2 and the Hive metastore require sufficient memory to run correctly. The default heap size of 256 MB for each component is
inadequate for production workloads. Consider the following guidelines for sizing the heap for each component, based on your cluster size.

table

Important: These numbers are general guidance only, and can be affected by factors such as number of columns, partitions, complex joins, and
client activity. Based on your anticipated deployment, refine through testing to arrive at the best values for your environment.
In addition, the Beeline CLI should use a heap size of at least 2 GB.
Set the PermGen space for Java garbage collection to 512 MB for all.

Configuring Heap Size and GC

You can use Cloudera Manager to configure heap size and GC for HiveServer2:

    • To set heap size, go to Home > Hive > Configuration > HiveServer2 > Resource Management. Set Java Heap Size of HiveServer2 in Bytes to the desired value, and click Save Changes.
    • To set garbage collection, go to Home > Hive > Configuration > HiveServer2 > Advanced. Set the PermGen space for Java garbage collection to 512M, the type of garbage collector used (ConcMarkSweepGC or ParNewGC), and enable
      or disable the garbage collection overhead limit in Java Configuration Options for HiveServer2. The following example sets the PermGen space to 512M, uses the new Parallel Collector, and disables the garbage collection overhead limit:
      -XX:MaxPermSize=512M -XX:+UseParNewGC -XX:-UseGCOverheadLimit
    • Once you have made any changes to heap or GC settings, you will need to restart the service. From the Actions drop-down menu, select Restart to restart the HiveServer2 service.

Similarly, you can set up heap size and GC for the Hive Metastore with similar parameters using Cloudera Manager by going to Home > Hive > Configuration > Hive Metastore > Resource Management.

If you are not using Cloudera Manager, or you are on HDInsight, you can configure the heap size for HiveServer2 and the Hive metastore on the command line by setting the -Xmx parameter in the HADOOP_OPTS variable to the desired maximum heap size in /etc/hive/hive-env.sh.

The following example shows a configuration with HiveServer2 using a 12 GB heap, the Hive Metastore using a 12 GB heap, and Hive clients using a 2 GB heap:


if [ "$SERVICE" = "cli" ]; then
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx12288m -Xms12288m
-XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-
UseGCOverheadLimit"
else
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx12288m -Xms12288m
-XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
fi
fi
export HADOOP_HEAPSIZE=2048

You can use either the Concurrent Collector or the new Parallel Collector for garbage collection by passing -XX:+UseConcMarkSweepGC or -XX:+UseParNewGC in the HADOOP_OPTS lines above. To enable the garbage collection overhead limit, remove the -XX:-UseGCOverheadLimit setting or change it to -XX:+UseGCOverheadLimit.

For the latest documentation on tuning memory and troubleshooting heap/GC exceptions in Hive, see the Cloudera docs – https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_hive_tuning.html – or, for HDInsight, this great MS doc with configurations – https://docs.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-llap-sizing-guide

Purgamentum Init Venari – Analyzing Java GC Using IBM Pattern Modeling Tool

Recently, while again looking at some GC issues on the IBM WebSphere platform for a buddy of mine, I got to learn a new tool – Pattern Modeling and Analysis Tool for IBM Java Garbage Collector (PMAT). This is the second post-mortem analysis tool from IBM I have had the privilege to work with, as I previously profiled JCA – Javacore Dump Analysis Tool – here: https://gennadny.wordpress.com/2016/03/28/javacore-dump-analysis-using-jca-ibm-thread-and-monitor-dump-analyzer-for-java/

The PMAT tool parses verbose GC trace, analyzes Java heap usage, and recommends key configurations based on pattern modeling of Java heap usage.

Why do we need it?
When the JVM (Java virtual machine) cannot allocate an object from the current heap because of lack of space, a memory allocation fault occurs, and the Garbage Collector is invoked. The first task of the Garbage Collector is to collect all the garbage that is in the heap. This process starts when any thread calls the Garbage Collector either indirectly as a result of allocation failure or directly by a specific call to System.gc(). The first step is to get all the locks needed by the garbage collection process. This step ensures that other threads are not suspended while they are holding critical locks. All other threads are then suspended. Garbage collection can then begin. It occurs in three phases: Mark, Sweep, and Compaction (optional).

Sometimes you run into issues. The most common are either performance issues in applications due to especially long-running, aka “violent”, GCs, or rooted objects on the heap not getting cleaned up and the application crashing with the dreaded OutOfMemory error – causing you to have to analyze garbage collection with verbose GC on.

JRE Java.Lang.OutOfMemory
Verbose GC is a command-line option that one can supply to the JVM at start-up time. The format is: -verbose:gc or -verbosegc. This option switches on a substantial trace of every garbage collection cycle. The format of the generated information is not standardized and therefore varies among platforms and releases.
This trace should allow one to see the gross heap usage in every garbage collection cycle. For example, one could monitor the output to see the changes in the free heap space and the total heap space. This information can be used to determine whether garbage collections are taking too long to run; whether too many garbage collections are occurring; and whether the JVM crashed during garbage collection.
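
As an aside, if you just want a quick programmatic look at gross GC activity from inside a running application (before reaching for verbose GC and PMAT), the standard GarbageCollectorMXBean interface exposes per-collector counts and accumulated collection time. A minimal sketch, printing one line per collector:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // One bean per collector; names vary by JVM vendor and GC policy
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}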

How does it work?
PMAT analyzes verbose GC traces by parsing the traces and building pattern models. PMAT recommends key configurations by executing a diagnosis engine and pattern modeling algorithm. If there are any errors related to Java heap exhaustion or fragmentation in the verbose GC trace, PMAT can diagnose the root cause of failures. PMAT also provides rich chart features that graphically display Java heap usage.

Where do I get it?

You can download this tool from – ftp://public.dhe.ibm.com/software/websphere/appserv/support/tools/pmat/ga456.jar

Running the tool
Run gaNNN.jar with the Java Run-time Environment (NNN is the version number).
You will see the following initial screen:

pmat1

Select and open the verbose GC log

pmat2

Process the log and view Summary/Reports

pmat3

More detailed information on the tool can be found in presentation here – http://www-01.ibm.com/support/docview.wss?uid=swg27007240&aid=1

Jinwoo Hwang, a technical leader at IBM WebSphere Application Server Technical Support, created this tool as well as JCA. I recommend that everyone read his articles on JVM internals – http://websphere.sys-con.com/node/1229281, http://www-01.ibm.com/support/docview.wss?uid=swg27011855, http://www-01.ibm.com/support/docview.wss?uid=swg27018423

Hope this short note helps

Meet Redis – Setting Up Redis On Ubuntu Linux

redis_thumb.jpg

I have been asked by a few folks for a quick tutorial on setting up Redis under systemd on Ubuntu Linux version 16.04.

I have blogged quite a bit about Redis in general – https://gennadny.wordpress.com/category/redis/ – but just a quick line on Redis itself: Redis is an in-memory key-value store known for its flexibility, performance, and wide language support. That makes Redis one of the most popular key-value data stores in existence today. Below are steps to install and configure it to run under systemd on Ubuntu 16.04 and above.

Here are the prerequisites: an Ubuntu 16.04 server and a non-root user account with sudo privileges (the commands below assume such an account).

Next steps are:

  • Login into your Ubuntu server with this user account
  • Update and install prerequisites via apt-get
             $ sudo apt-get update
             $ sudo apt-get install build-essential tcl
    
  • Now we can download and extract Redis to the tmp directory
              $ cd /tmp
              $ curl -O http://download.redis.io/redis-stable.tar.gz
              $ tar xzvf redis-stable.tar.gz
              $ cd redis-stable
    
  • Next we can build Redis
        $ make
    
  • After the binaries are compiled, run the test suite to make sure everything was built correctly. You can do this by typing:
       $ make test
    
  • This will typically take a few minutes to run. Once it is complete, you can install the binaries onto the system by typing:
    $ sudo make install
    

Now we need to configure Redis to run under systemd. Systemd is an init system used in Linux distributions to bootstrap the user space and manage all processes subsequently, instead of the UNIX System V or Berkeley Software Distribution (BSD) init systems. As of 2016, most Linux distributions have adopted systemd as their default init system.

  • To start off, we need to create a configuration directory. We will use the conventional /etc/redis directory, which can be created by typing
    $ sudo mkdir /etc/redis
    
  • Now, copy over the sample Redis configuration file included in the Redis source archive:
         $ sudo cp /tmp/redis-stable/redis.conf /etc/redis
    
  • Next, we can open the file to adjust a few items in the configuration:
    $ sudo nano /etc/redis/redis.conf
    
  • In the file, find the supervised directive. Currently, this is set to no. Since we are running an operating system that uses the systemd init system, we can change this to systemd:
    . . .
    
    # If you run Redis from upstart or systemd, Redis can interact with your
    # supervision tree. Options:
    #   supervised no      - no supervision interaction
    #   supervised upstart - signal upstart by putting Redis into SIGSTOP mode
    #   supervised systemd - signal systemd by writing READY=1 to $NOTIFY_SOCKET
    #   supervised auto    - detect upstart or systemd method based on
    #                        UPSTART_JOB or NOTIFY_SOCKET environment variables
    # Note: these supervision methods only signal "process is ready."
    #       They do not enable continuous liveness pings back to your supervisor.
    supervised systemd
    
    . . .
    
  • Next, find the dir directive. This option specifies the directory that Redis will use to dump persistent data. We need to pick a location that Redis has write permission to and that isn’t viewable by normal users.
    We will use the /var/lib/redis directory for this, which we will create below.

    . . .
    
    
    # The working directory.
    #
    # The DB will be written inside this directory, with the filename specified
    # above using the 'dbfilename' configuration directive.
    #
    # The Append Only File will also be created inside this directory.
    #
    # Note that you must specify a directory here, not a file name.
    dir /var/lib/redis
    
    . . .
    

    Save and close the file when you are finished

  • Next, we can create a systemd unit file so that the init system can manage the Redis process.
    Create and open the /etc/systemd/system/redis.service file to get started:

    $ sudo nano /etc/systemd/system/redis.service
    
  • The file should look like this; create the sections below:
    [Unit]
    Description=Redis In-Memory Data Store
    After=network.target
    
    [Service]
    User=redis
    Group=redis
    ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf
    ExecStop=/usr/local/bin/redis-cli shutdown
    Restart=always
    
    [Install]
    WantedBy=multi-user.target
    
  • Save and close file when you are finished

Now, we just have to create the user, group, and directory that we referenced in the previous two files.
Begin by creating the redis user and group, then create the data directory and give the redis user ownership of it:

$ sudo adduser --system --group --no-create-home redis
$ sudo mkdir /var/lib/redis
$ sudo chown redis:redis /var/lib/redis

Now we can start Redis:

  $ sudo systemctl start redis

Check that the service had no errors by running:

$ sudo systemctl status redis

And Eureka – here is the response

redis.service - Redis Server
   Loaded: loaded (/etc/systemd/system/redis.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2016-05-11 14:38:08 EDT; 1min 43s ago
  Process: 3115 ExecStop=/usr/local/bin/redis-cli shutdown (code=exited, status=0/SUCCESS)
 Main PID: 3124 (redis-server)
    Tasks: 3 (limit: 512)
   Memory: 864.0K
      CPU: 179ms
   CGroup: /system.slice/redis.service
           └─3124 /usr/local/bin/redis-server 127.0.0.1:6379    

Congrats! You can now start learning Redis. Connect to the Redis CLI by typing

$ redis-cli

Now you can follow these Redis tutorials
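
If you would rather smoke-test the new server from Java than from the CLI, here is a minimal sketch using the Jedis client – an assumption on my part, as any Redis Java client will do – which presumes the Jedis jar is on your classpath and Redis is listening on the default localhost:6379:

import redis.clients.jedis.Jedis;

public class RedisSmokeTest {
    public static void main(String[] args) {
        // Connect to the Redis instance we just configured under systemd
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println("PING -> " + jedis.ping()); // expect PONG
            jedis.set("test:key", "it works");
            System.out.println("test:key -> " + jedis.get("test:key"));
        }
    }
}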

Hope this was helpful

Let Me Count The Ways – Various methods of generating stack dump for JVM in production

As I profiled previously, thread dumps in Java are essential in diagnosing production issues with high CPU, locking, threading deadlocks, etc. There are great online thread dump analysis tools, such as http://fastthread.io/, that can analyze and spot problems. But you need to provide those tools with proper thread dumps as input. I have already blogged about many tools to do so in the past, like jstack, JVisualVM and Java Mission Control. Here I will try to summarize all of the ways to capture usable thread dumps from a production Java application:

  • JStack

JStack remains one of the most common ways to capture thread dumps. It’s a command-line utility bundled in the JDK, shipped in the JDK_HOME\bin folder. Here is the command that you need to issue to capture a thread dump:

jstack -l <pid> > <file-path>

Where

pid: is the process ID of the application whose thread dump should be captured

file-path: is the file path where the thread dump will be written.

Example here:

jstack -l 37321 > /opt/tmp/threadDump.txt

As per the example, the thread dump of the process would be generated in the /opt/tmp/threadDump.txt file.

    • kill -3

 

At many customers, only JREs are installed on production machines. Since jstack and other tools are part of the JDK only, you wouldn’t be able to use jstack. In such circumstances, the ‘kill -3’ option can be used.

kill -3 <pid>

Where:

pid: is the process ID of the application whose thread dump should be captured

Example:

kill -3 37321

When the ‘kill -3’ option is used, the thread dump is sent to the standard error stream. For example, in apps running under Tomcat it will be in the <TOMCAT_HOME>/logs/catalina.out file.

    • Java VisualVM

Java VisualVM is a graphical user interface tool that provides detailed information about applications while they are running on a specified Java Virtual Machine (JVM). It’s located in JDK_HOME\bin\jvisualvm.exe and has been part of the Sun/Oracle JDK distribution since JDK 6 update 7. Launch jvisualvm. On the left panel, you will notice all the Java applications that are running on your machine. You need to select your application from the list (see the red highlight in the diagram below). This tool also has the capability to capture thread dumps from Java processes running on a remote host as well.

vjvm

In order to generate a thread dump, go to the Threads tab and click on the Thread Dump button.

    •   Java Mission Control

 

Java Mission Control (JMC) is a tool that collects and analyzes data from Java applications running locally or deployed in production environments. This tool has been packaged into the JDK since Oracle JDK 7 Update 40. It also provides an option to take thread dumps from the JVM. The JMC tool is present in JDK_HOME\bin\jmc.exe. Once you launch the tool, you will see all the Java processes that are running on your local host. As you use the Flight Recorder feature on one of these processes, in the “Thread Dump” field you can select the interval at which you want to capture thread dumps.

jmc

    • ThreadMXBean

 

Introduced in JDK 1.5, ThreadMXBean is the management interface for the thread system of the JVM; it allows you to create a thread dump in a few lines of application code, like below:

Example

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public void dumpThreadDump() {
    ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
    // Dump all threads, including locked monitors and locked synchronizers
    for (ThreadInfo ti : threadMxBean.dumpAllThreads(true, true)) {
        System.out.print(ti.toString());
    }
}

threadmxbean

 

  • JCMD

The jcmd tool was introduced with Oracle’s Java 7. It’s useful in troubleshooting issues with JVM applications. It has various capabilities, such as identifying Java process IDs, acquiring heap dumps, acquiring thread dumps, acquiring garbage collection statistics, and more.

Using the below JCMD command you can generate thread dump:

jcmd <pid> Thread.print > <file-path>

where

pid: is the process ID of the application whose thread dump should be captured

file-path: is the file path where the thread dump will be written.

Example

jcmd 37321 Thread.print > /opt/tmp/threadDump.txt

For more see – https://docs.oracle.com/javase/7/docs/technotes/tools/windows/jcmd.html, https://blogs.oracle.com/jmxetc/entry/threadmxbean_a_singleton_mxbean_for , http://www.javamex.com/tutorials/profiling/profiling_java5_threads_howto.shtml , http://blog.takipi.com/oracle-java-mission-control-the-ultimate-guide/ , https://www.prosysopc.com/blog/using-java-mission-control-for-performance-monitoring/

Ocultos Exitus – JDBC Driver Unicode Settings and SQL Server Performance

While troubleshooting JDBC client apps that connect to SQL Server, I have run into this issue a few times, the latest very recently.

As you may already know, SQL Server differentiates data types that support Unicode from the ones that support only ASCII. For example, the character data types that support Unicode are nchar, nvarchar and longnvarchar, whereas their ASCII counterparts are char, varchar and longvarchar respectively. By default, all of Microsoft’s JDBC drivers send strings in Unicode format to SQL Server, irrespective of whether the data type of the corresponding column defined in SQL Server supports Unicode or not.

In the case where the data types of the columns support Unicode, everything is smooth. But in cases where they do not, serious performance issues arise, especially during data fetches. SQL Server tries to implicitly convert the non-Unicode data types in the table to Unicode before doing the comparison. Moreover, if an index exists on the non-Unicode column, it will be ignored. This ultimately leads to a whole-table scan during data fetch, slowing down search queries drastically.

This can be corrected by resetting one of the default parameters in the JDBC driver. The parameter name and the value to be set might vary from driver to driver, depending on the vendor.

Vendor                  Parameter
JSQLConnect             asciiStringParameters
jTDS                    sendStringParametersAsUnicode
DataDirect Connect      sendStringParametersAsUnicode
Microsoft JDBC          sendStringParametersAsUnicode
WebLogic Type 4 JDBC    sendStringParametersAsUnicode

Reading https://msdn.microsoft.com/en-us/library/ms378857(v=sql.110).aspx you will see: “For optimal performance with CHAR, VARCHAR or LONGVARCHAR type of non-Unicode parameters, set the sendStringParametersAsUnicode connection string property to “false” and use the non-national character methods.” The issue was also reported via http://www.sqlconsulting.com/jdbc.shtml, http://www.jochenhebbrecht.be/site/2014-05-01/java/fixing-slow-queries-running-sql-server-using-jpa-hibernate-and-jtds and http://www.codeproject.com/Articles/281364/Solving-Performance-issues-in-data-migration-to-SQ
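
To make this concrete, here is a hedged sketch for the Microsoft JDBC driver: the sendStringParametersAsUnicode=false connection property is real and documented in the MSDN link above, while the server, database, credentials and table names are made up for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class NonUnicodeQuery {
    public static void main(String[] args) throws Exception {
        // With sendStringParametersAsUnicode=false the driver sends string
        // parameters as varchar rather than nvarchar, so an index on a
        // non-Unicode last_name column can actually be used
        String url = "jdbc:sqlserver://dbserver.example.com:1433;"
                + "databaseName=SalesDb;sendStringParametersAsUnicode=false";
        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT customer_id FROM customers WHERE last_name = ?")) {
            ps.setString(1, "Smith"); // sent as ASCII/varchar, no implicit conversion
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("customer_id"));
                }
            }
        }
    }
}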

Hope this helps.

Taking Care of the Garbage – Generational is default GC policy on newer WebSphere AS

Imagine you have a legacy Java application running in IBM WebSphere that you have finally upgraded to a newer version. Yet the customer is reporting a serious performance regression. Why would that be? Well, one reason may be a change in default JVM behavior between WebSphere versions, something that one of my customers discovered the “hard way”.

Garbage collection (GC) is an integral part of the Java Virtual Machine (JVM) as it collects unused Java heap memory so that the application can continue allocating new objects. The effectiveness and performance of the GC play an important role in application performance and determinism. The IBM JVM provided with IBM WebSphere Application Server provides four different GC policy algorithms:

  • -Xgcpolicy:optthruput
  • -Xgcpolicy:optavgpause
  • -Xgcpolicy:gencon
  • -Xgcpolicy:balanced

Each of these algorithms provides different performance and deterministic qualities. In addition, the default policy in WebSphere Application Server V8 has changed from -Xgcpolicy:optthruput to -Xgcpolicy:gencon. So let’s dive a bit into what this really means.

The garbage collector

Different applications naturally have different memory usage patterns. A computationally intensive number crunching workload will not use the Java heap in the same way as a highly transactional customer-facing interface. To optimally handle these different sorts of workloads, different garbage collection strategies are required. The IBM JVM supports several garbage collection policies to enable you to choose the strategy that best fits your application.

The parallel mark-sweep-compact collector: optthruput, formerly default

The simplest possible garbage collection technique is to continue allocating until free memory has been exhausted, then stop the application and process the entire heap. While this results in a very efficient garbage collector, it means that the user program must be able to tolerate the pauses introduced by the collector. Workloads that are only concerned about overall throughput might benefit from this strategy.

The optthruput policy (-Xgcpolicy:optthruput) implements this strategy. This collector uses a parallel mark-sweep algorithm. In a nutshell, this means that the collector first walks through the set of reachable objects, marking them as live data. A second pass then sweeps away the unmarked objects, leaving behind free memory that can be used for new allocations. The majority of this work can be done in parallel, so the collector uses additional threads (up to the number of CPUs by default) to get the job done faster, reducing the time the application remains paused.

figure1

The problem with a mark-sweep algorithm is that it can lead to fragmentation. There might be lots of free memory, but if it is in small slices interspersed with live objects then no individual piece might be large enough to satisfy a particular allocation.

The solution to this is compaction. In theory, the compactor slides all the live objects together to one end of the heap, leaving a single contiguous block of free space. This is an expensive operation because every live object might be moved, and every pointer to a moved object must be updated to the new location. As a result, compaction is generally only done when it appears to be necessary. Compaction can also be done in parallel, but it results in a less efficient packing of the live objects — instead of a single block of free space, several smaller ones might be created.

figure2

The concurrent collector: optavgpause

For applications that are willing to trade some overall throughput for shorter pauses, a different policy is available. The optavgpause policy (-Xgcpolicy:optavgpause) attempts to do as much GC work as possible before stopping the application, leading to shorter pauses. The same mark-sweep-compact collector is used, but much of the mark and sweep phases can be done as the application runs. Based on the program’s allocation rate, the system attempts to predict when the next garbage collection will be required. When this threshold approaches, a concurrent GC begins. As application threads allocate objects, they will occasionally be asked to do a small amount of GC work before their allocation is fulfilled. The more allocations a thread does, the more it will be asked to help out. Meanwhile, one or more background GC threads will use idle cycles to get additional work done. Once all the concurrent work is done, or if free memory is exhausted ahead of schedule, the application is halted and the collection is completed. This pause is generally short, unless a compaction is required. Because compaction requires moving and updating live objects, it cannot be done concurrently.

figure3

 

The generational collection: gencon

It has long been observed that the majority of objects created are only used for a short period of time. This is the result of both programming techniques and the type of application. Many common Java idioms create helper objects that are quickly discarded; for example StringBuffer/StringBuilder objects, or Iterator objects. These are allocated to accomplish a specific task, and are rarely needed afterwards. On a larger scale, applications that are transactional in nature also tend to create groups of objects that are used and discarded together. Once a reply to a database query has been returned, then the reply, the intermediate state, and the query itself are no longer needed.

This observation led to the development of generational garbage collectors. The idea is to divide the heap up into different areas, and collect these areas at different rates. New objects are allocated out of one such area, called the nursery (or newspace). Since most objects in this area will become garbage quickly, collecting it offers the best chance to recover memory. Once an object has survived for a while, it is moved into a different area, called tenure (or oldspace). These objects are less likely to become garbage, so the collector examines them much less frequently. For the right sort of workload the result is collections that are faster and more efficient, since less memory is examined and a higher percentage of examined objects are reclaimed. Faster collections mean shorter pauses, and thus better application responsiveness.

IBM’s gencon policy (-Xgcpolicy:gencon) offers a generational GC (“gen-”) on top of the concurrent one described above (“-con”). The tenure space is collected as described above, while the nursery space uses a copying collector. This algorithm works by further subdividing the nursery area into allocate and survivor spaces. New objects are placed in allocate space until its free space has been exhausted. The application is then halted, and any live objects in allocate are copied into survivor. The two spaces then swap roles; that is, survivor becomes allocate, and the application is resumed. If an object has survived for a number of these copies, it is moved into the tenure area instead.

figure4

 

The region-based collector: balanced

A new garbage collection policy has been added in WebSphere Application Server V8. This policy, called balanced (-Xgcpolicy:balanced), expands on the notion of having different areas of the heap. It divides the heap into a large number of regions, which can be dealt with individually. Frankly, I haven’t seen it used by any customer I have worked with yet.

 

For more on WebSphere IBM JVM GC see – http://www.perfdaddy.com/2015/10/ibm-jvm-tuning-gencon-gc-policy.html, http://www.ibmsystemsmag.com/ibmi/administrator/websphere/Tuning-Garbage-Collection-With-IBM-Technology-for/, http://javaeesupportpatterns.blogspot.com/2012/03/ibm-jvm-tuning-gencon-gc-policy.html

Hope this helps

Javacore Dump Analysis using JCA – IBM Thread and Monitor Dump Analyzer for Java

In my previous blog posts I spent some time illustrating tools for thread analysis for the Oracle/Sun HotSpot JVM. However, recently I actually had to analyze stack/javacore dumps from IBM WebSphere for a hang condition and therefore had to research equivalent tools to analyze dumps from that JVM.

As we all know, during the run time of a Java process, some Java Virtual Machines (JVMs) may not respond predictably and oftentimes seem to hang up for a long time or until JVM shutdown occurs. It is not easy to determine the root cause of these sorts of problems.

By triggering a javacore when a Java process does not respond, it is possible to collect diagnostic information related to the JVM and a Java application captured at a particular point during execution. For example, the information can be about the operating system, the application environment, threads, native stack, locks, and memory. The exact contents are dependent on the platform on which the application is running.

On non-IBM platforms, and in most cases, a javacore is known as a “javadump.” Check out my previous post on how to analyze dumps on the Oracle/Sun JVM via the jstack utility. The code that creates the javacore is part of the JVM. One can control it by using environment variables and run-time switches. By default, a javacore occurs when the JVM terminates unexpectedly. A javacore can also be triggered by sending specific signals to the JVM. Although javacore or javadump is present in Sun JVMs, much of the content of the javacore is added by IBM and, therefore, is present only in IBM JVMs.

This technology analyzes each thread information and provides diagnostic information, such as current thread information, the signal that caused the javacore, Java heap information (maximum Java heap size, initial Java heap size, garbage collector counter, allocation failure counter, free Java heap size, and allocated Java heap size), number of runnable threads, total number of threads, number of monitors locked, and deadlock information.

You can download the tool from here – https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=2245aa39-fa5c-4475-b891-14c205f7333c.

Once the tool is downloaded, you can run jca.jar with the Java Run-time Environment

command_line_jca

This will open up the tool

jca_splash

Let’s now use File–Open and open a javacore/dump file

jca_fileopen

The tool will show a screen with a progress bar while it loads the javacore. Click on the javacore you just loaded in the Thread Dump List:

jca_afterfileopen

As you can see, you get a huge amount of JVM settings detail. In addition, if the tool detected a deadlock, it will be displayed in the lower section of the Thread Dump List:

jca_summary

and

dead

 

Selecting the Compare Monitors option from Analysis Menu will show deadlocked threads

jca_dead

Another useful screen is the Thread Status screen. Here you can see RUNNABLE vs. PARKED vs. BLOCKED threads and their associated stacks. This screen can be very useful in resolving slow-response and hang conditions.

jca_thread_status

A complete explanation of all thread states for Java 7 can be found here – https://www.ibm.com/support/knowledgecenter/SSYKE2_7.0.0/com.ibm.java.aix.71.doc/diag/tools/javadump_tags_threads.html. Once you locate the RUNNABLE threads that are executing your application code, find out which method is being executed by following the stack trace. You may get assistance from the development team if needed. Also note the thread ID.

The following thread in the example below is in BLOCKED state, which typically means it is waiting to acquire a lock on an object monitor. You will need to search in the earlier section and determine which thread is holding the lock so you can pinpoint the root cause.

3XMTHREADINFO      "[STUCK] ExecuteThread: '162' for queue: 'weblogic.kernel.Default (self-tuning)'" J9VMThread:0x000000013ACF0800, j9thread_t:0x000000013AC88B20, java/lang/Thread:0x070000001F945798, state:B, prio=1

3XMTHREADINFO1            (native thread ID:0x1AD0F3, native priority:0x1, native policy:UNKNOWN)

3XMTHREADINFO3           Java callstack:

4XESTACKTRACE                at org/springframework/jms/connection/SingleConnectionFactory.createConnection(SingleConnectionFactory.java:207(Compiled Code))

4XESTACKTRACE                at org/springframework/jms/connection/SingleConnectionFactory.createQueueConnection(SingleConnectionFactory.java:222(Compiled Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate102.createConnection(JmsTemplate102.java:169(Compiled Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.execute(JmsTemplate.java:418(Compiled Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.send(JmsTemplate.java:475(Compiled Code))

4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.send(JmsTemplate.java:467(Compiled Code))

…………………………………………………………………………………………………………

 

Hope this helps. For more see – https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=2245aa39-fa5c-4475-b891-14c205f7333c, https://www.ibm.com/developerworks/community/blogs/e8206aad-10e2-4c49-b00c-fee572815374/entry/Java_Core_Debugging_using_IBM_Thread_and_Monitor_Dump_Analyzer_for_Java?lang=en

Resolving Java Threading Issues with ThreadLogic Tool

The ThreadLogic utility is a free tool you can download to assist in analyzing thread dumps taken from a JVM. ThreadLogic can digest a log file containing thread
dump output. This utility does a fair amount of the initial analysis for you, like finding locks, the holders of locks, and fatal conditions. If you’ve ever read a raw thread dump from a log file then you know it can be daunting – especially if you don’t know exactly what you are looking for. ThreadLogic helps by recognizing the type of each thread and categorizing them to help you understand which threads are JVM threads, WLS threads, and then “application” threads. In addition, ThreadLogic can process a series of thread dumps and perform a “diff” operation between them. This is helpful in determining what threads are doing over a period of time.

ThreadLogic can be downloaded from – https://java.net/projects/threadlogic/downloads. ThreadLogic comes in the form of a jar file; we need to manually run the file using “java -jar threadLogic.jar”.

image

Opening the dump tree and selecting the Advisory Map shows a map with information about the health of the system under investigation. Each of the advisories has a health level indicating the severity of the issue found, a pattern, a name, a keyword and related advice.
image

ThreadLogic is able to parse Sun, JRockit, and IBM thread dumps and provide advice based on predefined and externally defined patterns.

The health levels (in descending order of severity) are FATAL (meant for deadlocks, STUCK threads, blocked finalizers, etc.), WARNING, WATCH (worth watching), NORMAL and IGNORE.

Based on the highest criticality of threads within a group, that health level gets promoted to the thread group’s health level, and the same is repeated at the thread dump level. There can be multiple advisories tagged to a thread, thread group and thread dump. This is a typical advisory map I see:

image

The threads are associated with thread groups based on functionality or thread names. Additional patterns exist to tag other threads (like iWay Adapter, SAP, and Tibco threads) and group them:

image

For more on the tool see – https://blogs.oracle.com/emeapartnerweblogic/entry/my_first_experiences_with_threadlogic, https://blogs.oracle.com/ATeamExalogic/entry/introducing_threadlogic, https://zeroproductionincidents.wordpress.com/2012/07/10/threadlogic-another-thread-dump-analysis-tool/

Forecast Cloudy – Developing Simple Java Servlet Application for Google App Engine

appengine

I realize that this post is pretty basic; however, I think it may be useful to someone starting to develop Web applications on Google App Engine. In my previous post I showed you how you can use NetBeans to develop for App Engine; this time, however, I will use Eclipse, as with the real Google plug-in it’s a lot more slick, despite my preference for NetBeans as a Java IDE. In general I am a server-side vs. UI guy, so my servlet will be very basic, but I don’t think that is the point here, as servlets in general are quite easy to write and to make more functional.

So you decided to build a Web app on App Engine? First you need to install the Google Plug-In for Eclipse for App Engine. Installation instructions are available here. In Eclipse, go to the Help menu –> Install New Software and add the proper link as per the table in the Google doc. If you are using the latest Eclipse (Luna), full instructions are here – https://developers.google.com/eclipse/docs/install-eclipse-4.4

After you install the plug-in you will see a new icon on your toolbar in Eclipse:

appengine_eclipse2

Next I will create a Dynamic Web Project in Eclipse to house my servlet. Go to File –> New –> Dynamic Web Project. The resulting dialog looks like this:

appengine_eclipse4

After you complete this step, the project-creation wizard will create a simple servlet-based application featuring a Hello World-type servlet. After you set that up you should see your project structure:

image

Now on to Java servlets. A Java servlet is a Java program that extends the capabilities of a server. Although servlets can respond to any type of request, they most commonly implement applications hosted on Web servers. Such Web servlets are the Java counterpart to other dynamic Web content technologies such as PHP and ASP.NET.

A servlet is a Java class that implements the Servlet interface. This interface has three methods that define the servlet’s life cycle:

public void init(ServletConfig config) throws ServletException
This method is called once when the servlet is loaded into the servlet engine, before the servlet is asked to process its first request.

public void service(ServletRequest request, ServletResponse response) throws ServletException, IOException
This method is called to process a request. It can be called zero, one or many times until the servlet is unloaded. Multiple threads (one per request) can execute this method in parallel so it must be thread safe.

public void destroy()
This method is called once just before the servlet is unloaded and taken out of service.

The init method has a ServletConfig attribute. The servlet can read its initialization arguments through the ServletConfig object. How the initialization arguments are set is servlet engine dependent but they are usually defined in a configuration file.

The doGet method has two interesting parameters: HttpServletRequest and HttpServletResponse. These two objects give you full access to all information about the request and let you control the output sent to the client as the response to the request.
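
To tie these life-cycle methods together, here is a hedged sketch (separate from the project below) of a servlet that overrides init, doGet and destroy; the “greeting” init parameter is invented for illustration:

import java.io.IOException;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LifecycleServlet extends HttpServlet {
    private String greeting;

    @Override
    public void init(ServletConfig config) throws ServletException {
        super.init(config); // keep the config available via getServletConfig()
        // Read a (made-up) init parameter that would be defined in web.xml
        greeting = config.getInitParameter("greeting");
        if (greeting == null) {
            greeting = "Hello";
        }
    }

    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // May run on many threads in parallel, so it only reads state set in init
        resp.setContentType("text/plain");
        resp.getWriter().println(greeting + " from the life-cycle example");
    }

    @Override
    public void destroy() {
        // Release anything acquired in init() before the servlet is unloaded
        greeting = null;
    }
}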

My Servlet as stated is pretty basic:

package com.example.myproject;

import java.io.IOException;
import javax.servlet.http.*;

@SuppressWarnings("serial")
public class HelloAppEngineServletServlet extends HttpServlet {
	public void doGet(HttpServletRequest req, HttpServletResponse resp)
			throws IOException {
		resp.setContentType("text/plain");
		resp.getWriter().println("Hello, AppEngine from GennadyK");
	}
}

The servlet gets mapped under the URI /simpleservletapp in the web.xml, as you can see below:

image

The project-creation wizard also provides an index.html file that has a link to the new servlet.

Before I deploy this to App Engine, I probably want to debug it locally. Let’s find the Debug button on the toolbar.

Hit the dropdown arrow and you can set up a local server for debugging. If your application uses App Engine but not GWT, the only indication that the App Engine development server is running will be output in the Console view. App Engine-only launches will not appear in the development mode view. The console output includes the URL of the server, which by default is http://localhost:8888/. You can change the port number via Eclipse’s launch configuration dialog by selecting your Web Application launch and editing the Port value on the Main tab.

image

Simple result here:

image

Now let’s upload this “application” to App Engine. Since I already configured the development server, I used Google App Engine WTP Deploy. Right-click on the project and you can pick that option:

Deploy

I can now watch my status in the status bar:

image

And read logged messages in Eclipse Console:

May 29, 2015 7:34:53 PM java.util.prefs.WindowsPreferences 
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
Reading application configuration data...
May 29, 2015 7:34:54 PM com.google.apphosting.utils.config.AppEngineWebXmlReader readAppEngineWebXml
INFO: Successfully processed C:/Users/gennadyk/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/helloAppEngineServlet\WEB-INF/appengine-web.xml
May 29, 2015 7:34:54 PM com.google.apphosting.utils.config.AbstractConfigXmlReader readConfigXml
INFO: Successfully processed C:/Users/gennadyk/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/helloAppEngineServlet\WEB-INF/web.xml


Beginning interaction for module default...
0% Created staging directory at: 'C:\Users\gennadyk\AppData\Local\Temp\appcfg8550091122421213739.tmp'
5% Scanning for jsp files.
20% Scanning files on local disk.
25% Initiating update.
28% Cloning 3 static files.
31% Cloning 13 application files.
40% Uploading 4 files.
52% Uploaded 1 files.
61% Uploaded 2 files.
68% Uploaded 3 files.
73% Uploaded 4 files.
77% Initializing precompilation...
80% Sending batch containing 4 file(s) totaling 5KB.
90% Deploying new version.
95% Closing update: new version is ready to start serving.
98% Uploading index definitions.

Update for module default completed successfully.
Success.
Cleaning up temporary files for module default

Now let me open my browser and go to App Engine Console at https://appengine.google.com/.

image

Something I learned along the way – at this time you cannot run Java 8 on GAE. My IT department is a stickler for having the latest Java installed on my work laptop, therefore I am running the Java 8 SDK and JRE. However, when I first built this application in Java 8 and deployed it to App Engine, trying to run it I got an HTTP 500 result. Looking at the logs in the dashboard I could see the following error:

2015-05-29 16:49:22.442  
Uncaught exception from servlet
java.lang.UnsupportedClassVersionError: com/example/myproject/HelloAppEngineServletServlet : Unsupported major.minor version 52.0
	at com.google.appengine.runtime.Request.process-ee32bdc226f79da9(Request.java)
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:817)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:375)
	at org.mortbay.util.Loader.loadClass(Loader.java:91)
	at org.mortbay.util.Loader.loadClass(Loader.java:71)
	at org.mortbay.jetty.servlet.Holder.doStart(Holder.java:73)
	at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:242)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
	at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:685)
	at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:437)
	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:444)
	at com.google.tracing.CurrentContext.runInContext(CurrentContext.java:230)
	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:308)
	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:300)
	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:441)
	at java.lang.Thread.run(Thread.java:745)

Did a bit of digging and here is what I found out. I believe these are the current class file version numbers:

J2SE 7 = 51   // Note this one
J2SE 6.0 = 50
J2SE 5.0 = 49
JDK 1.4 = 48
JDK 1.3 = 47
JDK 1.2 = 46
JDK 1.1 = 45

51.0 appears to be Java 7, which would mean in my case I am using Java 8, as it’s seeing 52. So the issue here is that I compiled on a higher-version JDK (52 = Java 8) and then executed on a lower JRE version (GAE uses Java 7). Repointing PATH to Java 7 did the trick; moreover, as was suggested elsewhere on the web, I just created a batch file for this purpose:

SET JAVA_HOME="C:\Program Files\Java\jdk1.7.0_55"
SET PATH="%JAVA_HOME%\bin"
START eclipse.exe
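
If you want to check which class-file version a compiled class actually targets, rather than waiting for an UnsupportedClassVersionError, a small sketch like this reads the version straight out of the class-file header (the layout – magic, then minor, then major – is fixed by the JVM specification):

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ClassVersionCheck {
    public static void main(String[] args) throws IOException {
        // Pass the path to a compiled .class file as the first argument
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            int magic = in.readInt();           // should be 0xCAFEBABE
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort(); // 51 = Java 7, 52 = Java 8
            if (magic != 0xCAFEBABE) {
                System.err.println("Not a class file: " + args[0]);
                return;
            }
            System.out.println(args[0] + " -> major.minor = " + major + "." + minor);
        }
    }
}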

For more information see – https://cloud.google.com/appengine/docs/java/tools/eclipse, https://developers.google.com/eclipse/docs/appengine_deploy, https://cloud.google.com/appengine/docs/java/tools/maven, https://cloud.google.com/appengine/docs/java/webtoolsplatform

Forecast Cloudy -Developing for Google App Engine in NetBeans IDE

appengine

Although I am fairly Microsoft-centric and Azure-centric in my interests, I have decided to give a quick try to a competing alternative, using a language that I am familiar with – Java – and an IDE that I know fairly well for it – NetBeans. I wanted to try something quick and easy, but my first task was making sure that NetBeans can actually deploy code to App Engine.

Good news: there is a plug-in for Google App Engine available from this Kenai project – https://kenai.com/projects/nbappengine/pages/Home.

First things first. Navigate to developers.google.com and sign up to enable your Google Account for use with Google App Engine. Create a project, and make sure you write down the project ID and name. Download and unzip the Google App Engine SDK to a folder on your hard drive. Obviously, here I am using the Java SDK.

Now, I will assume that you have NetBeans already installed. If not it can be installed from here – https://netbeans.org/ . It is my favorite IDE when it comes to non-Microsoft or Java IDEs at this time.

Next, you will need to install the plug-in. Go to https://kenai.com/projects/nbappengine/pages/Home and follow these instructions:

  • In NetBeans click Tools –> Plugins
  • Go To Settings Tab
  • Click Add Button
  • Type “App Engine” (without the quotes) into the Name field
  • If using NetBeans 6.9 and above paste http://kenai.com/downloads/nbappengine/NetBeans69/updates.xml into the URL field
  • Click the OK button
  • Click on Available Plugins
  • Select all Google App Engine plugins
  • If you’re using NetBeans 6.9 you must also install the Java Web Applications plugin for NetBeans.

After you are done you should see plugins in your installed tab like this:

image

To install the Google App Engine service in NetBeans, follow these instructions:

  • Start NetBeans
  • Click on the Services tab next to Projects and Files
  • Right-click on Servers and click Add
  • Select Google App Engine and Click Next
  • Select the location you unzipped the Google App Engine SDK and Click Next
  • Unless you have another service running on port 8080 and port 8765 leave the default port values
  • Finally Click Finish

After you are done you should see Google App Engine in your Servers:

image

Next I created a really easy, small JSP file that I want to deploy. First, let’s take a look at the appengine-web.xml settings file. As per the Google docs: “An App Engine Java app must have a file named appengine-web.xml in its WAR, in the directory WEB-INF/. This is an XML file whose root element is <appengine-web-app>.” My config looks like this:

image

image

Once you are ready to deploy, just go to the project, right-click, and pick the Deploy to Google App Engine option. It will prompt you for a user name and password. That’s where I ran into the first issue, with an error that my password and email don’t match, detailed here – https://groups.google.com/forum/#!topic/opendatakit/5hpDkvrd_WQ. After working through this, and another issue noted here with JSPs and App Engine deploy – https://code.google.com/p/googleappengine/issues/detail?id=1226#makechanges – I finally deployed and ran my little application, generating traffic I can see on the console:

image

Frankly, the experience for me wasn’t as seamless as I would like (perhaps I am spoiled a bit by the Visual Studio Azure SDK integration for Azure PaaS). I may have had an easier time with the official Google plugin and Eclipse, but as stated I like NetBeans more.

For more see – http://rocky.developerblogs.com/tutorials/getting-started-google-app-engine-netbeans/, https://techtavern.wordpress.com/2014/05/13/google-app-engine-on-netbeans-8/, https://cloud.google.com/appengine/docs/java/gettingstarted/introduction

Well, this was different. Now going “back to regularly scheduled programming”.