My Great Guardian – Watching Redis With Sentinel

 


 

Redis Sentinel provides high availability for Redis. If you have ever run SQL Server mirroring or Oracle GoldenGate, the concept should be somewhat familiar to you. To start, you need Redis replication configured with a master and N slaves. From there, you have Sentinel daemons running, be it on your application servers or on the servers Redis is running on. These keep track of the master's health.

How does failover work? Sentinel actually performs failover by rewriting the configuration (conf) files of the running Redis instances. I already mentioned the SLAVEOF command before – https://gennadny.wordpress.com/2015/01/06/meet-redis-masters-slaves-and-scaling-out/ – and failover is achieved by rewriting this directive.

Say we have a master “A” replicating to slaves “B” and “C”. We have three Sentinels (s1, s2, s3) running on our application servers, which write to Redis. At this point “A”, our current master, goes offline. Our Sentinels all see “A” as offline and send SDOWN messages to each other. Then they all agree that “A” is down, so “A” is set to ODOWN status. From here, an election happens to see which slave is most up to date, and in this case “B” is chosen as the new master.

The config file for “B” is set so that it is no longer the slave of anyone. Meanwhile, the config file for “C” is rewritten so that it is no longer the slave of “A” but rather “B.” From here, everything continues on as normal. Should “A” come back online, the Sentinels will recognize this, and rewrite the configuration file for “A” to be the slave of “B,” since “B” is the current master.

The current version of Sentinel is called Sentinel 2. It is a rewrite of the initial Sentinel implementation using stronger and simpler-to-predict algorithms (explained in the official documentation).

A stable release of Redis Sentinel is shipped since Redis 2.8. Redis Sentinel version 1, shipped with Redis 2.6, is deprecated and should not be used.

When configuring Sentinel you need to take time and decide where you want to run the Sentinel processes. Many folks recommend running them on your application servers. Presumably, if you're setting this up, you're concerned about write availability to your master; as such, Sentinels provide insight into whether or not your application servers can talk to the master. However, a lot of folks decide to run Sentinel processes on their Redis instance servers, and that makes sense as well.

If you are using the redis-sentinel executable (or if you have a symbolic link with that name to the redis-server executable) you can run Sentinel with the following command line:

redis-sentinel /path/to/sentinel.conf

Otherwise you can use directly the redis-server executable starting it in Sentinel mode:

redis-server /path/to/sentinel.conf --sentinel

You have to use a configuration file when running Sentinel (sentinel.conf), which is separate from the Redis configuration file (redis.conf). This file is used by the system to save the current state, which is reloaded in case of restart. Sentinel will simply refuse to start if no configuration file is given or if the configuration file path is not writable.

By default, Sentinel listens on TCP port 26379, so for Sentinels to work, port 26379 of your servers must be open to receive connections from the IP addresses of the other Sentinel instances. Otherwise the Sentinels can't talk and can't agree about what to do, so failover will never be performed.
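
Once your Sentinels are up, a quick way to confirm one can see the master is to query it directly with redis-cli. A sketch, assuming the default Sentinel port and a master named master as in the sentinel.conf shown later (output is illustrative):

redis-cli -p 26379 sentinel get-master-addr-by-name master
1) "10.0.0.1"
2) "6379"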


Some important items to remember about Sentinel:

1. You need at least three Sentinel instances for a robust deployment.

2. As per the Redis docs, the three Sentinel instances should be placed on computers or virtual machines that are believed to fail in an independent way – for example, different physical servers, or virtual machines in different availability zones or application fault domains.

3. Sentinel + Redis distributed system does not guarantee that acknowledged writes are retained during failures, since Redis uses asynchronous replication. However there are ways to deploy Sentinel that make the window to lose writes limited to certain moments, while there are other less secure ways to deploy it.

4. You need Sentinel support in your clients. Popular client libraries have Sentinel support, but not all (see the Jedis sketch after this list).

5. Test your setup so you know it works. Otherwise you cannot be sure it will actually fail over when it matters.
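
To make item 4 concrete, here is a minimal sketch using the Jedis client for Java; the Sentinel addresses are illustrative, and the master name must match your sentinel.conf:

import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelClientDemo {
    public static void main(String[] args) {
        // Addresses of the Sentinel daemons, not of the Redis instances themselves
        Set<String> sentinels = new HashSet<>();
        sentinels.add("10.0.0.1:26379");
        sentinels.add("10.0.0.2:26379");
        sentinels.add("10.0.0.3:26379");

        // "master" must match the name in sentinel.conf ("sentinel monitor master ...")
        try (JedisSentinelPool pool = new JedisSentinelPool("master", sentinels);
             Jedis jedis = pool.getResource()) {
            // The pool asks the Sentinels who the current master is
            // and reconnects transparently after a failover
            jedis.set("greeting", "hello");
            System.out.println(jedis.get("greeting"));
        }
    }
}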

Basically, the initial setup expects all nodes running as masters with replication on, with slaveof <ip> <port> set manually via redis-cli on the future Redis slaves. Then you run Sentinel and it does the rest.

A minimal redis.conf configuration file looks like this:

daemonize yes
pidfile /usr/local/var/run/redis-master.pid
port 6379
bind 10.0.0.1
timeout 0
loglevel notice
logfile /opt/redis/redis.log
databases 1
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename master.rdb
 
dir /usr/local/var/db/redis/
slave-serve-stale-data yes
slave-read-only no
slave-priority 100
maxclients 2048
maxmemory 256mb
 
# act as binary log with transactions
appendonly yes
 
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
 
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

A minimal sentinel.conf configuration file looks like this:

port 17700
daemonize yes
logfile "/opt/redis/sentinel.log"
 
sentinel monitor master 10.0.0.1 6379 2
sentinel down-after-milliseconds master 4000
sentinel failover-timeout master 180000
sentinel parallel-syncs master 4

Start all of your Redis nodes with the Redis config and choose a master. Then run the Redis console and set all other nodes as slaves of the given master, using the command slaveof <ip address> 6379.

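For example, with the IP addresses used elsewhere in this post (a sketch; adjust hosts and ports to your topology):

redis-cli -h 10.0.0.2 -p 6379 slaveof 10.0.0.1 6379
redis-cli -h 10.0.0.3 -p 6379 slaveof 10.0.0.1 6379
redis-cli -h 10.0.0.4 -p 6379 slaveof 10.0.0.1 6379
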
Once the slaves are set, you can connect to your master and verify that all of your slave nodes are connected and syncing – run the info command in your master Redis console. The output should show you something like this:

role:master
connected_slaves:3
slave0:ip=10.0.0.2,port=6379,state=online,offset=17367254333,lag=1
slave1:ip=10.0.0.3,port=6379,state=online,offset=17367242971,lag=1
slave2:ip=10.0.0.4,port=6379,state=online,offset=17367222381,lag=1

To test if your Sentinel works, just shut down your Redis master and watch the Sentinel log. You should see something like this:

[17240] 04 Dec 07:56:16.289 # +sdown master master 10.0.0.1 6379
[17240] 04 Dec 07:56:16.551 # +new-epoch 1386165365
[17240] 04 Dec 07:56:16.551 # +vote-for-leader 185301a20bdfdf1d5316f95bae0fe1eb544edc58 1386165365
[17240] 04 Dec 07:56:17.442 # +odown master master 10.0.0.1 6379 #quorum 4/2
[17240] 04 Dec 07:56:18.489 # +switch-master master 10.0.0.1 6379 10.0.0.2 6379
[17240] 04 Dec 07:56:18.489 * +slave slave 10.0.0.3:6379 10.0.0.3 6379 @ master 10.0.0.2 6379
[17240] 04 Dec 07:56:18.490 * +slave slave 10.0.0.4:6379 10.0.0.4 6379 @ master 10.0.0.2 6379
[17240] 04 Dec 07:56:28.680 * +convert-to-slave slave 10.0.0.1:6379 10.0.0.1 6379 @ master 10.0.0.2 6379

It is also important to note that the latest builds of MSOpenTech Redis for Windows have implemented Sentinel as well. As per http://grokbase.com/t/gg/redis-db/147ezmad89/installing-redis-sentinel-as-windows-service, you could use the following command line to install a Sentinel instance as a service:

redis-server --service-install --service-name Sentinel1 sentinel.1.conf --sentinel

In this case the arguments passed to the service instance will be “sentinel.1.conf --sentinel”.

Make sure of the following:

1. The configuration file must be the last parameter of the command line. If another parameter was last, such as --service-name, it would run fine when invoked from the command line but would consistently fail when started as a service.

2. Since the service runs as Network Service by default, ensure that it has access to the directory where the log file will be written.

For more on Sentinel see the official Redis docs – http://redis.io/topics/sentinel – as well as https://discuss.pivotal.io/hc/en-us/articles/205309388-How-to-setup-HAProxy-and-Redis-Sentinel-for-automatic-failover-between-Redis-Master-and-Slave-servers, http://opentodo.net/2014/05/setup-redis-failover-with-redis-sentinel/, http://tech.3scale.net/2014/06/18/redis-sentinel-failover-no-downtime-the-hard-way/, https://seanmcgary.com/posts/how-to-build-a-fault-tolerant-redis-cluster-with-sentinel

Meet Memcached in the Clouds – Setting Up Memcached as a Service via Amazon ElastiCache


In my previous post I introduced you to Memcached, an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. This continues my interest in in-memory NoSQL cache systems like AppFabric Cache and Redis.

Both Redis and Memcached are offered by AWS as a cloud PaaS service called ElastiCache. With ElastiCache, you can quickly deploy your cache environment without having to provision hardware or install software. You can choose from Memcached or Redis protocol-compliant cache engine software, and let ElastiCache perform software upgrades and patch management for you automatically. For enhanced security, ElastiCache runs in the Amazon Virtual Private Cloud (Amazon VPC) environment, giving you complete control over network access to your cache cluster. With just a few clicks in the AWS Management Console, you can add resources to your ElastiCache environment, such as additional nodes or read replicas, to meet your business needs and application requirements.

Existing applications that use Memcached or Redis can use ElastiCache with almost no modification; your applications simply need to know the host names and port numbers of the ElastiCache nodes that you have deployed. The ElastiCache Auto Discovery feature lets your applications identify all of the nodes in a cache cluster and connect to them, rather than having to maintain a list of available host names and port numbers; in this way, your applications are effectively insulated from changes to cache node membership.

Before I show you how to set up an AWS ElastiCache cluster, let's go through the basics:

Data Model

The Amazon ElastiCache data model concepts include cache nodes, cache clusters, security configuration, and replication groups. The ElastiCache data model also includes resources for event notification and performance monitoring; these resources complement the core concepts.

Cache Nodes and Cluster

A cache node is the smallest building block of an ElastiCache deployment. Each node has its own memory, storage and processor resources, and runs a dedicated instance of cache engine software – either Memcached or Redis. ElastiCache provides a number of different cache node configurations for you to choose from, depending on your needs. You can use these cache nodes on an on-demand basis, or take advantage of reserved cache nodes at significant cost savings.

A cache cluster is a collection of one or more cache nodes, each of which runs its own instance of supported cache engine software. You can launch a new cache cluster with a single ElastiCache operation (CreateCacheCluster), specifying the number of cache nodes you want and the runtime parameters for the cache engine software on all of the nodes. Each node in a cache cluster has the same compute, storage and memory specifications, and they all run the same cache engine software (Memcached or Redis). The ElastiCache API lets you control cluster-wide attributes, such as the number of cache nodes, security settings, version upgrades, and system maintenance windows.

Cache parameter groups are an easy way to manage runtime settings for supported cache engine software. Memcached has many parameters to control memory usage, cache eviction policies, item sizes, and more; a cache parameter group is a named collection of Memcached-specific parameters that you can apply to a cache cluster. Memcached clusters contain from 1 to 20 nodes, across which you can horizontally partition your data.


To create a cluster via the ElastiCache console, follow these steps:

1. Open the Amazon ElastiCache console at https://console.aws.amazon.com/elasticache/.

2. Pick Memcached from the Dashboard on the left.

3. Choose Create

4. Complete Settings Section

 


As you enter the settings, please note the following:

1. In Name, enter the desired cluster name. Remember, it must begin with a letter and can contain 1 to 20 alphanumeric characters; it cannot have two consecutive hyphens nor end with a hyphen.

2. In Port you can accept the default of 11211. If you have a reason to use a different port, type the port number.

3. For Parameter group, choose the default parameter group, choose the parameter group you want to use with this cluster, or choose Create new to create a new parameter group to use with this cluster.

4. For Number of nodes, choose the number of nodes you want for this cluster. You will partition your data across the cluster's nodes. If you need to change the number of nodes later, scaling horizontally is quite easy with Memcached.

5. Choose how you want the Availability zone(s) selected for this cluster. You have two options:

  1. No preference – ElastiCache selects the availability zone for each node in your cluster.
  2. Specify availability zones – specify the availability zone for each node in your cluster.

6. For Security groups, choose the security groups you want to apply to this cluster.

7. The Maintenance window is the time, generally an hour in length, each week when ElastiCache schedules system maintenance for your cluster. You can let ElastiCache choose the day and time for your maintenance window (No preference), or you can choose the day, time, and duration yourself.

8. Now check all of the settings and pick Create
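
These console steps map to the single CreateCacheCluster call mentioned earlier. If you prefer scripting the setup, a roughly equivalent AWS CLI sketch looks like this (the cluster id and node type are illustrative):

aws elasticache create-cache-cluster \
    --cache-cluster-id my-memcached-cluster \
    --engine memcached \
    --cache-node-type cache.t2.micro \
    --num-cache-nodes 3 \
    --port 11211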


For more information on Memcached-specific parameters you can set on your ElastiCache cluster, see http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ParameterGroups.Memcached.html

For clusters running the Memcached engine, ElastiCache supports Auto Discovery – the ability for client programs to automatically identify all of the nodes in a cache cluster, and to initiate and maintain connections to all of these nodes. From the application's point of view, connecting to the cluster configuration endpoint is no different from connecting directly to an individual cache node.

Process of Connecting to Cache Nodes

1. The application resolves the configuration endpoint’s DNS name. Because the configuration endpoint maintains CNAME entries for all of the cache nodes, the DNS name resolves to one of the nodes; the client can then connect to that node.

2. The client requests the configuration information for all of the other nodes. Since each node maintains configuration information for all of the nodes in the cluster, any node can pass configuration information to the client upon request.

3. The client receives the current list of cache node hostnames and IP addresses. It can then connect to all of the other nodes in the cluster.

The configuration information for Auto Discovery is stored redundantly in each cache cluster node. Client applications can query any cache node and obtain the configuration information for all of the nodes in the cluster.
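
In Java, AWS publishes a fork of the spymemcached client (the ElastiCache Cluster Client) that performs this Auto Discovery behind the standard MemcachedClient API. A minimal sketch, assuming a hypothetical configuration endpoint name:

import java.io.IOException;
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class AutoDiscoveryDemo {
    public static void main(String[] args) throws IOException {
        // Connect to the cluster's configuration endpoint (hypothetical name);
        // the cluster client discovers and connects to every node from here
        MemcachedClient client = new MemcachedClient(
                new InetSocketAddress("mycluster.xxxxxx.cfg.use1.cache.amazonaws.com", 11211));

        client.set("greeting", 900, "hello");        // key, TTL in seconds, value
        System.out.println(client.get("greeting"));  // reads back "hello"

        client.shutdown();
    }
}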

For more information see – http://cloudacademy.com/blog/amazon-elasticache/, http://www.allthingsdistributed.com/2011/08/amazon-elasticache.html, https://www.sitepoint.com/amazon-elasticache-cache-on-steroids/

Hope this helps.

Let Me Count The Ways – Various methods of generating stack dump for JVM in production

As I wrote previously, thread dumps in Java are essential in diagnosing production issues with high CPU, locking, thread deadlocks, etc. There are great online thread dump analysis tools such as http://fastthread.io/ that can analyze and spot problems. But you need to provide proper thread dumps as input to those tools. I have already blogged about many tools to do so in the past, like jstack, Java VisualVM and Java Mission Control. Here I will try to summarize all of the ways to capture usable thread dumps in a production Java application:

  • JStack

JStack remains one of the most common ways to capture thread dumps. It's a command line utility bundled in the JDK, shipped in the JDK_HOME\bin folder. Here is the command that you need to issue to capture a thread dump:

jstack -l <pid> > <file-path>

Where

pid: the process ID of the application whose thread dump should be captured

file-path: the file path where the thread dump will be written

Example here:

jstack -l 37321 > /opt/tmp/threadDump.txt

As per the example, the thread dump of process 37321 would be generated in the /opt/tmp/threadDump.txt file.

    • kill -3

 

Many customers install only JREs on production machines. Since jstack and other such tools are only part of the JDK, you wouldn't be able to use jstack there. In such circumstances, the 'kill -3' option can be used.

kill -3 <pid>

Where:

pid: the process ID of the application whose thread dump should be captured

Example:

kill -3 37321

When the 'kill -3' option is used, the thread dump is sent to the standard error stream. For example, in apps running under Tomcat it will end up in the <TOMCAT_HOME>/logs/catalina.out file.

    • Java VisualVM

Java VisualVM is a graphical user interface tool that provides detailed information about applications while they are running on a specified Java Virtual Machine (JVM). It's located in JDK_HOME\bin\jvisualvm.exe and has been part of the Sun/Oracle JDK distribution since JDK 6 update 7. Launch jvisualvm. On the left panel, you will notice all the Java applications that are running on your machine; select your application from the list. This tool also has the capability to capture thread dumps from Java processes running on a remote host. In order to generate a thread dump, go to the Threads tab and click the Thread Dump button.

    •   Java Mission Control

 

Java Mission Control (JMC) is a tool that collects and analyzes data from Java applications running locally or deployed in production environments. This tool has been packaged into the JDK since Oracle JDK 7 Update 40, and is present in JDK_HOME\bin\jmc.exe. It also provides an option to take thread dumps from the JVM. Once you launch the tool, you will see all the Java processes that are running on your local host. As you use the Flight Recorder feature on one of these processes, in the “Thread Dump” field you can select the interval at which you want to capture thread dumps.

    • ThreadMXBean

 

Introduced in JDK 1.5, ThreadMXBean is the management interface for the thread system in the JVM; it allows you to create a thread dump in a few lines of application code, like below:

Example

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public void dumpThreadDump() {
    ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
    // Dump all threads, including locked monitors and ownable synchronizers
    for (ThreadInfo ti : threadMxBean.dumpAllThreads(true, true)) {
        System.out.print(ti.toString());
    }
}

 

  • JCMD

The jcmd tool was introduced with Oracle's Java 7. It's useful in troubleshooting issues with JVM applications. It has various capabilities such as identifying Java process IDs, acquiring heap dumps, acquiring thread dumps, acquiring garbage collection statistics, and more.

Using the below JCMD command you can generate thread dump:

jcmd <pid> Thread.print > <file-path>

where

pid: the process ID of the application whose thread dump should be captured

file-path: the file path where the thread dump will be written

Example

jcmd 37321 Thread.print > /opt/tmp/threadDump.txt

For more see – https://docs.oracle.com/javase/7/docs/technotes/tools/windows/jcmd.html, https://blogs.oracle.com/jmxetc/entry/threadmxbean_a_singleton_mxbean_for , http://www.javamex.com/tutorials/profiling/profiling_java5_threads_howto.shtml , http://blog.takipi.com/oracle-java-mission-control-the-ultimate-guide/ , https://www.prosysopc.com/blog/using-java-mission-control-for-performance-monitoring/

Semper Secura–Introduction To SQL Server Always Encrypted

With the introduction of SQL Server 2016 we now have a new way to encrypt columns called Always Encrypted. What makes it different from older encryption features in SQL Server, like column-level encryption and Transparent Data Encryption, is that it doesn't just encrypt data at rest or while being transmitted on the wire; it is the only feature that ensures the database never sees the unencrypted values of sensitive columns, as data gets encrypted on the client tier via the SQL client/database driver. While the database only sees encrypted values, the application code works exclusively with unencrypted data. When a query is executed, the driver automatically looks up the master key in the Windows Certificate Store (or another OS-dependent location). The master key is then used to decrypt a column-specific key, which in turn is used for encrypting and decrypting fields and parameters.


Microsoft advertises the following use cases for the Always Encrypted feature:

· Client and Data On-Premises

A customer has a client application and SQL Server both running on-premises, at their business location. The customer wants to hire an external vendor to administer SQL Server. In order to protect sensitive data stored in SQL Server, the customer uses Always Encrypted to ensure the separation of duties between database administrators and application administrators. The customer stores plaintext values of Always Encrypted keys in a trusted key store which the client application can access. SQL Server administrators have no access to the keys and, therefore, are unable to decrypt sensitive data stored in SQL Server.

· Client On-Premises with Data in Azure

A customer has an on-premises client application at their business location. The application operates on sensitive data stored in a database hosted in Azure (for example in SQL Server running in a virtual machine on Microsoft Azure). The customer uses Always Encrypted and stores Always Encrypted keys in a trusted key store hosted on-premises, to ensure Microsoft cloud administrators have no access to sensitive data.

· Client and Data in Azure

A customer has a client application, hosted in Microsoft Azure (e.g. in a worker role or a web role), which operates on sensitive data also stored in Microsoft Azure. The customer uses Always Encrypted to reduce the security attack surface area (the data is always encrypted in the database and on the machine hosting the database).


 

Types of Always Encrypted encryption:

SQL Server offers two encryption modes: deterministic and randomized.

· Deterministic encryption ensures that a given value always has the same encrypted representation. This allows you to use the column for equality comparisons, joins, and grouping.

· For more security, you can use random encryption. This prevents guessing by ensuring that a given value’s encrypted representation is never the same twice.

You should use deterministic encryption for columns that will be used as search or grouping parameters, for example a government ID number. Use randomized encryption for data such as confidential investigation comments, which are not grouped with other records or used to join tables.

Limitations

If a column is encrypted, then all range-like operations such as greater/less than, pattern matching using LIKE, etc. are disallowed. Furthermore, you can’t pass encrypted values to functions, user-defined or otherwise, because the database doesn’t have access to the unencrypted values.

Equality comparisons can only be performed on columns that use deterministic encryption.

Indexes can only be applied to deterministically encrypted columns.

If joining between two columns, both columns need to use the same column encryption key.

Constant expressions that refer to encrypted columns are not allowed. For example, you cannot write WHERE SSN = ‘111-11-1111’, but you can write WHERE SSN = @SSN. This is necessary because the driver works with the SqlParameter class to handle encryption requirements.

Unsupported data types include: xml, rowversion, image, ntext, text, sql_variant, hierarchyid, geography, geometry, and user-defined types.

Current drivers that support this feature are listed below:


 

Here is a quick tutorial on how to set up Always Encrypted

· Open your SQL Server Management Studio and connect to your SQL Server 2016 instance or SQL Azure. For on-premises SQL Server, you can use the Always Encrypted feature in Enterprise or Developer editions.


· The first thing we will create is the Column Master Key. You can do so either in script or via a nice wizard in SSMS. Below is my script to do so, but of course based on your certificate, machine, etc. your script will be different.

CREATE DATABASE AEDemo;

GO

USE [AEDemo]
/****** Object:  ColumnMasterKey [test]    Script Date: 9/15/2016 6:44:15 PM ******/
CREATE COLUMN MASTER KEY [test]
WITH
(
	KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',
	KEY_PATH = N'LocalMachine/My/A9B9AFE1993FA0FAD052F49195FDF10DE5A953DB'
)
GO

· The next thing is to create the Column Encryption Key. Again, your actual script will vary from mine below, or you can use the nice SSMS wizard to do so.

CREATE COLUMN ENCRYPTION KEY [test]
WITH VALUES
(
	COLUMN_MASTER_KEY = [test],
	ALGORITHM = 'RSA_OAEP',
	ENCRYPTED_VALUE = 0x01700000016C006F00630061006C006D0061006300680069006E0065002F006D0079002F0061003900620039006100660065003100390039003300660061003000660061006400300035003200660034003900310039003500660064006600310030006400650035006100390035003300640062003B237BB1F1C8A7ADBB0BEA972AB6FC321B7BAB43C6EEEF20CC4BBAC52BE1582868C7E9BF92B01D7022AF9838D625ED5F1CE2D35975BFF600F96A5A5A07EDFA946DBDE08DE896A632B06BAFC5A861DC8298E36E408BE5EBAAEB94C674C04765D0138F30B6B0E676294B5D3FF5374434273B823550D3A89D6337BEEE1D93FBDAB50025ED7A91BE2F2C3C032A636D0049F5614EC248097BD0B12F79FF3A51DC242987D931A2473EAB98BCB7217CFF61B447E435F5FE19BB5DCAF1B7C8D8FA606FE5354EB9773A78C1F1EEFCC4D3D401CB3C602EBB3C197A2421637304D94BC58F129F20A5685A700C6BEDF8D5080F959B57B3F65721DDEF78BBDA035923D3A93D3380E52465663A72916CE4DCF0D904C27DB8298D6F44AACC95998EA7CC1F895BBD55553523A5558778252A5E4AE2DED30D6DE04DB24ACE18771D0B8C27F06A228F67F950C10E0DA035D12934DA5DB5D65E947789EBA86A5375C71DC386FB854FCE702D642CB5ED695B96E864C3F9CC34E52D87178B2DF75B6AAE3A996B2F6BC9EB99445F6311142E8C3F85A545EE0614AFCAB02A547664B31E546FFAA7667D86D7DF239CB4EAB66B847EBD78BDD01707C7C7DE04EE8F52F5F7D722A71BC8E5015F9EFB2A0DBDEB9B732C163D8C32F9DA00AABCDBE68067715CB1C56A385B0EA908A51EC51F8290D701689E112C646A10462636766066F703D82844CA7237F721EF
)
GO

· Now that the keys are created, let's create a table that uses these keys to encrypt columns:

CREATE TABLE dbo.EncryptedTable
(
  ID INT IDENTITY(1,1) PRIMARY KEY,
  
  LastName NVARCHAR(32) COLLATE Latin1_General_BIN2 
    ENCRYPTED WITH 
    (
       ENCRYPTION_TYPE = DETERMINISTIC, 
       ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256', 
       COLUMN_ENCRYPTION_KEY = [test]
    ) NOT NULL,
    
  Salary INT 
    ENCRYPTED WITH 
    (
       ENCRYPTION_TYPE = RANDOMIZED, 
       ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256', 
       COLUMN_ENCRYPTION_KEY = [test]
    ) NOT NULL
);
GO

So essentially we followed this workflow:


Now we can try to insert some data into the table:

 

DECLARE @LastName NVARCHAR(32) = N'Gennady', @Salary INT = 720000;
INSERT dbo.EncryptedTable(LastName,Salary) SELECT @LastName, @Salary;
go

This will actually error out, but why? Since encryption and decryption occur at the driver level, client-side code with parameters is the way you populate these encrypted columns. You don't have to make any changes to the data access layer of your application except the addition of the following to your connection string:

Column Encryption Setting=Enabled

Together with rolling out your Column Master Key to the client, this should make it easy for your application to work with Always Encrypted.
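
The same pattern holds outside of .NET. As a sketch, the Microsoft JDBC driver (from version 6.0 on) accepts an equivalent connection-string property, columnEncryptionSetting=Enabled, and parameterized statements are then encrypted transparently; the server name and credentials below are illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AlwaysEncryptedDemo {
    public static void main(String[] args) throws Exception {
        // JDBC equivalent of "Column Encryption Setting=Enabled"
        String url = "jdbc:sqlserver://myserver:1433;databaseName=AEDemo;"
                   + "user=appUser;password=...;columnEncryptionSetting=Enabled";

        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO dbo.EncryptedTable (LastName, Salary) VALUES (?, ?)")) {
            // The driver encrypts both parameter values before they leave the client
            ps.setString(1, "Gennady");
            ps.setInt(2, 720000);
            ps.executeUpdate();
        }
    }
}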


Great details on Always Encrypted can be found here –

BOL – https://msdn.microsoft.com/en-us/library/mt163865.aspx

Channel 9 – https://channel9.msdn.com/Shows/Data-Exposed/Getting-Started-with-Always-Encrypted-with-SSMS?ocid=relatedentry

As with every security and encryption feature, there is a performance overhead to enabling Always Encrypted; Aaron Bertrand has attempted some benchmarking that is shown here – http://sqlperformance.com/2015/08/sql-server-2016/always-encrypted-performance-follow-up

Meet Memcached– Introduction To Another In-Memory Caching System


After writing in this blog on Microsoft AppFabric Cache and Redis, I will be following up with Memcached. Similarly to the above-mentioned systems, Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. Just like them, it is intended for use in speeding up dynamic web applications by alleviating database load.

Memcached was originally developed by Danga Interactive for LiveJournal, but is now used by many other systems, including MocoSpace, YouTube, Reddit, Facebook, Tumblr and Wikipedia. Engine Yard and Jelastic use Memcached as part of their platform-as-a-service technology stacks, and Heroku offers several Memcached services as part of its platform as a service. Google App Engine, AppScale, Microsoft Azure and Amazon Web Services also offer a Memcached service through an API.

So Why Memcached?

  • Free & Open source
  • High performance
  • Simple to set up
  • Ease of development
  • APIs are available for most popular languages

Important to note what Memcached is NOT:

  • a persistent data store
  • a database
  • application-specific
  • a large object cache
  • fault-tolerant or
  • highly available

 

Memcached's primary storage algorithm is a hash table.

The hash table is basically a power-of-2-sized array of pointers to entries. Collisions in the hash table are resolved via separate chaining with linked chains of entries: each entry consists of a pointer to the key, a pointer to the value and a pointer to the next chained entry. The hash table size is chosen simply as the ceiling power of 2 closest to the doubled number of entries the hash table needs to contain. This means the effective load factor varies from 0.5 to 1.0, with an average of 0.75. Incidentally, the same load factor management strategy is implemented in glibc's unordered_map.


Memcached supports multithreaded access to the store. It controls access to the resources via bare POSIX thread mutexes. Operations with hash table buckets are guarded by one of the pthread_mutex objects in a power-of-two-sized array. The size of this array cannot be larger than the hash table size. The index of the mutex for a bucket is determined as bucket index % bucket mutex array size, i.e. each mutex is responsible for hash table size / bucket mutex array size buckets.
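
This bucket-to-mutex mapping is plain lock striping. A toy Java equivalent of the scheme just described (sizes are illustrative; both are powers of two, so the modulo reduces to a bit mask):

import java.util.concurrent.locks.ReentrantLock;

public class StripedLockDemo {
    // Power-of-two sizes, mirroring memcached's hash table and mutex array
    static final int BUCKET_COUNT = 1 << 16;  // hash table buckets
    static final int MUTEX_COUNT  = 1 << 12;  // lock stripes (not larger than BUCKET_COUNT)

    static final ReentrantLock[] locks = new ReentrantLock[MUTEX_COUNT];
    static {
        for (int i = 0; i < MUTEX_COUNT; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    static void withBucketLock(int bucketIndex, Runnable critical) {
        // bucketIndex % MUTEX_COUNT, computed as a mask since MUTEX_COUNT is a power of two;
        // each lock therefore guards BUCKET_COUNT / MUTEX_COUNT = 16 buckets
        ReentrantLock lock = locks[bucketIndex & (MUTEX_COUNT - 1)];
        lock.lock();
        try {
            critical.run();
        } finally {
            lock.unlock();
        }
    }
}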

Installing Memcached on Ubuntu is really easy.

To install Memcached on Ubuntu, go to the terminal and type the following commands:

 

$sudo apt-get update
$sudo apt-get install memcached

Once installed, Memcached should be running on the default port, 11211. To check whether Memcached is presently running, run the command given below:

$ps aux | grep memcached

Memcached is originally a Linux application, but since it is open source, it has been compiled for Windows. There are two major sources for the pre-built Windows binaries, Jellycan and NorthScale, and both versions can be used. The following are the download links for the Memcached Windows binaries:

http://code.jellycan.com/files/memcached-1.2.5-win32-bin.zip

http://code.jellycan.com/files/memcached-1.2.6-win32-bin.zip

http://downloads.northscale.com/memcached-win32-1.4.4-14.zip

http://downloads.northscale.com/memcached-win64-1.4.4-14.zip

http://downloads.northscale.com/memcached-1.4.5-x86.zip

http://downloads.northscale.com/memcached-1.4.5-amd64.zip

A full tutorial on installing Memcached on Windows is available here – https://commaster.net/content/installing-memcached-windows

To connect to a Memcached server, you can use the telnet command, providing the HOST name and PORT number.

Using basic telnet

$telnet HOST PORT

Here, HOST and PORT are the machine IP and port number, respectively, on which the Memcached server is running.

Here I will connect to the local Memcached running on the default port and execute basic set and get commands:

$telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
// now store some data and get it from memcached server
set test 0 900 9
memcached
STORED
get test
VALUE test 0 9
memcached
END

Popular basic CLI commands

Command   Description                   Example
get       Reads a value based on key    get mykey
set       Sets a key unconditionally    set mykey 0 60 5
                                        # 0 => no flags, 60 => TTL in seconds, 5 => size in bytes
add       Adds a key                    add newkey 0 60 5
replace   Replaces a key                replace mykey 0 60 5
delete    Deletes an existing key       delete mykey

 

See details on these and other commands here – http://blog.elijaa.org/2010/05/21/memcached-telnet-command-summary/
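
The same set/get pair from the telnet session above can also be done programmatically. A short sketch with the spymemcached Java client, one of the client APIs mentioned earlier (a local server on the default port is assumed):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedDemo {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("127.0.0.1", 11211));

        // Equivalent of: set test 0 900 9 / "memcached"
        client.set("test", 900, "memcached");   // key, TTL in seconds, value

        // Equivalent of: get test
        System.out.println(client.get("test")); // prints "memcached"

        client.shutdown();
    }
}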

In the near future I will go into more detail on Memcached, including available clients, configuration, Java development with clients, and the AWS ElastiCache implementation. For now, you can also explore more here – https://memcached.org/, https://smarttechie.org/2013/07/20/memcached-a-distributed-memory-object-caching-system/, http://code.tutsplus.com/tutorials/turbocharge-your-website-with-memcached–net-23939, and https://wincent.com/wiki/Testing_memcached_with_telnet

Hope this helps.

Ocultos Exitus–JDBC Driver Unicode Settings and SQL Server Performance

While troubleshooting JDBC client apps that connect to SQL Server, I have run into this issue a few times, the latest very recently.

As you may already know, SQL Server differentiates its data types that support Unicode from the ones that just support ASCII. For example, the character data types that support Unicode are nchar, nvarchar and longnvarchar, whereas their ASCII counterparts are char, varchar and longvarchar, respectively. By default, all of Microsoft's JDBC drivers send strings in Unicode format to SQL Server, irrespective of whether the data type of the corresponding column defined in SQL Server supports Unicode or not.

In the case where the data types of the columns support Unicode, everything is smooth. But in cases where the data types of the columns do not support Unicode, serious performance issues arise, especially during data fetches. SQL Server tries to implicitly convert the non-Unicode data types in the table to Unicode before doing the comparison. Moreover, if an index exists on the non-Unicode column, it will be ignored. This ultimately leads to a whole table scan during data fetch, thereby slowing down the search queries drastically.

This can be corrected by resetting one of the default parameters in the Java driver. The parameter name and value to be set may vary from driver to driver, depending on the vendor.

Vendor                 Parameter
JSQLConnect            asciiStringParameters
jTDS                   sendStringParametersAsUnicode
DataDirect Connect     sendStringParametersAsUnicode
Microsoft JDBC         sendStringParametersAsUnicode
WebLogic Type 4 JDBC   sendStringParametersAsUnicode
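
With the Microsoft JDBC driver, for example, the fix is a single connection-string property. A sketch (host, database and credentials are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;

public class AsciiParamsDemo {
    public static void main(String[] args) throws Exception {
        // With sendStringParametersAsUnicode=false, string parameters are sent as
        // varchar rather than nvarchar, so indexes on non-Unicode columns stay usable
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=AppDb;"
                   + "user=appUser;password=...;sendStringParametersAsUnicode=false";

        try (Connection conn = DriverManager.getConnection(url)) {
            // Parameterized queries against varchar columns can now seek instead of scan
        }
    }
}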

Reading – https://msdn.microsoft.com/en-us/library/ms378857(v=sql.110).aspx – you will see: “For optimal performance with CHAR, VARCHAR or LONGVARCHAR type of non-Unicode parameters, set the sendStringParametersAsUnicode connection string property to “false” and use the non-national character methods.” The issue was also reported via http://www.sqlconsulting.com/jdbc.shtml, http://www.jochenhebbrecht.be/site/2014-05-01/java/fixing-slow-queries-running-sql-server-using-jpa-hibernate-and-jtds and http://www.codeproject.com/Articles/281364/Solving-Performance-issues-in-data-migration-to-SQ

Hope this helps.

Taking Care of the Garbage – Generational is default GC policy on newer WebSphere AS

Imagine you have a legacy Java application running in IBM WebSphere that you have finally upgraded to a newer version. Yet the customer reports a serious performance regression. Why would that be? Well, one reason may be a change in default JVM behavior between WebSphere versions, something that one of my customers discovered the “hard way”.

Garbage collection (GC) is an integral part of the Java Virtual Machine (JVM) as it collects unused Java heap memory so that the application can continue allocating new objects. The effectiveness and performance of the GC play an important role in application performance and determinism. The IBM JVM provided with IBM WebSphere Application Server provides four different GC policy algorithms:

  • -Xgcpolicy:optthruput
  • -Xgcpolicy:optavgpause
  • -Xgcpolicy:gencon
  • -Xgcpolicy:balanced

Each of these algorithms provides different performance and deterministic qualities. In addition, the default policy in WebSphere Application Server V8 has changed from -Xgcpolicy:optthruput to -Xgcpolicy:gencon. So let's dive in a bit to what this really means.

The garbage collector

Different applications naturally have different memory usage patterns. A computationally intensive number-crunching workload will not use the Java heap in the same way as a highly transactional customer-facing interface. To optimally handle these different sorts of workloads, different garbage collection strategies are required. The IBM JVM supports several garbage collection policies to enable you to choose the strategy that best fits your application.

The parallel mark-sweep-compact collector: optthruput, formerly the default

The simplest possible garbage collection technique is to continue allocating until free memory has been exhausted, then stop the application and process the entire heap. While this results in a very efficient garbage collector, it means that the user program must be able to tolerate the pauses introduced by the collector. Workloads that are only concerned about overall throughput might benefit from this strategy.

The optthruput policy (-Xgcpolicy:optthruput) implements this strategy. This collector uses a parallel mark-sweep algorithm. In a nutshell, this means that the collector first walks through the set of reachable objects, marking them as live data. A second pass then sweeps away the unmarked objects, leaving behind free memory that can be used for new allocations. The majority of this work can be done in parallel, so the collector uses additional threads (up to the number of CPUs by default) to get the job done faster, reducing the time the application remains paused.


The problem with a mark-sweep algorithm is that it can lead to fragmentation. There might be lots of free memory, but if it is in small slices interspersed with live objects then no individual piece might be large enough to satisfy a particular allocation.

The solution to this is compaction. In theory, the compactor slides all the live objects together to one end of the heap, leaving a single contiguous block of free space. This is an expensive operation because every live object might be moved, and every pointer to a moved object must be updated to the new location. As a result, compaction is generally only done when it appears to be necessary. Compaction can also be done in parallel, but it results in a less efficient packing of the live objects — instead of a single block of free space, several smaller ones might be created.


The concurrent collector: optavgpause

For applications that are willing to trade some overall throughput for shorter pauses, a different policy is available. The optavgpause policy (-Xgcpolicy:optavgpause) attempts to do as much GC work as possible before stopping the application, leading to shorter pauses. The same mark-sweep-compact collector is used, but much of the mark and sweep phases can be done as the application runs. Based on the program's allocation rate, the system attempts to predict when the next garbage collection will be required. When this threshold approaches, a concurrent GC begins. As application threads allocate objects, they will occasionally be asked to do a small amount of GC work before their allocation is fulfilled. The more allocations a thread does, the more it will be asked to help out. Meanwhile, one or more background GC threads will use idle cycles to get additional work done. Once all the concurrent work is done, or if free memory is exhausted ahead of schedule, the application is halted and the collection is completed. This pause is generally short, unless a compaction is required. Because compaction requires moving and updating live objects, it cannot be done concurrently.


 

The generational collector: gencon

It has long been observed that the majority of objects created are only used for a short period of time. This is the result of both programming techniques and the type of application. Many common Java idioms create helper objects that are quickly discarded; for example StringBuffer/StringBuilder objects, or Iterator objects. These are allocated to accomplish a specific task, and are rarely needed afterwards. On a larger scale, applications that are transactional in nature also tend to create groups of objects that are used and discarded together. Once a reply to a database query has been returned, then the reply, the intermediate state, and the query itself are no longer needed.

This observation led to the development of generational garbage collectors. The idea is to divide the heap up into different areas, and collect these areas at different rates. New objects are allocated out of one such area, called the nursery (or newspace). Since most objects in this area will become garbage quickly, collecting it offers the best chance to recover memory. Once an object has survived for a while, it is moved into a different area, called tenure (or oldspace). These objects are less likely to become garbage, so the collector examines them much less frequently. For the right sort of workload the result is collections that are faster and more efficient, since less memory is examined and a higher percentage of examined objects are reclaimed. Faster collections mean shorter pauses, and thus better application responsiveness.

IBM's gencon policy (-Xgcpolicy:gencon) offers a generational GC (“gen-”) on top of the concurrent one described above (“-con”). The tenure space is collected as described above, while the nursery space uses a copying collector. This algorithm works by further subdividing the nursery area into allocate and survivor spaces. New objects are placed in allocate space until its free space has been exhausted. The application is then halted, and any live objects in allocate are copied into survivor. The two spaces then swap roles; that is, survivor becomes allocate, and the application is resumed. If an object has survived for a number of these copies, it is moved into the tenure area instead.
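
On WebSphere, the GC policy and the nursery size are set through the server's generic JVM arguments. A sketch of what that might look like for gencon (the sizes are illustrative, not a recommendation):

-Xgcpolicy:gencon -Xmn512m -Xms2048m -Xmx2048m

Here -Xmn fixes the nursery size, while -Xms/-Xmx set the initial and maximum total heap; the tenure area gets whatever the nursery does not use.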


 

The region-based collector: balanced

A new garbage collection policy has been added in WebSphere Application Server V8. This policy, called balanced (-Xgcpolicy:balanced), expands on the notion of having different areas of the heap. It divides the heap into a large number of regions, which can be dealt with individually. Frankly, I haven't seen it used by any customer I have worked with yet.

 

For more on WebSphere IBM JVM GC see – http://www.perfdaddy.com/2015/10/ibm-jvm-tuning-gencon-gc-policy.html, http://www.ibmsystemsmag.com/ibmi/administrator/websphere/Tuning-Garbage-Collection-With-IBM-Technology-for/, http://javaeesupportpatterns.blogspot.com/2012/03/ibm-jvm-tuning-gencon-gc-policy.html

Hope this helps.