Carpe Noctem – Using SQL Server Diagnostics (Preview) to analyze a SQL Server minidump

Microsoft just released the SQL Server Diagnostics (Preview) extension for SQL Server Management Studio, along with Developer APIs, to empower SQL Server customers to achieve more through a variety of offerings to self-resolve SQL Server issues.

diagnostic

So what can it do?

  • Analyze SQL Server dumps.

Customers should be able to debug and self-resolve memory dump issues from their SQL Server instances and receive recommended Knowledge Base (KB) article(s) from Microsoft, which may be applicable for the fix.

  • Review recommendations to keep SQL Server instances up to date.

 

Customers will be able to keep their SQL Server instances up-to-date by easily reviewing the recommendations for their SQL Server instances. Customers can filter by product version or by feature area (e.g. Always On, Backup/Restore, Column Store, etc.) and view the latest Cumulative Updates (CU) and the underlying hotfixes addressed in the CU.

  • Developers who want to discover and learn about Microsoft APIs can view developer portal and then use APIs in their custom applications. Developers can log and discuss issues and even submit their applications to the application gallery.

 

So I installed this extension and decided to give it a go with one of the SQL Server minidumps I have.

diagnostics2

It actually correctly identified an issue and issued suggestions. So, while this may not work on every dump, it is worth trying before you fire up WinDbg.

Give it a try. Hope this helps.

Iter in ignotus–Installing SQL Server vNext on Ubuntu Linux 16.10

SQL-Loves-Linux_2_Twitter-002-640x358

 

 

Over the holidays I had a chance to install and run SQL Server vNext on my Ubuntu Linux machine. I did run into some issues during install, but I was able to work around all of those and got the SQL Server engine successfully running on Ubuntu 16.10 x64.

To start out, make sure that you are installing SQL Server vNext on Ubuntu Linux version 16 or higher, 64-bit. The first issue I ran into was attempting to install on Ubuntu 14.x on 32-bit.

Here are steps to follow in Bash:

  1. Import the public repository GPG keys:
    curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
  2. Register the Microsoft SQL Server Ubuntu repository:
    curl https://packages.microsoft.com/config/ubuntu/16.04/mssql-server.list | sudo tee /etc/apt/sources.list.d/mssql-server.list
    
  3. Run an update through apt-get. If you skip this step you may run into an error later, as I did:
    sudo apt-get update
    
  4. Run the actual install via apt-get:
    sudo apt-get install -y mssql-server
    

    If you are on 32-bit or did not update the sources, you may see this error:

    Unable to locate package mssql 
    

     

  5. Now, if you are on the latest 64-bit Ubuntu with updated components, you should see the install occurring in your terminal window. After the package installation finishes, run the configuration script and follow the prompts.
    sudo /opt/mssql/bin/sqlservr-setup
    

    Once the configuration is done, verify that the service is running

    systemctl status mssql-server
    

Now that the install is done in five easy steps and SQL Server is running on Ubuntu Linux, you can use SSMS on Windows to connect to your SQL Server on Linux. But in order to connect to this instance from Linux itself, I will need to install the SQL Server client tools connectivity stack.

 

The following steps install the command-line tools, Microsoft ODBC drivers, and their dependencies. The mssql-tools package contains:

  1. sqlcmd: Command-line query utility
  2. bcp: Bulk import-export utility.

Install the tools on Ubuntu in four easy steps:

  1. Import the public repository GPG keys:
    curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
    
  2. Register the Microsoft Ubuntu Repository
    curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
    
  3. Update again via apt-get
    sudo apt-get update
    
  4. Now run actual install
    sudo apt-get install mssql-tools
    

    If you are on 32-bit or did not update the sources, you may see this error:

    Unable to locate package mssql-tools
    

Now let's connect to our local instance and run a quick query, all via the terminal:

sqlcmd -S localhost -U SA -P ''
SELECT @@version;
GO
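
If you want to go a step beyond checking the version, a quick smoke test can be run in the same sqlcmd session. This is just a minimal sketch; the database and table names below are placeholders I made up, not anything the install creates for you:

CREATE DATABASE TestDB;
GO
USE TestDB;
GO
CREATE TABLE dbo.Hello (Id INT, Msg NVARCHAR(50));
INSERT INTO dbo.Hello VALUES (1, N'SQL Server is alive on Ubuntu');
SELECT Id, Msg FROM dbo.Hello;
GO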

Well, that answers what I was doing over the holidays. Happy bashing to you, SQL folks.

For more see – https://www.microsoft.com/en-us/sql-server/sql-server-vnext-including-Linux

https://blogs.microsoft.com/blog/2016/03/07/announcing-sql-server-on-linux/#sm.001on55ayi1dduo11m02qmercn8t3, https://docs.microsoft.com/en-us/sql/linux/

Semper Secura–Introduction To SQL Server Always Encrypted

With the introduction of SQL Server 2016 we now have a new way to encrypt columns called Always Encrypted. What makes it different from older encryption features in SQL Server, like column-level encryption and Transparent Data Encryption, is that it doesn’t just encrypt data at rest or while being transmitted on the wire; it is the only feature that ensures the database never sees unencrypted values of sensitive columns, because data gets encrypted on the client tier by the SQL client/database driver. While the database only sees encrypted values, the application code works exclusively with unencrypted data. When a query is executed, the driver automatically looks up the master key in the Windows Certificate Store (or other OS-dependent location). The master key is then used to decrypt a column-specific key, which in turn is used for encrypting and decrypting fields and parameters.

pic2

Microsoft advertises the following use cases for the Always Encrypted feature:

· Client and Data On-Premises

A customer has a client application and SQL Server both running on-premises, at their business location. The customer wants to hire an external vendor to administer SQL Server. In order to protect sensitive data stored in SQL Server, the customer uses Always Encrypted to ensure the separation of duties between database administrators and application administrators. The customer stores plaintext values of Always Encrypted keys in a trusted key store which the client application can access. SQL Server administrators have no access to the keys and, therefore, are unable to decrypt sensitive data stored in SQL Server.

· Client On-Premises with Data in Azure

A customer has an on-premises client application at their business location. The application operates on sensitive data stored in a database hosted in Azure (for example in SQL Server running in a virtual machine on Microsoft Azure). The customer uses Always Encrypted and stores Always Encrypted keys in a trusted key store hosted on-premises, to ensure Microsoft cloud administrators have no access to sensitive data.

· Client and Data in Azure

A customer has a client application, hosted in Microsoft Azure (e.g. in a worker role or a web role), which operates on sensitive data that is also stored in Microsoft Azure. The customer uses Always Encrypted to reduce the security attack surface area (the data is always encrypted in the database and on the machine hosting the database).

pic1

 

Types of Always Encrypted encryption:

SQL Server offers two encryption modes: deterministic and randomized.

· Deterministic encryption ensures that a given value always has the same encrypted representation. This allows you to use the column for equality comparisons, joins, and grouping.

· For more security, you can use randomized encryption. This prevents guessing by ensuring that a given value’s encrypted representation is never the same twice.

You should use deterministic encryption for columns that will be used as search or grouping parameters, for example a government ID number. Use randomized encryption for data such as confidential investigation comments, which are not grouped with other records or used to join tables.

Limitations.

If a column is encrypted, then all range-like operations such as greater/less than, pattern matching using LIKE, etc. are disallowed. Furthermore, you can’t pass encrypted values to functions, user-defined or otherwise, because the database doesn’t have access to the unencrypted values.

Equality comparisons can only be performed on columns that use deterministic encryption.

Indexes can only be applied to deterministically encrypted columns.

If joining between two columns, both columns need to use the same column encryption key.

Constant expressions that refer to encrypted columns are not allowed. For example, you cannot write WHERE SSN = '111-11-1111', but you can write WHERE SSN = @SSN. This is necessary because the driver works with the SqlParameter class to handle encryption requirements.

Unsupported data types include: xml, rowversion, image, ntext, text, sql_variant, hierarchyid, geography, geometry, and user-defined types.

Current drivers that support this feature are listed below:

pic3

 

Here is a quick tutorial on how to set up Always Encrypted

· Open your SQL Server Management Studio and connect to your SQL Server 2016 instance or SQL Azure. For on-premises SQL Server, you can use the Always Encrypted feature in Enterprise or Developer edition.

pic4

· The first thing we will create is a Column Master Key. You can do so either in a script or via a nice wizard in SSMS. Below is my script to do so, but of course based on your certificate, machine, etc., your script will be different.

CREATE DATABASE AEDemo;

GO

USE [AEDemo]
/****** Object:  ColumnMasterKey [test]    Script Date: 9/15/2016 6:44:15 PM ******/
CREATE COLUMN MASTER KEY [test]
WITH
(
	KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',
	KEY_PATH = N'LocalMachine/My/A9B9AFE1993FA0FAD052F49195FDF10DE5A953DB'
)
GO

· Next we create a Column Encryption Key. Again, your actual script will vary from mine below, or you can use the nice SSMS wizard to do so.

CREATE COLUMN ENCRYPTION KEY [test]
WITH VALUES
(
	COLUMN_MASTER_KEY = [test],
	ALGORITHM = 'RSA_OAEP',
	ENCRYPTED_VALUE = 0x01700000016C006F00630061006C006D0061006300680069006E0065002F006D0079002F0061003900620039006100660065003100390039003300660061003000660061006400300035003200660034003900310039003500660064006600310030006400650035006100390035003300640062003B237BB1F1C8A7ADBB0BEA972AB6FC321B7BAB43C6EEEF20CC4BBAC52BE1582868C7E9BF92B01D7022AF9838D625ED5F1CE2D35975BFF600F96A5A5A07EDFA946DBDE08DE896A632B06BAFC5A861DC8298E36E408BE5EBAAEB94C674C04765D0138F30B6B0E676294B5D3FF5374434273B823550D3A89D6337BEEE1D93FBDAB50025ED7A91BE2F2C3C032A636D0049F5614EC248097BD0B12F79FF3A51DC242987D931A2473EAB98BCB7217CFF61B447E435F5FE19BB5DCAF1B7C8D8FA606FE5354EB9773A78C1F1EEFCC4D3D401CB3C602EBB3C197A2421637304D94BC58F129F20A5685A700C6BEDF8D5080F959B57B3F65721DDEF78BBDA035923D3A93D3380E52465663A72916CE4DCF0D904C27DB8298D6F44AACC95998EA7CC1F895BBD55553523A5558778252A5E4AE2DED30D6DE04DB24ACE18771D0B8C27F06A228F67F950C10E0DA035D12934DA5DB5D65E947789EBA86A5375C71DC386FB854FCE702D642CB5ED695B96E864C3F9CC34E52D87178B2DF75B6AAE3A996B2F6BC9EB99445F6311142E8C3F85A545EE0614AFCAB02A547664B31E546FFAA7667D86D7DF239CB4EAB66B847EBD78BDD01707C7C7DE04EE8F52F5F7D722A71BC8E5015F9EFB2A0DBDEB9B732C163D8C32F9DA00AABCDBE68067715CB1C56A385B0EA908A51EC51F8290D701689E112C646A10462636766066F703D82844CA7237F721EF
)
GO

· Now that the keys are created, let's create a table that uses these keys to encrypt columns:

CREATE TABLE dbo.EncryptedTable
(
  ID INT IDENTITY(1,1) PRIMARY KEY,
  
  LastName NVARCHAR(32) COLLATE Latin1_General_BIN2 
    ENCRYPTED WITH 
    (
       ENCRYPTION_TYPE = DETERMINISTIC, 
       ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256', 
       COLUMN_ENCRYPTION_KEY = [test]
    ) NOT NULL,
    
  Salary INT 
    ENCRYPTED WITH 
    (
       ENCRYPTION_TYPE = RANDOMIZED, 
       ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256', 
       COLUMN_ENCRYPTION_KEY = [test]
    ) NOT NULL
);
GO

So essentially we followed this workflow:

pic6

Well, we can now try to insert some data into the table:

 

DECLARE @LastName NVARCHAR(32) = N'Gennady', @Salary INT = 720000;
INSERT dbo.EncryptedTable(LastName,Salary) SELECT @LastName, @Salary;
go

This will actually error out, but why? Since encryption and decryption occur at the driver level, this is the way you populate these encrypted columns: from client-side code. You don’t have to make any changes to the data access layer of your application except adding the following to your connection string:

Column Encryption Setting=Enabled

Together with rolling out your Column Master Key to the client, this should make it easy for your application to work with Always Encrypted.

pic8

Great details on Always Encrypted can be found here –

BOL – https://msdn.microsoft.com/en-us/library/mt163865.aspx

Channel 9 – https://channel9.msdn.com/Shows/Data-Exposed/Getting-Started-with-Always-Encrypted-with-SSMS?ocid=relatedentry

As with every security and encryption feature, there is performance overhead to enabling Always Encrypted; Aaron Bertrand has attempted some benchmarking, shown here – http://sqlperformance.com/2015/08/sql-server-2016/always-encrypted-performance-follow-up

Ocultos Exitus–JDBC Driver Unicode Settings and SQL Server Performance

While troubleshooting JDBC client apps that connect to SQL Server, I have run into this issue a few times, the latest very recently.

As you may already know, SQL Server differentiates data types that support Unicode from the ones that support only ASCII. For example, the character data types that support Unicode are nchar, nvarchar, and longnvarchar, whereas their ASCII counterparts are char, varchar, and longvarchar respectively. By default, all of Microsoft's JDBC drivers send strings to SQL Server in Unicode format, irrespective of whether the data type of the corresponding column defined in SQL Server supports Unicode or not.

In cases where the data types of the columns support Unicode, everything is smooth. But where the data types of the columns do not support Unicode, serious performance issues arise, especially during data fetches. SQL Server tries to implicitly convert the non-Unicode data types in the table to Unicode before doing the comparison. Moreover, if an index exists on the non-Unicode column, it will be ignored. This ultimately leads to a whole table scan during data fetches, thereby slowing down the search queries drastically.

This can be corrected by changing one of the default parameters in the JDBC driver. The parameter name and the value to be set vary from driver to driver, depending on the vendor.

Vendor                  Parameter
JSQLConnect             asciiStringParameters
JTDS                    sendStringParametersAsUnicode
DataDirectConnect       sendStringParametersAsUnicode
Microsoft JDBC          sendStringParametersAsUnicode
WebLogic Type 4 JDBC    sendStringParametersAsUnicode

Reading https://msdn.microsoft.com/en-us/library/ms378857(v=sql.110).aspx you will see: “For optimal performance with CHAR, VARCHAR or LONGVARCHAR type of non-Unicode parameters, set the sendStringParametersAsUnicode connection string property to “false” and use the non-national character methods.” The issue was also reported via http://www.sqlconsulting.com/jdbc.shtml, http://www.jochenhebbrecht.be/site/2014-05-01/java/fixing-slow-queries-running-sql-server-using-jpa-hibernate-and-jtds and http://www.codeproject.com/Articles/281364/Solving-Performance-issues-in-data-migration-to-SQ
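
As an illustration only (the server, database, user, and password below are placeholders), with the Microsoft JDBC driver the property can be set directly in the connection URL:

jdbc:sqlserver://myserver:1433;databaseName=MyDb;user=myuser;password=mypassword;sendStringParametersAsUnicode=false

The flip side is that with the property set to false, string parameters are no longer sent as Unicode, so true Unicode data in those parameters can be silently converted to the database code page. Only flip it when the columns involved really are non-Unicode, and test first.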

Hope this helps.

Et Obscure Ex Alto–TDS Compatibility for SQL Server Upgrades

As the majority of folks may know, time is counting down on extended support for SQL Server 2005; it officially ends April 12, 2016. Working with folks on upgrading fairly old systems from SQL Server 2000\2005 to newer versions like SQL Server 2012\2014, we ran into the fairly rare issue of TDS backward compatibility, mainly with very old clients running older versions of JDBC\ODBC drivers.

Tabular Data Stream (TDS) is an application layer protocol used to transfer data between a database server and a client. It was initially designed and developed by Sybase Inc. for their Sybase SQL Server relational database engine in 1984, and later by Microsoft for Microsoft SQL Server.

tds

Microsoft published the TDS specification in 2008 – https://msdn.microsoft.com/en-us/library/cc448435.aspx. Still, TDS, and especially TDS compatibility during SQL Server upgrades, is a pretty obscure topic. Microsoft published a list of TDS versions matched to SQL Server and .NET Framework versions here – https://msdn.microsoft.com/en-us/library/dd339982.aspx

TDS version       SQL Server version       .NET Framework version
7.0               SQL Server 7.0           .NET Framework 1.1
7.1               SQL Server 2000          .NET Framework 1.1
7.1 Revision 1    SQL Server 2000 SP1      .NET Framework 1.1
7.2               SQL Server 2005          .NET Framework 2.0
7.3.A             SQL Server 2008          .NET Framework 4.0
7.3.B             SQL Server 2008 R2       N/A
7.4               SQL Server 2012          .NET Framework 4.5
7.4               SQL Server 2014          .NET Framework 4.5
7.4               SQL Server 2016 CTP2     .NET Framework 4.6

Although TDS compatibility isn’t well documented by Microsoft, from what I have experienced it appears Microsoft keeps at least two versions of TDS backward compatible.

So how do you find out what versions of TDS are connecting to your server? On newer versions, SQL Server 2005 and above, where DMVs are present, the following query can be used:

select protocol_type, protocol_version from sys.dm_exec_connections

Result will be something like:

TSQL   1930035203 

Well, the big number still doesn’t say which version of the data access driver it is, does it? Actually it does. The integer value is the representation of the TDS version (for the TSQL protocol type). It needs to be translated as follows, converting the protocol version to hex:

SELECT CONVERT(BINARY(4), 1930035203) 

Result will be as follows:

0x730A0003

Take the first two hex digits (0x73) and map them to the correct SQL Server version based on the table above. So now we know that the connection was made via TDS 7.3x, possibly via the SQL Server 2008 client stack.

In one of its older posts SQLCAT actually combined all of these steps into one query:

SELECT session_id, protocol_type, driver_version =
    CASE SUBSTRING(CAST(protocol_version AS BINARY(4)), 1, 1)
        WHEN 0x70 THEN 'SQL Server 7.0'
        WHEN 0x71 THEN 'SQL Server 2000'
        WHEN 0x72 THEN 'SQL Server 2005'
        WHEN 0x73 THEN 'SQL Server 2008'
        ELSE 'Unknown driver'
    END
FROM sys.dm_exec_connections

But what if I am on SQL Server 2000, well before DMVs were available, and need to check which TDS versions are connecting before an upgrade?

Here things become a lot more difficult and labor intensive. The only way I know of is checking the version in the TDS pre-login packet via a network capture on the SQL Server side.

As per https://msdn.microsoft.com/en-us/library/dd357559.aspx, the pre-login is a message sent by the client to set up the context for login. The server responds to a client PRELOGIN message with a message of packet header type 0x04 and the packet data containing a PRELOGIN structure. As part of this structure there is an optional version value:

PL_OPTION_TOKEN    Value    Description
VERSION            0x00     PL_OPTION_DATA = UL_VERSION followed by US_SUBBUILD

UL_VERSION is represented in network byte order (big-endian).

The server SHOULD use the VERSION sent by the client to the server. The client SHOULD use the version returned from the server to determine which features are enabled or disabled. The client SHOULD do this only if it is known that this feature is supported by that version of the database.<27>

Theoretically, upon capture of such data via a network trace, it can be filtered in Wireshark using this field, from https://www.wireshark.org/docs/dfref/t/tds.html:

tds.prelogin.option.version	Version	Unsigned integer, 4 bytes	2.0.0

According to the spec the version is an option – the client should provide the version per the docs, however possibly some clients don’t. I assume that would be pretty rare; at least all mainstream clients I have seen do.

So dealing with older, obscure SQL Server clients is hard, but it doesn’t have to be impossible.

For more see – https://blogs.msdn.microsoft.com/jenss/2009/03/02/tds-protocol-versions-meet-client-stacks/, https://blogs.msdn.microsoft.com/sql_protocols/2008/07/15/connection-failure-because-of-mismatched-tds-version/, https://msdn.microsoft.com/en-us/library/dd339982.aspx

Forecast Cloudy – Azure SQL Data Warehouse Introduction

 

Many enterprises today are moving into the era of real-time analytics. As you may be aware, for non-relational data, typical Hadoop batch analysis architectures are being enhanced by technologies like Storm, Kafka, Azure Stream Analytics, Amazon Kinesis, etc., and are moving from purely batch analytics to hybrid real-time\batch architectures like Lambda (http://www.semantikoz.com/blog/lambda-architecture-velocity-volume-big-data-hadoop-storm/) or Zeta (http://radar.oreilly.com/2015/04/zeta-architecture-hexagon-is-the-new-circle.html) with both speed and batch layers. For structured data, a new generation of elastic cloud-based data warehouses has been introduced, such as Amazon Redshift and Azure SQL Data Warehouse. I will blog about Redshift later, but for now will take a bit to introduce Azure SQL DW.

Azure SQL Data Warehouse is a turn-key cloud data warehousing and analytics solution (related to the Microsoft APS\PDW on-premises solution). It is based on existing Azure services, effectively and conveniently integrating them together under one roof. The key characteristics of Azure SQL Data Warehouse are:

  • Can scale (grow or shrink) to any size on demand, in seconds
  • Can import data from any source (using Azure Data Factory): Hadoop, NoSQL databases, SQL databases
  • Can visualize data and run reports using Power BI or other services
  • Can use Azure Machine Learning to analyze, model, and predict data
  • Can expose Machine Learning models as APIs for apps
  • Provides full ANSI SQL support

image

MPP Architecture. At its core, SQL Data Warehouse runs on Microsoft’s massively parallel processing (MPP) architecture, originally introduced in the Microsoft APS\PDW appliance and also successfully used by some competing analytics appliance products like Pivotal Greenplum. This architecture takes advantage of built-in data warehousing performance improvements and also allows SQL Data Warehouse to easily scale out and parallelize computation of complex SQL queries. In addition, SQL Data Warehouse’s architecture is designed to take advantage of its presence in Azure. Combining these two aspects, the architecture breaks up into four key components:

SQL Data Warehouse Architecture

  • Control Node: You connect to the control node when using SQL Data Warehouse with any development, loading, or business intelligence tools. In SQL Data Warehouse, the control node is a SQL Database, and when connecting it looks and feels like a standard SQL Database. However, under the surface, it coordinates all of the data movement and computation that takes place in the system. When a command is issued to the control node, it breaks it down into a set of queries that will be passed on to the compute nodes of the service.
  • Compute Nodes: Like the control node, the compute nodes of SQL Data Warehouse are powered using SQL Databases. Their job is to serve as the compute power of the service. Behind the scenes, any time data is loaded into SQL Data Warehouse, it is distributed across the nodes of the service. Then, any time the control node receives a command, it breaks it into pieces for each compute node, and the compute nodes operate over their corresponding data. After completing their computation, compute nodes pass partial results to the control node, which then aggregates results before returning an answer.
  • Storage: All storage for SQL Data Warehouse is standard Azure Storage Blobs. This means that when interacting with data, compute nodes are writing and reading directly to/from Blobs. Azure Storage’s ability to expand transparently and nearly limitlessly allows us to automatically scale storage, and to do so separately from compute. Azure Storage also allows us to persist storage while scaling or paused, streamline our back-up and restore process, and have safer, more fault tolerant storage.
  • Data Movement Services: The final piece holding everything together in SQL Data Warehouse is the Data Movement Services. The data movement services allow the control node to communicate and pass data to all of the compute nodes. They also enable the compute nodes to pass data between each other, which gives them access to data on other compute nodes and allows them to get the data they need to complete joins and aggregations.

This MPP approach allows SQL Data Warehouse to take a divide-and-conquer approach, as described above, when solving large data problems. Since the data in SQL Data Warehouse is divided and distributed across the compute nodes of the service, each compute node is able to operate on its portion of the data in parallel. Finally, results are passed to the control node and aggregated before being passed back to the users.
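
To make the distribution aspect concrete, here is a minimal sketch of a distributed table definition; the table and column names are my own placeholders, not part of the service. Rows are spread across the compute nodes by hashing a chosen column (a ROUND_ROBIN distribution can be used instead when there is no good hash key):

CREATE TABLE dbo.FactSales
(
    SaleKey     BIGINT        NOT NULL,
    CustomerKey INT           NOT NULL,
    SaleDate    DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    -- rows with the same CustomerKey land on the same distribution/compute node
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);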

image

Elasticity. Currently the majority of cloud-based database and data warehouse services are provisioned with fixed storage and compute resources. Resizing of resources cannot be performed without compromising availability and performance. This means the service user typically ends up with over-provisioned, underutilized, expensive resources to accommodate possible peak demand, or, in the worst case, under-provisioned resources unable to handle sudden workload spikes.

Azure SQL Data Warehouse (DW) is a fully managed, elastic, and petabyte-scale columnar data-warehouse service. Both Amazon Redshift and Azure SQL DW use massive parallel processing (MPP) architecture to deliver breakthrough performance. But unlike Amazon Redshift, Azure SQL DW architecture allows data and compute to scale independently without any downtime. In addition, Azure SQL DW enables one to dynamically grow and shrink resources taking advantage of best-in-class price and performance.

Most importantly, with Azure SQL DW, storage and compute are billed separately. Storage rates are based on standard blob rates. Compute is priced based on DWU (Data Warehouse Unit) blocks – a new metric to measure the compute capacity when using Azure SQL DW. Performance in Azure SQL DW scales linearly, and changing from one compute block to another (for instance 100 DWUs to 1000 DWUs) happens in seconds without disruption.
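
Scaling is exposed through the portal and also through plain T-SQL. As a sketch (the warehouse name below is an assumption), bumping compute from the current block to DW1000 is a single statement, and storage is unaffected:

-- Scale compute for an existing SQL Data Warehouse database to 1000 DWUs
ALTER DATABASE MyDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW1000');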

image

DWU (Data Warehouse Unit) capacity blocks. Data Warehouse Units are a new concept introduced by SQL Data Warehouse for capacity. Unfortunately, outside of this article – https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-performance-scale/ – there isn’t much information on DWUs. Especially nice would be some sort of calculator to help translate traditional DW metrics like IOPS or queries per hour, etc. to DWUs.

At this time Microsoft recommends the following approach to properly sizing your Azure DW capacity in DWUs:

  • For a data warehouse in development, begin by selecting a small number of DWUs
  • Monitor your application performance, observing the number of DWUs selected compared to the performance you observe
  • Determine how much faster or slower performance should be for you to reach the optimum performance level for your requirements, assuming linear scale
  • Increase or decrease the number of DWUs selected. The service will respond quickly and adjust the compute resources to meet the new DWU requirements.
  • Continue making adjustments until you reach an optimum performance level for your business requirements.

image

Hadoop Integration.

SQL DW is built with the same technology as APS, except that instead of using SQL Server 2014 it uses version 12 of Azure SQL Database. It also includes PolyBase. PolyBase allows APS and SQL DW to query data in a Hadoop cluster, either directly or by pushing some of the work down to Hadoop itself so that part of the query actually runs using the Hadoop cluster’s CPUs. The Hadoop data is made to look as if it were local to the data warehouse, so that end users can use their existing skill sets to query it via SQL or any reporting tool that uses SQL (like Excel, SSRS, Power BI, etc.). PolyBase can integrate with Hadoop in this manner via a Microsoft HDInsight cluster that can be either inside APS or in the cloud, or via a Hortonworks or Cloudera cluster.
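
To give a feel for what that looks like, below is a hedged sketch of the PolyBase T-SQL surface: an external data source pointing at a Hadoop cluster, a file format, and an external table that can then be queried like any local table. All names, addresses, and paths here are placeholders, and the exact options (credentials, pushdown settings, etc.) will differ per environment:

-- Register the Hadoop cluster (address is a placeholder)
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.10.10.10:8020');

-- Describe how the files are laid out
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- Expose the HDFS folder as a table
CREATE EXTERNAL TABLE dbo.WebLogs
(
    LogDate  DATETIME2,
    UserName NVARCHAR(100),
    Url      NVARCHAR(400)
)
WITH (LOCATION = '/logs/', DATA_SOURCE = MyHadoopCluster, FILE_FORMAT = PipeDelimitedText);

-- Queried like a local table; PolyBase decides what to push down to the cluster
SELECT TOP 100 * FROM dbo.WebLogs;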

image

So, enough theory; let’s spin up our own SQL Data Warehouse in Azure. Unlike with traditional systems this will not take weeks or months, but minutes. From the Azure Preview Portal (it’s not in the Classic Portal) go to New -> Data + Storage -> Azure Data Warehouse.

image

Add a DW name; if you already have a version 12 SQL Azure server you can use it, otherwise you will have to create a new server instance with a login and password. I picked the lowest available DWU at 100.

image

As a result you will see an animated GIF in the upper right corner with the status:

image

A few minutes later and you have yourself a real MPP DW instance in the cloud. Finally, after adding my client IP as a firewall exclusion, I can simply use SSMS to log in to this server and database.

image

For more info on Azure SQL DW see – https://azure.microsoft.com/en-us/services/sql-data-warehouse/, https://azure.microsoft.com/en-us/documentation/services/sql-data-warehouse/, http://techcrunch.com/2015/04/29/microsoft-introduces-azure-sql-data-warehouse/, http://blogs.technet.com/b/dataplatforminsider/archive/2015/06/24/azure-sql-data-warehouse-opens-for-limited-public-preview.aspx

Vigilate Reservate – Live Query Statistics in SQL Server 2014

One of many new features in SQL Server 2014 that’s not to be overlooked is the ability to see live query stats in real time, as the query is being executed, using a new DMV. The fact is that in SQL Server 2014 we have a new Dynamic Management View to track what a query is actually doing at a physical operator level. I am referring to the sys.dm_exec_query_profiles Dynamic Management View (DMV).

This feature will help database administrators and experienced developers troubleshoot long-running queries in real time. I’m pretty sure you have had to deal with the following questions: When will this query finish? What percentage of the total workload has this request performed? Which steps are the longest? Before SQL Server 2014 it was impossible to answer these questions.

Important note – in order to use this feature with this DMV we must use session options like STATISTICS PROFILE, STATISTICS XML or SET STATISTICS XML ON to force SQL Server to display the execution plan after executing the query in SQL Server Management Studio. In other words, it doesn’t work for plans stored in the plan cache.

SQL Server BOL has good detail on the new sys.dm_exec_query_profiles DMV here – https://msdn.microsoft.com/en-us/library/dn223301(v=sql.120).aspx. What BOL states there is: “Monitors real time query progress while the query is in execution. The counters are per operator per thread. The data collected is serialized into the SHOWPLAN XML when the query finishes.” What Microsoft really wanted to say is that SHOWPLAN XML, or any other sort of actual execution plan, is a prerequisite. If an actual execution plan isn’t requested for your query, SQL Server won’t gather all this data in the first place, and won’t show it in sys.dm_exec_query_profiles during runtime.

So that means I can use SSMS to run a particularly long-running query and get stats on it. However, you probably will not be running this all of the time; the overhead is pretty big here. This DMV provides a lot of useful information. The information is more granular than SET STATISTICS IO because the counters returned are per operator per thread (node_id / physical_operator_name and thread_id columns). Moreover, two other columns are interesting: estimate_row_count and row_count. The former is the number of estimated rows an operator should address and the latter is the current number of rows addressed by the same operator (remember, these counters are per operator per thread). We can compare these two columns to estimate a completion percentage per operator (or per operator per thread). Finally, we can categorize the information provided by this DMV in two parts: information related either to the query execution engine (row_count, estimate_row_count) or to the storage engine (logical_read_count, write_page_count, lob_logical_read_count, etc.).

So imagine I am running the following query against the AdventureWorks2012 sample DB:

 

USE AdventureWorks2012
GO
SET STATISTICS XML ON
SELECT *
FROM Sales.SalesOrderHeader OH
INNER JOIN Sales.SalesOrderDetail OD ON OD.SalesOrderID = OH.SalesOrderID
INNER JOIN Sales.Customer C ON C.CustomerID = OH.CustomerID
INNER JOIN Production.Product P ON P.ProductID = OD.ProductID
INNER JOIN Production.ProductSubcategory PS ON PS.ProductSubcategoryID = P.ProductSubcategoryID
GO

 

The query result, when it finishes, will be:

image

While that is running, in another SSMS query window I execute:

SELECT  session_id ,
        node_id ,
        physical_operator_name ,
        SUM(row_count) row_count ,
        SUM(estimate_row_count) AS estimate_row_count ,
        IIF(COUNT(thread_id) = 0, 1, COUNT(thread_id)) [Threads] ,
        CAST(SUM(row_count) * 100. / SUM(estimate_row_count) AS DECIMAL(30, 2)) [% Complete] ,
        CONVERT(TIME, DATEADD(ms, MAX(elapsed_time_ms), 0)) [Operator time] ,
        DB_NAME(database_id) + '.' + OBJECT_SCHEMA_NAME(QP.object_id,
                                                        qp.database_id) + '.'
        + OBJECT_NAME(QP.object_id, qp.database_id) [Object Name]
FROM    sys.dm_exec_query_profiles QP
GROUP BY session_id ,
        node_id ,
        physical_operator_name ,
        qp.database_id ,
        QP.OBJECT_ID ,
        QP.index_id
ORDER BY session_id ,
        node_id
GO


I get the following as a result:

image

 

Just as with an execution plan, this information can be useful for looking at operator time and differences between estimated and actual row counts. Looking around the web, it appears that folks went further and correlated sys.dm_exec_query_profiles, sys.objects, sys.indexes, sys.dm_exec_requests, sys.dm_exec_sessions, and sys.dm_os_waiting_tasks to cover both the real-time execution information and the related waits occurring at the same time. One such interesting script by David Barbarin is here – http://blog.dbi-services.com/sql-server-2014-sysdmexecqueryprofiles-a-new-promising-feature/
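
As a simplified sketch of that idea (not David Barbarin's full script), the live operator counters can be joined to the request and any current waits for the same session:

SELECT  qp.session_id,
        qp.node_id,
        qp.physical_operator_name,
        qp.row_count,
        qp.estimate_row_count,
        r.status,
        r.cpu_time,
        wt.wait_type,
        wt.wait_duration_ms
FROM    sys.dm_exec_query_profiles qp
JOIN    sys.dm_exec_requests r
        ON r.session_id = qp.session_id
LEFT JOIN sys.dm_os_waiting_tasks wt
        ON wt.session_id = qp.session_id
ORDER BY qp.session_id, qp.node_id;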

 

This feature and the new DMV generated lots of buzz, so more details are available here – http://sqlserver-help.com/2014/06/10/sql-2014-learning-series-11-new-dmv-sys-dm_exec_query_profiles-part-1/, http://sqlserver-help.com/tag/sys-dm_exec_query_profiles/, http://blog.sqltechie.com/2013/11/sysdmexecqueryprofiles-sql-2014-new-way.html, and http://sqlblog.com/blogs/michael_zilberstein/archive/2013/10/21/51415.aspx