Data Cargo In The Clouds – Running SQL Server 2017 in Docker


In my previous post I attempted to explain my new interest in Docker and in running data-centric applications in containers. Starting with this post, I will show how to run SQL Server on Linux in a container on an Azure IaaS VM.

Let's start by creating a Linux VM in Azure that will run Docker for us. I don't want to spend time and space going through the steps in the portal or PowerShell, since I already covered them in an earlier post; a good video tutorial is also available.

Assuming you have successfully created the Linux VM, you should be able to log in to it like this:

[Screenshot: logged in to the Azure Linux VM]

Next, we will install Docker on that VM. We will get the latest version of Docker from the official Docker repository.

  • First, add the GPG key for the official Docker repository to the system:

    $ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    
  • Add the Docker repository to APT sources:

    $ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  • Next, update the package database with the Docker packages from the newly added repo:

    $ sudo apt-get update
  • Make sure you are about to install from the Docker repo instead of the default Ubuntu 16.04 repo:

    $ apt-cache policy docker-ce

    You should see output similar to the following:

    docker-ce:
      Installed: (none)
      Candidate: 17.03.1~ce-0~ubuntu-xenial
      Version table:
         17.03.1~ce-0~ubuntu-xenial 500
            500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
         17.03.0~ce-0~ubuntu-xenial 500
            500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
    
    

    Notice that docker-ce is not installed, but the candidate for installation is from the Docker repository for Ubuntu 16.04. The docker-ce version number might be different.

  • Finally, install Docker:

    $ sudo apt-get install -y docker-ce
  • Make sure it's installed and running:

    $ sudo systemctl status docker

    The output should be similar to the one below, showing that the daemon has started and is running:

    [Screenshot: output of systemctl status docker]
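If you script this setup, the apt-cache policy check above can be automated. The Python sketch below is a hypothetical helper (candidate_from_docker_repo is not part of any Docker or APT tooling); it assumes the output layout shown earlier and simply checks that the Candidate version is offered by download.docker.com.

```python
def candidate_from_docker_repo(policy_output: str) -> bool:
    """Return True if the Candidate version reported by `apt-cache policy`
    is offered by download.docker.com (hypothetical helper, layout assumed)."""
    candidate = None
    current_version = None
    for line in policy_output.splitlines():
        # The installed version is prefixed with "***" in the version table
        parts = line.replace("***", "").split()
        if not parts:
            continue
        if parts[0] == "Candidate:" and len(parts) > 1:
            candidate = parts[1]
        elif parts[0].isdigit() and len(parts) > 1:
            # Source line, e.g. "500 https://download.docker.com/linux/ubuntu ..."
            if (candidate is not None and current_version == candidate
                    and "download.docker.com" in parts[1]):
                return True
        elif len(parts) == 2 and parts[1].isdigit():
            # Version-table entry, e.g. "17.03.1~ce-0~ubuntu-xenial 500"
            current_version = parts[0]
    return False


SAMPLE = """docker-ce:
  Installed: (none)
  Candidate: 17.03.1~ce-0~ubuntu-xenial
  Version table:
     17.03.1~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
     17.03.0~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
"""

print(candidate_from_docker_repo(SAMPLE))  # True
```

On the VM you could feed it the text returned by subprocess.run(["apt-cache", "policy", "docker-ce"], capture_output=True, text=True).stdout.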

Next, we need to pull the latest SQL Server Docker image:

$ sudo docker pull microsoft/mssql-server-linux

Once the image is pulled and extracted, you should see output similar to this:

[Screenshot: output of docker pull]

Now let's run the container image with Docker:

$ sudo docker run -e 'ACCEPT_EULA=Y' -e 'MSSQL_SA_PASSWORD=<YourStrong!Passw0rd>' -e 'MSSQL_PID=Developer' --cap-add SYS_PTRACE -p 1401:1433 -d microsoft/mssql-server-linux
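One common pitfall: if the SA password does not meet SQL Server's default password policy, the container starts and then stops. The sketch below, assuming Python 3 on the VM, builds the same docker run argument list as the command above and first checks the password against an approximation of that policy (at least 8 characters drawn from at least three of: uppercase, lowercase, digits, symbols). The function names are hypothetical, and the check does not cover every rule SQL Server enforces.

```python
def meets_sa_password_policy(password: str) -> bool:
    """Approximate SQL Server's default password policy (hypothetical helper):
    at least 8 characters from at least three of four character classes."""
    if len(password) < 8:
        return False
    classes = [
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),  # symbols
    ]
    return sum(classes) >= 3


def docker_run_args(password: str, host_port: int = 1401,
                    image: str = "microsoft/mssql-server-linux") -> list:
    """Build the argument list for the docker run command shown above."""
    if not meets_sa_password_policy(password):
        raise ValueError("SA password does not meet the SQL Server password policy")
    return [
        "sudo", "docker", "run",
        "-e", "ACCEPT_EULA=Y",
        "-e", f"MSSQL_SA_PASSWORD={password}",
        "-e", "MSSQL_PID=Developer",
        "--cap-add", "SYS_PTRACE",
        "-p", f"{host_port}:1433",  # VM port : SQL Server port inside the container
        "-d", image,                # run detached, in the background
    ]
```

Passing the list to subprocess.run(args, check=True) runs the command without a shell, which sidesteps quoting issues with characters such as ! in the password.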

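We now have a Docker container running SQL Server 2017 in Azure. The -p 1401:1433 option maps port 1401 on the VM to port 1433 (SQL Server's default port) inside the container, and -d runs the container in the background. We can list our containers with:

$ sudo docker ps -a

and should see output like this:

[Screenshot: output of docker ps -a]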

Connect to SQL Server in the container.

The following steps use the SQL Server command-line tool, sqlcmd, inside the container to connect to SQL Server. First, let's open a bash shell inside the container with the docker exec command. Note that I am passing the container ID, taken from the docker ps output above, as a parameter:

$ sudo docker exec -it d95734b7f9ba  "bash"
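Instead of copying the container ID by hand, you can look it up. The sketch below assumes you capture the output of docker ps --format "{{.ID}}\t{{.Image}}" (one tab-separated ID and image per line) and picks the IDs whose image matches exactly; container_ids_for_image is a hypothetical helper, and the second sample line is illustrative.

```python
def container_ids_for_image(ps_output: str, image: str) -> list:
    """Return container IDs from `docker ps --format "{{.ID}}\\t{{.Image}}"`
    output whose image equals `image` (hypothetical helper)."""
    ids = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        container_id, _, container_image = line.partition("\t")
        if container_image.strip() == image:
            ids.append(container_id.strip())
    return ids


sample = "d95734b7f9ba\tmicrosoft/mssql-server-linux\n1a2b3c4d5e6f\tnginx\n"
print(container_ids_for_image(sample, "microsoft/mssql-server-linux"))  # ['d95734b7f9ba']
```

For a shell-only alternative, docker ps -q --filter ancestor=microsoft/mssql-server-linux prints just the matching IDs.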

Now let's connect with sqlcmd:

/opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P '<YourStrong!Passw0rd>'

Now let's run the simplest possible query:

SELECT @@version
GO

You should see output like:

[Screenshot: SELECT @@version output in sqlcmd]

Now you can go ahead and start creating databases, tables, moving data, etc.

For more, see – https://docs.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker and https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-docker. SQL Server images on Docker Hub – https://hub.docker.com/r/microsoft/mssql-server-linux/. See also – https://mathaywardhill.com/2017/05/08/how-to-attach-a-sql-server-database-in-a-linux-docker-container/

 

Hope this helps.


Forecast Cloudy – The Magic of AzCopy As Database Migration Tool

Here is a quick note that may be helpful to folks. Recently I had to migrate a database to Azure by taking a backup from an on-premises SQL Server and restoring it on SQL Server running on an Azure VM. There is a better, more structured method for this, Microsoft's Data Migration Assistant (DMA), available from https://www.microsoft.com/en-us/download/details.aspx?id=53595. However, I went the old-fashioned way, purely through backups.

But how do I move an 800 GB backup to the cloud quickly? And what about network issues and the need to restart after a network hiccup? AzCopy solves all of these issues and more.

AzCopy is a command-line utility designed for copying data to and from Microsoft Azure Blob, File, and Table storage using simple commands with optimal performance. You can copy data from one object to another within your storage account, or between storage accounts.

There are two versions of AzCopy that you can download. AzCopy on Windows is built with .NET Framework and offers Windows-style command-line options. AzCopy on Linux is built with .NET Core, targets Linux platforms, and offers POSIX-style command-line options. I will showcase the Windows version here.

First, I created a file share in an Azure Storage account. Azure File storage is a service that offers file shares in the cloud using the standard Server Message Block (SMB) protocol; both SMB 2.1 and SMB 3.0 are supported. You can follow this tutorial to do so – https://www.petri.com/configure-a-file-share-using-azure-files. Next, I needed to upload my backup to storage with AzCopy.

The basic syntax for AzCopy on the command line is as follows:

AzCopy /Source:<source> /Dest:<destination> [Options]

I can upload a single file, called test.bak, from C:\Temp to the demo container in the storage account (blob service) using the following command:

AzCopy /Source:C:\Temp /Dest:https://pazcopy.blob.core.windows.net/demo /DestKey:QB/asdasHJGHJGHJGHJ+QOoPu1fC/Asdjkfh48975845Mh/KlMc/Ur7Dm3485745348gbsUWGz/v0e== /Pattern:"test.bak"

Some notes:

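  • Note how the URL of the destination storage account specifies the blob service (blob.core.windows.net). To upload to an Azure file share instead, point /Dest at the file service endpoint, for example https://<account>.file.core.windows.net/<share>.
  • I have specified the storage account access key with the /DestKey option. This key is unique to the storage account (source or destination).
  • The /Pattern option specifies the file being copied.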
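Because the same AzCopy syntax targets either the blob or the file service depending on the destination URL, it can help to check the URL before starting a long upload. The Python sketch below is hypothetical: the helper names and the AZCOPY_DEST_KEY environment variable are assumptions, not part of AzCopy. It splits a URL of the form https://<account>.<service>.core.windows.net/<container-or-share> and builds the same argument list as the command above, reading the key from the environment instead of hard-coding it.

```python
import os
from urllib.parse import urlparse


def storage_endpoint(url: str) -> tuple:
    """Split an Azure Storage URL into (account, service, path)."""
    parsed = urlparse(url)
    host_parts = (parsed.hostname or "").split(".")
    # Expected host form: <account>.<service>.core.windows.net
    if len(host_parts) != 5 or host_parts[2:] != ["core", "windows", "net"]:
        raise ValueError(f"not an Azure Storage URL: {url}")
    account, service = host_parts[0], host_parts[1]
    return account, service, parsed.path.lstrip("/")


def azcopy_upload_args(source_dir: str, dest_url: str, pattern: str) -> list:
    """Build an AzCopy (Windows) upload command like the one shown above."""
    _, service, _ = storage_endpoint(dest_url)
    if service not in ("blob", "file"):
        raise ValueError(f"unsupported storage service: {service}")
    dest_key = os.environ["AZCOPY_DEST_KEY"]  # hypothetical variable name
    return [
        "AzCopy",
        f"/Source:{source_dir}",
        f"/Dest:{dest_url}",
        f"/DestKey:{dest_key}",
        f"/Pattern:{pattern}",
    ]
```

Assuming AzCopy.exe is on the PATH, passing the list to subprocess.run(args, check=True) starts the upload.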

Once the file is copied, you should also see transfer statistics on the command line:

[Screenshot: AzCopy transfer statistics]

Once the file is uploaded, all I have to do on the destination Azure VM is map the file share (example here – https://blogs.msdn.microsoft.com/windowsazurestorage/2014/05/12/introducing-microsoft-azure-file-service/) and restore the backup.

Hope this helps. More on AzCopy – https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy, https://blogs.technet.microsoft.com/canitpro/2015/12/28/step-by-step-using-azcopy-to-transfer-files-to-azure/

 

Data Cargo In The Clouds – Data and Containerization On The Cloud Platforms


This is a topic I had planned to pivot to for quite a while, but I either had no bandwidth or time because of my move to Seattle, or some other excuse like that. It interests me because it sits at the intersection of technologies I have spent quite a bit of time on, including:

  • Data, including both traditional RDBMS databases and NoSQL designs
  • Cloud, including Microsoft Azure, AWS and GCP
  • Containers and container orchestration (an area that is new to me)

Those who have worked with me have shown various degrees of agreement (but more often annoyance) at my pushing this topic as a growing part of the data engineering discipline; still, I truly believe in it. Before we start combining all these areas and doing something useful in code, I will spend some time here on the most basic concepts of what I am about to enter.

I will skip an introduction to the basic concepts of RDBMS, NoSQL or Big Data/Hadoop technologies; unless you have been hiding under a rock for the last five years, you should be well aware of those. That brings us straight to containers:

As definition states – “A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Available for both Linux and Windows based apps, containerized software will always run the same, regardless of the environment. Containers isolate software from its surroundings, for example differences between development and staging environments and help reduce conflicts between teams running different software on the same infrastructure.”

Containers are a solution to the problem of how to get software to run reliably when moved from one computing environment to another. This could be from a developer’s laptop to a test environment, from a staging environment into production, and perhaps from a physical machine in a data center to a virtual machine in a private or public cloud.


But why containers if I already have VMs? 

VMs take up a lot of system resources. Each VM runs not just a full copy of an operating system, but a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of RAM and CPU cycles. In contrast, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program.

What this means in practice is that you can often put two to three times as many applications on a single server with containers as you can with VMs. In addition, with containers you can create a portable, consistent operating environment for development, testing, and deployment.

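On Linux, containers are built on kernel features such as namespaces and control groups (cgroups). LXC is a userspace interface to these Linux kernel containment features; it includes an application programming interface (API) that enables Linux users to create and manage system or application containers. Early versions of Docker used LXC to run containers, but Docker later switched to its own runtime (libcontainer, now runc).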

Docker is an open platform that makes it easier to create, deploy, and run applications by using containers.

  • The heart of Docker is the Docker Engine, the part of Docker that creates and runs Docker containers. A Docker container is a live, running instance of a Docker image. Docker Engine is a client-server application with the following components:
    • A server, which is a continuously running service called a daemon process (dockerd).
    • A REST API that programs use to talk to the daemon and instruct it what to do.
    • A command-line interface (CLI) client.


  • The CLI client uses the Docker REST API to control or interact with the Docker daemon through CLI commands. Many other Docker applications also use the underlying API and CLI. The daemon creates and manages Docker images, containers, networks, and volumes.
  • The Docker client is the primary way Docker users communicate with Docker. When we run a command such as docker run, the client sends it to dockerd, which carries it out.

You build Docker images with Docker and store them in what are known as Docker registries (Docker Hub by default). When we run the docker pull or docker run commands, the required images are pulled from the configured registry if they are not already present locally. With the docker push command, an image can be uploaded to the configured registry.
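When you run docker pull microsoft/mssql-server-linux, Docker expands the short name into a registry, a repository and a tag: with no registry host the image comes from Docker Hub (docker.io), and with no tag it defaults to latest. The Python sketch below mimics those defaulting rules for illustration only; it is a simplification (no digest support, no validation), and parse_image_reference is a hypothetical name, not a Docker API.

```python
def parse_image_reference(ref: str) -> tuple:
    """Split a Docker image reference into (registry, repository, tag).
    Simplified sketch: digests (@sha256:...) are not handled."""
    registry = "docker.io"
    remainder = ref
    first, sep, rest = ref.partition("/")
    # The first component is a registry host if it looks like one
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    # A tag is whatever follows the last ':' in the final path component
    last_slash = remainder.rfind("/")
    colon = remainder.rfind(":")
    if colon > last_slash:
        repository, tag = remainder[:colon], remainder[colon + 1:]
    else:
        repository, tag = remainder, "latest"
    # Official Docker Hub images live under the "library/" namespace
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository
    return registry, repository, tag


print(parse_image_reference("microsoft/mssql-server-linux"))
# ('docker.io', 'microsoft/mssql-server-linux', 'latest')
```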

Finally, by deploying instances of images from your registry, you deploy containers. We can create, run, stop, or delete a container using the Docker CLI. We can connect a container to more than one network, or even create a new image based on its current state. By default, a container is well isolated from other containers and from its host machine. A container is defined by its image and by the configuration options that we provide when we create or run it.

For more info on containers, LXC and Docker, see – https://blog.scottlowe.org/2013/11/25/a-brief-introduction-to-linux-containers-with-lxc/, https://www.docker.com/what-container#/virtual_machines, http://searchservervirtualization.techtarget.com/definition/container-based-virtualization-operating-system-level-virtualization

That brings us to container orchestration engines. While the CLI meets the needs of managing one container on one host, it falls short when it comes to managing multiple containers deployed on multiple hosts. To go beyond the management of individual containers, we must turn to orchestration tools. Orchestration tools extend lifecycle management capabilities to complex, multi-container workloads deployed on a cluster of machines.

Some of the well-known orchestration engines include:

Kubernetes. Almost a standard nowadays, originally developed at Google. Kubernetes' architecture is based on a master server with multiple minions (worker machines, now simply called nodes). The command-line tool, kubectl (called kubecfg in early versions), connects to the API endpoint of the master to manage and orchestrate the nodes. The service definition, along with its rules and constraints, is described in a JSON (or YAML) file. For service discovery, Kubernetes provides a stable IP address and DNS name that correspond to a dynamic set of pods. When a container running in a Kubernetes pod connects to this address, the connection is forwarded by a local agent (called kube-proxy) running on the source machine to one of the corresponding backend containers. Kubernetes supports user-implemented application health checks. These checks are performed by the kubelet running on each node to ensure that the application is operating correctly.


For more see – https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/#kubernetes-is
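To make the pod and service ideas concrete, here is a Python sketch that emits a minimal Kubernetes Pod and Service as JSON for the microsoft/mssql-server-linux image. The names (mssql, the app: mssql label) and the placeholder password are assumptions; in a real cluster the password belongs in a Kubernetes Secret rather than a plain environment variable. The Service's selector matches the pod's label, which is how Kubernetes ties the stable service address to whatever pods currently carry that label.

```python
import json


def sql_server_pod_and_service(password: str) -> list:
    """Return a minimal Pod and Service manifest (hypothetical names)."""
    labels = {"app": "mssql"}
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "mssql", "labels": labels},
        "spec": {
            "containers": [{
                "name": "mssql",
                "image": "microsoft/mssql-server-linux",
                "env": [
                    {"name": "ACCEPT_EULA", "value": "Y"},
                    {"name": "MSSQL_SA_PASSWORD", "value": password},
                ],
                "ports": [{"containerPort": 1433}],
            }]
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "mssql"},
        "spec": {
            # Traffic to the service's stable address goes to pods with these labels
            "selector": labels,
            "ports": [{"protocol": "TCP", "port": 1433, "targetPort": 1433}],
        },
    }
    return [pod, service]


if __name__ == "__main__":
    for manifest in sql_server_pod_and_service("<YourStrong!Passw0rd>"):
        print(json.dumps(manifest, indent=2))
```

Each printed object could be saved to its own file and applied with kubectl apply -f <file>.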

Apache Mesos. This is an open source cluster manager that simplifies the complexity of running tasks on a shared pool of servers. A typical Mesos cluster consists of one or more servers running the mesos-master and a cluster of servers running the mesos-slave component. Each slave is registered with the master to offer resources. The master interacts with deployed frameworks to delegate tasks to slaves. Unlike other tools, Mesos ensures high availability of the master nodes using Apache ZooKeeper, which replicates the masters to form a quorum. A high availability deployment requires at least three master nodes. All nodes in the system, including masters and slaves, communicate with ZooKeeper to determine which master is the current leading master. The leader performs health checks on all the slaves and proactively deactivates any that fail.

 

Over the next couple of posts I will create containers running SQL Server and other data stores, both RDBMS and NoSQL, deploy them in the cloud, and finally, hopefully, attempt to orchestrate them as well. So let's move from theory and pictures into the world of practical data engine deployments.

Hope you will find this detour interesting.

Forecast Cloudy – Profiling Azure Cloud Service

We can get an in-depth analysis of the computational aspects of how an Azure application runs by using the Visual Studio Profiler. Below I will show you how to use the Sampling method of performance data collection in the Visual Studio Profiler to profile a Cloud Service in Azure. If you need basic information on the Visual Studio Profiler, start here – https://docs.microsoft.com/en-us/visualstudio/profiling/beginners-guide-to-performance-profiling

Sampling is a statistical profiling method that shows you the functions that are doing most of the user-mode work in the application. Sampling is a good place to start looking for areas to speed up your application.

At specified intervals, the Sampling method collects information about the functions that are executing in your application. After you finish a profiling run, the Summary view of the profiling data shows the most active function call tree, called the Hot Path, where most of the work in the application was performed. The view also lists the functions that were performing the most individual work, and provides a timeline graph you can use to focus on specific segments of the sampling session.
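The idea behind sampling can be illustrated outside Visual Studio. The Python sketch below is a toy sampler, not how the Visual Studio Profiler is implemented: a background thread periodically inspects the call stack of the thread running the workload (via CPython's sys._current_frames()) and counts exclusive samples (the innermost function) and inclusive samples (every function on the stack). Functions with high exclusive counts are doing the most individual work; high inclusive counts trace the hot path. busy_loop is a hypothetical stand-in workload.

```python
import sys
import threading
import time
from collections import Counter


def sample_profile(target, interval: float = 0.005):
    """Run target() while a background thread samples its call stack every
    `interval` seconds. Returns (exclusive, inclusive) Counters keyed by
    function name. A toy illustration of statistical sampling."""
    exclusive, inclusive = Counter(), Counter()
    target_thread_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        while not done.is_set():
            frame = sys._current_frames().get(target_thread_id)
            if frame is not None:
                exclusive[frame.f_code.co_name] += 1  # innermost function
                seen = set()
                while frame is not None:              # walk up the stack
                    seen.add(frame.f_code.co_name)
                    frame = frame.f_back
                inclusive.update(seen)
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target()
    finally:
        done.set()
        thread.join()
    return exclusive, inclusive


def busy_loop(seconds: float) -> int:
    """Simulated CPU-bound work for the demo."""
    end = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < end:
        count += 1
    return count


if __name__ == "__main__":
    exclusive, inclusive = sample_profile(lambda: busy_loop(0.5))
    print("exclusive:", exclusive.most_common(3))
    print("inclusive:", inclusive.most_common(3))
```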

Sampling is the most common approach to application profiling. There are other methods as well, but they may come with more performance overhead or require more application instrumentation.

Make sure you enable the appropriate settings when publishing your application to Azure:

  • In Solution Explorer, open the shortcut menu for your Azure project, and then choose Publish.
  • On the Advanced Settings tab, select the Enable profiling check box.


  • Choose Settings, select Sampling and Enable Tier Interaction Profiling, and click OK.
  • Click Next and publish the application.


Once your profiling session has run, you can view the profiling reports. To get them, do the following:

  • In Visual Studio Server Explorer, expand Azure -> Cloud Services -> Your_Cloud_Role -> Production (Profiling), right-click the instance, and click View Profiling Report.
  • It may take a couple of minutes for the profiling report to show up. Click the Save icon to save a local copy for analysis.


  • Once you are done profiling, you may republish the application with that option unchecked.

Happy performance hunting. For more see – http://msdn.microsoft.com/library/azure/hh369930.aspx.

Forecast Cloudy – NGINX On Microsoft Azure


NGINX (pronounced “engine x”) is web server software. It can also act as a reverse proxy server for TCP, UDP, HTTP, HTTPS, SMTP, POP3, and IMAP protocols, as well as a load balancer and an HTTP cache. NGINX comes in two flavors: a free, open-source basic version and a more advanced commercial Plus version. Here is a feature comparison of both from the NGINX docs:

https://www.nginx.com/products/feature-matrix/

As you may know, the NGINX Plus flavor is already available as VMs in the Azure Marketplace. So to set it up in Azure, all I had to do was the following:

  • Go to the Azure Marketplace and find NGINX, Inc.
  • Pick Resource Group or Classic Deployment (You will want to pick Azure Resource Group – see below)
  • Pick a VM size (I picked the smallest, Standard A1, for my experiment)


This is all done via the portal, but what if you want to use Azure Resource Manager templates to automate it?

Azure originally provided only the classic deployment model. In this model, each resource existed independently; there was no way to group related resources together. Instead, you had to manually track which resources made up your solution or application, and remember to manage them in a coordinated approach. To deploy a solution, you had to either create each resource individually through the classic portal or create a script that deployed all of the resources in the correct order. To delete a solution, you had to delete each resource individually. You could not easily apply and update access control policies for related resources. Finally, you could not apply tags to resources to label them with terms that help you monitor your resources and manage billing.

With the introduction of the new Azure Portal, it is possible to view resources (such as websites, virtual machines and databases) as a single logical unit. This logical unit is called a resource group. The following screenshot shows a resource group called ecommerce-westus, which contains a website, a SQL database and an Application Insights instance.

[Screenshot: the ecommerce-westus resource group in the Azure Portal]

Resource groups aren’t visible in the old portal, but almost everything within your Azure subscription exists within a set of resource groups that were created by default when the preview portal opened for business. If you access the new portal, press the Browse button on the jump bar (the icons down the left hand side of the screen) and navigate to resource groups you should see a whole bunch listed.

The benefit of resource groups is that they allow an Azure administrator to roll-up billing and monitoring information for resources within a resource group and manage access to those resources as a set. This can be extremely useful when you have a single subscription but you need to do cost recovery on the resources used by a customer or internal department.
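Every Azure resource ID carries its resource group name (/subscriptions/<id>/resourceGroups/<name>/providers/...), the same pattern the template later in this post builds with concat(). A minimal Python sketch of the billing roll-up idea, assuming you already have (resource ID, cost) pairs, for example from a usage export; the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def resource_group_of(resource_id: str) -> str:
    """Extract the resource group name from an Azure resource ID such as
    /subscriptions/<id>/resourceGroups/<name>/providers/..."""
    parts = resource_id.strip("/").split("/")
    lowered = [p.lower() for p in parts]
    # Segment names are matched case-insensitively; raises ValueError if absent
    idx = lowered.index("resourcegroups")
    return parts[idx + 1]


def cost_by_resource_group(records) -> dict:
    """Sum (resource_id, cost) pairs per resource group.
    Resource group names are case-insensitive, so keys are lowercased."""
    totals = defaultdict(float)
    for resource_id, cost in records:
        totals[resource_group_of(resource_id).lower()] += cost
    return dict(totals)


# Hypothetical usage records: (resource ID, cost)
records = [
    ("/subscriptions/0000/resourceGroups/ecommerce-westus/providers/Microsoft.Web/sites/shop", 10.0),
    ("/subscriptions/0000/resourceGroups/ecommerce-westus/providers/Microsoft.Sql/servers/shopdb", 5.5),
    ("/subscriptions/0000/resourceGroups/gennadykngnix/providers/Microsoft.Compute/virtualMachines/nginx1", 3.0),
]
print(cost_by_resource_group(records))  # {'ecommerce-westus': 15.5, 'gennadykngnix': 3.0}
```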

For more on Azure Resource Manager, see the docs – https://azure.microsoft.com/en-us/documentation/articles/resource-group-overview/, https://channel9.msdn.com/Shows/Edge/Edge-Show-121-Azure-Resource-Manager, http://www.codeproject.com/Articles/854592/Using-Azure-Resource-Manager

Azure Resource Manager enables a new deployment paradigm through ARM templates – https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-deployment-model

Below is a sample template to deploy the NGINX VM:

{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "location": {
            "type": "String"
        },
        "virtualMachineName": {
            "type": "String"
        },
        "virtualMachineSize": {
            "type": "String"
        },
        "adminUsername": {
            "type": "String"
        },
        "storageAccountName": {
            "type": "String"
        },
        "virtualNetworkName": {
            "type": "String"
        },
        "networkInterfaceName": {
            "type": "String"
        },
        "networkSecurityGroupName": {
            "type": "String"
        },
        "adminPublicKey": {
            "type": "String"
        },
        "availabilitySetName": {
            "type": "String"
        },
        "availabilitySetPlatformFaultDomainCount": {
            "type": "String"
        },
        "availabilitySetPlatformUpdateDomainCount": {
            "type": "String"
        },
        "storageAccountType": {
            "type": "String"
        },
        "diagnosticsStorageAccountName": {
            "type": "String"
        },
        "diagnosticsStorageAccountId": {
            "type": "String"
        },
        "diagnosticsStorageAccountType": {
            "type": "String"
        },
        "addressPrefix": {
            "type": "String"
        },
        "subnetName": {
            "type": "String"
        },
        "subnetPrefix": {
            "type": "String"
        },
        "publicIpAddressName": {
            "type": "String"
        },
        "publicIpAddressType": {
            "type": "String"
        }
    },
    "variables": {
        "vnetId": "[resourceId('gennadykngnix','Microsoft.Network/virtualNetworks', parameters('virtualNetworkName'))]",
        "subnetRef": "[concat(variables('vnetId'), '/subnets/', parameters('subnetName'))]",
        "metricsresourceid": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/', resourceGroup().name, '/providers/', 'Microsoft.Compute/virtualMachines/', parameters('virtualMachineName'))]",
        "metricsclosing": "[concat('')]",
        "metricscounters": "",
        "metricsstart": "",
        "wadcfgx": "[concat(variables('metricsstart'), variables('metricscounters'), variables('metricsclosing'))]",
        "diagnosticsExtensionName": "Microsoft.Insights.VMDiagnosticsSettings"
    },
    "resources": [
        {
            "type": "Microsoft.Compute/virtualMachines",
            "name": "[parameters('virtualMachineName')]",
            "apiVersion": "2015-06-15",
            "location": "[parameters('location')]",
            "plan": {
                "name": "nginx-plus",
                "publisher": "nginxinc",
                "product": "nginx-plus-v1"
            },
            "properties": {
                "osProfile": {
                    "computerName": "[parameters('virtualMachineName')]",
                    "adminUsername": "[parameters('adminUsername')]",
                    "linuxConfiguration": {
                        "disablePasswordAuthentication": "true",
                        "ssh": {
                            "publicKeys": [
                                {
                                    "path": "[concat('/home/', parameters('adminUsername'), '/.ssh/authorized_keys')]",
                                    "keyData": "[parameters('adminPublicKey')]"
                                }
                            ]
                        }
                    }
                },
                "hardwareProfile": {
                    "vmSize": "[parameters('virtualMachineSize')]"
                },
                "storageProfile": {
                    "imageReference": {
                        "publisher": "nginxinc",
                        "offer": "nginx-plus-v1",
                        "sku": "nginx-plus",
                        "version": "latest"
                    },
                    "osDisk": {
                        "name": "[parameters('virtualMachineName')]",
                        "vhd": {
                            "uri": "[concat(concat(reference(resourceId('gennadykngnix', 'Microsoft.Storage/storageAccounts', parameters('storageAccountName')), '2015-06-15').primaryEndpoints['blob'], 'vhds/'), parameters('virtualMachineName'), '20161130115146.vhd')]"
                        },
                        "createOption": "fromImage"
                    },
                    "dataDisks": []
                },
                "networkProfile": {
                    "networkInterfaces": [
                        {
                            "id": "[resourceId('Microsoft.Network/networkInterfaces', parameters('networkInterfaceName'))]"
                        }
                    ]
                },
                "diagnosticsProfile": {
                    "bootDiagnostics": {
                        "enabled": true,
                        "storageUri": "[reference(resourceId('gennadykngnix', 'Microsoft.Storage/storageAccounts', parameters('diagnosticsStorageAccountName')), '2015-06-15').primaryEndpoints['blob']]"
                    }
                },
                "availabilitySet": {
                    "id": "[resourceId('Microsoft.Compute/availabilitySets', parameters('availabilitySetName'))]"
                }
            },
            "dependsOn": [
                "[concat('Microsoft.Network/networkInterfaces/', parameters('networkInterfaceName'))]",
                "[concat('Microsoft.Compute/availabilitySets/', parameters('availabilitySetName'))]",
                "[concat('Microsoft.Storage/storageAccounts/', parameters('storageAccountName'))]",
                "[concat('Microsoft.Storage/storageAccounts/', parameters('diagnosticsStorageAccountName'))]"
            ]
        },
        {
            "type": "Microsoft.Compute/virtualMachines/extensions",
            "name": "[concat(parameters('virtualMachineName'),'/', variables('diagnosticsExtensionName'))]",
            "apiVersion": "2015-06-15",
            "location": "[parameters('location')]",
            "properties": {
                "publisher": "Microsoft.OSTCExtensions",
                "type": "LinuxDiagnostic",
                "typeHandlerVersion": "2.3",
                "autoUpgradeMinorVersion": true,
                "settings": {
                    "StorageAccount": "[parameters('diagnosticsStorageAccountName')]",
                    "xmlCfg": "[base64(variables('wadcfgx'))]"
                },
                "protectedSettings": {
                    "storageAccountName": "[parameters('diagnosticsStorageAccountName')]",
                    "storageAccountKey": "[listKeys(parameters('diagnosticsStorageAccountId'),'2015-06-15').key1]",
                    "storageAccountEndPoint": "https://core.windows.net/"
                }
            },
            "dependsOn": [
                "[concat('Microsoft.Compute/virtualMachines/', parameters('virtualMachineName'))]"
            ]
        },
        {
            "type": "Microsoft.Compute/availabilitySets",
            "name": "[parameters('availabilitySetName')]",
            "apiVersion": "2015-06-15",
            "location": "[parameters('location')]",
            "properties": {
                "platformFaultDomainCount": "[parameters('availabilitySetPlatformFaultDomainCount')]",
                "platformUpdateDomainCount": "[parameters('availabilitySetPlatformUpdateDomainCount')]"
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts",
            "name": "[parameters('storageAccountName')]",
            "apiVersion": "2015-06-15",
            "location": "[parameters('location')]",
            "properties": {
                "accountType": "[parameters('storageAccountType')]"
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts",
            "name": "[parameters('diagnosticsStorageAccountName')]",
            "apiVersion": "2015-06-15",
            "location": "[parameters('location')]",
            "properties": {
                "accountType": "[parameters('diagnosticsStorageAccountType')]"
            }
        },
        {
            "type": "Microsoft.Network/virtualNetworks",
            "name": "[parameters('virtualNetworkName')]",
            "apiVersion": "2016-09-01",
            "location": "[parameters('location')]",
            "properties": {
                "addressSpace": {
                    "addressPrefixes": [
                        "[parameters('addressPrefix')]"
                    ]
                },
                "subnets": [
                    {
                        "name": "[parameters('subnetName')]",
                        "properties": {
                            "addressPrefix": "[parameters('subnetPrefix')]"
                        }
                    }
                ]
            }
        },
        {
            "type": "Microsoft.Network/networkInterfaces",
            "name": "[parameters('networkInterfaceName')]",
            "apiVersion": "2016-09-01",
            "location": "[parameters('location')]",
            "properties": {
                "ipConfigurations": [
                    {
                        "name": "ipconfig1",
                        "properties": {
                            "subnet": {
                                "id": "[variables('subnetRef')]"
                            },
                            "privateIPAllocationMethod": "Dynamic",
                            "publicIpAddress": {
                                "id": "[resourceId('gennadykngnix','Microsoft.Network/publicIpAddresses', parameters('publicIpAddressName'))]"
                            }
                        }
                    }
                ],
                "networkSecurityGroup": {
                    "id": "[resourceId('gennadykngnix', 'Microsoft.Network/networkSecurityGroups', parameters('networkSecurityGroupName'))]"
                }
            },
            "dependsOn": [
                "[concat('Microsoft.Network/virtualNetworks/', parameters('virtualNetworkName'))]",
                "[concat('Microsoft.Network/publicIpAddresses/', parameters('publicIpAddressName'))]",
                "[concat('Microsoft.Network/networkSecurityGroups/', parameters('networkSecurityGroupName'))]"
            ]
        },
        {
            "type": "Microsoft.Network/publicIpAddresses",
            "name": "[parameters('publicIpAddressName')]",
            "apiVersion": "2016-09-01",
            "location": "[parameters('location')]",
            "properties": {
                "publicIpAllocationMethod": "[parameters('publicIpAddressType')]"
            }
        },
        {
            "type": "Microsoft.Network/networkSecurityGroups",
            "name": "[parameters('networkSecurityGroupName')]",
            "apiVersion": "2016-09-01",
            "location": "[parameters('location')]",
            "properties": {
                "securityRules": [
                    {
                        "name": "default-allow-ssh",
                        "properties": {
                            "priority": 1000,
                            "sourceAddressPrefix": "*",
                            "protocol": "TCP",
                            "destinationPortRange": "22",
                            "access": "Allow",
                            "direction": "Inbound",
                            "sourcePortRange": "*",
                            "destinationAddressPrefix": "*"
                        }
                    }
                ]
            }
        }
    ],
    "outputs": {
        "adminUsername": {
            "type": "String",
            "value": "[parameters('adminUsername')]"
        }
    }
}
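The dependsOn arrays in the template above are what let Azure Resource Manager work out deployment order for you. For example, the network interface waits for the virtual network, public IP address and network security group. As a rough illustration only (not how ARM is implemented internally), here is a minimal Python 3.9+ sketch that derives a valid order from such dependencies with a topological sort from the standard graphlib module. The resource names are simplified stand-ins for the concat()/resourceId() expressions, and the virtualMachine entry depending on the NIC is an assumption about the part of the template not shown here.

```python
from graphlib import TopologicalSorter, CycleError

# Simplified stand-ins for resources in the template above.
# Each key lists the resources named in its "dependsOn" array.
# "virtualMachine" is a hypothetical entry assumed to depend on the NIC.
DEPENDS_ON = {
    "virtualNetwork": [],
    "publicIpAddress": [],
    "networkSecurityGroup": [],
    "networkInterface": ["virtualNetwork", "publicIpAddress", "networkSecurityGroup"],
    "virtualMachine": ["networkInterface"],
}


def deployment_order(depends_on):
    """Return an order in which every resource comes after its dependencies."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # A circular dependsOn chain can never be deployed.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


if __name__ == "__main__":
    print(deployment_order(DEPENDS_ON))
```

Resources with no path between them (the VNet, public IP and NSG here) can be deployed in any relative order, which is also why ARM is free to create them in parallel.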

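Obviously, parameters.json holds my own values, such as storage account and resource group names, VM sizes, network names and, finally, the SSH key. All of these values will be different for you. Note that the template above also hard-codes my resource group name (gennadykngnix) in its resourceId() calls, so you will want to change that as well.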
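For reference, an ARM parameters file is itself plain JSON: a $schema, a contentVersion and a parameters object in which every entry carries a value. A minimal Python sketch that produces one for some of the parameter names referenced in the template above. All values are hypothetical placeholders, not my actual settings, and the SSH key parameter is left out because its name is not shown in this part of the template.

```python
import json

SCHEMA = "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#"


def build_parameters(values):
    """Wrap plain name/value pairs in the ARM deployment-parameters layout."""
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


# Hypothetical values; the names match parameters('...') calls in the template.
my_values = {
    "location": "eastus",
    "virtualNetworkName": "nginx-vnet",
    "publicIpAddressName": "nginx-ip",
    "publicIpAddressType": "Dynamic",
    "networkSecurityGroupName": "nginx-nsg",
    "adminUsername": "azureuser",
}

if __name__ == "__main__":
    # Redirect stdout to parameters.json to use the result.
    print(json.dumps(build_parameters(my_values), indent=4))
```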

More on using NGINX on Azure as a load balancer or reverse proxy is available here: https://www.nginx.com/products/nginx-plus-microsoft-azure/

Hope this helps.

Forecast Cloudy – Using Azure Blob Storage with Apache Hive on HDInsight

The beauty of working with Big Data in Azure is that you can manage (create and delete) the compute resources of your HDInsight cluster independently of your data, which is stored either in Azure Data Lake or in Azure Blob storage. In this post I will concentrate on using Azure Blob storage (WASB) as the data store for HDInsight, the Azure PaaS Hadoop service.

With a typical Hadoop installation, you load your data to a staging location and then import it into the Hadoop Distributed File System (HDFS) within a single Hadoop cluster. That data is manipulated, massaged and transformed. Then you may export some or all of the data back as a result set for consumption by other systems (think Power BI, Tableau, etc.).

Windows Azure Storage Blob (WASB) is an extension built on top of the HDFS APIs. The WASBS variation uses SSL certificates for improved security. In many ways it “is” HDFS. However, WASB creates a layer of abstraction that separates storage from compute. This separation is what lets your data persist even when no clusters currently exist, and lets multiple clusters plus other applications access a single piece of data at the same time. This increases functionality and flexibility while reducing costs and shortening the time from question to insight.


In Azure, you store blobs in containers within Azure storage accounts. You grant access to a storage account, you create collections at the container level, and you place blobs (files of any format) inside the containers. This illustration from Microsoft’s documentation helps to show the structure:

[Illustration: storage account, container and blob structure]

Hold on, isn’t the whole selling point of Hadoop the proximity of data to compute? Yes, and just as with any other Hadoop system on premises, data is loaded into memory on the individual nodes at compute time. Because Azure’s data infrastructure and the backbone within its data centers are built for performance, your job performance is generally the same as, or better than, if you used disks locally attached to the VMs.

Below is a diagram of the HDInsight data storage architecture:

[Diagram: HDInsight data storage architecture]

HDInsight provides access to the distributed file system that is locally attached to the compute nodes. This file system can be accessed by using the fully qualified URI, for example:

hdfs:///

More important is the ability to access data stored in Azure Storage. The syntax is:

wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>

As per https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage, you need to be aware of the following:

  • Container security for WASB storage. For containers in storage accounts that are connected to the cluster, you have full access to the blobs in those containers, because the account name and key are associated with the cluster during creation. For public containers that are not connected to the cluster, you have read-only permission to the blobs. For private containers in storage accounts that are not connected to the cluster, you can’t access the blobs unless you define the storage account when you submit WebHCat jobs.
  • The storage accounts defined during cluster creation, and their keys, are stored in %HADOOP_HOME%/conf/core-site.xml on the cluster nodes. The default behavior of HDInsight is to use the storage accounts defined in core-site.xml. Editing core-site.xml directly is not recommended, because the cluster head node (master) may be reimaged or migrated at any time, and any changes to this file are not persisted.
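The container-access rules in the first bullet boil down to a small decision table. This Python sketch encodes them exactly as stated in that documentation and nothing more; the enum and function names are my own, hypothetical ones.

```python
from enum import Enum


class Access(Enum):
    FULL = "full access"
    READ_ONLY = "read-only"
    ONLY_IF_DEFINED_AT_SUBMIT = "no access unless the account is defined when submitting WebHCat jobs"


def wasb_access(connected_to_cluster: bool, public_container: bool) -> Access:
    """Blob access from an HDInsight cluster, per the container-security rules."""
    if connected_to_cluster:
        # The account name and key were associated with the cluster at creation.
        return Access.FULL
    if public_container:
        # Public container in an account the cluster does not know about.
        return Access.READ_ONLY
    # Private container in an account that is not connected to the cluster.
    return Access.ONLY_IF_DEFINED_AT_SUBMIT
```

For example, wasb_access(False, True) returns Access.READ_ONLY.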

You can easily create a new storage account, or point an existing one to your HDInsight cluster, via the portal as I show below:

[Screenshot: configuring the storage account for an HDInsight cluster in the portal]

You can point your HDInsight cluster to multiple storage accounts as well, as explained here – https://blogs.msdn.microsoft.com/mostlytrue/2014/09/03/hdinsight-working-with-different-storage-accounts/

You can also create a storage account and container via Azure PowerShell, as in this sample:

$SubscriptionID = "<Your Azure Subscription ID>"
$ResourceGroupName = "<New Azure Resource Group Name>"
$Location = "East US 2"

$StorageAccountName = "<New Azure Storage Account Name>"
$ContainerName = "<New Azure Blob Container Name>"

# Sign in and select the subscription to work in
Add-AzureRmAccount
Select-AzureRmSubscription -SubscriptionId $SubscriptionID

# Create resource group
New-AzureRmResourceGroup -Name $ResourceGroupName -Location $Location

# Create default storage account
New-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName -Location $Location -Type Standard_LRS

# Create default blob container (authenticate with the first account key)
$StorageAccountKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $ResourceGroupName -StorageAccountName $StorageAccountName)[0].Value
$DestContext = New-AzureStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $StorageAccountKey
New-AzureStorageContainer -Name $ContainerName -Context $DestContext

 

The URI scheme for accessing files in Azure storage from HDInsight is:

wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>

The URI scheme provides unencrypted access (with the wasb: prefix) and SSL encrypted access (with wasbs). Microsoft recommends using wasbs wherever possible, even when accessing data that lives inside the same region in Azure.

The <BlobStorageContainerName> identifies the name of the blob container in Azure storage. The <StorageAccountName> identifies the Azure Storage account name. A fully qualified domain name (FQDN) is required.
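To make the URI scheme concrete, here is a small Python sketch that builds and parses such URIs. It defaults to wasbs, following Microsoft’s recommendation above; the helper names and the example container and account names are hypothetical.

```python
from urllib.parse import urlsplit

SUFFIX = ".blob.core.windows.net"


def build_wasb_uri(container, account, path="", secure=True):
    """Compose wasb[s]://<container>@<account>.blob.core.windows.net/<path>."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}{SUFFIX}/{path.lstrip('/')}"


def parse_wasb_uri(uri):
    """Split a WASB URI back into (scheme, container, account, path)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri}")
    # The container sits in the user-info slot, before the "@".
    container, _, host = parts.netloc.partition("@")
    account = host[: -len(SUFFIX)] if host.endswith(SUFFIX) else ""
    if not container or not account:
        raise ValueError(f"expected <container>@<account>{SUFFIX}: {uri}")
    return parts.scheme, container, account, parts.path.lstrip("/")
```

For example, build_wasb_uri("logs", "mystorage", "input/log1.txt") gives wasbs://logs@mystorage.blob.core.windows.net/input/log1.txt, and parse_wasb_uri turns that back into its four parts.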

I ran into a rather crazy little limitation when working with WASB and HDInsight. Hadoop and Hive look for, and expect, a valid folder hierarchy from which to import data files, whereas WASB does not support a folder hierarchy; all blobs are listed flat under a container. The workaround is to open an SSH session to the cluster head node and use the mkdir command to manually create such a directory via the WASB driver.

The SSH procedure for HDInsight is described here – https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-linux-use-ssh-unix

Another workaround recommended to me relies on the fact that the “/” character can be used within a blob’s key name to make it appear as if the file is stored within a directory structure. HDInsight sees these as if they are actual directories. For example, a blob’s key may be input/log1.txt. No actual “input” directory exists, but because of the “/” character in the key name, the key has the appearance of a file path.
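To see how a flat list of keys can still look like a tree, here is a Python sketch that derives the immediate “subdirectories” and files under a prefix by splitting blob names on “/”. It only mimics the idea; it is not the WASB driver’s code, and the blob names are made up.

```python
def list_virtual_dir(blob_names, prefix=""):
    """Return (subdirectories, files) directly under prefix in a flat namespace."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    dirs, files = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # Another "/" further along means this key lives in a virtual directory.
            dirs.add(head)
        elif rest:
            files.append(rest)
    return sorted(dirs), sorted(files)


# Hypothetical container contents: every blob sits directly in the container.
blobs = ["input/log1.txt", "input/log2.txt", "input/2017/log3.txt", "readme.txt"]
```

Here list_virtual_dir(blobs) reports a single directory, input, plus readme.txt, even though no “input” object exists anywhere in the container.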

For more, see – https://social.technet.microsoft.com/wiki/contents/articles/6621.azure-hdinsight-faq.aspx, https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage, https://www.codeproject.com/articles/597940/understandingpluswindowsplusazureplusblobplusstora

Hope this helps.

Forecast Cloudy – Azure Resource Manager Templates

Azure originally provided only the classic deployment model. In this model, each resource existed independently; there was no way to group related resources together. Instead, you had to manually track which resources made up your solution or application, and remember to manage them in a coordinated approach. To deploy a solution, you had to either create each resource individually through the classic portal or create a script that deployed all of the resources in the correct order. To delete a solution, you had to delete each resource individually. You could not easily apply and update access control policies for related resources. Finally, you could not apply tags to resources to label them with terms that help you monitor your resources and manage billing.

With the introduction of the new Azure Portal, it’s possible to view resources (such as websites, virtual machines and databases) as a single logical unit. This logical unit is called a resource group. The following screenshot shows a resource group called ecommerce-westus, which contains a website, a SQL database and an Application Insights instance.

[Screenshot: the ecommerce-westus resource group]

 

Resource groups aren’t visible in the old portal, but almost everything within your Azure subscription exists within a set of resource groups that were created by default when the preview portal opened for business. If you open the new portal, press the Browse button on the jump bar (the icons down the left-hand side of the screen) and navigate to Resource groups, you should see a whole bunch listed.

The benefit of resource groups is that they allow an Azure administrator to roll-up billing and monitoring information for resources within a resource group and manage access to those resources as a set. This can be extremely useful when you have a single subscription but you need to do cost recovery on the resources used by a customer or internal department.
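Every ARM resource ID embeds its resource group (/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/...), which is what makes a per-group roll-up straightforward. A minimal Python sketch of the idea, grouping per-resource costs by resource group; the subscription ID, resource names and amounts are invented purely for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def resource_group_of(resource_id):
    """Extract the resource group name from an ARM resource ID."""
    segments = resource_id.strip("/").split("/")
    lowered = [s.lower() for s in segments]  # IDs are case-insensitive
    try:
        return segments[lowered.index("resourcegroups") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no resource group in ID: {resource_id}") from None


def cost_by_resource_group(costs):
    """Sum per-resource costs into per-resource-group totals."""
    totals = defaultdict(Decimal)
    for resource_id, amount in costs.items():
        totals[resource_group_of(resource_id)] += Decimal(amount)
    return dict(totals)


# Hypothetical data: a fake subscription ID and made-up monthly amounts.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
costs = {
    f"{SUB}/resourceGroups/ecommerce-westus/providers/Microsoft.Web/sites/shop": "41.50",
    f"{SUB}/resourceGroups/ecommerce-westus/providers/Microsoft.Sql/servers/shopsql": "120.00",
    f"{SUB}/resourceGroups/test/providers/Microsoft.Compute/virtualMachines/web1": "75.25",
}
```

Decimal is used instead of float so that currency sums do not pick up binary rounding error.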

For more on Azure Resource Manager, see the docs – https://azure.microsoft.com/en-us/documentation/articles/resource-group-overview/ , https://channel9.msdn.com/Shows/Edge/Edge-Show-121-Azure-Resource-Manager , http://www.codeproject.com/Articles/854592/Using-Azure-Resource-Manager

As an example, I created a template in Visual Studio 2015. This sample, an ARM template plus a parameters file, creates two Windows Server 2012 R2 VMs with IIS configured using DSC. It also deploys one SQL Server 2014 Standard Edition VM, a VNet with two subnets, an NSG, a load balancer, and NAT and probe rules.

[Image: resources defined by the sample template]

To start, I used one of the open-source templates available on GitHub and simply added or modified resources.

So how does one author these templates in VS 2015?

Prerequisites: You will need Visual Studio 2015 installed. In my case, I have VS 2015 Update 1 and Azure Tools SDK v2.8.2.

Create an Azure Resource Group Project.

When you first create an Azure Resource Group Project in Visual Studio, you are presented with a list of common templates you can use to start defining your desired infrastructure, for example a set of virtual machines that could be used to run a scalable web server workload.

[Screenshot: Azure Resource Group project templates in Visual Studio]

Project structure for Resource Group Project.

Using the Blank Template results in a project that contains everything you need to begin authoring and eventually deploying your template. As shown below, the project structure contains three folders: Scripts, Templates and Tools.

[Screenshot: Resource Group project structure]

The Scripts folder contains the PowerShell script that is used to deploy the environment you describe in the project. This is a fully functional script that essentially requires no editing. It just works! However, if you have a need to modify the script to support some advanced deployment scenarios you can do so.

The Templates folder contains the JSON files that describe the environment you want to create. The azuredeploy.json file is where you describe the resources you want in your environment, such as the storage account, virtual machine, and other required resources. The azuredeploy.parameters.json file is where you can provide parameter values that you want passed into the template during deployment, such as the type of storage account, size of the virtual machine, credentials to use to sign into the virtual machine, and so on.
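The split between azuredeploy.json and azuredeploy.parameters.json works because, at deployment time, a value supplied in the parameters file takes precedence over the template’s defaultValue, a parameter with neither is an error, and a parameter the template does not declare is rejected. A simplified Python sketch of that resolution step; it deliberately ignores expressions in defaultValue, Key Vault references and type checking, and the sample parameters are hypothetical.

```python
def resolve_parameters(template, parameters_file):
    """Merge supplied parameter values over template defaults (simplified)."""
    declared = template.get("parameters", {})
    supplied = parameters_file.get("parameters", {})

    # Values for parameters the template never declares are rejected.
    unknown = set(supplied) - set(declared)
    if unknown:
        raise ValueError(f"parameters not declared in template: {sorted(unknown)}")

    resolved, missing = {}, []
    for name, spec in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]["value"]  # parameters file wins
        elif "defaultValue" in spec:
            resolved[name] = spec["defaultValue"]  # fall back to the default
        else:
            missing.append(name)  # required but not provided
    if missing:
        raise ValueError(f"no value for required parameters: {missing}")
    return resolved


# Hypothetical template and parameters file contents.
template = {
    "parameters": {
        "storageAccountType": {"type": "string", "defaultValue": "Standard_LRS"},
        "vmSize": {"type": "string", "defaultValue": "Standard_A2"},
        "adminUsername": {"type": "string"},
    }
}
parameters_file = {
    "parameters": {
        "vmSize": {"value": "Standard_D2"},
        "adminUsername": {"value": "azureuser"},
    }
}
```

With these inputs, vmSize comes from the parameters file, storageAccountType keeps its default, and removing adminUsername from the file would make the deployment fail.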

The Tools folder contains AzCopy.exe, which is used in scenarios where your deployment template needs to copy files to a storage account used for deploying application and configuration files into a resource. For a virtual machine, this typically would be Desired State Configuration (DSC) and DSC resources that you want pushed into the virtual machine after it is provisioned, to do things like configure Active Directory or IIS Web Server roles in the virtual machine. The script in the Scripts folder uses AzCopy from the Tools folder to copy these kinds of artifacts to a storage account, so that ARM can then copy them from storage into the virtual machine after the instance is fully provisioned.
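The staging step amounts to uploading local artifacts to a storage container and then handing the template a base URL plus a SAS token, so that ARM can fetch each file. The Visual Studio templates of this period pass these in as _artifactsLocation and _artifactsLocationSasToken parameters; treat those names as an assumption to check against your own template. A Python sketch of how the per-file download URLs are put together; the storage account, container, token and file name are all hypothetical.

```python
def artifact_urls(artifacts_location, sas_token, relative_paths):
    """Map staged artifact paths to the URLs a template would download from."""
    base = artifacts_location.rstrip("/")
    # A SAS token is a query string; make sure exactly one "?" precedes it.
    token = sas_token.lstrip("?")
    suffix = f"?{token}" if token else ""
    urls = {}
    for path in relative_paths:
        # Visual Studio runs on Windows, so normalise backslashes for the URL.
        blob_path = path.replace("\\", "/").lstrip("/")
        urls[path] = f"{base}/{blob_path}{suffix}"
    return urls


# Hypothetical staging container and DSC package produced by the deploy script.
urls = artifact_urls(
    "https://mystaging.blob.core.windows.net/myrg-stageartifacts/",
    "?sv=2015-04-05&sig=REDACTED",
    ["DSC\\WebServerConfig.ps1.zip"],
)
```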

More information is available here – https://azure.microsoft.com/en-us/documentation/articles/vs-azure-tools-resource-groups-deployment-projects-create-deploy/ , https://azure.microsoft.com/en-us/blog/azure-resource-manager-2-5-for-visual-studio/

In my case, the finished VS project structure looks like this:

[Screenshot: finished Visual Studio project structure]

Note that AzureDeploy.json is the JSON template describing all of the infrastructure mentioned above that I want to deploy in Azure, and Deploy-AzureResourceGroup.ps1 is the PowerShell script that connects to Azure and deploys it. There are also parameters files with passwords, subscription names, etc., and finally a test project to test the deployment from VS, all part of my test solution in Visual Studio.

Using my test template project, I created all of the above resources in a resource group named test in the Central US data center:

[Screenshot: the deployed resources in resource group test, Central US]

Obviously these are just first baby steps, but they should show you what is possible with this technology. For more on Azure ARM templates, see – https://blogs.perficient.com/microsoft/2016/03/azure-arm-template-define-web-app-application-settings/, https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-create-first-template, https://blogs.msdn.microsoft.com/kaevans/2015/11/22/creating-arm-templates-with-azure-resource-explorer/, https://azure.microsoft.com/en-us/resources/templates/

Hope this helps.