AutoRecovery in the Public Cloud

“Everything fails, all the time.” – Werner Vogels

While VNS3 is extremely stable, it is not immune to the underlying hardware and network issues that public cloud vendors experience. VNS3 provides a variety of methods to achieve High Availability and instance replacement. However, all of that takes place above the customer responsibility line. What can you do for your cloud deployment to protect yourself from the inevitable failures that take place below the line?

On top of the solutions offered by public cloud providers, Cohesive Networks offers a variety of methods for achieving instance and network recovery, whether it be BGP distance weighting, Cisco-style Preferred Peer lists, or our Management Server (VNS3:ms), which will programmatically replace a running instance or facilitate Active & Passive running of VNS3 instances. Keep an eye on this blog space for further discussions in these key areas.

AutoRecovery in AWS via CloudWatch

Amazon Web Services has perhaps the most comprehensive function for protecting yourself from underlying failures. They offer what they call a CloudWatch alarm action. This monitor is tied to your instance ID. Should AWS status checks fail, your instance will be brought up on new hardware while retaining its instance ID, private IP, any Elastic IPs and all associated metadata. You get to set the periodicity of the check and the total number of failed checks that will kick off the migration. So if you need assurance that your instance will get moved to good hardware after as little as two minutes, you can set it as such. From a VNS3 perspective, this ensures that any IPSec tunnels will get reestablished, any overlay clients will reconnect and any route table rules pointing to the instance will maintain health once the instance has recovered. On top of all of this, you can configure it to publish any alarm states to an SNS topic so that you receive notification should this occur. Cohesive Networks highly recommends that you set this up for all VNS3 controllers and Management Servers.

You can find out more about configuring AWS CloudWatch alarm actions here:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html
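As a rough illustration of the alarm described above, here is a minimal Python sketch. It assumes the boto3 SDK and the EC2 system status check metric (StatusCheckFailed_System); the alarm name, the helper function names and the default values are our own placeholders, not anything VNS3 ships. A 60 second period with two evaluation periods gives the two minute recovery window mentioned earlier.

```python
def recovery_alarm_params(instance_id, region, period_seconds=60,
                          failed_checks=2, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm name and defaults are illustrative assumptions.
    """
    # The EC2 "recover" action moves the instance to healthy hardware.
    actions = [f"arn:aws:automate:{region}:ec2:recover"]
    if sns_topic_arn:
        # Optionally notify an SNS topic when the alarm fires.
        actions.append(sns_topic_arn)
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": period_seconds,            # how often the check is evaluated
        "EvaluationPeriods": failed_checks,  # consecutive failures before acting
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0.0,
        "AlarmActions": actions,
    }


def create_recovery_alarm(instance_id, region, **kwargs):
    """Create the alarm; needs AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the parameter builder works without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **recovery_alarm_params(instance_id, region, **kwargs)
    )
```

Calling create_recovery_alarm("i-0123456789abcdef0", "us-east-1") with your own instance ID would register the alarm. Check the AWS documentation linked above for which instance types support the recover action.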

Service Healing in Azure Cloud

The Microsoft Azure cloud has the concept of “Service Healing.” While it is not user configurable, it is not dissimilar from AWS in that Azure monitors the underlying health of the virtual machines and hypervisors in its data centers and will auto-recover virtual machines should they or their hypervisors fail. This process is managed by their Fabric Controllers, which themselves have built-in fault tolerance. As of now Azure does not provide any user controls over this process, nor notifications. The process can take up to 15 minutes to complete, since the first action is to reboot the physical server that the virtual machines run on; failing that, Azure will proceed to migrate the VMs to other hardware. Azure does state that they employ some level of deterministic methodologies for proactive auto-recovery.

Live Migration in Google Cloud

The Google Cloud Platform has taken a fairly different approach. Over at the Google cloud, all instances are set to “Live Migrate” by default. So should there be a hardware degradation and not a total failure, your VM will be migrated to new hardware with some loss of performance during the process. If there is a total failure, your VM will be rebooted onto new hardware. This also applies to any planned maintenance that might affect the underlying hardware your VM is running on. As with AWS and Azure, all of your instance identity will transfer with the VM, such as IPs, volume data and metadata. Should you want to forgo the “Live Migration,” you can configure your instances to just reboot onto new hardware. All failed hardware events in GCP are logged at the host level and can be alerted on.
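To make the two behaviors concrete, the sketch below builds the scheduling settings as they appear on a Compute Engine instance resource, where onHostMaintenance takes MIGRATE or TERMINATE and automaticRestart controls whether the VM is brought back up. The helper function is our own illustration, and how you submit the settings (console, CLI or API client) is left out.

```python
import json


def scheduling_policy(live_migrate=True):
    """Return the `scheduling` section of a Compute Engine instance resource.

    live_migrate=True  -> VM is live migrated during host maintenance.
    live_migrate=False -> VM is stopped and restarted on new hardware.
    """
    return {
        "onHostMaintenance": "MIGRATE" if live_migrate else "TERMINATE",
        # Restart the VM automatically if the host fails or it is terminated.
        "automaticRestart": True,
    }


if __name__ == "__main__":
    # Default behavior, as described above.
    print(json.dumps(scheduling_policy(), indent=2))
    # Opting out of live migration in favor of a reboot onto new hardware.
    print(json.dumps(scheduling_policy(live_migrate=False), indent=2))
```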

Managing AWS Workspaces with VNS3

Cloud and network virtualization have created the opportunity to have virtual networks that transit your applications and staff to, through and across the clouds. These networks can stretch across the globe, spanning anywhere from a few to tens of locations (points of presence) or more. In the case of Cohesive Networks, our virtual networks are used to create cryptographically secure overlay networks in full mesh architectures. When implementing the cryptographic mesh (machine-to-machine VPN at scale), it is critical that the cryptographic credentials can be easily managed across the controller mesh. Our goal at Cohesive is to make managing the credentials straightforward and clear: associating credentials with users via tagging, enabling/disabling credentials so that they can only be used when desired, tracking checked out/in state to help manage via automation, checking log information for specific credentials, and managing certificate revocation. Below is a short video showing the key elements of straightforward key state management in an N-way VNS3 controller mesh.

Hopefully the video highlights the essential key state management capabilities we have strived for. They are part of the foundation of the VNS3 Controllers, which are used to build a wide array of service edge use cases. With VNS3 encrypted topologies combined with our plug and play security system, you or your management service provider can achieve both Workload and Workforce mobility using secure network virtualization.

AWS re:Invent 2019 Recap

Last week was AWS’s annual re:Invent conference in the putatively beautiful and blissful Las Vegas. Andy Jassy, AWS’s CEO, announced plenty of new products and features to excite and alarm the computing and soft-warring world. The conference also highlighted AWS’s leadership in highly resilient software architecture and design with their launch of the AWS Builders’ Library. Let’s run over some of the highlights.

Cloud Descending Back to Earth via New Edge Environments: AWS Local Zones, Outposts, and Wavelength

AWS launched two new environment types this year with AWS Local Zones and Wavelength. Local Zones was spurred by AWS customers requiring ultra-low latency for their compute, notably gaming companies based in L.A., where the first Local Zone is now generally available. New zones will come online as customer demand in a city necessitates. Wavelength is an AWS environment colocated with telecom infrastructure, providing access to 5G endpoints. The general availability of AWS Outposts, a rack of AWS servers providing AWS on-premise, was also announced, enabling the rollout of Local Zones and Wavelength in fairly short order. AWS Outposts enable companies to test deployments in cloud-like environments without fully committing to the cloud, and give customers like Morningstar and Philips Healthcare ultra-low latency, hyper-local availability zones.

These environments showcase a new battle for the edge. AWS basically won the general compute cloud race, but we now find different telecommunication and networking competitors offering edge environments, with startups the likes of Packet and Vapor IO joining the race. As developers gain access to these new endpoints, along with increased networking capabilities and incredibly low hyper-local latencies, we are sure to see a revolutionary new age of applications and services.

We Have a Size for That: New Compute Instance Types

Amazon launched multiple new instance types including Graviton2 instances and EC2 Inf1 instances. The new Graviton2 instances boast a whopping 40% price performance improvement. They are based on the ARM architecture, effectively challenging Intel and AMD’s dominance in the chip space, and are combined with the Nitro System security chip to support encrypted EBS storage volumes by default. The EC2 Inf1 instances are dedicated Machine Learning inference instance types, effectively challenging Nvidia’s domination of the market with their GPUs. AWS promises that these chips provide a significant increase in throughput and price performance relative to Nvidia-powered instance types.

AWS Continues to March into SaaS Markets With New Machine Learning Services

Also announced were multiple ML-based services including CodeGuru for automated code reviews, Fraud Detector for automated fraud detection, Kendra for search indexing, Transcribe Medical for call transcription in the medical industry and Augmented AI for AI workflows requiring human intervention. You would be hard pressed to find a SaaS market Amazon isn’t capable of stepping into with their army of engineers and data scientists.

The release of the SageMaker IDE and SageMaker Debugger seems to be an attempt by AWS to capture the hearts and minds of data scientists with the promise of streamlining the building, training, debugging, deployment, and monitoring of Machine Learning models. This new IDE bypasses the need for users to understand and deploy a Python or R environment, enables progress reporting for long jobs, promises a simplified and automated debugging process, automates alerts about input data drift, and auto-trains your ML model from CSV data files. In early use, the IDE has proven to come with a steep learning curve and a great deal of complexity. The SSO feature, notably, only seems to work with newer AWS accounts. According to VentureBeat, the IDE provides “some features that appear to be just rebrandings of older products and some that solve new, legitimate customer pain points. Even the best new features are incremental improvements on existing products.”

Reducing Cloud Anxiety With New Security-Focused Services

It seems Amazon has heard the cries of its customers as they struggle to manage the complexity of their cloud environment’s security. They announced Amazon Detective, Macie, and IAM Access Analyzer to review organizational security lattices and catch any potential privilege or access issues. IAM Access Analyzer helps to solve misconfiguration problems, one of the most common problems with AWS deployments, and can purportedly monitor and evaluate thousands of security policies across a deployment environment in seconds.

Thought Leadership in Designing Resilient Software Systems

Amazon showed some responsibility for their dominance of the cloud with their release of the AWS Builders’ Library. A number of sessions at re:Invent included references to their cell-based architecture approach and explained how AWS achieves high uptime numbers for their most important services.

Announcing AWS Quick Start Reference Deployment for VNS3

Want a HIPAA/HITECH compliant application deployed to AWS in minutes? Read on!

We’re proud to announce the release of our first AWS Quick Start reference deployment for configuring and launching our VNS3 overlay network for your cloud application. Working closely with Amazon we’ve leveraged the proven power of AWS CloudFormation to take our secure and scalable solution and make it even more accessible. With our Quick Start deployment, VNS3 can easily secure your cloud application to HIPAA and HITECH standards in as few as fifteen minutes, supported by best practice tools and strategies for automating your infrastructure deployments.

Check out our Quick Start Guide here! Keep reading for more information about this release.

Save Time

Our Quick Start was built by AWS and Cohesive Networks solutions architects to help you automatically deploy a VNS3 topology quickly and easily. Don’t worry about high availability and security; we’ve included them for no extra charge! Build your production deployment fast and start using it now.

Reduce Complexity

Simple (not to be confused with simplistic) is secure. VNS3 provides a generalized approach to encryption across your cloud deployment. This enables you to field a clean VPC Route Table and Security Group configuration to reduce attack surface and minimize misconfigurations.

Control Encryption

AWS-provided and AWS-controlled symmetric encryption with common shared keys isn’t enough for regulated industries. Customer-controlled encryption with VNS3 is essential to securing PII/PHI in order to pass HIPAA audits. VNS3, as demonstrated in this Quick Start Guide, provides a simple and programmatic way of achieving HIPAA compliance.

Added Bonus

Do you use blocked protocols like UDP multicast? The VNS3 encrypted overlay network deployed by this guide allows you to redistribute UDP multicast within your AWS VPC deployment. Now you can apply the same design principles to your cloud applications, whether designing cloud native or lifting and shifting.

Moving Forward

Following the successful launch of our first AWS Quick Start Guide, we’re excited to move forward and create new reference deployments for all the various use cases VNS3 supports. We’re cooking up AWS Quick Start Guides that deal with more complex peered VNS3 topologies, demonstrating different High Availability and Network Federation capabilities. We are also working on an Azure QuickStart template for deploying the encrypted Overlay Network for Microsoft Windows VMs later this summer.

AWS re:Invent Recap

We’ve been heads down working on the 3 P’s for a number of months (products, presence, and people). As a result we’ve all but stopped our social media and dynamic content. We’ll look to emerge from our cocoon in early 2019 but we had to pop out and do yet another re:Invent recap (YArIR!).

Cohesive Networks (and our parent company CohesiveFT) have attended/sponsored all AWS re:Invents. Each year the conference gets denser yet more spread out… think about that one. This year was no exception. Now that our “away team” has fully recovered from the ill effects of desert entertainment, had some time to reflect, and gotten its hands dirty trying out a few new services, we’re ready to state our opinion. That’s what the following is, the opinion of the smartest, coolest, and most experienced cloud networking experts in the game (see opinion).

Micro Blink Reaction – Crowd Sourcing the Self-driving Algos

AWS DeepRacer is awesome and the DeepRacer League is hilariously brilliant. I ordered my discounted DeepRacer a few seconds after it was announced during Andy Jassy’s keynote. The bummer is I won’t take delivery until March. Hopefully the simulation environment holds me over (request preview access).

Macro Blink Reaction – AWS appetite for its ecosystem grows

AWS continues to eat the ecosystem and this year they stepped up their game. Previous years had AWS entering markets and wiping out millions of $s in ecosystem players. This year we think the number is in the capital B BILLIONS.

As a member of the AWS Partner Network (Advanced Technology Partner), we, like all AWS partners, look to re:Invent every year with mixed feelings of excitement and dread. If you aren’t on the Customer Advisory Council, you never really know if this is the year AWS will announce a direct competitor to your business. We all know the risks, and the AWS “not built here” corp dev mentality that drives their roadmap, but there is too much opportunity not to participate. Multi-cloud helps, but AWS is still the King of Cloud both in usage and features/services. I won’t go into detail about what competes with whom, take a look at these other recap posts:

Specific Announcement Reactions

We also won’t cover all the announcements because of the number of announcements per service category.

  • App Integration – 2
  • Analytics – 4
  • Compute – 11
  • Databases – 6
  • Developer Tools – 2
  • IoT – 7
  • ML – 14
  • Management – 6
  • Marketplace – 3
  • Media – 1
  • Migration – 2
  • Mobile – 1
  • Networking – 6
  • Robotics – 1
  • Satellite – 1
  • Security/Identity – 2
  • Storage – 10

Below we’ll review the features and service announcements that piqued our interest from a security and networking perspective.

Transit Gateway (GA)

What is it?
An AWS managed gateway service that allows a hub-and-spoke network topology connecting VPCs in the same region (expect multi-region support in the future) owned by a single or multiple AWS accounts as well as remote networks. This offering replaces the multi-party solution that was previously being offered called the AWS Global Transit Network. Check out the Transit Gateway announcement blog or product home for more information.

Why it matters?
Transit Gateway solves a significant number of issues around the need to be able to route between VPCs “in cloud” at AWS. The manner in which it has been solved creates an economic opportunity for AWS as well – charging $0.05 per hour for each connection to the gateway.

For Cohesive Networks, we spend our days (and nights) helping customers Connect, Federate, and Secure. Just like the introduction of the VPC itself, Direct Connect, AZs, Regions, GovCloud, China, and all the related facets of AWS – this creates more demand for connecting, federating, and securing. “Transit” is a subset of the overall federation architecture, so definitely a feature – not a business, meaning this release is good news for Cohesive, and gives us parity with a capability that Azure and Google networking have had for some time (although they do it a bit differently).

The release of Transit Gateway lets us create some federation structures for customers that were previously too complex and required, dare I say it, too many VNS3 controllers to complete the task, as a result of AWS networking limitations. Now our customers can spend a bit more money, reduce a little bit of complexity, and still get the attestable control they need as regulated or self-regulated businesses operating in 3rd party data centers over which they have no direct insight, visibility, or control (AKA “the cloud”).
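To put rough numbers on the hub-and-spoke tradeoff, here is a back-of-the-envelope Python sketch. It compares the number of point-to-point links a full mesh of VPCs would need against one gateway attachment per VPC, and prices the attachments at the $0.05 per hour figure quoted above. The 730 hour month is our assumption, and data processing or transfer charges are deliberately left out.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def full_mesh_links(vpc_count):
    """Point-to-point connections needed to link every VPC to every other."""
    return vpc_count * (vpc_count - 1) // 2


def hub_attachments(vpc_count):
    """Hub-and-spoke needs only one attachment per VPC."""
    return vpc_count


def monthly_attachment_cost(vpc_count, hourly_rate=0.05):
    """Attachment-hours only; excludes any per-GB charges."""
    return hub_attachments(vpc_count) * hourly_rate * HOURS_PER_MONTH


if __name__ == "__main__":
    print("VPCs  mesh links  attachments  attachment cost/month")
    for n in (3, 10, 25):
        cost = monthly_attachment_cost(n)
        print(f"{n:>4}  {full_mesh_links(n):>10}  {hub_attachments(n):>11}  ${cost:,.2f}")
```

The link count grows quadratically for a mesh and linearly for a hub, which is the "reduce a little bit of complexity" part; the attachment fee is the "spend a bit more money" part.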

AWS Security Hub (Preview)

What is it?
A monitoring platform service focused on security that aggregates security alerts and compliance status from native AWS services as well as from 3rd party services. Many security vendors announced initial support for Security Hub. Security Hub aims to create a single pane of glass for an organization’s security and compliance posture across all its AWS accounts. Check out the Security Hub announcement blog or product home for more information.

Why it matters?
AWS Security Hub begins to solve the “feature glut” problem of the ever-growing Amazon services collection. One reason organizations suffer from data exploits is NOT because they lack monitoring information with events and alerts – it is because they have TOO many events and alerts. It appears that Security Hub will provide an encompassing overview of outputs coming from AWS GuardDuty, Inspector and Macie. Each of these has a rich set of features for your cloud deployments – running all three of them independently could be a bit overwhelming.

At Cohesive we have previously highlighted the world we are entering where the critical IT executive decision is “all-in vs. over-the-top”, meaning where on the spectrum of using cloud, AWS for example, do you position your organization? Do you go “all-in” on embedded AWS services which provide abstracted visibility and limited control – or do you go “over-the-top” and run many of your own layers of infrastructure and instrumentation, strung across AWS, Azure, Google, et al.? For the “all-in” crowd we think Security Hub may make consuming some of these services easier.

Global Accelerator (GA)

What is it?
A service to help customers easily route traffic across multiple regions to improve availability and performance of cloud-based applications/deployments. Global Accelerator provides an entry point to allow TCP or UDP traffic to use the AWS Global Network to reach AWS deployed application topologies instead of the Public Internet. Global Accelerator provides static Anycast IPs that serve as a fixed entry point for an AWS deployed application available in any number of the currently supported regions (us-east-1, us-east-2, us-west-1, us-west-2, eu-west-1, eu-central-1, ap-northeast-1, and ap-southeast-1). The Anycast IPs are advertised from the supported AWS regions so traffic enters the global network as close to the user as possible. Global Accelerator can then be associated with cloud-based applications via application load balancers, network load balancers, or Elastic IPs. In addition to data transfer fees, Global Accelerator costs $0.025 per hour.

Why it matters?
Other than the obvious HA and performance benefits, the big theme from this and Transit Gateway is coalescence. Clouds and cloud regions were built to be isolated by design. Increasingly, as companies have grown in the cloud organically or via acquisition, organizational cloud estates have experienced sprawl. Providing avenues to bring the regions “closer together” while maintaining the logical separation is a key value for many of AWS’ largest customers.

We continue to experiment with how our customers might benefit from using the Anycast IPs as static global cloud endpoint IPs for VPN connections as well as distributed and encrypted overlay networks.

EC2 C5n (GA)

What is it?
A new generation instance family focused on super fast network speeds of up to 100 Gbps. These new instances use the latest Nitro hardware and allow for some serious packets per second performance. The instance sizes are available now in us-east-1, us-east-2, us-west-2, eu-west-1, and GovCloud. Prices start Read more about the C5n instance family.

Why it matters?
We are getting a glimpse of the future of cloud network performance and throughput. Eliminating the current VPC gateway throughput restrictions will open up more use-cases for the cloud. Total throughput for a VNS3 controller just increased dramatically. Of course there are some restrictions (see placement groups), but it’s always exciting when you get a bandwidth upgrade. Maybe AWS will soon host the first cloud-based high speed, low latency trading app?

Goldilocks and the Amazon m4.medium

Once there was a little girl named Goldilocks who used cloud computing.

Starting out she launched a C5.18xlarge instance but at over $3.00 per hour, she realized it would cost more per month than the rent of her little cottage in the woods.

See the full article featured on Information Security Buzz

Next she tried a t2.nano, but try as she might, 500 meg of memory was not enough for the Photoshop work she wanted to do on her photo library, comprised of montages of her friends the three bears.

Then Goldilocks fired up an m4.medium. It did the trick, with multiple cores and enough memory to run her Iheartporridge.com retail site.

That is pretty much the story. When you get started in the cloud, you often don’t know how much CPU, how much memory, how much net bandwidth – and the “M”s feel “JUST RIGHT”.

Once you get experienced, the banquet of instance-type offerings starts to make sense as you optimize your workloads.

Why use an M family instance in AWS?

Image source: Botmetric 2017 survey

EC2 is the most used AWS service. According to a Botmetric report, 46% of EC2 usage is with the M family and M4 is the most popular for production instances. So why do AWS users keep coming back to M family instances?

Behavior – in a traditional environment you were locked into a specific hardware configuration. Many organizations treat cloud similarly, despite the simple and cost effective elasticity of cloud that lets you profile and load test different instance sizes. People start with the general purpose M family, set it and forget it.

Unknown Requirements – selecting instance types that match the application needs is an obvious advantage to using a cloud like AWS with many instance family and size choices. This of course assumes the DevOps or OpsDev group deploying the cloud application knows their application components’ resource requirements well enough to make decisions on specific instance types.

Reserved Instances – the fewer instance types and sizes included in a reserved instance contract, the easier it is for cost allocation. Buy a bunch of cheap M family instances and use them.

Cost Efficiency – R and M family instance sizes rank at the top of the chart when looking at both Compute Efficiency (Compute ECU / $-hr) and Memory Efficiency (Memory GB / $-hr)

Known Resources – T family instances would be more popular if not for the unknown of when the compute credits run out. AWS addressed this with the “unlimited” option. Expect T family to become more popular as more users become aware.

Evaluation of Alternatives – M family instance sizes map most closely to the generic instance/VM sizes of other clouds. When making a purchase decision the M family is the easiest to use when seeking out alternatives for price/performance comparisons.

Access to Extras – M4 instance sizes allow for optional Enhanced Networking and EBS-optimized.
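The two efficiency ratios mentioned under Cost Efficiency are easy to compute yourself. The Python sketch below does so for a pair of made-up instance types; the names, ECU ratings, memory sizes and hourly prices are placeholders for illustration only, not real AWS figures, so plug in current numbers from the pricing pages before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class InstanceType:
    name: str
    ecu: float             # compute rating (ECU)
    memory_gb: float       # memory in GB
    price_per_hour: float  # on-demand price in $ per hour

    @property
    def compute_efficiency(self):
        """Compute ECU per dollar-hour."""
        return self.ecu / self.price_per_hour

    @property
    def memory_efficiency(self):
        """Memory GB per dollar-hour."""
        return self.memory_gb / self.price_per_hour


# Placeholder values, NOT real AWS specifications or prices.
CANDIDATES = [
    InstanceType("general-example", ecu=6.5, memory_gb=8.0, price_per_hour=0.25),
    InstanceType("compute-example", ecu=9.0, memory_gb=4.0, price_per_hour=0.20),
    InstanceType("memory-example", ecu=7.0, memory_gb=16.0, price_per_hour=0.40),
]

if __name__ == "__main__":
    ranked = sorted(CANDIDATES, key=lambda i: i.memory_efficiency, reverse=True)
    for inst in ranked:
        print(f"{inst.name:<16} "
              f"ECU/$-hr={inst.compute_efficiency:6.1f}  "
              f"GB/$-hr={inst.memory_efficiency:6.1f}")
```

Sorting by one ratio or the other makes it clear which family is "just right" for a compute-bound versus a memory-bound workload.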

This post was a team effort, written by Patrick Kerpan and Ryan Koop. Our favorite AWS instance type is t2.large with the T2 Unlimited option. According to Botmetric, 83% of non-production workloads run on the T family.