Virtual Machine Scale Sets

Lab Objective

In this hands-on lab, you will learn how to:

Configure Virtual Machine Scale Sets auto-scaling policies and thresholds
Manage scale set instances through portal and scaling operations
Test automatic scaling by triggering scale-out and scale-in events
Monitor scaling activity and performance metrics in real-time
Customize scaling rules for different metrics and schedules
Analyze scaling behavior and optimize for cost and performance

Scenario: Work with a pre-deployed web application scale set to understand auto-scaling behavior, configure advanced scaling policies, and optimize performance.

Pre-Provisioned Environment

Virtual Machine Scale Set Lab Environment
├── Resource Group: VMSS-Lab-RG
├── Virtual Network (vmss-vnet)
│   ├── Web Subnet (10.0.1.0/24)
│   └── Management Subnet (10.0.2.0/24)
├── Load Balancer (vmss-lb[unique])
│   ├── Public IP: vmss-pip[unique]
│   ├── Backend Pool: Scale Set instances
│   ├── Health Probe: HTTP:80/
│   └── Load Balancing Rule: HTTP:80
├── Virtual Machine Scale Set (web-vmss[unique])
│   ├── Instances: 2 (initial)
│   ├── VM Size: Standard_B2s
│   ├── OS: Ubuntu 20.04 LTS
│   ├── Web Server: Apache pre-installed
│   └── Auto-scaling: Disabled (for lab configuration)
├── Network Security Group (vmss-nsg[unique])
│   ├── HTTP (80): Allow from Internet
│   └── SSH (22): Allow from Load Balancer
└── Application Insights (vmss-insights[unique])
    └── Performance monitoring enabled

Important: The scale set is pre-deployed with a simple web application. Your focus will be on configuring and testing the scaling behavior.

Lab Exercises

Part 1: Explore Scale Set Configuration

Step 1: Examine Scale Set Properties

Navigate to Resource Groups → VMSS-Lab-RG
Click on the Virtual Machine Scale Set web-vmss[unique]
Review the Overview tab:
- Current instance count
- VM size and configuration
- Operating system details
Go to “Instances” and verify 2 running instances

Step 2: Test Load Balancer Distribution

Navigate to the Load Balancer vmss-lb[unique]
Copy the public IP address from the Overview
Open multiple browser tabs and navigate to http://[public-ip]
Refresh repeatedly to observe different instance responses
Note how traffic is distributed across instances
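
The same check can be scripted from a terminal. The sketch below is one possible approach, assuming `curl` is installed and that each instance returns a body that differs from the others (for example by including its hostname). `sample_responses` is a hypothetical helper name, not something provided by the lab environment.

```shell
# Request a URL repeatedly and count how often each distinct response body appears.
# Bodies are compared by checksum, so a spread only shows up if each instance
# serves slightly different content (an assumption about the lab web page).
sample_responses() (
  url="$1"
  count="${2:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s hides the progress meter; --max-time stops a dead backend from stalling the loop
    curl -s --max-time 5 "$url" | cksum
    i=$((i + 1))
  done | sort | uniq -c | sort -rn
)

# Usage (substitute the load balancer's public IP):
#   sample_responses "http://<public-ip>/" 20
```

Each output row is a request count followed by the checksum and byte length of one distinct body. Two or more rows suggest that more than one instance answered.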

Step 3: Review Current Scaling Configuration

Go back to the scale set web-vmss[unique]
Click “Scaling” in the left menu
Observe that auto-scaling is currently disabled
Note the current manual scale setting (2 instances)

Expected Results: You can see the scale set infrastructure working with load balancing across 2 instances, ready for scaling configuration.

Part 2: Configure Basic Auto-Scaling Rules

Step 1: Enable Auto-Scaling

Open the scale set's “Scaling” section
Click “Custom autoscale”
Enter autoscale setting name: cpu-based-scaling
Set default instance count: 2

Step 2: Create Scale-Out Rule

Click “+ Add a rule”
Configure the scale-out rule:
- Metric source: Current resource (vmss)
- Metric namespace: Virtual Machine Host
- Metric name: Percentage CPU
- Time aggregation: Average
- Operator: Greater than
- Threshold: 70
- Duration: 5 minutes
- Operation: Increase count by
- Instance count: 1
- Cool down: 5 minutes
Click “Add”

Step 3: Create Scale-In Rule

Click “+ Add a rule”
Configure the scale-in rule:
- Metric source: Current resource (vmss)
- Metric namespace: Virtual Machine Host
- Metric name: Percentage CPU
- Time aggregation: Average
- Operator: Less than
- Threshold: 30
- Duration: 10 minutes
- Operation: Decrease count by
- Instance count: 1
- Cool down: 10 minutes
Click “Add”

Step 4: Set Instance Limits and Save

Configure scale condition:
- Minimum: 2
- Maximum: 5
- Default: 2
Click “Save”
Wait for the configuration to be applied

Expected Results: Auto-scaling is now enabled with CPU-based rules that will scale out above 70% CPU and scale in below 30% CPU.
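
For readers who prefer scripting, roughly the same configuration can be expressed with the Azure CLI. This is a sketch rather than part of the lab: it assumes `az` is installed and signed in, and `configure_cpu_autoscale` and its two arguments are illustrative names. The scale set name must include the unique suffix used in your environment.

```shell
# Create the autoscale setting and the two CPU rules from Part 2.
# $1 = resource group, $2 = scale set name (including its unique suffix).
configure_cpu_autoscale() (
  rg="$1"
  vmss="$2"

  # Autoscale setting with the instance limits from Step 4
  az monitor autoscale create \
    --resource-group "$rg" \
    --resource "$vmss" \
    --resource-type Microsoft.Compute/virtualMachineScaleSets \
    --name cpu-based-scaling \
    --min-count 2 --max-count 5 --count 2 || exit 1

  # Scale out by 1 when average CPU stays above 70% for 5 minutes
  az monitor autoscale rule create \
    --resource-group "$rg" \
    --autoscale-name cpu-based-scaling \
    --condition "Percentage CPU > 70 avg 5m" \
    --scale out 1 --cooldown 5 || exit 1

  # Scale in by 1 when average CPU stays below 30% for 10 minutes
  az monitor autoscale rule create \
    --resource-group "$rg" \
    --autoscale-name cpu-based-scaling \
    --condition "Percentage CPU < 30 avg 10m" \
    --scale in 1 --cooldown 10
)

# Usage:
#   configure_cpu_autoscale VMSS-Lab-RG "<scale-set-name>"
```

The `--condition` strings mirror the portal fields: metric name, operator, threshold, time aggregation, and duration.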

Part 3: Test Auto-Scaling Behavior

Step 1: Generate Load to Trigger Scale-Out

Go to “Instances” in the scale set
Click on the first instance name
Click “Run command” → “RunShellScript”

Enter this command to generate CPU load:

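```
sudo apt-get update && sudo apt-get install -y stress
# Redirect output so Run Command returns instead of waiting for stress to finish
nohup stress --cpu 2 --timeout 600 > /dev/null 2>&1 &
```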

Click “Run”
Repeat for the second instance
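
Repeating the Run Command step per instance can also be scripted. The following is a minimal sketch, assuming the Azure CLI is installed and signed in and that the scale set uses Uniform orchestration (instances addressed by instance ID). `run_on_all_instances` is a hypothetical helper name.

```shell
# Run one shell script on every instance of a scale set via Run Command.
# $1 = resource group, $2 = scale set name, $3 = script text.
# Assumes Uniform orchestration, where instances are addressed by instance ID.
run_on_all_instances() (
  rg="$1"
  vmss="$2"
  script="$3"

  az vmss list-instances --resource-group "$rg" --name "$vmss" \
    --query "[].instanceId" --output tsv |
  while read -r id; do
    echo "Running on instance $id" >&2
    # </dev/null keeps az from consuming the instance list on stdin
    az vmss run-command invoke \
      --resource-group "$rg" --name "$vmss" \
      --instance-id "$id" \
      --command-id RunShellScript \
      --scripts "$script" </dev/null
  done
)

# Usage:
#   run_on_all_instances VMSS-Lab-RG "<scale-set-name>" \
#     "apt-get update && apt-get install -y stress; nohup stress --cpu 2 --timeout 600 >/dev/null 2>&1 &"
```

The same helper can later stop the load by passing `pkill stress` as the script.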

Step 2: Monitor Scaling Activity

Go to the scale set “Metrics” section
Click “Add metric”
Select “Percentage CPU” metric
Set time range to “Last 30 minutes”
Watch CPU usage climb above 70%
Go to “Activity log” to monitor scaling events

Step 3: Observe Scale-Out Event

Wait 5-10 minutes for the scale-out rule to trigger
Go to “Instances” and refresh the view
Verify a third instance is being created
Monitor the new instance until it shows “Running”
Confirm that the load balancer routes traffic to the new instance
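
Instead of refreshing the portal, the instance count can be polled from a terminal. This sketch assumes the Azure CLI is installed and signed in; `wait_for_instance_count` and its default timings are illustrative choices. It counts every instance the scale set lists, including ones that are still being created.

```shell
# Poll the scale set until it reports the expected number of instances.
# $1 = resource group, $2 = scale set name, $3 = target count,
# $4 = seconds between polls (default 30), $5 = maximum polls (default 40).
wait_for_instance_count() (
  rg="$1"; vmss="$2"; target="$3"
  interval="${4:-30}"; max_polls="${5:-40}"
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    current="$(az vmss list-instances --resource-group "$rg" --name "$vmss" \
      --query "length(@)" --output tsv)"
    echo "instances: $current (waiting for $target)"
    if [ "$current" = "$target" ]; then
      exit 0
    fi
    sleep "$interval"
    polls=$((polls + 1))
  done
  echo "gave up after $max_polls polls" >&2
  exit 1
)

# Usage:
#   wait_for_instance_count VMSS-Lab-RG "<scale-set-name>" 3
```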

Step 4: Stop Load and Monitor Scale-In

Go back to “Run command” on each instance
Run this command to stop the load:
```
sudo pkill stress
```
Monitor CPU metrics dropping below 30%
Wait 10-15 minutes: the scale-in rule fires only after average CPU has stayed below 30% for its full 10-minute duration
Verify instance count returns to 2

Expected Results: The scale set automatically creates a new instance when average CPU exceeds 70% for 5 minutes, then removes it when average CPU stays below 30% for 10 minutes.
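
The behavior observed in this part can be summarized as a small decision function. The sketch below is a simplified model of one autoscale evaluation, not Azure's actual implementation: it assumes the rule's duration window has already elapsed and ignores cool-down. `next_instance_count` is a hypothetical name.

```shell
# Simplified model of one autoscale evaluation for the Part 2 rules.
# $1 = average CPU over the rule's window (integer percent), $2 = current instance count.
# Ignores cool-down and assumes the window has already elapsed.
next_instance_count() (
  cpu="$1"; current="$2"
  min=2; max=5
  if [ "$cpu" -gt 70 ]; then
    next=$((current + 1))      # scale-out rule: increase count by 1
  elif [ "$cpu" -lt 30 ]; then
    next=$((current - 1))      # scale-in rule: decrease count by 1
  else
    next="$current"            # between thresholds: no change
  fi
  # Clamp to the minimum and maximum set on the scale condition
  if [ "$next" -lt "$min" ]; then next="$min"; fi
  if [ "$next" -gt "$max" ]; then next="$max"; fi
  echo "$next"
)

next_instance_count 85 2   # prints 3
next_instance_count 20 3   # prints 2
next_instance_count 20 2   # prints 2 (already at the minimum)
next_instance_count 95 5   # prints 5 (already at the maximum)
```

The gap between the 30% and 70% thresholds is what keeps the instance count stable under moderate load.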

Part 4: Configure Advanced Scaling Policies

Step 1: Add Memory-Based Scaling Rule

Go to scale set “Scaling” section
Click “+ Add a rule”
Configure memory-based scale-out:
- Metric source: Current resource (vmss)
- Metric namespace: Virtual Machine Host
- Metric name: Available Memory Bytes
- Operator: Less than
- Threshold: 1073741824 (1 GiB in bytes)
- Duration: 5 minutes
- Operation: Increase count by 1
Click “Add”
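
The threshold is entered in bytes, so it helps to compute it rather than type it from memory. This is a small sketch of the arithmetic; `gib_to_bytes` is a hypothetical helper, and the condition string follows the format accepted by `az monitor autoscale rule create --condition`.

```shell
# Convert whole GiB to bytes for use as a metric threshold.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

threshold="$(gib_to_bytes 1)"
echo "$threshold"                                   # prints 1073741824
echo "Available Memory Bytes < $threshold avg 5m"   # rule condition in CLI form
```

On a Standard_B2s instance, which has 4 GiB of memory, this threshold corresponds to less than a quarter of memory remaining.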

Step 2: Create Schedule-Based Scaling Condition

Click “+ Add a scale condition”
Select “Scale based on a schedule”
Name: business-hours-scaling
Configure schedule:
- Start time: 09:00
- End time: 17:00
- Days: Monday to Friday
- Time zone: Your local time zone
- Instance count: 4
Click “Add”
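
To reason about which condition applies at a given moment, the schedule above can be modeled as a simple check. This is an illustrative sketch only: `active_condition` is a hypothetical name, and treating the window as 09:00 inclusive to 17:00 exclusive is an assumption of the sketch.

```shell
# Report which scale condition would be in effect at a given moment.
# $1 = ISO day of week (1 = Monday ... 7 = Sunday), $2 = local time as HH:MM.
# Treats the window as 09:00 inclusive to 17:00 exclusive (an assumption of this sketch).
active_condition() (
  day="$1"; time="$2"
  h="${time%%:*}"; m="${time##*:}"
  h="${h#0}"; m="${m#0}"                    # drop a leading zero so 09 is not read as octal
  minutes=$(( ${h:-0} * 60 + ${m:-0} ))
  if [ "$day" -ge 1 ] && [ "$day" -le 5 ] &&
     [ "$minutes" -ge 540 ] && [ "$minutes" -lt 1020 ]; then
    echo "business-hours-scaling (fixed at 4 instances)"
  else
    echo "default (CPU rules, 2-5 instances)"
  fi
)

active_condition 3 10:30   # prints business-hours-scaling (fixed at 4 instances)
active_condition 3 17:00   # prints default (CPU rules, 2-5 instances)
active_condition 6 10:30   # prints default (CPU rules, 2-5 instances)
# Current moment: active_condition "$(date +%u)" "$(date +%H:%M)"
```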

Step 3: Review Scaling Configuration

Go to the scale set “Scaling” overview
Verify you now have:
- Default condition with CPU rules (2-5 instances)
- Memory-based rule
- Business hours schedule (4 instances)
Check the current active condition

Expected Results: The scale set now has three kinds of scaling policy: CPU-based rules, a memory-based rule, and a schedule-based condition.

Part 5: Monitor Scaling Performance

Step 1: Analyze Scaling Metrics

Go to scale set “Metrics”
Add multiple metrics:
- Percentage CPU
- Available Memory Bytes
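- Instance count (if the scale set does not list it, select the autoscale setting as the metric scope and add Observed Capacity)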
Set time range to “Last 4 hours”
Observe correlation between metrics and scaling events

Step 2: Review Scaling History

Go to “Activity log”
Filter by the “Autoscale” event category, where the autoscale engine records its actions
Look for “Scale” operations
Click on scaling events to see details
Note what triggered each scaling action

Step 3: Evaluate Load Balancer Performance

Go to load balancer “Metrics”
Add metrics:
- Data Path Availability
- Health Probe Status
- Byte Count
Analyze how load balancer handles traffic during scaling

Expected Results: You can see the relationship between metrics, scaling triggers, and load balancer behavior during scaling events.

Part 6: Optimize Scaling Configuration

Step 1: Fine-Tune Scaling Thresholds

Go to scale set “Scaling” configuration
Edit the CPU scale-out rule
Adjust threshold to 60% (more sensitive)
Reduce duration to 3 minutes (faster response)
Save changes

Step 2: Test Manual Scaling

Go to “Instances” in scale set
Click “Scale”
Set instance count to 3
Click “Save”
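Monitor the new instance creation
Verify that the load balancer's backend pool includes the new instance
Note that autoscale is still enabled, so the active scale condition may change the count again (for example, the scale-in rule can remove the extra instance once average CPU stays below 30% for 10 minutes)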
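
The manual scale operation has a direct Azure CLI counterpart. A minimal sketch, assuming `az` is installed and signed in; `scale_to` is a hypothetical wrapper name.

```shell
# Set the scale set's capacity directly.
# $1 = resource group, $2 = scale set name, $3 = desired instance count.
scale_to() (
  az vmss scale --resource-group "$1" --name "$2" --new-capacity "$3"
)

# Usage:
#   scale_to VMSS-Lab-RG "<scale-set-name>" 3
```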

Step 3: Configure Scaling Notifications

Go to “Activity log”
Click “Add activity log alert”
Configure alert for scaling events:
- Resource type: Virtual Machine Scale Sets
- Operation name: Microsoft.Compute/virtualMachineScaleSets/scale/action
Add email notification action
Create the alert rule

Expected Results: Scale set has optimized scaling configuration with proactive monitoring and notifications for scaling events.

Troubleshooting Guide

Common Issues

Scale set creation fails: Check subscription quotas and region availability
Instances won’t start: Verify VM size availability and network configuration
Web pages don’t load: Check load balancer configuration and instance health
Auto-scaling not working: Verify metrics are being collected and thresholds are correct
SSH connection fails: Ensure network security group allows SSH traffic

Quick Fixes

Stuck instances: Reimage or delete the affected instance first; delete and recreate the scale set only as a last resort
Load balancer issues: Check backend pool health status
High costs: Reduce instance count or use smaller VM sizes
Performance problems: Monitor CPU and memory metrics

Key Takeaways

After completing this lab, you should understand:

Scale sets provide automatic scaling based on metrics like CPU usage
Load balancers distribute traffic evenly across healthy instances
Custom script extensions allow software installation across all instances
Auto-scaling rules can scale out and scale in based on demand
Monitoring is essential to understand scaling behavior and performance
Cost management requires careful configuration of scaling limits