Virtual Machine Scale Sets

Lab Objective

In this hands-on lab, you will learn how to:

  • Configure Virtual Machine Scale Sets auto-scaling policies and thresholds
  • Manage scale set instances through portal and scaling operations
  • Test automatic scaling by triggering scale-out and scale-in events
  • Monitor scaling activity and performance metrics in real time
  • Customize scaling rules for different metrics and schedules
  • Analyze scaling behavior and optimize for cost and performance

Scenario: Work with a pre-deployed web application scale set to understand auto-scaling behavior, configure advanced scaling policies, and optimize performance.


Pre-Provisioned Environment

Virtual Machine Scale Set Lab Environment
├── Resource Group: VMSS-Lab-RG
├── Virtual Network (vmss-vnet)
│ ├── Web Subnet (10.0.1.0/24)
│ └── Management Subnet (10.0.2.0/24)
├── Load Balancer (vmss-lb[unique])
│ ├── Public IP: vmss-pip[unique]
│ ├── Backend Pool: Scale Set instances
│ ├── Health Probe: HTTP:80/
│ └── Load Balancing Rule: HTTP:80
├── Virtual Machine Scale Set (web-vmss[unique])
│ ├── Instances: 2 (initial)
│ ├── VM Size: Standard_B2s
│ ├── OS: Ubuntu 20.04 LTS
│ ├── Web Server: Apache pre-installed
│ └── Auto-scaling: Disabled (for lab configuration)
├── Network Security Group (vmss-nsg[unique])
│ ├── HTTP (80): Allow from Internet
│ └── SSH (22): Allow from Load Balancer
└── Application Insights (vmss-insights[unique])
  └── Performance monitoring enabled

Important: The scale set is pre-deployed with a simple web application. Your focus will be on configuring and testing the scaling behavior.


Lab Exercises

Part 1: Explore Scale Set Configuration

Step 1: Examine Scale Set Properties

  1. Navigate to Resource Groups → VMSS-Lab-RG
  2. Click on the Virtual Machine Scale Set web-vmss[unique]
  3. Review the Overview tab:
    • Current instance count
    • VM size and configuration
    • Operating system details
  4. Go to “Instances” and verify 2 running instances

Step 2: Test Load Balancer Distribution

  1. Navigate to the Load Balancer vmss-lb[unique]
  2. Copy the public IP address from the Overview
  3. Open multiple browser tabs and navigate to http://[public-ip]
  4. Refresh repeatedly to observe different instance responses
  5. Note how traffic is distributed across instances
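
Browsers often reuse connections, so repeated refreshes may keep landing on the same instance. As a rough alternative, the sketch below sends each request over a new connection and counts how many distinct pages come back. The function name `probe_distribution` is hypothetical, and it assumes each instance serves a page that differs from the others (for example by including its hostname).

```shell
# Hypothetical helper: request the page N times and tally distinct responses.
# Each response body is reduced to a checksum so multi-line pages compare easily.
probe_distribution() {
  url="$1"
  count="${2:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # Every curl call opens a fresh connection to the load balancer
    curl -s --max-time 5 "$url" | cksum | cut -d' ' -f1
    i=$((i + 1))
  done | sort | uniq -c
}
```

Calling `probe_distribution "http://[public-ip]" 20` with the address copied in step 2 prints one line per distinct response, prefixed by how many times it was seen.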

Step 3: Review Current Scaling Configuration

  1. Go back to the scale set web-vmss[unique]
  2. Click “Scaling” in the left menu
  3. Observe that auto-scaling is currently disabled
  4. Note the current manual scale setting (2 instances)

Expected Results: You can see the scale set infrastructure working with load balancing across 2 instances, ready for scaling configuration.

Part 2: Configure Basic Auto-Scaling Rules

Step 1: Enable Auto-Scaling

  1. Open the scale set's “Scaling” section
  2. Click “Custom autoscale”
  3. Enter autoscale setting name: cpu-based-scaling
  4. Set default instance count: 2

Step 2: Create Scale-Out Rule

  1. Click “+ Add a rule”
  2. Configure the scale-out rule:
    • Metric source: Current resource (vmss)
    • Metric namespace: Virtual Machine Host
    • Metric name: Percentage CPU
    • Time aggregation: Average
    • Operator: Greater than
    • Threshold: 70
    • Duration: 5 minutes
    • Operation: Increase count by
    • Instance count: 1
    • Cool down: 5 minutes
  3. Click “Add”

Step 3: Create Scale-In Rule

  1. Click “+ Add a rule”
  2. Configure the scale-in rule:
    • Metric source: Current resource (vmss)
    • Metric namespace: Virtual Machine Host
    • Metric name: Percentage CPU
    • Time aggregation: Average
    • Operator: Less than
    • Threshold: 30
    • Duration: 10 minutes
    • Operation: Decrease count by
    • Instance count: 1
    • Cool down: 10 minutes
  3. Click “Add”

Step 4: Set Instance Limits and Save

  1. Configure scale condition:
    • Minimum: 2
    • Maximum: 5
    • Default: 2
  2. Click “Save”
  3. Wait for the configuration to be applied

Expected Results: Auto-scaling is now enabled with CPU-based rules that will scale out above 70% CPU and scale in below 30% CPU.
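
For reference, the same configuration can be sketched with the Azure CLI. This is a minimal sketch rather than part of the lab: it assumes `az` is installed and signed in, and the function name `configure_cpu_autoscale` is hypothetical. Pass your actual scale set name (web-vmss plus your unique suffix) as the argument.

```shell
# Sketch of the Part 2 portal steps as Azure CLI calls.
configure_cpu_autoscale() {
  vmss_name="$1"
  rg="VMSS-Lab-RG"

  # Autoscale setting with instance limits: minimum 2, maximum 5, default 2
  az monitor autoscale create \
    --resource-group "$rg" \
    --resource "$vmss_name" \
    --resource-type Microsoft.Compute/virtualMachineScaleSets \
    --name cpu-based-scaling \
    --min-count 2 --max-count 5 --count 2

  # Scale out by 1 when average CPU is above 70% over 5 minutes
  az monitor autoscale rule create \
    --resource-group "$rg" \
    --autoscale-name cpu-based-scaling \
    --condition "Percentage CPU > 70 avg 5m" \
    --scale out 1 --cooldown 5

  # Scale in by 1 when average CPU is below 30% over 10 minutes
  az monitor autoscale rule create \
    --resource-group "$rg" \
    --autoscale-name cpu-based-scaling \
    --condition "Percentage CPU < 30 avg 10m" \
    --scale in 1 --cooldown 10
}
```

Pairing every scale-out rule with a scale-in rule, as done here, keeps the instance count from staying at the maximum after a load spike.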

Part 3: Test Auto-Scaling Behavior

Step 1: Generate Load to Trigger Scale-Out

  1. Go to “Instances” in the scale set
  2. Click on the first instance name
  3. Click “Run command” → “RunShellScript”
  4. Enter this command to generate CPU load:
    sudo apt-get update && sudo apt-get install -y stress
    # Redirect output so Run command can return while stress keeps running
    nohup stress --cpu 2 --timeout 600 > /dev/null 2>&1 &
  5. Click “Run”
  6. Repeat for the second instance
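
If you prefer not to repeat the portal steps per instance, the loop below is one way to do it from the Azure CLI. It is a sketch under assumptions: `az` is installed and signed in, the scale set uses Uniform orchestration (so instances have IDs that `az vmss run-command invoke` accepts), and the function name `generate_cpu_load` is hypothetical.

```shell
# Sketch: run the load-generation script on every instance in the scale set.
generate_cpu_load() {
  vmss_name="$1"
  rg="VMSS-Lab-RG"
  # list-instances returns one instance ID per line with tsv output
  for id in $(az vmss list-instances -g "$rg" -n "$vmss_name" \
                --query "[].instanceId" -o tsv); do
    az vmss run-command invoke \
      --resource-group "$rg" --name "$vmss_name" \
      --instance-id "$id" \
      --command-id RunShellScript \
      --scripts "sudo apt-get update && sudo apt-get install -y stress" \
                "nohup stress --cpu 2 --timeout 600 > /dev/null 2>&1 &"
  done
}
```

Usage: `generate_cpu_load "web-vmss[unique]"`, substituting your scale set name.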

Step 2: Monitor Scaling Activity

  1. Go to the scale set “Metrics” section
  2. Click “Add metric”
  3. Select “Percentage CPU” metric
  4. Set time range to “Last 30 minutes”
  5. Watch CPU usage climb above 70%
  6. Go to “Activity log” to monitor scaling events

Step 3: Observe Scale-Out Event

  1. Wait 5-10 minutes for the scale-out rule to trigger
  2. Go to “Instances” and refresh the view
  3. Verify a third instance is being created
  4. Monitor the new instance until it shows “Running”
  5. Verify that the load balancer includes the new instance

Step 4: Stop Load and Monitor Scale-In

  1. Go back to “Run command” on each instance
  2. Run this command to stop the load:
    sudo pkill stress
  3. Monitor CPU metrics dropping below 30%
  4. Wait 10-15 minutes for the scale-in rule's 10-minute evaluation window to elapse
  5. Verify instance count returns to 2

Expected Results: Scale set automatically creates a new instance when CPU exceeds 70% for 5 minutes, then removes it when CPU drops below 30% for 10 minutes.
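
Scale-in may not fire the moment CPU dips below 30%. Azure's autoscale guidance describes a check that estimates the average metric after removing an instance and skips the scale-in if that estimate would trigger a scale-out again. The arithmetic below is a simplified model of that estimate, assuming load spreads evenly across instances. The function names are hypothetical and the math is integer-only.

```shell
# Simplified model: total CPU demand is avg_cpu * count, redistributed
# across one fewer instance after scale-in.
projected_cpu_after_scale_in() {
  avg_cpu="$1"
  count="$2"
  echo $(( avg_cpu * count / (count - 1) ))
}

# Succeeds when the projected average stays below the scale-out threshold
scale_in_is_safe() {
  [ "$(projected_cpu_after_scale_in "$1" "$2")" -lt "$3" ]
}

# 3 instances averaging 28% CPU -> 28 * 3 / 2 = 42% on 2 instances
projected_cpu_after_scale_in 28 3
```

With this lab's 70% scale-out threshold, a projected 42% leaves a wide margin, which is why the 30/70 pair rarely oscillates. Thresholds placed close together leave much less room.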

Part 4: Configure Advanced Scaling Policies

Step 1: Add Memory-Based Scaling Rule

  1. Go to scale set “Scaling” section
  2. Click “+ Add a rule”
  3. Configure memory-based scale-out:
    • Metric source: Current resource (vmss)
    • Metric namespace: Virtual Machine Host
    • Metric name: Available Memory Bytes
    • Operator: Less than
    • Threshold: 1073741824 (1 GiB in bytes)
    • Duration: 5 minutes
    • Operation: Increase count by 1
  4. Click “Add”
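
The threshold is simply 1024 × 1024 × 1024. The sketch below derives it and shows how the same rule might be added with the Azure CLI. It assumes `az` is installed and signed in and that the autoscale setting is named cpu-based-scaling as in Part 2. The function name `add_memory_rule` is hypothetical.

```shell
# 1 GiB expressed in bytes (1024^3)
threshold_bytes=$(( 1024 * 1024 * 1024 ))
echo "$threshold_bytes"

# Sketch: scale out by 1 when average available memory drops below 1 GiB
add_memory_rule() {
  az monitor autoscale rule create \
    --resource-group VMSS-Lab-RG \
    --autoscale-name cpu-based-scaling \
    --condition "Available Memory Bytes < $threshold_bytes avg 5m" \
    --scale out 1
}
```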

Step 2: Create Schedule-Based Scaling Condition

  1. Click “+ Add a scale condition”
  2. Select “Scale based on a schedule”
  3. Name: business-hours-scaling
  4. Configure schedule:
    • Start time: 09:00
    • End time: 17:00
    • Days: Monday to Friday
    • Time zone: Your local time zone
    • Instance count: 4
  5. Click “Add”
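
A recurring schedule like this can also be expressed as an autoscale profile from the Azure CLI. The sketch assumes `az` is installed and signed in and that the autoscale setting is named cpu-based-scaling. The function name `add_business_hours_profile` is hypothetical, and the time zone you pass must be a Windows-style zone name that Azure recognizes.

```shell
# Sketch: weekly profile that pins the scale set to 4 instances
# on weekdays between 09:00 and 17:00 in the given time zone.
add_business_hours_profile() {
  timezone="$1"   # for example "Pacific Standard Time"
  az monitor autoscale profile create \
    --resource-group VMSS-Lab-RG \
    --autoscale-name cpu-based-scaling \
    --name business-hours-scaling \
    --recurrence week mon tue wed thu fri \
    --start 09:00 --end 17:00 \
    --timezone "$timezone" \
    --count 4
}
```

Outside the scheduled window, the default condition with the CPU rules applies again.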

Step 3: Review Scaling Configuration

  1. Go to the scale set “Scaling” overview
  2. Verify you now have:
    • Default condition with CPU rules (2-5 instances)
    • Memory-based rule
    • Business hours schedule (4 instances)
  3. Check the current active condition

Expected Results: Scale set now has multiple scaling policies - CPU-based, memory-based, and schedule-based scaling rules.

Part 5: Monitor Scaling Performance

Step 1: Analyze Scaling Metrics

  1. Go to scale set “Metrics”
  2. Add multiple metrics:
    • Percentage CPU
    • Available Memory Bytes
    • Instance Count
  3. Set time range to “Last 4 hours”
  4. Observe correlation between metrics and scaling events

Step 2: Review Scaling History

  1. Go to “Activity log”
  2. Filter by the “Autoscale” category for events from the autoscale engine (manual scale operations appear under “Administrative”)
  3. Look for “Scale” operations
  4. Click on scaling events to see details
  5. Note what triggered each scaling action
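
The same history can be pulled from the command line. This sketch lists recent activity log entries for the lab resource group and keeps only lines that mention scaling. It assumes `az` is installed and signed in, and the function name `list_scale_events` is hypothetical.

```shell
# Sketch: show activity log entries from the last 4 hours that mention scaling.
list_scale_events() {
  az monitor activity-log list \
    --resource-group VMSS-Lab-RG \
    --offset 4h \
    --query "[].[eventTimestamp, operationName.localizedValue, status.value]" \
    --output tsv | grep -i scale
}
```

The text filter is deliberately loose. Narrow it once you see how the operations are named in your subscription.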

Step 3: Evaluate Load Balancer Performance

  1. Go to load balancer “Metrics”
  2. Add metrics:
    • Data Path Availability
    • Health Probe Status
    • Byte Count
  3. Analyze how load balancer handles traffic during scaling

Expected Results: You can see the relationship between metrics, scaling triggers, and load balancer behavior during scaling events.

Part 6: Optimize Scaling Configuration

Step 1: Fine-Tune Scaling Thresholds

  1. Go to scale set “Scaling” configuration
  2. Edit the CPU scale-out rule
  3. Adjust threshold to 60% (more sensitive)
  4. Reduce duration to 3 minutes (faster response)
  5. Save changes

Step 2: Test Manual Scaling

  1. Go to “Instances” in scale set
  2. Click “Scale”
  3. Set instance count to 3
  4. Click “Save”
  5. Monitor the new instance creation
  6. Verify that the load balancer includes the new instance
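
A manual scale operation can also be issued from the Azure CLI, sketched below. It assumes `az` is installed and signed in. The function name `manual_scale` is hypothetical. While an autoscale setting is active, the autoscale engine may later adjust the count again.

```shell
# Sketch: set the instance count, then list instances to confirm the change.
manual_scale() {
  vmss_name="$1"
  capacity="$2"
  az vmss scale \
    --resource-group VMSS-Lab-RG \
    --name "$vmss_name" \
    --new-capacity "$capacity"
  az vmss list-instances \
    --resource-group VMSS-Lab-RG \
    --name "$vmss_name" \
    --output table
}
```

Usage: `manual_scale "web-vmss[unique]" 3`, substituting your scale set name.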

Step 3: Configure Scaling Notifications

  1. Go to “Activity log”
  2. Click “Add activity log alert”
  3. Configure alert for scaling events:
    • Resource type: Virtual Machine Scale Sets
    • Operation name: Microsoft.Compute/virtualMachineScaleSets/scale/action
  4. Add email notification action
  5. Create the alert rule

Expected Results: Scale set has optimized scaling configuration with proactive monitoring and notifications for scaling events.


Troubleshooting Guide

Common Issues

  • Scale set creation fails: Check subscription quotas and region availability
  • Instances won’t start: Verify VM size availability and network configuration
  • Web pages don’t load: Check load balancer configuration and instance health
  • Auto-scaling not working: Verify metrics are being collected and thresholds are correct
  • SSH connection fails: Ensure network security group allows SSH traffic

Quick Fixes

  • Stuck instances: Restart or reimage the affected instance; delete and recreate the scale set only as a last resort
  • Load balancer issues: Check backend pool health status
  • High costs: Reduce instance count or use smaller VM sizes
  • Performance problems: Monitor CPU and memory metrics

Key Takeaways

After completing this lab, you should understand:

  • Scale sets provide automatic scaling based on metrics like CPU usage
  • Load balancers distribute traffic across healthy instances and stop sending it to instances that fail the health probe
  • Custom script extensions allow software installation across all instances
  • Auto-scaling rules can scale out and scale in based on demand
  • Monitoring is essential to understand scaling behavior and performance
  • Cost management requires careful configuration of scaling limits

Additional Resources