Virtual Machine Scale Sets
Lab Objective
In this hands-on lab, you will learn how to:
- Configure Virtual Machine Scale Sets auto-scaling policies and thresholds
- Manage scale set instances through portal and scaling operations
- Test automatic scaling by triggering scale-out and scale-in events
- Monitor scaling activity and performance metrics in real-time
- Customize scaling rules for different metrics and schedules
- Analyze scaling behavior and optimize for cost and performance
Scenario: Work with a pre-deployed web application scale set to understand auto-scaling behavior, configure advanced scaling policies, and optimize performance.
Pre-Provisioned Environment
```
Virtual Machine Scale Set Lab Environment
├── Resource Group: VMSS-Lab-RG
├── Virtual Network (vmss-vnet)
│   ├── Web Subnet (10.0.1.0/24)
│   └── Management Subnet (10.0.2.0/24)
├── Load Balancer (vmss-lb[unique])
│   ├── Public IP: vmss-pip[unique]
│   ├── Backend Pool: Scale Set instances
│   ├── Health Probe: HTTP:80/
│   └── Load Balancing Rule: HTTP:80
├── Virtual Machine Scale Set (web-vmss[unique])
│   ├── Instances: 2 (initial)
│   ├── VM Size: Standard_B2s
│   ├── OS: Ubuntu 20.04 LTS
│   ├── Web Server: Apache pre-installed
│   └── Auto-scaling: Disabled (for lab configuration)
├── Network Security Group (vmss-nsg[unique])
│   ├── HTTP (80): Allow from Internet
│   └── SSH (22): Allow from Load Balancer
└── Application Insights (vmss-insights[unique])
    └── Performance monitoring enabled
```
Important: The scale set is pre-deployed with a simple web application. Your focus will be on configuring and testing the scaling behavior.
Lab Exercises
Part 1: Explore Scale Set Configuration
Step 1: Examine Scale Set Properties
- Navigate to Resource Groups → VMSS-Lab-RG
- Click on the Virtual Machine Scale Set web-vmss[unique]
- Review the Overview tab:
  - Current instance count
  - VM size and configuration
  - Operating system details
- Go to “Instances” and verify 2 running instances
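If you prefer the command line, the same details can be pulled with the Azure CLI (for example from Cloud Shell). This is a sketch: replace web-vmss<unique> with the actual scale set name shown in your resource group.

```bash
RG="VMSS-Lab-RG"
VMSS="web-vmss<unique>"   # replace with the actual scale set name

# VM size, current capacity, and OS image offer of the scale set
az vmss show --resource-group "$RG" --name "$VMSS" \
  --query "{size:sku.name, capacity:sku.capacity, image:virtualMachineProfile.storageProfile.imageReference.offer}" \
  --output table

# The pre-provisioned instances and their provisioning states
az vmss list-instances --resource-group "$RG" --name "$VMSS" --output table
```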
Step 2: Test Load Balancer Distribution
- Navigate to the Load Balancer vmss-lb[unique]
- Copy the public IP address from the Overview
- Open multiple browser tabs and navigate to http://[public-ip]
- Refresh repeatedly to observe different instance responses
- Note how traffic is distributed across instances
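To see the distribution without clicking refresh, you can loop over the endpoint with curl. This sketch assumes the sample page contains something that identifies the serving instance (for example, its host name); adjust the grep pattern to match whatever your page actually returns.

```bash
PUBLIC_IP="<public-ip>"   # the load balancer's public IP from the Overview blade

# Send ten requests and look for an instance identifier in each response
for i in $(seq 1 10); do
  curl -s "http://${PUBLIC_IP}/" | grep -io "web-vmss[^<]*" || echo "(no instance marker found)"
done
```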
Step 3: Review Current Scaling Configuration
- Go back to the scale set web-vmss[unique]
- Click “Scaling” in the left menu
- Observe that auto-scaling is currently disabled
- Note the current manual scale setting (2 instances)
Expected Results: You can see the scale set infrastructure working with load balancing across 2 instances, ready for scaling configuration.
Part 2: Configure Basic Auto-Scaling Rules
Step 1: Enable Auto-Scaling
- In the scale set Scaling section
- Click “Custom autoscale”
- Enter autoscale setting name: cpu-based-scaling
- Set default instance count: 2
Step 2: Create Scale-Out Rule
- Click “+ Add a rule”
- Configure the scale-out rule:
- Metric source: Current resource (vmss)
- Metric namespace: Virtual Machine Host
- Metric name: Percentage CPU
- Time aggregation: Average
- Operator: Greater than
- Threshold: 70
- Duration: 5 minutes
- Operation: Increase count by
- Instance count: 1
- Cool down: 5 minutes
- Click “Add”
Step 3: Create Scale-In Rule
- Click “+ Add a rule”
- Configure the scale-in rule:
- Metric source: Current resource (vmss)
- Metric namespace: Virtual Machine Host
- Metric name: Percentage CPU
- Time aggregation: Average
- Operator: Less than
- Threshold: 30
- Duration: 10 minutes
- Operation: Decrease count by
- Instance count: 1
- Cool down: 10 minutes
- Click “Add”
Step 4: Set Instance Limits and Save
- Configure scale condition:
- Minimum: 2
- Maximum: 5
- Default: 2
- Click “Save”
- Wait for the configuration to be applied
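The same configuration can be scripted with the Azure CLI. The commands below are a sketch that assumes the scale set is named web-vmss<unique> (substitute the real name) and uses the autoscale setting name entered above.

```bash
RG="VMSS-Lab-RG"
VMSS="web-vmss<unique>"   # replace with the actual scale set name

# Create the autoscale setting with the same limits as the portal steps (min 2, max 5, default 2)
az monitor autoscale create \
  --resource-group "$RG" \
  --resource "$VMSS" \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name cpu-based-scaling \
  --min-count 2 --max-count 5 --count 2

# Scale out: +1 instance when average CPU > 70% for 5 minutes, 5-minute cool down
az monitor autoscale rule create \
  --resource-group "$RG" \
  --autoscale-name cpu-based-scaling \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1 \
  --cooldown 5

# Scale in: -1 instance when average CPU < 30% for 10 minutes, 10-minute cool down
az monitor autoscale rule create \
  --resource-group "$RG" \
  --autoscale-name cpu-based-scaling \
  --condition "Percentage CPU < 30 avg 10m" \
  --scale in 1 \
  --cooldown 10
```

The condition string packs the portal fields together (metric, operator, threshold, aggregation, duration), and --cooldown corresponds to the portal's cool down value in minutes.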
Expected Results: Auto-scaling is now enabled with CPU-based rules that will scale out above 70% CPU and scale in below 30% CPU.
Part 3: Test Auto-Scaling Behavior
Step 1: Generate Load to Trigger Scale-Out
- Go to “Instances” in the scale set
- Click on the first instance name
- Click “Run command” → “RunShellScript”
- Enter these commands to generate CPU load:
```bash
sudo apt-get update && sudo apt-get install -y stress
nohup stress --cpu 2 --timeout 600 &
```
- Click “Run”
- Repeat for the second instance
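If you would rather drive this from the CLI than open Run command twice, az vmss run-command invoke can push the same script to every instance. A sketch, with the scale set name as a placeholder; each invocation blocks until the script returns, which is quick here because stress is backgrounded with nohup.

```bash
RG="VMSS-Lab-RG"
VMSS="web-vmss<unique>"   # replace with the actual scale set name

# Install stress and start a 10-minute CPU burn on every current instance
for ID in $(az vmss list-instances -g "$RG" -n "$VMSS" --query "[].instanceId" -o tsv); do
  az vmss run-command invoke -g "$RG" -n "$VMSS" \
    --instance-id "$ID" \
    --command-id RunShellScript \
    --scripts "sudo apt-get update && sudo apt-get install -y stress && nohup stress --cpu 2 --timeout 600 >/dev/null 2>&1 &"
done
```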
Step 2: Monitor Scaling Activity
- Go to the scale set “Metrics” section
- Click “Add metric”
- Select “Percentage CPU” metric
- Set time range to “Last 30 minutes”
- Watch CPU usage climb above 70%
- Go to “Activity log” to monitor scaling events
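You can also pull the CPU series from the CLI while the portal chart refreshes. A sketch, again with the scale set name as a placeholder:

```bash
# Resource ID of the scale set, needed by az monitor metrics
VMSS_ID=$(az vmss show -g VMSS-Lab-RG -n "web-vmss<unique>" --query id -o tsv)

# Average host CPU for the last hour at 1-minute granularity
az monitor metrics list \
  --resource "$VMSS_ID" \
  --metric "Percentage CPU" \
  --aggregation Average \
  --interval PT1M \
  --offset 1h \
  --output table
```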
Step 3: Observe Scale-Out Event
- Wait 5-10 minutes for the scale-out rule to trigger
- Go to “Instances” and refresh the view
- Verify a third instance is being created
- Monitor the new instance until it shows “Running”
- Confirm the load balancer now routes traffic to the new instance
Step 4: Stop Load and Monitor Scale-In
- Go back to “Run command” on each instance
- Run this command to stop the load:
```bash
sudo pkill stress
```
- Monitor CPU metrics dropping below 30%
- Wait 10-15 minutes for scale-in cool-down
- Verify instance count returns to 2
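The cleanup can be pushed to all instances in one go with Run Command as well. A sketch with the placeholder scale set name:

```bash
RG="VMSS-Lab-RG"
VMSS="web-vmss<unique>"   # replace with the actual scale set name

# Kill any remaining stress processes on every instance
for ID in $(az vmss list-instances -g "$RG" -n "$VMSS" --query "[].instanceId" -o tsv); do
  az vmss run-command invoke -g "$RG" -n "$VMSS" \
    --instance-id "$ID" \
    --command-id RunShellScript \
    --scripts "sudo pkill stress || true"
done
```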
Expected Results: Scale set automatically creates a new instance when CPU exceeds 70% for 5 minutes, then removes it when CPU drops below 30% for 10 minutes.
Part 4: Configure Advanced Scaling Policies
Step 1: Add Memory-Based Scaling Rule
- Go to scale set “Scaling” section
- Click “+ Add a rule”
- Configure memory-based scale-out:
- Metric source: Current resource (vmss)
- Metric namespace: Virtual Machine Host
- Metric name: Available Memory Bytes
- Operator: Less than
- Threshold: 1073741824 (1GB in bytes)
- Duration: 5 minutes
- Operation: Increase count by 1
- Click “Add”
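The CLI equivalent is shown below. It assumes the autoscale setting created in Part 2 is named cpu-based-scaling and that the Available Memory Bytes host metric is emitted for your scale set.

```bash
# Scale out by 1 when average available memory stays below 1 GiB for 5 minutes
az monitor autoscale rule create \
  --resource-group VMSS-Lab-RG \
  --autoscale-name cpu-based-scaling \
  --condition "Available Memory Bytes < 1073741824 avg 5m" \
  --scale out 1
```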
Step 2: Create Schedule-Based Scaling Condition
- Click “+ Add a scale condition”
- Select “Scale based on a schedule”
- Name: business-hours-scaling
- Configure schedule:
- Start time: 09:00
- End time: 17:00
- Days: Monday to Friday
- Time zone: Your local time zone
- Instance count: 4
- Click “Add”
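A CLI sketch of the same recurring profile; the time zone value is only an example (use your own), and the profile name matches the one entered above.

```bash
# Hold the scale set at 4 instances on weekdays between 09:00 and 17:00
az monitor autoscale profile create \
  --resource-group VMSS-Lab-RG \
  --autoscale-name cpu-based-scaling \
  --name business-hours-scaling \
  --recurrence week mon tue wed thu fri \
  --start 09:00 --end 17:00 \
  --timezone "Pacific Standard Time" \
  --count 4 --min-count 4 --max-count 4
```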
Step 3: Review Scaling Configuration
- Go to the scale set “Scaling” overview
- Verify you now have:
- Default condition with CPU rules (2-5 instances)
- Memory-based rule
- Business hours schedule (4 instances)
- Check the current active condition
Expected Results: Scale set now has multiple scaling policies - CPU-based, memory-based, and schedule-based scaling rules.
Part 5: Monitor Scaling Performance
Step 1: Analyze Scaling Metrics
- Go to scale set “Metrics”
- Add multiple metrics:
- Percentage CPU
- Available Memory Bytes
- Instance Count
- Set time range to “Last 4 hours”
- Observe correlation between metrics and scaling events
Step 2: Review Scaling History
- Go to “Activity log”
- Filter by “Administrative” category
- Look for “Scale” operations
- Click on scaling events to see details
- Note what triggered each scaling action
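The same history is available from the CLI. This sketch lists recent activity-log events for the lab resource group so you can pick out the autoscale and scale operations:

```bash
# Events from the last 4 hours, trimmed to time, operation, and status
az monitor activity-log list \
  --resource-group VMSS-Lab-RG \
  --offset 4h \
  --query "[].{time:eventTimestamp, operation:operationName.localizedValue, status:status.value}" \
  --output table
```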
Step 3: Evaluate Load Balancer Performance
- Go to load balancer “Metrics”
- Add metrics:
- Data Path Availability
- Health Probe Status
- Byte Count
- Analyze how load balancer handles traffic during scaling
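If the load balancer is a Standard SKU, its metrics can also be queried from the CLI. A sketch with the load balancer name as a placeholder; VipAvailability is the metric behind the Data Path Availability chart, and DipAvailability and ByteCount back the other two charts.

```bash
LB_ID=$(az network lb show -g VMSS-Lab-RG -n "vmss-lb<unique>" --query id -o tsv)

# Data path availability over the last hour at 5-minute granularity
az monitor metrics list \
  --resource "$LB_ID" \
  --metric VipAvailability \
  --aggregation Average \
  --interval PT5M \
  --output table
```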
Expected Results: You can see the relationship between metrics, scaling triggers, and load balancer behavior during scaling events.
Part 6: Optimize Scaling Configuration
Step 1: Fine-Tune Scaling Thresholds
- Go to scale set “Scaling” configuration
- Edit the CPU scale-out rule
- Adjust threshold to 60% (more sensitive)
- Reduce duration to 3 minutes (faster response)
- Save changes
Step 2: Test Manual Scaling
- Go to “Instances” in scale set
- Click “Scale”
- Set instance count to 3
- Click “Save”
- Monitor the new instance creation
- Verify load balancer includes new instance
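Manual scaling can also be done in a single CLI call; note that while an autoscale condition is active, the autoscale engine may later override the manual count. Sketch with the placeholder scale set name:

```bash
# Set the scale set to exactly 3 instances, then confirm the new instance comes up
az vmss scale --resource-group VMSS-Lab-RG --name "web-vmss<unique>" --new-capacity 3
az vmss list-instances --resource-group VMSS-Lab-RG --name "web-vmss<unique>" --output table
```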
Step 3: Configure Scaling Notifications
- Go to “Activity log”
- Click “Add activity log alert”
- Configure alert for scaling events:
- Resource type: Virtual Machine Scale Sets
- Operation name: Microsoft.Compute/virtualMachineScaleSets/scale/action
- Add email notification action
- Create the alert rule
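A CLI sketch of the same alert; the action group name and email address are placeholders you should replace.

```bash
RG="VMSS-Lab-RG"
VMSS_ID=$(az vmss show -g "$RG" -n "web-vmss<unique>" --query id -o tsv)

# Action group that sends an email when the alert fires (address is a placeholder)
az monitor action-group create \
  --resource-group "$RG" \
  --name vmss-scale-ag \
  --short-name vmssag \
  --action email labadmin you@example.com

# Activity-log alert on the scale operation of the scale set
az monitor activity-log alert create \
  --resource-group "$RG" \
  --name vmss-scale-alert \
  --scope "$VMSS_ID" \
  --condition "category=Administrative and operationName=Microsoft.Compute/virtualMachineScaleSets/scale/action" \
  --action-group vmss-scale-ag
```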
Expected Results: Scale set has optimized scaling configuration with proactive monitoring and notifications for scaling events.
Troubleshooting Guide
Common Issues
- Scale-out fails to add instances: Check subscription quotas and regional availability of the VM size
- Instances won’t start: Verify VM size availability and network configuration
- Web pages don’t load: Check load balancer configuration and instance health
- Auto-scaling not working: Verify metrics are being collected and thresholds are correct
- SSH connection fails: Ensure network security group allows SSH traffic
Quick Fixes
- Stuck instances: Delete and recreate the scale set
- Load balancer issues: Check backend pool health status
- High costs: Reduce instance count or use smaller VM sizes
- Performance problems: Monitor CPU and memory metrics
Key Takeaways
After completing this lab, you should understand:
- Scale sets provide automatic scaling based on metrics like CPU usage
- Load balancers distribute traffic evenly across healthy instances
- Custom script extensions allow software installation across all instances
- Auto-scaling rules can scale out and scale in based on demand
- Monitoring is essential to understand scaling behavior and performance
- Cost management requires careful configuration of scaling limits