Cloud Monitoring


  • Cloud monitoring helps you understand and visualize the performance and health of AWS resources.
  • The main AWS service for monitoring is Amazon CloudWatch.

CloudWatch Metrics

Definition

  • Metrics: Variables that represent the performance of AWS resources over time.
  • Examples:
    • CPUUtilization (for EC2)
    • NetworkIn and NetworkOut
    • EstimatedCharges (the billing metric: total estimated AWS spending)

Key Points

  • Metrics are timestamped data points collected periodically.
  • You can visualize metrics in CloudWatch Dashboards.
  • Billing Metric:
    • Available only in us-east-1 region.
    • Represents total AWS spending for the entire account.
    • Resets monthly.

Common Metrics by Service

Service | Common Metrics | Notes
--- | --- | ---
EC2 | CPUUtilization, StatusCheckFailed, NetworkIn, NetworkOut | RAM metrics are not available by default (they require the CloudWatch agent or custom metrics)
EBS | VolumeReadOps, VolumeWriteOps | Measures disk I/O
S3 | BucketSizeBytes, NumberOfObjects, AllRequests | Tracks storage and request activity
Billing | EstimatedCharges | Account-wide billing data (us-east-1)
Service Limits | API usage | Helps monitor resource limits
Custom Metrics | User-defined | Push your own metrics if needed
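
As a rough illustration of the custom metrics row above, the sketch below builds a request shaped like the CloudWatch PutMetricData API call. The namespace, metric name, and instance ID are made-up values, and nothing is sent to AWS; the last comment shows how the request could be passed to boto3 if it is installed and credentials are configured.

```python
from datetime import datetime, timezone


def build_custom_metric(namespace, name, value, unit="Count", dimensions=None):
    """Build keyword arguments shaped like a CloudWatch PutMetricData request."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),  # every data point is timestamped
        "Value": float(value),
        "Unit": unit,
    }
    if dimensions:
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in dimensions.items()
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical namespace, metric name, and instance ID, for illustration only.
request = build_custom_metric(
    "MyApp",
    "MemoryUsedPercent",
    63.5,
    unit="Percent",
    dimensions={"InstanceId": "i-0123456789abcdef0"},
)
print(request["Namespace"], request["MetricData"][0]["MetricName"])

# With boto3 and credentials available, this could be sent with:
#   boto3.client("cloudwatch").put_metric_data(**request)
```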

Metric Frequency

  • Basic monitoring (EC2 default, free): one data point every 5 minutes.
  • Detailed monitoring (paid): one data point every 1 minute.

CloudWatch Alarms

  • Alarms trigger actions based on metric thresholds.
  • Example: When CPU utilization > 90%, send an alert.

Alarm Actions

  1. Auto Scaling Actions – increase/decrease EC2 instance count automatically.
  2. EC2 Actions – stop, terminate, reboot, or recover instances.
  3. SNS Notifications – send alerts via email, SMS, or other channels.

Billing Alarms

  • Set alarms on the Billing metric to get notified when estimated charges exceed a certain amount (e.g., $10 or $20).
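
A billing alarm could be described roughly as follows. This is a sketch that only builds the parameters for the CloudWatch PutMetricAlarm API call; the alarm name and SNS topic ARN are hypothetical, and the 6-hour period is an assumption based on billing data being refreshed only a few times a day.

```python
def build_billing_alarm(threshold_usd, sns_topic_arn):
    """Build keyword arguments shaped like a CloudWatch PutMetricAlarm request."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours, in seconds (assumed; billing data updates slowly)
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify through SNS when the alarm fires
    }


# Hypothetical SNS topic ARN, for illustration only.
alarm = build_billing_alarm(10, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
print(alarm["AlarmName"])

# The billing metric lives in us-east-1, so a real call would use that region:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**alarm)
```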

Alarm States

State | Meaning
--- | ---
OK | Metric within normal range
INSUFFICIENT_DATA | Not enough data points
ALARM | Threshold breached (bad condition)

Evaluation Options

  • You can configure:
    • Statistic type (average, minimum, maximum, sum, percentile)
    • Evaluation period (e.g., 5 minutes, 1 hour)
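
To make the interaction between statistic, evaluation period, and alarm state concrete, here is a simplified local model. It is not how CloudWatch is implemented: the function name and data layout are invented for this sketch, and real alarms have further options (such as how missing data is treated) that are ignored here.

```python
from statistics import mean

# Map statistic names to plain Python aggregation functions.
STATISTICS = {"Average": mean, "Minimum": min, "Maximum": max}


def evaluate_alarm(periods, threshold, statistic="Average", evaluation_periods=1):
    """Return "OK", "ALARM", or "INSUFFICIENT_DATA".

    periods: one list of samples per period, oldest first.
    The alarm fires only if every one of the most recent
    `evaluation_periods` periods breaches the threshold.
    """
    recent = periods[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(len(p) == 0 for p in recent):
        return "INSUFFICIENT_DATA"
    aggregate = STATISTICS[statistic]
    breaching = [aggregate(samples) > threshold for samples in recent]
    return "ALARM" if all(breaching) else "OK"


# CPU samples per period; threshold of 90% over two consecutive periods.
high = evaluate_alarm([[50, 60], [95, 97], [92, 99]], 90, "Average", 2)
mixed = evaluate_alarm([[95, 97], [40, 50]], 90, "Average", 2)
sparse = evaluate_alarm([[95]], 90, "Average", 2)
print(high, mixed, sparse)
```

Requiring several consecutive breaching periods is what keeps a single short spike from triggering an action.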

Summary

  • CloudWatch Metrics track performance data.
  • CloudWatch Alarms automate responses or notifications when thresholds are crossed.
  • Billing Metrics and Alarms help control costs.
  • Custom Metrics allow monitoring of user-defined data.

Amazon CloudWatch Logs


Purpose

  • CloudWatch Logs is used to collect, monitor, store, and analyze log files from various AWS services and on-premises systems.
  • Enables real-time monitoring and troubleshooting of applications and infrastructure.

What Are Log Files?

  • Logs are records of events and activities generated by applications or systems.
  • Used for debugging, troubleshooting, and performance analysis.
  • Example: logs that record user actions, errors, cleanup tasks, or background processes.

Log Sources

CloudWatch Logs can collect logs from:

  • Elastic Beanstalk – application and environment logs.
  • ECS (Elastic Container Service).
  • AWS Lambda – automatically sends logs to CloudWatch.
  • CloudTrail – for auditing API calls.
  • EC2 instances – using the CloudWatch Logs Agent.
  • On-premises servers – via the same agent.
  • Route 53 – for DNS query logs.

CloudWatch Logs Agent

  • Purpose: Sends log data from EC2 or on-premises servers to CloudWatch Logs.
  • Setup:
    1. Install the CloudWatch Logs agent on the instance/server.
    2. Configure which log files to send.
    3. Ensure the instance has an IAM role with permissions to write to CloudWatch Logs.
  • Hybrid capability: Works on both AWS and on-premises environments.
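
The permissions in step 3 could look roughly like the policy below, built here as a Python dictionary so it can be printed as JSON. The action names are real CloudWatch Logs actions, but the wildcard resource is an assumption for the sketch; a production policy would normally be scoped to specific log groups.

```python
import json


def build_logs_write_policy(resource="arn:aws:logs:*:*:*"):
    """Build an IAM policy document that allows writing to CloudWatch Logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams",
                ],
                # Assumption: wildcard resource; narrow this for real use.
                "Resource": [resource],
            }
        ],
    }


policy = build_logs_write_policy()
print(json.dumps(policy, indent=2))
```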

Retention and Management

  • Log retention periods are configurable:
    • Options include 1 week, 30 days, 1 year, or indefinite storage.
  • Logs can be searched, filtered, and visualized in real-time.
  • Useful for alerting when specific log patterns occur.
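
As a small sketch of how the retention options above translate into API parameters, the helper below maps each option to the CloudWatch Logs call that would apply it. The log group name is hypothetical, and the function only returns the operation name and arguments rather than calling AWS.

```python
# Retention options from the list above, expressed in days.
# None means the logs never expire.
RETENTION_DAYS = {"1 week": 7, "30 days": 30, "1 year": 365, "indefinite": None}


def build_retention_request(log_group_name, option):
    """Return (operation name, keyword arguments) for the chosen retention option."""
    days = RETENTION_DAYS[option]
    if days is None:
        # Removing the retention policy keeps log events indefinitely.
        return "delete_retention_policy", {"logGroupName": log_group_name}
    return "put_retention_policy", {
        "logGroupName": log_group_name,
        "retentionInDays": days,
    }


# Hypothetical log group name, for illustration only.
operation, kwargs = build_retention_request("/my-app/production", "30 days")
print(operation, kwargs)

# With boto3 and credentials available, this could be applied with:
#   getattr(boto3.client("logs"), operation)(**kwargs)
```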

Use Case Example (EC2)

  • By default, EC2 does not send logs to CloudWatch.
  • After installing and configuring the CloudWatch Logs Agent, logs from EC2 are pushed to CloudWatch Logs for central monitoring and analysis.

Summary

Feature | Description
--- | ---
Service | Amazon CloudWatch Logs
Main Function | Collect and monitor log data
Data Sources | EC2, Lambda, ECS, Beanstalk, CloudTrail, Route 53, on-premises servers
Agent Requirement | Yes, for EC2 and on-premises
IAM Role Needed | Yes, to allow log data upload
Retention Options | 1 day to 10 years, or never expire
Supports Hybrid Use | Yes (AWS + on-premises)

Amazon EventBridge (formerly CloudWatch Events)


Purpose

  • Reacts to events happening in your AWS account or from external sources.
  • Can also be used to schedule cron jobs (serverless scheduling).

Key Concepts

1. EventBridge Use Cases

  • Cron jobs: Schedule scripts to run regularly (e.g., every hour trigger a Lambda).
  • Automated reactions: Respond to AWS events (e.g., root user sign-in, EC2 state change).

2. Example

  • Detect IAM root user sign-in → send event to SNS → email alert to security team.

Event Sources

  • AWS Services: EC2, CodeBuild, S3 events, Trusted Advisor, etc.
  • Schedule-based: Cron or rate expressions.
  • Partner Event Bus: Receives events from AWS partners such as Datadog and Zendesk.
  • Custom Event Bus: Receives custom events sent by your own applications.
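
Schedule-based sources use either a rate or a cron expression. The sketch below builds parameters shaped like the EventBridge PutRule API call; the rule names are hypothetical, and the validation is only a shallow sanity check, not a full parser.

```python
def build_schedule_rule(name, schedule_expression):
    """Build keyword arguments shaped like an EventBridge PutRule request."""
    is_wrapped = schedule_expression.startswith(("rate(", "cron("))
    if not (is_wrapped and schedule_expression.endswith(")")):
        raise ValueError("expected rate(...) or cron(...)")
    return {
        "Name": name,
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }


# Hypothetical rule names, for illustration only.
hourly = build_schedule_rule("hourly-cleanup", "rate(1 hour)")
# Cron fields: minutes, hours, day-of-month, month, day-of-week, year.
# This expression means 02:00 UTC every day.
nightly = build_schedule_rule("nightly-report", "cron(0 2 * * ? *)")
print(hourly["ScheduleExpression"], nightly["ScheduleExpression"])

# With boto3 and credentials available, a rule could be created with:
#   boto3.client("events").put_rule(**hourly)
```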

Event Destinations

  • Lambda function (common)
  • SNS or SQS
  • Step Functions
  • Other AWS services for orchestration or automation

Advanced Features

  • Schema Registry:
    Defines and models event structure (data types, schema).
  • Event Archive:
    Archive all events indefinitely or for a defined time.
  • Replay Events:
    Replay past archived events for debugging or recovery.

EventBridge Structure

  1. Event Source: Something happens (AWS service, app, or schedule).
  2. Event Bus: Routes the event.
  3. Rule: Defines which events trigger which actions.
  4. Target: Destination service (Lambda, SNS, etc.).
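
The source, bus, rule, and target flow can be modeled locally in a few lines. This is a deliberately simplified matcher that only handles exact-value matching (real EventBridge patterns support more operators), and the target labels and sample events are hypothetical.

```python
def matches(pattern, event):
    """Simplified rule matching: every key in the pattern must be present in
    the event, and leaf lists hold the allowed values for that key."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event, rules):
    """Play the role of the event bus: return targets of every matching rule."""
    return [target for pattern, target in rules if matches(pattern, event)]


# Rule: root user console sign-in goes to a (hypothetical) SNS topic.
ROOT_SIGN_IN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}
RULES = [(ROOT_SIGN_IN, "sns:security-alerts")]

root_event = {
    "source": "aws.signin",
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"}},
}
user_event = {
    "source": "aws.signin",
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"}},
}
print(route(root_event, RULES), route(user_event, RULES))
```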

Summary

  • EventBridge = event-driven automation service.
  • Used for cron jobs, reactive workflows, and cross-service integration.
  • Supports AWS events, partner events, and custom app events.
  • Advanced features include schema registry, archive, and replay.

AWS CloudTrail


Purpose

  • CloudTrail provides governance, compliance, and auditing for your AWS account.
  • It records API calls and events across your AWS environment.
  • Enabled by default for every AWS account.

What CloudTrail Records

CloudTrail logs who did what, where, and when in your AWS account.
It tracks all API interactions through:

  • AWS Management Console
  • AWS CLI
  • AWS SDKs
  • AWS Services (internal actions)

Examples:

  • User logs into AWS console → logged in CloudTrail
  • Command executed via CLI → logged in CloudTrail
  • API call made by SDK → logged in CloudTrail

Storage of Logs

CloudTrail logs can be sent to:

  • Amazon S3 → for long-term storage and compliance
  • Amazon CloudWatch Logs → for real-time monitoring and alerting

You can create a Trail that applies to:

  • All regions (recommended for full visibility)
  • Single region (limited scope)

Use Case Example

If a user deletes a resource (e.g., an EC2 instance or S3 bucket):

  • You can check CloudTrail to find:
    • Who deleted it
    • When it happened
    • From where (source IP)
    • Which API call was used
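
The lookup above amounts to filtering CloudTrail records by event name. The sketch below does that over two invented sample records; the field names follow the CloudTrail record format, but the users, IP addresses, and timestamps are made up.

```python
def who_did(records, event_name):
    """Summarize who made a given API call, when, from where, and in which region."""
    return [
        {
            "who": record["userIdentity"].get("arn", "unknown"),
            "when": record["eventTime"],
            "source_ip": record["sourceIPAddress"],
            "api_call": record["eventName"],
            "region": record["awsRegion"],
        }
        for record in records
        if record["eventName"] == event_name
    ]


# Invented sample records, for illustration only.
SAMPLE_RECORDS = [
    {
        "eventTime": "2024-01-15T10:32:00Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "TerminateInstances",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "203.0.113.10",
        "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/alice"},
    },
    {
        "eventTime": "2024-01-15T11:05:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "ListBuckets",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "203.0.113.25",
        "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/bob"},
    },
]

deletions = who_did(SAMPLE_RECORDS, "TerminateInstances")
print(deletions)
```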

Key Features

Feature | Description
--- | ---
Default Status | Enabled for all AWS accounts
Tracks | Console, CLI, SDK, and service API activity
Retention | Long-term storage via S3
Monitoring | Real-time insights via CloudWatch Logs
Multi-region Trails | Option to monitor all regions
Use Case | Security analysis, auditing, troubleshooting

Summary

  • CloudTrail = audit log of AWS API activity.
  • Logs who did what and when for security and compliance.
  • Integrates with S3 (for storage) and CloudWatch Logs (for alerts).
  • Always check CloudTrail when you need to identify who made a change in your AWS environment.

AWS X-Ray


Purpose

  • AWS X-Ray helps analyze and debug distributed applications in production or development.
  • Provides end-to-end tracing of requests as they travel through multiple AWS services.
  • Offers visual insights into application performance and dependencies.

Why X-Ray Is Needed

Without X-Ray:

  • Logs are spread across multiple services and are hard to combine and analyze.
  • Debugging distributed architectures (like microservices using SQS, SNS, Lambda, etc.) becomes complex.
  • No single view of how requests move through the system.

X-Ray solves this by giving a centralized trace of all service interactions.
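
To illustrate why a single trace helps, the sketch below treats a trace as a list of timed segments and picks the slowest one. The segment names and timings are invented, and the structure is heavily simplified compared with real X-Ray segment documents; it only shows the idea of locating a bottleneck once all timings are in one place.

```python
def segment_durations(segments):
    """Return {segment name: duration in seconds} for one trace."""
    return {s["name"]: s["end_time"] - s["start_time"] for s in segments}


def slowest_segment(segments):
    """Return the name of the segment that took the longest."""
    durations = segment_durations(segments)
    return max(durations, key=durations.get)


# Invented trace: one request passing through three components in sequence.
TRACE = [
    {"name": "api-gateway", "start_time": 100.0, "end_time": 100.25},
    {"name": "orders-lambda", "start_time": 100.25, "end_time": 100.75},
    {"name": "payments-db", "start_time": 100.75, "end_time": 102.75},
]

bottleneck = slowest_segment(TRACE)
print(bottleneck)
```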

Key Features

Feature | Description
--- | ---
Distributed Tracing | Tracks a request end-to-end across the AWS services it passes through.
Service Map / Graph | Visual representation of how components (services, APIs, functions) interact.
Performance Bottlenecks | Identifies which part of the system causes delays or throttling.
Error Analysis | Detects and visualizes errors and exceptions in requests.
Dependency Mapping | Shows relationships between microservices.
SLA Monitoring | Verifies if your app meets response-time targets.
User Impact | Identifies which users are affected by an issue.

Benefits

  • Visual debugging for microservice-based or distributed systems.
  • Quickly pinpoints issues (slow responses, timeouts, throttling).
  • Helps optimize performance and improve reliability.
  • Useful for both developers and DevOps teams.

Common Exam Use Cases

Scenario | AWS Service
--- | ---
Find who made an API call | CloudTrail
Monitor performance metrics | CloudWatch
Trace individual requests through multiple services | X-Ray

Summary

  • AWS X-Ray = request tracing, debugging, and visualization tool.
  • Used for distributed tracing across multiple AWS services.
  • Best for troubleshooting, performance analysis, and microservices visualization.
  • Integrated with EC2, Lambda, ECS, API Gateway, and more.
  • No hands-on needed for the CCP exam; just understand what it does.

Amazon CodeGuru (Overview)


Definition:
A machine learning-powered service that provides:

  1. Automated code reviews (CodeGuru Reviewer)
  2. Application performance recommendations (CodeGuru Profiler)

Purpose:

  • Detects bugs, security issues, and inefficiencies automatically.
  • Improves code quality and application performance with minimal human intervention.

CodeGuru Reviewer

Function:
Performs static code analysis to automatically review code when you push commits to repositories such as:

  • CodeCommit
  • GitHub
  • Bitbucket

Key Features:

  • Identifies critical issues, security vulnerabilities, and hard-to-find bugs.
  • Detects resource leaks, input validation errors, and security holes.
  • Provides actionable recommendations directly in your code.
  • Uses machine learning and automated reasoning trained on:
    • Thousands of open-source repositories
    • Amazon.com’s internal repositories

Supported Languages:

  • Java
  • Python

Use Case:
When developers push code, CodeGuru Reviewer analyzes it and comments on potential problems or best practice violations.

CodeGuru Profiler

Function:
Analyzes runtime behavior of applications in production or pre-production environments.

Key Features:

  • Identifies performance bottlenecks and cost inefficiencies.
  • Detects excessive CPU usage, memory-heavy objects, and code inefficiencies.
  • Provides heap summaries to identify objects using a lot of memory.
  • Detects anomalies in runtime behavior.
  • Helps to reduce compute costs and optimize performance.
  • Adds minimal overhead to the monitored application.
  • Works for applications running on AWS or on-premises.

Example:
If a logging function consumes too much CPU, CodeGuru Profiler detects it and suggests optimizations.

Summary

Component | Purpose | Key Capabilities
--- | --- | ---
CodeGuru Reviewer | Automated code reviews | Finds bugs, security issues, and resource leaks before deployment
CodeGuru Profiler | Runtime performance analysis | Optimizes performance, reduces costs, detects anomalies

Remember for Exam (AWS CCP):

  • CodeGuru = ML-powered service for automated code review and performance profiling.
  • Reviewer = static analysis (before deploy)
  • Profiler = runtime analysis (after deploy)
  • Supports Java and Python, integrates with GitHub, Bitbucket, and CodeCommit.

AWS Health Dashboard


There are two main parts of the AWS Health Dashboard:

  1. Service Health Dashboard (Public View)
  2. Account Health Dashboard (Personal View)

Service Health Dashboard

  • Shows current and historical health of all AWS services across all regions.
  • Provides general AWS-wide information (not account-specific).
  • You can check:
    • Regional service statuses
    • Historical issues (day-by-day)
    • Subscribe to RSS feed for updates
  • Now part of the AWS Health Dashboard; previously a standalone page called the AWS Service Health Dashboard.

Account Health Dashboard (for your AWS account)

  • Formerly known as Personal Health Dashboard (PHD).
  • Gives alerts, notifications, and remediation guidance for issues affecting your own resources.
  • Helps you monitor:
    • Performance and availability of AWS services you use
    • Events impacting your account (e.g., EC2 issue in a specific region)
  • Provides:
    • Event log (past incidents)
    • Scheduled maintenance notifications
    • Proactive alerts for upcoming changes
  • Can aggregate health information across your AWS Organizations accounts.
  • Access from the top-right corner (bell icon) in the AWS Console.
  • It’s a global service showing outages or events that directly impact you.

Summary

Dashboard | Scope | Purpose | Example
--- | --- | --- | ---
Service Health Dashboard | All AWS services (public) | View general AWS status & outages | Check if S3 is down in us-east-1
Account Health Dashboard | Your AWS account (private) | Personalized alerts & guidance | See EC2 issue affecting your account

AWS Monitoring and Observability - Summary


AWS provides multiple tools for monitoring, logging, auditing, and debugging your applications and infrastructure.

Amazon CloudWatch

A suite of tools for monitoring and automating AWS service performance.

CloudWatch Metrics

  • Collects and monitors performance metrics of AWS services and billing.
  • Example: CPU utilization, disk I/O, network usage.

CloudWatch Alarms

  • Triggers actions when metrics exceed defined thresholds.
  • Can:
    • Send notifications via SNS
    • Perform EC2 actions (e.g., reboot, stop, terminate)
    • Automate responses to metric changes

CloudWatch Logs

  • Centralized log management for:
    • EC2 instances
    • Lambda functions
    • On-premises servers
  • Helps monitor and analyze logs in real-time.

CloudWatch Events (now called Amazon EventBridge)

  • Reacts to AWS service events or scheduled rules.
  • Example: Trigger Lambda when an EC2 instance state changes.

AWS CloudTrail

  • Records all API calls made within your AWS account.
  • Tracks who did what, when, and from where.
  • Used for auditing and security analysis.

CloudTrail Insights

  • Uses machine learning to detect unusual API activity automatically.
  • Example: Detects spikes in IAM access or EC2 instance launches.

AWS X-Ray

  • Used for tracing requests across distributed applications.
  • Helps with:
    • Root cause analysis
    • Performance troubleshooting
    • Understanding how microservices interact
  • Ideal for debugging serverless and microservice architectures.

AWS Health Dashboard

  • Two views:
    • Service Health Dashboard → General status of all AWS services and regions.
    • Account Health Dashboard → Personalized view showing events that impact your own resources.

Amazon CodeGuru

  • AI-powered service for:
    • Automated code reviews (CodeGuru Reviewer)
    • Application performance profiling (CodeGuru Profiler)
  • Detects bugs, inefficiencies, and performance issues using machine learning.

Summary Table

Service | Purpose | Key Use Case
--- | --- | ---
CloudWatch Metrics | Monitor resource performance | Track EC2 CPU usage
CloudWatch Alarms | Automate alerts/actions | Reboot EC2 when CPU > 80%
CloudWatch Logs | Collect and analyze logs | Centralize Lambda logs
EventBridge (CloudWatch Events) | React to events or schedules | Trigger Lambda on EC2 state change
CloudTrail | Audit API calls | Detect unauthorized actions
CloudTrail Insights | Detect unusual activity | Spike in IAM API calls
X-Ray | Trace distributed requests | Debug microservice performance
Health Dashboard | Monitor AWS/global health | View service outages or account impacts
CodeGuru | AI-based code and performance review | Optimize production code efficiency