The PDCA (plan-do-check-act) framework can be used to outline the performance, availability, and monitoring practices that enable teams to deliver performant and highly available applications. These practices include infrastructure design and setup, application architecture and design, coding, performance testing, and application monitoring.
application performance management
InfoQ recently caught up with Observability experts to discuss several topics, including fundamental questions about what Observability really entails, the misconceptions and challenges that users are facing, the open standards that are influencing the industry in general, and why there has been more interest in this area of late.
In a recent keynote for The DEVOPS Conference, Cheryl Hung, VP Ecosystem for the Cloud Native Computing Foundation (CNCF) shared her top 10 predictions for cloud native in the upcoming year. This includes improvements in cross cloud support, growth in GitOps and chaos engineering practices, and an increase in the adoption of FinOps.
In this podcast, Anurag Gupta, founder and CEO of Shoreline.io, sat down with InfoQ podcast host Daniel Bryant and discussed the role of DevOps and site reliability engineering (SRE), day 2 operations, and the importance of building observability into applications and platforms.
PagerDuty has released a number of updates and enhancements to their incident response platform. These include new integrations with Amazon DevOps Guru, AWS Control Tower, and Microsoft Teams. Other changes include improvements to mapping failures back to changes, automatic triggers, and content-based alert grouping.
InfoQ Live, the one-day virtual event for software engineers and architects, returns on March 16th with a new edition, this time focusing on ways to reduce the uncertainty of your software development cycle.
Lightstep has released a number of improvements to their observability platform. These include native support for OpenTelemetry metrics, a new underlying time series database, and Change Intelligence, a new feature that looks to connect unusual patterns with impacting changes by bringing together system metrics and trace data.
Amazon Web Services (AWS) recently introduced Amazon DevOps Guru, one of several new machine learning-driven services. DevOps Guru detects operational issues, generates reports and notifications, and offers insights and recommendations on how to take action.
This eMag helps you reflect on the subject of reducing complexity within modern applications and distributed systems, and provides you with different perspectives and lessons learned from people who have already had to deal with real-world challenges.
In one of the latest announcements of re:Invent 2020, AWS introduced the preview of Amazon Managed Service for Grafana, a managed Grafana that automatically scales compute and database infrastructure, with automated version updates and security patching. AWS also introduced a preview for Amazon Managed Service for Prometheus.
Grafana Labs recently released the distributed tracing backend Grafana Tempo. It only requires object storage like Amazon S3 or Google Cloud Storage (GCS) to operate. Grafana Tempo integrates with any existing logging system to create links from trace IDs in log lines.
AIOps platforms empower IT teams to quickly find the root issues that originate in the network and disrupt running applications. AI/ML algorithms need access to high quality network data to determine what went wrong and where. Network visibility starts from TAPs around network equipment, and teams can add application instrumentation and logs as data sources for complete insights.
AWS recently added to the Amazon Builders' Library their best practices for building dashboards for operational visibility. The document includes a detailed description of the different types of dashboards that exist at Amazon as well as a discussion of the design best practices used to create dashboards.
AWS recently introduced the ability to share Amazon CloudWatch Dashboards with users who do not have access to the AWS account. This feature opens up new use cases for dashboards, including sharing metrics and information on big screens, or embedding real-time information in public pages.
In a recent InfoQ podcast, Liran Haimovitch, CTO at Rookout, discussed the concept of “understandability” and how this relates to building modern software systems. Building on the concepts introduced in his recent InfoQ article, he also discussed how complexity impacts a system’s understandability, and the benefits of live debugging tooling.