Adam Johnson – Whose Fault Is It When Kubernetes Breaks?

October 7, 2020

36m

So, you’ve gone “cloud native”. You’re running apps in containers, you’re scheduling them with Kubernetes, and now you’re trying to create a better experience for your team and for your customers. But when things break, and they often do, it can be challenging to understand how to resolve an incident quickly, or even which service owner is responsible. Distributed tracing brings code execution to the forefront and gives a new view focused on service performance. In this presentation, we discuss:

– Why traditional logs and metrics can’t answer the most important questions about K8s reliability
– How distributed tracing puts a service-centric view at the forefront for your monitoring teams
– How to instantly understand changes to services, pods, and containers
– How to share responsibility for incident response, and quickly engage the right team for resolution
– What complete system visibility actually means
– How you can take advantage of ‘shipping your org chart’

Guest(s): Adam Johnson