So, you’ve gone “cloud native”. You’re running apps in containers, you’re scheduling them with Kubernetes, and now you’re trying to create a better experience for your team and for your customers. But when things break, and they often do, it can be hard to know how to resolve an incident quickly, or even which service owner is responsible. Distributed tracing brings code execution to the forefront and gives you a new view focused on service performance. In this presentation, we discuss:
– Why traditional logs and metrics can’t answer the most important questions about K8s reliability
– How distributed tracing gives your monitoring teams a service-centric view
– How to instantly understand changes to services, pods, and containers
– How to share responsibility for incident response, and quickly engage the right team for resolution
– What complete system visibility actually means
– How you can take advantage of ‘shipping your org chart’