Seeing through the chaos with OpenTelemetry in Google Cloud Platform
We’ve all been there. We built a system from a combination of our own carefully coded services, some purchased or open-source systems deployed to VMs or functions, and a cloud provider’s systems. It was all working so well and then... it just isn’t. It’s too slow, it’s throwing errors, it’s timing out, it’s forcing me to put off leaving at the end of the day for a nice visit to the woods to recuperate. Looking at the log messages, OK, how did that handful of systems produce that many messages? How can I look at that pile of gibberish and figure out what is going on? Running on my local computer, maybe I could debug this, but I don’t even know enough to set up the matching scenario to trigger the problem. Should we add more logging? NOOOOOOO!
OpenTelemetry helps solve this problem by standardizing how to correlate and track activities across all your systems. The language- and platform-specific libraries and SDKs make it much simpler to implement tracing in your own applications and to correlate activities across the whole distributed network of nodes you call home. It works across platforms, so much of what we’ll talk about applies to GCP, Azure and AWS. It doesn’t replace logging, but being able to associate logs with traces provides a wonderful way to filter log entries. Some of the foundations of this started at Google as OpenCensus, so this way of tackling the problem is baked into how GCP works.
So, we’ll explore the setup of a few services running in GCP with Google Cloud Trace, OpenTelemetry and the Google Cloud Logging libraries, and try to tame this beast. The demo will be code and query heavy, showing requests originating from REST endpoints, GraphQL, and Pub/Sub. It will also show how tracing correlates those requests to Cloud SQL, Google Cloud Storage and other built-in systems. The services will be in Node.js, Python and C# to show a bigger cross section of implementations.
It is helpful, but not required, to have worked with a cloud provider and the logging mechanisms it provides. Bonus if you have already worked on a multi-service or microservice architecture, including APIs, functions/lambdas and apps.
- Understand the difference between logging and tracing, and where they fit in distributed systems