Jonan Scheffler interviews Jayesh Ahire, Founding Engineer at Traceable.ai, about the experience we should give developers by reducing their cognitive load and why alerting is such an important component of monitoring and observability. Jayesh also gives the most wonderful and actionable advice to new developers: READ. Read as much as you can, focus on first principles, and your career will grow and thrive.

Should you find a burning need to share your thoughts or rants about the show, please spray them at devrel@newrelic.com. While you're going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you'd like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.

Collective Collection – Shifting Cognitive Load with Jayesh Ahire

Observy McObservface

Jonan Scheffler: Hello and welcome back to Observy McObservface, proudly brought to you by New Relic's Developer Relations team, The Relicans. Observy is about observability in something a bit more than the traditional sense. It's often about technology and tools that we use to gain visibility into our systems. But it is also about people because, fundamentally, software is about people. You can think of Observy as something of an observability variety show where we will apply systems thinking and think critically about challenges across our entire industry, and we very much look forward to having you join us. You can find the show notes for this episode along with all of The Relicans podcasts on developer.newrelic.com/podcasts. We're so pleased to have you here this week. Enjoy the show.

Hello and welcome back to Observy. My name is Jonan. I am the Director of Developer Relations here at New Relic, and I am joined by Jayesh. How are you?

Jayesh Ahire: I'm doing good and excited to be here. How are you doing?

Jonan: I'm doing all right. I'm hanging in there. I'm also excited that you're here. We are communicating from across the entire planet, which is kind of magical. But I appreciate you joining us so late at night. Where are you located?

Jayesh: I'm currently in a city in India called Nasik. It's pretty beautiful out here, rainy days. And it's good to look out of a window. You can't really go out, considering the COVID situation. But it's good to look out the window and see the greenery out here.

Jonan: Yeah, I bet. I love that about Oregon too. We live in an actual forest. Even in downtown areas in most cities in Oregon, there are just trees everywhere. I miss it so much when I go to bigger cities. So tell us a little bit about yourself, how it is that you ended up doing the work that you do. What is the work that you do, by the way? Why don't we start there?

Jayesh: So currently, I work at Traceable. I joined as a founding engineer almost one and a half years back. And then, I was mostly focused on an open-source project called Hypertrace, which is a distributed tracing platform. And at Traceable, we do security on top of distributed tracing, so that's our niche.

So before coming to Traceable or before coming into this tracing observability monitoring space, I was mostly doing consulting on AI/ML space and cloud, mostly on AWS, and I'm even an AWS Machine Learning Hero for that matter. So mostly working around that part. But yes, as you're working with Cloud, you are mostly caring about all these things: resources, infrastructure, and networking. And you're working on that indirectly and directly, and that happens.

But when I got into the space, it was completely new to me in general that okay, this is something cool, and we can explore this. And that's a completely new perspective, looking at things from...because most of the people who work in this specific space like APM monitoring, observability generally in this space, for that matter, they come from a very broad experience in the whole sense.

So there is one perspective you've built all years. You know that this is how things work, and this is how they should work. So when someone new comes up from the outside, you're learning things. You're exploring that okay, this is how this happens, or this is how you should look at things. You read more about the new happenings around. You understand, and then you develop your own perspective. So that's what ended up happening. And that's where I ended up adding a lot of value to, in general, what we do, I think. [laughs]

Jonan: I hope, right? Yeah, yeah. [laughs]

Jayesh: Yes, because of the new perspective and the context I brought in. And that's when I started reading about this whole thing. And when I started exploring, it was pretty interesting as a space. So observability, if you look at it, it's expanding. There are a lot of new things happening. And maybe every day, like right now, if you see the news, every day, something new is happening around here. So it's pretty interesting. That actually is good. So that's why I liked it here. And yes, I will probably stick around for a while and see how that goes. [laughter]

Jonan: You have a very interesting background coming from AI and ML into observability. And I think you make an excellent point that people who come into this space later maybe have more to add because they understand the pain of the developers that they're trying to help. You have worked in the spaces where it was necessary for you to monitor these large cloud systems and understand what was happening inside of them. It's almost a chicken and egg problem.

If you start out in observability and only ever stay in observability, then you maybe don't have the experience of knowing what it is you're trying to observe. And you can add all the instrumentation in the world, but unless you have that end goal in mind, you're not really solving the problems for your users.

So you have this background going in the ML and AI space a bit and then came over and started working at Traceable early on. And that AI experience has come in handy with the work that you all are doing now. You use AI in the work that you're doing with observability.

Jayesh: So, like right now, we don't really work heavily on observability. But again, the AI part is more in the security context as of now. We do that, but it's in a security context and not in the observability context that much. So I won't say I contributed a lot of the AI aspect at Traceable. I'm mostly affiliated with the open-source part to do with the Hypertrace project we have. Like, we have an open-source project called Hypertrace. But it's basically a distributed tracing platform, which many of our users are using. And we are building considering the observability perspective in mind. So some of the users are contributing new features, and it is evolving in that direction. So that's what my focus has been mostly.

Jonan: So you get to write a lot of open-source code during your day-to-day work.

Jayesh: Yes. I used to get to write a lot of open-source code. Recently, I moved to product management, and it has reduced down a bit. But I still go back and contribute something every now and then to make sure that I'm in touch with the whole ecosystem, but yes. And this has been interesting coming from the external world, then writing code for observability or tracing and then moving into product management for kind of tracing observability product itself. So it's an interesting journey. And there is a lot of learning around this.

Jonan: So Hypertrace, I am thinking from the name it's more about traces than other pieces of observability necessarily. And I don't want to represent observability as a concept as though it has pieces or pillars. I think it's pretty well-established at this point that observability is the ability to show up and just get the answer to the question. Do you want to talk a little bit to that? How it's not like you have four categories or however many categories you make up?

Jayesh: Yes. As we talk about observability, I will divide it into two different categories. One thing is data collection because all of the things you do with your systems need data. If you want to ask questions and get answers, you will need data. That's one thing.

The other thing is getting answers. So when you look at the traditional definition of observability, it basically is like you can throw questions at your system, and you can get answers back. And that's what it is about. So mostly...and these days it happens a lot. It happens in a lot of contexts that when people talk about observability, we talk about pillars. We talk about metrics, logs, traces, jumping between those, doing correlations, and then being able to find those answers. Being able to find those answers is a different thing; getting those answers is a different thing. The ability to find those answers is something monitoring used to do. And that's why if you talk in that context, the metrics, logs, traces, you can jump to that data and find answers. You're basically talking about glorified monitoring and not really observability.

So talking about the collection part, this whole monitoring and APM space started, I guess, 10 or 12 years back. It became a very fancy thing at that point, actually. And at that point, most of these things were driven by collection, considering there were no standard ways to collect data. There were not really mature ways to collect the data. So whoever could collect good data would win the race. Basically, if you can get available data from your customers' system and create nice dashboards around it, you're basically providing value to the customer.

That's where the monitoring part comes in: okay, I can collect metrics from your system. I can collect logs from your system. I can even do correlation for you. I can collect traces. I can do all the things. And I will show you my dashboard so you can click through and find the answers you need. Or I will send you an alert when something blinks, and you can come to the dashboards. And you can find the information by clicking through a few of the dashboards and clicking through the information. So that's how it worked for several years.

And data collection was a very important aspect until recently maybe. But then the whole democratization part started. And now we have open standards to collect most of the data. OpenTelemetry is out there. OpenTelemetry can collect metrics, logs, and traces as of now. It's maturing. It's not there as of now. But the vision is like that. It can collect metrics, logs, and traces. You will have a single way to collect all the telemetry data. And the eBPF advancement, as we were talking about briefly before this can help you collect data as well. So most of the data collection aspect is being moved to open source, open standards, and companies have to abide by these things considering everybody has to use...If a customer asks you if you support some open standards, you have to pick. So most of the companies have to abide by these things and follow the standards out there. And once you start following, once the democratization comes into play, the only thing now you can do is show the relevant information.

So now you can't really sell the collection part. So now you have to sell value, and that value comes in as observability, as we were briefly discussing: shooting the questions and getting the right answers. And when I say getting the right answers, many people like to talk about the single pane of glass for observability that can show you metrics, logs, and traces. We can give you click-through ways to jump between these data sources. But again, when you say that you can give us a way to click through these different data sources and find the answers, it's me who is doing the work. It's not some application or some software doing it for me. And that goes back to the monitoring part. That's the same thing happening again.

Jonan: It's almost like they're providing a collection of tools, but then you're still the required component to actually stitch the data together, right?

Jayesh: Yes.

Jonan: You're clicking through the various dashboards and actually doing a manual trace job that we're now discovering ways to do with software and make people's lives easier. You have a view that you log into immediately at a glance and know the health of the entire system across all of these various features in the observability space. You mentioned eBPF briefly, another good way to start collecting data.

I really like your point about the transition from a data collection-driven industry to where we are today, where it's much more about providing the value as quickly as possible and help people draw those conclusions, drawing the conclusions for people from their data, and presenting them in a convenient way. The reality is today; if you're going to start a company in observability and you want to use your proprietary protocols to communicate between your agents and your collector, then no one's going to use your product. The open-source ecosystem has grown so quickly and has so much force behind it right now. You have no choice but to adopt things like OpenTelemetry.

Jayesh: Exactly. And the choice is not a packet. You can still build things on top of it, which you feel that users might get value out of. But you still have to be compatible with all these things. Yes, I can build my own agent if I can provide more value than what OpenTelemetry provides as of now or maybe, two or three years back, OpenTracing and OpenCensus.

So I have to be compatible with the open standard. But obviously, if I'm able to provide more value, it makes sense. But if I'm doing the same thing as everybody is doing out there and still people have to use the agent which my company built, it doesn't make sense. So that's where the whole thing will evolve: okay, the data collection part is already a solved problem, then what next? So now we definitely have to focus on use cases.

And one of the interesting use cases, observability. So the data is there, how you can do better observability. If a problem happens, how you can provide insight to a user, relevant information to a user on a dashboard instead of the user building ten custom dashboards with ten different metrics. And then you are just populating traces to it or logs to it and then giving them. So showing the relevant information as per the issue that's where the intelligence is coming in, more similar to the human interaction. I ask you a question, and you give me an answer. I'm not saying that that should be done automatically. That's maybe a very big ask at this point. But at least if I'm saying that the CPU requests for this particular service has spiked up, so if I come to a dashboard, I should see some of the relevant information related to that particular event.

And that's where most of the value proposition will be driven, I guess. Because you can't really tell the user to do most of the things if you are saying that it's an observability platform, or it can do a lot of things for you. So most of the cognitive load on the user should be handled by the system itself or should be handled by the software which is providing the statistic. Because when I have to jump through these things, I have to do some processing in my mind. It's a cognitive load on the mind. So if that can be done by a system, it's the best thing there.

Jonan: And that's the goal of what we're trying to achieve here, I think as an industry. We're shifting the cognitive load away from the user. Developers and engineers, during their day-to-day work they have plenty of things to worry about. And most often, when they are going to be using our products, they're trying to discover when something is going wrong why it is going wrong and quickly. And hopefully, they already have the tooling in place to achieve that.

I think the collection side is getting to a place right now where with a few exceptions, things like eBPF, we're still discovering new ways to collect more interesting data. But with few exceptions, we have most of the data collection piece pretty dialed in right now. It's entirely about how we can make those moments of crisis or even the day-to-day work of engineers easier. I would love to see a world where we're using this as an embedded part of our workflows.

I know that for a long time, I've been installing New Relic in my personal applications. It is one of the first things I install on anything new that I make because I want to be able to know when I ship something that it's broken. I remember many edge of my seat moments early on in my career watching for that dive in traffic or the little red logo to come up because I was worried about having deployed. And now we have so many other protections in place like CI/CD are very common practices. It's a little bit of a different workflow.

But I think the future of what we have here from the visualization space, and that observability piece, getting the cognitive load, using tools like AI and machine learning, that's, I think, a pretty predictable track. But going forward over the next couple of years, I wonder if you have any idea of something that is coming that maybe not a lot of people are aware of or something you see happening in the near future in this space. So the real goal here is, of course, to let you make a guess so we can have you back and accuse you of being wrong sometime in the future. But maybe you'll be right. Maybe you'll be the lottery winner.

Jayesh: [laughs]

Jonan: Nobody can predict technology.

Jayesh: If I am right, I can be the thought leader in the next two, three years. [laughs] I predicted this. I can go to Twitter and say I predicted this on this podcast. You can quote me. [laughs]

Jonan: Exactly. And then you can get a thought leader t-shirt and go to the secret thought leader club. It's going to be great.

Jayesh: I hope it's great. So before jumping into what the future might look like, I might give a little bit of context on what's happening today. So we talked about what experience we should give to a developer by reducing cognitive load; the data analytics part is already handled and giving them relevant information when they want something fixed.

But we are talking about very complex systems, and as you mentioned, the systems are getting more complex. The points of failure are increasing between the CI/CD, Kubernetes, the cloud development, and all these things. There are too many unknown unknowns, which wasn't the case 10 years back, considering there was not most of this microservice world. For most of the services, you knew how those systems scaled because there were predictable failure modes.

Right now, when I'm dealing with 120, 130 microservices, I'm talking way less here. So, this is a very, very small scale we're talking about. So even if I deal with only 120, 130 microservices, it's very hard to predict what's going to be. And you can't really build dashboards for that. You can't really say if this thing failed once, I should add it to my dashboard, considering it might fail again. So at this point, this becomes your known failure. But there can be thousands of ways for the system to fail.

So one of the things on this front I'm looking forward to…and this might happen or might not happen, but currently, I have to create an alert for everything. At least I have to set basic alerting that if this CPU thing spiked, send me an alert. If this latency goes beyond my SLA or SLO, send me an alert. But if these things can be automated, if a system can retain a state somewhere and say, "This is the working state, and if this fails at one point, maybe we should send an alert to a user."
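The basic, hand-written alerting described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how Traceable or New Relic implement alerting; the `AlertRule` class, the metric names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """One manually configured alert: fire when `metric` exceeds `threshold`."""
    name: str
    metric: str       # hypothetical metric name
    threshold: float  # hypothetical limit, e.g. taken from an SLO


def evaluate(rules, sample):
    """Return the names of rules whose metric in `sample` exceeds its threshold.

    Metrics missing from the sample are treated as 0.0, so they never fire.
    """
    return [r.name for r in rules if sample.get(r.metric, 0.0) > r.threshold]


# Every condition has to be written down by a person ahead of time.
RULES = [
    AlertRule("cpu-spike", "cpu_percent", 90.0),
    AlertRule("latency-slo", "p99_latency_ms", 500.0),
]

fired = evaluate(RULES, {"cpu_percent": 97.0, "p99_latency_ms": 120.0})
print(fired)  # ['cpu-spike']
```

The limitation is visible in `RULES`: anything nobody thought to list there never produces an alert, which is the gap automated alerting would aim to close.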

So alerting is a very important component when it comes to monitoring observability and all those things in general because that's where a user will start debugging. That's how a user will know something failed. That's how a user will know something happened to a system because I'm not sitting with a coffee every day in the morning and looking at my dashboard. I won't do that, and nobody wants to do that.

So alerting is a very important part, and there can be some kind of automated alerting mechanisms where people don't even have...they can set a basic alert, but then the system takes care of some of those components like okay, that user forgot to set an alert on this thing, but it might be relevant. It might be important as per their experience. Maybe the ML can chip in here. But I don't really like to talk about AI/ML perspectives in this particular domain, considering there's data. Obviously, if there's data there, you can do a lot of machine learning around it, but it's systems you're dealing with. And for many customers, it's directly correlated to revenue. So if one prediction fails, you might be losing some revenue there. It's still a very traditional thing when it comes to the whole space itself. But yes, some kind of intelligence, some kind of automation which can reduce the burden on the developers and DevOps folks will always help.

And some more things on the use cases perspective, as we are talking about democratization of data, the data collection aspect, as data is there, you can build different use cases from data. And one of the interesting use cases we are working on at Traceable is security. So we build API security on the same kind of data. And we are helping our customers to find while navigating the application, detect attacks, solve them, mitigate them. So that is one interesting use case.

And one thing we found out was that if you're able to detect, if you're able to discover, if you're able to observe your application, if you're able to find the APIs, the services, the entities, you can do lots of things around it. Finding the information is the important part. That's already done. That part we have handled. So on top of it, you can build different use cases. Security is one product.

And what users are really looking for, even in general, is not just showing them what went wrong but giving them actionable insights. Okay, this went wrong; it's fine. But now, what can I do about it? And that's why even for security, people don't just want an alert if something goes wrong. They want actionable insight for every single thing. And that's where the evolution might happen. Analytics is one of the interesting use cases out there. And there are many people building interesting things around it.

And that's where I'm looking forward to what happens because it's such an interesting space. As we mentioned several times, the data is there. So what we built on top of data, the opportunities there are endless. So you can build your next interesting use case with that data. And it will definitely help someone out there who is looking for that part of it. So it's an interesting space, and a lot of interesting things will happen in the future, I guess.

Jonan: I think you touched on a couple of really interesting points there around the anomaly detection piece that's happening right now. Like many of the major providers, New Relic included, we have automated anomaly detection where you learn over time what the system is expected to be doing, and then you see something go well out of that band. And then you alert automatically to let people know. And I think you came into it a little bit with the conversation around security about those actionable next steps.
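The idea of learning an expected band and alerting when a value goes well outside it can be illustrated with a very small Python sketch. It assumes a simple mean and standard deviation baseline; the function name `is_anomalous`, the multiplier `k`, and the sample numbers are hypothetical, and this is not a description of how New Relic or any other provider actually implements anomaly detection.

```python
from statistics import mean, stdev


def is_anomalous(history, value, k=3.0):
    """Flag `value` if it falls outside mean +/- k standard deviations of `history`.

    `history` holds recent observations of one metric (the learned baseline).
    With fewer than two observations there is no band yet, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is outside the band.
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical requests-per-second readings forming the baseline.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]

print(is_anomalous(baseline, 101))  # False: inside the expected band
print(is_anomalous(baseline, 180))  # True: well outside the band
```

No one had to write a rule for this specific metric ahead of time; the band comes from the data itself, which is what distinguishes it from a fixed threshold.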

I'm looking forward to a world where I get in there, and I am presented with a situation like this region of AWS is experiencing network latency issues. Do you want to spin up all of your pods in these clusters over here and then let those ones perform this work instead? It's almost like Clippy, the Clippy popping up in my Word document being like, I see you're trying to type a title. Do you want to make it big? Yeah, I want to just have an easy menu, get in there, solve the problem. Maybe someday we get to a place where these things can actually resolve some number of these issues on their own. You tie your observability solution into your actual Kubernetes cluster itself, and it's able to react. You have this reactive kind of system across the way. It's a very exciting future for observability.

So we've done our prediction piece. I think this was a fair number of predictions. And the last question that I like to ask everyone on the show is what advice you might give to a past version of Jayesh. Or to someone coming up today who would like to be in your shoes, who aspires to someday work in the space where you work or maybe have approximately the same career path; what advice would you have for them or yourself in the past?

Jayesh: Sure. Yes. So one advice I like to give is read.

Jonan: To read? You said to read?

Jayesh: Yes, so to read, anything. So if you are jumping into anything new, not just observability, not just monitoring, any space for that matter, start reading, start reading around the space. Start reading books, start reading maybe good articles around it. Not all the blog posts because I defer to blog posts a bit. If you are new to the space and if you don't understand a lot of things, defer reading blog posts considering blog posts mainly come from one perspective. And you might develop that perspective from reading one type of content. So that's not the first thing to start with but instead, starting with the book, starting with the standard material out there, starting with playing with the parts in the space that helps. You actually learn how people are doing things. And you actually learn what problems people are solving, what problems you might want to solve one day, what problems are still there, what pain points are still there, what gaps are still there in the whole system.

Talk to people. Talking to people is one thing I prefer a lot. Even this week, I talked to six or eight folks around this space like what they are doing and even developers and DevOps folks, what different pain points they have, and what different gaps they're still filling there, observability tracing or maybe monitoring pipelines and what issues they think the systems can solve or these modern solutions can solve. So this is an interesting thing to learn about that...maybe two things, talk to people, read standard material. Read books, and that way, you will learn more than...and that way you will build your own perspective, that way, you will build...some people call it first principles. You'll build your own perspective. You'll build your own context.

And you might be able to give better insights than most of the people already in the industry because you came here, and you learned. And then now you know that okay, you have your own ideas. Always build your own ideas and see, like, if I'm using this, do I want to use it this way? Is there a better way I can use this system? Is there a better way this software can be utilized? And that's how you start developing your own ideas. That's how you start developing your own perspective around that, and that always helps. And I like reading a lot. So that's always the first advice for me for anything, anything in general.

Jonan: It's really good advice. It is. And I think you're absolutely right about going back and finding objective material and maybe avoiding things like blog posts and opinion pieces early on. It's very easy to then suddenly find yourself regurgitating the opinions that you read online instead of understanding why those opinions exist and maybe which ones are incorrect.

Jayesh: Yes. Once you mature enough in the ecosystem, you'll know if these things matter or not, or if it matters, to what extent. People throw opinions at every single take. And even if you go on Twitter, you will find ten different things about the same exact point. And maybe some of them make sense, and some of them don't. So you have to choose your own path there. And to choose your own path to see what makes sense, you have to know what you're looking for. And to know what you're looking for, again, you have to read content which is not biased or which isn't opinionated. It's more of standard literature, which you can process and then build your concept. And then go out there and see, okay, this doesn't match with what the book says. It doesn't make sense. [laughs]

Jonan: Yeah, absolutely, all excellent things to remember: read as much as you can, focus on those first principles, and your career will grow and thrive. It has been a pleasure having you on the show, Jayesh. I hope that we get a chance to do it again soon. Are you going to come back and visit us in a year so we can talk about your prediction?

Jayesh: Yes, so that's the plan. I was supposed to visit, but then again, the whole COVID thing is still going on. I'm still not going somewhere, I guess. So let's see when it stops. I'm fully vaccinated, but the U.S. is not taking me as of now. [laughter]

Jonan: I'm pretty sure this whole pandemic thing is going to wrap up. In the next couple of weeks, we'll be done with this. It's going to be a while, I think. I think we're stuck for a bit.

Jayesh: We have been talking about this since last year. We said, "This will get done next year and maybe next few months." And this just keeps happening. At this point, I'm not really sure how things will turn out. But let's be very grateful about the places we are in. We still have jobs. We still have better lives than most of the people out there. We are still safe. So maybe we can just be grateful for that and move on with things. We can't do anything about it. It's not in our hands at this point. Whatever we could have done, we have done. [laughs]

Jonan: That's a beautiful perspective to have. We have a lot to be grateful for working in tech. Well, thank you so much for coming on the show. I hope to see you someday in person as soon as that is viable. And until then, we'll keep on having you on our podcast. Thank you so much, Jayesh. I hope you have a wonderful day.

Jayesh: I'm looking forward to it. Have a great day, and thanks for having me. We had a very interesting conversation, and hopefully, we can do it in person soon.

Jonan: Yes. Always a pleasure. Take care.

Thank you so much for joining us. We really appreciate it. You can find the show notes for this episode along with all of the rest of The Relicans podcasts on therelicans.com. In fact, most anything The Relicans get up to online will be on that site. We'll see you next week. Take care.

Mandy Moore