Uncertain Types part 2 (featuring Barak Michener and Eric Schles)
Marianne
Posted on March 31, 2021
Summary
Still struggling to understand how to implement uncertain types, Marianne calls on two friends to sit down with her and brainstorm different approaches. It looks more and more like adding uncertain will cause the language to scale to unpractical levels of computational complexity… then suddenly Marianne has a stroke of inspiration that changes everything.
Links
Uncertain: A First-Order Type for Uncertain Data
Barak’s Draft Implementation
Official C# reference implementation
PGMpy
Bonus Content
Become a patron to support this project.
Transcript
MB: Welcome to Part 2 of my episode on Uncertain types. The goal of implementing uncertain types is to build models that estimate system fragility … in other words, not just whether a constraint can be violated as with formal verification, but an estimate of how likely that violation actually is. My thinking is, fragile systems will have lots of paths that end up violating a constraints, small changes in one part of the system will compound easily into undesirable states.
MB: Resilient systems, on the other hand, will have few scenarios that violate constraints and those inputs will be unlikely.
MB: If I’ve lost you already, I would recommend going back to some old episodes. Specifically episode 02, episode 07, and part 1 of uncertain types last week’s episode.
MB: In that episode I did a deep dive on probabilistic programming, we heightened my anxiety around creating these kind of models with the approach laid out in the Uncertain paper from Microsoft Research. Using probability distributions feels like the scalpel— no, change that. It feels like a robot laser. Not a tool for every problem. The approach may only work for a subset of use cases that are not relevant.
MB: And even if I understand the math well enough to use the techniques effectively, my language will have to help the user understand the appropriate way to build models. A tool that helps people come to false conclusions is not a helpful tool at all.
MB: So I needed to call in reinforcements. Luckily I have friends who were willing to give their time to brain storm solutions with me.
MB: Do you two actually know each other? I know kind of like both in the near tech scene for a while, OK. cool
BM: Yeah. For a minute there I was. I'm living in New York from 2013 to 2016.
ES: So, OK, I believe the time frame I was a social person then. I don't know. I'm Eric.
BM: Yeah. And I'm Barak. So.
ES: Barak … OH really?
MB: I swear to God I did not realize their names rhymed until that moment. Now there are three things we are going through as we talk here, all of which are linked in the show notes on the Dev Community.
MB: First is the Uncertain Types paper. Then there’s an example implementation that Barak started working on when I told him what I was trying to do. Lastly there’s an example implementation in C# from the authors of the paper. This may be a little tricky to follow along to, but I’ll do my best to fill you in on the missing details.
BM: It's all coming back to me… ummm was that…. There's no magic here, though, I just. OK, it’s— ah, sorry, I'm going to make noises as I remember what I was up to. So from the reference implementation, the way the reference implementation works is that they have their basic unit of something is the type in question and a weight. I think and so everything feeds forward and the way they're doing it is they're using a LINX in C# to. Essentially read the queries and the for loops and stuff that I'm going to do manually anyway and.
BM: Generally speaking, the idea is that if you have to and this is often the best part of the paper, there's one part of the paper….. I remember being really, really helpful here. Let me see if I can do my thing here at. Uh. Yeah, OK, here's where I left off last night on the other stream…
BM: I can still hear you won't be able to see it because KVM which on and I think I still have that window with all the stuff in it open.
ES: Ahh, it's a little fuzzy.
BM: Sure, let's see if I can see where I left myself off, if I let myself off anywhere.
MB: Oh, my God, I thought I had a tab problem. This is insane.
BM: Yeah, I think I've already cleaned up since January it’s been bit… Anyway. Let's see there…. Pull up the paper.
ES: It's in the invite.
BM: Here's the reference. Implementation is at least in my history. Yeah. And, uh, let's see, um.
ES: It's in the calendar if you don't remember it,
BM: Ah! there it is, Microsoft Research. The most useful bit of this entire paper for me was this little thing right here.
MB: What he’s referring to here is a figure called UncertainType operators and methods. Which groups operators into Arithmetic operators (plus/minus), equality operators (less than, equal), logical operators (and, or, not), sampling methods and hypothesis testing. Then the figure specifies what types come in as input and what types come out as values.
MB: In other words: if you write uncertain typed variable A is equal to uncertain variable B what’s return is not a True or False, but a distribution of Trues and Falses …. right?
BM: Idea being that there are. Two or three different types we're actually talking about here.
ES: Yeah,
BM: One is this is the general and uncertain type of type T. In general type T ends up being mostly a float, right?
ES: Yeah. Basically,
BM: Bernoulli is a uncertain type only where this is a bool.
ES: Right.
BM: And again, this is sort of a float, if you representable is like a 50 50 thing or a weighted thing. But let's go with the bool for the moment and then a sampling distribution is the same thing, except for there are no. It's literally just a materialized array of. Values,
ES: Right
BM: Where these are populations, this is this population statistics like median mean and standard deviation right here. This is actually just a real list of values, whatever they may be.
ES: Yeah, right.
BM: Cool. And this all works very well as you move as you kind of move things forward or if you run through all these operators and even the sampling methods outlined here. Because you can sample things and you can combine different things, and it's an optimization to say that I can combine the means and standard deviations if there's a meaningful way to do so.
ES: Right.
BM: If there's an equal quality, obviously says yes, either boolean output. So that's why this is an output. Same with logical things. If there's no good way of doing so, we just sample the heck out of it. And you could then turn that back into a thing. Sort of. Kind of. But then you're losing something. And I didn't go that far.
MB: So that that was actually one of my core questions with like implementing this for my purposes. Right? Is that my feeling for reading the papers that the bulk of the what they were talking about was assuming that you're dealing with independent variables. And because I'm constructing a model, they're all dependent variables right? So they’re sort of talking about like this is how you do it, dependent variables. And they've lost me a little bit on the math there. But I was actually wondering, like, is the best way forward to figure out how to represent these probability distributions and do basic operations on them? Or is the best way forward to actually have the distribution and then when the model is running, draw like a certain sample of likely values at a certain example of unlikely values and then run the model a bunch of times with those values to get to the the same conclusion that I want, which is like how much of an impact is an outlier value have in the performance of the model?
BM: ……Let's take that sentence, that's a very good sentence, I think that will cover like the entire hour and more. So let's try and take that down piece by piece.
BM: So independent and dependent variables. And that's the conditional probabilities. I also had trouble with because I wanted to say was what I was trying to figure out that I kind of only got part of the way into before I needed to really refactor. Everything was the whole trace of how I got here. That's how I was implementing it. That may not be the right way to do it, but that's the way I was thinking about it.
MB: OK.
BM: Which is I could instead of imagine, these are like all things that are flowing through each other, right? Like I have an AND operation and then an OR or whatever. And there's all they’re independent in the sense that I don't have any memory on them. But if I add memory to them, so I say, OK, I initially sampled this from here and it had this value here. And because it had this value here, it had this value here and then it had this value here after I did this to it. Right? Mm hmm. And then I have an output sample where the sample give me its entire history of how it got there and all the variables that it went through to get there. Yeah. Then I can write a function on saying, OK, well, look at your history and tell me based on your dependent, like based on the thing that I'm depending on. Whether I wanna do this or not. So this is. And this is where they were using LINX to effect in the C# implementation is essentially this is a where clause on the history. It's a thing where …..let’s see, because I know all the history and everything, and I'm doing something like. And probability of X given Y and Y is somewhere in the chain of X, I can then say, OK, I'm going through all the X is find the cases where that Y was true because it's in the history of X and then that's my sample. I reject anything else.
ES: Have you ever seen a probabilistic graphical model implemented?
MB: I’m just going to say no, because I know I have never seen that
BM: I haven't.
ES: Ok, so PGMs literally, that's the goal of PGMs, right? You start with—I’ve implemented PGMs before. They're really,,, you know, they're kind of trivial. So in the paper, they talk about Bayesian networks. Right? Like, Uhhh… I got to pull it up because. I know. I mean, so like somehow this paper was very confusing to me, honestly, because they like mentioned Bayesian networks a bunch of times and they’re just kind of like ….. “look at all these machine learning things, look at all these machine learning things.” They never really I don't know. It was just very confusing to me. It's like you're not giving formal definitions of anything and you just keep mentioning the benefits. And it's like if you repeat it enough times— see we introduce a Bayesian network, semantics and like, they use this a lot, but they never really define a Bayesian network beyond, like, some trivialized thing. Right. So, like, this is what they're saying is a Bayesian network, but ….not really.
BM: Right.
ES: Yes. You know, this is this is like their notion of addition. Right. And so if a… if these are not independent, like if you have one Gaussian and it's conditional on another one, then like you would just you can just implement a PGM. So, like, there's you know, I worked on PGMpy back in the day. So like this, I mean they change the reference API so much since I contributed to it. But I think it's still I think it's still in here. The thing I wrote but I would look at this like this is this is like a good place to go. If you want to understand how the conditional probabilities are going to, you know, connect. Yeah, the PGMpy’s implementation is is is perfectly adequate, I think, for your for your needs, for conditional probabilities. And then it looks like, Barak, it looks like you've got everything else. Just chain through that. Then you're set. I'm going to try to reason you through it, but I'm going to fail. Probably just …..crap, I haven't looked this reference implementation in a long time. It might be too hard for me to walk you through what they have. Yeah, OK. They've really clean this up. When I did it, it was garbage.
MB: WellI, I will point out that the wonderful thing about GitHub is that you can go to history and find the garbage that you checked in that you might be more comfortable walking us through.
ES: Ohhhh no one wants to look at my garbage code, you don’t want to look at that.
MB: Just a thought!
ES: I know but we don’t want to look at that.
ES: But so, like, I have a question and this is important. Are you trying to do a truly Bayesian, you know, regression model or do you want to trivially just sort of like take a maximum likelihood estimation?
MB: I don't know, because I am not entirely sure what I'm doing. So, OK, this this is kind of like basic, completely pointless model that I sketch out. Right? And this is what I'm trying to represent in code this idea that you have certain stocks that have certain units of value in them and then you have the flows that are basically rate of change going from one stock to the other. All this is great, super easy, love doing it. But the reality of it is that a lot of times the stuff I'm trying to model, like, I don't know, an exact value, that's not a realistic component of the model, but I know what a likely value is. Right? I know what an unlikely value is. So I thought it would be interesting to sort of inject a little bit of probabilistic programming into it and allow people to express models by taking values and saying, like, look, I don't actually know what this should be, doesn't have a default value, but I have a probability distribution of what the potential values of this are. And give me some— I think essentially the big thing that I'm curious about is can…. A model run with probabilities kind of give you an idea of how stable the model is, right? So that question of like how far away from the mean. …can you get before you start, the rest of the model starts to get into an unsafe state, and how likely is that?
BM: Another version of that question is how far away does it get from the initial conditions? I'm sorry.
ES: Yeah, that's for sure. But, um, I mean, it sounds like you want to Bayesian like you just want you just want you just. Yeah. You just want a Bayesian model. So the right thing to do in that case is you define your distributions, right. You have your distribution types and then you just have a sampler that samples from your prior distribution, which is what you would encode and gives you a posterior, and then that is your area boundary around your model output.
MB: Yeah, so that was sort of the question I was asking in the beginning, because, like this this model will run for, let's say, five steps, which means all of these rates will execute five times to all of these units will change depending on the rates. Right? And so I was looking at this in terms of like, well, we can keep if we say that this this big stock right here is like a distribution. Right. We could keep that as a distribution and like, I guess alter the distribution, which each iteration of the model or we could say shoot off some potential values, given the shape of that distribution and run that model with those potential values and say, OK, this is what this model looks like with the like highly likely values, and this is what it looks like with the outlier values. And this is the difference between them. I'm not sure which one is the more sensible way of going. I know which one is—
ES: They’re both sensible. It just depends what you need. Right. So, like, OK, so, you know, frequent statistics, right? We just we just literally take the maximum likelihood of our distribution. That's what we do anything. And that gets us most of the way. And then you put confidence intervals against some things sometimes. And that's like how you measure uncertainty in your model. Right. And that's it. Like and then the Bayesian approach is you start with the distribution, you sample against it, and then you get a posterior and your posterior. If it's the right, you know, prior distribution, like going into your thing, you as a as a matter of course, for your optimization strategy, you end up with a like a posterior that like is small and gives you reasonable our error bounce. And then it's just obvious what your like prediction is. So both of those things are reasonable. It's a question of what you want and it's a question of like what makes sense for your setup.
MB: I think….. So what I what I want to answer that in the easy non-math way is the ability to find situations where you have one part of the way your models configured that is really, really volatile. Right. Like you're hanging by a thread here that value wobbles just a little bit, like the entire model's behavior shifts pretty dramatically.
ES: Are we talking a statistical model? I just want to make sure on the same page.
MB: Not a statistical model, they're called compartmentalized models, which is not useful. It's like it's just a system model.
ES: Ok, all right, I'm going to say I'm with you now. Thank you. So you've got some… OK, so you're talking about modeling physical systems with probabilities.
MB: Exactly.
ES: I mean, I think you should just model it as a PGM.
MB: OK
ES: Really what you want. I think you are literally just doing a prediction on that. Given all this conditional probabilities will literally return to you, the state your overall system risk, because it'll like it'll predict it like, you know. So, yeah, because you just said, like failure, like your failure condition to be like true. And you're like success conditioned to be false or vice versa, depending on how you feel that day. And then you just literally do a prediction. I mean, you could use a neural network, you could use something else, but then you don't have it's not obvious how your system is going to operate, you know. So, like, I see why you looked at this paper because you wanted an uncertain type. But like my my honest recommendation, you need to go a step further. You really should just be thinking about this in a model context.
MB: Yeah. I mean, I think I generally follow what you're saying. I'm just trying to, like, process this from a parsing standpoint. All of this information will be in the model, but it will be determined by the person writing the model and then like getting the compiler to then essentially encode that into how the model runs. I, I feel like I have to like have a montage of me in a library somewhere, like digging.
ES: I shouldn't be. This shouldn't be this shouldn't be that hard. Right. You just have an implicit you just have an implicit statistical model embedded in your compiler that like does the prediction. So really you just need a reference implementation. There's not it's not more sophisticated than that if you pull any reference limitation. And I'm not claiming that, like the PGM model is definitely the right one. I'm claiming I think it is. I you know, obviously you need to consider this further, but some statistical model should just be embedded in your compiler. And like, literally, you're just running a model and that's it. Like you're running a statistical model over this space and the engineer encodes the probability distributions. And then either you return a Bayesian thing or you return to frequentency things, either returning, you know, this is the probability of success and this is the probability of failure or they're returning a distribution. It's like you're the the cases where you fail here, the cases where you succeeded.
ES: It's that simple.
BM: There's maybe actually in this case a little bit with the the paper itself is what you and this is just thinking. If I were writing in your language and trying to build the building model with these things, what I would want is a is a failure predicate. I want to be able to say—
MB: Right.
BM: This is what a failure looks like. It's when, say, this box here on the bottom right is totally empty or something like that.
MB: Right. Right.
BM: And what we're doing then is this of simple case is like, OK, everyone has their initial conditions and we just run it forward with integers and everything is fine and we can tell if it hits that point. But then you're asking is OK, if I have a distribution of these things and if I do, I ever hit that?
MB: Yeah,
BM: and that's similar to a PGM I think where I'm tying it together now is then. You can sample the heck out of it and some of those will fail and some of them will fail with some amount of probability and you can tell the user then what failed? And then also this goes a little bit to my history thing, but not entirely. But it's kind of it's all about conditional probabilities where you can say, OK, for the failures. What I saw was here were the other states I started with.
MB: Yeah, yeah. OK, so it's sort of like in the same way that first order logic and a SMT solver like is does not actually run the model. It just defines a huge number of boolean rules and then like deduces all of the possible states and tells you if you have something that violates your constraint, like the the probability-- the statistical model behind the scenes isn't actually running any of the model. It is simply like calculating the statistical probabilities and identifying places where the constraints are violated. And other conditions are present
ES: A way you can think about it as it's running simulations. Yeah, that's the trivial case. It's just it's like literally it's actually trying things like from a sampler and it's just literally saying, OK, so here are my parameters on my distribution. And then like, I'm going to just run a bunch of cases and then we're going to see what kind of probabilities I get to see what results I get back. That's Bayesian samplers work. That's how Bayesian statistics work. So, you know, that's all you're really doing. So you just like try to model a bunch of times
BM: Yeah so you run your code forward. You've already got it working for the static case of a single integer. So just run it with a bunch of different integers.
MB: Ok, yeah. So that is what I started thinking around the second option of just pulling a sample and running the simulation a bunch of times. That's essentially what we're talking about. I just didn't know any of the fancy words for it. Damn it.
ES: I mean, academics need to get paid somehow, right? So I'm teasing all these terminologies are actually super important, but like, yeah, I can get really confusing. There's just a lot to think about.
MB: So now my question about this approach is, since it's based on sampling, does this get more and more complex if we add more and more uncertainty to the model?
ES: Well, I don't think so. I mean. I don't think so. I don't think so.
BM: I mean, as the model gets bigger, you'll need more samples.
ES: Yeah, yeah. I mean, it's expensive computationally, right? Yeah.
BM: I mean. For a lot of these things they had, that computational cost is a part of the business of doing models and B…
MB: Yeah, I kind of want to stop short of the this language can only be run on GPUs that you pay like millions of dollars for.
BM: I don't think you're going to run into that soon.
MB: It's true. This is true.
MB: But you know I wasn’t satisfied with this answer. I’m okay with a little resource intensive, most formal verification is a little resource intensive, but I want Fault models to be runnable on normal machines. I want models too big for a single laptop to be really truly outliers.
MB: So I kept thinking about how to break the model up into smaller runnable parts…. like could we structure models into modules that in effect act as black boxes to one another? Each flow is a function where … maybe… we calculate the range of possible return values given whatever uncertain values are in scope—HOLY SHIT.
MB: Holy shit! Guys wait… Holy shit I think I’ve got it. I can take the flow’s function and represent it as SMT. A logic puzzle like normal formal verification. But if there’s an uncertain value, the compiler will generate SMT that solves for that value, negating the assertions so that we’re looking only for scenarios that result in an undesirable state.
MB: Then we can take the probability distribution and the solution presented by the SMT solver and see what the odds of our uncertain type actually having that value.
MB: I think that will work … more to the point: I know exactly how to implement that!
Posted on March 31, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.