I Vouch For MIRI

Another take with more links: AI: A Reason to Worry, A Reason to Donate

I have made a $10,000 donation to the Machine Intelligence Research Institute (MIRI) as part of their winter fundraiser. This is the best organization I know of to donate money to, by a wide margin, and I encourage others to also donate. This belief comes from a combination of public information, private information and my own analysis. This post will share some of my private information and analysis to help others make the best decisions.

I consider AI Safety the most important, urgent and under-funded cause. If your private information and analysis says another AI Safety organization is a better place to give, give there. I believe many AI Safety organizations do good work. If you have the talent and skills, and can get involved directly, or get others who have the talent and skills involved directly, that’s even better than donating money.

If you do not know about AI Safety and unfriendly artificial general intelligence, I encourage you to read about them. If you’re up for a book, read this one.

If you decide you care about other causes more, donate to those causes instead, in the way your analysis says is most effective. Think for yourself, do and share your own analysis, and contribute as directly as possible.


I am very confident in the following facts about artificial general intelligence. None of my conclusions in this section require my private information.

Humanity is likely to develop artificial general intelligence (AGI) vastly smarter and more powerful than humans. We are unlikely to know that far in advance when this is about to happen. There is wide disagreement and uncertainty on how long this will take, but certainly there is substantial chance this happens within our lifetimes.

Whatever your previous beliefs, the events of the last year, including AlphaGo Zero, should convince you that AGI is more likely to happen, and more likely to happen soon.

If we do build an AGI, its actions will determine what is done with the universe.

If the first such AGI we build turns out to be an unfriendly AI that is optimizing for something other than humans and human values, all value in the universe will be destroyed. We are made of atoms that could be used for something else.

If the first such AGI we build turns out to care about humans and human values, the universe will be a place of value many orders of magnitude greater than it is now.

Almost all AGIs that could be constructed care about something other than humans and human values, and would create a universe with zero value. Mindspace is deep and wide, and almost all of it does not care about us.

The default outcome, if we do not work hard and carefully now on AGI safety, is for AGI to wipe out all value in the universe.

AI Safety is a hard problem on many levels. Solving it is much harder than it looks even with the best of intentions, and circumstances are likely to conspire to give those involved very bad personal incentives. Without security mindset, value alignment and tons of advance work, chances of success are very low.

We are currently spending ludicrously little time, attention and money on this problem.

For space reasons I am not further justifying these claims here. Jacob’s post has more links.


In these next two sections I will share what I can of my own private information and analysis.

I know many principles at MIRI, including senior research fellow Eliezer Yudkowsky and executive director Nate Soares. They are brilliant, and are as dedicated as one can be to the cause of AI Safety and ensuring a good future for the universe. I trust them, based on personal experience with them, to do what they believe is best to achieve these goals.

I believe they have already done much exceptional and valuable work. I have also read many of their recent papers and found them excellent.

MIRI has been invaluable in laying the groundwork for this field. This is true both on the level of the field existing at all, and also on the level of thinking in ways that might actually work.

Even today, most who talk about AI Safety suggest strategies that have essentially no chance of success, but at least they are talking about it at all. MIRI is a large part of why they’re talking at all. I believe that something as simple as these DeepMind AI Safety test environments is good, helping researchers understand there is a problem much more deadly than algorithmic discrimination. The risk is that researchers will realize a problem exists, then think ‘I’ve solved these problems, so I’ve done the AI Safety thing’ when we need the actual thing the most.

From the beginning, MIRI understood the AI Safety problem is hard, requiring difficult high-precision thinking, and long term development of new ideas and tools. MIRI continues to fight to turn concern about ‘AI Safety’ into concern about AI Safety.

AI Safety is so hard to understand that Eliezer Yudkowsky decided he needed to teach the world the art of rationality so we could then understand AI Safety. He did exactly that, which is why this blog exists.

MIRI is developing techniques to make AGIs we can understand and predict and prove things about. MIRI seeks to understand how agents can and should think. If AGI comes from such models, this is a huge boost to our chances of success. MIRI is also working on techniques to make machine learning based agents safer, in case that path leads to AGI first. Both tasks are valuable, but I am especially excited by MIRI’s work on logic.


Eliezer’s model was that if we teach people to think, then they can think about AI.

What I’ve come to realize is that when we try to think about AI, we also learn how to think in general.

The paper that convinced OpenPhil to increase its grant to MIRI was about Logical Induction. That paper was impressive and worth understanding, but even more impressive and valuable in my eyes is MIRI’s work on Functional Decision Theory. This is vital to creating an AGI that makes decisions, and has been invaluable to me as a human making decisions. It gave me a much better way to understand, work with and explain how to think about making decisions.

Our society believes in and praises Causal Decision Theory, dismissing other considerations as irrational. This has been a disaster on a level hard to comprehend. It destroys the foundations of civilization. If we could spread practical, human use of Functional Decision Theory, and debate on that basis, we could get out of much of our current mess. Thanks to MIRI, we have a strong formal statement of Functional Decision Theory.
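The contrast is easy to see in a toy version of Newcomb’s problem, the standard test case. Here is a minimal sketch; the 99% predictor accuracy and the dollar amounts are illustrative assumptions of mine, not anything from MIRI’s formalism:

```python
# Toy Newcomb's problem: a highly accurate predictor fills an opaque box
# with $1,000,000 only if it predicts you will take just that box; a
# transparent box always holds $1,000.

ACCURACY = 0.99  # assumed predictor reliability, for illustration only

def expected_value_fdt(action):
    """FDT: your decision procedure determines the prediction too."""
    if action == "one-box":
        return ACCURACY * 1_000_000  # predictor almost surely saw this coming
    return ACCURACY * 1_000 + (1 - ACCURACY) * 1_001_000

def expected_value_cdt(action, p_box_full):
    """CDT: the box's contents are treated as fixed, whatever we do now."""
    base = p_box_full * 1_000_000
    return base + (1_000 if action == "two-box" else 0)

# FDT one-boxes: roughly $990,000 on average beats roughly $11,000.
assert expected_value_fdt("one-box") > expected_value_fdt("two-box")

# CDT two-boxes for every fixed belief about the box, so it is predicted
# to two-box and ends up with roughly $1,000.
for p in (0.0, 0.5, 1.0):
    assert expected_value_cdt("two-box", p) > expected_value_cdt("one-box", p)
```

CDT’s dominance reasoning makes it two-box whatever it believes, so the predictor foresees that and leaves the opaque box empty; the agent whose decision procedure one-boxes walks away far richer, which is the sense in which FDT agents win where CDT agents predictably lose.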

Whenever I think about AI or AI Safety, read AI papers or try to design AI systems, I learn how to think as a human. As a side effect of MIRI’s work, my thinking, and especially my ability to formalize, explain and share my thinking, has been greatly advanced. Their work even this year has been a great help.

MIRI does basic research into how to think. We should expect such research to continue to pay large and unexpected dividends, even ignoring its impact on AI Safety.


I believe it is always important to use strategies that are cooperative and information creating, rather than defecting and information destroying, and that preserve good incentives for all involved. If we’re not using a decision algorithm that cares more about such considerations than maximizing revenue raised, even when raising for a cause as good as ‘not destroying all value in the universe,’ it will not end well.

This means that I need to do three things. I need to share my information, as best I can. I need to include my own biases, so others can decide whether and how much to adjust for them. And I need to avoid using strategies that would distort or mislead.

I have not been able to share all my information above, due to a combination of space, complexity and confidentiality considerations. I have done what I can. Beyond that, I will simply say that what remaining private information I have on net points in the direction of MIRI being a better place to donate money.

My own biases here are clear. The majority of my friends come from the rationality community, which would not exist except for Eliezer Yudkowsky. I met my wife Laura at a community meetup. I know several MIRI members personally, consider them friends, and even ran a strategy meeting for them several years back at their request. It would not be surprising if such considerations influenced my judgment somewhat. Such concerns go hand in hand with being in a position to do extensive analysis and acquire private information. This is all the more reason to do your own thinking and analysis of these issues.

To avoid distortions, I am giving the money directly, without qualifications or gimmicks or matching funds. My hope is that this will be a costly signal that I have thought long and hard about such questions, and reached the conclusion that MIRI is an excellent place to donate money. OpenPhil has a principle that they will not fund more than half of any organization’s budget. I think this is an excellent principle. There is more than enough money in the effective altruist community to fully fund MIRI and other such worthy causes, but these funds represent a great temptation. They risk causing great distortions, and tying up action with political considerations, despite everyone’s best intentions.

As small givers (at least, relative to some) our biggest value lies not in the use of the money itself, but in the information value of the costly signal our donations give and in the virtues we cultivate in ourselves by giving. I believe MIRI can efficiently utilize far more money than it currently has, but more than that this is me saying that I know them, I know their work, and I believe in and trust them. I vouch for MIRI.







13 Responses to I Vouch For MIRI

  1. jake says:

    I enjoy your blog and am considering donating based on your (and SSC’s) endorsement. When I looked into this in the past and read Superintelligence, the arguments didn’t make sense to me, and when I asked about it online (for example, why one couldn’t just program the AI to do what I mean, not what I literally say, and to ask if unsure about any consequences (the kinds of questions I assume many people have thought about)), I received the answer that I need to understand programming better (which may be true but isn’t directly helpful).
    The argument that it helps people to think better might push me over the edge to become a supporter, so I was wondering if you could elaborate on it. Could you give some examples of how thinking about AI has led you to better thinking? I would have thought that in general our thought is too informal for it to be helped by these formal models.
    Also, how is Causal Decision Theory destroying the foundations of civilization if people don’t use formal decision theory in making decisions?

    • TheZvi says:

      “Learn more programming” is indeed quite unhelpful. I do assure you that DWIM is *not* a good idea or even a thing, really, although double checking strange requests or requests that seem to have large consequences is obviously a good idea as far as it goes. A lot of what MIRI does is working on the question of what exactly you *do* mean and want, and how one might define that or figure that out, which is a hard problem for an AI… and indeed a hard problem for a human. Learning to figure out what you really think, and what outcomes you actually prefer, and trying to formalize the answers to such questions, is a good example of helpful thinking. It’s kind of the idea that you don’t really know something until you can teach it, and teaching what you care about is actually super hard. I’ve had several jobs where part of the job is ‘teach the computer to do your job’ and even when you can’t actually do that, trying to do so helps a lot! It’s often a useful exercise to turn any given thing into pseudo-code and see how that goes.

      Similarly, functional decision theory is super useful and arose out of thinking about AIs making decisions. Some other examples are exploration/exploitation and issues surrounding hill climbing, issues surrounding Goodhart’s Law (again, DWIM is not really a thing with computers, and trying to define what you want usually doesn’t end well), consideration of corner cases (which AIs have a habit of finding and exploiting), which is an important habit e.g. More Dakka, and so on. Logical Induction actually has useful things to say about real knowledge and real trading, even if it’s impractical as a solution to anything in its current form. Our thinking is informal, but it’s important to formalize parts of it and work with those formal parts as useful, the same way that plans are worthless slash don’t survive contact with the enemy, but planning is essential. And my thinking about what it would take to get a real AGI out of machine learning gave me a bunch of insights into how I actually think.
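      The Goodhart’s Law point shows up even in a toy simulation (assumed numbers, not any real system): give each candidate a true value, observe only a noisy proxy of it, and select hard on the proxy. The winner was picked partly for lucky noise, so its measured score overstates its true value:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Each candidate has a true value; we can only observe a noisy proxy of it.
true_values = [random.gauss(0, 1) for _ in range(10_000)]
scored = [(true + random.gauss(0, 1), true) for true in true_values]

# Optimize hard on the proxy: pick the candidate with the best measured score.
best_proxy, true_of_winner = max(scored)

# The winner was selected partly for lucky noise, so its measured score
# overstates its true value: Goodhart's Law, the optimizer's curse, in miniature.
assert best_proxy > true_of_winner
```

      The harder you select on the proxy, the larger the gap between the measured score and what you actually wanted.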

      As for CDT, it’s not as much about people using formal models as developing cultural norms where caring about considerations CDT would not consider is thought of as irrational, or stupid, or something losers do – that such concerns are not legitimate concerns or valid arguments. I hope that helps.

      • jake says:

        I understand why DWIM is not a thing for regular programming. What confused me is that anything intelligent enough to outsmart humanity and destroy it is intelligent enough to understand a voice instruction to DWIM etc.
        If I understand your response correctly, you are saying that this won’t work as a strategy since by the ‘time’ it reaches the level where such an instruction would be meaningful, its basic ‘values’ would have needed to be set up at a much earlier stage (with all of the problems that entails, e.g. if done via proxies it would have the problem of Goodhart’s law). In order to be intelligent at all it would need some internal incentive system from the beginning, which would need to be aligned so that it would actually obey the instruction to DWIM. Is this basically correct? If so I should probably attempt to reread the book keeping in mind the distinction between how to give safe goals to a completed AI which has no internal goals and is waiting for instructions (non-existent, but how I was picturing it) and how to structure the internal goals which are part of the structure of the intelligence (closer to the actual issue).
        With regard to the question of impacting our thinking, you point to many examples without spelling them out. Is this because it is the working through the issues which is beneficial to clear thinking, rather than the specific insights and examples from real life? Maybe I will start working through the paper you quoted on functional decision theory with an eye towards its use for human thought (probably by analogy?) and where the failures of CDT are expressed in general society. Unless you suggest a different starting point.

    • TheZvi says:

      (This is a reply to your reply)

      I like the idea of starting with FDT. The paper is one place to start, Arbital’s explanations are another if you prefer a less technical starting point. (https://arbital.com/p/logical_dt/?l=5d6, they changed the name).

      I didn’t spell out the explanations partly because yes it’s the working out that helps, partly because limited space and time, partly DWIM is not a thing. The insights I need/get are likely different from the ones you would need/get – even if we’re on roughly similar levels, what helps us along will be different at any given time. Full explanations would be posts or even sequences. If others think such things would be useful, please chime in, and I’ll consider incorporating such explanations more explicitly in the future.

      On the issue of why a super-intelligence can’t just DWIM, there are a lot of reasons why that is super hard. I think you have an important piece there. When we design the initial system, it will have some goal slash reward/utility/etc function, so we have two problems right away. First, we need to make sure that it *keeps* that reward/utility function, rather than being subject to value drift or self-modification or reward hacking or any number of other things. Humans who learn things tend to change what they care about, we don’t care that much about what our ancestors cared about, our values aren’t even really coherent, etc etc. Second, we need to define that utility function such that maximizing it, or attempting to maximize it, gives us something we want, hopefully what we would want the most, and especially hopefully doesn’t do something orthogonal to what we actually value and destroy all value in the universe slash kill us all. That would be bad. DWIM relies on the concept of DWIM on DWIM to work (e.g. what do you mean by ‘do what you mean’?) and on the idea that you meant something specific or coherent *at all*. What do you mean/want? You’d pay to know what you really think. (I’ll talk about this more soon when I review The Elephant in the Brain). So it’s not a great place to start bootstrapping. Also, it’s a common mistake to assume that anything more powerful/smarter must be more powerful/smarter in every way, and that’s not true – Alice being able to easily outsmart Bob does not mean that Alice has any idea how Bob thinks or what Bob wants, and the brilliant scientist who can’t understand human emotions is a cliche. An AGI is still code, and it’s not obvious that understanding what humans really want is easier than building a Dyson sphere. We are not especially close to solving either problem. And it’s very possible for the AGI to wipe out all value in the process of getting smart enough to figure out what we want, or by accident.

      Then there’s the problem that having the AGI implement the real meaning of whatever an individual human tells it to do *might* not be the best idea even if you got it right…

      Add to all of this that you only get one try, it might not be a try on purpose, and if you mess it up, you lose hard and permanently. Things that can’t be iterated and tested do not have a strong track record, in coding or otherwise.

      MIRI’s best guess is to use CEV, or coherent extrapolated volition. So (roughly) we’d simulate humanity for a long long time while we debated and thought about what we really wanted, and then implement that answer. It’s not clear that this would work, but it at least *might* work if you could figure out how to get that far, as an attempt to implement a sort of DWIM, but there are lots of hard problems in getting there.

      I hope that helps. Re-reading the relevant parts of Superintelligence is a reasonable thing to try, and Eliezer’s corpus is also a good idea – helping people understand these concepts is why this whole mode of thinking called ‘rationality’ is a thing we have a group for in the first place. If we wanted to go deeper than this one-on-one at some point, we’d probably want to move to Skype, to get better feedback and avoid talking past each other.

      • jake says:

        Thank you,
        I have to think about this before following up. I did not feel like we were talking past each other but am not averse to skype.
        For the time being we donated $50 of my family’s year end donations to MIRI as a thank you to both you and Scott at SSC. I want to clarify my own thoughts before deciding about a larger donation.

  2. layman says:

    I think that AI is likely to be pursued by the militaries of various countries as a valuable asset. I suppose that in the heat of such a pursuit, moral norms and safety regulations might be cast aside, especially if a race condition between several nation states develops. I can suppose that a military, with its financial, human and political resources, might be the first institution to develop an AI that reaches sentience, and might actively crush, steamroll and/or acquire any opposition to its cause.
    My question is thus:
    Do we even stand a chance?

    • Michael says:

      Historically speaking, as far as I can tell, the chance of a given piece of mathematics being created first by military or financial interests has been negligible outside of cryptography. In basic science, the chance is small. The further you go towards engineering, the higher the chance gets, but if it ever got very high we wouldn’t have start-ups, and certainly no Tesla or SpaceX, since not many technologies are as militarily useful as a Mars colony or an automated factory. AGI isn’t even slightly militarily useful, and financial and military interests won’t want to build it. The danger is that they might create it accidentally, or more likely, create technology close enough to AGI that small teams of irresponsible optimists create it on a foundation that they have built.

    • TheZvi says:

      I agree with Michael that I don’t expect the first AGI to come from a military, although it’s certainly possible. If it did happen, we would have to worry about the intentions of that organization or those in command, but I don’t think that’s different in kind from our worries about other organizations that might build it, and what they would want the AGI to do. Military people still mostly want things humans value and we would value. They’d face the same basic problems, including the same race condition problems, with two military forces as we would with DeepMind and OpenAI, although the details would likely be worse for us.
      Yes, I’d rather have Larry Page or Demis Hassabis on the trigger than a 5-star general or president, if those are the choices, but I don’t think it means inevitable doom. It does make it that much harder.

      At that point, the same questions come into play. Have we given them the tools to make it reasonable to think about and build a friendly AI, or at least one that has some chance of being safe? Have we made it easier to take paths that lead to such safety, versus paths that almost certainly doom us? Have we made the key people generally aware of why and how such unsafe systems inevitably doom us, and what would be needed for them to not doom us? The more progress we can make, the better.

      So yes, we’d still have a chance, and yes what we do here and now would still impact how big that chance would be.

  3. Pingback: Rational Feed – deluks917

  4. Pingback: The Story CFAR | Don't Worry About the Vase

  5. Ultra Bongo says:

    “MIRI is also working on techniques to make machine learning based agents safer, in case that path leads to AGI first”

    Do they have results in this presently that you know of? (~30s of googling didn’t turn up anything, but that still leaves a lot of potential fruit on the tree.) Whatever their relative theoretical value, I find this more viscerally compelling as useful work. I guess maybe because it’s closer to the level I could evaluate for effectiveness, and because it’s applicable now instead of n years from now and I’m discounting.

  6. Pingback: January 2018 Newsletter - Machine Intelligence Research Institute

  7. Pingback: MIRI’s January 2018 Newsletter – Errau Geeks
