AGI Alignment and Karl Popper

I quit the Effective Altruism forum due to a new rule requiring posts and comments be basically put in the public domain without copyright. I had a bunch of draft posts, so I’m posting some of them here with light editing.


On certain premises, which are primarily related to the epistemology of Karl Popper, artificial general intelligences (AGIs) aren’t a major threat. I tell you this as an expert on Popperian epistemology, which is called Critical Rationalism.

Further, approximately all AGI research is based on epistemological premises which contradict Popperian epistemology.

In other words, AGI research and AGI alignment research are both broadly premised on Popper being wrong. Most of the work being done is an implicit bet that Popper is wrong. If Popper is right, many people are wasting their careers, misdirecting a lot of donations, incorrectly scaring people about existential dangers, etc.

You might expect that alignment researchers would have done a literature review, found semi-famous relevant thinkers like Popper, and written refutations of them before being so sure of themselves and betting so much on the particular epistemological premises they favor. I haven’t seen anything of that nature, and I’ve looked a lot. If it exists, please link me to it.

To engage with and refute Popper requires expertise about Popper. He wrote a lot, and it takes a lot of study to understand and digest it. So you have three basic choices:

  • Do the work.
  • Rely on someone else’s expertise who agrees with you.
  • Rely on someone else’s expertise who disagrees with you.

How can you use the expertise of someone who disagrees with you? You can debate with them. You can also ask them clarifying questions, discuss issues with them, etc. Many people are happy to help explain ideas they consider important, even to intellectual opponents.

To rely on the expertise of someone on your side of the debate, you endorse literature they wrote. They study Popper, they write down Popper’s errors, and then you agree with them. Then when a Popperian comes along, you give them a couple citations instead of arguing the points yourself.

There is literature criticizing Popper. I’ve read a lot of it. My judgment is that the quality is terrible. And it’s mostly written by people who are pretty different than the AI alignment crowd.

There’s too much literature on your side to read all of it. What you need (to avoid doing a bunch of work yourself) is someone similar enough to you – someone likely to reach the same conclusions you would reach – to look into each thing. One person is potentially enough. So if someone who thinks similarly to you reads a Popper criticism and thinks it’s good, it’s somewhat reasonable to rely on that instead of investigating the matter yourself.

Keep in mind that the stakes are very high: potentially lots of wasted careers and dollars.

My general take is you shouldn’t trust the judgment of people similar to yourself all that much. Being personally well read regarding diverse viewpoints is worthwhile, especially if you’re trying to do intellectual work like AGI-related research.

And there aren’t a million well known and relevant viewpoints to look into, so I think it’s reasonable to just review them all yourself, at least a bit via secondary literature with summaries.

There are much more obscure viewpoints that are worth at least one person looking into, but most people can’t and shouldn’t try to look into most of those.

Gatekeepers like academic journals or university hiring committees are really problematic, but the least you should do is vet stuff that gets through gatekeeping. Popper was also respected by various smart people, like Richard Feynman.

Mind Design Space

The AI Alignment view claims something like:

Mind design space is large and varied.

Many minds in mind design space can design other, better minds in mind design space. Which can then design better minds. And so on.

So, a huge number of minds in mind design space work as starting points to quickly get to extremely powerful minds.

Many of the powerful minds are also weird, hard to understand, very different than us (including regarding moral ideas), possibly very goal directed, and possibly significantly controlled by their original programming (which likely has bugs and literally specifies different things, including different goals, than the designers intended).

So AGI is dangerous.

There is an epistemology which contradicts this, based primarily on Karl Popper and David Deutsch. It says that actually mind design space is like computer design space: sort of small. This shouldn’t be shocking since brains are literally computers, and all minds are software running on literal computers.

In computer design, there is a concept of universality or Turing completeness. In summary, when you start designing a computer and adding features, after very few features you get a universal computer. So there are only two types of computers: extremely limited computers and universal computers. This makes computer design space less interesting or relevant. We just keep building universal computers.

Every computer has a repertoire of computations it can perform. A universal computer has the maximal repertoire: it can perform any computation that any other computer can perform. You might expect universality to be difficult to get and require careful designing, but it’s actually difficult to avoid if you try to make a computer powerful or interesting.
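
To make the universality point concrete, here’s a minimal sketch, in Python, of a SUBLEQ machine: a well-known one-instruction computer that is Turing complete. A single instruction type (“subtract and branch if the result is at most zero”), given enough memory and time, suffices to run any computation that any other computer can run. The little demo program and its memory layout are made up for illustration.

```python
def run_subleq(mem, max_steps=10_000):
    """Run a SUBLEQ ("subtract and branch if <= 0") program in place.

    SUBLEQ is a one-instruction computer that is Turing complete: this
    single instruction, given enough memory and time, can perform any
    computation that any other computer can perform.
    """
    pc = 0
    for _ in range(max_steps):
        if pc < 0 or pc + 2 >= len(mem):
            break  # halt: jumped to a negative address or ran off memory
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                   # the one and only operation
        pc = c if mem[b] <= 0 else pc + 3  # conditional branch
    return mem

# Demo program (layout made up for illustration): adds mem[12] (7) into
# mem[13] (5) using only SUBLEQ instructions, then halts by jumping to -1.
# Cell 14 is a scratch cell that starts at 0.
program = [12, 14, 3,   14, 13, 6,   14, 14, 9,   14, 14, -1,   7, 5, 0]
run_subleq(program)
print(program[13])  # prints 12, i.e. 7 + 5
```

The point isn’t that SUBLEQ is a practical design; it’s that universality shows up almost immediately, so adding more features changes convenience and speed, not the repertoire of possible computations.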

Universal computers do vary in other design elements, besides what computations they can perform, such as how large they are. This is fundamentally less important than what computations they can do, but does matter in some ways.

There is a similar theory about minds: there are universal minds. (I think this was first proposed by David Deutsch, a Popperian intellectual.) The repertoire of things a universal mind can think (or learn, understand, or explain) includes anything that any other mind can think. There’s no reasoning that some other mind can do which it can’t do. There’s no knowledge that some other mind can create which it can’t create.

Further, human minds are universal. An AGI will, at best, also be universal. It won’t be super powerful. It won’t dramatically outthink us.

There are further details but that’s the gist.

Has anyone on the AI alignment side of the debate studied, understood and refuted this viewpoint? If so, where can I read that (and why did I fail to find it earlier)? If not, isn’t that really bad?


Elliot Temple | Permalink | Messages (0)

Betting Your Career

I quit the Effective Altruism forum due to a new rule requiring posts and comments be basically put in the public domain without copyright. More info. I had a bunch of draft posts, so I’m posting some of them here with minimal editing.


People bet their careers on various premises, outside their own expertise, e.g. AGI (alignment) researchers commonly bet on some epistemology without being experts on epistemology who actually read Popper and concluded, in their own judgment, that he’s wrong.

So you might expect them to be interested in criticism of those premises. Shouldn’t they want to investigate the risk?

But that depends on what you value about your career.

If you want money and status, and not to have to make changes, then maybe it’s safer to ignore critics who don’t seem likely to get much attention.

If you want to do productive work that’s actually useful, then wrong premises put your career at risk.

People won’t admit it, but many of them don’t actually care that much about whether their career is productive. As long as they get status and money, they’re satisfied.

Also, a lot of people lack confidence that they can do very productive work regardless of whether their premises are right or wrong.

Actually, having wrong but normal/understandable/blameless premises has big advantages: you won’t come up with important research results, but it’s not your fault. If it comes out that your premises were wrong, you did the noble work of investigating a lead that many people believed was promising. Science and other types of research always involve investigating many leads that don’t turn out to be important. If you work on a lead people want investigated, do nothing useful, and the lead turns out to be important, then some other investigators outcompeted you, and people could wonder why you didn’t figure out anything about it. But if the lead you work on turns out to be a dead end, then the awkward questions go away. So there’s an advantage to working on dead ends, as long as other people think they’re a good thing to work on.


Elliot Temple | Permalink | Messages (0)

Attention Filtering and Debate

I quit the Effective Altruism forum due to a new rule requiring posts and comments be basically put in the public domain without copyright. More info. I had a bunch of draft posts, so I’m posting some of them here with minimal editing.


People skim and filter. Gatekeepers and many other types of filters end up being indirect proxies for social status much more than they are about truth seeking.

Filtering isn’t the only problem though. If you have some credentials – awards, a PhD, a popular book, thousands of fans – people often still won’t debate you. Also, I certainly get through initial filtering sometimes. People talk with me some, and a lot more people read some of what I say.

After you get through filters, you run into problems like people still not wanting to debate or not wanting to put in enough effort to understand your point. We could call this secondary filtering. Maybe if you get through five layers of filters, then they’ll debate. Or maybe not. I think some of the filters are generated ad hoc because they don’t want to debate or consider (some types of) ideas that disagree with their current ideas. People can keep making up new excuses as necessary.

Why don’t people want to debate? Often because they’re bad at it.

And they know – even if they don’t consciously admit it – that debating is risky to their social status, and that the expected outcome of debating is losing status.

And they know that, if they lose the debate, they will then face a problem. They’ll be conflicted. They will partly want to change their mind, but part of them won’t want to change their mind. They don’t know how to deal with that kind of conflict, so they’d rather avoid getting into that situation.

Also they already have a ton of urgent changes to make in their lives. They already know lots of ways they’re wrong. They already know about many mistakes. So they don’t exactly need new criticism. Adding more issues to the queue isn’t valuable.

All of that is fine but on the other hand anyone who admits that is no thought leader. So people don’t want to admit it. And if an intellectual position has no thought leaders capable of defending it, that’s a major problem. So people make excuses, pretend someone else will debate if debate is merited, shift responsibility to others (usually not to specific people), etc.

Debating is a risk to status and self-esteem, and it’s a hard activity. And people may not want to learn about (even more) errors, which would lead to thinking they should change, which is hard and something they may fail at (and failure could further harm their status and self-esteem, and be distracting and unpleasant).


Elliot Temple | Permalink | Messages (0)

Friendliness or Precision

I quit the Effective Altruism forum due to a new rule requiring posts and comments be basically put in the public domain without copyright. More info. I had a bunch of draft posts, so I’m posting some of them here with minimal editing.


In a debate, if you’re unfriendly and you make a lot of little mistakes, you should expect the mistakes to (on average) be biased for your side and against their side. In general, making many small, biased mistakes ruins debates dealing with complex or subtle issues. It’s too hard to fix them all, especially considering you’re the guy who made them (if you had the skill to fix them all, you could have used that same skill to avoid making some of them).

In other words, if you dislike someone, being extremely careful, rigorous and accurate with your reasoning provides a defense against bias. Without that defense, you don’t have much of a chance.

If you have a positive attitude and are happy to hear about their perspective, that helps prevent being biased against them. If you have really high intellectual standards and avoid making small mistakes, that helps prevent bias. If you have neither of those things, conversation doesn’t work well.


Elliot Temple | Permalink | Messages (0)

Hard and Soft Rationality Policies

I have two main rationality policies that are written down:

  1. Debate Policy
  2. Paths Forward Policy

I have many other, smaller policies that are written down somewhere in some form, like not misquoting and giving direct answers to direct questions (e.g., say "yes" or "no" first when answering a yes-or-no question; then write extra stuff if you want, but don't skip the direct answer).

A policy I thought about the other day, and recognized as worth writing down, is my debate policy sharing policy. I've followed this policy for a long time. It's important, but it isn't written in my debate policy itself.

If someone seems to want to debate me, but they don't invoke my debate policy, then I should link them to the debate policy so they have the option to use it. I shouldn't get out of the debate based on them not finding my debate policy.

In practice, I link the policy to a lot of people who I doubt want to debate me. I like sharing it. That's part of the point. It’s useful to me. It helps me deal with some situations in an easy way. I get in situations where I want to say/explain something, but writing it out every time would be too much work; since some of the same things come up over and over, I can write them once and then share links instead of rewriting the same points. My debate policy says some of the things I frequently want to tell people, and linking it lets me repeat those things with very low effort.

One can imagine someone who put up a debate policy and then didn't mention it to critics who didn't ask for a debate in the right words. One can imagine someone who likes having the policy so they can claim they're rational, but they'd prefer to minimize actually using it. That would be problematic. I wrote my debate policy conditions so that if someone actually meets them, I'd like to debate. I don't dread that or want to avoid it. If you have a debate policy but hope people don't use it, then you have a problem to solve.

If I'm going to ignore a question or criticism from someone I don't know, then I want to link my policy so they have a way to fix things if I was wrong to ignore them. If I don't link it, and they have no idea it exists, then the results are similar to not having the policy. It doesn't function as a failsafe in that case.

Some policies offer hard guarantees and some are softer. What enforces the softer ones, so they mean something instead of just being violated whenever one feels like it? Generic, hard guarantees, like a debate policy, which can be used to address doing poorly at any softer guarantee.

For example, I don't have any specific written guarantee for linking people to my debate policy. There's an implicit (and now explicit in this post) soft guarantee that I should make a reasonable effort to share it with people who might want to use it. If I do poorly at that, someone could invoke my debate policy over my behavior. But I don't care much about making a specific, hard guarantee about debate policy link sharing because I have the debate policy itself as a failsafe to keep me honest. I think I do a good job of sharing my debate policy link, and I don't know how to write specific guarantees to make things better. It seems like something where a good faith effort is needed which is hard to define. Which is fine for some issues as long as you also have some clearer, more objective, generic guarantees in case you screw up on the fuzzier stuff.

Besides hard and soft policies, we could also distinguish policies from tools. Like I have a specific method of having a debate where people choose what key points they want to put in the debate tree. I have another debate method where people say two things at a time (it splits the conversation into two halves, one led by each person). I consider those tools. I don't have a policy of always using those things, or using those things in specific conditions. Instead, they're optional ways of debating that I can use when useful. There's a sort of soft policy there: use them when it looks like a good idea. Making a grammar tree is another tool, and I have a related soft policy of using that tool when it seems worthwhile. Having a big toolkit with great intellectual tools, along with actually recognizing situations for using them, is really useful.
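
As an illustration of the debate tree tool (with hypothetical names; this is a sketch of the idea, not my actual software), a debate tree is just a recursive structure where each node is one key point a participant chose to include, and replies hang off it as children:

```python
from dataclasses import dataclass, field

@dataclass
class IdeaNode:
    """One key point in a debate tree; replies are stored as children."""
    claim: str
    author: str
    children: list["IdeaNode"] = field(default_factory=list)

    def reply(self, claim: str, author: str) -> "IdeaNode":
        """Add a reply under this point and return it for further replies."""
        node = IdeaNode(claim, author)
        self.children.append(node)
        return node

    def render(self, depth: int = 0) -> str:
        """Show the tree with indentation, so the structure stays visible."""
        lines = ["  " * depth + f"{self.author}: {self.claim}"]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)

# Hypothetical usage:
root = IdeaNode("AGI is an existential risk", "A")
objection = root.reply("Mind design space may be small (universality)", "B")
objection.reply("Universality of computers may not apply to minds", "A")
print(root.render())
```

The value of the tool is mostly in keeping the branching structure of a discussion visible instead of losing track of which point replies to which.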


Elliot Temple | Permalink | Messages (0)

A Non-Status-Based Filter

Asking people if they want to have a serious conversation is a way of filtering, or gatekeeping, which isn’t based on social status. Regardless of one’s status, anyone can opt in. This does require making the offer to large groups, randomized people, or something else that avoids social status. If you just make the offer to people you like, then your choice of who to offer conversations to is probably status based.

This might sound like the most ineffective filter ever. People can just say “yes I want to pass your filter” and then they pass. But in practice, I find it effective – the majority of people decline (or don’t reply, or reply about something else) and are filtered out.

You might think it only filters out people who were not going to have a conversation with you anyway. However, people often converse because they’re baited into it, triggered, defensive, caught up in trying to correct someone they think is wrong, etc. Asking people to make a decision about whether they want to be in a conversation can help them realize that they don’t want to. That’s beneficial for both you and them. However, I’ve never had one of them thank me for it.

A reason people dislike this filter is they associate all filters with status and therefore interpret being filtered out as an attack on their status – a claim they are not good enough in some way. But that’s a pretty weird interpretation with this specific filter.

This filter is, in some sense, the nicest filter ever. No one is ever filtered out who doesn’t want to be filtered out. Only this filter and variants of it have that property. Filtering on anything else, besides whether the person wants to opt in or out, would filter out some people who prefer to opt in. However, no one has ever reacted to me like it’s a nice filter. Many reactions are neutral, and some negative, but no one has praised me for being nice.

Useful non-status-based filters are somewhat difficult to come by and really important/valuable. Most filters people use are some sort of proxy for social status. That’s one of the major sources of bias in the world. What people pay attention to – what gets to them through gatekeeping/filtering – is heavily biased towards status. So it’s hard for them to disagree with high status ideas or learn about low status ideas (such as outliers and innovation).


Elliot Temple | Permalink | Messages (0)

Controversial Activism Is Problematic

EA mostly advocates controversial causes where they know that a lot of people disagree with them. In other words, there exist lots of people who think EA’s cause is bad and wrong.

AI Alignment, animal welfare, global warming, fossil fuels, vaccinations and universal basic income are all examples of controversies. There are many people on each side of the debate. There are also experts on each side of the debate.

Some causes do involve less controversy, such as vitamin A supplements or deworming. I think that, in general, less controversial causes are better independent of whether they’re correct. It’s better when people broadly agree on what to do, and then do it, instead of trying to proceed with stuff while having a lot of opponents who put effort into working against you. I think EA has far too little respect for getting widespread agreement and cooperation, and for not proceeding with action on issues where a lot of people are taking action on the other side and you have to fight against them. This comes up most with political issues but also applies to e.g. AI Alignment.

I’m not saying it’s never worth it to try to proceed despite large disagreements, and win the fight. But it’s something people should be really skeptical of and try to avoid. It has huge downsides. There’s a large risk that you’re in the wrong and are actually doing something bad. And even if you’re right, the efforts of your opponents will cancel out a lot of your effort. Also, proceeding with action when people disagree basically means you’ve given up on persuasion working any time soon. In general, focusing on persuasion and trying to make better more reasonable arguments that can bring people together is much better than giving up on talking it out and just trying to win a fight. EA values persuasion and rational debate too little.


Suppose you want to make the world better in the short term without worrying about a bunch of philosophy. You try to understand the situation you’re in, what your goal is, what methods would work well, what is risky, etc. So how can you analyze the big picture in a fairly short way that doesn’t require advanced skill to make sense of?

We can look at the world and see there are lots of disagreements. If we try to do something that lots of people disagree with, we might be doing something bad. It’s risky. Currently in the world, a ton of people on both sides of many controversies are doing this. Both sides have tons of people who feel super confident that they’re right, and who donate or get involved in activism. This is especially common with political issues.

So if you want to make the world better, two major options are:

  • Avoid controversy
  • Help resolve controversy

There could be exceptions, but these are broadly better options than taking sides and fighting in a controversy. If there are exceptions, correctly knowing about them would probably require a bunch of intellectual skill and study, and wouldn’t be compatible with looking for quicker, more accessible wins. A lot of people think their side of their cause is a special exception when it isn’t.

The overall world situation is there are far too many confident people who are far too eager to fight instead of seeking harmony, cooperation, working together, etc. Persuasion is what enables people to be on the same team instead of working against each other.

Causes related to education and sharing information can help resolve controversy, especially when they’re done in a non-partisan, unbiased way. Some education or information sharing efforts are clearly biased to help one side win, rather than focused on being fair and helpful. Stuff about raising awareness often means raising awareness of your key talking points and why your side is right. Propaganda efforts are very different than being neutral and helping enable people to form better opinions.

Another approach to resolving controversy is to look at intellectual thought leaders, and how they debate and engage with each other (or don’t), and try to figure out what’s going wrong there and what can be done about it.

Another approach is to look at how regular people debate each other and talk about issues, and try to understand why people on both sides aren’t being persuaded and try to come up with some ideas to resolve the issue. That means coming to a conclusion that most people on both sides can be happy with.

Another approach is to study philosophy and rationality.

Avoiding controversy is a valid option too. Helping people avoid blindness by getting enough Vitamin A is a pretty safe thing to work on if you want to do something good with a low risk that you’re actually on the wrong side.

A common approach people try to use is to have some experts figure out which sides of which issues are right. Then they feel safe to know they’re right because they trust that some smart people already looked into the matter really well. This approach doesn’t make much sense in the common case that there are experts on both sides who disagree with each other. Why listen to these experts instead of some other experts who say other things? Often, people already like a particular conclusion or cause, and then they find experts who agree with it. The experts offer justification for a pre-existing opinion rather than actually guiding what those people think. Listening to experts can also run into issues related to irrational, biased gatekeeping about who counts as an “expert”.

In general, people are just way too eager to pick a side and fight for it instead of trying to transcend, avoid or fix such fighting. They don’t see cooperation, persuasion or harmony as powerful or realistic enough tools. They are content to try to beat opponents. And they don’t seem very interested in looking at the symmetry: they think they’re right and their cause is worth fighting for, but so do many people on the other side.

If your cause is really better, you should be able to find some sort of asymmetric advantage for your side. If it can give you a quick, clean victory that’s a good sign. If it’s a messy, protracted battle, that’s a sign that your asymmetric advantage wasn’t good enough and you shouldn’t be so confident that you know what you’re talking about.


Elliot Temple | Permalink | Messages (0)

Rationality Policies Tips

I quit the Effective Altruism forum due to a new rule requiring all new posts and comments be basically put in the public domain without copyright, so anyone could e.g. sell a book of my posts without my consent (they’d just have to give attribution). More info. I had a bunch of draft posts, so I’m posting some of them here with minimal editing. In general, I’m not going to submit them as link posts at EA myself. If you think they should be shared with EA as link posts, please do it yourself. I’m happy for other people to share links to my work at EA or on social media. Please share stuff in whatever ways you think are good to do.


Suppose you have some rationality policies, and you always want to and do follow them. You do exactly the same actions you would have without the policies, plus a little bit of reviewing the policies, comparing your actions with the policies to make sure you’re following them, etc.

In this case, are the policies useless and a small waste of time?

No. Policies are valuable for communication. They provide explanations and predictability for other people. Other people will be more convinced that you’re rational and will understand your actions more. You’ll less often be accused of irrationality or bias (or, worse, have people believe you’re being biased without telling you or allowing a rebuttal). People will respect you more and be more interested in interacting with you. It’ll be easier to get donations.

Also, written policies enable critical discussion of the policies. Having the policies lets people make suggestions or share critiques. So that’s another large advantage of the policies even when they make no difference to your actions. People can also learn from your policies and start using some of the same policies for themselves.

It’s also fairly unrealistic that the policies make no difference to your actions. Policies can help you remember and use good ideas more frequently and consistently.

Example Rationality Policies

“When a discussion is hard, start using an idea tree.” This is a somewhat soft, squishy policy. How do you know when a discussion is hard? That’s up to your judgment. There are no objective criteria given. This policy could be improved, but it’s still, as written, much better than nothing. It will work sometimes due to your own judgment, and other people who know about your policy can also suggest that a discussion is hard and it’s time to use an idea tree.

A somewhat less vague policy is, “When any participant in a discussion thinks the discussion is hard, start using an idea tree.” In other words, if you think the discussion is tough and a tree would help, you use one. And also, if your discussion partner claims it’s tough, you use one. Now there is a level of external control over your actions. It’s not just up to your judgment.

External control can be triggered by measurements or other parts of reality that are separate from other people (e.g. “if the discussion length exceeds 5000 words, do X”). It can also be triggered by other people making claims or judgments. It’s important to have external control mechanisms so that things aren’t just left up to your judgment. But you need to design external control mechanisms well so that you aren’t controlled to do bad things.
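
As a minimal sketch of that kind of external control (hypothetical names and threshold, just to illustrate), the trigger can combine an objective measurement with another person’s claim, so it isn’t left solely to my own judgment:

```python
WORD_LIMIT = 5000  # assumed threshold, matching the example above

def idea_tree_required(transcript: str, other_person_says_hard: bool) -> bool:
    """Return True when the policy obliges switching to an idea tree."""
    too_long = len(transcript.split()) > WORD_LIMIT  # measurement-based trigger
    return too_long or other_person_says_hard        # claim-based trigger

# Hypothetical usage:
print(idea_tree_required("some short discussion", other_person_says_hard=True))  # True
```

This keeps the decision from being purely up to my own judgment while still being simple enough to design carefully.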

It’s also problematic if you dislike or hate something but your policy makes you do it. It’s also problematic to have no policy and just do what your emotions want, which could easily be biased. An alternative would be to set the issue aside temporarily to actively do a lot of introspection and investigation, possibly followed by self-improvement.

A more flexible policy would be, “When any participant in a discussion thinks the discussion is hard, start using at least one option from my Hard Discussion Helpers list.” The list could contain using an idea tree and several other options such as doing grammar analysis or using Goldratt’s evaporating clouds.

More about Policies

If you find your rationality policies annoying to follow, or if they tell you to take inappropriate actions, then the solution is to improve your policy writing skill and your policies. The solution is not to give up on written policies.

If you change policies frequently, you should label them (all of them or specific ones) as being in “beta test mode” or something else to indicate they’re unstable. Otherwise you would mislead people. Note: It’s very bad to post written policies you aren’t going to follow; that’s basically lying to people in an unusually blatant, misleading way. But if you post a policy with a warning that it’s a work in progress, then it’s fine.

One way to dislike a policy is you find it takes extra work to use it. E.g. it could add extra paperwork so that some stuff takes longer to get done. That could be fine and worth it. If it’s a problem, try to figure out lighter weight policies that are more cost effective. You might also judge that some minor things don’t need written policies, and just use written policies for more important and broader issues.

Another way to dislike a policy is you don’t want to do what it says for some other reason than saving time and effort. You actually dislike that action. You think it’s telling you to do something biased, bad or irrational. In that case, there is a disagreement between your ideas about rationality that you used to write the policy and your current ideas. This disagreement is important to investigate. Maybe your abstract principles are confused and impractical. Maybe you’re rationalizing a bias right now and the policy is right. Either way – whether the policy or current idea is wrong – there’s a significant opportunity for improvement. Finding out about clashes between your general principles and the specific actions you want to do is important and those issues are worth fixing. You should have your explicit ideas and intuitions in alignment, as well as your abstract and concrete ideas, your big picture and little picture ideas, your practical and intellectual ideas, etc. All of those types of ideas should agree on what to do. When they don’t, something is going wrong and you should improve your thinking.

Some people don’t value opportunities to improve their thinking because they already have dozens of those opportunities. They’re stuck on a different issue other than finding opportunities, such as the step of actually coming up with solutions. If that’s you, it could explain a resistance to written policies. They would make pre-existing conflicts of ideas within yourself more explicit when you’re trying to ignore a too-long list of your problems. Policies could also make it harder to follow the inexplicit compromises you’re currently using. They’d make it harder to lie to yourself to maintain your self-esteem. If you have that problem, I suggest that it’s worth it to try to improve instead of just kind of giving up on rationality. (Also, if you do want to give up on rationality, or your ideas are such a mess that you don’t want to untangle them, then maybe EA and CF are both the wrong places for you. Most of the world isn’t strongly in favor of rationality and critical discussion, so you’ll have an easier time elsewhere. In other words, if you’ve given up on rationality, then why are you reading this or trying to talk to people like me? Don’t try to have it both ways and engage with this kind of article while also being unwilling to try to untangle your contradictory ideas.)


Elliot Temple | Permalink | Messages (0)