Book Review: Weapons of Math Destruction

Epistemic Status: Minus One Million Points

Shortness Status: Long (this is a proposed new norm I want to try out, in the sense of ‘apologies for writing a long letter, I did not have enough time to write a shorter one.’ By contrast, Against Facebook was longer in words, but would be short.)

Weapons of Math Destruction is an easy read, but a frustrating one.

The book claims to be about the misuse of big data and machine learning to guide decisions, how that harms people and leads to bad outcomes, and how to fix it. The distortions that come from aiming at and rewarding what we measure, rather than what we actually want, worry me more and more. It is one of the biggest issues of our age, so I was excited to read a new take on it, even if I expected to already know the bulk of the facts and ideas presented.

There is some of that in the book, and those parts provide some useful information, although if you are reading this you likely already know a lot of it.

I.

What the book is actually mostly about on its surface, alas, is how bad and unfair it is to be a Bayesian. There are two reasons, in her mind, why using algorithms to be a Bayesian is just awful. 

The first objection is that probabilistic algorithms are probabilistic. It is just awful that predictive algorithms are used to decide who should get or keep a job, or get cheaper credit, or see certain advertisements, because the algorithm might be wrong. Look at this example of someone the algorithm got wrong! Look at this reason the algorithm got it wrong! Look how wrong it is! Clearly we need to rely on humans, who get things wrong more often, but do so in a less systematic fashion so we can’t prove exactly why any given human got something wrong.

The second objection is that algorithms rank people and options likely to be better above people and options likely to be worse. It is just awful that an algorithm notices that people who have bad credit, or live in a certain zip code, or shop at a certain store, or share some other trait, are either a worse or a better business proposition. You see, this is not fair and probably makes you a racist. This is because the people who are ranked either worse or better tend to be poor, and/or they tend to be not white, and that is just awful. If the resulting system gives them more attention in some way – say, by marketing to them to sell them things they might want and offering them discounts, or providing them with more government attention – then you are taking advantage of them, being a predator and destroying their lives, which you should at least have the common decency to do without an algorithm. If the resulting system gives them less attention in some way, by not marketing to them and charging them more, or by providing them with less government attention – then you are discriminating against them by denying them opportunities and services, which is not fair. Once again, you could at least have the decency to do this without an algorithm so no one can be sure exactly how you made your decisions. And again, since this is likely correlated with race, that also makes you a racist. Which is of course just awful. 

These evil algorithms are sneaky. If you give them race as an input, they’ll pick up on the correlations involved and show how racist they (and you) are. Since you of course are not racist, you hide that data (but of course she will blame you for that too, since if you hide that data, we can’t see just how racist you truly are). So instead the evil algorithms notice things that are correlated with race, like income or zip code, and use those instead. So then you try to hide those, and then the algorithms get even sneakier and start picking up on other things that correlate in less direct ways. Or even worse, perhaps they do a good job of figuring out the actual answer, and the actual answer happens to be correlated with some trait you have to be fair to and therefore the algorithm is just awful. 

She also doesn’t like it when humans make similar decisions without the use of algorithms, but somehow that makes it better, because you can’t point to the rule that did it. Besides, did you really expect the humans to ignore data they have and act against their own interests? Well, yes, she and similar people do expect this, or at least think not doing so is just awful, but they understand that there are limits.

She never uses the term, but basically she is arguing against disparate impact when compared against completely random decisions – that in the end, for a given set of groups, if a system does not result in equal outcomes for a group, it is not fair to that group, and for some groups this is just awful and means we need to ban the system, and force people to use worse systems instead that are bad proxies for what you are trying to measure. Then you complain about how bad the proxies are.

That is not the most charitable way of describing the argument being made, but I do not think it is a straw man, either. This is what the author explicitly claims to believe.

II.

In something that is not a coincidence, the way I react to such arguments was brought home by Sarah’s excellent recent post, The Face of the Ice. There are man versus man stories, where we are competing for resources including social and sexual status, and then there are man versus nature stories where we are talking about survival. When dealing with potentially big issues, ones that can threaten our very survival, the temptation is to refuse to realize or admit that, and instead focus on the man versus man aspects and how you or the groups you like are being treated unfairly and how that is just awful. 

Thus, people talk about the unemployment that will be caused by self-driving cars instead of thinking, whoa, this will transform our entire society and way of life and supercharge our ability to move around both people and goods and maybe we should be both super excited and super worried about that for bigger reasons. People see that we are developing artificial intelligence… and worry about whether it will be racist or sexist, or our plans for income redistribution, rather than what to do with orders of magnitude more wealth and whether it will wipe out the human race because we are made of atoms it could use for something else, and also wipe out all utility in the universe. Which are questions I spend a lot of time on, since they seem rather important.  But if you admit that the problems are that big, you would have to stop playing zero sum status games.

III.

The good news is that the author does provide some good starting points for thinking about some of the real problems of big data. Rather than discard the facts that do not fit her narrative, she to her credit shares them anyway. She then tends to move on without noticing the implications of those thoughts, but my standards are low enough that I consider that a win. She also has the strange habit of noting that the thing she is talking about isn’t really a ‘weapon of math destruction’ but has the potential to be one if things go a little farther in the direction they are headed.

One could even engage in a Straussian reading of the book. In this reading, the real problem is the distortions and destructive games that result from big data algorithms. The constant warnings about the poor are real enough, and point out real problems we should address, but are more important as illustrations of how important it will be for us to get good treatment from the algorithms. At its most basic level, you are poor, so the algorithm treats you badly, and you fix that by not being poor. Not being poor is a good idea anyway, so that works out well. If we start using more and more convoluted proxies? We might have a much bigger problem.

(The unspoken next line is, of course, that if we use these proxies as optimization targets or training data for true artificial intelligence, that would be infinitely worse, but I do not think she gave such issues any thought whatsoever.)

This is why her best discussion is about college rankings. She makes the case that it is primarily the US News & World Report college rankings, and the choices those rankings made, that have caused the explosion in tuition and the awful red queen’s race between different colleges. While I am not fully convinced, she did convince me that the rankings are a much more important cause than I realized.

My abstraction of her story is simple. Before the ratings, everyone knew vaguely what the best universities were (e.g. Harvard and Yale), and by looking carefully one could figure out vaguely how good a school was, but it was very difficult to know much beyond that. The world silently cried out for a rating system, and US News & World Report made the first credible attempt at creating such a system. They chose a whole bunch of things that one could reasonably assume would correlate with quality, such as selectivity of admissions and the accomplishments of graduates, along with a few things that one could at least hope would be correlated with quality, especially if you were measuring and thus controlling for other factors, such as graduation rates. Then, to make sure the ratings had a shot at looking reasonable rather than weird, they included a survey they sent out to colleges.

What they did not include was the cost of tuition, because higher tuition correlates with higher quality, and they wanted the ‘high quality’ colleges like Harvard to come out on top, not whatever state university turned out to be the best value for your dollar.

The result of this was a credible list that students and potential faculty and those evaluating students and faculty could use to evaluate institutional quality. Eventually, the ratings evolved to include less weight on the surveys and more on various measurements. Students used the guide as a key input in choosing where to go to college, which was reasonable since their alternative measurements were terrible. Those evaluating those students also used the guide, especially since admission rates were a key input, so going to a top rated college became an advantage in and of itself, even if the rating wasn’t based on anything.

Since everyone in the system was now using the ratings as a key input in their evaluations, colleges then started devoting a lot of attention to moving up in those ratings, and other similar ratings that came later. A lot of that effort meant improving the quality of the university, especially at first. Some places (she uses the example of TCU) invested in athletics to attract better students and move up. Others worked to make sure students did what they needed to do in order to graduate, or helped their students find good jobs, or even just tried to improve the quality of the education their kids got.

Then there were those who tried to pass the test without learning the material. Some tried to get more applicants to look more selective. I had a personal experience with this. Stanford University sent me a nice card congratulating me on my qualifying for the USAMO, and asked me to consider their fine educational institution. This was before I had started my ongoing war with San Francisco, so I would have welcomed a chance to go to that institution, but my high school only allowed us to apply to seven colleges, and my GPA was substantially below the lowest GPA anyone at my high school had ever had while being accepted to Stanford. My rough math put my chances of admission at 0%, so I had no intent of wasting one of my precious seven slots on them instead of a place I might actually gain admission.

My parents did not understand this. All they saw was that Stanford had asked me to apply, and Stanford was awesome so I was applying to Stanford, whether I liked it or not. This led to me having an admissions officer from Stanford on the phone, telling her that both of us knew Stanford was never going to accept me, and would she please just tell my parents that for the love of God. I didn’t want to plead my case for admission because I knew I had none, I knew that and she knew that, but of course revealing this doesn’t help Stanford, so she kept saying that of course every application is carefully considered and we hope we can welcome you to the class of 2001.

This was the first time someone from San Francisco decided to act superficially nice while screwing up my life for the tiniest possible gain to themselves. It was not the last.

In any case, this problem has since gotten much worse. At least back then I knew my safety schools would accept me, whereas now schools that notice you are ‘too good’ for them will reject you, because you’re not going to attend anyway, so why not improve their numbers instead of holding out the vain hope that they were the only place not to notice your criminal record, or worse, your sexist Facebook post? Thus the game gets ever more frustrating and complicated, and punishes even more those who refuse to play it.

All of these games cost money to play, but you know what the schools aren’t being rated on? That’s right, tuition! So they are free to spend ever more money on all the things, and charge accordingly, and the students see this as another sign of a quality institution. She doesn’t mention student loans, which massively contribute to this problem. This is consistent with her other blind spots, since student loans are good and increased tuition is bad, but that story does not conflict with the story being told here, and I did update in favor of tulip ratings mattering more and tulip subsidies mattering less.

Would a lot of that have happened anyway? Certainly, especially given that other ratings would have come out instead. But it seems logical that when a decision can be distilled down into a pretty good number that considers some but not all factors, then people will focus on gaming that number, and the factors that don’t improve that number will be ignored even if they matter more. Goodhart’s Demon will win the day.
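
The dynamic above can be sketched in a few lines. This is a toy model of my own construction, not from the book: a hypothetical school splits a fixed budget between a factor the rating measures and one it ignores, and climbing the rating drives investment in the unmeasured factor to zero even though that factor matters more for true quality.

```python
# Toy Goodhart sketch (illustrative assumptions, not from the book):
# a school splits a fixed budget between a measured factor (say,
# selectivity) and an unmeasured one (say, teaching). True quality
# depends on both; the rating only sees the measured factor. Greedily
# maximizing the rating drains the unmeasured factor to zero.

def rating(measured, unmeasured):
    return measured  # the ranking only rewards what it can see

def true_quality(measured, unmeasured):
    return 0.4 * measured + 0.6 * unmeasured  # teaching matters more

BUDGET = 100.0
best = max(
    ((m, BUDGET - m) for m in range(0, 101)),
    key=lambda alloc: rating(*alloc),
)
print("rating-maximizing split:", best)            # (100, 0.0)
print("true quality there:", true_quality(*best))  # 40.0
print("true optimum:", true_quality(0, BUDGET))    # 60.0
```

The gap between 40 and 60 is exactly the quality that the ranking cannot see, and therefore the quality that gets abandoned.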

IV.

Other sections are less convincing, but I will go over them quickly.

She talks about getting a job, and how bad it is that there are algorithms evaluating people. Even more than elsewhere in the book, it felt like she was writing the bottom line. This resulted in some confused arguments that she knew were not good, but that she used either because she believes her conclusion or because you should be using a Straussian reading.

The first argument against algorithms in employment is that sometimes they miss out on a good employee. While obviously true, this isn’t saying much, since every other method known to man does this too, and most do it far more often. This objection is like calling self-driving cars unsafe because they might kill people 10% as often as human drivers – human drivers who, I am confident, do it 100% as often.

The second argument is that the algorithms are used in many different places, so different decisions will be correlated, and those who score poorly won’t be able to find a job at all, whereas under the old method different places used different systems, so you could just keep applying and eventually someone would take a liking to you and give you a chance. This points to a paradox: it seems easier to get a job when everyone’s ratings are different, even though the same number of people end up with jobs, so it cannot be easier in general to find a job. What actually changes is the return to perseverance. Randomized ratings make it harder to find a job on the first try, because you face more other applicants who will be rated highly (since the random factor means they do not automatically find jobs). However, if you apply more often than others, your chances go up, whereas if every job uses a common score, more tries do not help you much, and a low scorer is drawing dead.
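
A quick Monte Carlo makes the perseverance effect concrete. This is a toy model of my own construction (the quality, bar, and noise numbers are arbitrary assumptions): a below-the-bar applicant applies fifty times, once in a world of independent noisy per-employer ratings and once in a world with a single shared score.

```python
import random

# Toy model (my construction, not from the book): an applicant whose
# true quality is below the hiring bar applies many times. Under
# independent per-employer ratings, each application rolls fresh noise,
# so persistence pays off; under one shared score, every employer sees
# the same number and extra applications change nothing.

random.seed(0)

TRUE_QUALITY = 0.4   # below the hiring bar
HIRING_BAR = 0.6
NOISE = 0.2          # standard deviation of each rating's error
APPLICATIONS = 50
TRIALS = 10_000

def hired_with_independent_ratings():
    # Each employer draws its own noisy estimate of quality.
    return any(
        random.gauss(TRUE_QUALITY, NOISE) >= HIRING_BAR
        for _ in range(APPLICATIONS)
    )

def hired_with_shared_score():
    # One noisy score is computed once and seen by every employer.
    return random.gauss(TRUE_QUALITY, NOISE) >= HIRING_BAR

indep = sum(hired_with_independent_ratings() for _ in range(TRIALS)) / TRIALS
shared = sum(hired_with_shared_score() for _ in range(TRIALS)) / TRIALS

print(f"P(hired | independent ratings, {APPLICATIONS} tries): {indep:.2f}")
print(f"P(hired | shared score,        {APPLICATIONS} tries): {shared:.2f}")
```

Under these assumptions the persistent applicant is nearly certain to land a job in the independent-ratings world, while in the shared-score world their chance is fixed at roughly one in six no matter how many times they apply.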

In some sense this change is good, since it means less time wasted with job applications and results in better matching, but in another sense it is bad because it cuts out the signal of how much the applicant cares. Having to apply for lots of jobs in order to find one means that those who want jobs the most will get the jobs (or the better jobs) since they will send the costly signal of applying more often, whereas in the algorithmic world, that confers no advantage, so those who need a job the most could be shut out by those who don’t care much. Costly signals can be good! So there’s at least some argument here, if it is too hard for the algorithm to measure how much you want the job.

The problem of a mistake-making algorithm is also self-correcting in a free market. If the algorithm makes mistakes, which of course it does, and enough of your competition follows its recommendations, you can get great employees at discount prices, with high motivation, by having humans look carefully for the good employees the algorithm is turning down. This is especially true if the algorithm is using proxies for race, class, sex or other such categories (her third argument), since those are known to throw out a lot of great people. She answers her own third objection by pointing out that the old system of ‘get a friend to recommend you’ is overall more discriminatory in the bad sense, on every axis both good and bad, than any algorithm being used in the wild.

Her talk about what happens on the job is similar. Yes, these algorithms make mistakes and sometimes evaluate good teachers as bad teachers. Yes, some of them have tons of noise in them. But what is the alternative? If these systems are not on average improvements why are corporations (and governments) using them more and more? The argument she relies on, that sometimes the algorithms make dumb mistakes, is very weak. Humans make really, really dumb mistakes all the time.

What she does not mention in either section, but is the real issue with such things, is that the system will be gamed and that gaming it might take over people’s lives. This is even more glaring due to her using teachers as an example, as teaching to the test is rapidly taking over all of primary and secondary education (or so I am told). Teaching was already a thankless job, and it seems like it is becoming more and more of a hell every year.

If there is an algorithm that will determine who can get hired for entry-level jobs, how long will it take before people learn what it is looking for? How long after that do they start sculpting their resumes and answers to that algorithm? How long after that do they start to post on Facebook what the system wants to see, take the classes it wants them to take, buy the products the algorithm wants to see them buy? Where does it end? Do we all end up consulting a get-hired strategy guide before we choose a pizza place, unless we already have a job, in which case we consult the get-promoted guide?

Then how does the algorithm respond to that action, and how do we respond in kind? How deep does this go?

Those questions terrify me. They don’t keep me up at night, because I belong to the General Mattis school of not letting things keep me up at night (this is why I had to quit the online game of Advanced Civilization), but they are a reasonable choice if you need something to do that for you.

She also notes that a lot of this involves using increasingly convoluted and strange measures, such as mysterious ‘e-scores’ and personality tests, that do not correlate all that well with results, and which she assumes tend to be discriminatory. She contrasts this to IQ tests and credit scores, which are much better predictors and tend to discriminate less and be more ‘fair’ because they only measure what you have done and what you can do, rather than what category of person your past signals that you belong to. She then demands that we do something about this outrage.

I agree with her that IQ tests and credit scores sound way better. It is a real shame that we decided to make it illegal to use them in hiring decisions. So if we want better measures, there’s a solution. I don’t think she is going to like it.

The section on insurance brings up the paradox of insurance. As the purchaser, you have a bunch of knowledge about how likely you are to need insurance. As the insurer, the company has some information it can use to estimate how likely you are to need it, and how much it will cost them when you do. There are then two problems. The first is that if people only buy when their hidden information says they will need the insurance, and/or when they intend to engage in behaviors that make the insurance more valuable, then it becomes very hard to sell anyone insurance. That’s classic and she doesn’t talk much about it, because it is the consumer benefiting at the expense of a corporation. But if there were a big data algorithm that the consumer could use to decide how much insurance to buy, what would that do to the insurance market? What would happen if it was illegal for the seller of insurance to look at it, or the calculation required too much private data? Could insurance markets collapse? Is this in our future?
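
The classic adverse-selection spiral behind that first problem can be sketched directly. This is a minimal toy model of my own construction, not from the book: buyers know their own expected loss, buy only when insurance is priced at or below it, and the insurer reprices each round to break even on whoever remains.

```python
# Minimal adverse-selection sketch (illustrative assumptions, not from
# the book): buyers know their own risk and only buy when the premium
# is at or below their expected loss, so each premium hike drives out
# the healthiest remaining buyers and the break-even premium spirals up
# until only the worst risk is left insured.

risks = [i / 100 for i in range(1, 101)]  # expected losses 0.01 .. 1.00
premium = sum(risks) / len(risks)         # start at the average-risk price

for round_ in range(20):
    buyers = [r for r in risks if r >= premium]   # only bad risks stay
    if not buyers:
        print(f"round {round_}: market collapsed, no one insured")
        break
    new_premium = sum(buyers) / len(buyers)       # break even on the pool
    print(f"round {round_}: premium {premium:.3f} -> {new_premium:.3f}, "
          f"{len(buyers)} buyers")
    if abs(new_premium - premium) < 1e-9:
        break
    premium = new_premium
```

Within a handful of rounds the pool shrinks to the single worst risk paying a premium equal to their full expected loss, which is to say the insurance market has effectively ceased to exist.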

Instead she talks about problem two, which is if the insurer uses the information they know to decide who is likely to need insurance, they might start charging different amounts to different people. This would result in people being effectively charged money for their life histories and decisions, which is of course just awful. If poor people cost more to insure, for example (and she says that in many cases this is true), they might have to pay more. As you might guess, I am not sympathetic. This sounds like people paying for the additional costs that their decisions and lifestyles create. This should result in people making better decisions. If this has bad distributional consequences, which it might, the right answer is progressive taxation and redistribution (to the extent that you find this desirable).

Again, she misses that the real problem would be if people started trying to change the outcome of the algorithm and whether the system would be robust enough to get them to do this via ‘do thing that actually decreases expected insurance payouts and is socially good’ rather than ‘do thing that manipulates the system but does not actually accomplish anything.’ She does hint at this a bit when she talks about wellness systems put in place by employers, and how they are sometimes imposing stupid lifestyle costs on employees, but she thinks of this as corporations trying to steal wages by charging some employees more fees, rather than as corporations trying to use algorithms to improve employee health, and the problems that result from that disaster.

This pattern is quite frustrating, as she keeps touching on important and interesting questions, only to pull back to focus on less interesting and less important ones.

One real concern she does point out is that some insurance companies use their systems to figure out who is likely to do more comparison shopping, and give higher prices to those likely to do less of it. Humans do this all the time, of course, but that does not make it a good thing. When an algorithm makes something easier to do, it can increase the harm and force us to confront something that wasn’t previously worth confronting. If everyone does this to you, and all the companies raise your prices by 10%, you’re paying 10% more no matter how much you shop around. Then again, it would be very much to a company’s advantage to offer a way for you to tell them that no really, you did comparison shop, since figuring out what that signal is would itself be a costly signal that you will actually put in the work to comparison shop. So this equilibrium also seems unstable, which makes me worry about it less. There’s also the issue of comparison websites, which also credibly signal that the user is doing comparison shopping.

Finding credit, another of her sections, is another place where we are already at the phase of everyone gaming the system all the time. When I moved out to Denver, I couldn’t get any credit. This made me quite angry, since I had always paid all my bills, but it turns out that the algorithms think that if you have not borrowed money, you might not pay borrowed money back. As a human, I think that if you never borrow money, it’s a great sign that you don’t need to, so of course you’ll pay it back (and thought this was obvious logic, and that the way you convince the bank to give you a loan is to prove that you don’t need one).

As a result, I had to get a pre-paid credit card so that I could explicitly owe someone money and then pay them back, even though I didn’t really ever owe anyone anything, so that I could then get a regular credit card with a tiny limit, so I could actually owe someone money for real, and pay that back, and so on in a cycle until, a few years later, I started getting periodic new credit card offers in the mail with giant credit lines. We pay our bills on time in large part to protect our credit ratings, and also do other things to help our credit ratings. In this case, the system seems stable. If we decide that group of things X gives you a high credit rating, then the willingness to do lots of X is a great sign that you are worthy of credit, even if X has nothing to do with anything! If you take the time to make sure your credit report looks good, I do in fact trust you to pay your bills.

This is an example of a great outcome, and it would be good to put more thought into how we got there. A strong argument she could make, but does not make (at least explicitly), is that we got there because credit ratings exclude lots of data they could use but choose not to, thus giving people control over those ratings in important ways, and preventing those ratings from intruding on the rest of our lives. Of course, the right way to respond to this is to allow people to use credit ratings for more things, thus crowding out other measures that use data we would rather not involve, instead of banning credit scores, which invites the use of whatever data we can find.

The sections on online advertising and civic life did not seem to raise any new and interesting concerns, so I’m going to skip over them, other than to echo her and issue the periodic public service announcement that for profit universities are almost all scams or near scams, you should never, ever, ever use them, and anything that gives them access to potential victims is scum and deserves to burn in hell.

V.

I would say that given my expectations, the book was about a 50th percentile result. That’s not a disaster, but it is a failure, because book utility has a huge fat positive tail. Given you have read this far, I can’t recommend that you read the book, since I do not think you would get much more out of reading the whole thing. If you are especially interested, though, it is a quick and mostly painless read and does have some useful facts in it I glossed over, so you could do a lot worse. I certainly do worse with my time reasonably often.

This entry was posted in Death by Metrics, Economic Analysis, Personal Experience, Reviews.

12 Responses to Book Review: Weapons of Math Destruction

  1. srconstantin says:

    I have met the author in person and am pretty sure she did not intend a Straussian reading.
    This is a person who left academia to work in finance and then was *shocked and appalled* that her fellow traders wanted to make money.

    • TheZvi says:

      She did seem like a comically poor fit for a trading firm, as well as the next job after that, where she was once again shocked that people would try to make money.

      The best part about Straussian readings, of course, is that they need not be intentional!

  2. James Miller says:

    Interesting review.

  3. Pingback: Rational Feed – deluks917

  4. James Cropcho says:

    Well put, Zvi.

  5. sniffnoy says:

    > Her talk about what happens on the job is similar. Yes, these algorithms make mistakes and sometimes evaluate good teachers as bad teachers. Yes, some of them have tons of noise in them. But what is the alternative? If these systems are not on average improvements why are corporations (and governments) using them more and more?

    There do seem to be alternative explanations here. Legibility improvements don’t always yield overall improvements, after all. In a worse possibility, it could also be due to a need to appear to be doing something.

  6. If people want to hear the author’s arguments without having to suffer through the book, you can check out the [Econtalk episode about the book](http://www.econtalk.org/archives/2016/10/cathy_oneil_on_1.html). Russ Roberts is a master of being a skeptical interviewer without ever being adversarial, and he lets the arguments made by his guests stand or fall on their own.

    I got a similar feeling to Zvi after listening to that episode: O’Neil identifies real problems, but comes at them from an angle that can only serve to make them worse.

  7. Kaj Sotala says:

    > What the book is actually mostly about on its surface, alas, is how bad and unfair it is to be a Bayesian. There are two reasons, in her mind, why using algorithms to be a Bayesian is just awful.

    Admittedly I’ve only read a quarter of the book so far, but this hasn’t been my reading of the book at all.

    I don’t think she’s said anything that could fairly be described as being “against Bayesianism”: rather, she’s been against models which use questionable proxies for the variables they’re actually interested in, which have highly questionable statistical validity, whose creators aren’t incentivized to subject them to objective criteria, which aren’t tested against new information nor updated to changing conditions, which create self-fulfilling prophecies, which are claimed to be impartial and objective when their assumptions are just as susceptible to bias as anything else, or which are deliberately obfuscated and kept hidden from outside inspection because they are often just ways of justifying decisions with no rational basis.

    I thought this was pretty clearly the message in the opening chapter, where she contrasted baseball models and “WMDs”:

    > Baseball models are fair, in part, because they’re transparent. Everyone has access to the stats and can understand more or less how they’re interpreted. […] Baseball also has statistical rigor. Its gurus have an immense data set at hand, almost all of it directly related to the performance of players in the game. Moreover, their data is highly relevant to the outcomes they are trying to predict. This may sound obvious, but as we’ll see throughout this book, the folks building WMDs routinely lack data for the behaviors they’re most interested in. So they substitute stand-in data, or proxies. […] Baseball models, for the most part, don’t use proxies because they use pertinent inputs like balls, strikes, and hits. […]

    > Most crucially, that data is constantly pouring in, with new statistics from an average of twelve or thirteen games arriving daily from April to October. Statisticians can compare the results of these games to the predictions of their models, and they can see where they were wrong. Maybe they predicted that a left-handed reliever would give up lots of hits to right-handed batters—and yet he mowed them down. If so, the stats team has to tweak their model and also carry out research on why they got it wrong.

    > Did the pitcher’s new screwball affect his statistics? Does he pitch better at night? Whatever they learn, they can feed back into the model, refining it. That’s how trustworthy models operate. They maintain a constant back-and-forth with whatever in the world they’re trying to understand or predict. Conditions change, and so must the model.

    > Now, you may look at the baseball model, with its thousands of changing variables, and wonder how we could even be comparing it to the model used to evaluate teachers in Washington, D.C., schools. In one of them, an entire sport is modeled in fastidious detail and updated continuously. The other, while cloaked in mystery, appears to lean heavily on a handful of test results from one year to the next. […]

    > A model’s blind spots reflect the judgments and priorities of its creators. While the choices in Google Maps and avionics software appear cut and dried, others are far more problematic. The value-added model in Washington, D.C., schools, to return to that example, evaluates teachers largely on the basis of students’ test scores, while ignoring how much the teachers engage the students, work on specific skills, deal with classroom management, or help students with personal and family problems. It’s overly simple, sacrificing accuracy and insight for efficiency.

    To me, a fairer reading of the book would not be “Bayesianism is bad”, but rather “Bayesianism is good but using shitty models with no connection to reality is bad, especially if you claim that they are somehow objective and use them to decide things that determine how people’s lives turn out”.

    Again, this is admittedly based on only the first quarter of the book, but you said at the beginning of the review that she basically has only two reasons for thinking that algorithms are bad: that they are probabilistic and that they rank people. Not only do both seem to be false claims (since both are also true of baseball models, which she explicitly held up as an example to emulate); even if they were true, she has already brought up all the other reasons I’ve listed, and hammered them in many times, within just that first quarter of the book.

    • TheZvi says:

      Interesting take; I’m curious to hear what you think after reading further. I think that the more charitable take is easier to apply early than it is to apply late. I’d also note that Sarah met the author in person, and thinks my take is what the author believes, which I consider strong confirmation.

      I do think that the baseball example is enlightening, as she agrees that in this case people are doing something good and right. But what makes baseball models different?

      She talks explicitly about that, and you quote most of the relevant stuff, so let’s look directly.

      One good difference is that baseball has larger sample sizes. Most of the time in the real world, we need to work with much worse data; my attempts to do statistics on even the NFL show how data-poor you are compared to MLB, which is extremely data-rich. She understands that having the results of 600 at bats or 200 innings pitched (with some 800 at bats) is a lot more accurate than what we do elsewhere.

      The dirty secret is that even this is not enough. It’s enough to distinguish Zvi from Kaj, because we’re not remotely close (I have no idea who is better, but it’s 95%+ to not be close) but when you are trying to sort the best 0.01% of players, it’s a lot harder. Entire seasons go by, a player is in a slump, and people argue over what is going on, because yes it could just be random. Bullpen pitchers especially get labeled in both directions due to pure random variance, all the time.
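      A quick back-of-the-envelope sketch of why even a full season is not enough at the margins (the .300 vs. .280 true averages are my own illustrative numbers, not Zvi’s or the book’s): treat each at bat as a biased coin flip and count how often the genuinely weaker hitter posts the better season.

```python
import math
from random import Random

rng = Random(0)  # fixed seed for reproducibility

def season_avg(true_avg, at_bats=600):
    """Simulate one season's batting average as a binomial draw."""
    hits = sum(1 for _ in range(at_bats) if rng.random() < true_avg)
    return hits / at_bats

trials = 10_000
# Seasons in which the true-.280 hitter out-hits the true-.300 hitter.
upsets = sum(season_avg(0.280) >= season_avg(0.300) for _ in range(trials))
print(f"Weaker hitter 'wins' the season: {upsets / trials:.1%}")

# Standard error of a .300 average over 600 at bats:
se = math.sqrt(0.3 * 0.7 / 600)
print(f"Standard error: +/- {se:.3f}")
```

      In this simulation the weaker hitter out-hits the better one roughly a fifth of the time, and the standard error is around nineteen points of batting average – plenty of room for an entire season of ‘slump’ arguments to be pure noise.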

      Baseball statistics went through basically three phases. In the first phase, they measured the wrong things (e.g. batters had HR, RBI and AVG, while pitchers had wins). These still correlated reasonably well with performance, but a lot of mistakes were made, and they cost teams a lot of games. In the second era, we used better stats, like WAR and OPS, although a lot of fans and coverage are still stuck largely in era one, because the era-one stats are easy to understand and relate to events that seem to matter. This is what she loves about baseball stats: that they are based only on the results of the games – they’re objective in ways people can touch and understand.

      Then we got to the third era, and things like PECOTA came out that started projecting players based on whom they most resemble in their career arcs, and on the age at which players typically start to decline. She would not like that (and to find out how much she would not like that, keep reading). Such projections are not ‘fair’, and they’re not easy to understand (although still relatively easy to grok if you know baseball, because basically all good things correlate). They’re using proxies, and extrapolations, and discriminating on the basis of ‘unfair’ things.

      When I say that she objects to something being probabilistic, I don’t mean that she hates probability; I mean that she hates it when the probability is produced by a model using not-obviously-fair inputs. This is the same dynamic by which, if polling is sometimes off by three points, people say polling is bad, whereas a pundit can be wrong 50% of the time and keep their job just fine, because there are different standards. Likewise with self-driving cars, which are blamed for being only 90% safer than regular drivers rather than 100% safe. Once you start doing this weird thing of claiming to know things (even probabilistically), you become blameworthy because you are sometimes wrong. You get a (predictably) isolated demand for rigor of the highest order.

      If you fed everything you know about high school pitchers into a neural network, including scouting reports, pitching speed and accuracy, height and weight, and so forth, and used that to offer college scholarships, she would strongly object to that as unfair, even if it was proven to create winning teams, unless it was ALWAYS right. If it missed even one talented kid, that would be a crime. She would especially object if it turned out to favor kids from advantaged backgrounds, as it likely would, since baseball is an expensive skill to learn. By contrast, if you looked only at high school ERA and wins, then give or take some amount of affirmative action, she would likely be OK with that, because it is ‘objective’ – even if the resulting team reliably sucked, which it would.

      A lot of this comes from her strong feeling that using certain types of inputs in your model is fundamentally unfair and wrong. I think she does bite the bullet that using these inputs in human evaluations is also wrong, but humans end up getting a pass (up to a point, they’re still not allowed to be unusually racist/sexist/etc even when it would improve their results) because humans do this stuff automatically and she accepts, to her credit, that asking us to fully correct for such things is not reasonable.

      In terms of ranking people, I think she thinks it is not OK to rank ‘ordinary’ people in any systematic way that impacts their lives, unless those rankings are 100% accurate AND the rankings do not systematically disadvantage any group that has a right to complain (e.g. they cannot favor rich over poor, or men over women, etc). My guess is that ranking star athletes feels different because professional athletes are opting in and playing with house money, and I think that point is even reasonable as far as it goes – they are all going to be OK and they’ve invited the rankings into their lives by their choice of profession. I also think the ‘impacts their lives’ thing matters too. When someone gets hurt, that changes everything.

      To circle back to the question of being a Bayesian, she seems to be saying (by implication) you can be a Bayesian as long as it doesn’t actually matter. If you have tons of data, and that data directly measures the thing you are measuring, then the non-Bayesian and Bayesian answers converge on the right answer. If it’s OK to be a Bayesian exactly when it doesn’t impact your beliefs, then being a Bayesian isn’t OK.

      • Kaj Sotala says:

        Thanks for the response. Since you mentioned that the charitable take is easier to apply early than late in the book, I think I’ll finish reading it first, then get back to you. 🙂

