## Celebrities, numerosity and the Weber-Fechner law

This article uses the net worth of celebrities as a practical example. Net worth values were shamelessly taken from celebritynetworth.com as of August 2020. They may fluctuate and become obsolete within days, but that does not change the point of the article. Also, I will assume that you, the reader, have a net worth of $0 (trust me, it’s not going to matter).

I.

I recently had a discussion with my brother about Cristiano Ronaldo becoming the first billionaire footballer ever. We were both surprised, but for opposite reasons. He was surprised that no footballer had ever become a billionaire before, while I was surprised that it was possible at all to reach one billion through football, even with associated income like advertising and clothing. I think this disagreement gives some insight into the way we process large numbers.

There are essentially two ways for humans to mentally handle quantities. One is called numeracy: it relies on a set of symbols, with rules that tell you how to work with them. The other is called numerosity: a kind of analogue scale we use to compare things without resorting to symbols. To demonstrate that numerosity is more sophisticated than it looks, let’s do a thought experiment.

Imagine you are in a large room with Jeff Bezos, the richest person in the world. There is a line painted on the floor, with numbers written on each end. One end is marked with a big 0, the other with “$190 billion”. Mmm, it looks like we are in a thought experiment where we have to stand on the line according to our net worth, you think. As Jeff Bezos stands on the $190 billion mark, you reluctantly walk to the zero mark right next to the wall, where you belong. You see Bezos smirking at you from the other side. Suddenly, the door opens, and a bunch of world-class football players enter the room. Intuitively, where do you think they will stand on the line?
This may come as a surprise, but compared to Jeff Bezos, the net worth of all these legendary footballers is not so different from yours (remember, you’re worth $0). Football players might be millionaires, but they are very unlikely to become billionaires, Cristiano Ronaldo being the exception. Thus, on a line from $0 to $190B, they are basically piled up right next to you. What about superstar singers?

Some singers become much richer than footballers, but they are still much closer to you than to Jeff Bezos. Let’s add a few famous billionaires. Like, people who are actually famous because they are billionaires.

Surprisingly, they are still very close to you in absolute value. Their wealth is still orders of magnitude below Bezos’s. What happens if we look at big tech CEOs, like Elon Musk or Larry Page? Surely they belong to the same world as Bezos?

Now, this is indeed getting closer to Bezos. However, in absolute distance, they are still closer to you. Here is the punchline – the absolute wealth difference between Elon Musk and you is smaller than between Elon Musk and Jeff Bezos. This becomes obvious once you realize Bezos’s wealth is more than twice as much as Musk’s wealth.

II.

Why is this so counter-intuitive? Because, unless we look carefully at the numbers, we compare all these large quantities on the numerosity scale, which is logarithmic. Musk has hundreds of thousands of times more money than you, and only about a third as much as Bezos. Since a factor of 3 is much smaller than a factor of hundreds of thousands, you intuitively estimate that Musk is closer to Bezos than to you.
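To see the two scales disagree, we can compute both distances. This is a minimal sketch with rough figures: Bezos at $190 billion (the end of the painted line), Musk at about a third of that (as stated above), and, because the logarithm of $0 is undefined, a hypothetical reader worth $100,000 instead of $0.

```python
import math

# Rough figures: Bezos from the thought experiment, Musk at about a third of Bezos.
BEZOS = 190e9
MUSK = BEZOS / 3
READER = 100_000  # hypothetical nonzero value, since log(0) is undefined


def linear_distance(a: float, b: float) -> float:
    """Absolute difference in dollars: distance along the painted line."""
    return abs(a - b)


def log_distance(a: float, b: float) -> float:
    """Difference in orders of magnitude: |log10(a) - log10(b)|."""
    return abs(math.log10(a) - math.log10(b))


if __name__ == "__main__":
    # On the linear line, Musk is closer to the reader than to Bezos...
    print("linear:", linear_distance(MUSK, READER), "vs", linear_distance(MUSK, BEZOS))
    # ...but on a log scale, he is much closer to Bezos.
    print("log10: ", round(log_distance(MUSK, READER), 2), "vs", round(log_distance(MUSK, BEZOS), 2))
```

The log distance between Musk and Bezos is just log10(3), about 0.48 orders of magnitude, whatever their absolute wealth.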

It makes sense: in the graphs above (which use linear scales), the dots for everybody under one billion are almost impossible to distinguish. If you wanted to display these people’s net worth in a readable way, you would need a log scale. In the case of wealth, a log scale is especially appropriate since wealth accumulation is a multiplicative process: the more dollars you already have, the easier it is to acquire one extra dollar. As a consequence, wealth can be well approximated by a log-normal distribution, which is strongly right-skewed: most values are lower than the average, and a few very high values drive the mean up. A typical feature of this kind of distribution is that the highest values fall very far from each other. That’s why the richest human in the world (Bezos) beats the second richest (currently Bill Gates, not shown on the graphs) by a margin of several billion dollars.
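The link between a multiplicative process and a skewed, log-normal distribution can be checked with a toy simulation. This is a minimal sketch, not a model of real wealth: every hypothetical agent starts with the same wealth and multiplies it each year by a random growth factor; `n_people`, `years` and `volatility` are arbitrary assumptions.

```python
from __future__ import annotations

import math
import random
import statistics


def simulate_wealth(n_people: int = 10_000, years: int = 40,
                    volatility: float = 0.2, seed: int = 1) -> list[float]:
    """Toy multiplicative process (hypothetical parameters).

    Each year, wealth is multiplied by exp(N(0, volatility)). log(wealth) is then a
    sum of independent normal terms, so final wealth is exactly log-normal here;
    other factor distributions would give approximately log-normal results.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    wealth = []
    for _ in range(n_people):
        w = 1.0  # everybody starts equal
        for _ in range(years):
            w *= math.exp(rng.gauss(0.0, volatility))  # random yearly growth factor
        wealth.append(w)
    return wealth


if __name__ == "__main__":
    w = sorted(simulate_wealth())
    mean, median = statistics.mean(w), statistics.median(w)
    share_below_mean = sum(x < mean for x in w) / len(w)
    print(f"mean={mean:.2f}  median={median:.2f}  share below mean={share_below_mean:.0%}")
    print(f"gap between the two largest values: {w[-1] - w[-2]:.2f}")
    print(f"gap between the two middle values:  {w[len(w) // 2] - w[len(w) // 2 - 1]:.5f}")
```

Most simulated agents end up below the mean, and the gap between the two richest agents dwarfs the gap between two agents in the middle of the distribution.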

But our perception of numbers on a log scale is not restricted to the wealth of celebrities. In fact, it appears to be a universal pattern in numerical cognition, called the Weber-Fechner law. Originally, this law was formulated for sensory input, such as light intensity or sound loudness. But it also applies to counting objects:

In this picture (reprinted from Wikipedia), it is much easier to see the difference between 10 and 20 dots than between 110 and 120 dots. We seem to have a logarithmic scale hard-wired into our brains.
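In Fechner’s formulation, perceived magnitude grows with the logarithm of the stimulus, so what we perceive is the ratio between two quantities rather than their difference. A minimal sketch, assuming the simplest form of the law, perceived = k·ln(n), with an arbitrary constant k = 1, applied to the dot counts in the picture:

```python
import math


def perceived(n: float, k: float = 1.0) -> float:
    """Fechner's form of the law: perceived magnitude = k * ln(n) (k is arbitrary)."""
    return k * math.log(n)


def perceived_difference(a: float, b: float) -> float:
    """How different two quantities feel: depends only on the ratio b / a."""
    return abs(perceived(b) - perceived(a))


if __name__ == "__main__":
    # Same absolute difference (10 dots), very different perceived differences.
    print(perceived_difference(10, 20))    # ln(2), about 0.69
    print(perceived_difference(110, 120))  # ln(12/11), about 0.087
```

Under this model, going from 10 to 20 dots feels roughly eight times bigger than going from 110 to 120.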

III.

What really puzzles me about the Weber-Fechner law is that we perform a logarithmic transformation intuitively, without thinking about it. There is evidence that it is at least partly innate: pre-school children have been shown to use a logarithmic number line before they learn numerical symbols. After a few years of schooling, children tend to switch from the logarithmic line to a more linear number cognition system, a transition that can be difficult. Eventually, in high school, they have to learn logarithms again, in an abstract, formal way. Logarithms are notoriously difficult to teach (I know plenty of well-educated people who still struggle with them). This is a shame, because all these high-schoolers have been using log scales since they were young, without even realizing it.
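To make the idea of a logarithmic number line concrete, here is a hypothetical sketch that places numbers on a line running from 1 to 100, either linearly or logarithmically. The endpoints 1 and 100 are assumptions chosen so that the logarithm is defined; they are not taken from the studies mentioned above.

```python
import math


def linear_position(n: float, lo: float = 1, hi: float = 100) -> float:
    """Fraction of the way along the line when equal differences get equal space."""
    return (n - lo) / (hi - lo)


def log_position(n: float, lo: float = 1, hi: float = 100) -> float:
    """Fraction of the way along the line when equal ratios get equal space."""
    return math.log(n / lo) / math.log(hi / lo)


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, f"linear={linear_position(n):.0%}", f"log={log_position(n):.0%}")
```

On the logarithmic line, 10 lands exactly halfway between 1 and 100, and small numbers are spread out while large ones are squeezed together.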

The train is about to depart. Ticket in hand, you check your seat number, walk down the central aisle, find your seat and sit down next to another traveler. You look around to see what the other people in the carriage look like.

How many people were there in the carriage you just imagined? If you are like me, it was probably rather crowded, with few empty seats. However, according to these European data, the average occupancy rate of trains is only about 45%, so there should be more empty seats than occupied ones. What is going on?

The issue here is a simple statistical phenomenon: the sample of “all the trains you took in your life” is not quite representative of “all the trains”. The occupancy rate of trains varies all the time. Some trains will be much more crowded than average, others almost empty. And, guess what, the more people there are on a train, the more likely you are to be one of them. A train packed with hundreds of passengers will be observed by, well, hundreds of passengers, while empty trains will not be observed at all. Thus, in your empirical sample, trains with n passengers will be over-represented n times compared to trains with only one passenger.
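The over-representation can be reproduced on a toy example. A minimal sketch with a hypothetical list of train occupancies: every passenger reports the occupancy of their own train (themselves included), so a train carrying n people contributes n reports.

```python
from __future__ import annotations

import statistics


def passenger_reports(occupancies: list[int]) -> list[int]:
    """Each of the n passengers on a train reports n; empty trains report nothing."""
    return [n for n in occupancies for _ in range(n)]


if __name__ == "__main__":
    trains = [1, 2, 3, 10]  # hypothetical occupancies
    reports = passenger_reports(trains)
    print("true mean occupancy:", statistics.mean(trains))             # 16 / 4 = 4
    print("mean seen by passengers:", statistics.mean(reports))        # 114 / 16 = 7.125
```

The crowded train supplies 10 of the 16 reports, which pulls the naive average far above the true one.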

Here is a riddle: you want to estimate the average number of occupants of the trains that arrive at a station. To that end, you survey people leaving the station and ask each of them how many people were on their train (themselves included). If you took the mean of your sample, the average occupancy would be over-estimated, for the reason stated above. How do you calculate the unbiased average occupancy? Assume every train had at least one occupant (this is necessary since empty trains are never observed, so their number could be virtually anything).

We have an observed distribution $P_o(n)$ and we want to get back to the true distribution $P_t(n)$. As we saw before, a train with $n$ passengers is reported $n$ times, so:

$$P_o(n) = \frac{n\,P_t(n)}{\sum_{k} k\,P_t(k)}$$

Inverting this relation gives $P_t(n) \propto P_o(n)/n$, and since $\sum_{k} P_t(k) = 1$, the true distribution is

$$P_t(n) = \frac{P_o(n)/n}{\sum_{k} P_o(k)/k}$$

And the mean occupancy of the trains is (using $\sum_{n} P_o(n) = 1$)

$$\langle n \rangle = \sum_{n} n\,P_t(n) = \frac{\sum_{n} P_o(n)}{\sum_{k} P_o(k)/k} = \frac{1}{\sum_{k} P_o(k)/k}$$

which turns out to be the harmonic mean of the observed sample.
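The correction can be checked numerically. A minimal sketch with hypothetical survey data (trains carrying 1, 2, 3 and 10 passengers, one report per passenger): it estimates $P_o$ from the reports, reweights by $1/n$ as in the formula for $P_t(n)$, and compares the result with the harmonic mean. Exact fractions are used so the comparison is not blurred by rounding.

```python
from __future__ import annotations

from collections import Counter
from fractions import Fraction


def true_distribution(reports: list[int]) -> dict[int, Fraction]:
    """Invert the size bias: P_t(n) is proportional to P_o(n) / n."""
    counts = Counter(reports)
    total = len(reports)
    weights = {n: Fraction(c, total) / n for n, c in counts.items()}  # P_o(n) / n
    norm = sum(weights.values())                                      # sum_k P_o(k) / k
    return {n: w / norm for n, w in weights.items()}


def unbiased_mean(reports: list[int]) -> Fraction:
    """Harmonic mean of the observed sample, i.e. 1 / sum_k P_o(k) / k."""
    return len(reports) / sum(Fraction(1, n) for n in reports)


if __name__ == "__main__":
    trains = [1, 2, 3, 10]                           # hypothetical occupancies
    reports = [n for n in trains for _ in range(n)]  # one report per passenger
    p_t = true_distribution(reports)
    print({n: str(p) for n, p in sorted(p_t.items())})  # every size gets 1/4
    print(unbiased_mean(reports))                        # 4, the true mean occupancy
```

The reweighting recovers the uniform distribution over the four hypothetical trains, and the harmonic mean of the 16 reports equals their true mean occupancy.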

The harmonic mean is typically used to average rates. The textbook example is computing the average speed of something: if you write down the speed of a car once per kilometer (assuming it is constant over each kilometer), the average speed is the harmonic mean of your sample, not the arithmetic mean. This is because the car spends less time on the kilometers it covers quickly, so you need to account for that by giving those kilometers less weight. This is in fact closely related to the train occupancy riddle: there, the harmonic mean gives more weight to the trains with fewer people on them, to compensate for the sampling bias.
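A worked version of the speed example, assuming the speed is constant within each kilometer, on a hypothetical trip of one kilometer at 30 km/h followed by one at 60 km/h:

```python
from __future__ import annotations


def average_speed(speeds_per_km: list[float]) -> float:
    """Average speed over a trip where each entry is the (constant) speed on one km."""
    distance = len(speeds_per_km)                # km
    time = sum(1 / v for v in speeds_per_km)     # hours spent on each kilometer
    return distance / time                       # this is the harmonic mean of the speeds


if __name__ == "__main__":
    speeds = [30, 60]                            # hypothetical trip
    print(average_speed(speeds))                 # about 40 km/h
    print(sum(speeds) / len(speeds))             # arithmetic mean: 45 km/h, too high
```

The slow kilometer takes twice as long as the fast one, which is exactly the extra weight the harmonic mean gives it.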

I don’t know if this statistical bias has a name (if you know, tell me in the comments). It occurs in a lot of situations. A prominent one is the fact that your average Facebook friend has more Facebook friends than average.

Consider how your Facebook friends are sampled: obviously, only people with at least one friend will appear in your sample, so all those idle accounts with no friends at all are excluded from the start. People with 100 friends are 10 times more likely to appear in your list than people with 10 friends. This strongly inflates the average number of friends your friends have. To put it differently, if you have an average number of friends, it’s *perfectly normal* that you have fewer friends than your friends do. So there is no need to worry about it.
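The same size bias shows up on a small friendship graph, a case usually called the friendship paradox. A minimal sketch on a hypothetical "star" network (one person with four friends, each of whom has only that one friend): counting every friendship from both ends, a person with d friends appears in d friend lists, which gives the same sum(d²)/sum(d) weighting as the train example.

```python
from __future__ import annotations

from fractions import Fraction


def degrees(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Number of friends of each person in an undirected friendship graph."""
    deg: dict[str, int] = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def mean_friends(deg: dict[str, int]) -> Fraction:
    """Average number of friends over all people."""
    return Fraction(sum(deg.values()), len(deg))


def mean_friends_of_friends(deg: dict[str, int]) -> Fraction:
    """Average number of friends over all entries of everyone's friend list.

    A person with d friends appears in d lists, hence sum(d * d) / sum(d).
    """
    return Fraction(sum(d * d for d in deg.values()), sum(deg.values()))


if __name__ == "__main__":
    star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]  # hypothetical
    deg = degrees(star)
    print(mean_friends(deg))             # 8/5: the average person has 1.6 friends
    print(mean_friends_of_friends(deg))  # 5/2: the average friend has 2.5 friends
```

Four of the five people have fewer friends than their friends, simply because the one well-connected person shows up in everyone's list.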