Mikhail Simkin


Mikhail Simkin has a PhD in Physics and currently works as a research engineer.

Monday, 20 February 2012

Fingerprints of Fraud

In my last article, I refuted one mathematical proof of fraud in Russian elections. That “proof” stemmed from the observation that the distribution of votes for United Russia (UR) among election precincts was non-Gaussian. I showed that theoretically there is no reason for the distribution to be Gaussian and presented non-bell-shaped curves from American elections. There are other “proofs” out there. The Wall Street Journal, for instance, found “fingerprints of fraud” in Russian elections. Deviation from the Gaussian distribution is not listed as a “fingerprint,” but the WSJ has others:

Prime Minister Vladimir Putin's ruling United Russia party captured a high share of voters—far above the 49.3% it received nationwide—in precincts where voter turnout was reported to be well above the national average, according to the analysis. That dynamic suggests broad ballot-stuffing, according to experts in vote monitoring.

Russian Elections | British Elections
Figure 1. Fingerprints of fraud: (a) in Russian 2011 parliamentary elections according to WSJ analysis; (b) in British 2010 elections according to my analysis.

Apparently, the WSJ assumed that the true turnout was more or less the same in different precincts, but that in some of them Putin’s KGB agents stuffed the boxes with ballots for United Russia. Such an operation could, indeed, increase both the turnout and the UR vote share in the affected precincts. This looks like a reasonable explanation, and in the 2008 American election I could not find a significant correlation between turnout and the percentage of the vote for Obama. However, Kuznetsov has found such correlations in the 2010 British parliamentary elections. Following this lead, I applied the WSJ’s analysis to the British elections. The results are in Figure 1.

As you can see, when you go from constituencies with turnout below 60 percent to constituencies with turnout above 60 percent, the vote share of the Conservative Party doubles. The share of the Liberal Democrats (LD) increases by one third, while the share of the Labour Party drops by 40 percent and the share of all other parties is halved. Apparently, United Russia has borrowed the ballot-stuffing technique from the Conservative-LD coalition that rules Britain.
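The WSJ-style comparison can be sketched in a few lines of Python. This is a minimal sketch, not the WSJ's actual code (which the article does not describe): it assumes each constituency is given as a dict with a hypothetical `registered` count and a `votes` mapping from party to votes, and it pools all votes within the low-turnout and high-turnout groups before computing a party's share. The numbers in the example are made up for illustration, not real results.

```python
def share_by_turnout(constituencies, party, threshold=0.60):
    """Return a party's pooled share of votes cast in low- and high-turnout groups.

    Hypothetical input format: each constituency is a dict with 'registered'
    (number of registered voters) and 'votes' (party name -> votes received).
    Constituencies with turnout >= threshold count as 'high', the rest as 'low'.
    """
    totals = {"low": [0, 0], "high": [0, 0]}  # group -> [party votes, all votes cast]
    for c in constituencies:
        cast = sum(c["votes"].values())
        group = "high" if cast / c["registered"] >= threshold else "low"
        totals[group][0] += c["votes"].get(party, 0)
        totals[group][1] += cast
    return {g: pv / allv for g, (pv, allv) in totals.items() if allv}


# Made-up illustrative numbers, not real election results.
example = [
    {"registered": 70_000, "votes": {"Con": 10_000, "Lab": 25_000, "LD": 5_000}},  # turnout ~57%
    {"registered": 70_000, "votes": {"Con": 25_000, "Lab": 12_000, "LD": 8_000}},  # turnout ~64%
]
print(share_by_turnout(example, "Con"))
```

Pooling votes, rather than averaging per-constituency percentages, is only one of several reasonable choices; the WSJ article does not say which one it used.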

The WSJ gives its estimate of vote fraud in Russian elections:

A comprehensive examination of the full results from Russia's nearly 100,000 voting precincts reveals statistical anomalies that experts say are consistent with widespread vote-rigging. These irregularities could cast doubt, by one rough measure, over as many as 14 million of the 65.7 million votes reportedly cast. 

Unfortunately, the WSJ authors do not say how they got this estimate. Apparently, it did not come from the deviations from the bell curve, since they do not even mention them. However, Gazeta.ru has published an elaborate article by Shpilkin that estimates the number of votes stolen in the Russian elections. The method uses the very correlation between turnout and UR vote share that the WSJ dubbed “a fingerprint of fraud.” Here is a figure from that article.

All Moscow

Figure 2. Results of the 2011 Russian parliamentary elections in Moscow. Every dot represents a precinct (several hundred voters). You can clearly see that the higher the turnout, the more votes United Russia gets.

I made a similar picture for British elections.

Figure 3. Results of the 2010 British parliamentary elections.

British elections

In the above graph, every data point represents a constituency (several tens of thousands of voters). You can see that the higher the turnout, the more votes the Conservatives get.

British Election Fraud

Figure 4. Estimating the amount of fraud in British parliamentary elections.

In the next paragraph, I will use Shpilkin’s method to estimate the number of votes stolen in the British elections. I keep all his language, making only the minimal changes necessary to adapt it to British elections.

Assuming that the difference in distribution of votes between the Conservative-LD coalition and the other parties depending on the attendance is the result of artificial increase of the amount of votes for the coalition, we can try to establish the size of this increase. We will try to separate from Conservative-LD votes distribution, a part proportional to the sum of votes for all the other parties. As we see in Figure 4, we could separate a part from the votes for Conservative-LD, the part, proportional to all the other voices, so that until the attendance of 62% the remainder of the Conservative-LD voices after subtraction of this part practically equals 0. In our assumptions that means that there were almost no added votes. The remaining (abnormal) part of this curve should be considered an artificial increase of the amount of votes for Conservative-LD. After the division between 'normal' (proportional to the votes for the other parties) and 'abnormal' is done, we can appraise them quantitatively and try to reconstruct the "corrected" voting results without such an increase. After subtracting this data from 17.5 million votes for Conservative-LD, the normal votes count up to 8.3 million, abnormal, artificially added votes total 9.2 million.
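For readers who want to see the arithmetic, here is a minimal Python sketch of the procedure as paraphrased above, under simplifying assumptions of my own. Instead of working with turnout-binned distributions, it estimates a single ratio k between the coalition's votes and the other parties' votes in districts below the 62% threshold, treats k times the other parties' votes as "normal" everywhere, and calls the rest "abnormal"; by construction, the remainder then sums to zero below the threshold. The `District` class and `split_votes` function are hypothetical, and the district numbers are invented from the two-party turnout model in the next paragraph (three-quarters of Conservatives and half of Labour supporters vote), so they contain no ballot stuffing at all.

```python
from dataclasses import dataclass


@dataclass
class District:
    registered: int  # registered voters
    coalition: int   # votes for the coalition under suspicion
    others: int      # votes for all other parties combined

    @property
    def turnout(self) -> float:
        return (self.coalition + self.others) / self.registered


def split_votes(districts, threshold=0.62):
    """Split coalition votes into 'normal' and 'abnormal' parts.

    Assumes coalition votes are proportional to the other parties' votes in
    districts with turnout below the threshold; the ratio k is fitted there.
    """
    low = [d for d in districts if d.turnout < threshold]
    if not low:
        raise ValueError("no districts below the turnout threshold")
    k = sum(d.coalition for d in low) / sum(d.others for d in low)
    normal = k * sum(d.others for d in districts)
    abnormal = sum(d.coalition for d in districts) - normal
    return normal, abnormal


# Towns that are 20, 40, 60 and 80 percent Conservative, 100,000 voters each;
# 75% of Conservatives and 50% of Labour supporters vote. No fraud anywhere.
towns = [District(100_000, c, o) for c, o in
         [(15_000, 40_000), (30_000, 30_000), (45_000, 20_000), (60_000, 10_000)]]
normal, abnormal = split_votes(towns)
print(f"normal: {normal:,.0f}, 'abnormal': {abnormal:,.0f} of 150,000")
```

On these fraud-free numbers, the sketch labels about 85,700 of the 150,000 Conservative votes as "abnormal."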

Since we are talking about the birthplace of parliamentary democracy, the conclusion is absurd. Therefore, we should seek a different explanation for the spurious correlation between attendance and vote share. Could it be that Conservative supporters are more active voters than Labour supporters? Consider the following simplified model. There are two parties: Conservative and Labour. Three-quarters of Conservative supporters and one-half of Labour supporters take the trouble to vote. One town is 80 percent Conservative and 20 percent Labour. The attendance here will be 70 percent: 60 percent of the registered voters will vote Conservative and 10 percent, Labour. Another town is 80 percent Labour and 20 percent Conservative. The turnout there will be 55 percent: 15 percent of the registered voters will vote Conservative and 40 percent, Labour. We can get something similar to Figure 3 without any ballot stuffing by MI5.
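The two-town arithmetic generalizes to any mix of voters. The short Python sketch below assumes only the model's two turnout rates (75 percent for Conservatives, 50 percent for Labour); the function name `town_result` is hypothetical. It shows that turnout and the Conservative share of votes cast rise together as a town becomes more Conservative, with no ballots added anywhere.

```python
def town_result(conservative_fraction, p_con=0.75, p_lab=0.50):
    """Return (turnout, Conservative share of votes cast) for a two-party town.

    conservative_fraction: share of registered voters who support the Conservatives.
    p_con, p_lab: probability that a Conservative / Labour supporter votes.
    """
    con = conservative_fraction * p_con
    lab = (1 - conservative_fraction) * p_lab
    turnout = con + lab
    return turnout, con / turnout


for f in (0.2, 0.4, 0.6, 0.8):
    turnout, share = town_result(f)
    print(f"{f:.0%} Conservative: turnout {turnout:.1%}, Conservative share {share:.1%}")
```

With these rates, turnout is 50 + 25f percent, where f is the Conservative fraction of the electorate, so it rises linearly with f, while the Conservative share of votes cast climbs from about 27 percent at f = 0.2 to about 86 percent at f = 0.8.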

Perhaps we do not need to postulate KGB ballot-stuffing to explain the results of the Russian elections either?

Sunday, 08 January 2012

The Bell Curve Doth Not Toll

On Saturday, December 10, a crowd of 20,000 people gathered for a mass rally in Moscow to protest alleged fraud in the December 4 parliamentary elections. Some of the allegations were mathematical in nature.

The Washington Post reports: 

"Obviously, he [Putin] doesn’t agree with Gauss,” one commenter wrote, referring to pioneering mathematician Carl Friedrich Gauss, who lived 200 years ago. Disenchanted Russians argue that United Russia’s reported election results are so improbable as to violate Gauss’s groundbreaking work on statistics.

The article does not say what exactly the problem with the election results is or which of Gauss's works is relevant. It only says that he lived 200 years ago. This alone should trigger an alert, as science has advanced a bit over the past 200 years . . .

I decided to take a closer look at the allegations. 

In the piece entitled “Mathematics against Election Committee: Gauss against Churov [the head of the committee],” a blogger complains that the distribution of the percentage of the vote for the United Russia party among election precincts is “non-Gaussian.” This, he writes, is evidence of election fraud, because a Gaussian distribution arises “always . . . in every case, when there is not one factor, but many”:

Whatever is measured in large quantities. Make a plot of how many millions of men in the country have the height of 165, 170, 175 centimeters and so on—and you will also get a symmetric bell curve with the top corresponding to the most typical height in the country. 

If you do not know what a Gaussian distribution is, the blogger gave a good example: the distribution of people by height. Most men are of average height; the greater the deviation from the average, the smaller the number of men. There are some very tall people, but none of them is twice the average height.

The heights of people are definitely Gaussian-distributed, but what about incomes? They are influenced by many factors and measured in large quantities. However, they are distributed as if most people were 170 centimeters tall, but you would often meet a three-meter guy. Rarely would you encounter a five-meter man, and more rarely a ten-meter one. Sometimes, from a distance, you would see a hundred-meter person. And there would be several hundred-kilometer chaps in the country. This distribution is very far from Gaussian, but for some reason it does not attract the wrath of our mathematicians, or of our Berezovskies.
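The contrast can be simulated. The sketch below is an illustration under stated assumptions, not real data: heights are drawn from a normal distribution with mean 170 cm and standard deviation 7 cm, and "incomes" from a Pareto (power-law) distribution with shape 1.5, a common textbook model for the upper tail of incomes, rescaled so that its median is also 170. It uses only Python's standard `random` and `statistics` modules.

```python
import random
import statistics

rng = random.Random(2012)  # fixed seed so the sketch is reproducible
N = 100_000

# Assumption: heights ~ Normal(170 cm, 7 cm).
heights = [rng.gauss(170, 7) for _ in range(N)]

# Assumption: "incomes" ~ Pareto(shape 1.5), rescaled so the median is 170,
# to match the height analogy in the text.
raw = [rng.paretovariate(1.5) for _ in range(N)]
scale = 170 / statistics.median(raw)
incomes = [x * scale for x in raw]

for name, xs in (("height", heights), ("income", incomes)):
    med, top = statistics.median(xs), max(xs)
    print(f"{name}: median {med:.0f}, largest {top:,.0f} ({top / med:.1f}x the median)")
```

In such a run, the tallest simulated person is only about 20 percent taller than the median, while the largest simulated "income" is hundreds of times the median; the exact figures depend on the seed and the assumed shape parameter.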

Gaussian

The banner says “We don’t trust Churov [the head of the Election Committee]! We trust Gauss.”

In a recent article in Significance, I argued that since so many distributions in nature and society are not Gaussian, there is no reason to believe that vote distributions must be. To support this conclusion, I presented a mathematical model that produces a non-Gaussian distribution of the percentage of votes for a party among election precincts.

A commentator challenged me to show non-Gaussian distributions in U.S. elections.

I took up the challenge.

I decided to look at the 2008 Republican primaries (mostly because this was the last election I voted in). The primaries differ from national elections, as different states vote on different dates. Moreover, some candidates drop out during the process. All of this complicates the analysis. But 21 states hold their primaries on the same day, “Super Tuesday.” Since almost half of the nation votes on this day, these elections function like a national primary.

The most complete election results database I could find is Dave Leip's “Atlas of U.S. Presidential Elections.” It does not have precinct-level results for the election in question, but it lists results by county for 19 of the 21 Super Tuesday states (the exceptions are Alaska and North Dakota). I computed the distribution of the percentage of the vote for the four major candidates among 1,162 counties.

As you can see in Figure 1, Mike Huckabee's distribution has two equal peaks, at 15 and 35 percent; between them, the distribution drops to half the peak height. John McCain's distribution has one peak at 35 percent and another at 80 percent; between these peaks, it drops almost to zero. Mitt Romney has one peak at 25 percent and another at 90 percent. Ron Paul's distribution is exponential.

Apparently, American elections also “violate Gauss’s groundbreaking work on statistics.” (The least you could say is that these distributions are no more “Gaussian” than the distributions observed in the Russian elections (Figure 2).)

Figure 1. Results of the 2008 Republican presidential primaries in 19 Super Tuesday states: the distribution of the percentage of the vote for the four major candidates among 1,162 counties. I used a 5 percent bin. All counties where the candidate received no more than 5 percent of the vote went into the first (“5 percent”) bin; those where he received more than 5 but no more than 10 percent went into the second (“10 percent”) bin, and so on.
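The binning rule in the caption can be written as a short Python function. This is a minimal sketch assuming vote percentages are given as numbers between 0 and 100; the function names `upper_edge_bin` and `histogram` are hypothetical. Each bin is labeled by its upper edge, and a share of exactly 0 also goes into the first bin.

```python
import math
from collections import Counter


def upper_edge_bin(pct, width=5):
    """Label of the bin a percentage falls into: (0, 5] -> 5, (5, 10] -> 10, ...

    A share of exactly 0 is placed in the first bin, as in the caption.
    """
    return width * max(1, math.ceil(pct / width))


def histogram(percentages, width=5):
    """Count how many counties fall into each upper-edge bin."""
    return Counter(upper_edge_bin(p, width) for p in percentages)


print(sorted(histogram([0, 3.2, 5, 5.1, 10, 37.5, 92]).items()))
```

One caveat of this floating-point version: a percentage computed as, say, 10.000000000000002 would land in the next bin, so exact fractions or integer arithmetic are safer when vote counts are small.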

Figure 2. 2011 Russian parliamentary elections: the distribution of the percentage of the vote for each party among election precincts. The x-axis shows the percentage of the vote for the party; the y-axis shows the number of precincts. The bin is 0.5 percent. Brown: United Russia; red: Communist Party; green: Russian United Democratic Party "Yabloko"; black: Liberal Democratic Party; blue: A Just Russia. This picture traveled across hundreds of blogs during the past couple of weeks.

Another issue raised by the bloggers is that there are spurious peaks at 50 percent and other multiples of 5 (see Figure 2). But when you examine precinct-level results, you notice that in many precincts very few people voted, as few as one person in some of them (!). When two, four, six, eight, or 10 people vote, you can easily get a result of 50 percent, but never 49 or 51 percent.
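The small-precinct effect is easy to check by enumeration. The sketch below (the function name `small_precinct_bins` is hypothetical) lists every possible result k/n for precincts with n = 1 to 10 voters, counts each (n, k) combination once, and sorts the results into 1 percent bins using the same upper-edge convention as the Figure 1 caption, so a share of exactly 1/3 lands in the 34 percent bin. It counts possible outcomes equally and is not a model of actual voting; integer arithmetic avoids floating-point rounding at bin edges.

```python
from collections import Counter


def small_precinct_bins(max_voters):
    """Count possible results k/n (n = 1..max_voters, k = 0..n) per 1 percent bin.

    Upper-edge binning: a share in (48, 49] goes to bin 49, (49, 50] to bin 50.
    """
    bins = Counter()
    for n in range(1, max_voters + 1):
        for k in range(n + 1):
            # Integer ceiling of 100*k/n, avoiding floating-point error.
            bins[-(-100 * k // n)] += 1
    return bins


bins = small_precinct_bins(10)
for b in (33, 34, 49, 50, 51, 66, 67):
    print(b, bins[b])
```

It reports 5 results in the 50 percent bin (one each from precincts with 2, 4, 6, 8, and 10 voters), none in the 49 or 51 percent bins, and 3 each in the 34 and 67 percent bins (1/3 and 2/3 from precincts with three, six, or nine voters).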

The database mentioned above has precinct-level statistics for the 2000 U.S. presidential election in California. In Figure 3, I plotted the distributions of the percentage of the vote among election precincts. You can see obvious peaks at 50 percent in both Al Gore's and George W. Bush's distributions. There are also less pronounced peaks at 20, 25, 60, and 75 percent. In addition, there are obvious peaks at 34 percent (1/3) and 67 percent (2/3). These obviously came from precincts where three (or another small multiple of three) people voted. (I do not see such peaks in the Russian election results. This problem requires additional study.)

Note also that the distributions in Figure 3 are far from Gaussian. If there is something resembling a bell curve in Figure 3, it is a combined curve made up of Gore's distribution below 50 percent and Bush's distribution above 50 percent. If I use the same methods of “proof” that the bloggers used to allege large-scale fraud in the Russian elections, I can “prove” that Gore stole millions of votes from Bush in California! Surely, the Washington Post would want to report on this! Of course, such “proofs” are nonsense, since the distributions need not be Gaussian in the first place.

It's worth pointing out that my study does not prove that the recent Russian elections were honest. It does, however, prove that in making the case that the Russian elections were fake, the bloggers used fake math.

Figure 3. Results of the 2000 presidential election in California: the distribution of the percentage of the vote for the three main candidates among 21,970 precincts. I used a 1 percent bin.

Sunday, 22 May 2011

Aping the Abstract

One day, while doing some boring computer programming, I drew a picture in Microsoft Word as a diversion and sent it to my friend in Belgium. He replied that his co-workers had asked whether it was a Picasso or a Matisse. To check whether this was a mere anomaly or an indication of some deeper truth, I produced more pictures. I mixed them with immortal masterpieces of modern art and put my “True art, or fake?” quiz online. The takers have to tell the masterpieces from my doodles.

Abstract or Doodle?

Who created this: the author of the article, or an immortal artist?

Apart from the automatically recorded scores, I occasionally get feedback. This note arrived from a Cornell University professor: “I recognized that one of them was like a Mondrian, but it seemed to lack the sense of balance which good modern art is supposed to have.” Apparently, Mondrian’s art loses balance when his heavyweight name is detached from it. Even art critics are not sure that they can tell true art from fake: “I got 92%, which is a relief since I write about art.” It is thus not surprising that the quiz sometimes provokes angry reactions. One New York artist responded with the following utterance: “Go [profanity] yourself and your [profanity] academic quizzzzzzzzzzzzzzzzzz.” As if in response to this attempt at intimidation, one of my readers wrote: “Dear Mr. Simkin, just continue with this.” And I have.

In three years, I collected over 56,000 test results. The average score is 66 percent correct. This is not much better than the 50 percent one can get by random guessing. The 16-point difference could be because many quiz takers had already seen the masterpieces and knew them as such. One of the respondents wrote: “I gave this test to my oldest son who is teaching sculpture at The Finnish Art Academy. Much to my chagrin, he could not only separate the art from the chaff, but also name all the artists.”

So most of the takers failed the quiz. But maybe they are idiots who do not possess the proper gnoseological code to comprehend the masterpieces? Fortunately, I took care to record quiz takers’ IP addresses. Thus, I could select the scores of those who downloaded the quiz from elite places. For the analysis, I chose people from Ivy League schools and Oxbridge. The average score of these 143 members of the cognitive elite was 71 percent. Apparently, they, too, don’t know the code.

Many bloggers discussed my quiz. One such blog entry described an interesting story. Some tricksters had shown a picture painted by an ape to a director of an art museum. The director attributed it to a Guggenheim Prize-winning artist.

Ape-stract

Who painted this: an artist or an ape?

This inspired my new quiz, “An artist or an ape?” It is now time to release the results. The average score earned by over 164,000 people is 79 percent. This is better than on the other quiz, but one would expect as much: the pictures were painted by members of different species. What is interesting is that mistakes are possible at all. Moreover, one of the ape paintings was attributed to an abstract artist by over 50 percent of quiz takers. Again, the elite did not outperform the crowd by much: the average score of 367 Ivy Leaguers and Oxbridgers is 81 percent.

When someone dares to question the genius of modern artists, advanced people tell him that he does not understand their abstract messages, just as an illiterate person finds no meaning in a written text. Ape art, however, can be mistaken for abstract masterpieces. Apes are not capable of abstraction. Modern art is therefore not abstract, but apestract.

The book of Ecclesiastes states:

Is there any thing whereof it may be said, ‘See, this is new?’ It hath been already of old time, which was before us.

The blasphemy against the strange gods of modern art conveyed above is no exception. In the 1920s, the Los Angeles writer Paul Jordan-Smith, under the pseudonym “Pavel Jerdanowitch,” founded the “Disumbrationist” School of Art. After several art critics praised his daubs, Jordan-Smith announced that it was all a hoax. In 1964, Swedish pranksters hung an ape painting in an art gallery, attributing it to an unknown French artist, Pierre Brassau. Art critics praised the painting.

People see nothing in apestract masterpieces that they would not see in anyone’s painting. The aforementioned hoaxes showed long ago that art critics do not see anything more, either.

So, why does the high status of apestract art go unchallenged!?

One old Soviet film can help us understand. It is a 1971 documentary on psychological experiments that reveals the power of suggestion. In one such experiment, a group of seven people is shown five photos of five different people. The experimenter asks the members of the group whether two of these photos are portraits of the same person. Six members of the group are the experimenter's accomplices. They had been instructed to convince the test subject that two particular photos are of the same person. Often they succeeded. One test subject explained why he was taken in:

I felt embarrassed before my comrades. They were so knowledgeable, could emphasize things so competently, could notice every small detail, compare and link everything together.

The same mensch?

Brainwashing can make you believe that these photos show the same person.

Could it be that the high status of masterpieces of apestract art is not challenged because people feel embarrassed before Trotskyite art dealers? After all, these comrades are so knowledgeable and sophisticated…