Mechanical Turkers Aren’t Representative of the 101Questions Userbase, Like At All

Some time ago I thought about bringing in the enormous userbase of Mechanical Turk to rate photos and videos here on 101questions. We could show them your photos and videos and they would ask their questions or simply move to the next one if they were bored. That way we could find the more perplexing photos and videos more quickly.

As a test, I pulled ten photos from our database that corresponded to the ten decile marks of the 101questions bank of photos and videos. In other words, this selection of photos represents our full range, from very boring to very perplexing.

Then I showed them to 100 Mechanical Turk users and paid them to answer with a question or a skip. Here are the results.

Photo              101qs Rating   Turk Rating
Ticket Roll        81             71
Dueling Discounts  66             83
Dominos            56             73
Rally              52             87
Mural              48             92
Sunflower          43             69
War!               39             74
Dash               34             69
Shot put           28             92
River              23             90

Let me graph that for you.

[Figure 140318_1: graph of the 101qs ratings and Turk ratings listed above]

The correlation between our ratings and theirs is weak at best, and what little there is runs negative (i.e., the more popular an image is on our site, the less popular it is with Turkers).
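If you want to check that claim yourself, here is a minimal sketch that computes the Pearson correlation from the ten rows of the table. I'm assuming Pearson is a fair measure for "correlation" here; the `pearson` helper and the variable names are mine, written out by hand so nothing beyond the standard library is needed.

```python
from math import sqrt

# (photo, 101qs rating, Turk rating), copied from the table in this post
ratings = [
    ("Ticket Roll", 81, 71),
    ("Dueling Discounts", 66, 83),
    ("Dominos", 56, 73),
    ("Rally", 52, 87),
    ("Mural", 48, 92),
    ("Sunflower", 43, 69),
    ("War!", 39, 74),
    ("Dash", 34, 69),
    ("Shot put", 28, 92),
    ("River", 23, 90),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


ours = [row[1] for row in ratings]
turk = [row[2] for row in ratings]
r = pearson(ours, turk)
print(round(r, 2))  # -0.32
```

With only ten photos, a coefficient of about -0.32 is a weak relationship, and its sign is negative.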

More damning, here’s a distribution of our users and theirs.

[Figure 140318_3: distribution of how often our users and Turkers ask questions]

Our users ask questions at every kind of rate. About 8% of our users ask questions 10% of the time, another 8% or so ask 20% of the time, and so on, all the way up to 100% of the time. 27% of our users ask questions less than 10% of the time and boredom-skip the rest.

Then you have Turk, where the distribution is almost flipped. 40% of Turkers ask questions all the time. 0% of Turkers skip the way our users at the left end of our distribution do. The modes are switched. The fact that the Turkers were paid, while our users spend their own currency (their time) to be here, may explain why our users are so much more discriminating. Whatever the reason, this small test has convinced me that Turkers aren’t a useful proxy for our own userbase.
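For anyone who wants to build this kind of distribution from their own logs, here is a rough sketch of the bucketing. It is not the code I used: the `ask_rate_distribution` function, the (asked, shown) input format, and the choice to bucket by flooring to the nearest 10% are all assumptions, and the sample users are made up purely to show the shape of the output.

```python
from collections import Counter


def ask_rate_distribution(users):
    """Share of users (in percent) per question-asking bucket.

    users: list of (questions_asked, items_shown) pairs, one per user.
    Buckets are 0, 10, ..., 100, where bucket 0 means "asks less than
    10% of the time" and bucket 100 means "asks every time".
    """
    buckets = Counter()
    for asked, shown in users:
        rate = 100 * asked / shown
        # Floor to the nearest 10% (assumed bucketing rule)
        bucket = int(rate // 10) * 10
        buckets[bucket] += 1
    total = len(users)
    return {b: round(100 * buckets[b] / total) for b in range(0, 101, 10)}


# Made-up users for illustration only, not real 101questions or Turk data
sample = [(0, 10), (1, 10), (5, 10), (10, 10), (10, 10)]
dist = ask_rate_distribution(sample)
print(dist[0], dist[100])  # 20 40
```

Running the same function over two groups and comparing the shares bucket by bucket gives the kind of side-by-side comparison described above.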

2014 Mar 19. The data.