Mechanical Turk’s Quirks

Or, “what I’ve learned about Amazon Mechanical Turk in 2013”.  There’s a lot to love about Amazon, AWS, and Mechanical Turk, for those of you who even know that there is something other than

For those of you who don’t, Amazon competes directly with Google, Salesforce, and others to host your web site, data, databases, and files, just to scratch the surface. They call it AWS, or Amazon Web Services. One of those services is Mechanical Turk (aka MTurk), a crowdsourcing platform, and the ‘engine’ behind deductmor (shelved) and now c|d8a. It’s a great service, but if you’re heading into the crowdsourcing space, you should know a few things going in.

  • HIT: “Human Intelligence Task”; a single unit of work that MTurk’s workers complete for you.
  • Not a database – once you accept that MTurk is not a database, your whole approach to the service will make more sense. YOU have to build the database. MTurk just supplies the labor and the data feed.
  • No batches – you can’t group HITs together, period. You can tag a group of them, but again, because it’s not a database, there’s no grouping/batching function.
  • No searching on batches – because of the above, you can’t query for all the HITs that share the same tag.
  • One HIT at a time – MTurk has a very robust, well-documented API, but it took us a long time to find out that you must publish one HIT at a time, no exceptions. Knowing that will help you estimate your back-end processing needs more accurately.
  • Pay before you go – Ain’t no billing or invoicing after the job here. You pay up front, or your stuff does not get done. Painful example: publish 500 HITs with only 200 HITs’ worth of credit “in the bank” at MTurk, and the remaining 300 HITs are vaporized.
  • Support is iffy at best – Probably the toughest thing about using MTurk is the lack of direct support. With all other AWS services, you can pay for premium support and get chat, email, and “Call me now!” service, and they are very responsive. Not so with MTurk. It took me a month to get a Systems Architect on the phone. Painful.
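Because MTurk is neither a database nor a batching system, the bookkeeping in the list above lands on your side. The sketch below is a minimal illustration under stated assumptions, not how deductmor or c|d8a actually did it: it keeps a local SQLite table of HIT IDs keyed by your own batch tag, publishes strictly one HIT per call, and refuses to start a batch that the prepaid balance can’t fully cover. `publish_one` is a hypothetical placeholder for whatever client call creates a single HIT, and `cost_per_hit` is assumed to already include Amazon’s fees, which you would have to work out yourself.

```python
import sqlite3
from decimal import Decimal
from typing import Callable, Iterable, List

# Hypothetical signature: takes one HIT payload, creates exactly one HIT,
# and returns the HIT ID that MTurk assigned. Not a real library function.
PublishOne = Callable[[dict], str]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the local table that MTurk itself doesn't give you."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " hit_id TEXT PRIMARY KEY,"
        " batch_tag TEXT NOT NULL,"
        " cost TEXT NOT NULL)"
    )
    return conn


def shortfall(n_hits: int, cost_per_hit: Decimal, balance: Decimal) -> int:
    """How many of n_hits the prepaid balance cannot fund."""
    affordable = int(balance // cost_per_hit)
    return max(0, n_hits - affordable)


def publish_batch(
    conn: sqlite3.Connection,
    batch_tag: str,
    payloads: Iterable[dict],
    cost_per_hit: Decimal,
    balance: Decimal,
    publish_one: PublishOne,
) -> List[str]:
    """Publish HITs one call at a time, recording each under our own batch tag."""
    payloads = list(payloads)
    missing = shortfall(len(payloads), cost_per_hit, balance)
    if missing:
        # Refuse up front rather than let the unfunded tail of the batch fail.
        raise ValueError(
            f"balance covers {len(payloads) - missing} of {len(payloads)} HITs"
        )
    hit_ids: List[str] = []
    for payload in payloads:
        hit_id = publish_one(payload)  # one HIT per request, no bulk call
        conn.execute(
            "INSERT INTO hits (hit_id, batch_tag, cost) VALUES (?, ?, ?)",
            (hit_id, batch_tag, str(cost_per_hit)),
        )
        conn.commit()  # commit after every HIT so the store keeps pace with publishing
        hit_ids.append(hit_id)
    return hit_ids


def hits_in_batch(conn: sqlite3.Connection, batch_tag: str) -> List[str]:
    """The 'search by batch' query MTurk doesn't offer, answered locally."""
    rows = conn.execute(
        "SELECT hit_id FROM hits WHERE batch_tag = ? ORDER BY rowid", (batch_tag,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = open_store()
    counter = iter(range(1, 1_000_000))

    def fake_publish(payload: dict) -> str:
        # Stand-in for a real "create one HIT" call; returns made-up IDs.
        return f"FAKE-HIT-{next(counter)}"

    # The 500-vs-200 example, with made-up prices: $20.00 of credit at $0.10 per HIT.
    print(shortfall(500, Decimal("0.10"), Decimal("20.00")))  # 300
    ids = publish_batch(store, "batch-1", [{"item": i} for i in range(5)],
                        Decimal("0.10"), Decimal("20.00"), fake_publish)
    print(hits_in_batch(store, "batch-1") == ids)  # True
```

Rejecting the whole batch up front is a design choice: with prepaid credit, it is simpler to top up and retry than to reconcile a half-published batch. Committing after each insert means that if publishing dies partway through, the local table still lists the HITs that were recorded before the failure.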

Knowing all this months ago would have accelerated our development significantly. The MTurk forums are somewhat helpful, but far too many questions about the points above were asked more than two years ago and never answered. I’ve done my best to post answers now that I know them.

Crowdsourcing is big and extremely powerful, and these early “gotchas” may help you get your service up and running better and faster.

What do you think about that?