Okay, so with recent action (namely Casey) and reaction…anyway…there’s been another uptick in discussion about CAPTCHA’s viability as a comment-spam stopper. And it appears that many people share my opinion, which is that the only good CAPTCHA for now is a natural language query: “What symbol represents the speed of light?” Answer “c” and you get to post your comment. Character recognition is one thing, but it will be a while before bots can beat this without totally brute-forcing it. I mean, a well-implemented natural language query would be pretty difficult for a bot to beat. And as Casey said, if we up the ante on the problem domain, the number of people who will be able to solve it and use it for evil drops substantially.
So, my thought…how about a community project to work on a networked natural language query CAPTCHA? I know there are things like this out there, but I’m thinking it would work like this:
- First, it would be a web service. Minimal barrier to entry for implementation.
- Centralized query repository, with limited (but not completely closed) access as to who can enter new questions, plus some way of making sure the questions and answers are correct.
- Optional variable levels of “query munging” including mixed up letters or improper spaces: “what isthe air speedof an unladen swallo w?” or “waht is the airseped of an ulnaden sawllow?”
- Possibly even levels of technicality (e.g. tech questions for a tech blog).
- Multiple failures result in various levels of denial, such as warning messages, a temporary “lock,” or an all-out IP blacklist.
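
To make the munging and failure ideas a bit more concrete, here is a rough Python sketch. Everything in it is my own assumption for illustration, not an existing service or library: the function names (`scramble_word`, `shift_spaces`, `munge`, `check_answer`, `penalty`), the two munging levels, and the failure thresholds of 3, 5, and 10 are all made up.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle a word's interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def shift_spaces(text, rng, rate=0.3):
    """Randomly drop some spaces and insert stray ones after letters."""
    out = []
    for ch in text:
        if ch == " " and rng.random() < rate:
            continue  # drop this space
        out.append(ch)
        if ch.isalpha() and rng.random() < rate / 3:
            out.append(" ")  # insert a stray space
    return "".join(out)


def munge(question, level, rng):
    """Hypothetical munging levels: 0 = none, 1 = spacing, 2 = letter order."""
    if level == 1:
        return shift_spaces(question, rng)
    if level == 2:
        # Scramble each run of letters; punctuation and spaces stay put.
        return re.sub(r"[A-Za-z]+",
                      lambda m: scramble_word(m.group(), rng),
                      question)
    return question


def check_answer(given, expected):
    """Compare answers, ignoring case and surrounding whitespace."""
    return given.strip().casefold() == expected.strip().casefold()


def penalty(failures):
    """Map a failure count to a denial level (thresholds are assumptions)."""
    if failures >= 10:
        return "blacklist"
    if failures >= 5:
        return "lock"
    if failures >= 3:
        return "message"
    return "none"
```

A blog would ask the service for a question at its chosen level, show something like `munge("what is the air speed of an unladen swallow?", 2, random.Random())` to the commenter, and run the reply through `check_answer`. The service would keep the per-IP failure count and hand back whatever `penalty` says.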
Not sure how this would work out. My thought is that any individual natural language query CAPTCHA would be limited by the number and variety of its questions, and if you studied them enough you could parse them and supply the right answers. Something with a centralized query repository and various other features would be harder to defeat automagically. It may be worth a shot…anyone have any ideas?