<alpha>
to match words, rather than [a-z]
. My understanding (though I'm having a hard time finding it in the spec at the moment) is that <alpha>
is Unicode-aware, which seems like a big win in terms of power.Unfortunately, my code, following Norvig's version, uses just a to z when it wants to substitute or add an arbitrary letter. I have no idea how to do a Unicode equivalent of that -- and even if I did, the combinatorial problem which already cripples Rakudo's ability to handle the two-edit case would just get horribly worse. How to handle that mismatch, I wondered?
That's when it occurred to me: instead of constructing a candidate list of psuedo-words that might match real words in our dictionary, construct a psuedo-word regular expression and grep it against the list of words in the dictionary. I'm guessing it will be slower, but it will be every bit as Unicode-aware as
<alpha>
is, and I'm pretty sure it can be done with only a few more lines of code.Okay, now that I've thought of it, it's time to actually try to write it...
No comments:
Post a Comment