Thursday, December 17, 2009

Now With Grammar And Tests

I've made a huge amount of progress with the ABC project in the last 36 hours. At this point I think I've just got a few more rules to write and debug before we are able to completely parse the sample ABC tune I posted several days ago. (Naturally they'll be the trickiest, I imagine.)

grammar ABC
{
    regex header_field_name { \w }
    regex header_field_data { \N* }
    regex header_field { ^^ <header_field_name> ':' \s* <header_field_data> $$ }
    regex header { [<header_field> \n]+ }

    regex basenote { <[a..g]+[A..G]> }
    regex octave { \'+ | \,+ }
    regex accidental { '^' | '^^' | '_' | '__' | '=' }
    regex pitch { <accidental>? <basenote> <octave>? }

    regex tie { '-' }
    regex note_length { [\d* ['/' \d*]? ] | '/' }
    regex note { <pitch> <note_length>? <tie>? }

    regex rest_type { <[x..z]> }
    regex rest { <rest_type> <note_length>? }

    regex gracing { '+' <alpha>+ '+' }

    regex broken_rhythm_bracket { ['<'+ | '>'+] }
    regex broken_rhythm { <note> <g1=gracing>* <broken_rhythm_bracket> <g2=gracing>* <note> }

    regex element { <note> | <broken_rhythm> | <rest> | <gracing> }

    regex barline { '|' | ':|' | '|:' | ':|:' | '::' }

    regex line_of_music { <barline> | [<barline>? <element>+ [<barline> <element>+]* <barline>?] }
}

Much, much nicer than just having "abc_" at the beginning of every regex name. And wow, compared to any other parsing tool I've ever used, this is really, really easy. This comes very close to matching the ABC BNF, though I've simplified a lot, and changed !trill! to +trill+ (etc) to match the version of ABC present in this file.

So far the only downside I've found is that it is ugly to test:
{
    my $match = "d'+p+<<<+accent+_B" ~~ m/ <ABC::broken_rhythm> /;
    isa_ok $match, Match, '"d\'+p+<<<+accent+_B" is a broken rhythm';
    is $match<ABC::broken_rhythm><note>[0]<pitch><basenote>, "d", 'first note is d';
    is $match<ABC::broken_rhythm><note>[0]<pitch><octave>, "'", 'first note has an octave tick';
    is $match<ABC::broken_rhythm><note>[0]<pitch><accidental>, "", 'first note has no accidental';
    is $match<ABC::broken_rhythm><note>[0]<note_length>, "", 'first note has no length';
    is $match<ABC::broken_rhythm><g1>[0], "+p+", 'first gracing is +p+';
    is $match<ABC::broken_rhythm><broken_rhythm_bracket>, "<<<", 'angle is <<<';
    is $match<ABC::broken_rhythm><g2>[0], "+accent+", 'second gracing is +accent+';
    is $match<ABC::broken_rhythm><note>[1]<pitch><basenote>, "B", 'second note is B';
    is $match<ABC::broken_rhythm><note>[1]<pitch><octave>, "", 'second note has no octave';
    is $match<ABC::broken_rhythm><note>[1]<pitch><accidental>, "_", 'second note is flat';
    is $match<ABC::broken_rhythm><note>[1]<note_length>, "", 'second note has no length';
}

On the plus side, this does show how to get at the parsed bits. On the downside, it's not really good at testing what is not present in the match, and it seems like any refactoring to the grammar will lead to massive changes in the tests. I'm guessing there will be a better way of testing this in the future... or there already is and I just don't know about it.
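
For what it's worth, here is a minimal sketch of one way such tests might be tightened. It assumes a current Raku (Perl 6) compiler, where a grammar can be asked to parse with a single named rule via `.parse(..., :rule<...>)`; that may not have been available in the Rakudo of the time. The grammar name `MiniABC` is hypothetical and only repeats the four pitch rules from the grammar above. Starting from the rule itself drops the `<ABC::broken_rhythm>` prefix from every line, and `.defined` gives a direct check that an optional piece is absent.

```raku
use Test;

# Hypothetical cut-down grammar: just the pitch rules from the ABC grammar.
grammar MiniABC {
    regex basenote   { <[a..g]+[A..G]> }
    regex octave     { \'+ | \,+ }
    regex accidental { '^' | '^^' | '_' | '__' | '=' }
    regex pitch      { <accidental>? <basenote> <octave>? }
}

# Parse with one named rule; the whole string must match it.
my $m = MiniABC.parse("_B", :rule<pitch>);

ok  $m,                  '"_B" parses as a pitch';
is  ~$m<basenote>,   'B', 'basenote is B';
is  ~$m<accidental>, '_', 'accidental is flat';
nok $m<octave>.defined,   'no octave was captured';

# A failed parse is false, so "should not match" is testable too.
nok MiniABC.parse("_H", :rule<pitch>), '"_H" is not a pitch';

done-testing;
```

This does not make the tests immune to refactoring, since they still name the captures, but each test only mentions the rules it actually cares about.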

At this point, it seems to me the biggest obstacle is figuring out how to formulate line_of_music so that it actually returns its results in a usable manner. The thing is, the interleaved order of the barlines and the elements is very important to make sense of the music. The way I'm doing it now will return an array of barlines and an array of elements, with no indication of how those two arrays interleave.
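
One possible way around this, offered as an untested-against-the-full-grammar sketch rather than a definitive fix: wrap both kinds of item in a single rule, so that line_of_music captures one ordered array instead of two unrelated ones. The rule name `line_item` and the grammar name `LineSketch` are hypothetical, `basenote` stands in for the full `element` rule to keep the example short, and the code assumes a current Raku compiler with `.parse(..., :rule<...>)`. Note that this simplified `line_of_music` is looser than the original, since it would also accept two barlines in a row.

```raku
grammar LineSketch {
    regex basenote { <[a..g]+[A..G]> }
    regex barline  { '|' | ':|' | '|:' | ':|:' | '::' }

    # Hypothetical rule, not in the grammar above: one capture per item,
    # whichever kind it is, so the source order survives in the match.
    regex line_item     { <barline> | <basenote> }
    regex line_of_music { <line_item>+ }
}

my $m = LineSketch.parse('|:ab|c:|', :rule<line_of_music>);

# Walk the single ordered array and label each item by its kind.
my @items = $m<line_item>.map: -> $item {
    $item<barline> ?? "barline " ~ $item<barline>
                   !! "note "    ~ $item<basenote>
};

.say for @items;
```

Each entry of `$m<line_item>` holds exactly one of the two sub-captures, so checking which one is present recovers both the kind and the position of every barline and note.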

Ack: Forgot to include mention of the word Perl here so it would get picked up by Ironman.
