Performance of Parsing with Multiple Regular Expressions
I am attempting to split validated addresses into their base components
(Unit #, Street #, Street Name, etc.). To do this I am working backwards
(right to left) through the string:
1. Match [A-Z'-] characters => Street Name
2. Match [\d-]+[A-Za-z]{0,1} => Street Number (this matches things such as
   10-12 or 11B; there will only ever be 0 or 1 letter)
3. Match the rest of the string => Unit Number
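As a rough illustration, the three steps above can be folded into one pattern with named groups. This is only a sketch in Python (the post does not say which language is in use), and it assumes upper-case street names that may contain spaces, whitespace between the components, and sample addresses that are made up. `ADDRESS_RE` and `parse_address` are hypothetical names.

```python
import re

# One pattern, anchored at both ends. The lazy "unit" group takes whatever
# is left once the street number and street name have matched on the right.
ADDRESS_RE = re.compile(
    r"""
    ^(?P<unit>.*?)\s*                       # leftover text: unit number
    (?P<street_number>[\d-]+[A-Za-z]?)\s+   # e.g. 10-12 or 11B
    (?P<street_name>[A-Z' -]+)$             # upper-case street name (assumed)
    """,
    re.VERBOSE,
)


def parse_address(address):
    """Return a dict of components, or None if the address does not match."""
    match = ADDRESS_RE.match(address.strip())
    return match.groupdict() if match else None


print(parse_address("UNIT 3 10-12 O'BRIEN ST"))
print(parse_address("11B HIGH ST"))
```

Compiling the pattern once at module level means a batch run reuses the same compiled object for every address.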
Now, I am able to write the three regex patterns to do this matching. What
I am unsure of is how to use them. This will be used in batch processing
of addresses.
My ideas are:
1. Run each pattern over the address and save each match on the appropriate
   address object property.
2. Use some sort of match/replace to shorten the string after each step (I
   was going to use lookaheads and lookbehinds).
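The second idea could look something like the sketch below, again in Python as an assumption. Each pattern is anchored at the end of the string with `$`, and the match position is used to slice the matched part off, so no replace call is needed. The helper names `peel` and `split_address` are hypothetical. The lookbehind `(?<!\S)` on the street name is an added assumption so that the letter in a number such as 11B is not swallowed by the name.

```python
import re

# Compiled once and reused for every address in the batch.
# (?<!\S) means "start of string or preceded by whitespace".
STREET_NAME_RE = re.compile(r"(?<!\S)[A-Z' -]+$")
STREET_NUMBER_RE = re.compile(r"[\d-]+[A-Za-z]?$")


def peel(pattern, text):
    """Strip a trailing match off text; return (matched, remaining)."""
    match = pattern.search(text)
    if match is None:
        return "", text
    return match.group().strip(), text[:match.start()].rstrip()


def split_address(address):
    """Work right to left: street name, then street number, then unit."""
    street_name, rest = peel(STREET_NAME_RE, address.strip())
    street_number, unit = peel(STREET_NUMBER_RE, rest)
    return {
        "unit": unit,
        "street_number": street_number,
        "street_name": street_name,
    }


print(split_address("UNIT 3 10-12 O'BRIEN ST"))
```

Whether this is faster than a single combined pattern would need to be measured on real data; the sketch only shows the mechanics of shortening the string after each step.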
Any advice on the most efficient way to use these regular expressions would
be most helpful.