Regular Expressions
One statement, no questions,
Put your fist up for your regular expressions.
-- Dual Core, "Regular Expressions"
Regular expressions are powerful.
Regular expressions are fun.
Right up until they are not. Today was one such day.
At some point, anyone who's had to work with regular expressions has needed to match a specific range of characters. Let's say, for no particular reason, alpha characters. A to Z. Or a to z. The basics, none of those fancy Unicode ones, thankyouverymuch.
$ grep --perl-regexp 'i' <<< "testing"
testing
$ grep --perl-regexp 'a' <<< "testing"
To match all of them, you'd need to include them all somehow. You could, of course, go all-in:
$ grep --perl-regexp '[abcdefghijklmnopqrstuvwxyz]' <<< "testing"
testing
...which would work, but it's not very concise, is it? You could, of course, use a character class and match any word character:
$ grep --perl-regexp '\w' <<< "testing"
testing
...which would, again, work, but you could accidentally match a non-ASCII character (not to mention digits and underscores), and you wouldn't want that, would you?
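To see what \w lets through, here is a minimal sketch using Python's re module as a stand-in for grep --perl-regexp. That swap is an assumption of convenience: the two engines are not identical, though both count digits and the underscore as word characters. The helper name word_only is made up for illustration.

```python
import re


def word_only(text, ascii_only=False):
    """Return True if every character in text is matched by \\w."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\w+", text, flags=flags) is not None


# "caf\u00e9" ends in e-acute, written as an escape so the source stays ASCII.
for sample in ["testing", "test_123", "caf\u00e9"]:
    print(f"{sample!r}: default={word_only(sample)}, ascii={word_only(sample, ascii_only=True)}")
```

In Python, at least, the non-ASCII sample only stops matching once re.ASCII is switched on, and the digits and underscore match either way.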
Of course, you could also use one of the convenient character ranges:
$ grep --perl-regexp '[a-z]' <<< "testing"
testing
Wonderful! But that won't match upper-case characters, will it?
$ grep --perl-regexp '^[a-z]+$' <<< "Testing"
Hmmmm...no. Using the upper-case variant would leave us in the same pickle, no?
$ grep --perl-regexp '^[A-Z]+$' <<< "Testing"
Curses! But, of course, there's another option: a combination of the two! How convenient!
$ grep --perl-regexp '^[A-z]+$' <<< "Testing"
Testing
It worked perfectly! Matching only alpha characters and both casing variants! All is well, right?
Right?!
No
$ grep --perl-regexp '^[A-z]+$' <<< 'Test[\]^_`ing'
Test[\]^_`ing
Errrr...what? A-z. That's fairly clear, right?
No. No, it is not. Because A-z is not, in fact, shorthand for "the range of characters from a to z, upper- and lower-case." It is, in fact, shorthand for "the range of ASCII characters from A to z".
$ python3 -c 'print("".join(chr(c) for c in range(ord("A"), ord("z") + 1)))'
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
Oh dear. Oh dear, oh dear.
Quite what was going on at the American Standards Association between Z and a, I'd hate to guess. Presumably their cat walked across the keyboard while they were celebrating the addition of the capital letters.
But yes, there's a whole bunch of punctuation (^ and ` are technically symbols, not punctuation; fun fact) smack in the middle of those character ranges, which will get matched should you ever opt to use A-z in a regular expression.
Which no one would, right?
Right?!
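For anyone who would: a minimal sketch, again in Python rather than grep (an assumption of convenience; character ranges behave the same way here), of the six gatecrashers and of the explicit two-range class [A-Za-z], which is the usual way to say "ASCII letters, either case". The names GAP, SLOPPY and STRICT are purely illustrative.

```python
import re

# Everything strictly between 'Z' (code 90) and 'a' (code 97).
GAP = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(GAP)  # [\]^_`

SLOPPY = re.compile(r"[A-z]+")     # one range: letters plus the six above
STRICT = re.compile(r"[A-Za-z]+")  # two ranges: letters only

for sample in ["Testing", "Test" + GAP + "ing"]:
    print(sample, bool(SLOPPY.fullmatch(sample)), bool(STRICT.fullmatch(sample)))
```

Both patterns accept Testing; only the sloppy one accepts the version with the punctuation wedged in.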