"Okay I'm gonna do it. I'm just gonna learn regex, get it out of the way, and be that much closer to becoming a 10x developer"
— Me, more than twice
And tried once. I started working through Learn Regex the Easy Way. It didn’t leave me with much lasting knowledge. Looking back, I think that’s ‘cause I don’t need a complete understanding of regex. I don’t need to “learn” it, like it’s a single thing, but rather I could benefit from learning 3-4 techniques it enables, then continue as needed.
I realized this while doing Advent of Code 2023. Somewhere around the second or third problem, string.split() was starting to feel like an “if your only tool is a hammer, everything starts to look like a nail” problem, so I started looking up the things you’ll see here.
Let’s get regular y’all.
Only digits
\d matches a single digit. [0-9] does the same thing.
Given 123, the regex /\d/ finds three matches when applied globally: [1][2][3]
Adding + matches one or more of the previous pattern: /\d+/ or /[0-9]+/.
Given 123 again, we get back one match: [123]
Given 123abc345, we get two matches: [123]abc[345]
expression | explanation |
---|---|
/\d/ | match all digits in a string |
/[0-9]/ | same deal, more explicit |
/\d+/ | match all contiguous groups of digits (but not subgroups) |
/[0-9]+/ | same as /\d+/ |
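Here is a quick sketch of the table in action, using the string method match() with the g flag so every match comes back. The sample strings are made up for illustration, and [1-9] is thrown in for contrast, since it skips zero.

```javascript
// \d on its own matches one digit at a time
const single = '123'.match(/\d/g);

// \d+ grabs each contiguous run of digits as one match
const grouped = '123'.match(/\d+/g);
const mixed = '123abc345'.match(/\d+/g);

// [1-9] is NOT the same as \d: it leaves out 0 and splits the run
const noZero = '1024'.match(/[1-9]+/g);

console.log(single);  // [ '1', '2', '3' ]
console.log(grouped); // [ '123' ]
console.log(mixed);   // [ '123', '345' ]
console.log(noZero);  // [ '1', '24' ]
```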
Now we can add in the ^ and $ anchors, which stand for the beginning and end of the string. They say that a match has to start at the beginning and/or finish at the end.
expression | explanation |
---|---|
/^\d/ | Match a single digit at the beginning of a string |
/\d$/ | Match a single digit at the end of a string |
/^\d$/ | Match a string consisting only of a single digit |
/^\d+$/ | Match a string consisting only of a contiguous group of digits. |
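To see the four anchored patterns side by side, here is a small sketch that runs each one against a handful of made-up sample strings. It leans on the regex method test(), which just answers yes or no.

```javascript
// The four patterns from the table, in the same order
const patterns = [/^\d/, /\d$/, /^\d$/, /^\d+$/];

// Hypothetical inputs chosen to hit each case
const samples = ['7', '42', '4x', 'x4', 'abc'];

// For every sample, record which patterns match
const results = {};
for (const s of samples) {
  results[s] = patterns.map((p) => p.test(s));
}

console.log(results);
// '7'   -> [ true,  true,  true,  true  ]
// '42'  -> [ true,  true,  false, true  ]
// '4x'  -> [ true,  false, false, false ]
// 'x4'  -> [ false, true,  false, false ]
// 'abc' -> [ false, false, false, false ]
```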
How do we use this stuff in JS or TS?
Make sure a string is made up only of digits in JavaScript with /^\d+$/.test(str).
test() is a method of the regex pattern itself, not of the string.
The .test() method returns true if at least one match is found.
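A minimal sketch of a digits-only check. The helper name isDigitsOnly is made up for this example, and the last two lines show why the anchors and the + matter.

```javascript
// Hypothetical helper: true only when the whole string is digits
function isDigitsOnly(str) {
  return /^\d+$/.test(str);
}

const checks = {
  year: isDigitsOnly('2023'),      // true
  leadingZeros: isDigitsOnly('007'), // true
  mixed: isDigitsOnly('20a23'),    // false
  empty: isDigitsOnly(''),         // false, + needs at least one digit
};

// Without anchors, one digit anywhere is enough for test() to say yes
const unanchored = /\d/.test('20a23'); // true

// Without the +, only a one-character string can pass
const singleOnly = /^\d$/.test('2023'); // false

console.log(checks, unanchored, singleOnly);
```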
If you want to find the matches themselves, use the string method match(), which takes a regex pattern: str.match(/\d/g)
The g is for global, and tells JS we want to return every match rather than just the first one.
Without the g flag, we get only the first match, but we do get its index. For some reason.
Even though search() takes a regex and would return that index as well. (indexOf only takes strings.)
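Here is a small sketch comparing match() with and without the g flag, plus search() for when all you want is the position. The sample string is invented.

```javascript
const str = 'abc123def45';

// With g: a plain array of every matched string
const all = str.match(/\d+/g);

// Without g: only the first match, with extra info such as index
const first = str.match(/\d+/);

// search() returns just the index of the first match (or -1)
const pos = str.search(/\d+/);

// No match at all gives null, not an empty array
const none = 'abc'.match(/\d/g);

console.log(all);                   // [ '123', '45' ]
console.log(first[0], first.index); // 123 3
console.log(pos);                   // 3
console.log(none);                  // null
```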
If we want the index of every match, we can get more info with matchAll, which requires the g flag:
But since this returns an iterator, we need to convert it into an array first:
This will create an array of objects with some unintuitive properties.
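A sketch of that conversion, assuming a made-up sample string. Spread syntax does the job, and Array.from would work just as well.

```javascript
const text = 'a1b22c333';

// matchAll needs the g flag and hands back an iterator, not an array
const iterator = text.matchAll(/\d+/g);

// Spread it into a real array so array methods are available
const matches = [...iterator];

// Pull out the two fields we usually care about
const summary = matches.map((m) => ({ text: m[0], index: m.index }));

console.log(summary);
// [ { text: '1', index: 1 }, { text: '22', index: 3 }, { text: '333', index: 6 } ]
```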
What the hell is matchAll() returning?
So if we do something like [...str.matchAll(/\d+/g)], then the object returned for each match is going to have these properties: 0 (the matched text), index, input, and groups.
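To poke at one of those match objects directly, here is a short sketch with an invented input string. Since the pattern has no named capture groups, groups comes back undefined.

```javascript
const input = '12 apples, 7 pears';
const [firstMatch, secondMatch] = [...input.matchAll(/\d+/g)];

console.log(firstMatch[0]);     // '12', the matched text
console.log(firstMatch.index);  // 0, where the match starts
console.log(firstMatch.input);  // '12 apples, 7 pears', the original string
console.log(firstMatch.groups); // undefined, no named groups in this pattern

console.log(secondMatch[0], secondMatch.index); // 7 11
```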
More Weird Gotchas (having to do with matchAll returning an iterator)
Here’s a weird gotcha: note that we didn’t do str.matchAll(/\d+/g).map but instead [...str.matchAll(/\d+/g)].map. This is because matchAll returns something called an iterator, which is different from an array, and map is a method of arrays.
Iterators can be turned into arrays with spread syntax, but second-level gotcha coming:
^The output for the above function is not what I expected. It’s this:
[ 0, 1, 2, 16 ] []
As covered in Things I Learned, the values of the iterator that the matchAll method returns have an index property that returns the index as a number, as seen above, aaaaand (programmers hate this!) a 0 property, which returns the matched text as a string. For some reason.
Anyways. This just happens to be something we know. So why is that second console.log above not returning anything?!
This is because iterators are not like arrays. If they were, they’d be arrays. Iterators can essentially be used up: once they have been read through, there is nothing left to read. I want to read more up on them here because I don’t know much about them, but I think using an iterator is sort of like walking a linked list: each call to next() hands back an object with a value, and the last one has its done property set to true.
The way that we fix this is super simple. We just save the values of the iterator to an array first, which we just so happen to not be doing when we spread the iterator straight into .map. This mistake is actually sort of a point for functional programming, because if we’d just saved our values as we mutated them, this wouldn’t have happened.
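Here is a minimal sketch of the used-up iterator and the fix. The input string is made up and is not the one that produced the output above.

```javascript
const source = 'a1b22c333';

// The gotcha: reading the same iterator twice
const it = source.matchAll(/\d+/g);
const firstRead = [...it].map((m) => m.index);  // consumes the iterator
const secondRead = [...it].map((m) => m.index); // nothing left to read

console.log(firstRead);  // [ 1, 3, 6 ]
console.log(secondRead); // []

// The fix: save the values to an array once, then reuse the array
const saved = [...source.matchAll(/\d+/g)];
const indices = saved.map((m) => m.index);
const texts = saved.map((m) => m[0]);

console.log(indices); // [ 1, 3, 6 ]
console.log(texts);   // [ '1', '22', '333' ]
```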
What is a regex object in JS though?
So we’ve just been creating a few regex objects and using them inline like this
But what is this magic? Well, you can do this more explicitly like this:
And that’s exactly what we’re doing above. I guess this is a good way to create patterns dynamically, reuse them, and maybe clean up the look of regex evaluation as well.
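A sketch of the two spellings side by side, plus one dynamic use. The countWord helper is hypothetical, and it assumes the word contains no special regex characters (those would need escaping first).

```javascript
// Literal form: the pattern lives between slashes
const literal = /\d+/g;

// Constructor form: pattern and flags are strings,
// so the backslash has to be doubled inside the string
const constructed = new RegExp('\\d+', 'g');

console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// Hypothetical helper: build a pattern from a variable at runtime
function countWord(text, word) {
  const re = new RegExp(word, 'gi');
  return (text.match(re) || []).length;
}

console.log(countWord('Ho ho HO', 'ho')); // 3
```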
Stripping to alphabetic characters
So now we can tack on the .replace() method and this should make sense:
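A minimal sketch of that, assuming the goal is to delete every character that isn't an ASCII letter. The sample string is made up.

```javascript
const messy = 'h3ll0, w0rld!';

// [^a-zA-Z] means "any character that is NOT a letter";
// g makes replace() hit every occurrence, not just the first
const lettersOnly = messy.replace(/[^a-zA-Z]/g, '');

console.log(lettersOnly); // 'hllwrld'
```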
But wait…why is the caret ^ inside the brackets?
"AHHHHHHHHHHHHHHHHHHHHHH"
Good question. Turns out that if the caret is the first character in the square brackets, it’s saying “everything but the pattern in these brackets”. So super different from what it means when it’s outside of them.
It inverts the pattern.
This is actually a good place to use the ‘word’ character class, known as \w. \w corresponds to the characters [a-zA-Z0-9_] (note the underscore).
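Here is a small sketch of \w and its inverse \W on an invented string, mostly to show where the underscore ends up.

```javascript
const raw = 'user_name-42!';

// \W is "anything that is not a word character", so the underscore survives
const wordCharsOnly = raw.replace(/\W/g, '');

// Spelling the class out without the underscore drops it too
const alnumOnly = raw.replace(/[^a-zA-Z0-9]/g, '');

console.log(wordCharsOnly);                 // 'user_name42'
console.log(alnumOnly);                     // 'username42'
console.log(/^\w+$/.test('snake_case'));    // true
console.log(/^\w+$/.test('kebab-case'));    // false
```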
Stripping out whitespace
This is probably what we want.
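One guess at what that looks like in practice, using the \s whitespace class on a made-up string. Which variant you want depends on whether inner spaces should survive.

```javascript
const padded = '  a b\tc\n';

// Remove every run of whitespace: spaces, tabs, newlines
const noWhitespace = padded.replace(/\s+/g, '');

// Collapse runs to a single space, then trim the ends
const collapsed = padded.replace(/\s+/g, ' ').trim();

console.log(JSON.stringify(noWhitespace)); // "abc"
console.log(JSON.stringify(collapsed));    // "a b c"
```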
Flags
- g (global) - find all matches rather than stopping at the first
- i (case insensitive) - ignore case when matching
- m (multiline) - make the beginning and end anchors (^ and $) match at the start and end of every line
- s (dotall) - let . match any character, including newlines
- u (unicode) - treat the pattern as a sequence of Unicode code points
- y (sticky) - match only at the position given by the regex’s lastIndex
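A quick sketch with one tiny example per flag. All inputs are invented, and the emoji is written as an escape so the snippet survives copy and paste.

```javascript
// i: ignore case
const withoutI = 'Aa'.match(/a/g);  // [ 'a' ]
const withI = 'Aa'.match(/a/gi);    // [ 'A', 'a' ]

// m: ^ and $ work per line
const withoutM = 'one\ntwo'.match(/^\w+$/g); // null
const withM = 'one\ntwo'.match(/^\w+$/gm);   // [ 'one', 'two' ]

// s: the dot also matches newlines
const withoutS = /a.b/.test('a\nb'); // false
const withS = /a.b/s.test('a\nb');   // true

// u: the dot consumes a whole code point instead of half an emoji
const emoji = '\u{1F600}';
const withoutU = emoji.match(/./)[0].length; // 1 (half of a surrogate pair)
const withU = emoji.match(/./u)[0].length;   // 2 (the full character)

// y: only try to match exactly at lastIndex
const sticky = /\d/y;
sticky.lastIndex = 1;
const stickyAtOne = sticky.test('a1');  // true, the digit sits at index 1
sticky.lastIndex = 0;
const stickyAtZero = sticky.test('a1'); // false, index 0 is 'a'

console.log(withI, withM, withS, withU, stickyAtOne, stickyAtZero);
```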
Character Classes
Surprise, capitals do the same as ^ and invert the class.
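As a sketch, here are the three common shorthand classes next to their capitalized inverses, all run against the same made-up string.

```javascript
const sample = 'a1 _b2';

const digits = sample.match(/\d/g);        // digits
const nonDigits = sample.match(/\D/g);     // everything except digits
const wordChars = sample.match(/\w/g);     // letters, digits, underscore
const nonWordChars = sample.match(/\W/g);  // everything else
const spaces = sample.match(/\s/g);        // whitespace
const nonSpaces = sample.match(/\S/g);     // everything except whitespace

console.log(digits);       // [ '1', '2' ]
console.log(nonDigits);    // [ 'a', ' ', '_', 'b' ]
console.log(wordChars);    // [ 'a', '1', '_', 'b', '2' ]
console.log(nonWordChars); // [ ' ' ]
console.log(spaces);       // [ ' ' ]
console.log(nonSpaces);    // [ 'a', '1', '_', 'b', '2' ]
```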
Boundaries
Word boundary examples
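A few sketches with \b, the word boundary, which matches the spot between a word character and a non-word character (or the edge of the string). The sentence is invented for the example.

```javascript
const sentence = 'cat concat cats category';

// No boundaries: 'cat' is found inside every word
const anywhere = sentence.match(/cat/g).length;     // 4

// Boundary on both sides: only the standalone word
const wholeWord = sentence.match(/\bcat\b/g);       // [ 'cat' ]

// Boundary at the front: words that START with cat
const startsWith = sentence.match(/\bcat/g).length; // 3

// Boundary at the back: words that END with cat
const endsWith = sentence.match(/cat\b/g).length;   // 2

// Handy for replacing a word without touching longer words
const swapped = sentence.replace(/\bcat\b/g, 'dog');

console.log(anywhere, wholeWord, startsWith, endsWith);
console.log(swapped); // 'dog concat cats category'
```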
Okay sweet! Not a regex expert. But it doesn’t look foreign anymore and I can understand basic operations. I’ll keep adding to this, carefully - not to disrupt the fragile space I’ve reserved for regex in my brain - as I learn more.