“Okay I’m gonna do it. I’m just gonna learn regex, get it out of the way, and be that much closer to becoming a 10x developer” - Me, at least a couple times.

And I did it. I did the deep dive, practiced, tried to learn it front to back.

Do I remember what I learned from those times? Am I still asking rhetorical questions? No and yes, respectively.

Where do we go from here? Well, I don’t know about you, but I’m going to lower my expectations and lean into a more iterative kind of learning. But this isn’t an online cooking recipe, so let’s get to the good stuff.

Only digits

\d matches any single digit. [0-9] does too. (Watch out: [1-9] skips 0.)

Given 123, the regex /\d/ finds three matches when we search for every match (more on the g flag below): [1][2][3]

Adding + matches one or more of the previous pattern: /\d+/ or /[0-9]+/.

Given 123 again, we get one match: [123]. Given 123abc345, it returns two matches: [123]abc[345]

Expression    Explanation
/\d/          Match each digit in a string
/[0-9]/       Same deal, more explicit
/\d+/         Match each contiguous group of digits (but not subgroups)
/[0-9]+/      Same as /\d+/
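To see the table in action, here is a quick sketch that runs each pattern with the g flag (covered below) so every match comes back, not just the first. The sample strings are made up for illustration.

```javascript
// Each pattern runs with the g flag so .match() returns every match
const sample = '123abc345';

const singleDigits = sample.match(/\d/g);       // each digit on its own
const digitGroups = sample.match(/\d+/g);       // contiguous runs of digits
const explicitGroups = sample.match(/[0-9]+/g); // same as \d+, spelled out

console.log(singleDigits);   // ['1', '2', '3', '3', '4', '5']
console.log(digitGroups);    // ['123', '345']
console.log(explicitGroups); // ['123', '345']

// [1-9] leaves out 0, so it splits '105' around the zero
const withoutZero = '105'.match(/[1-9]+/g); // ['1', '5']
const withZero = '105'.match(/[0-9]+/g);    // ['105']
console.log(withoutZero, withZero);
```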

Now we can add the ^ and $ anchors, which mean beginning and end. They require the match to sit at the beginning and/or end of the string.

Expression    Explanation
/^\d/         Match a single digit at the beginning of a string
/\d$/         Match a single digit at the end of a string
/^\d$/        Match a string consisting only of a single digit
/^\d+$/       Match a string consisting only of a contiguous group of digits
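A small sketch of the anchor table, using .test() (explained just below) on some made-up strings:

```javascript
// ^ pins the match to the start of the string, $ pins it to the end
const startsWithDigit = /^\d/;
const endsWithDigit = /\d$/;
const singleDigitOnly = /^\d$/;
const digitsOnly = /^\d+$/;

console.log(startsWithDigit.test('1abc')); // true
console.log(startsWithDigit.test('abc1')); // false
console.log(endsWithDigit.test('abc1'));   // true
console.log(singleDigitOnly.test('7'));    // true
console.log(singleDigitOnly.test('77'));   // false: two digits
console.log(digitsOnly.test('2024'));      // true
console.log(digitsOnly.test('20x24'));     // false: the x breaks the run
```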

How do we use this stuff in JS or TS? To make sure a string consists only of digits in JavaScript, use /^\d+$/.test(str).

The .test() method returns true if at least one match is found.
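Wrapped up as a reusable check, it might look like this. The helper name isDigitsOnly is hypothetical, just for illustration.

```javascript
// Hypothetical helper: true only if the whole string is one or more digits
function isDigitsOnly(str) {
  return /^\d+$/.test(str);
}

console.log(isDigitsOnly('12345')); // true
console.log(isDigitsOnly('0'));     // true
console.log(isDigitsOnly('12a45')); // false
console.log(isDigitsOnly(''));      // false: + needs at least one digit
```

One thing to watch: the regex above has no g flag on purpose. With g, .test() remembers lastIndex between calls on the same regex object, so const re = /\d/g; re.test('1'); re.test('1') returns true and then false.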

You can also get all matches with a slightly different syntax, where the method lives on the string instead of the regex: str.match(/\d/g). The g is for global, and tells JS we want every match, not just the first.

'123'.match(/\d/g)
// > ['1', '2', '3']
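For comparison, a sketch of what .match() does without the g flag, plus what it returns when nothing matches:

```javascript
// Without g, .match() stops at the first match and includes extra details
const first = '123'.match(/\d/);
console.log(first[0]);    // '1'
console.log(first.index); // 0

// With g, you get every match but lose those details
const all = '123'.match(/\d/g);
console.log(all);         // ['1', '2', '3']

// No match at all gives null, not an empty array
const none = 'abc'.match(/\d/g);
console.log(none);        // null
```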

If we want the indices of each match, we can get more info with matchAll:

'123'.matchAll(/\d/g)

But since this returns an iterator, we need to convert it into an array first:

Array.from('123'.matchAll(/\d/g))
//or
[...'123'.matchAll(/\d/g)]

This creates an array of match objects with some weird-looking properties. The handy one is index, the position where each match starts:

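    const data = '467..114..*...' // hypothetical example input
    // Every character that isn't a digit or a period, plus where it sits
    const symbols = Array.from(data.matchAll(/[^0-9.]/g))
    const symbolIndexes = new Set(symbols.map(d => d.index))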
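To peek at those properties, here is a small sketch on a made-up string: each element from matchAll holds the matched text at [0] and its start position at .index.

```javascript
// A made-up input string
const text = 'a1b22c';
const matches = [...text.matchAll(/\d+/g)];

for (const m of matches) {
  // m[0] is the matched text, m.index is where it starts
  console.log(m[0], m.index);
}
// Logs: "1 1" then "22 3"

// Keep just the positions, like symbolIndexes does
const positions = matches.map(m => m.index); // [1, 3]

// matchAll insists on the g flag: without it, it throws a TypeError
let threw = false;
try {
  'a1'.matchAll(/\d/);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```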

Stripping to alphabetic characters

So now we can tack on the .replace() method and this should make sense:

str.replace(/[^a-zA-Z]/g, '');

But wait… why is the caret ^ inside the brackets?

AHHHHHHHHHHHHHHHHHHHHHH

Good question. It turns out that if the caret is the first character inside the square brackets, it means “everything except the characters in these brackets.” That’s super different from what it means outside of them, where it anchors the match to the beginning.

It inverts the pattern.
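A quick sketch contrasting the caret's jobs, using made-up strings. The last line shows a related rule: a caret that isn't first inside the brackets is just a literal ^.

```javascript
const input = 'abc123';

// ^ outside brackets is an anchor: does the string START with a letter?
const startsWithLetter = /^[a-z]/.test(input); // true

// ^ first inside brackets negates: every character that is NOT a letter
const notLetters = input.match(/[^a-z]/g);     // ['1', '2', '3']

// ^ anywhere else inside brackets is just a literal caret
const withCaret = '2^3'.match(/[0-9^]/g);      // ['2', '^', '3']

console.log(startsWithLetter, notLetters, withCaret);
```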

Random note:

// Removes every run of word characters, so only punctuation and whitespace remain
const str = s.replace(/[\w]+/g, '').toLowerCase()

\w matches any word character: [a-zA-Z0-9_] (note the underscore)
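The underscore is easy to forget, so here is a sketch on made-up strings showing it sneak into \w matches, and one way to strip it along with the non-word characters:

```javascript
// \w covers letters, digits, AND the underscore
const words = 'snake_case-name!'.match(/\w+/g);           // ['snake_case', 'name']

// \W is everything else, so this strips punctuation and spaces but keeps _
const keptUnderscore = 'Hi, you_2!'.replace(/\W/g, '');    // 'Hiyou_2'

// Put _ next to \W in a class to drop underscores as well
const noUnderscore = 'Hi, you_2!'.replace(/[\W_]+/g, ''); // 'Hiyou2'

console.log(words, keptUnderscore, noUnderscore);
```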

Stripping out whitespace

This is probably what we want.

data = data.replace(/\s+/g, '')
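A sketch of that replace on a made-up string, next to a variant that keeps single spaces between words instead of removing them all:

```javascript
// Made-up input with spaces, a tab, and a newline
const messy = '  hello \t world \n ';

// \s+ with g: remove every run of whitespace, anywhere in the string
const squashed = messy.replace(/\s+/g, '');           // 'helloworld'

// Or swap each run for one space to normalize spacing instead
const normalized = messy.trim().replace(/\s+/g, ' '); // 'hello world'

// Without g, only the first run (the leading spaces) is removed
const firstRunOnly = messy.replace(/\s+/, '');
console.log(firstRunOnly.startsWith('hello'));         // true

console.log(squashed, normalized);
```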

Flags

  • g (global) - find all matches rather than stopping at first
  • i (case insensitive) - ignore case when matching
  • m (multiline) - make ^ and $ match at the start and end of each line, not just the whole string
  • s (dotall) - let . match any character, including newlines
  • u (unicode) - treat the pattern as a sequence of Unicode code points
  • y (sticky) - only match starting exactly at the regex’s lastIndex position
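Here is a sketch exercising most of these flags on made-up strings (g already showed up above):

```javascript
// i: case-insensitive
const caseless = /hello/i.test('HeLLo');           // true

// m: ^ and $ work per line instead of per string
const lines = 'one\ntwo\nthree';
const wholeString = lines.match(/^\w+$/g);          // null
const perLine = lines.match(/^\w+$/gm);             // ['one', 'two', 'three']

// s: lets . match newlines too
const dotNoS = /a.b/.test('a\nb');                  // false
const dotWithS = /a.b/s.test('a\nb');               // true

// y: only matches starting exactly at lastIndex
const sticky = /\d/y;
sticky.lastIndex = 1;
const stickyHit = sticky.test('a1');                // true: '1' sits at index 1
sticky.lastIndex = 0;
const stickyMiss = sticky.test('a1');               // false: index 0 is 'a'

// u: treat the pattern as Unicode code points
const emoji = '\u{1F600}';                          // one emoji, two UTF-16 units
const withoutU = /^.$/.test(emoji);                 // false
const withU = /^.$/u.test(emoji);                   // true

console.log(caseless, wholeString, perLine, dotNoS, dotWithS);
console.log(stickyHit, stickyMiss, withoutU, withU);
```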

Character Classes

Surprise: the capitalized versions invert the class, just like ^ does inside brackets.

// Character Classes
\w  // Word character: [a-zA-Z0-9_]
\W  // NOT a word character: [^a-zA-Z0-9_]
\d  // Digit: [0-9]
\D  // NOT a digit: [^0-9]
\s  // Whitespace: [ \t\n\r\f\v], plus Unicode spaces like \u00a0
\S  // NOT whitespace: anything \s doesn't match
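A sketch running each class and its capitalized twin over one made-up string, so the inversions are easy to see side by side:

```javascript
const mixed = 'a1 b2_!';

const digits = mixed.match(/\d/g);              // ['1', '2']
const nonDigits = mixed.match(/\D/g);           // ['a', ' ', 'b', '_', '!']
const wordChars = mixed.match(/\w/g);           // ['a', '1', 'b', '2', '_']
const nonWordChars = mixed.match(/\W/g);        // [' ', '!']
const spaces = mixed.match(/\s/g);              // [' ']
const nonSpaces = mixed.match(/\S/g).join('');  // 'a1b2_!'

console.log(digits, nonDigits, wordChars, nonWordChars, spaces, nonSpaces);
```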

Boundaries

// The ones I already know:
^     // Start of string/line
$     // End of string/line
 
// Word boundary:
\b    // Word boundary
\B    // NOT a word boundary

Word boundary examples

// Matches positions where a word character is next to a non-word character
"Hi there".match(/\bHi\b/)    // ✓ Matches 'Hi' as whole word
"High".match(/\bHi\b/)        // ✗ Doesn't match "Hi" inside a longer word
 
// \B - Not a Word Boundary
// Matches positions where \b WOULDN'T match
"High".match(/\BHi/)          // ✗ No match: the start of "High" IS a boundary
"High".match(/\Big\B/)        // ✓ Matches "ig" in the middle of "High"
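A practical use of \b is counting whole-word hits. The helper countWholeWord below is hypothetical, and it assumes the word contains only word characters, so nothing needs escaping before it goes into new RegExp().

```javascript
// Hypothetical helper: count whole-word occurrences of a word, ignoring case.
// Assumes `word` contains only word characters, so no escaping is needed.
function countWholeWord(text, word) {
  // In a string, \b must be written as \\b so RegExp receives a real \b
  const pattern = new RegExp(`\\b${word}\\b`, 'gi');
  return (text.match(pattern) || []).length;
}

const sentence = 'The cat sat. Cats scatter. CAT!';
console.log(countWholeWord(sentence, 'cat')); // 2: "Cats" and "scatter" don't count
```

The || [] fallback is there because .match() with g returns null when nothing matches.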

Okay, sweet! I’m not a regex expert, but it doesn’t look foreign anymore and I can understand basic operations. I’ll keep adding to this as I learn more, carefully, so as not to disrupt the fragile space I’ve reserved for regex in my brain.