Comprehensive REGEX Reference for SEO and Website Design Work
Author: Bill Ross | Reading Time: 4 minutes | Published: April 24, 2026 | Updated: April 24, 2026
Many hosting companies use the .htaccess file for a lot of the tasks an SEO or website designer needs. Some hosts, however, such as WP Engine and Kinsta, have moved away from the .htaccess file due to security concerns.
Since we do a lot of work within WordPress and are constantly looking up regex patterns, we put together a working reference for the patterns you’ll actually use in day-to-day SEO and website work, grouped by where you’ll use them.
1. Regex Syntax Quick Primer
- `.`: any single character. `c.t` matches cat, cot, cut
- `*`: zero or more of the previous item. `ca*t` matches ct, cat, caaat
- `+`: one or more of the previous item. `ca+t` matches cat, caat
- `?`: zero or one of the previous item (optional). `colou?r` matches color, colour
- `^`: start of string. `^https` means the string must start with https
- `$`: end of string. `\.pdf$` means the string must end with .pdf
- `\b`: word boundary. `\bseo\b` matches seo but not seoul
- `\d`: any digit (0-9). `\d{4}` matches four digits
- `\w`: word character (a-z, A-Z, 0-9, _)
- `\s`: whitespace (space, tab, line break)
- `[abc]`: any one of a, b, or c. `[aeiou]` matches any vowel
- `[^abc]`: any character except a, b, or c
- `[a-z]`: any character in the range a to z
- `(a|b)`: a or b. `(cat|dog)` matches cat or dog
- `{n}`: exactly n times. `\d{3}` matches exactly 3 digits
- `{n,m}`: between n and m times. `\d{2,4}` matches 2 to 4 digits
- `(?i)`: case-insensitive flag. `(?i)seo` matches SEO, Seo, seo
- `(?!...)`: negative lookahead; the match fails if the pattern inside comes next. `^(?!.*admin).*$` matches strings that don't contain admin (not available in GSC or GA4)
- `(?=...)`: positive lookahead; the pattern inside must come next but isn't part of the match. `\d+(?=px)` matches 12 in 12px (not available in GSC or GA4)
- `\`: escapes a special character. `\.` matches a literal dot
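To sanity-check the primer, here's a minimal Python sketch that runs several of the examples above. It uses Python's built-in `re` module, not the RE2 engine behind GSC and GA4, and a few patterns are anchored with ^ and $ so the whole string is tested; treat it as a quick check, not proof of how a given tool will behave.

```python
import re

# Each entry: (pattern, strings it should match, strings it should not match).
# re.search looks for a match anywhere in the string.
PRIMER_CASES = [
    (r"c.t", ["cat", "cot", "cut"], ["ct"]),
    (r"^ca*t$", ["ct", "cat", "caaat"], ["cot"]),
    (r"^ca+t$", ["cat", "caat"], ["ct"]),
    (r"colou?r", ["color", "colour"], ["colr"]),
    (r"\bseo\b", ["seo tips"], ["seoul"]),
    (r"\.pdf$", ["guide.pdf"], ["guide.pdf.html"]),
    (r"(?i)seo", ["SEO", "Seo", "seo"], ["search"]),
    (r"^(?!.*admin).*$", ["/blog/post"], ["/wp-admin/"]),
]

def check_primer(cases):
    """Return (pattern, text, expected_to_match) tuples that behaved unexpectedly."""
    failures = []
    for pattern, should_match, should_not_match in cases:
        for text in should_match:
            if re.search(pattern, text) is None:
                failures.append((pattern, text, True))
        for text in should_not_match:
            if re.search(pattern, text) is not None:
                failures.append((pattern, text, False))
    return failures

print(check_primer(PRIMER_CASES))  # prints [] when every example behaves as described
```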
2. Google Search Console Regex
GSC accepts regex in both the Query and Page filters (choose Custom (regex), then Matches regex or Doesn't match regex). Matching is case-sensitive by default; prefix a pattern with (?i) for case-insensitive matching. Each regex is limited to 4,096 characters.
Query filters
- Question queries: `^(who|what|when|where|why|how|which|are|is|does|do|can|should|will|was|were)\b`
- Question queries (case-insensitive): `(?i)^(who|what|when|where|why|how|which|are|is|does|do|can)\b`
- Long-tail (5+ words): `([^\s]+\s){4,}[^\s]+`
- Short-tail (1-2 words): `^[\w]+(\s[\w]+)?$`
- Branded queries: `brandname|brand name|brnd|common misspelling` (swap in your brand and its misspellings)
- Non-branded (exclude brand): put your branded regex in a Doesn't match regex filter. The lookahead version, `^((?!brandname).)*$`, won't work in GSC because RE2 doesn't support lookaheads.
- "Near me" searches: `(?i)\bnear me\b`
- Comparison queries: `(?i)\b(vs|versus|or|compared to)\b`
- Price/cost queries: `(?i)\b(price|cost|cheap|affordable|best price|how much)\b`
- Review queries: `(?i)\b(review|reviews|rating|ratings|best)\b`
- Buying intent: `(?i)\b(buy|purchase|order|discount|coupon|deal|sale)\b`
- Informational intent: `(?i)\b(how to|guide|tutorial|tips|what is|learn)\b`
- Local intent: `(?i)\b(near me|in [a-z]+|[a-z]+ area)\b`
- Contains a year (2010-2029): `20(1|2)\d`
- Contains numbers: `\d+`
- Specific word variants: `(seo|s\.e\.o\.|search engine optimization)`
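Before pasting a query regex into GSC, it can help to try it on a handful of exported queries. A minimal Python sketch, assuming a hypothetical brand called acme and made-up sample queries. `re.search` finds a match anywhere in the string, which is how we understand GSC's regex filter to behave unless you anchor the pattern; Python's engine is not RE2, so this only checks the logic.

```python
import re

# Patterns copied from the query filter list; "acme" is a hypothetical brand.
QUERY_BUCKETS = {
    "question": r"(?i)^(who|what|when|where|why|how|which|are|is|does|do|can)\b",
    "long_tail": r"([^\s]+\s){4,}[^\s]+",
    "branded": r"(?i)acme|acme corp|acmee",
    "near_me": r"(?i)\bnear me\b",
}

def bucket_query(query, buckets=QUERY_BUCKETS):
    """Return the names of every bucket whose regex matches somewhere in the query."""
    return [name for name, pattern in buckets.items() if re.search(pattern, query)]

def is_non_branded(query):
    """Non-branded = the branded regex finds nothing (like "Doesn't match regex")."""
    return re.search(QUERY_BUCKETS["branded"], query) is None

for q in ["how to fix a leaky faucet", "acme plumbing near me", "best plumber in town today"]:
    print(q, "->", bucket_query(q), "| non-branded:", is_non_branded(q))
```

Testing the "not branded" case as a plain negation mirrors GSC's Doesn't match regex option, which sidesteps the missing lookahead support.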
Page filters
- Blog posts only: `/blog/`
- Product pages: `/product/|/shop/|/store/`
- Category pages: `/category/|/cat/|/topic/`
- Exact URL: `^https://example\.com/page/?$`
- Pages directly under /services/ (not deeper subfolders): `/services/[^/]+/?$`
- Everything except admin/login: use Doesn't match regex with `/(admin|wp-admin|login)`. The lookahead form `^(?!.*/(admin|wp-admin|login)).*$` isn't supported in GSC.
- PDF files: `\.pdf$`
- URLs with query parameters: `\?.+`
- URLs without query parameters: `^[^?]*$`
- Specific depth (3 folders deep): `^https://example\.com/[^/]+/[^/]+/[^/]+/?$`
- Trailing-slash URLs: `/$`
- Non-trailing-slash URLs: `[^/]$`
- HTTPS only: `^https://`
- Specific subdomain: `^https://blog\.example\.com`
- Pagination pages: `/page/\d+/?$`
- Date-based URLs: `/\d{4}/\d{2}/`
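The same approach works for page filters. This sketch applies a few of the patterns above to placeholder example.com URLs and emulates the Doesn't match regex option by keeping URLs where the pattern finds nothing. The helper name and URLs are illustrative only.

```python
import re

URLS = [
    "https://example.com/services/seo/",
    "https://example.com/services/seo/local/",
    "https://example.com/blog/page/2/",
    "https://example.com/wp-admin/edit.php",
    "https://example.com/shop/?color=red",
]

def filter_pages(urls, pattern, exclude=False):
    """Keep URLs the pattern matches, or with exclude=True, URLs it doesn't match."""
    return [u for u in urls if (re.search(pattern, u) is None) == exclude]

print(filter_pages(URLS, r"/services/[^/]+/?$"))                     # direct children of /services/
print(filter_pages(URLS, r"/page/\d+/?$"))                            # pagination
print(filter_pages(URLS, r"/(admin|wp-admin|login)", exclude=True))  # "Doesn't match regex"
```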
3. Google Analytics 4 Regex
Used in Explorations, Audiences, and custom reports. GA4 uses RE2 syntax, so lookaheads and lookbehinds aren't supported. "Matches regex" must match the entire value; choose "partially matches regex" (or wrap the pattern in .*) when it only needs to match part of it.
- Multiple pages: `/home|/about|/contact`
- Pages containing a word: `.*pricing.*`
- Exclude a path: use "does not match regex" with `.*/admin.*`; the lookahead form `^(?!.*\/admin).*$` isn't supported in RE2.
- Landing page variants: `^/(landing-page|lp|promo)`
- Multiple UTM sources: `^(facebook|twitter|linkedin|instagram)$`
- Paid vs organic split: `(cpc|ppc|paid)`
- Search engines: `(google|bing|yahoo|duckduckgo|ecosia)`
- Social referrers: `(facebook|instagram|twitter|linkedin|pinterest|tiktok|reddit)`
- Thank-you/conversion pages: `/(thank-you|thanks|confirmation|success|order-complete)`
- Form pages: `/contact/?$|/quote/?$|/get-started/?$`
- Blog post pages: `^/blog/[a-z0-9-]+/?$`
- Mobile and tablet traffic (device category): `(mobile|tablet)`
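If GA4's "matches regex" requires the pattern to cover the whole value (as we understand it, with "partially matches regex" as the substring option), `/home|/about|/contact` only matches those exact paths, while `.*pricing.*` is needed to catch any page containing "pricing". Python's `re.fullmatch` and `re.search` make a rough stand-in for the two modes; the helper names are hypothetical.

```python
import re

def ga4_matches(pattern, value):
    """Rough stand-in for GA4 "matches regex": the whole value must match."""
    return re.fullmatch(pattern, value) is not None

def ga4_partially_matches(pattern, value):
    """Rough stand-in for "partially matches regex": a match anywhere is enough."""
    return re.search(pattern, value) is not None

paths = ["/home", "/home/offers", "/pricing", "/plans/pricing-2026"]
print([p for p in paths if ga4_matches(r"/home|/about|/contact", p)])
print([p for p in paths if ga4_matches(r".*pricing.*", p)])
print([p for p in paths if ga4_partially_matches(r"/home|/about|/contact", p)])
```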
4. Screaming Frog (and similar crawlers)
Used in the Include/Exclude settings to control what gets crawled. Each pattern is matched against the whole URL, which is why most of these start or end with .*.
- Include only blog: `https://example\.com/blog/.*`
- Include subdomain: `https://shop\.example\.com/.*`
- Exclude query parameters: `.*\?.*`
- Exclude specific parameters: `.*[?&](utm_|sessionid=|ref=).*`
- Exclude PDFs and images: `.*\.(pdf|jpg|jpeg|png|gif|svg)$`
- Exclude admin areas: `.*(wp-admin|admin|login|cart|checkout).*`
- Exclude pagination: `.*/page/\d+.*`
- Exclude faceted nav (parameter in any position): `.*[?&](color|size|price|sort)=.*`
- Only crawl product pages: `.*/product/[^/]+/?$`
- Skip tag and category archives: `^(?!.*(/tag/|/category/)).*$`
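A rough Python model of an exclude list, assuming (as the .* wrappers suggest) that each pattern must match the entire URL; `re.fullmatch` enforces that. The crawler's own regex engine may differ from Python's in edge cases, and `should_crawl` is a hypothetical helper.

```python
import re

# Exclude patterns from the list above.
EXCLUDES = [
    r".*\?.*",                                   # any query parameters
    r".*\.(pdf|jpg|jpeg|png|gif|svg)$",          # PDFs and images
    r".*(wp-admin|admin|login|cart|checkout).*", # admin areas
    r".*/page/\d+.*",                            # pagination
]

def should_crawl(url, excludes=EXCLUDES):
    """Return False if any exclude pattern matches the whole URL."""
    return not any(re.fullmatch(p, url) for p in excludes)

for url in ["https://example.com/blog/post/", "https://example.com/blog/page/2/",
            "https://example.com/guide.pdf", "https://example.com/shop/?sort=price"]:
    print(url, should_crawl(url))
```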
5. .htaccess / Apache mod_rewrite
Regex drives RewriteRule and RewriteCond patterns. In .htaccess, RewriteRule tests the URL path without its leading slash and never sees the query string.
Redirects
# Turn on the rewrite engine once, before any rules
RewriteEngine On
# Force HTTPS
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
# Force www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
# Force non-www
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ https://example.com/$1 [L,R=301]
# Remove trailing slash (use this or "Add trailing slash", never both, or you'll create a loop)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=301]
# Add trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
# Serve page.html at /page (an internal rewrite; it doesn't redirect existing .html URLs)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]
# Redirect old URL pattern to new
RewriteRule ^old-folder/(.*)$ /new-folder/$1 [R=301,L]
# Redirect specific page
RewriteRule ^about-us/?$ /about [R=301,L]
# Catch-all to homepage (use carefully: ^(.*)$ also matches the homepage itself, so exclude it with a RewriteCond or the rule will loop)
RewriteRule ^(.*)$ / [R=301,L]
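In a RewriteRule target, $1 refers to the first capture group. A rough Python sketch of how the "old URL pattern to new" rule rewrites paths, using `re.sub`'s \1 in place of Apache's $1. It only emulates the regex step (no RewriteCond, flags, or server behavior), and `apply_rule` is a hypothetical helper. It also shows why a catch-all written as `^(.*)$` needs care: in .htaccess the homepage's path is the empty string, which that pattern matches.

```python
import re

def apply_rule(path, pattern=r"^old-folder/(.*)$", target=r"/new-folder/\1"):
    """Emulate: RewriteRule ^old-folder/(.*)$ /new-folder/$1 [R=301,L]

    In .htaccess, Apache tests the path without its leading slash, so it is
    stripped first. Python writes the $1 backreference as \\1.
    Returns the redirect target, or None if the rule doesn't match.
    """
    relative = path.lstrip("/")
    if re.match(pattern, relative) is None:
        return None
    return re.sub(pattern, target, relative, count=1)

print(apply_rule("/old-folder/seo-guide/"))  # /new-folder/seo-guide/
print(apply_rule("/blog/seo-guide/"))        # None

# The homepage's .htaccess-relative path is the empty string, and ^(.*)$ matches it,
# so a catch-all redirect to "/" can match its own target.
print(re.match(r"^(.*)$", "") is not None)   # True
```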
Common patterns
- Match any path: `^(.*)$`
- Match a specific folder: `^blog/(.*)$`
- Match a file by extension: `^(.*)\.html$`
- Match a dynamic page: RewriteRule never sees the query string, so test it with `RewriteCond %{QUERY_STRING} ^id=(\d+)$`, then match the file with `RewriteRule ^product\.php$` and use %1 for the captured ID in the target
- Block a specific bot: `RewriteCond %{HTTP_USER_AGENT} (badbot|scraper) [NC]` followed by `RewriteRule .* - [F,L]`
- Block referrer spam: `RewriteCond %{HTTP_REFERER} (semalt|buttons-for-website) [NC]` followed by `RewriteRule .* - [F,L]`
- Match a numeric ID: `^item/(\d+)/?$`
- Match a slug: `^post/([a-z0-9-]+)/?$`
- Match a date pattern: `^(\d{4})/(\d{2})/(\d{2})/`
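To see what $1, $2, and $3 would hold for two of the patterns above, a short Python sketch (the paths are made up, and `captures` is a hypothetical helper). Paths are written without a leading slash, as RewriteRule sees them in .htaccess.

```python
import re

def captures(pattern, path):
    """Return the capture groups ($1, $2, ...) for a path, or None if no match."""
    m = re.match(pattern, path)
    return m.groups() if m else None

print(captures(r"^(\d{4})/(\d{2})/(\d{2})/", "2026/04/24/regex-reference/"))
print(captures(r"^item/(\d+)/?$", "item/123"))
print(captures(r"^item/(\d+)/?$", "item/abc"))  # no match: \d+ needs digits
```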
View our full .htaccess redirect document here.
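6. Nginx Rewrite Rules
These directives go inside a server block. Nginx regexes use PCRE, and ~* makes a match case-insensitive.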
# Force HTTPS
if ($scheme != "https") {
return 301 https://$host$request_uri;
}
# Force non-www
if ($host ~* ^www\.(.*)$) {
return 301 https://$1$request_uri;
}
# Remove trailing slash
rewrite ^/(.*)/$ /$1 permanent;
# Old to new URL
rewrite ^/old-page/?$ /new-page permanent;
# Pattern-based redirect (lazy .+? keeps the trailing slash out of $1)
rewrite ^/category/(.+?)/?$ /shop/$1 permanent;
# Block user agents
if ($http_user_agent ~* (badbot|scraper|crawler)) {
return 403;
}
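To see why the capture group matters in the pattern-based redirect, compare a greedy `(.+)` with a lazy `(.+?)` on a path with a trailing slash. Python's `re` agrees with Nginx's PCRE on patterns this simple; `category_target` is an illustrative helper, not an Nginx feature.

```python
import re

def category_target(path, lazy=True):
    """Build the /shop/ target for /category/<name>/, mirroring the Nginx rewrite."""
    group = r"(.+?)" if lazy else r"(.+)"
    m = re.match(r"^/category/" + group + r"/?$", path)
    return "/shop/" + m.group(1) if m else None

print(category_target("/category/shoes/", lazy=False))  # greedy: /shop/shoes/
print(category_target("/category/shoes/"))              # lazy:   /shop/shoes
```

With the greedy group, `.+` swallows the trailing slash and the optional `/?` matches nothing, so the slash leaks into $1.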
7. robots.txt Patterns
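robots.txt isn't true regex: it supports only * (any sequence of characters) and $ (end of URL), and every rule matches from the start of the URL path.
# Rules below apply to all crawlers
User-agent: *
# Block all PDFs
Disallow: /*.pdf$
# Block all URLs with parameters
Disallow: /*?
# Block a specific parameter, whether it's first or later in the query string
Disallow: /*?sessionid=
Disallow: /*&sessionid=
# Block a file type
Disallow: /*.xlsx$
# Block a directory and its subdirectories
Disallow: /private/
# Block a specific pattern
Disallow: /*/print/
# Allow a specific file in a blocked directory (the longer, more specific rule wins)
Disallow: /admin/
Allow: /admin/public-page.html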
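A simplified Python model of how these rules are evaluated, following the matching described in the robots.txt standard (RFC 9309) as we understand it: * becomes "any characters", a trailing $ anchors the end, the longest matching rule wins, and Allow wins a tie. It uses pattern length as the specificity measure and ignores user-agent groups and percent-encoding; the function names are hypothetical.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into an anchored Python regex.

    '*' becomes '.*', a trailing '$' anchors the end of the URL, and every
    other character is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return "^" + body + ("$" if anchored else "")

def is_allowed(path, rules):
    """Apply (directive, pattern) rules to a path.

    The longest matching pattern wins; on a tie, allow wins. No match means allowed.
    """
    best = None
    for directive, pattern in rules:
        if re.match(robots_pattern_to_regex(pattern), path):
            key = (len(pattern), directive == "allow")
            if best is None or key > best[0]:
                best = (key, directive)
    return best is None or best[1] == "allow"

ROBOTS_RULES = [
    ("disallow", "/*.pdf$"),
    ("disallow", "/*?"),
    ("disallow", "/admin/"),
    ("allow", "/admin/public-page.html"),
]

for path in ["/admin/settings", "/admin/public-page.html", "/files/guide.pdf", "/blog/"]:
    print(path, is_allowed(path, ROBOTS_RULES))
```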
8. URL Validation & Parsing
- Valid URL (general): `^https?:\/\/[\w.-]+(?:\.[a-z]{2,})+(?:\/[\w.\-~:?#[\]@!$&'()*+,;=%]*)*$`
- HTTPS URL only: `^https:\/\/[\w.-]+\.[a-z]{2,}(\/.*)?$`
- Extract domain from URL: `^https?:\/\/(?:www\.)?([^\/?#]+)`
- Extract path (includes any query string): `^https?:\/\/[^\/]+(\/.*)`
- Extract query string (includes any #fragment): `\?(.+)$`
- Extract a single parameter: `[?&]utm_source=([^&#]+)`
- Strip UTM parameters: `[?&]utm_[^&]+`
- Match slug format: `^[a-z0-9]+(-[a-z0-9]+)*$`
- Invalid slug (has capitals, spaces, or underscores): `[A-Z\s_]`
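For anything beyond a quick filter, a URL parser is usually safer than regex because it handles encoding, fragments, and repeated keys. A minimal Python comparison of the regex patterns above with the standard library's `urllib.parse` (the URL is a made-up example):

```python
import re
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com/blog/post/?utm_source=newsletter&utm_medium=email#top"

# Regex approach, using patterns from the list above
domain = re.match(r"^https?:\/\/(?:www\.)?([^\/?#]+)", url).group(1)
source = re.search(r"[?&]utm_source=([^&#]+)", url).group(1)

# Standard-library parser: splits scheme, host, path, query, and fragment for you
parts = urlsplit(url)
parsed_source = parse_qs(parts.query)["utm_source"][0]

print(domain, source, parts.path, parsed_source)
```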
9. HTML & Content Patterns
Regex is fine for quick audits of page source, but it reads HTML as plain text, so attribute order and line breaks matter. For the tag-content patterns, turn on dot-matches-newline (`(?s)`) if the tag spans several lines.
- Find all links: `<a[^>]+href=["']([^"']+)["'][^>]*>`
- Find nofollow links: `<a[^>]+rel=["'][^"']*nofollow[^"']*["']`
- Find images without alt (needs lookahead support): `<img(?![^>]*\balt=)[^>]*>`
- Find images with empty alt: `<img[^>]+alt=["']["'][^>]*>`
- Find H1 tags: `<h1[^>]*>(.*?)<\/h1>`
- Find all heading tags: `<h[1-6][^>]*>(.*?)<\/h[1-6]>`
- Find meta description (only when name comes before content): `<meta[^>]+name=["']description["'][^>]+content=["']([^"']+)["']`
- Find title tag: `<title[^>]*>(.*?)<\/title>`
- Find canonical (only when rel comes before href): `<link[^>]+rel=["']canonical["'][^>]+href=["']([^"']+)["']`
- Find iframe embeds: `<iframe[^>]+src=["']([^"']+)["']`
- Strip HTML tags: `<[^>]+>`
- Find inline styles: `style=["'][^"']*["']`
- Find hex colors: `#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b`
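A quick Python check of the two alt-text patterns on a made-up HTML snippet. For a full-site audit, an HTML parser (such as Python's `html.parser`) is more robust, since it isn't thrown off by attribute order, quoting styles, or tags split across lines.

```python
import re

html = """
<img src="/logo.png" alt="Company logo">
<img src="/hero.jpg">
<img src="/divider.png" alt="">
"""

# Negative lookahead: an <img> tag with no alt= anywhere before its closing >
missing_alt = re.findall(r"<img(?![^>]*\balt=)[^>]*>", html)
# alt= followed immediately by an empty pair of quotes
empty_alt = re.findall(r"<img[^>]+alt=[\"'][\"'][^>]*>", html)

print(missing_alt)
print(empty_alt)
```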
10. Data Validation
- Email: `^[\w.+-]+@[\w-]+\.[\w.-]+$`
- Email (stricter): `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`
- US phone: `^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$`
- UK phone: `^(?:0|\+?44)(?:\d\s?){9,10}$`
- International phone (E.164): `^\+[1-9]\d{1,14}$`
- US ZIP code: `^\d{5}(-\d{4})?$`
- UK postcode (uppercase): `^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$`
- Credit card (13-19 digits only; no Luhn checksum check): `^\d{13,19}$`
- Date YYYY-MM-DD (format only; it also accepts 2026-13-45): `^\d{4}-\d{2}-\d{2}$`
- Date MM/DD/YYYY: `^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$`
- IPv4 address: `^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$`
- Hex color: `^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$`
- Username (3-20 letters, digits, or underscores): `^[a-zA-Z0-9_]{3,20}$`
- Strong password (8+ characters with a lowercase letter, uppercase letter, digit, and one of @$!%*?&; no other characters allowed): `^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$`
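Format regexes only check the shape of a value. A minimal Python sketch that pairs the YYYY-MM-DD pattern with the standard library's `date.fromisoformat` to reject dates that don't exist (`is_real_iso_date` is a hypothetical helper):

```python
import re
from datetime import date

ISO_DATE = r"^\d{4}-\d{2}-\d{2}$"  # pattern from the list above

def is_real_iso_date(value):
    """True only if value has the YYYY-MM-DD shape AND names a real calendar date."""
    if re.fullmatch(ISO_DATE, value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

print(is_real_iso_date("2026-04-24"))
print(is_real_iso_date("2026-13-45"))  # right shape, but there is no month 13
```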
11. WordPress-Specific Patterns
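- Match post slug: `^[a-z0-9]+(-[a-z0-9]+)*$`
- Block wp-login.php for everyone (.htaccess): `RewriteRule ^wp-login\.php$ - [F,L]`. Put a line such as `RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$` (with your own IP) above it, or you'll lock yourself out too.
- Block /wp-admin (.htaccess): `RewriteRule ^wp-admin/ - [F,L]`. This blocks every visitor, admins included, because .htaccess can't see WordPress user roles. It also breaks plugins that call wp-admin/admin-ajax.php from the front end, so pair it with an IP condition and exclude admin-ajax.php.
- Post ID URL: `^\?p=(\d+)$` (in .htaccess, test `%{QUERY_STRING}` against `^p=(\d+)$` instead, since RewriteRule never sees the query string)
- Category URL: `^category/([a-z0-9-]+)/?$`
- Tag URL: `^tag/([a-z0-9-]+)/?$`
- Author URL: `^author/([a-z0-9-]+)/?$`
- Date archive (year, optional month and day): `^(\d{4})(?:/(\d{2}))?(?:/(\d{2}))?/?$`
- Feed URLs: `/feed/?$|/rss/?$|/atom/?$`
- Media file URLs (uploads folder): `/wp-content/uploads/`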
Redirection plugin (WordPress)
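Supports full PCRE; enable the Regex option on each redirect that uses a pattern. Anchoring the source with ^ and $ keeps it from matching in unexpected places. Common uses:
- Source: `^/old-blog/(.*)$` Target: `/blog/$1`
- Source: `^/product-(\d+)$` Target: `/product/?id=$1`
- Source: `^/(.*)/$` Target: `/$1` (remove trailing slash)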
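The Redirection plugin writes backreferences as $1, while Python's `re.sub` expects \1 or \g<1>. A sketch for trying source/target pairs offline before adding them, using the rules above with ^ and $ anchors. `apply_redirect` is a hypothetical helper, not part of the plugin, and PHP's PCRE may differ from Python's engine in edge cases.

```python
import re

def apply_redirect(url, source, target):
    """Apply a regex redirect whose target uses $1-style backreferences.

    Returns the new URL, or None when the source pattern doesn't match.
    """
    if re.search(source, url) is None:
        return None
    # Convert $1, $2, ... into Python's \g<1>, \g<2>, ... replacement syntax
    python_target = re.sub(r"\$(\d+)", r"\\g<\1>", target)
    return re.sub(source, python_target, url, count=1)

REDIRECT_RULES = [
    (r"^/old-blog/(.*)$", "/blog/$1"),
    (r"^/product-(\d+)$", "/product/?id=$1"),
    (r"^/(.*)/$", "/$1"),  # remove trailing slash
]

for url in ["/old-blog/seo-tips", "/product-42", "/about/"]:
    for source, target in REDIRECT_RULES:
        new_url = apply_redirect(url, source, target)
        if new_url is not None:
            print(url, "->", new_url)
            break
```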
12. Log File Analysis
For parsing access logs (Apache/Nginx combined format).
- Full log line (the size field is "-" when no body was sent): `^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\d+|-) "([^"]*)" "([^"]*)"`
- IPv4 address: `^(\d{1,3}\.){3}\d{1,3}`
- Timestamp: `\[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]`
- Status code: `" (\d{3})`
- 4xx errors: `" 4\d{2}`
- 5xx errors: `" 5\d{2}`
- Googlebot requests: `Googlebot|Googlebot-Image|Googlebot-Video` (user agents can be faked, so confirm real Googlebot hits with a reverse DNS lookup)
- Bingbot: `(?i)bingbot`
- All major bots: `(Googlebot|Bingbot|Slurp|DuckDuckBot|Baiduspider|YandexBot|facebookexternalhit|Twitterbot|LinkedInBot)`
- Commonly blocked crawlers: `(AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot)`
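A minimal Python sketch that applies the full log-line pattern, with the size field widened to `(\d+|-)`, to count status codes for Googlebot requests. The sample lines are made up, and `status_counts` is a hypothetical helper; a user-agent match alone doesn't prove a request came from Google.

```python
import re
from collections import Counter

# Combined log format; group 6 = status, group 7 = size, group 9 = user agent
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\d+|-) "([^"]*)" "([^"]*)"'
)

def status_counts(lines, bot=r"Googlebot"):
    """Count status codes for requests whose user agent matches the bot regex."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and re.search(bot, m.group(9)):
            counts[m.group(6)] += 1
    return counts

GBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'66.249.66.1 - - [24/Apr/2026:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "{GBOT_UA}"',
    f'66.249.66.1 - - [24/Apr/2026:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 - "-" "{GBOT_UA}"',
    '203.0.113.7 - - [24/Apr/2026:10:00:09 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0"',
]
print(status_counts(SAMPLE))
```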
13. CSV / Data Cleaning
- Remove tracking parameters: `[?&](utm_\w+|fbclid|gclid|msclkid|mc_cid|mc_eid)=[^&]*`
- Extract hashtags: `#\w+`
- Extract @mentions: `@\w+`
- Match runs of whitespace: `\s{2,}`
- Trim leading/trailing whitespace: `^\s+|\s+$`
- Remove non-ASCII characters: `[^\x00-\x7F]+`
- Match currency amounts: `[$£€¥]\s?\d+(?:[.,]\d{2})?`
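Removing tracking parameters with the pattern above can leave a URL like `/page&id=5` when the first parameter was the one removed. A minimal Python sketch that applies the pattern and then repairs the separator (`strip_tracking` is a hypothetical helper; it doesn't handle #fragments that follow a tracking parameter):

```python
import re

TRACKING = r"[?&](utm_\w+|fbclid|gclid|msclkid|mc_cid|mc_eid)=[^&]*"

def strip_tracking(url):
    """Remove tracking parameters, then repair a leftover leading "&"."""
    cleaned = re.sub(TRACKING, "", url)
    # If the first parameter was removed, the next one now starts with "&"
    return re.sub(r"^([^?]*)&", r"\1?", cleaned)

print(strip_tracking("https://example.com/page?utm_source=x&id=5&gclid=abc"))
print(strip_tracking("https://example.com/page?utm_source=x&utm_medium=y"))
```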
Tips for SEO work
- Google Search Console and GA4 use RE2 syntax, which supports neither lookbehinds (`(?<=...)`) nor lookaheads (`(?=...)`, `(?!...)`). If a pattern fails there, that’s usually the reason; use the "doesn't match regex" option instead of a negative lookahead.
- GSC regex is case-sensitive unless you add `(?i)` at the start. That trips up a lot of people trying to catch brand misspellings.
- For `.htaccess`, always test redirects in a staging environment; a bad RewriteRule can take down a whole site. Use a redirect checker like httpstatus.io to verify chains and loops.
- When building exclusion regex in tools that support lookaheads (Screaming Frog, the Redirection plugin, and other PCRE-style engines), the pattern `^((?!keyword).)*$` matches anything that doesn’t contain “keyword”, which is invaluable for filtering out branded queries, admin paths, etc. In GSC and GA4, put `keyword` in a "doesn't match regex" filter instead.
- Escape dots in domain names (`example\.com`); otherwise the dot matches any character and you get false matches.