
Python Regex Cheatsheet: 20 Essential Patterns Explained (2026)

by People Are Geek
May 27, 2026
in Developer Tools

Cheatsheet Python regex · 12 min read · Updated May 2026

Python’s re module is the unglamorous workhorse behind log parsing, form validation, data cleanup and a thousand other daily tasks. The patterns below are the 20 I reach for most often in 2026, each presented with the pattern itself, what it does, a tested example, a common edge case the pattern misses, and an alternative when a library is the better tool. Bookmark this page; the patterns are tested against Python 3.12+ and follow modern style (raw strings, anchored when validating, non-capturing groups for grouping-only constructs).

The 20 patterns

  1. Email validation
  2. IPv4 address
  3. IPv6 address (simplified)
  4. URL (http/https)
  5. Phone in E.164
  6. ISO 8601 datetime
  7. Date DD/MM/YYYY
  8. Time HH:MM:SS (24h)
  9. UUID v4
  10. Hex colour code
  11. CSV row split (quoted)
  12. Apache / nginx log entry
  13. Whitespace collapse
  14. Markdown link
  15. JSON number
  16. Python identifier
  17. Unix file path
  18. Windows file path
  19. Strong password
  20. URL-friendly slug

How to read the cards

Every card gives the raw re pattern, the intent (“what it does”), an input that matches, the most common edge case the pattern intentionally does not handle, and an alternative library or approach when regex stops being the right tool. Validation patterns are anchored with ^ and $; remove the anchors to use them as re.search or re.findall patterns inside larger text. Note that $ also matches just before a trailing newline, so use re.fullmatch or \Z when a trailing newline must be rejected.
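As a minimal sketch of the anchored versus unanchored distinction, the snippet below reuses the time pattern from card 8. The names TIME_BODY, is_time, is_time_strict and find_times are illustrative, not part of any library.

```python
import re

# Body of the time pattern from card 8, without anchors.
TIME_BODY = r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d"

VALIDATE = re.compile(rf"^{TIME_BODY}$")  # anchored: validation
EXTRACT = re.compile(TIME_BODY)           # unanchored: search inside text


def is_time(s: str) -> bool:
    """Anchored check; note that $ tolerates one trailing newline."""
    return VALIDATE.match(s) is not None


def is_time_strict(s: str) -> bool:
    """fullmatch requires the entire string, newline included, to match."""
    return EXTRACT.fullmatch(s) is not None


def find_times(text: str) -> list[str]:
    """Return every time found inside a larger text."""
    return EXTRACT.findall(text)
```

is_time("14:30:00\n") returns True because of how $ behaves, while is_time_strict rejects the same input.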

1

Email validation

r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"

What it does: Matches typical addresses: letters, digits, underscores, dots, plus and dash in the local part, then an at sign and a domain made of letter, digit, underscore or dash labels with at least one dot.

Example: alice.smith+work@example.co.uk matches.

Edge case: Misses RFC 5321 quoted local parts, and does not validate internationalised domains (IDN) or check that the TLD exists. It is a pragmatic filter for the addresses your application accepts, not a full RFC parser.

Alternative: email.utils.parseaddr to split a display name from the address (it does not validate), or the email-validator PyPI package for RFC compliance.
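A small sketch of combining the two standard-library pieces: parseaddr pulls the address out of a "Name <addr>" value and the pattern from this card filters it. The function name extract_valid_email is hypothetical.

```python
import re
from email.utils import parseaddr
from typing import Optional

EMAIL = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def extract_valid_email(header_value: str) -> Optional[str]:
    """Return the address part if it passes the pragmatic pattern, else None."""
    # parseaddr splits "Name <addr>" into (name, addr); it does not validate.
    _name, addr = parseaddr(header_value)
    return addr if EMAIL.match(addr) else None
```

This only checks shape; a confirmation email is still the real test of deliverability.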

2

IPv4 address

r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)$"

What it does: Validates each octet is 0-255. Rejects strings like 999.0.0.1 or 1.2.3.4.5.

Example: 192.168.1.42 matches; 256.0.0.1 does not.

Edge case: Accepts leading zeros like 010.0.0.1 which some parsers treat as octal. Strip leading zeros first if that matters.

Alternative: ipaddress.IPv4Address(s) raises ValueError on invalid input; cleaner for validation.
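The two approaches can be compared side by side. This is a sketch; the helper names is_ipv4_regex and is_ipv4_stdlib are made up for the example.

```python
import ipaddress
import re

IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)$"
)


def is_ipv4_regex(s: str) -> bool:
    """Validate with the pattern from this card."""
    return IPV4.match(s) is not None


def is_ipv4_stdlib(s: str) -> bool:
    """Validate by letting the ipaddress module parse the string."""
    try:
        ipaddress.IPv4Address(s)
    except ValueError:
        return False
    return True
```

Both helpers agree on ordinary input; they can differ on leading zeros, which the regex accepts.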

3

IPv6 address (simplified)

r"^(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}$"

What it does: Matches a fully-expanded IPv6 address with 8 hex groups separated by colons.

Example: 2001:0db8:85a3:0000:0000:8a2e:0370:7334 matches.

Edge case: Does not handle the :: compressed form, IPv4-mapped ::ffff:1.2.3.4, or the zone identifier %eth0.

Alternative: ipaddress.IPv6Address(s) handles every standard form correctly.
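If you still want the regex downstream, one option (a sketch, with the hypothetical helper expand_ipv6) is to let ipaddress expand a compressed address into the 8-group form that the pattern expects.

```python
import ipaddress
import re

IPV6_FULL = re.compile(r"^(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}$")


def expand_ipv6(s: str) -> str:
    """Return the fully expanded form; raises ValueError on invalid input."""
    return ipaddress.IPv6Address(s).exploded
```

For example, expand_ipv6("2001:db8::1") gives a string with all eight groups written out, which IPV6_FULL then accepts.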

4

URL (http / https)

r"^https?://[\w.-]+(?:\.[a-zA-Z]{2,})+(?:[/?#][^\s]*)?$"

What it does: Matches HTTP and HTTPS URLs with a domain that has a TLD, optionally followed by a path, query or fragment.

Example: https://example.com/path?id=42#section matches.

Edge case: Skips other schemes (ftp, mailto, data:), userinfo (user:pass@) and IDN domains.

Alternative: urllib.parse.urlparse(s) accepts almost any string without raising (a malformed bracketed host is one ValueError case), so check the resulting scheme and netloc yourself.
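A minimal sketch of the urlparse approach, assuming you only want http and https URLs with a host. The function name is_http_url is illustrative.

```python
from urllib.parse import urlparse


def is_http_url(s: str) -> bool:
    """Accept only http/https URLs that carry a network location."""
    try:
        parts = urlparse(s)
    except ValueError:
        # Raised for a few malformed inputs, e.g. an unclosed "[" in the host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Unlike the regex, this accepts ports and userinfo, so add further checks on parts.hostname if you need to restrict them.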

5

Phone number in E.164

r"^\+[1-9]\d{6,14}$"

What it does: Validates the E.164 international format: a plus sign, a non-zero leading digit, then 6 to 14 more digits, for 7 to 15 digits in total (the country code is part of that count).

Example: +33612345678 matches.

Edge case: Does not validate that the country code or the length matches a real numbering plan. +19999999999 matches structurally but is not a real US number.

Alternative: Google’s phonenumbers PyPI package validates country-specific length and prefix rules.

6

ISO 8601 datetime

r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?$"

What it does: Matches the canonical ISO 8601 format with optional fractional seconds and timezone offset.

Example: 2026-05-27T14:30:00.123+02:00 matches.

Edge case: Accepts impossible dates like 2026-02-30 because the pattern does not enforce calendar rules.

Alternative: datetime.fromisoformat(s) in Python 3.11+ handles most ISO 8601 variants and validates calendar dates.
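One way to combine the two, sketched below: use the regex as a cheap shape check and let datetime reject impossible dates. The helper parse_iso is a hypothetical name.

```python
import re
from datetime import datetime
from typing import Optional

ISO_DT = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?$"
)


def parse_iso(s: str) -> Optional[datetime]:
    """Return a datetime if the string has the right shape and is a real date."""
    if not ISO_DT.match(s):
        return None
    try:
        return datetime.fromisoformat(s)
    except ValueError:
        # Shape was fine but the calendar or clock values were impossible.
        return None
```

With this, 2026-02-30T10:00:00 passes the regex but is rejected by the datetime step.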

7

Date DD/MM/YYYY (European)

r"^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$"

What it does: Validates the European DD/MM/YYYY format with realistic day and month ranges.

Example: 27/05/2026 matches; 32/05/2026 does not.

Edge case: Accepts 31/02/2026 because the pattern does not know February. Validate calendar with datetime.strptime(s, "%d/%m/%Y") if it matters.

Alternative: datetime.strptime raises ValueError on impossible dates; cleaner for strict validation.
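The gap between the pattern and the calendar is easy to see in a short sketch; parse_date is an illustrative helper name.

```python
import re
from datetime import date, datetime
from typing import Optional

DATE_EU = re.compile(r"^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$")


def parse_date(s: str) -> Optional[date]:
    """Return a date for a valid DD/MM/YYYY string, or None."""
    if not DATE_EU.match(s):
        return None
    try:
        return datetime.strptime(s, "%d/%m/%Y").date()
    except ValueError:
        # e.g. 31/02/2026: right shape, impossible date
        return None
```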

8

Time HH:MM:SS (24-hour)

r"^(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$"

What it does: Validates a 24-hour clock time with hours 00-23, minutes 00-59, seconds 00-59.

Example: 14:30:00 matches; 25:00:00 does not.

Edge case: Does not accept leap seconds (23:59:60) which are valid in some timestamp standards.

Alternative: datetime.time.fromisoformat(s) parses into a time object and range-checks each field; note that it also accepts shorter forms such as 14:30.

9

UUID v4 (random)

r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"

What it does: Validates the v4 UUID layout: version nibble 4 and variant high bits 10 (so the first variant nibble is 8, 9, a or b).

Example: 550e8400-e29b-41d4-a716-446655440000 matches.

Edge case: Lowercase only. For uppercase, use re.IGNORECASE or [0-9a-fA-F].

Alternative: uuid.UUID(s) accepts any version and any case; check the .version attribute if you need v4 specifically.
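A sketch of the uuid-based check, with the card's pattern made case-insensitive for comparison. The helper is_uuid4 is a made-up name; note that uuid.UUID is more lenient than the regex about formatting (it also accepts strings without hyphens).

```python
import re
import uuid

UUID4 = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def is_uuid4(s: str) -> bool:
    """True if s parses as a UUID whose version field is 4."""
    try:
        return uuid.UUID(s).version == 4
    except ValueError:
        return False
```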

10

Hex colour code

r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$"

What it does: Matches CSS hex colours in 3-digit shorthand (#f00), 6-digit (#ff0000) or 8-digit RGBA (#ff0000ff).

Example: #1e293b matches.

Edge case: Does not match 4-digit shorthand RGBA (#f00f); add [0-9a-fA-F]{4} to the alternation if you need it.

Alternative: CSS Color module level 4 also accepts named colours and modern function syntax — for those, use a CSS parser.
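A possible variant that also accepts the 4-digit shorthand, sketched under the assumption that you want all four CSS hex lengths. The name HEX_COLOUR_345 is illustrative.

```python
import re

# {3,4} covers #rgb and #rgba; {6} and {8} cover the long forms.
HEX_COLOUR_345 = re.compile(
    r"^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$"
)


def is_hex_colour(s: str) -> bool:
    return HEX_COLOUR_345.match(s) is not None
```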

11

CSV row split with quoted fields

r'(?:^|,)("(?:[^"]|"")*"|[^,]*)'

What it does: Iterates fields separated by comma, supporting double-quoted fields that contain commas or escaped quotes ("").

Example: alice,"smith, jr.",42 yields three captures.

Edge case: Does not handle quoted newlines (multi-line CSV records) and does not strip the surrounding quotes from captured fields.

Alternative: csv.reader from the standard library handles every edge case correctly; reach for regex only when csv is overkill.
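The difference in how quotes are treated shows up clearly when both are run on the example row. A small sketch:

```python
import csv
import re

CSV_FIELD = re.compile(r'(?:^|,)("(?:[^"]|"")*"|[^,]*)')

row = 'alice,"smith, jr.",42'

# findall returns the captured group for each field, quotes included.
regex_fields = CSV_FIELD.findall(row)

# csv.reader takes an iterable of lines and strips the quotes for you.
csv_fields = next(csv.reader([row]))
```

Here regex_fields keeps the surrounding double quotes on the second field, while csv_fields does not.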

12

Apache or nginx common log entry

r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\S+)'

What it does: Extracts seven fields from the common log format: client IP, timestamp, HTTP method, path, protocol, status code, bytes sent.

Example: 127.0.0.1 - - [27/May/2026:14:30:00 +0200] "GET /index.html HTTP/1.1" 200 1234 matches.

Edge case: Does not capture the combined format extras (referer, user-agent). Add "([^"]*)" "([^"]*)" at the end to capture them.

Alternative: Tools like logparser or shipping logs to Loki with a parser pipeline scale better than regex.
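As an illustration of using the seven captures, the sketch below maps them onto field names. The names in FIELDS and the function parse_log_line are choices made for this example, not a standard.

```python
import re
from typing import Optional

LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\S+)'
)
FIELDS = ("ip", "timestamp", "method", "path", "protocol", "status", "size")


def parse_log_line(line: str) -> Optional[dict]:
    """Return a dict of the seven common-log fields, or None if no match."""
    m = LOG_LINE.match(line)
    return dict(zip(FIELDS, m.groups())) if m else None
```

All values come back as strings; convert status and size to int yourself, keeping in mind that the size field can be "-" in real logs.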

13

Whitespace collapse

re.sub(r"\s+", " ", text).strip()

What it does: Replaces every run of whitespace (spaces, tabs, newlines) with a single space, then trims the edges. The one-liner for cleaning OCR output, scraped HTML or user-pasted text.

Example: " Hello\t\n world " becomes "Hello world".

Edge case: Also collapses meaningful whitespace inside <pre> blocks if you run it on full HTML; only apply to text content.

Alternative: " ".join(text.split()) achieves the same result without regex and is slightly faster for short strings.

14

Markdown link extraction

r"\[([^\]]+)\]\(([^)]+)\)"

What it does: Extracts the visible label (group 1) and the URL (group 2) from a Markdown link like [label](https://example.com).

Example: See [the docs](https://docs.example.com) yields label “the docs” and URL “https://docs.example.com”.

Edge case: Fails when the label contains square brackets, or when the URL contains parentheses (common in Wikipedia URLs).

Alternative: A real Markdown parser like mistune or markdown-it-py if you process untrusted content.
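A short sketch showing both the normal case and the parenthesis edge case; extract_links is an illustrative helper name.

```python
import re

MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def extract_links(text: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for every Markdown link in text."""
    return MD_LINK.findall(text)
```

On a URL such as https://en.wikipedia.org/wiki/Python_(programming_language) the captured URL stops at the first closing parenthesis, which is the truncation described in the edge case.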

15

JSON number

r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?"

What it does: Matches a JSON-compliant number: optional sign, no leading zero, optional fractional and exponent parts.

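Example: -1.23e-4 matches; 007 does not when the whole string must match (re.fullmatch). The pattern is unanchored, so re.search would still find the 0 inside 007.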

Edge case: Does not match Infinity, NaN or hexadecimal numbers (those are not valid JSON anyway).

Alternative: json.loads(s) for full JSON parsing.
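Because this pattern has no anchors, the choice of matching function matters. A minimal sketch, with is_json_number as a hypothetical helper:

```python
import re

JSON_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")


def is_json_number(s: str) -> bool:
    """Whole-string validation with fullmatch."""
    return JSON_NUMBER.fullmatch(s) is not None


# search only needs a match somewhere, so it finds the leading "0" of "007".
partial = JSON_NUMBER.search("007")
```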

16

Python identifier (ASCII-only)

r"^[A-Za-z_][A-Za-z0-9_]*$"

What it does: Validates a legal Python 2 / restricted-ASCII variable name: starts with letter or underscore, continues with letters, digits or underscores.

Example: my_var2 matches.

Edge case: Modern Python allows Unicode identifiers (élève, π); the pattern rejects them. Use str.isidentifier() if you accept Unicode.

Alternative: "name".isidentifier() is the canonical Python check.
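One detail worth sketching: both the regex and isidentifier accept reserved words such as class, so a keyword check is a common companion. The helper is_usable_name is a made-up name for this example.

```python
import keyword
import re

ASCII_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def is_usable_name(s: str) -> bool:
    """Valid (possibly Unicode) identifier that is not a reserved keyword."""
    return s.isidentifier() and not keyword.iskeyword(s)
```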

17

Unix file path

r"^/(?:[^/\x00]+/)*[^/\x00]*$"

What it does: Matches an absolute Unix path with forward slashes and no null bytes.

Example: /home/user/file.txt matches.

Edge case: Accepts .. and . segments (path traversal). Resolve with pathlib.Path.resolve() before security-sensitive use.

Alternative: pathlib.PurePosixPath(s) for parsing without filesystem access.
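A sketch of spotting traversal segments without touching the filesystem; has_traversal is an illustrative helper, and a real security check would still resolve the path against an allowed base directory.

```python
import re
from pathlib import PurePosixPath

UNIX_PATH = re.compile(r"^/(?:[^/\x00]+/)*[^/\x00]*$")


def has_traversal(s: str) -> bool:
    """True if any path segment is '..' (pure string analysis, no I/O)."""
    return ".." in PurePosixPath(s).parts
```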

18

Windows file path

r'^[A-Za-z]:\\(?:[^\\/:*?"<>|]+\\)*[^\\/:*?"<>|]*$'

What it does: Matches an absolute Windows path with a drive letter and backslash separators. Rejects illegal filename characters.

Example: C:\Users\admin\file.txt matches.

Edge case: Does not handle UNC paths (\\server\share), long path prefix (\\?\) or forward slashes that Windows also accepts.

Alternative: pathlib.PureWindowsPath(s) for parsing without filesystem access.
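PureWindowsPath works on any operating system, so the cases the regex skips can be inspected anywhere. A sketch, with describe as a hypothetical helper:

```python
import re
from pathlib import PureWindowsPath

WIN_PATH = re.compile(r'^[A-Za-z]:\\(?:[^\\/:*?"<>|]+\\)*[^\\/:*?"<>|]*$')


def describe(s: str) -> tuple[str, str]:
    """Return (drive, final component) for a Windows-style path string."""
    p = PureWindowsPath(s)
    return p.drive, p.name
```

For a UNC path, the drive returned by pathlib is the \\server\share prefix.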

19

Strong password

r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{12,}$"

What it does: Uses four lookaheads to require at least one lowercase letter, one uppercase letter, one digit and one special character, with a minimum length of 12.

Example: P@ssw0rd-StrongEnough matches.

Edge case: Composition rules do not equal strength; P@ssw0rd1234 matches the regex but is in every breach list. Combine with a list check.

Alternative: zxcvbn-python estimates how guessable a password is and detects dictionary patterns.
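A sketch of pairing the composition regex with a list check. COMMON_PASSWORDS is a tiny stand-in invented for the example; in practice you would load a real breach list.

```python
import re

STRONG = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{12,}$")

# Hypothetical placeholder; replace with a real list of breached passwords.
COMMON_PASSWORDS = frozenset({"P@ssw0rd1234"})


def is_acceptable(password: str, denylist=COMMON_PASSWORDS) -> bool:
    """Composition rules AND not present in the denylist."""
    return STRONG.match(password) is not None and password not in denylist
```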

20

URL-friendly slug

r"^[a-z0-9]+(?:-[a-z0-9]+)*$"

What it does: Validates lowercase slugs with single hyphens between alphanumeric groups. No leading, trailing or doubled hyphens.

Example: my-awesome-post-2026 matches; --bad-- does not.

Edge case: Does not strip diacritics. To turn "Café Brûlé" into a clean slug, normalise first with unicodedata.normalize.

Alternative: The python-slugify PyPI package handles Unicode normalisation, transliteration and length capping in one call.
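A standard-library-only sketch of the normalise-then-validate idea. The slugify function here is a simplified illustration, not the python-slugify API; it simply drops characters that have no ASCII decomposition.

```python
import re
import unicodedata

SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def slugify(text: str) -> str:
    """Lowercase ASCII slug with single hyphens between alphanumeric runs."""
    # NFKD splits accented letters into base letter + combining mark,
    # and the ASCII encode step then drops the marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```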


Patterns to avoid in 2026

Three “classic” regex patterns still circulate that you should not copy-paste from old StackOverflow answers. The “perfect email” regex (1,000+ chars long) tries to implement RFC 5322 verbatim and ends up rejecting real-world addresses that work fine in production; use a short pragmatic pattern plus an actual delivery confirmation step instead. The “URL with everything” pattern that accepts every scheme, userinfo and IDN is brittle; urllib.parse.urlparse is faster and correct. The “credit-card number” regex that claims to validate Visa, Mastercard and Amex by prefix gives a false sense of security; only the Luhn check confirms a number is well-formed, and only the bank confirms it is real.

Performance notes for high-volume regex

Compile once, reuse forever. re.compile(pattern) in module scope, used as PATTERN.match(s) in your hot loop, is measurably faster than calling re.match(pattern, s), which has to look the pattern up in the re module's internal cache on every call. Possessive quantifiers and atomic groups, which prevent catastrophic backtracking, are available in the standard re module since Python 3.11 and in the regex PyPI package on older versions. For very high volumes (millions of matches per second), the hyperscan bindings can give large speedups for static pattern sets where you do not need named captures.
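The compile-once habit looks like this in practice. This is a sketch; count_valid is an illustrative function and the slug pattern is borrowed from card 20.

```python
import re

# Compiled once, at import time, in module scope.
SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def count_valid(candidates) -> int:
    """Count how many strings are valid slugs, reusing the compiled pattern."""
    match = SLUG.match  # bind the method once, outside the loop
    return sum(1 for s in candidates if match(s))
```

Measure with timeit on your own data before optimising further; the gain depends on pattern complexity and input size.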

Frequently asked questions

Why use raw strings (r"...") for every pattern?

Backslashes have special meaning in both Python string literals and regex. Without the r prefix, "\d" is an invalid string escape that Python keeps as a backslash followed by d; it still works but triggers a SyntaxWarning on Python 3.12+, and sequences like "\n" or "\b" are interpreted as an actual newline or backspace. The raw prefix makes the pattern look exactly like the regex documentation. Use it universally; the only limitation is that a raw string cannot end in a single backslash.
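The word-boundary escape is a good illustration, since "\b" is a valid string escape and fails silently. A minimal sketch:

```python
import re

newline = "\n"       # one character: an actual newline
raw_newline = r"\n"  # two characters: a backslash and the letter n

# In a normal string, "\b" is a backspace character (\x08), so this
# pattern looks for backspace + "cat" + backspace and finds nothing.
plain = re.search("\bcat\b", "a cat sat")

# In a raw string, \b reaches the regex engine as a word boundary.
raw = re.search(r"\bcat\b", "a cat sat")
```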

When is regex the wrong tool?

For nested structures (HTML, JSON, recursive grammars), regex theoretically cannot parse them correctly — the famous “you cannot parse HTML with regex” rule. For dates, paths and UUIDs, a specialised library (datetime, pathlib, uuid) gives stronger guarantees and clearer code. Regex shines for pattern-matching simple, flat, well-defined formats.

How do I avoid catastrophic backtracking?

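Three principles. First, avoid nested or overlapping quantifiers such as (a+)+ or (\w+\s?)+, which give the engine exponentially many ways to split the same text; non-capturing groups (?:...) are tidier when you do not need the capture, but they do not change backtracking. Second, use possessive quantifiers (*+, ++) or atomic groups, available in re since Python 3.11 and in the regex PyPI module, to disable backtracking on greedy stretches. Third, test your pattern against pathological input (“aaaaaaaaaa” repeated 30 times, followed by a character that forces the match to fail); if it hangs, refactor.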
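To make the "test, then refactor" advice concrete, here is a sketch with a deliberately risky pattern and an equivalent safe rewrite. The names RISKY, SAFE and PATHOLOGICAL are chosen for the example.

```python
import re

# Nested quantifier: many ways to split a run of "a" between the two +.
RISKY = re.compile(r"^(?:a+)+$")

# Same language (one or more "a"), but only one way to match it.
SAFE = re.compile(r"^a+$")

# 300 "a" characters followed by a character that forces failure.
PATHOLOGICAL = "a" * 300 + "!"

# SAFE fails fast on the pathological input.
safe_result = SAFE.match(PATHOLOGICAL)

# RISKY.match(PATHOLOGICAL) is intentionally NOT called here: on a
# backtracking engine it can take an impractically long time.
```

On ordinary inputs the two patterns accept exactly the same strings, which is what makes the rewrite safe to adopt.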

Does Python regex support look-behind?

Yes, but only fixed-width look-behind (?<=...) in the standard re module; a variable-width look-behind raises re.error, and you need the regex PyPI module for that. Look-aheads (?=...) have always been supported. Combine them sparingly: a regex with three look-arounds is usually clearer rewritten as two passes.
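A quick way to see what the standard re module accepts inside a look-behind, as a sketch; the helper compiles is a made-up name.

```python
import re

# Fixed-width look-behind: digits that directly follow a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $42 and $7")


def compiles(pattern: str) -> bool:
    """True if the standard re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

A multi-character look-behind such as (?<=USD ) is fine as long as its width is fixed; a quantified one such as (?<=a+) is rejected at compile time.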

What about Unicode classes like \p{L}?

The standard library re module does not support \p{...} Unicode property classes. The regex PyPI module does, and is a drop-in replacement with extra features. For ASCII-only validation, re is fine; for true Unicode handling, install regex.
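If installing regex is not an option, a common workaround in the standard library is the character class [^\W\d_], which approximates "any letter" by taking word characters and excluding digits and the underscore. This is an approximation of \p{L}, not an exact equivalent.

```python
import re

# Word characters minus digits and underscore: roughly "letters".
LETTERS = re.compile(r"[^\W\d_]+")


def re_supports(pattern: str) -> bool:
    """True if the standard re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```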

Where can I download this cheatsheet?

Save the page as PDF from your browser (Ctrl+P on Windows, Cmd+P on macOS, then “Save as PDF”). The article is print-friendly with no banner ads. We may add an exportable JSON of the patterns in a future release.


I'm Stephane, a network and systems engineer with over 15 years of hands-on experience on production infrastructure, virtualization (ESXi, Proxmox), networking, and self-hosting. Earlier in my career I built and ran a Linux resource site that became a well-known reference for sysadmins. Today I focus on cybersecurity, and I also work as a technical trainer, teaching networking and security to people who do it for a living. Everything on People Are Geek comes from real-world practice, not theory. I build every tool on this site myself, and I write about what I've actually deployed, broken, and fixed. If it's here, I've used it.
