URL Extractor

Pull every HTTP and HTTPS URL out of any text, with trailing punctuation stripped.

What is URL Extractor

A URL (Uniform Resource Locator) is a web address that points to a resource like a page, API endpoint, or file, typically including scheme, host, path, and query.

URL Extractor scans the input for 'http://' and 'https://' prefixes, captures each contiguous URL token that follows, strips trailing punctuation (. , ; : ! ? ) ]), removes duplicates, and outputs one URL per line.
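The pipeline described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual source: the regular expression, the `TRAILING` constant, and the `extract_urls` name are all hypothetical, and the real matcher may treat edge cases differently.

```python
import re

# Assumed pattern: an http/https scheme followed by a run of characters
# that are not whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Trailing characters the page says are stripped.
TRAILING = ".,;:!?)]"


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING)  # drop sentence punctuation
        if url not in seen:  # exact-match deduplication
            seen.add(url)
            urls.append(url)
    return urls


text = "See https://example.com/page, https://docs.example.com/intro, and http://old.site."
print("\n".join(extract_urls(text)))
# https://example.com/page
# https://docs.example.com/intro
# http://old.site
```

A list plus a set keeps the output in first-seen order, which is usually what an audit wants; sorting the result is a one-line change if alphabetical output is preferred.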

It's a staple for SEO audits, content reviews, link rot detection, and extracting references from long-form articles or PDF text pasted into the browser.

Why use it

  • Audit internal and external links inside a draft article.
  • Extract every URL from a PDF or Markdown export.
  • Build outreach lists from public directories or forum threads.
  • Prepare data for link rot checkers and SEO crawlers.
  • Analyse citations from research papers.

Features

  • http and https scheme support
  • Strips trailing punctuation
  • Deduplicates output
  • Handles Unicode and percent-encoded URLs
  • Runs entirely in the browser; no text is uploaded to a server

How to use URL Extractor

  1. Paste text. Drop in any block of text (HTML, Markdown, or plain text).
  2. Run. Every URL is captured and cleaned.
  3. Copy the list. Use the deduplicated URL list for audits or outreach.

Example (before/after)

Input

See https://example.com/page, https://docs.example.com/intro, and http://old.site.

URLs

https://example.com/page
https://docs.example.com/intro
http://old.site

Common errors

Scheme-less URLs missed

A bare domain such as 'example.com', written without a scheme, is not matched.

Fix: Prepend https:// before running, or use a more permissive regex.
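For the "more permissive regex" route, here is one possible sketch in Python. The pattern, the `extract_with_bare_domains` name, and the `default_scheme` parameter are assumptions made for illustration and are not part of URL Extractor. The scheme is optional in this pattern, and any match without one gets a default scheme prepended.

```python
import re

# Assumed permissive pattern: optional scheme, one or more dotted labels,
# an alphabetic TLD of 2+ letters, optional port, optional path/query/fragment.
BARE_URL = re.compile(
    r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?::\d+)?(?:[/?#][^\s<>\"']*)?",
    re.IGNORECASE,
)


def extract_with_bare_domains(text: str, default_scheme: str = "https") -> list[str]:
    """Collect URLs, adding a scheme to matches that lack one."""
    results = []
    for match in BARE_URL.finditer(text):
        candidate = match.group(0).rstrip(".,;:!?)]")
        if not re.match(r"https?://", candidate, re.IGNORECASE):
            candidate = f"{default_scheme}://{candidate}"
        if candidate not in results:
            results.append(candidate)
    return results


print(extract_with_bare_domains("Visit example.com or https://docs.example.com/intro."))
# ['https://example.com', 'https://docs.example.com/intro']
```

Permissive matching trades precision for recall. This pattern will also pick up file names such as report.pdf and the domain part of email addresses, and it skips hosts without a dot (for example localhost) and non-ASCII domains, so the output is worth reviewing by hand.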

Trailing punctuation included

Older tools may keep '.' or ',' on the end.

Fix: This tool strips trailing . , ; : ! ? ) ] automatically.
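The stripping step on its own can be sketched as below. The `strip_trailing` function and its `keep_balanced` flag are hypothetical. With the flag off, the function mirrors the behaviour described above. With it on, it shows a variation (not something this page claims the tool does) that keeps a closing bracket when it pairs with an opening one inside the URL, as in some Wikipedia article addresses.

```python
TRAILING = ".,;:!?)]"


def strip_trailing(url: str, keep_balanced: bool = False) -> str:
    """Remove trailing punctuation from a captured URL token."""
    while url and url[-1] in TRAILING:
        if keep_balanced:
            # Stop if the closing bracket has a matching opener in the URL.
            if url[-1] == ")" and url.count("(") >= url.count(")"):
                break
            if url[-1] == "]" and url.count("[") >= url.count("]"):
                break
        url = url[:-1]
    return url


print(strip_trailing("https://example.com/page)."))
# https://example.com/page
print(strip_trailing("https://en.wikipedia.org/wiki/Python_(programming_language)).", keep_balanced=True))
# https://en.wikipedia.org/wiki/Python_(programming_language)
```

Without `keep_balanced`, the second URL would lose its final ")" and point at a different address, so it is worth spot-checking any extracted link whose path contains an opening bracket.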

FAQ

Which schemes are supported?

http and https. FTP, mailto, and magnet links are not captured.

Are duplicates removed?

Yes. Exact duplicates are removed, so each URL appears once in the output.

Does it handle Unicode URLs?

Yes — percent-encoded and UTF-8 URLs both work.

Can I paste HTML?

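Yes. Absolute http and https URLs inside href attributes are extracted along with naked URLs. Relative links such as '/about' carry no scheme, so they are not matched.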
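For comparison, here is a minimal sketch of pulling href values out of pasted HTML with a parser instead of a pattern match, using Python's standard `html.parser` module. The `HrefCollector` class and `extract_hrefs` function are hypothetical names, and this is not necessarily how URL Extractor treats HTML. It is one way to get similar results when the input is known to be markup.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect absolute http/https href values in order of first appearance."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Keep only absolute http/https links; skip relative ones.
            if value.lower().startswith(("http://", "https://")):
                if value not in self.hrefs:
                    self.hrefs.append(value)


def extract_hrefs(html: str) -> list[str]:
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = '<a href="https://example.com/page">x</a> <a href="/about">y</a> <link href="http://old.site">'
print(extract_hrefs(sample))
# ['https://example.com/page', 'http://old.site']
```

A parser decodes entities such as `&amp;` inside attribute values, which a plain text scan would leave in the URL.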

Is my data uploaded?

No — extraction is entirely client-side.