Split URL into components using regex

  • 23 May 2016
  • ADM

 

Split URL into components using regex - images/logos/java.jpg

 

If you need to split the URL into components: protocol, domain, port and URI you can do it very simple using one of the following methods:

  • regular expression: The regular expression for an valid URL would be:
    (https?://)([^:^/]*)(:\\d*)?(.*)?
    method splitUsingRegex from the following example will use this method.
  • or using class URL from Java: method splitUsingURL will use this method.

Example

package com.admfactory;

import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class URLSplit {
	
	/**
	 * Split the URL into components
	 *
	 * @param url to be splitted
	 */
	public static void splitUsingRegex(String url) {
		System.out.println("Split URL example using regex");
		System.out.println();
		Pattern pattern = Pattern.compile("(https?://)([^:^/]*)(:\\d*)?(.*)?");
		Matcher matcher = pattern.matcher(url);
		matcher.find();
		
		String protocol = matcher.group(1);
		String domain = matcher.group(2);
		String port = matcher.group(3);
		String uri = matcher.group(4);
		System.out.println(url);
		System.out.println("protocol: " + (protocol != null ? protocol : ""));
		System.out.println("domain: " + (domain != null ? domain : ""));
		System.out.println("port: " + (port != null ? port : ""));
		System.out.println("uri: " + (uri != null ? uri : ""));
		System.out.println();
	}
	
	/**
	 * Split the URL into components
	 *
	 * @param path to be splitted
	 */
	public static void splitUsingURL(String path) throws Exception {
		System.out.println("Split url example using URL class");
		System.out.println();
		URL url = new URL(path);
		System.out.println("protocol: " + url.getProtocol());
		System.out.println("domain: " + url.getHost());
		System.out.println("port: " + url.getPort());
		System.out.println("uri: " + url.getPath());
	}
	
	public static void main(String[] args) throws Exception {
		String url1 = "/how-to-format-xmlgregoriancalendar/";
		String url2 = "http://docs.oracle.com/javase/7/docs/api/java/util/Date.html";
		String url3 = "https://example.com:8080/test1/index.html";
		
		splitUsingRegex(url1);
		splitUsingRegex(url2);
		splitUsingRegex(url3);
		
		splitUsingURL(url1);
		splitUsingURL(url2);
		splitUsingURL(url3);
		
	}
}

Output

Split URL example using regex

/how-to-format-xmlgregoriancalendar/
protocol: http://
domain: www.admfactory.com
port: 
uri: /how-to-format-xmlgregoriancalendar/

Split URL example using regex

http://docs.oracle.com/javase/7/docs/api/java/util/Date.html
protocol: http://
domain: docs.oracle.com
port: 
uri: /javase/7/docs/api/java/util/Date.html

Split URL example using regex

https://example.com:8080/test1/index.html
protocol: https://
domain: example.com
port: :8080
uri: /test1/index.html

Split url example using URL class

protocol: http
domain: www.admfactory.com
port: -1
uri: /how-to-format-xmlgregoriancalendar/
Split url example using URL class

protocol: http
domain: docs.oracle.com
port: -1
uri: /javase/7/docs/api/java/util/Date.html
Split url example using URL class

protocol: https
domain: example.com
port: 8080
uri: /test1/index.html

 

References