Querying WHOIS/RDAP servers

networking
programming

Querying information about internet resources using the WHOIS and RDAP protocols.

WHOIS Protocol

Usages for WHOIS

According to ICANN (Internet Corporation for Assigned Names and Numbers):

Internet operators use WHOIS to identify individuals or entities responsible for the operation of a network resource on the Internet. Over time, WHOIS has evolved to serve the need of many different stakeholders, such as domain name registrants, law enforcement agents, intellectual property and trademark owners, businesses and individuals.

Source: https://whois.icann.org/en/primer

Essentially, people can query WHOIS to find the contact or responsible persons for a given internet resource.

Authoritative WHOIS Service

Some TLD (top-level domain) WHOIS servers, like the one run by the org registry, store the complete information about their domains; these are called “thick” WHOIS servers. Others, like the com registry’s WHOIS server, only refer to the WHOIS server of the registrar where the domain was registered; these are called “thin” WHOIS servers.
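A client that wants the full record from a thin registry has to follow the referral itself. The sketch below is a minimal illustration, not a complete client: it scans a raw WHOIS response for a referral line and returns the next server to ask. The function name next_whois_server and the sample responses are made up for this example. The two field names it looks for (refer, as in the IANA server's output, and Registrar WHOIS Server, as in the com registry's output) are assumptions about what those servers send; other servers may use different labels.

```python
from typing import Optional

# Field names assumed to carry a referral; other servers may use different labels.
REFERRAL_KEYS = ("refer", "registrar whois server")


def next_whois_server(response_text: str) -> Optional[str]:
    """Return the WHOIS host a response refers to, or None if there is no referral."""
    for line in response_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in REFERRAL_KEYS and value.strip():
            return value.strip()
    return None


# Hypothetical, shortened responses for illustration only.
thin = "   Domain Name: EXAMPLE.COM\n   Registrar WHOIS Server: whois.registrar.example\n"
thick = "Domain Name: EXAMPLE.ORG\nRegistrant Organization: Example Org\n"

print(next_whois_server(thin))   # whois.registrar.example
print(next_whois_server(thick))  # None
```

A thick response simply yields None, so the caller can stop there and use the record it already has.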

Querying with the WHOIS Protocol

Users can query WHOIS servers with the whois Unix command-line tool. The protocol is publicly specified in RFC 3912: WHOIS Protocol Specification, and it is simple enough that coding your own WHOIS client is rather easy, as I did: the client opens a TCP connection to port 43, sends a single query line terminated by CRLF, and reads until the server closes the connection. The WHOIS response is in a “human-readable” format, which makes it harder for programs to parse, especially since the format is not specified and different WHOIS servers use different formats. Fortunately, IANA (Internet Assigned Numbers Authority) experimented with a new WHOIS server in 2010 and later adopted it. It provides output in a more predictable, RPSL-style format, as explained on the ICANN blog:

The first thing that one would notice comparing the output is that we have adopted a new RPSL-style output format. It is a more predictable format that is commonly used by other WHOIS services, and also is easier to parse with a predictable “key: value” format.

Source: https://www.icann.org/en/blogs/details/try-the-new-iana-whois-server-20-5-2010-en

RPSL is specified in RFC 2622: Routing Policy Specification Language (RPSL).

The following Python function I wrote queries the IANA WHOIS server and parses the response into a list of groups, each a dictionary of key/value pairs:

import socket


def whois(cmd: str, whois_server="whois.iana.org", port=43):
    # Connect to the service host; the socket is closed when the block exits
    with socket.create_connection((whois_server, port), timeout=10) as sock:
        # Send a single "command line", ending with <CRLF>.
        sock.sendall(cmd.encode("utf-8") + b"\r\n")
        # Receive the response until the server closes the connection.
        chunks = []
        data = sock.recv(1024)
        while data:
            chunks.append(data)
            data = sock.recv(1024)
    # Decode once at the end so multi-byte characters split across chunks survive
    msg = b"".join(chunks).decode("utf-8", errors="replace")
    groups = []
    currGroup = {}
    for line in msg.splitlines():
        if line.startswith("%"):  # Skip comments
            continue
        if line.strip() == "":  # Blank line marks the start of a new group
            if currGroup:
                groups.append(currGroup)
            currGroup = {}
            continue
        if ":" not in line:  # Skip lines that are not "key: value" pairs
            continue
        key, value = line.split(":", 1)  # Split line into key/value pair
        key = key.strip()  # Remove surrounding spaces
        value = value.strip()  # Remove surrounding spaces
        if key in currGroup:
            # If key already exists, join values by newline
            currGroup[key] = "\n".join([currGroup[key], value])
        else:
            currGroup[key] = value
    if currGroup:  # Keep the last group if the response does not end with a blank line
        groups.append(currGroup)
    return groups


if __name__ == "__main__":
    import json

    s = whois("google.com")
    print(json.dumps(s, indent=4))

My full GitHub Gist (with example output): Query WHOIS information from IANA server

Registration Data Access Protocol (RDAP)

RDAP (Registration Data Access Protocol), which is the successor to the very old WHOIS protocol, provides, according to rdap.org:

  • A machine-readable representation of registration data;
  • Differentiated access;
  • Structured request and response semantics;
  • Extensibility.

Source: https://about.rdap.org

RDAP uses RESTful communication over HTTP, and its responses are in JSON format, which makes them much easier to parse and makes requests easier to automate.
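To show what the JSON format buys in practice, here is a minimal sketch that pulls a few fields out of an RDAP domain response. The member names (objectClassName, ldhName, status, and events with eventAction and eventDate) follow the RDAP JSON format from RFC 9083. The sample document, its dates, and the helper name summarize_domain are invented for illustration, and a real response carries many more members.

```python
import json

# Invented, heavily shortened sample; real responses contain many more members.
SAMPLE_RESPONSE = """
{
  "objectClassName": "domain",
  "ldhName": "EXAMPLE.COM",
  "status": ["client transfer prohibited"],
  "events": [
    {"eventAction": "registration", "eventDate": "2000-01-01T00:00:00Z"},
    {"eventAction": "expiration", "eventDate": "2030-01-01T00:00:00Z"}
  ]
}
"""


def summarize_domain(rdap_response: dict) -> dict:
    """Collect the name, status list, and event dates of an RDAP domain object."""
    events = {
        event["eventAction"]: event["eventDate"]
        for event in rdap_response.get("events", [])
    }
    return {
        "name": rdap_response.get("ldhName", "").lower(),
        "status": rdap_response.get("status", []),
        "registered": events.get("registration"),
        "expires": events.get("expiration"),
    }


summary = summarize_domain(json.loads(SAMPLE_RESPONSE))
print(summary["name"], summary["expires"])
```

No line-by-line text parsing is needed here: every field is reached by key, and missing members fall back to a default.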

Authoritative RDAP Service

IANA provides bootstrap files at https://data.iana.org/rdap/ which map a resource (e.g. a TLD like com) to an RDAP server. This RDAP server might redirect to a different one, and so on, until the authoritative RDAP server is reached. You could also use a bootstrap RDAP server like https://rdap.org/, which keeps track of the IANA bootstrap files for you and leads you to the same authoritative server.

Querying with the RDAP Protocol

Unlike the WHOIS protocol, which is covered by a single RFC, RDAP is specified across multiple RFCs, including the following:

RFC Specs regarding RDAP

Fetching the Bootstrap Data

First, the bootstrap data for the relevant object type has to be fetched. The bootstrap data can be saved locally so it doesn’t have to be fetched on every query. The following bootstrap files exist:

IANA Bootstrap Files
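Saving the bootstrap data between queries could look like the sketch below. It is only one possible approach under stated assumptions: the function name load_bootstrap_cached, the cache directory layout, and the 24-hour default lifetime are all choices made for this example. The fetch argument is any function that takes the object type and returns the parsed JSON, such as the rdap_fetch_bootstrap function shown further down.

```python
import json
import time
from pathlib import Path
from typing import Callable


def load_bootstrap_cached(
    obj_type: str,
    fetch: Callable[[str], dict],
    cache_dir: Path,
    max_age_seconds: float = 24 * 60 * 60,  # arbitrary default for this sketch
) -> dict:
    """Return bootstrap data for obj_type, refetching only when the cached copy is stale."""
    cache_file = cache_dir / f"{obj_type}.json"
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age_seconds:
            # The cached copy is still fresh, so no network request is needed
            return json.loads(cache_file.read_text(encoding="utf-8"))
    data = fetch(obj_type)  # caller-supplied download function
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Calling it as load_bootstrap_cached("dns", rdap_fetch_bootstrap, Path("cache")) would download dns.json at most once per day and serve later queries from disk.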

Here’s a Python function I made to fetch bootstrap data for type asn, dns, ipv4, or ipv6.

import http.client
import urllib.parse
import json


def rdap_fetch_bootstrap(
    obj_type: str, bootstrap_server_base="https://data.iana.org/rdap", fetch_url=None
):
    if not fetch_url:
        fetch_url = "".join([bootstrap_server_base, "/", obj_type, ".json"])
    url_parse = urllib.parse.urlparse(fetch_url)
    https = url_parse.scheme == "https"
    # Pick the connection class that matches the URL scheme
    HTTP_SConn = http.client.HTTPSConnection if https else http.client.HTTPConnection
    # A port of None makes http.client fall back to the scheme's default port
    conn = HTTP_SConn(url_parse.hostname, url_parse.port)
    try:
        # Send HTTP/S request
        conn.request("GET", url_parse.path)
        res = conn.getresponse()
        if res.status != 200:
            raise Exception(f"Fetching {fetch_url} failed with HTTP status {res.status}")
        data = json.loads(res.read())
    finally:
        conn.close()
    return data

My full GitHub Gist (with example output): Query RDAP information using a server from IANA’s bootstrap file

In the bootstrap data, the services array holds one entry per RDAP service, and each entry is a pair: a list of values (TLDs, IP prefixes, or AS number ranges) followed by a list of base URLs. This structure might reduce file size, but it is not optimal for lookups: we have to loop through the whole array until the desired object is found. I developed this Python function to do that: it loops through the services array of the bootstrap data passed in the bootstrap parameter and checks whether each value matches the obj_query parameter.

import ipaddress


def rdap_find_in_bootstrap(obj_type: str, obj_query: str, bootstrap: dict):
    if obj_type == "asn":
        obj_int = int(obj_query)
        for service in bootstrap["services"]:
            for value in service[0]:
                # A value is either a single AS number or a "first-last" range
                bounds = value.split("-")
                first = int(bounds[0])
                last = int(bounds[-1])
                if first <= obj_int <= last:
                    return service[1]
    elif obj_type == "dns":
        # TLDs are case-insensitive; also ignore a trailing root dot
        tld = obj_query.rstrip(".").split(".")[-1].lower()
        for service in bootstrap["services"]:
            for value in service[0]:
                if value.lower() == tld:
                    return service[1]
    elif obj_type == "ipv4" or obj_type == "ipv6":
        query_ip = ipaddress.ip_address(obj_query)
        for service in bootstrap["services"]:
            for value in service[0]:
                if query_ip in ipaddress.ip_network(value):
                    return service[1]
    # No service matched the query
    return []

My full GitHub Gist (with example output): Query RDAP information using a server from IANA’s bootstrap file
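The linear scan in rdap_find_in_bootstrap could be avoided for domain lookups by building an index once and reusing it. The sketch below is one possible way to do that, assuming the dns bootstrap data has the shape the function above relies on (each service is a pair of a list of TLDs and a list of base URLs). The helper name index_dns_bootstrap and the sample data are invented for illustration.

```python
def index_dns_bootstrap(bootstrap: dict) -> dict:
    """Map each TLD in a dns bootstrap document to its list of RDAP base URLs."""
    index = {}
    for entries, urls in bootstrap["services"]:
        for tld in entries:
            index[tld.lower()] = urls
    return index


# Invented sample in the same shape as the bootstrap data: each service is
# a pair of [list of entries, list of base URLs].
sample_bootstrap = {
    "services": [
        [["com", "net"], ["https://rdap.registry-a.example/v1/"]],
        [["org"], ["https://rdap.registry-b.example/"]],
    ]
}

dns_index = index_dns_bootstrap(sample_bootstrap)
print(dns_index.get("net", []))  # ['https://rdap.registry-a.example/v1/']
```

Building the dictionary costs one pass over the array; every lookup after that is a single dictionary access. IP prefixes and AS number ranges would need a different structure, since they are matched by containment rather than equality.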

Sending RDAP Queries

An RDAP query can now be sent to one of the RDAP services returned by the rdap_find_in_bootstrap function. An HTTP GET request is made to a URL of the following pattern: {baseurl}/{type}/{object} (e.g. https://rdap.verisign.com/com/v1/domain/google.com). The type is one of ip, autnum, domain, nameserver, entity, or help, as specified in RFC 9082: Registration Data Access Protocol (RDAP) Query Format.
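The URL pattern can be sketched as a small helper. This is an illustration under assumptions, not part of any RDAP library: the name build_rdap_url is made up, and the choice to strip a trailing slash from the base URL and to percent-encode the object while leaving "/" and ":" untouched is a design decision for this example.

```python
import urllib.parse


def build_rdap_url(base_url: str, obj_type: str, obj_query: str) -> str:
    """Join an RDAP base URL, a path segment type, and the object being queried."""
    # Percent-encode the object but keep "/" (CIDR prefixes) and ":" (IPv6) readable.
    encoded = urllib.parse.quote(obj_query, safe="/:")
    # Drop a trailing slash on the base URL so the result has no double slash.
    return "/".join([base_url.rstrip("/"), obj_type, encoded])


print(build_rdap_url("https://rdap.verisign.com/com/v1/", "domain", "google.com"))
# https://rdap.verisign.com/com/v1/domain/google.com
print(build_rdap_url("https://rdap.org", "ip", "192.0.2.0/24"))
# https://rdap.org/ip/192.0.2.0/24
```

Normalizing the slash in one place means the same helper works whether or not the base URL ends with one.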

I wrote this Python function to send an RDAP request.

import http.client
import urllib.parse
import json


def rdap(
    obj_type: str,
    obj_query: str,
    rdap_server_base="https://rdap.org",
    redirects=0,
    query_url=None,
):
    if not query_url:
        # The query URL is built as {baseurl}/{type}/{object};
        # a trailing slash on the base URL is dropped first
        query_url = "/".join([rdap_server_base.rstrip("/"), obj_type, obj_query])
    url_parse = urllib.parse.urlparse(query_url)
    https = url_parse.scheme == "https"
    HTTP_SConn = http.client.HTTPSConnection if https else http.client.HTTPConnection
    # A port of None makes http.client fall back to the scheme's default port
    conn = HTTP_SConn(url_parse.hostname, url_parse.port)
    # Send HTTP/S request, asking for the RDAP JSON media type
    conn.request("GET", url_parse.path, headers={"Accept": "application/rdap+json"})
    res = conn.getresponse()
    body = res.read()
    conn.close()
    if res.status in (301, 302, 303, 307, 308):
        # Server knows of an RDAP service which is authoritative for the requested resource.
        # Follow the URL listed in the Location header (it may be relative to the current URL).
        location = urllib.parse.urljoin(query_url, res.headers["Location"])
        if redirects >= 10:
            # Limit max redirects to 10
            raise Exception("Too many redirects")
        elif location == query_url:
            # Prevent redirecting to the same URL
            raise Exception(f"Redirect loop at {query_url}")
        return rdap(obj_type, obj_query, redirects=redirects + 1, query_url=location)
    # Map the known error statuses to a message
    errors = {
        400: "Server received an invalid request (malformed path, unsupported object type, invalid IP address, etc).",
        404: "Server didn't know of an RDAP service which is authoritative for the requested resource.",
        429: "Exceeded the server's rate limit.",
        500: "Server is broken in some way.",
        504: "Server needed to refresh the IANA bootstrap registry, but couldn't.",
    }
    if res.status != 200:
        reason = errors.get(res.status, "Unexpected response.")
        raise Exception(f"HTTP {res.status}: {reason}")

    data = json.loads(body)
    return data

My full GitHub Gist (with example output): Query RDAP information using a server from IANA’s bootstrap file