IP ports and protocols

What are IP “ports” and how are they used?

A computer can receive messages from other systems for many different purposes.  For example, it could receive e-mail messages, requests for web pages, requests to open a remote desktop session, file transfers, requests for data stored in a database, etc.

But when a message enters the computer’s internal memory from its network hardware, how does the computer know what service is being requested?  The answer is that each message carries a 16-bit number, with a usable decimal value from 1 through 65,535 (port 0 is reserved), that identifies the “port” to which the source computer has directed the message.
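As a rough illustration of where that number lives, the sketch below reads the port fields from the first four bytes of a TCP or UDP header.  Both protocols begin their headers with a 16-bit source port followed by a 16-bit destination port.  The function name parse_ports and the sample bytes are made up for this example.

```python
import struct

def parse_ports(header: bytes):
    """Return (source_port, destination_port) from a TCP or UDP header."""
    # Both protocols start with two unsigned 16-bit fields in network
    # (big-endian) byte order: source port, then destination port.
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    return struct.unpack("!HH", header[:4])

# Hypothetical header bytes: source port 12345, destination port 443.
sample = struct.pack("!HH", 12345, 443)
print(parse_ports(sample))
```

Because each field is 16 bits wide, no port number can exceed 65,535.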

Destination port assignment

Now that the receiving computer knows the port number, how does it know what application program is supposed to process the incoming message?  When any application or service that processes specific types of messages is started on the computer, one of the tasks it performs is to “bind” to a specific port number.  For example, a web server application typically binds to ports 80 and 443, the ports associated with the “http:” and “https:” protocols.  Through the binding process, the application tells the networking software that all communications traffic with a destination port of 80 or 443 should be sent to the web server application.  But a web server is not required to bind to ports 80 and 443.  Any program can bind to any available port between 1 and 65,535 as long as no other application program or service has already bound to that port.
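The binding step can be sketched with Python's standard socket module.  This is only a minimal illustration: the helper name bind_twice is made up, and it binds to the loopback address on a port chosen by the operating system (port 0 means "any free port") so that it does not collide with a real service.

```python
import socket

def bind_twice(host="127.0.0.1"):
    """Bind one socket, then try to bind a second one to the same port.

    Returns (port, error), where error is the OSError raised by the
    second bind, or None if the second bind unexpectedly succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0 asks the OS for any free port
        first.listen()
        port = first.getsockname()[1]  # the port the OS actually assigned
        try:
            second.bind((host, port))  # same address and port as the first
        except OSError as exc:
            return port, exc           # typically "address already in use"
        return port, None
    finally:
        first.close()
        second.close()

port, error = bind_twice()
print(port, error)
```

The second bind is expected to fail because the first socket already owns that address and port.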

Well-known ports

To avoid network chaos, the port numbers of many standard services are predefined by convention.  For example, the e-mail protocol “smtp” is usually bound to port 25, and Windows systems use ports 137, 138, 139 and 445 for many of their network functions.  Usually, these predefined port numbers have a value between 1 and 1023, and are called “well-known” ports.  Most port values in this range are already assigned.  Again, remember that if someone would like to listen for e-mail on port 12345, he or she could configure the e-mail program to bind to port 12345, and the program could receive e-mail messages through that port if nothing else has bound to it already.  Of course, any sending e-mail server would need to know about the change; otherwise, the connection would fail.

The power of well-known ports

On Unix-like systems, an application program or service that binds to a well-known port has traditionally needed to start with administrative (root) privileges; Windows does not impose this restriction.  That implies that a service bound to a port below 1024, if it keeps those privileges and is compromised, could give the attacker full administrative control of the computer running it.  For this reason, some organizations do not run their web servers bound to port 80, but to a port above 1023, such as port 8080, since that does not require the web server application to start with full privileges.

Ports 1024 and above

While some services have standardized on certain port numbers over 1023, such as Oracle database software that uses 1521, most of the ports above 1023 can be used for any purpose that an application programmer requires.  So, if you write a program that is intended to listen for its own special message type you can bind it to most ports above the well-known range.
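The ranges described above can be summarized in a small helper.  This is a sketch only: the function name classify_port is made up, and the labels “registered” and “dynamic” follow the IANA convention for ports 1024 through 49,151 and 49,152 through 65,535 rather than anything defined in this article.

```python
def classify_port(port: int) -> str:
    """Label a port number by the range it falls in."""
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits (0 through 65535)")
    if port <= 1023:
        return "well-known"   # conventionally reserved for standard services
    if port <= 49151:
        return "registered"   # e.g. 1521 (Oracle), 8080 (alternate web port)
    return "dynamic"          # commonly used for short-lived source ports

for example in (25, 80, 1521, 8080, 50000):
    print(example, classify_port(example))
```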

What about source ports?

Source ports are important because they tell the destination computer where to send the response to the incoming message.  When a computer initiates a communications session with a destination device and service, its operating system assigns a source port, usually an unused port number from a range well above the well-known ports (often called an “ephemeral” port).  Some systems hand these out sequentially and others choose them at random.  The source port number is almost never the same as the destination port although, in theory, it could be.

If a source computer opens up a browser window and visits a web site, it will create a message with a header that includes a destination port of 80 (or 443) and a source port set to an available value above 1023, e.g., 12345.  When the destination computer responds to the source device, it builds the response’s message header with its own port number (80 or 443) as the source and a destination port equal to the source port in the original message, i.e., 12345.  If the user has three browser windows open on the same computer, he or she may be sending messages to the web servers using source ports 12345, 12346 and 12347, one for each window.  Each web server would then respond to the appropriate destination port (12345, 12346 or 12347) on the workstation.  When building firewall rules, which usually restrict traffic based upon source and destination IP address and destination port, keep in mind that variable source ports also become the destination ports of the reply traffic.
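The exchange of source and destination ports can be observed on a single machine.  The sketch below is an illustration under simplifying assumptions: the function name show_ports is made up, and both the “server” and the “client” are sockets on the loopback address, with the server on an OS-assigned port instead of 80 or 443.

```python
import socket

def show_ports(host="127.0.0.1"):
    """Connect a client to a local server and report the ports each side sees."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    conn = None
    try:
        server.bind((host, 0))             # stand-in for a service port
        server.listen(1)
        server_port = server.getsockname()[1]

        client.connect((host, server_port))  # the OS picks the source port
        conn, peer = server.accept()

        return {
            "server_port": server_port,
            "client_source_port": client.getsockname()[1],
            "client_destination_port": client.getpeername()[1],
            # The server replies to this port; it matches the client's source.
            "server_sees_source_port": peer[1],
        }
    finally:
        if conn is not None:
            conn.close()
        client.close()
        server.close()

print(show_ports())
```

The client never chooses its source port here; the operating system assigns one when connect is called.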

IP protocols

Each message not only identifies a port, and through it the associated application program or service, but also the protocol used to communicate.  Three protocols will be covered in this discussion:  TCP, UDP and ICMP.  TCP and UDP each have their own set of port numbers, so TCP port 53 and UDP port 53 are distinct; ICMP does not use port numbers at all.

Messages can be very large, but the amount of data that can be received by the destination computer at any one time must be limited to a certain buffer size.  As a result, complete messages are not always sent in one transmission, but are sent as a series of message segments called “packets”.  So, for this discussion, the term “message packet” will be used to refer to one of the message segments of the complete message.

The TCP protocol

TCP is a “guaranteed” protocol, i.e., the protocol has built-in mechanisms to detect damaged message packets and to ensure that no message packets have been dropped.  To protect the integrity of each message packet, the sending computer calculates a checksum over the message packet content and includes it in the message packet’s header.  The destination computer recalculates the checksum and, if the two values match, the integrity of the packet is confirmed.  Once confirmed, the destination computer acknowledges that the message packet was received correctly.  If the two values do not match, the destination computer discards the packet and sends no acknowledgment for it; when the acknowledgment fails to arrive in time, the sending computer transmits the packet again.  To ensure that all packets are received, the sending computer includes a sequence number in the message header.  The destination computer checks the sequence numbers to ensure that it has received them all.  Interestingly, message packets may not arrive in sequence as message routes are adjusted based upon traffic patterns, so the sequence checking is not as simple as just checking for the next number.
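The sequence checking can be sketched as follows.  This is a simplified model, not how an operating system implements TCP: the function name reassemble is made up, each packet is represented as a (sequence number, payload) pair where the sequence number is the byte offset of the payload, and details such as random initial sequence numbers and wraparound are ignored.

```python
def reassemble(segments, first_seq=0):
    """Rebuild a byte stream from packets that may arrive out of order.

    Returns (data, next_seq): the contiguous bytes recovered so far and
    the sequence number of the next byte still missing.
    """
    buffered = {}
    for seq, payload in segments:
        buffered.setdefault(seq, payload)   # ignore duplicate retransmissions

    data = bytearray()
    next_seq = first_seq
    while next_seq in buffered:             # stop at the first gap
        payload = buffered.pop(next_seq)
        data += payload
        next_seq += len(payload)
    return bytes(data), next_seq

# Packets arriving out of order are still reassembled correctly.
print(reassemble([(5, b" worl"), (0, b"hello"), (10, b"d")]))
# With the middle packet missing, only the first part can be delivered.
print(reassemble([(0, b"hello"), (10, b"d")]))
```

In the second call the receiver can tell exactly which byte it is still waiting for, which is the information an acknowledgment conveys back to the sender.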

The UDP protocol

The UDP protocol is not guaranteed:  some message packets may not arrive at the destination device.  Why would anyone want to use UDP?  TCP has a fairly significant amount of overhead associated with connection setup, acknowledgments, sequence checking, retransmission, etc.  If you have some kind of data collection service, the use of a guaranteed protocol like TCP could overwhelm the system and result in service disruption and data loss.  For example, if you are running a service that collects the continuously generated log data from all of your systems, the loss of a single packet every once in a while may not be much of an issue.  All the log collection service may be doing is receiving a packet, writing it to disk, receiving a packet, writing it out to disk, etc.  If the destination computer asked the source to repeat a packet, the source would be well beyond the errant log record, and searching for it may result in the loss of other messages.  So, UDP, even with its lack of a guarantee, can be very useful.
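A minimal sketch of the “send it and move on” style of UDP is shown below.  The function name send_log_line and the sample log text are made up for illustration, and both ends run on the loopback address, where loss is unlikely; on a real network the timeout branch is the case that matters.

```python
import socket

def send_log_line(line: bytes, host="127.0.0.1", timeout=2.0):
    """Send one datagram to a local receiver and return what arrived.

    Returns None if nothing arrives before the timeout; UDP itself will
    not notice the loss or resend the datagram.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((host, 0))       # OS-assigned port for the collector
        receiver.settimeout(timeout)
        # No connection setup and no acknowledgment: just send the datagram.
        sender.sendto(line, receiver.getsockname())
        try:
            data, _sender_address = receiver.recvfrom(65535)
        except socket.timeout:
            return None
        return data
    finally:
        sender.close()
        receiver.close()

print(send_log_line(b"disk usage at 91 percent"))
```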

The ICMP protocol

ICMP is basically a network health testing and error reporting protocol.  The most common function that uses the ICMP protocol is “ping”, a function whose purpose is to see if a test message can reach an intended destination and a response can be returned successfully.  No application data exchange is necessary.  It is just sending “Are you there?” out and receiving “yes” back, with timing to see how long the round trip takes.  A more sophisticated function is “traceroute”, which sends test messages that are allowed to travel only a limited number of hops; each routing device that discards one reports back with an ICMP message, revealing the identity of the routing devices along the way to the destination and the path that is taken.  This is a good function to find network loops and other inefficiencies.
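The “Are you there?” message is an ICMP echo request:  a type of 8, a code of 0, a 16-bit checksum, an identifier, a sequence number and an optional payload.  The sketch below only builds such a message in memory, since actually sending it usually requires a raw socket and elevated privileges.  The function names and the sample identifier, sequence number and payload are made up for illustration.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum used by ICMP (and by TCP and UDP)."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) as raw bytes."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
print(packet.hex())
# A receiver validates the message by checksumming all of it; 0 means intact.
print(internet_checksum(packet))
```

Note that there is no port number anywhere in the message; the identifier and sequence number are what let the sender match a reply to its request.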
