By Jeff Silverman, jeffsilverm at gmail dot com
I am doing this work as a classroom project for the University of Washington Python Programming class. Most of the material is on GitHub. To get it, run the following commands:
[ps37854]$ mkdir dpkt_doc
[ps37854]$ cd dpkt_doc
[ps37854]$ git init .
Initialized empty Git repository in /home/jeffsilverm/commercialventvac/dpkt_doc/.git/
[ps37854]$ git pull git://github.com/jeffsilverm/dpkt_doc.git
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From git://github.com/jeffsilverm/dpkt_doc
 * branch HEAD -> FETCH_HEAD
[ps37854]$
dpkt is an Ethernet packet decoding module. It was written by
Dug Song.
Fundamental to understanding how dpkt works is the fact that it
decodes single network packets. This has two consequences:
For example, if you are doing an HTTP GET operation, then the entire header will probably fit into 1500 bytes, which is the default maximum transmission unit (MTU) size of Ethernet and of most modern wide area networks as well. However, if you are doing an HTTP POST operation with a lot of data moving from the client browser to the web server, dpkt will be able to parse many of the headers but not all of them.
I have tried to solve these problems with software that sits on top of
dpkt's low-level interfaces. For example,
decode_tcp_iterator_2.py implements a crude TCP stack on top of dpkt.
jeffs@heavy:/usr$ find . -name "*dpkt*" -print
./share/doc/python-dpkt
./share/pyshared/dpkt
./share/pyshared/dpkt/dpkt.py
./share/pyshared/dpkt-1.6.egg-info
./share/python-support/python-dpkt.public
./lib/pymodules/python2.6/dpkt
./lib/pymodules/python2.6/dpkt/dpkt.py
./lib/pymodules/python2.6/dpkt/dpkt.pyc
./lib/pymodules/python2.6/dpkt-1.6.egg-info
ls /usr/share/pyshared/dpkt
ah.py dpkt.py icmp.py ntp.py rip.py stp.py
ah.pyc dpkt.pyc icmp.pyc ntp.pyc rip.pyc stp.pyc
aim.py dtp.py igmp.py ospf.py rpc.py stun.py
aim.pyc dtp.pyc igmp.pyc ospf.pyc rpc.pyc stun.pyc
arp.py esp.py __init__.py pcap.py rtp.py tcp.py
arp.pyc esp.pyc __init__.pyc pcap.pyc rtp.pyc tcp.pyc
asn1.py ethernet.py ip6.py pim.py rx.py telnet.py
asn1.pyc ethernet.pyc ip6.pyc pim.pyc rx.pyc telnet.pyc
bgp.py gre.py ip.py pmap.py sccp.py tftp.py
bgp.pyc gre.pyc ip.pyc pmap.pyc sccp.pyc tftp.pyc
cdp.py gzip.py ipx.py pppoe.py sctp.py tns.py
cdp.pyc gzip.pyc ipx.pyc pppoe.pyc sctp.pyc tns.pyc
crc32c.py h225.py loopback.py ppp.py sip.py tpkt.py
crc32c.pyc h225.pyc loopback.pyc ppp.pyc sip.pyc tpkt.pyc
dhcp.py hsrp.py mrt.py qq.py sll.py udp.py
dhcp.pyc hsrp.pyc mrt.pyc qq.pyc sll.pyc udp.pyc
diameter.py http.py netbios.py radius.py smb.py vrrp.py
diameter.pyc http.pyc netbios.pyc radius.pyc smb.pyc vrrp.pyc
dns.py icmp6.py netflow.py rfb.py ssl.py yahoo.py
dns.pyc icmp6.pyc netflow.pyc rfb.pyc ssl.pyc yahoo.pyc
This decodes an IPSEC authentication header. Insofar as I can
tell, dpkt.ah will only work for IPSEC over IPv4.
AOL instant messenger
Address resolution protocol. If the Ethernet packet has type ETH_TYPE_ARP, then a dpkt.ethernet.Ethernet object will have an arp attribute. Refer to decode_arp.py.
The length of the hardware address. For Ethernet, this is 6
bytes.
The operation. 1=request, 2=reply
The protocol address length. For IPv4, this is 4.
The upper layer protocol for which the ARP request is intended. For
IPv4, this has the value 0x0800. The permitted values share a
numbering space with those for ethertype.
The source hardware address (SHA). This should be the same as the
source Ethernet address, but it doesn't have to be. If the SHA and
the Ethernet source address are different, then you might suspect that
somebody is trying to poison your ARP cache. However, there are
legitimate reasons why they might be different, most of which are
anachronisms these days. Conversely, just because the source Ethernet
address and the SHA are the same doesn't mean that nobody is trying to
poison your ARP cache.
The source protocol address. This will usually be an IPv4
address. You will never see an IPv6 address here, because IPv6
uses neighbor discovery protocol.
The target hardware address. In an ARP request, this will be
0. In an ARP reply, it should be the hardware address of the host
that sent the request, which is normally the Ethernet destination
address of the reply, but it doesn't have to be. You don't see this
very often any more, but there used to be ARP servers which would
supply ARP replies for hosts too dumb to provide their own ARP
replies. You can also use this field to poison an ARP cache.
The target protocol address. This is the address that ARP is going to translate into a MAC address. The IPv4 hosts in the network listen for ARP requests, which are sent to the broadcast Ethernet address. When a host sees its own IP address in the tpa field, it knows it has to reply with an ARP reply.
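The ARP fields described above can be unpacked by hand, which may help when checking what dpkt reports. The following is a minimal sketch using only the standard library, not dpkt's own decoder. The helper name parse_arp and the hand-built sample packet are hypothetical; the field names (hrd, pro, hln, pln, op, sha, spa, tha, tpa) are assumed to follow the usual ARP abbreviations.

```python
import binascii
import socket
import struct

# Fixed part of an ARP packet: hardware type, protocol type,
# hardware address length, protocol address length, operation.
ARP_FIXED = struct.Struct("!HHBBH")

def parse_arp(buf):
    """Parse a raw ARP packet (without the Ethernet header) into a dict.

    Hypothetical helper for illustration; it is not part of dpkt.
    """
    hrd, pro, hln, pln, op = ARP_FIXED.unpack_from(buf, 0)
    fields = {"hrd": hrd, "pro": pro, "hln": hln, "pln": pln, "op": op}
    offset = ARP_FIXED.size
    # The four addresses follow in the order sha, spa, tha, tpa.
    # Their lengths are given by hln (hardware) and pln (protocol).
    for name, length in (("sha", hln), ("spa", pln), ("tha", hln), ("tpa", pln)):
        fields[name] = buf[offset:offset + length]
        offset += length
    return fields

# A hand-built ARP request: who has 192.168.1.1? Tell 192.168.1.10.
sample = (
    struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    + binascii.unhexlify("001122334455")   # sha
    + socket.inet_aton("192.168.1.10")     # spa
    + b"\x00" * 6                          # tha is 0 in a request
    + socket.inet_aton("192.168.1.1")      # tpa
)
arp = parse_arp(sample)
print("op=%d tpa=%s" % (arp["op"], socket.inet_ntoa(arp["tpa"])))
```

Reading hln and pln from the packet instead of hard-coding 6 and 4 keeps the sketch correct for hardware or protocol types other than Ethernet and IPv4.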
In general, DNS runs on top of UDP. DNS can also run on top of TCP,
and will do so if the response is too large to fit in a UDP datagram or
if there is a zone transfer. Many of the values in these fields come
from IANA.
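Insofar as I can tell, dpkt doesn't provide the AD bit. The AD
(Authentic Data) bit is set if the server has authenticated the data in
this response according to the policies of the server. It is distinct
from the AA bit, which marks an authoritative answer. A caching name
server is generally not authoritative.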
Name servers return answers that have character values outside of
the range 33 to 126 decimal inclusive. I don't know why.
Both dig and wireshark are able to decipher the strings properly.
A list of answers. Each answer is an RR record. For a
DNS query, this is an empty list. To get the results from the DNS
query, you have to iterate over the list and decode each RR record.
Class. For modern resolvers, this should always be 1.
There were other classes defined in RFC 1035, but nobody uses them
anymore.
This is the canonical name of the thing being looked up. The
response will be applicable to this cname.
The name that was looked up, before translation to a canonical name,
if any.
This is either an Authority Record or an Additional Record.
qr is 0 if this message is a query and 1 if this message is a response.
The ID field is set by the DNS resolver when it makes a query.
The ID field is also set by the nameserver when it responds to the
query. That way, the resolver knows which query the response is
in response to.
A list of name servers for this domain. To decode this list,
iterate over each entry in the list and decode as in dpkt.dns.an.
The opcode is 0 for a standard query (QUERY), 1 for an inverse query
(IQUERY, obsoleted by RFC 3425), and 2 for a server status
request (STATUS). 3 is unassigned, 4 is Notify (RFC 1996), and 5 is
Update (RFC 2136).
qr is 0 if this message is a query and 1 if this message is a
response.
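The qr, opcode, and id fields above all live in the fixed 12-byte DNS header, so they can be checked by unpacking that header directly. This is a sketch under the assumption that the bit layout follows RFC 1035 section 4.1.1, with the AD bit in the position later assigned by the DNSSEC RFCs. The function name parse_dns_header is hypothetical and is not dpkt's API.

```python
import struct

def parse_dns_header(buf):
    """Decode the fixed 12-byte DNS header (hypothetical helper, not dpkt)."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack_from(
        "!HHHHHH", buf, 0)
    return {
        "id": ident,                    # copied from the query into the response
        "qr": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = QUERY, 2 = STATUS, 4 = Notify, 5 = Update
        "aa": (flags >> 10) & 0x1,      # authoritative answer
        "ad": (flags >> 5) & 0x1,       # authentic data
        "rcode": flags & 0xF,           # response code, 0 = no error
        "counts": (qdcount, ancount, nscount, arcount),
    }

# A made-up response header: id 0x1234, flags 0x8180 (a response with
# recursion desired and recursion available), 1 question, 2 answers.
sample_header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 2, 0, 0)
dns_header = parse_dns_header(sample_header)
print("id=%#x qr=%d opcode=%d" % (dns_header["id"], dns_header["qr"],
                                  dns_header["opcode"]))
```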
Contains the data payload of the ethernet packet.
Contains the destination address of the Ethernet packet as a 6 byte string.
6 byte Ethernet addresses can be converted to strings in the format nn:nn:nn:nn:nn:nn with the code:
import binascii

def add_colons_to_mac(mac_addr):
    """This function accepts a 12 hex digit string and converts it to a colon separated string"""
    s = list()
    for i in range(12 // 2):  # mac_addr should always be 12 chars, we work in groups of 2 chars
        s.append(mac_addr[i*2:i*2+2])
    r = ":".join(s)  # I know this looks strange, refer to http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
    return r

add_colons_to_mac(binascii.hexlify(arp.sha))
Returns the class that dpkt uses to decode the payload for a given value of the Ethernet type field. For example:
(Pdb) print eth._typesw.keys()
[2048, 8192, 34916, 2054, 34827, 33079, 8196, 34525]
(Pdb) print eth._typesw.values()
[<class 'dpkt.ip.IP'>, <class 'dpkt.cdp.CDP'>, <class
'dpkt.pppoe.PPPoE'>, <class 'dpkt.arp.ARP'>, <class
'dpkt.ppp.PPP'>, <class 'dpkt.ipx.IPX'>, <class
'dpkt.dtp.DTP'>, <class 'dpkt.ip6.IP6'>]
(Pdb) print eth.get_type(2048)
<class 'dpkt.ip.IP'>
(Pdb) print eth.get_type(34525)
<class 'dpkt.ip6.IP6'>
(Pdb)
Another way to do the same thing is:
import dpkt
eth = dpkt.ethernet.Ethernet()
print eth._typesw[2048]
<class 'dpkt.ip.IP'>
print eth._typesw[34525]
<class 'dpkt.ip6.IP6'>
print eth._typesw[33079]
<class 'dpkt.ipx.IPX'>
Contains the source address of the ethernet packet as a 6 byte
string. To decode to ASCII, see dst,
above.
Returns the Ethernet type. For example, type 2048 (0x0800) is
IPv4 and 34525 (0x86DD) is IPv6. For a complete list of Ethernet
types, refer to http://www.iana.org/assignments/ethernet-numbers
To get a list of ethernet types that are supported by dpkt, refer to
the code at get_type.
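To make the dst, src, and type fields concrete, here is a minimal sketch that unpacks the 14-byte Ethernet header with the standard library. The names parse_ethernet, mac_to_str, and ETH_TYPE_NAMES are hypothetical and the sample frame is made up. The sketch assumes an untagged frame, so it does not handle 802.1Q VLAN tags.

```python
import binascii
import struct

# A few well-known Ethernet types, taken from the IANA list.
ETH_TYPE_NAMES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}

def parse_ethernet(frame):
    """Split an untagged Ethernet frame into (dst, src, type, payload)."""
    dst, src, eth_type = struct.unpack_from("!6s6sH", frame, 0)
    return dst, src, eth_type, frame[14:]

def mac_to_str(mac):
    """Format a 6 byte address as nn:nn:nn:nn:nn:nn."""
    return ":".join("%02x" % b for b in bytearray(mac))

# A made-up frame: broadcast destination, Ethernet type 0x86DD (IPv6).
sample_frame = (
    b"\xff" * 6
    + binascii.unhexlify("001122334455")
    + struct.pack("!H", 0x86DD)
    + b"payload"
)
dst, src, eth_type, payload = parse_ethernet(sample_frame)
print("%s -> %s type %d (%s)" % (mac_to_str(src), mac_to_str(dst), eth_type,
                                 ETH_TYPE_NAMES.get(eth_type, "unknown")))
```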
This is spanning tree protocol.
Objects for decoding HTTP. The message going from the client
browser to the server is the request, the message going from the server
to the client browser is the response.
Create a dpkt.http.Response object from the received string. The
received string probably will include the body of the response and may
be quite large. Use a module such as decode_tcp_iterator_2 to
combine the payloads of several packets into a single received string.
This is the numeric code that the server returns to the browser
which describes the outcome of the request. Values in the range
of 1xx are informational, values in the range of 2xx are success, values
in the range of 3xx are redirections, values in the range of 4xx are
errors from the client, and values in the range of 5xx are errors in the
server. The status is just 3 digits. An explanation of the status
is returned in dpkt.http.Response.reason. The full
list of status codes is given in RFC 2616 section 10.
The reason is a brief explanation of the status code.
According to RFC 2616 section 6.1.1, the reason is for human use only
and has no significance to the protocol.
The body is the payload of the HTTP response. Its type is
given by the value of the Content-Type header, which is a MIME type as
described in RFC 2045, RFC 2046, and RFC 2047. The
list of MIME types is maintained by the
Internet Assigned Numbers Authority (IANA).
headers is a dictionary of the headers of the reply. The keys
are the headers that are present, the values are the values of those
headers. The list of allowed headers is given by RFC 2616
section 14.
Create a dpkt.http.Request object from the received string.
If the HTTP request method is POST, then this attribute will contain
the input that is passed to the server. If the method is GET,
then this attribute will be an empty string.
This is a dictionary. The keys of the dictionary are the
header fields that are present and the values of the dictionary are the
values of the field. For example:
(Pdb) print http_req.headers
{'host': 'www.kame.net', 'connection': 'Keep-Alive', 'accept': '*/*', 'user-agent': 'Wget/1.12 (linux-gnu)'}
(Pdb)
When making an HTTP request, the client browser acts on the object
with a method. The most common methods are GET and POST. The
difference between GET and POST has to do with how arguments are passed
from the client to the server. With a GET, the arguments are
passed in the query string of the URI, separated by & characters. With
a POST, the values are typically passed as key value pairs in the body
of the request. Refer to RFC 2616 section 9.1.2 for a discussion of
methods and idempotence. For a list of valid methods, refer to RFC 2616
section 5.1.
The URI (Uniform Resource Identifier) is the part of the URL
(Uniform Resource Locator) which comes after the hostname. So,
for example, the URI of the URL http://www.jeffsilverman.ddns.net/dpkt.html
is /dpkt.html. The URI of the URL http://www.jeffsilverman.ddns.net
is /
The version of HTTP. Valid values (as of this writing) are
0.9, 1.0, and 1.1
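The relationship between the method, URI, version, headers, and body of a request can be seen by splitting a complete request by hand. The sketch below is an illustration only and is not how dpkt parses HTTP. The function name parse_http_request is hypothetical, the two sample requests are made up, and the input is assumed to be one complete, already reassembled request.

```python
def parse_http_request(raw):
    """Split a complete HTTP/1.x request into its parts.

    Returns (method, uri, version, headers, body). Header names are
    lower-cased, as in the headers example shown earlier in this section.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The request line looks like: GET /dpkt.html HTTP/1.1
    method, uri, proto = lines[0].split(" ", 2)
    version = proto.split("/", 1)[1]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body

# With GET, the arguments travel in the URI.
raw_get = (b"GET /dpkt.html?a=1&b=2 HTTP/1.1\r\n"
           b"Host: www.jeffsilverman.ddns.net\r\n"
           b"Accept: */*\r\n"
           b"\r\n")
# With POST, the arguments travel in the body.
raw_post = (b"POST /dpkt.html HTTP/1.1\r\n"
            b"Host: www.jeffsilverman.ddns.net\r\n"
            b"Content-Type: application/x-www-form-urlencoded\r\n"
            b"Content-Length: 7\r\n"
            b"\r\n"
            b"a=1&b=2")

get_parts = parse_http_request(raw_get)
post_parts = parse_http_request(raw_post)
print("%s %s body=%r" % (get_parts[0], get_parts[1], get_parts[4]))
print("%s %s body=%r" % (post_parts[0], post_parts[1], post_parts[4]))
```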
'data', 'dst', 'get_proto', 'hl', 'id', 'len', 'off', 'opts', 'p', 'pack', 'pack_hdr', 'set_proto', 'src', 'sum', 'tcp', 'tos', 'ttl', 'unpack', 'v', 'v_hl'
This constructor can be used to create a packet.
from dpkt.udp import UDP
from dpkt.ip import IP
import socket
udp = UDP(data="testing")
src=socket.inet_aton("127.0.0.1")
dst=socket.inet_aton("67.205.52.141")
ip = IP(src=src, dst=dst, data=udp )
ip
IP(src='\x7f\x00\x00\x01', dst='C\xcd4\x8d', data=UDP(data='testing'))
This is the payload of the IP packet
The destination IPv4 address of the packet. You can convert
the destination IP address to an ASCII string in dotted quad format
using the socket package:
import socket
dst_ip_addr_str = socket.inet_ntoa(ip.dst)
Internet Header Length is the length of the internet header in 32 bit words, and thus points to the beginning of the data. Note that the minimum value for a correct header is 5.
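The header length only makes sense once it is split out of the first byte of the header, which also carries the version. As a minimal sketch, assuming the standard IPv4 header layout from RFC 791, the following unpacks the fixed 20 bytes and uses the header length to find the options and the payload. The helper name parse_ipv4_header is hypothetical; the dictionary keys mirror the attribute names listed above.

```python
import socket
import struct

def parse_ipv4_header(buf):
    """Unpack an IPv4 header by hand (hypothetical helper, not dpkt)."""
    (v_hl, tos, length, ident, off, ttl, proto, checksum,
     src, dst) = struct.unpack_from("!BBHHHBBH4s4s", buf, 0)
    version = v_hl >> 4   # high 4 bits of the first byte
    hl = v_hl & 0x0F      # low 4 bits: header length in 32 bit words
    if hl < 5:
        raise ValueError("header length %d is less than the minimum of 5" % hl)
    header_len = hl * 4   # convert 32 bit words to bytes
    return {
        "v": version, "hl": hl, "tos": tos, "len": length, "id": ident,
        "off": off, "ttl": ttl, "p": proto, "sum": checksum,
        "src": src, "dst": dst,
        "opts": buf[20:header_len],      # empty when hl == 5
        "data": buf[header_len:length],  # payload starts where the header ends
    }

# A made-up packet: 20 byte header with no options, then 7 bytes of payload.
sample_packet = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 27, 0, 0, 64, 17, 0,
    socket.inet_aton("127.0.0.1"), socket.inet_aton("67.205.52.141"),
) + b"testing"
ipv4 = parse_ipv4_header(sample_packet)
print("v=%d hl=%d dst=%s data=%r" % (ipv4["v"], ipv4["hl"],
                                     socket.inet_ntoa(ipv4["dst"]), ipv4["data"]))
```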
This is the payload of the packet. Depending on ip6.nxt, this
will be UDP, TCP, or similar. dpkt magically casts this into the
proper datatype.
This is the destination IPv6 address, 128 bits. To create a
packed IPv6 address from an ASCII string:
import socket
dst_addr = socket.inet_pton(socket.AF_INET6, "2001:1938:26f:1:204:4bff:0:1")
IPv6 defines an optimization called a "flow". If a router sees
a packet with a non-zero flow for the first time, it makes its routing
decision and stores that decision in a fast hash table. Then when
subsequent packets come by with the same flow, the router can make its
routing decision faster. The flow is initialized by a random
number generator on the host that is originating the connection.
It can be left 0.
The hop limit. Each time the packet hops from router to
router, this field is decremented by 1. When the hlim reaches 0,
the packet is discarded, and an ICMP6 type 3 packet is sent to the
sender. tracepath6 uses this field to probe the path to a
destination. It sends a packet with a small hlim. When the
router decrements the hlim to zero and sends back the ICMPv6 packet to
the sender, tracepath6 records the source address of the ICMPv6 packet
and knows the IP address of the router.
The next header type. If there are no extension headers, this is the
protocol of the next level in the stack. Typical values are 6 for
TCP, 17 for UDP, 58 for ICMPv6, 132 for SCTP. For a list of
protocols, see http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xml
The payload length, not counting the IPv6 header. This is a 16
bit unsigned number.
The source IPv6 address, which is 128 bits long. To decode the
IPv6 address into an ASCII string comprehensible by humans, use the
socket.inet_ntop method:
import socket
import dpkt
# buf is a complete Ethernet frame, for example one returned by dpkt.pcap.Reader
eth = dpkt.ethernet.Ethernet(buf)
ip = eth.data
src_ip_addr_str = socket.inet_ntop(socket.AF_INET6, ip.src)
print(src_ip_addr_str)
This will give an output that looks like: 2001:1938:26f:1:204:4bff:0:1
This is the version of IP, which must be 6. If the ethernet
type is 34525 and this is not 6, then throw an exception because
something is wrong.
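The IPv6 fields described in this section all sit in a fixed 40 byte header, so they can be checked by hand. The following sketch assumes the layout from RFC 8200 and applies the version check suggested above. The helper name parse_ipv6_header and the sample values are hypothetical; the keys are assumed to match the attribute names used in this section.

```python
import socket
import struct

def parse_ipv6_header(buf):
    """Unpack the fixed 40 byte IPv6 header (hypothetical helper, not dpkt)."""
    first_word, plen, nxt, hlim, src, dst = struct.unpack_from(
        "!IHBB16s16s", buf, 0)
    version = first_word >> 28       # top 4 bits
    flow = first_word & 0xFFFFF      # bottom 20 bits
    if version != 6:
        raise ValueError("expected IP version 6, got %d" % version)
    return {"v": version, "flow": flow, "plen": plen, "nxt": nxt,
            "hlim": hlim, "src": src, "dst": dst, "data": buf[40:40 + plen]}

# A made-up header: flow 0x12345, 8 bytes of payload, next header 17 (UDP).
sample_dst = socket.inet_pton(socket.AF_INET6, "2001:1938:26f:1:204:4bff:0:1")
sample_ip6 = struct.pack(
    "!IHBB16s16s", (6 << 28) | 0x12345, 8, 17, 64,
    socket.inet_pton(socket.AF_INET6, "::1"), sample_dst,
) + b"\x00" * 8
ip6 = parse_ipv6_header(sample_ip6)
print("flow=%#x nxt=%d hlim=%d dst=%s" % (
    ip6["flow"], ip6["nxt"], ip6["hlim"],
    socket.inet_ntop(socket.AF_INET6, ip6["dst"])))
```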
dpkt.pcap.Reader(f) implements an iterator. Each iteration
returns a tuple which is a timestamp and a buffer. The timestamp
contains a time as a floating point number. The buffer is a
complete packet. For example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import dpkt
import sys

f = open(sys.argv[1], "rb")  # pcap files are binary
pcap = dpkt.pcap.Reader(f)
frame_counter = 0
for ts, buf in pcap:
    frame_counter += 1
    if frame_counter > 1:
        # frame number, timestamp, and time since the previous frame
        print("%d: %f %f" % (frame_counter, ts, ts - last_time))
    last_time = ts
f.close()
TCP is a reliable, stream oriented protocol. Refer to RFC 793 for more information.
This is the sequence number of the next byte that the receiver expects
to receive. Every byte before it has been received, so the
sender knows that it need not resend any bytes prior to this point.
The payload of this TCP segment. Note that the segments may be
delivered out of order, so it is not sufficient to simply join the
payloads together. The maximum length of a TCP segment payload is
the MTU size of this network less the length of the IP header less the
length of a TCP header with no options. For example, Ethernet has
an MTU of 1500 bytes (most modern networks use this size to avoid
fragmentation on all links; for IPv6, every link must support an MTU
of at least 1280 bytes, but the MTU may be larger). The IPv4 header
is 20 bytes, minimum. An IPv6 header is 40 bytes. The TCP header is 20
bytes, so the maximum length of a TCP segment payload is 1460 bytes
for IPv4 and 1440 bytes for IPv6.
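The arithmetic above can be written out as a short calculation. This is only a worked example of the subtraction described in the text. The function name max_tcp_payload and the constant names are hypothetical, and the sketch assumes headers with no IP or TCP options.

```python
ETHERNET_MTU = 1500  # bytes
IPV4_HEADER = 20     # bytes, minimum (no IP options)
IPV6_HEADER = 40     # bytes, fixed header only
TCP_HEADER = 20      # bytes, no TCP options

def max_tcp_payload(mtu, ip_header_len, tcp_header_len=TCP_HEADER):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - ip_header_len - tcp_header_len

payload_v4 = max_tcp_payload(ETHERNET_MTU, IPV4_HEADER)  # 1500 - 20 - 20
payload_v6 = max_tcp_payload(ETHERNET_MTU, IPV6_HEADER)  # 1500 - 40 - 20
print("IPv4: %d bytes, IPv6: %d bytes" % (payload_v4, payload_v6))
```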
The destination port of the packet, a 16 bit unsigned number.
For a list of well known destination ports, refer to http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers.
If the SYN flag is set and the ACK flag is cleared, then this packet is
the beginning of a connection and you may use the dport to determine
what service is being used and how to decode the conversation.
The TCP flags. To decode them, use the following code:
fin_flag = ( tcp.flags & dpkt.tcp.TH_FIN ) != 0
syn_flag = ( tcp.flags & dpkt.tcp.TH_SYN ) != 0
rst_flag = ( tcp.flags & dpkt.tcp.TH_RST ) != 0
psh_flag = ( tcp.flags & dpkt.tcp.TH_PUSH) != 0
ack_flag = ( tcp.flags & dpkt.tcp.TH_ACK ) != 0
urg_flag = ( tcp.flags & dpkt.tcp.TH_URG ) != 0
ece_flag = ( tcp.flags & dpkt.tcp.TH_ECE ) != 0
cwr_flag = ( tcp.flags & dpkt.tcp.TH_CWR ) != 0
4 bits. Specifies the size of the TCP header in 32 bit longwords. The minimum size of a TCP header is 20 bytes (5 longwords) and the maximum is 60 bytes (15 longwords). This field gets its name because it is the offset from the beginning of the header to the data.
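To show how the data offset and the flags fit together, here is a minimal sketch that unpacks the fixed 20 byte TCP header by hand, assuming the layout from RFC 793. The helper name parse_tcp_header and the sample segment are hypothetical. The flag constants are defined locally so that the sketch runs without dpkt; their values are the standard TCP flag bits.

```python
import struct

# Standard TCP flag bits, defined locally for this sketch.
TH_FIN, TH_SYN, TH_RST, TH_PUSH = 0x01, 0x02, 0x04, 0x08
TH_ACK, TH_URG, TH_ECE, TH_CWR = 0x10, 0x20, 0x40, 0x80

def parse_tcp_header(buf):
    """Unpack a TCP header by hand (hypothetical helper, not dpkt)."""
    (sport, dport, seq, ack, off_x2, flags, win, checksum,
     urp) = struct.unpack_from("!HHIIBBHHH", buf, 0)
    off = off_x2 >> 4     # data offset: high 4 bits, in 32 bit longwords
    header_len = off * 4  # between 20 and 60 bytes
    return {
        "sport": sport, "dport": dport, "seq": seq, "ack": ack,
        "off": off, "flags": flags, "win": win, "sum": checksum, "urp": urp,
        "opts": buf[20:header_len],  # options fill the rest of the header
        "data": buf[header_len:],    # payload starts at the data offset
    }

# A made-up SYN segment to port 80 with one option (MSS = 1460), so the
# data offset is 6 longwords (24 bytes) and there is no payload.
sample_syn = struct.pack(
    "!HHIIBBHHH", 49152, 80, 1000, 0, 6 << 4, TH_SYN, 65535, 0, 0,
) + b"\x02\x04\x05\xb4"
tcp = parse_tcp_header(sample_syn)
is_new_connection = bool(tcp["flags"] & TH_SYN) and not (tcp["flags"] & TH_ACK)
print("dport=%d off=%d new_connection=%s" % (tcp["dport"], tcp["off"],
                                             is_new_connection))
```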
This method parses the TCP options field into a list of (option
number, option value) tuples. Use the code sequence
option_list = dpkt.tcp.parse_opts ( tcp.opts )
to decode the options. For a fully worked example, see the
get_message_segment_size method in decode_tcp_iterator_2.py.
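For readers who want to see what parsing the options field involves, the following is a rough sketch of an option walker written from the option format in RFC 793. It is not dpkt's implementation. The function name parse_tcp_options and the sample bytes are hypothetical, and the sketch simply stops when it meets a truncated or malformed option.

```python
import struct

TCP_OPT_EOL = 0  # end of option list, a single byte
TCP_OPT_NOP = 1  # no operation (padding), a single byte
TCP_OPT_MSS = 2  # maximum segment size, 2 byte value

def parse_tcp_options(opts):
    """Return a list of (option number, option value) tuples."""
    result = []
    data = bytearray(opts)
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == TCP_OPT_EOL:
            break  # nothing meaningful follows
        if kind == TCP_OPT_NOP:
            result.append((kind, b""))
            i += 1
            continue
        # Every other option is kind, length, value; the length counts
        # the kind and length bytes themselves.
        if i + 1 >= len(data):
            break  # truncated option
        length = data[i + 1]
        if length < 2 or i + length > len(data):
            break  # malformed option
        result.append((kind, bytes(data[i + 2:i + length])))
        i += length
    return result

# Made-up options: MSS 1460, a NOP, then window scale (kind 3) of 7.
sample_opts = b"\x02\x04\x05\xb4\x01\x03\x03\x07"
option_list = parse_tcp_options(sample_opts)
mss = [struct.unpack("!H", value)[0]
       for kind, value in option_list if kind == TCP_OPT_MSS]
print("options=%r mss=%r" % (option_list, mss))
```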
A 32 bit unsigned number. If the SYN flag is set, then this is
the initial sequence number, and all subsequent sequence numbers are
relative to this number. If the SYN flag is clear, then this
number is the location of the payload of this segment in the data
stream. Note that segments can be delivered out of order, so the
sequence number is used to get the bytes back in order.
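Putting segments back in order by sequence number can be sketched as follows. This is a deliberately simple illustration and is not the approach taken in decode_tcp_iterator_2.py. The function name reassemble and the sample segments are hypothetical. The sketch assumes all segments come from one direction of one connection, that retransmissions repeat the same segment boundaries, and that it is acceptable to stop at the first gap.

```python
def reassemble(segments, isn):
    """Join (seq, payload) pairs into a byte stream.

    segments: iterable of (sequence number, payload) in arrival order.
    isn: the initial sequence number taken from the SYN segment.
    """
    # The SYN itself consumes one sequence number, so data starts at isn + 1.
    next_seq = (isn + 1) & 0xFFFFFFFF
    pending = {}
    for seq, payload in segments:
        if payload:  # ignore pure ACKs, which carry no data
            pending.setdefault(seq, payload)  # keep the first copy of a retransmission
    stream = bytearray()
    while next_seq in pending:
        payload = pending.pop(next_seq)
        stream += payload
        # Sequence numbers are 32 bit and wrap around.
        next_seq = (next_seq + len(payload)) & 0xFFFFFFFF
    return bytes(stream)

# Made-up segments arriving out of order, with one retransmission.
arrived = [(1006, b"world"), (1001, b"hello"), (1001, b"hello"), (1011, b"!")]
stream = reassemble(arrived, isn=1000)
print(stream)
```

Stopping at the first gap means a missing segment truncates the result rather than silently producing a stream with a hole in it.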
The source port, a 16 bit unsigned number. On the initial
connection, this will be an ephemeral port in the range 49152 through
65535 for most modern operating systems.
A 16 bit unsigned number. Specifies how many bytes past the acknowledgement number in this segment the receiver is willing to receive.
There are a lot of classes that inherit from dpkt.Packet. I
found these with the command egrep "class.*dpkt.Packet" *.py
ah.py:class AH(dpkt.Packet):
aim.py:class FLAP(dpkt.Packet):
aim.py:class SNAC(dpkt.Packet):
arp.py:class ARP(dpkt.Packet):
bgp.py:class BGP(dpkt.Packet):
bgp.py: class Open(dpkt.Packet):
bgp.py: class Parameter(dpkt.Packet):
bgp.py: class Authentication(dpkt.Packet):
bgp.py: class Capability(dpkt.Packet):
bgp.py: class Update(dpkt.Packet):
bgp.py: class Attribute(dpkt.Packet):
bgp.py: class Origin(dpkt.Packet):
bgp.py: class ASPath(dpkt.Packet):
bgp.py: class ASPathSegment(dpkt.Packet):
bgp.py: class NextHop(dpkt.Packet):
bgp.py: class MultiExitDisc(dpkt.Packet):
bgp.py: class LocalPref(dpkt.Packet):
bgp.py: class AtomicAggregate(dpkt.Packet):
bgp.py: class Aggregator(dpkt.Packet):
bgp.py: class Communities(dpkt.Packet):
bgp.py: class Community(dpkt.Packet):
bgp.py: class ReservedCommunity(dpkt.Packet):
bgp.py: class OriginatorID(dpkt.Packet):
bgp.py: class ClusterList(dpkt.Packet):
bgp.py: class MPReachNLRI(dpkt.Packet):
bgp.py: class MPUnreachNLRI(dpkt.Packet):
bgp.py: class Notification(dpkt.Packet):
bgp.py: class Keepalive(dpkt.Packet):
bgp.py: class RouteRefresh(dpkt.Packet):
bgp.py:class RouteGeneric(dpkt.Packet):
bgp.py:class RouteIPV4(dpkt.Packet):
bgp.py:class RouteIPV6(dpkt.Packet):
cdp.py:class CDP(dpkt.Packet):
cdp.py: class Address(dpkt.Packet):
cdp.py: class TLV(dpkt.Packet):
dhcp.py:class DHCP(dpkt.Packet):
diameter.py:class Diameter(dpkt.Packet):
diameter.py:class AVP(dpkt.Packet):
dns.py:class DNS(dpkt.Packet):
dns.py: class Q(dpkt.Packet):
dtp.py:class DTP(dpkt.Packet):
esp.py:class ESP(dpkt.Packet):
ethernet.py:class Ethernet(dpkt.Packet):
gre.py:class GRE(dpkt.Packet):
gre.py: class SRE(dpkt.Packet):
gzip.py:class GzipExtra(dpkt.Packet):
gzip.py:class Gzip(dpkt.Packet):
h225.py:class H225(dpkt.Packet):
h225.py: class IE(dpkt.Packet):
hsrp.py:class HSRP(dpkt.Packet):
http.py:class Message(dpkt.Packet):
icmp6.py:class ICMP6(dpkt.Packet):
icmp6.py: class Error(dpkt.Packet):
icmp6.py: class Echo(dpkt.Packet):
icmp.py:class ICMP(dpkt.Packet):
icmp.py: class Echo(dpkt.Packet):
icmp.py: class Quote(dpkt.Packet):
igmp.py:class IGMP(dpkt.Packet):
ip6.py:class IP6(dpkt.Packet):
ip.py:class IP(dpkt.Packet):
ipx.py:class IPX(dpkt.Packet):
loopback.py:class Loopback(dpkt.Packet):
mrt.py:class MRTHeader(dpkt.Packet):
mrt.py:class TableDump(dpkt.Packet):
mrt.py:class BGP4MPMessage(dpkt.Packet):
mrt.py:class BGP4MPMessage_32(dpkt.Packet):
netbios.py:class Session(dpkt.Packet):
netbios.py:class Datagram(dpkt.Packet):
netflow.py:class NetflowBase(dpkt.Packet):
netflow.py: class NetflowRecordBase(dpkt.Packet):
ntp.py:class NTP(dpkt.Packet):
ospf.py:class OSPF(dpkt.Packet):
pcap.py:class PktHdr(dpkt.Packet):
pcap.py:class FileHdr(dpkt.Packet):
pim.py:class PIM(dpkt.Packet):
pmap.py:class Pmap(dpkt.Packet):
pppoe.py:class PPPoE(dpkt.Packet):
ppp.py:class PPP(dpkt.Packet):
radius.py:class RADIUS(dpkt.Packet):
rfb.py:class RFB(dpkt.Packet):
rfb.py:class SetPixelFormat(dpkt.Packet):
rfb.py:class SetEncodings(dpkt.Packet):
rfb.py:class FramebufferUpdateRequest(dpkt.Packet):
rfb.py:class KeyEvent(dpkt.Packet):
rfb.py:class PointerEvent(dpkt.Packet):
rfb.py:class FramebufferUpdate(dpkt.Packet):
rfb.py:class SetColourMapEntries(dpkt.Packet):
rfb.py:class CutText(dpkt.Packet):
rip.py:class RIP(dpkt.Packet):
rip.py:class RTE(dpkt.Packet):
rip.py:class Auth(dpkt.Packet):
rpc.py:class RPC(dpkt.Packet):
rpc.py: class Auth(dpkt.Packet):
rpc.py: class Call(dpkt.Packet):
rpc.py: class Reply(dpkt.Packet):
rpc.py: class Accept(dpkt.Packet):
rpc.py: class Reject(dpkt.Packet)