Cisconinja’s Blog

Archive for the ‘QoS’ Category

Custom Queueing Byte Count Deficit

Posted by Andy on February 20, 2009

In older IOS versions, the custom queueing byte count deficit was not carried over to the next round robin pass, which could result in bandwidth sharing that was not proportional to the configured byte counts.  This example will demonstrate the behavior of custom queueing with the byte count deficit not being carried over in IOS 12.0(5), and then with the byte count deficit being carried over in IOS 12.4(18).  Since GTS and CB shaping do not support custom queueing on shaping queues, FRTS will be used.  The network topology and configurations are shown below:

topology1

R1:
interface Serial0/0
 no ip address
 encapsulation frame-relay
 load-interval 30
 no keepalive
!
interface Serial0/0.1 point-to-point
 ip address 10.1.12.1 255.255.255.0
 frame-relay interface-dlci 102
!
interface FastEthernet1/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 speed 100
 full-duplex
 no keepalive
 no mop enabled
!
no cdp run

R2:
interface Serial0/0
 no ip address
 encapsulation frame-relay
 load-interval 30
 no keepalive
!
interface Serial0/0.1 point-to-point
 ip address 10.1.12.2 255.255.255.0
 frame-relay interface-dlci 201
!
no cdp run

First we will use IOS 12.0(5) on R1.  PC will be used to generate 2 different types of traffic to UDP ports 1001 and 1002 (for more information on the method of generating traffic, see WFQ tests).  We will send both types of traffic at 8 packets/second with an L3 size of 996 bytes, which will give us 1000-byte frames once the frame-relay header is added.  This will result in 64,000 bps of each traffic type.  The first traffic type (sent to UDP port 1001) will be classified into custom queue 1 and the second traffic type will be classified into custom queue 2.  Queue 1 will be configured with a byte count of 1000 per round robin pass and queue 2 will be configured with a byte count of 1001.  Configuration and verification on R1:

R1:
queue-list 1 protocol ip 1 udp 1001
queue-list 1 protocol ip 2 udp 1002
queue-list 1 queue 1 byte-count 1000
queue-list 1 queue 2 byte-count 1001

cq-config

Next, we will enable FRTS on R1 with a CIR of 64,000 bps and apply the custom queueing configuration to the shaping queues:

R1:
interface Serial0/0
 frame-relay traffic-shaping
!
map-class frame-relay CQ-shaping-policy
 frame-relay cir 64000
 frame-relay custom-queue-list 1
!
interface Serial0/0.1 point-to-point
 frame-relay interface-dlci 102
  class CQ-shaping-policy

We will also configure R2 to measure incoming traffic of each type:

R2:
access-list 101 permit udp any any eq 1001
access-list 102 permit udp any any eq 1002
!
class-map match-all 1001
 match access-group 101
class-map match-all 1002
 match access-group 102
!
policy-map Traffic-Meter
 class 1001
 class 1002
!
interface Serial0/0
 service-policy input Traffic-Meter

Now we’re ready to start the 2 traffic streams:

flood.pl --port=1001 --size=996 --delay=125 10.1.12.2
flood.pl --port=1002 --size=996 --delay=125 10.1.12.2

On R1, we can see the FRTS and CQ information:

120-r1pvc

On R2 we can see the amount of each traffic type received:

120-r2pmap

Although the byte count allocated to queue 2 was only 0.1% more than queue 1 (1001/1000), it has sent exactly twice as many packets and bytes.  This is due to the byte count deficit not being carried over in 12.0(5).  The round robin cycle in 12.0(5) will go like this:

1. CQ takes a 1000-byte packet from queue 1.  The byte count for queue 1 has been met so CQ moves on to queue 2.

2. CQ takes a 1000-byte packet from queue 2.  The byte count for queue 2 has not been met (1000 < 1001), so CQ will take another packet from queue 2.

3. CQ takes another 1000-byte packet from queue 2.  The byte count for queue 2 has been met (2000 > 1001).  Since there are no other queues, return to Step #1 and service queue 1 again.

 

Now we will replace R1 with a router running IOS 12.4(18) and put the exact same configuration on it.  On R1, we can see the FRTS and CQ information as well as the packets enqueued in each of the CQ queues:

124-r1pvc

124-r1queue

On R2, the amount of each type of traffic received is shown below:

124-r2pmap

The traffic generator was left running for several hours to show how accurately the proportions of actual traffic sent match the byte counts in more modern IOS versions.  With equal sized packets in each queue, a byte count of 1000 configured for queue 1, and a byte count of 1001 configured for queue 2, queue 2 should be able to send 1 extra packet for every 1000 packets sent by queue 1.  We can see that after a little over 150,000 packets sent by queue 1, queue 2 has sent exactly 150 more packets than queue 1.  The dramatic difference in results is due to the byte count deficit being carried over.  The first 3 rounds of the round robin cycle in 12.4(18) will go like this:

1. CQ takes a 1000-byte packet from queue 1.  The byte count for queue 1 has been met so CQ moves on to queue 2.

2. CQ takes a 1000-byte packet from queue 2.  The byte count for queue 2 has not been met (1000 < 1001), so CQ will take another packet from queue 2.

3. CQ takes another 1000-byte packet from queue 2.  The byte count for queue 2 has been met and exceeded (2000 > 1001).  Subtract the excess bytes (999) from the configured byte count in the next round robin pass through queue 2 (1001 – 999 = 2).  Since there are no other queues, service queue 1 again.

4. CQ takes a 1000-byte packet from queue 1.  The byte count for queue 1 has been met so CQ moves on to queue 2.

5. CQ takes a 1000-byte packet from queue 2.  The byte count for queue 2 has been met and exceeded (1000 > 2).  Subtract the excess bytes (998) from the configured byte count in the next round robin pass through queue 2 (1001 – 998 = 3).  Since there are no other queues, service queue 1 again.

6. CQ takes a 1000-byte packet from queue 1.  The byte count for queue 1 has been met so CQ moves on to queue 2.

7. CQ takes a 1000-byte packet from queue 2.  The byte count for queue 2 has been met and exceeded (1000 > 3).  Subtract the excess bytes (997) from the configured byte count in the next round robin pass through queue 2 (1001 – 997 = 4).  Since there are no other queues, service queue 1 again.

As you can see, with the deficit being carried over, queue 2 will only be allowed to send 2 packets in a single pass once every 1000 rounds, rather than every single pass like in older IOS versions.
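
The two behaviors are easy to reproduce outside of IOS.  The following Python sketch is purely my own illustration of the scheduling logic described above (not IOS code); it simulates both variants with 1000-byte packets and byte counts of 1000 and 1001, and prints the resulting packet counts and ratio.

# Custom queueing round robin with and without byte-count deficit carry-over.
# Both queues are assumed to always have 1000-byte packets waiting, as in the
# test above.  This is my own illustration of the logic, not IOS code.

PACKET = 1000                        # bytes per packet
BYTE_COUNTS = {1: 1000, 2: 1001}     # configured byte count per queue

def simulate(passes, carry_deficit):
    sent = {1: 0, 2: 0}              # packets sent per queue
    deficit = {1: 0, 2: 0}           # excess bytes carried into the next pass
    for _ in range(passes):
        for q in (1, 2):
            allowance = BYTE_COUNTS[q] - (deficit[q] if carry_deficit else 0)
            taken = 0
            while taken < allowance:          # dequeue until the allowance is met
                taken += PACKET
                sent[q] += 1
            if carry_deficit:
                deficit[q] = taken - allowance   # 12.4(18)-style carry-over
    return sent

old = simulate(10000, carry_deficit=False)       # 12.0(5)-style behavior
new = simulate(10000, carry_deficit=True)        # 12.4(18)-style behavior
print('no carry-over :', old, 'queue2/queue1 = %.3f' % (old[2] / old[1]))
print('carry-over    :', new, 'queue2/queue1 = %.3f' % (new[2] / new[1]))

Over 10,000 passes the first variant sends twice as many packets from queue 2, while the second keeps the ratio at roughly 1001:1000, matching the two sets of results above.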

Posted in QoS | Leave a Comment »

CBWFQ, Routing Protocols, and max-reserved-bandwidth

Posted by Andy on February 18, 2009

Numerous sources, including Cisco documentation, often say that the percentage of bandwidth excluded from max-reserved-bandwidth (25% by default) is reserved for either link queues (routing updates, keepalives, etc.), unclassified best-effort traffic (matched by class-default), or both.  The 12.4 Mainline command reference for max-reserved-bandwidth says:

The sum of all bandwidth allocation on an interface should not exceed 75 percent of the available bandwidth on an interface. The remaining 25 percent of bandwidth is used for overhead, including Layer 2 overhead, control traffic, and best-effort traffic.

As can be seen in the previous CBWFQ tests that were performed, this definitely does not hold true for best-effort traffic that is put into dynamic conversations.  What about routing updates and other important traffic that ends up in the link queues?  To test this out, we will use the same simple topology shown below in IOS 12.4(18):

topology

R1:
interface FastEthernet0/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 speed 100
 full-duplex
 no keepalive
 no mop enabled
!
interface Serial0/0
 ip address 10.1.12.1 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R2:
interface Serial0/0
 ip address 10.1.12.2 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

Next we will configure EIGRP on R1 and R2 and decrease the hello and hold timers to give us a little bit more traffic:

R1:
router eigrp 1
 network 10.1.12.1 0.0.0.0
 no auto-summary
!
interface Serial0/0
 ip hello-interval eigrp 1 1
 ip hold-time eigrp 1 3

R2:
router eigrp 1
 network 10.1.12.2 0.0.0.0
 no auto-summary
!
interface Serial0/0
 ip hello-interval eigrp 1 1
 ip hold-time eigrp 1 3

Next we will configure R2 to measure incoming traffic.  A TFTP flow will be used to create congestion, so we will create 3 different classes to measure TFTP, EIGRP hellos, and EIGRP updates (we will see later on why it was a good idea to measure EIGRP hellos and updates separately):

R2:
ip access-list extended EIGRP-Hello
 permit eigrp any host 224.0.0.10
ip access-list extended EIGRP-Update
 permit eigrp any host 10.1.12.2
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all EIGRP-Hello
 match access-group name EIGRP-Hello
class-map match-all EIGRP-Update
 match access-group name EIGRP-Update
class-map match-all TFTP
 match access-group name TFTP
!
policy-map Traffic-Meter
 class EIGRP-Hello
 class EIGRP-Update
 class TFTP
!
interface Serial0/0
 service-policy input Traffic-Meter

Now let’s shutdown and re-enable R2’s S0/0 interface and examine how the EIGRP adjacency forms in the absence of congestion:

R2:
interface Serial0/0
 shutdown

A few seconds later…

R2:
interface Serial0/0
 no shutdown

wireshark1

cbwfq2-1-r2pmap

We can see that hello packets are being sent every 1 second in each direction.  When the adjacency forms, 3 update packets are exchanged in each direction followed by an acknowledgement from R1 to R2.  Each of the hello packets has size 64 bytes, which is confirmed in both Wireshark and the policy-map on R2.  Each of the update and acknowledgement packets has size 44 bytes.  Therefore we can expect that EIGRP traffic will use about 512 bps from hellos (64 bytes * 8 bits/byte, sent once per second) plus a small amount of additional bandwidth from updates and acknowledgements at the start.  Next we will configure CBWFQ on R1 and shape traffic to 32 kbps:

R1:
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all TFTP
 match access-group name TFTP
!
policy-map CBWFQ
 class TFTP
  bandwidth percent 75
 class class-default
  fair-queue 4096
policy-map Shaper
 class class-default
  shape average 32000
  service-policy CBWFQ
!
interface Serial0/0
 service-policy output Shaper

TFTP has been given 75% bandwidth, and the max-reserved-bandwidth has not been changed from the default of 75%.  If the remaining 25% (8,000 bps) is actually used for link queues, EIGRP should have way more bandwidth than it needs.  Now we will generate 64 kbps of TFTP traffic, more than enough to saturate the link and cause CBWFQ to begin:

flood.pl --port=69 --size=996 --delay=125 10.1.12.2

Within a few seconds, the following log messages repeatedly show up:

R1:
*Mar 1 04:25:31.998: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is down: Interface Goodbye received
*Mar 1 04:25:32.930: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is up: new adjacency
*Mar 1 04:25:40.302: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is down: Interface Goodbye received
*Mar 1 04:25:41.242: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is up: new adjacency
*Mar 1 04:25:48.278: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is down: Interface Goodbye received
*Mar 1 04:25:49.222: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is up: new adjacency
*Mar 1 04:25:56.538: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is down: Interface Goodbye received
*Mar 1 04:25:57.530: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is up: new adjacency
*Mar 1 04:26:04.782: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is down: Interface Goodbye received
*Mar 1 04:26:05.730: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.2 (Serial0/0) is up: new adjacency

R2:
*Mar 1 04:26:34.506: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 04:26:37.506: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: holding time expired
*Mar 1 04:26:42.774: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 04:26:45.770: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: holding time expired
*Mar 1 04:26:51.242: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 04:26:54.238: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: holding time expired
*Mar 1 04:26:59.298: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 04:27:02.298: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: holding time expired
*Mar 1 04:27:07.522: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 04:27:10.518: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: holding time expired

On R2, we can see that the hold timer keeps expiring and approximately every 8 to 8.5 seconds the adjacency is reforming.  Next take a look at the queues on R1 and the traffic  received on R2:

cbwfq2-2-r1queue

cbwfq2-2-r2pmap

As we saw in previous CBWFQ tests, CBWFQ uses a constant that is based on the number of WFQ dynamic queues when calculating the weights for user defined conversations as shown in the following table:

Number of flows     Constant
16                  64
32                  64
64                  57
128                 30
256                 16
512                 8
1024                4
2048                2
4096                1

We configured WFQ to use 4096 dynamic conversations, which results in the smallest possible constant.  Using the formula to calculate weights for user defined conversations, TFTP is assigned a weight of:

1  * (100 / 75)   = 1.33

When rounded up this becomes 2, as shown in the show traffic-shape queue output.  We also see 2 other conversations with packets enqueued.  Both are IP protocol 88, so we know they are both EIGRP.  One of them has destination address 224.0.0.10 and size 64 (EIGRP hellos) and the other has destination address 10.1.12.2 and size 44 (EIGRP updates).  Interestingly, they have been given very different weight values.  The conversation number for EIGRP hellos (4103) falls within the range of the link queues (N through N+7, where N is the number of WFQ queues) and the weight of 1024 is the same as the weight for link queues.  The conversation number for EIGRP updates (137), however, falls within the range of dynamic queues (0 through N-1), and its weight of 4626 is consistent with a dynamic conversation that has an IP Precedence of 6 (32384 / (6 + 1)).  Because the link queue’s weight of 1024 is 512 times TFTP’s weight of 2, TFTP will be able to send 512 times as many bytes.  Looking at the byte count of received traffic on R2, we can see that the results match this (1,149,000 / 2,240 = 512.9).  This also explains why the EIGRP adjacency reformed every 8 to 8.5 seconds.  For every 1 hello that EIGRP is allowed to send on R1 (512 bits), TFTP is allowed to send 262,144 bits (512 * 512).  The total time required for this is (512 + 262,144) / 32,000, or about 8.2 seconds.

This is somewhat of an extreme example, since we configured the maximum number of WFQ dynamic queues in order to minimize TFTP’s weight and also shaped to a very low rate, but the main point here is that the 25% unallocated bandwidth is not reserved for anything.  Any non-priority queue can consume as much of the bandwidth as its weight relative to the other queues allows.
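
To make the arithmetic above easier to play with, here is a small Python sketch.  It is my own model based on the constants table and the weights observed in these tests, not IOS source code, and the rounding IOS applies to the user-defined class weight is only inferred from the outputs above.

# Rough model of the WFQ weights observed in this test; not IOS source code.

CONSTANTS = {16: 64, 32: 64, 64: 57, 128: 30, 256: 16,
             512: 8, 1024: 4, 2048: 2, 4096: 1}   # dynamic queues -> constant

def user_class_weight(dynamic_queues, bandwidth_percent):
    # Raw weight of a user-defined CBWFQ class before IOS rounding.
    return CONSTANTS[dynamic_queues] * 100.0 / bandwidth_percent

def dynamic_flow_weight(ip_precedence):
    # Weight of an unclassified flow placed in a WFQ dynamic conversation.
    return 32384 // (ip_precedence + 1)

LINK_QUEUE_WEIGHT = 1024        # weight shown for link queues (the EIGRP hellos here)

print('TFTP raw weight   %.2f (shown as 2)' % user_class_weight(4096, 75))
print('EIGRP update weight', dynamic_flow_weight(6))     # 4626 for IPP 6

# Shares are inversely proportional to weight, so with TFTP at weight 2 the link
# queue gets 1 byte for every 1024 / 2 = 512 bytes of TFTP.  One 64-byte hello
# (512 bits) therefore costs about (512 + 512 * 512) / 32000 seconds of link time:
print('%.1f seconds between hellos' % ((512 + 512 * 512) / 32000))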

 

For one final test, let’s decrease the number of WFQ dynamic queues to a value such that the link queue can send at least 512 bps for EIGRP hellos:

R1:
policy-map CBWFQ
 class class-default
  fair-queue 256

Using 256 dynamic queues, the weight of TFTP will be:

16 * (100 / 75)   = 21.33

IOS rounds this to 21:

cbwfq2-3-r1queue1

The share that each queue receives is inversely proportional to its weight.  One way of finding the share that EIGRP hellos will receive is:

1 / ((1024 / 21) + (1024 / 1024) + (1024 / 4626))   =  2%, or about 640 bps
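
The same arithmetic can be written as a small helper that works for any set of weights (my own restatement of the formula above, not an IOS calculation):

# Bandwidth share of each queue is proportional to 1/weight.
def shares(weights):
    inverse = {name: 1.0 / w for name, w in weights.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}

for name, share in shares({'TFTP': 21, 'EIGRP hello': 1024, 'EIGRP update': 4626}).items():
    print('%-12s %5.1f%%  (%5.0f bps of 32000)' % (name, share * 100, share * 32000))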

This should be a little more than enough to allow a hello packet every second, and looking at the output above we can see that only 1 packet is in the queue.  However, now there is a new problem:

R2:
*Mar 1 05:45:01.462: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: retry limit exceeded
*Mar 1 05:45:01.478: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 05:46:20.998: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: retry limit exceeded
*Mar 1 05:46:21.226: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 05:47:40.746: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: retry limit exceeded
*Mar 1 05:47:41.242: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 05:49:00.758: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: retry limit exceeded
*Mar 1 05:49:01.502: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency
*Mar 1 05:50:21.014: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: retry limit exceeded
*Mar 1 05:50:21.226: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency

Approximately every 1 minute and 20 seconds, we see ‘retry limit exceeded’ followed by the adjacency coming back up less than 1 second later.  A debug eigrp packets update shows what is happening:

R2:
*Mar  1 06:08:22.550: EIGRP: Sending UPDATE on Serial0/0 nbr 10.1.12.1, retry 14, RTO 5000
*Mar  1 06:08:22.550:   AS 1, Flags 0x1, Seq 1186/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/2
*Mar  1 06:08:27.554: EIGRP: Sending UPDATE on Serial0/0 nbr 10.1.12.1, retry 15, RTO 5000
*Mar  1 06:08:27.554:   AS 1, Flags 0x1, Seq 1186/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/2
*Mar  1 06:08:32.558: EIGRP: Sending UPDATE on Serial0/0 nbr 10.1.12.1, retry 16, RTO 5000
*Mar  1 06:08:32.558:   AS 1, Flags 0x1, Seq 1186/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/2
*Mar  1 06:08:37.566: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is down: retry limit exceeded
*Mar  1 06:08:37.758: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 10.1.12.1 (Serial0/0) is up: new adjacency

R2 does not receive an acknowledgement from R1 for its update, so it retries a total of 16 times at 5-second intervals.  Remember that EIGRP updates and acknowledgments were assigned to a dynamic conversation and given a weight of 4,626 based on their IPP of 6 – as a result, they cannot be scheduled in time, and after the last retry R2 takes the adjacency down.  Since we manipulated the TFTP queue weight so that the link queue has just slightly more than enough bandwidth to send an EIGRP hello every second, the adjacency comes back up less than a second later, resulting in very strange overall behavior.

Posted in EIGRP, QoS | 2 Comments »

Rate-limit ACLs

Posted by Andy on February 16, 2009

In this post we will examine how rate-limit ACLs work with CAR.  The method of generating traffic will be the same as I used for testing WFQ.  The topology and initial configurations are shown below:

car-acl-topology

R1:
interface FastEthernet0/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 speed 100
 full-duplex
 no keepalive
 no mop enabled
!
interface Serial0/0
 ip address 10.1.12.1 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R2:
interface Serial0/0
 ip address 10.1.12.2 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

The point of this example will be to examine how rate-limit ACLs work, and in particular the mask feature, so we will need traffic with various IP precedence values.  This particular traffic generator does not allow ToS byte values to be specified, so we will mark the traffic inbound on R1 F0/0.  We will generate 8 different traffic streams to ports 1000 – 1007, and each type of traffic will be marked with IPP X, where X is the last digit in the port number.  With an interpacket delay of 125 ms and packet size of 1500 bytes, this will give us 96 kbps of each IPP value (1500 * 8 * 8).  R2 will be configured to measure incoming traffic.  Let’s configure and verify this before setting up the rate-limit ACL:

R1:
access-list 100 permit udp any any eq 1000
access-list 101 permit udp any any eq 1001
access-list 102 permit udp any any eq 1002
access-list 103 permit udp any any eq 1003
access-list 104 permit udp any any eq 1004
access-list 105 permit udp any any eq 1005
access-list 106 permit udp any any eq 1006
access-list 107 permit udp any any eq 1007
!
class-map match-all Prec0
 match access-group 100
class-map match-all Prec1
 match access-group 101
class-map match-all Prec2
 match access-group 102
class-map match-all Prec3
 match access-group 103
class-map match-all Prec4
 match access-group 104
class-map match-all Prec5
 match access-group 105
class-map match-all Prec6
 match access-group 106
class-map match-all Prec7
 match access-group 107
!
policy-map Marker
 class Prec0
  set precedence 0
 class Prec1
  set precedence 1
 class Prec2
  set precedence 2
 class Prec3
  set precedence 3
 class Prec4
  set precedence 4
 class Prec5
  set precedence 5
 class Prec6
  set precedence 6
 class Prec7
  set precedence 7
!
interface FastEthernet0/0
 service-policy input Marker

R2:
class-map match-all Prec0
 match precedence 0
class-map match-all Prec1
 match precedence 1
class-map match-all Prec2
 match precedence 2
class-map match-all Prec3
 match precedence 3
class-map match-all Prec4
 match precedence 4
class-map match-all Prec5
 match precedence 5
class-map match-all Prec6
 match precedence 6
class-map match-all Prec7
 match precedence 7
!
policy-map Traffic-Meter
 class Prec0
 class Prec1
 class Prec2
 class Prec3
 class Prec4
 class Prec5
 class Prec6
 class Prec7
!
interface Serial0/0
 service-policy input Traffic-Meter

flood.pl --port=1000 --size=1496 --delay=125 10.1.12.2
flood.pl --port=1001 --size=1496 --delay=125 10.1.12.2
flood.pl --port=1002 --size=1496 --delay=125 10.1.12.2
flood.pl --port=1003 --size=1496 --delay=125 10.1.12.2
flood.pl --port=1004 --size=1496 --delay=125 10.1.12.2
flood.pl --port=1005 --size=1496 --delay=125 10.1.12.2
flood.pl --port=1006 --size=1496 --delay=125 10.1.12.2
flood.pl --port=1007 --size=1496 --delay=125 10.1.12.2

car-acl-1-r1f0

car-acl-1-r1s0

car-acl-1-r2s0

car-acl-1-r1pmap

car-acl-1-r2pmap

We can see that the input rate on R1 F0/0, output rate on R1 S0/0, and input rate on R2 S0/0 roughly match the combined bandwidth of the 8 traffic streams.  We can also see that the traffic is being marked with the IPP values we specified on R1 and that R2 is receiving approximately 96 kbps of each type.  Now we can move on to configuring the rate-limit ACL.

Rate-limit ACLs, when used with the mask option, allow a 1-byte mask value to be entered.  Each bit position in the mask corresponds to an IPP value (MPLS EXP values work the same way), with IPP 7 being the leftmost bit and IPP 0 the rightmost bit.  The values that should be matched are set to 1 in their respective positions and the resulting mask is entered as a hexadecimal value.  Let’s say that we want to limit all IPP 0-4 traffic to a combined rate of 128 kbps.  The mask to match these values will be 00011111, which is hexadecimal 0x1F.  The configuration for this is:

R1:
access-list rate-limit 0 mask 1F
!
interface Serial0/0
 rate-limit output access-group rate-limit 0 128000 8000 8000 conform-action transmit exceed-action drop
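
As a quick sanity check on the mask arithmetic, here is a small Python sketch (my own helper, not part of the router configuration) that builds the hexadecimal mask from a set of precedence values:

# Build a rate-limit ACL mask from a set of IP precedence (or MPLS EXP) values.
# Bit 7 = precedence 7 (leftmost), bit 0 = precedence 0 (rightmost).
def rate_limit_mask(precedences):
    mask = 0
    for p in precedences:
        mask |= 1 << p
    return format(mask, '02X')

print(rate_limit_mask(range(5)))     # precedences 0-4 -> 1F, as used above
print(rate_limit_mask([5, 6, 7]))    # precedences 5-7 -> E0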

On R1, we can see that policing is taking place:

car-acl-2-r1car1

On R2, we can verify the amount of each IP Precedence value received:

car-acl-2-r2pmap

The combined 30-second offered rates for IPP values 0-4 equal roughly 128 kbps, while the other IPP values continued to send 96 kbps each.  This verifies that we have used the mask value correctly.

Posted in ACL, QoS | Leave a Comment »

Classification, Marking, and Policing Order of Operations

Posted by Andy on February 3, 2009

Continuing from my previous policing tests, this post will take a look at another difference between class-based policing and CAR.  R1 will be configured to police traffic and R2 will be configured to measure incoming traffic.  The network topology and configurations are shown below:

policing2-topology

R1:
interface FastEthernet0/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 speed 100
 full-duplex
 no keepalive
 no mop enabled
!
interface Serial0/0
 ip address 10.1.12.1 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R2:
interface Serial0/0
 ip address 10.1.12.2 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R1 will be configured with a hierarchical class-based policing policy in order to test the order in which marking and policing take place on the parent and child policy maps.  As we saw in the last set of policing tests, the policing on the parent policy takes place first and the results are offered to the child policy.  The order in which marking takes place is also important to know, because it can lead to a very different end result than expected.  Consider the following configuration:

R1:
class-map match-all Prec0
 match precedence 0
class-map match-all Prec2
 match precedence 2
!
policy-map child-policer
 class Prec2
  police cir 96000 bc 6000
   conform-action set-prec-transmit 3
   exceed-action drop
 class Prec0
  police cir 96000 bc 6000
   conform-action set-prec-transmit 1
   exceed-action drop
policy-map policer
 class class-default
  police cir 192000 bc 12000
   conform-action set-prec-transmit 2
   exceed-action drop
 service-policy child-policer
!
interface Serial0/0
 service-policy output policer

R2:
class-map match-all Prec1
 match precedence 1
class-map match-all Prec2
 match precedence 2
class-map match-all Prec3
 match precedence 3
!
policy-map Prec-Traffic-Meter
 class Prec3
 class Prec2
 class Prec1
!
interface Serial0/0
 service-policy input Prec-Traffic-Meter

Policy map ‘policer’ has been configured to police traffic to 192,000 bps, set the IP precedence to 2, and offer the results to ‘child-policer’.  All other exceeding traffic is dropped.  Policy-map ‘child-policer’ has been configured with 2 class maps, one to match IP precedence 2 traffic and one to match IP precedence 0 traffic.  The policer on the ‘Prec2’ class specifies that conforming traffic should be remarked as IP precedence 3 and transmitted, while the policer on the ‘Prec0’ class specifies that conforming traffic should be remarked as IP precedence 1 and transmitted.  R2 has been configured to measure incoming traffic by precedence value.  We will have PC generate 32 packets/second at 1496 bytes each for a total offered rate of approximately 384,000 bps on the serial link (1500 * 8 * 32):

flood.pl --port=53 --size=1496 --delay=31 10.1.12.2

show interfaces verifies that approximately 384,000 bps is being received on the FastEthernet interface and approximately 96,000 bps is being sent across the serial link: 

1-policing2-r1f0

1-policing2-r1s0

1-policing2-r2s0

Next look at the policy-map statistics on R1:

1-policing2-r1-pmap

The policing statistics on the parent policy look as expected – the offered rate is close to 384,000 bps and 192,000 bps has conformed, with the rest exceeding and being dropped.  The child policy policing statistics, however, may be a surprise.  Even though the parent policing takes place first and we configured the conform action to set IP precedence to 2 before transmitting, no packets have matched the class ‘Prec2’ on the child policy.  The class ‘Prec0’ on the other hand has matched as many packets as the parent policy has and also shows an offered rate close to 384,000 bps.  It also shows that 96,000 bps have conformed and 96,000 bps exceeded, which matches the rate being offered after policing on the parent policy has taken place.  R2 also confirms that all packets being sent are IP precedence 1:

1-policing2-r2-pmap

Based on this information, we can conclude that:

1. Classification into both the parent and child classes happens before policing occurs.  If it were the other way around, we would only see about 192,000 bps as the offered rate to the class ‘Prec0’.

2. Policing on the parent happens before policing on the child.  This was shown in earlier tests as well, and again we can see it is true because the conforming + exceeding packet counter on the child class ‘Prec0’ equals the conforming packet counter on the parent.

3. Classification into both the parent and child classes happens before any marking from the policers occurs.  This is shown by the fact that all traffic on the child policy is being classified into ‘Prec0’, which is the default IP precedence of the traffic stream before any marking has occurred, rather than class ‘Prec2’, which it would have been if the parent policer had marked it before offering it to the child.

 

Next, we will modify the configuration slightly by changing the conform action on the child policy class ‘Prec0’.  Instead of marking conforming traffic as precedence 1 and transmitting, it will transmit conforming traffic unchanged.  The rest of the configuration will remain the same.  Here is the new policy map configuration for reference:

R1:
policy-map child-policer
 class Prec2
  police cir 96000 bc 6000
   conform-action set-prec-transmit 3
   exceed-action drop
 class Prec0
  police cir 96000 bc 6000
   conform-action transmit
   exceed-action drop
policy-map policer
 class class-default
  police cir 192000 bc 12000
   conform-action set-prec-transmit 2
   exceed-action drop
 service-policy child-policer

The results of R1’s policing and R2’s traffic measurements are shown below:

2-policing2-r1-pmap

2-policing2-r2-pmap

The traffic is now being received on R2 as IPP 2, which is what we configured the parent policy to mark conforming traffic as.  Therefore, we can conclude that marking with class-based policing occurs first on the parent policy, and then on the child policy.  If the child policy policer has not been configured to remark traffic, then it will be transmitted as marked by the parent policer.

 

Next we will add class-based marking to both the parent and child policy maps, in addition to marking with class-based policing.  The parent policy will be configured with class-based marking to set IPP to 5.  The child policy class ‘Prec0’ will be configured with class-based marking to set IPP to 4.  The child policy will also be configured as it was initially, so that traffic conforming to the policer on class ‘Prec0’ is marked with IPP 1.  Additional classes will also be added to the traffic meter on R2 to test for incoming IPP values of 4 or 5.  The configuration is:

R1:
policy-map child-policer
 class Prec2
  police cir 96000 bc 6000
   conform-action set-prec-transmit 3
   exceed-action drop
 class Prec0
  set precedence 4
  police cir 96000 bc 6000
   conform-action set-prec-transmit 1
   exceed-action drop
policy-map policer
 class class-default
  police cir 192000 bc 12000
   conform-action set-prec-transmit 2
   exceed-action drop
  set precedence 5
  service-policy child-policer

R2:
class-map match-all Prec4
 match precedence 4
class-map match-all Prec5
 match precedence 5
!
policy-map Prec-Traffic-Meter
 class Prec1
 class Prec2
 class Prec3
 class Prec4
 class Prec5

The results of R1’s marking and policing and R2’s traffic measurements are shown below:

3-policing2-r1-pmap

3-policing2-r2-pmap

The policy map statistics on R1 show that class-based marking has marked 6103 packets as IPP 5 on the parent and as IPP 4 on the child, the same number of total packets before any policing occurred – so we know that class-based marking occurs before policing does.  R2 shows that all packets being received are IPP 1, which is what the child policer has been configured to remark conforming traffic as – so we also know that of the 4 methods of marking being used, marking by the child policer takes place last.

 

Now let’s modify the conform action on the child policer class ‘Prec0’ again so that it does not remark traffic.  Only the conform action needs to be changed, but the whole policy map configuration on R1 is shown for reference:

R1:
policy-map child-policer
 class Prec2
  police cir 96000 bc 6000
   conform-action set-prec-transmit 3
   exceed-action drop
 class Prec0
  set precedence 4
  police cir 96000 bc 6000
   conform-action transmit
   exceed-action drop
policy-map policer
 class class-default
  police cir 192000 bc 12000
   conform-action set-prec-transmit 2
   exceed-action drop
  set precedence 5
  service-policy child-policer

The results of R1’s marking and policing and R2’s traffic measurements are shown below:

4-policing2-r1-pmap1

4-policing2-r2-pmap

All traffic being received by R2 is IPP 2, as configured on the parent policer.  Therefore we can determine that marking by the parent policer also takes place after class-based marking by either the parent or child policies.

 

Next we will modify the action for conforming traffic on the parent policer as well, so that both policers transmit conforming traffic unmodified.  This will allow us to see which of the class-based markings configured on the parent and child policies occurs last.  The new policy map configuration is:

R1:
policy-map child-policer
 class Prec2
  police cir 96000 bc 6000
   conform-action set-prec-transmit 3
   exceed-action drop
 class Prec0
  set precedence 4
  police cir 96000 bc 6000
   conform-action transmit
   exceed-action drop
policy-map policer
 class class-default
  police cir 192000 bc 12000
   conform-action transmit
   exceed-action drop
  set precedence 5
  service-policy child-policer

The results of R1’s marking and policing and R2’s traffic measurements are shown below:

5-policing2-r1-pmap

5-policing2-r2-pmap

All traffic received is IPP 4, which was configured with class-based marking on the child policy.  We can determine that class-based marking on the child policy takes place after class-based marking on the parent policy.

 

Now let’s remove class-based marking from the child policy just to verify that class-based marking on the parent policy takes place as expected in the absence of all other marking methods.  The policy map configuration is:

R1:
policy-map child-policer
 class Prec2
  police cir 96000 bc 6000
   conform-action set-prec-transmit 3
   exceed-action drop
 class Prec0
  police cir 96000 bc 6000
   conform-action transmit
   exceed-action drop
policy-map policer
 class class-default
  police cir 192000 bc 12000
   conform-action transmit
   exceed-action drop
  set precedence 5
  service-policy child-policer

The results of R1’s marking and policing and R2’s traffic measurements are shown below:

6-policing2-r1-pmap

6-policing2-r2-pmap

Our traffic is now being marked as IPP 5 as specified by the class-based marker on the parent policy after all other types of marking were removed.  Identical tests were performed for inbound classification, marking, and policing and the results were the same.  Based on this series of tests, we can conclude that the order of operation for classification, class-based marking, and class-based policing is:

1. Inbound classification for both parent and child policy maps.  This same classification will be used for determining which class traffic is placed into in Step #2 – 5, regardless of any changes made to the markings along the way.

2. Inbound class-based marking on the parent policy

3. Inbound class-based marking on the child policy.

4. Inbound class-based policing and marking on the parent policy.

5. Inbound class-based policing and marking on the child policy.

6. Outbound classification for both parent and child policy maps.  This same classification will be used for determining which class traffic is placed into in Step #7 – 10, regardless of any changes made to the markings along the way.

7. Outbound class-based marking on the parent policy

8. Outbound class-based marking on the child policy.

9. Outbound class-based policing and marking on the parent policy.

10. Outbound class-based policing and marking on the child policy.

 

The only tricky part to remember is that the classification occurs only at the beginning, once in each direction – traffic is not reclassified based on an updated marking that occurred in the middle.  For example, as we saw in the tests, when traffic reached the outbound child policer (Step #10), it was still classified as IPP 0 traffic even though it was remarked 3 different times in between (Step #7 – 9) to different IPP values.
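
To summarize the observed behavior in one place, here is a toy Python model of the outbound pass.  It is my own sketch of the results above, not how IOS is implemented, and the function and parameter names are made up for illustration.  Classification uses only the original precedence, and the last marking action applied wins:

# Toy model of the outbound order of operations observed above.
def mqc_pass(original_prec, parent_set=None, child_set=None,
             parent_police_set=None, child_police_set=None):
    # 1. Parent and child classes are chosen from the ORIGINAL precedence only.
    child_class = 'Prec%d' % original_prec
    prec = original_prec
    # 2-3. Class-based marking: parent first, then child.
    if parent_set is not None:
        prec = parent_set
    if child_set is not None:
        prec = child_set
    # 4-5. Policer conform-action marking: parent first, then child.
    if parent_police_set is not None:
        prec = parent_police_set
    if child_police_set is not None:
        prec = child_police_set
    return child_class, prec

# The six tests above, in order: the child always classifies on IPP 0, and the
# last marking action configured along the path wins.
print(mqc_pass(0, parent_police_set=2, child_police_set=1))            # ('Prec0', 1)
print(mqc_pass(0, parent_police_set=2))                                # ('Prec0', 2)
print(mqc_pass(0, parent_set=5, child_set=4,
               parent_police_set=2, child_police_set=1))               # ('Prec0', 1)
print(mqc_pass(0, parent_set=5, child_set=4, parent_police_set=2))     # ('Prec0', 2)
print(mqc_pass(0, parent_set=5, child_set=4))                          # ('Prec0', 4)
print(mqc_pass(0, parent_set=5))                                       # ('Prec0', 5)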

 

CAR on the other hand behaves in a much more logical way.  If cascaded CAR statements are used, a new classification is performed at each one.  Therefore if a certain QoS value is remarked in the first CAR statement, future CAR statements can police based on that value.  Take a look at the following CAR configuration, after the class-based policing configuration has been removed:

R1:
access-list 100 permit ip any any precedence routine
access-list 101 permit ip any any precedence priority
access-list 102 permit ip any any precedence immediate
access-list 103 permit ip any any precedence flash
access-list 104 permit ip any any precedence flash-override
access-list 105 permit ip any any precedence critical
!
interface Serial0/0
 rate-limit output access-group 100 192000 12000 12000 conform-action set-prec-continue 1 exceed-action drop
 rate-limit output access-group 101 96000 6000 6000 conform-action set-prec-continue 2 exceed-action drop
 rate-limit output access-group 102 96000 6000 6000 conform-action set-prec-continue 3 exceed-action drop
 rate-limit output access-group 103 96000 6000 6000 conform-action transmit exceed-action drop

6 different ACLs have been configured, 1 to match each IPP value between 0 and 5.  CAR has also been configured with 4 cascaded policies, with the first policing traffic to 192,000 bps and the next 3 policing traffic to 96,000 bps.  Conforming traffic on the first 3 statements is remarked to the next higher precedence value and evaluated against the next statement, the fourth statement transmits conforming traffic, and exceeding traffic is dropped at every statement.  Using the same traffic stream (approximately 384,000 bps of IPP 0 traffic), the results on R1 and R2 are shown below:

7-policing2-r1-car

7-policing2-r2-pmap

We can see that roughly 191,000 bps matched the first CAR policy, which was configured to match IPP 0 traffic.  Since the configured action was set-prec-continue 1, the conforming traffic is compared against the next statement.  The second CAR policy is configured to match only IPP 1 traffic, and we can see that it has matched 849 conforming and 850 exceeding packets, which equals the first CAR policy’s 1699 conforming packets.  Our traffic was reclassified when the second CAR policy was examined, resulting in a match on ACL 101, which was configured to match IPP 1 traffic – if it worked like class-based policing and was not reclassified, none of the last 3 CAR policies would have resulted in a match, and 192,000 bps of IPP 0 traffic would have been sent.  The second and third CAR policies remark the conforming traffic to IPP 2 and 3 respectively, and the fourth transmits it; since the CIR on the third and fourth is the same as the second, no additional traffic is dropped.  On R2, approximately 96,000 bps of IPP 3 traffic is arriving.
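
The reclassification behavior can be sketched in a few lines of Python (again my own illustration of the marking path only – rate enforcement is ignored):

# Toy model of how a conforming IPP 0 packet walks the cascaded CAR statements
# above.  Each statement re-examines the packet's CURRENT precedence.

CAR_STATEMENTS = [           # (matched precedence, conform action, new precedence)
    (0, 'set-prec-continue', 1),
    (1, 'set-prec-continue', 2),
    (2, 'set-prec-continue', 3),
    (3, 'transmit', None),
]

def car_walk(precedence):
    path = [precedence]
    for match, action, new_prec in CAR_STATEMENTS:
        if precedence != match:
            continue                     # not matched, evaluate the next statement
        if action == 'set-prec-continue':
            precedence = new_prec        # remark, then keep evaluating statements
            path.append(precedence)
        else:                            # transmit: final action, stop here
            break
    return path

print(car_walk(0))    # [0, 1, 2, 3] - remarked at each statement, sent as IPP 3

If the statements used only the original classification, the packet would match nothing beyond the first statement.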

 

Next we will add class-based marking to the configuration.  We will configure the parent class-based marking policy to mark traffic as IPP 5 and the child to mark traffic as IPP 4, just as in the class-based policing tests.  We also add a couple additional CAR policies after the four configured previously.  The new config on R1 is:

R1:
policy-map child-marker
 class class-default
  set precedence 4
policy-map marker
 class class-default
  set precedence 5
  service-policy child-marker
!
interface Serial0/0
 rate-limit output access-group 104 96000 6000 6000 conform-action transmit exceed-action drop
 rate-limit output access-group 105 96000 6000 6000 conform-action transmit exceed-action drop
 service-policy output marker

We already know from the previous tests that class-based marking on the child is applied after the parent, so after the two remarkings by class-based marking have taken place the traffic should be IPP 4; what we are trying to find out now is whether the first CAR statement will see the traffic as it was initially (IPP 0) or as it has been marked by class-based marking (IPP 4).  The results on R1 and R2 are shown below:

8-policing2-r1-car

8-policing2-r2-pmap

Only the fifth CAR statement (ACL 104) was matched, so CAR must have reclassified the traffic after class-based marking had marked it rather than using the initial classification of IPP 0.  R2 also confirms that it saw only IPP 4 traffic.  These tests show that the order of operation for classification, class-based marking, and CAR is:

1. Inbound classification for both parent and child policy maps.  This same classification will be used for determining which class traffic is placed into in Step #2 – 3, regardless of any changes made to the markings along the way.

2. Inbound class-based marking on the parent policy

3. Inbound class-based marking on the child policy.

4. Inbound re-classification by CAR.

5. Inbound policing and marking on the CAR policy.

6. Return to Step #4 for each additional cascaded CAR policy.

7. Outbound classification for both parent and child policy maps.  This same classification will be used for determining which class traffic is placed into in Step #8 – 9, regardless of any changes made to the markings along the way.

8. Outbound class-based marking on the parent policy

9. Outbound class-based marking on the child policy.

10. Outbound re-classification by CAR.

11. Outbound policing and marking on the CAR policy.

12. Return to Step #10 for each additional cascaded CAR policy.

Posted in QoS | 8 Comments »

Policing Tests

Posted by Andy on January 31, 2009

This post will take a look at some various tests related to policing.  I will be using the same UDP flood script that I used for WFQ and CBWFQ/LLQ.  R1 will be configured to police traffic, and one or more UDP packet streams will be generated, depending on what is being tested.  R2 will be configured to measure incoming traffic after it has been policed.  The network topology and initial configurations are shown below:

policing-topology

R1:
interface FastEthernet0/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 speed 100
 full-duplex
 no keepalive
 no mop enabled
!
interface Serial0/0
 ip address 10.1.12.1 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R2:
ip access-list extended DHCP
 permit udp any any eq bootps
ip access-list extended DNS
 permit udp any any eq domain
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all TFTP
 match access-group name TFTP
class-map match-all DHCP
 match access-group name DHCP
class-map match-all DNS
 match access-group name DNS
!
policy-map Traffic-Meter
 class TFTP
 class DHCP
 class DNS
!
interface Serial0/0
 ip address 10.1.12.2 255.255.255.0
 load-interval 30
 no keepalive
 service-policy input Traffic-Meter
!
no cdp run

For the first few tests, we will be looking at how the size of the token bucket(s) can affect how the policer behaves.  Consider the following configuration:

R1:
policy-map policer
 class class-default
  police 600000 1499
   conform-action transmit
   exceed-action drop
!
interface Serial0/0
 service-policy output policer

Using this configuration, R1 polices all traffic outbound on S0/0 at a rate of 600 kbps.  The policer uses a single token bucket of size 1499 bytes which will take approximately 20 ms to fill (1499 * 8 / 600,000).  Now we will use PC to send UDP packets every 125 ms with a layer-3 size of 1496 bytes:

flood.pl --port=53 --size=1496 --delay=125 10.1.12.2 

This will result in 1500-byte frames on R1’s S0/0.  The total bandwidth used by this flow is approximately 96 kbps (1500 bytes * 8 bits/byte * 8 packets/second), which is far below the policing rate.  The results on R1 are shown below:

1-r1-pmap

All packets are dropped despite the fact that the offered rate is well below the policing rate.  Because the token bucket has a max size of 1499, it will never have enough tokens to allow a 1500-byte packet.  The output also confirms that the policer sees the packets as 1500 bytes (5,434,500 / 3,623).  Now let’s change Bc to 1500 bytes:

R1:
policy-map policer
 class class-default
  police 600000 1500

We will use the same parameters for the UDP traffic:

flood.pl --port=53 --size=1496 --delay=125 10.1.12.2

The results on R1 are shown below:

2-r1-pmap

This time not a single packet has been policed.  The size of the token bucket is 1500 bytes, exactly the same as the packet sizes, and takes 20 ms to fill.  With a packet arriving every 125 ms, the bucket is always full when a packet arrives.  R2 also confirms that all 96 kbps worth of traffic are being sent across the serial link:

2-r2-pmap

 

Next let’s see what can happen if the token bucket is configured to a small size (but larger than the maximum packet size to prevent all packets from being policed).  We will use the following configuration:

R1:
policy-map policer
 class class-default
  police cir 95000 bc 1500
   conform-action transmit
   exceed-action drop

We will also use the same parameters for generating traffic:

flood.pl --port=53 --size=1496 --delay=125 10.1.12.2

Take a look at the policing statistics on R1:

3-r1-pmap

Even though the bucket is as big as the packet size and we are policing at 95 kbps, only approximately 49 kbps is being allowed.  The reason for this is that with the values chosen for CIR and Bc, it takes approximately 126 ms to completely fill the token bucket to 1500 bytes (1500 * 8 / 95000) and a 1500-byte packet arrives approximately every 125 ms.  The first packet will be allowed and the token bucket will be decremented to 0 bytes.  The second packet will arrive 125 ms later, and 1484 tokens will be placed into the token bucket (.125 * 95000 / 8).  Since the bucket does not have enough tokens, the packet is policed.  The token bucket reaches its max size 1 ms later, but no packets arrive to use the tokens for another 124 ms.  When the third packet arrives, the token bucket is decremented to 0 bytes and the cycle starts over, with every other packet being policed.  The fact that slightly more packets conformed than exceeded is probably due to a slight variation in interpacket delay; if 2 consecutive packets arrive 126 ms or more apart they will both conform.  This example shows the worst possible scenario that can occur (other than Bc being smaller than the packet size):  the token bucket reaches its max size just after the first interval and spends nearly the entire second interval wasting tokens.  With a small Bc, the actual sending rate can be as low as half of the configured CIR.
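
A simple continuous-refill token bucket model reproduces both of the results so far (all packets dropped with a Bc of 1499, and roughly half the CIR with a Bc of 1500).  This is my own approximation of the behavior described above, not IOS internals:

# Minimal single-bucket policer model (continuous token refill, capped at Bc).
def police(cir_bps, bc_bytes, packet_bytes, interval_s, packets):
    tokens = float(bc_bytes)              # bucket starts full
    last = 0.0
    sent = 0
    for i in range(packets):
        now = i * interval_s
        tokens = min(bc_bytes, tokens + (now - last) * cir_bps / 8.0)
        last = now
        if tokens >= packet_bytes:        # conform: transmit and spend tokens
            tokens -= packet_bytes
            sent += 1
    return sent                           # exceeding packets are dropped

total = 8000                              # 1000 seconds of 8 packets/second
print(police(600000, 1499, 1500, 0.125, total))    # Bc < packet size: 0 conform
sent = police(95000, 1500, 1500, 0.125, total)
print('%d of %d conform -> about %.0f bps' % (sent, total, sent * 1500 * 8 / 1000.0))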

 

Next we will keep the same configuration but add a second token bucket to the policer.  We will use a size of 3000 bytes for the second bucket (Be).  The new configuration is:

R1:
policy-map policer
 class class-default
  police cir 95000 bc 1500 be 3000
   conform-action transmit
   exceed-action transmit
   violate-action drop

The policing statistics on R1 and traffic metering on R2 are shown below:

4-r1-pmap

4-r2-pmap

This time, the extra tokens spill into the second bucket, and no traffic is dropped.  It probably would have made more sense just to increase Bc, but this shows that excess burst can be useful even in the middle of a sustained flow if the Bc bucket is configured to a small size. 

 

Next let’s look at what happens with more typical values for Bc and Be with an offered rate that exceeds the CIR.  We will configure Bc and Be each as 500 ms of CIR.  The configuration is:

R1:
policy-map policer
 class class-default
  police cir 96000 bc 6000 be 6000
   conform-action transmit
   exceed-action transmit
   violate-action drop

Instead of 8 packets/second, we will generate 16 packets/second for an offered rate of 192 kbps:

flood.pl --port=53 --size=1496 --delay=62 10.1.12.2

The policing statistics on R1 are shown below.  The first show command output was taken a little under 1 second after beginning the traffic flow, and the second several minutes later:

5-r1-pmap1

5-r1-pmap21

This shows the more typical behavior of a single rate, two bucket policer with a fairly large Bc.  Bc and Be both start full at 6000 tokens each.  The offered rate is considerably higher than the CIR, so the Bc bucket is soon used up and the router begins using tokens out of Be to transmit packets.  After a little under 1 second, we can see that Be has been used to transmit 4 packets, which completely empties it (4 * 1500 = 6000).  Several minutes later, we can see that the counters for conformed and violated packets have increased by a lot, but exceeded remains at 4.  This is because with single rate policing, Be is only refilled when Bc is full.  With Bc configured to a large enough value that it does not spend any time in a full state, Be will never have a chance to accumulate tokens unless the offered rate falls below the CIR.
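
Extending the same model with a second bucket that is filled only by tokens overflowing from a full Bc shows the same effect (again just an approximation of the behavior described above):

# Single-rate, two-bucket sketch: tokens overflow from Bc into Be only while Bc
# is full.
def police_two_bucket(cir_bps, bc, be, packet, interval, packets):
    tc, te = float(bc), float(be)           # both buckets start full
    counts = {'conform': 0, 'exceed': 0, 'violate': 0}
    for _ in range(packets):
        refill = interval * cir_bps / 8.0
        spill = max(0.0, tc + refill - bc)   # tokens Bc cannot hold spill toward Be
        tc = min(float(bc), tc + refill)
        te = min(float(be), te + spill)
        if tc >= packet:
            tc -= packet
            counts['conform'] += 1
        elif te >= packet:
            te -= packet
            counts['exceed'] += 1
        else:
            counts['violate'] += 1
    return counts

# One minute of 192 kbps offered against CIR 96000, Bc 6000, Be 6000:
print(police_two_bucket(96000, 6000, 6000, 1500, 1.0 / 16, 16 * 60))
# The exceed counter stops at 4: Bc never fills completely under this load,
# so no tokens spill into Be after the initial burst is spent.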

 

Next we will look at how a policer functions on 2 flows with very different packet sizes.  We will send DNS traffic with 1250-byte packets at 8 packets/second and DHCP traffic with 156-byte packets at 64 packets/second so that each flow sends approximately 80 kbps of traffic.  We will police all traffic to 64 kbps, with Bc set to 4000 bytes.  The config is:

R1:
policy-map policer
 class class-default
  police cir 64000 bc 4000
   conform-action transmit
   exceed-action drop

Start the traffic flows on PC:

flood.pl --port=53 --size=1246 --delay=125 10.1.12.2
flood.pl --port=67 --size=152 --delay=15 10.1.12.2

The policing results on R1 and traffic measurements on R2 are:

6-r1-pmap1

6-r2-pmap1

DNS has not been allowed to send a single packet!  The two flows combined generate 160 kbps of traffic, well above the 64 kbps CIR, so Bc quickly empties.  As it refills, DHCP is able to send a packet whenever there are 156 tokens in the bucket, while DNS must wait for 1250.  The best situation DNS could hope for is that a DHCP packet arrives while there are 155 tokens in the bucket and is policed.  In this case another DHCP packet will arrive 1/64 of a second later, 125 tokens will be added to the bucket ((64000 / 8) / 64) bringing the total to 280, the DHCP packet will be allowed, and the bucket will be decremented to 124 (280 – 156).  Therefore, as long as the DHCP flow continues, the token bucket will never contain more than 280 tokens and a DNS packet will never be allowed.
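
Feeding both arrival streams through the same kind of single-bucket model shows the effect (my own rough model; the real test may differ slightly at startup, since the bucket begins full here):

# Two flows through one policer: 156-byte DHCP packets every 1/64 s and
# 1250-byte DNS packets every 1/8 s against CIR 64000 / Bc 4000.
def arrivals(name, size, interval, duration):
    t = 0.0
    while t < duration:
        yield (t, size, name)
        t += interval

events = sorted(list(arrivals('DHCP', 156, 1.0 / 64, 60)) +
                list(arrivals('DNS', 1250, 1.0 / 8, 60)))
tokens, last = 4000.0, 0.0
sent = {'DHCP': 0, 'DNS': 0}
for t, size, name in events:
    tokens = min(4000.0, tokens + (t - last) * 64000 / 8)   # refill since last packet
    last = t
    if tokens >= size:          # conform-action transmit; exceed-action drop
        tokens -= size
        sent[name] += 1
print(sent)
# Aside from a couple of DNS packets that slip through while the bucket is
# still full at the start, only the small DHCP packets ever find enough tokens.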

 

Let’s say we’ve been given the requirement to limit all traffic being sent out an interface to 96 kbps.  DNS should be allowed to send up to 80 kbps, DHCP 24 kbps, and TFTP 24 kbps, as long as the overall CIR of 96 kbps has not been exceeded.  We’ve also been given the requirement to do this using only policing.  First let’s try this using class-based policing.  The configuration is:

R1:
ip access-list extended DHCP
 permit udp any any eq bootps
ip access-list extended DNS
 permit udp any any eq domain
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all TFTP
 match access-group name TFTP
class-map match-all DHCP
 match access-group name DHCP
class-map match-all DNS
 match access-group name DNS
!
policy-map child-policer
 class DNS
  police cir 80000 bc 5000
   conform-action transmit
   exceed-action drop
 class DHCP
  police cir 24000 bc 1500
   conform-action transmit
   exceed-action drop
 class TFTP
  police cir 24000 bc 1500
   conform-action transmit
   exceed-action drop
policy-map policer
 class class-default
  police cir 96000 bc 6000
   conform-action transmit
   exceed-action drop
 service-policy child-policer

Now we will start the three different traffic streams.  DNS and DHCP will each send 1500-byte packets with an interpacket delay of 125 ms for an offered rate of approximately 96,000 bps each (1500 * 8 * 8).  TFTP will send 500-byte packets with an interpacket delay of 1/64 seconds for an offered rate of approximately 256,000 bps (500 * 64 * 8):

flood.pl --port=53 --size=1496 --delay=125 10.1.12.2
flood.pl --port=67 --size=1496 --delay=125 10.1.12.2
flood.pl --port=69 --size=496 --delay=15 10.1.12.2

First, look at the input and output rate on each interface:

7-r1-f01

7-r1-s01

7-r2-s0

R1 shows an input rate of 434,000 bps, which is close to the expected value of 454,400 bps (510 * 64 * 8 + 1510 * 8 * 8 + 1510 * 8 * 8).  However, after policing takes place only 24,000 bps is being sent out of S0/0.  The policing statistics shown on R1 are:

7-r1-pmap2

First look at the class-default statistics on the parent policy map.  Policing on the parent policy takes place first, and the results are offered to the child policy.  We can see that 432,000 bps of traffic has been offered to the parent policy, with 96,000 conforming and 336,000 exceeding.  Next look at the statistics on the child policy map.  The measurements of the 30-second offered rates on the child policy apparently take place before policing on the parent, since the offered rates match the actual sending rate for each type of traffic.  The policing statistics, however, are based on the offered traffic after it has been policed by the parent and give some insight into what is causing the problem.  We can see that not a single packet has conformed to or exceeded the child policer on the DNS and DHCP classes – in other words every DNS and DHCP packet was policed by the parent policy first.  TFTP, on the other hand, shows that there has been 24,000 bps of conforming traffic and 72,000 bps of exceeding traffic.  The combined conforming and exceeding rates on the TFTP class match the CIR on the parent policer, and the conforming and exceeding packet counters on TFTP exactly match the conforming packet counter on the parent policy – so we know that the parent policy is admitting TFTP packets only.  TFTP packets are allowed by the parent policy at a rate of 96,000 bps only to have most of them dropped by the child policy, and the bandwidth that the other classes could be using goes to waste.  The problem here is the order that the policing occurs in and that, as shown earlier, a policer will give preference to flows with smaller packet sizes when the CIR is exceeded for a sustained period.

 

Using Committed Access Rate (CAR) can partially overcome this problem.  CAR does not use MQC and does not allow named access lists, so we will have to create new ones.  We will use the same CIR and Bc sizes for each policer.  The configuration for this is:

R1:
access-list 100 permit udp any any eq domain
access-list 101 permit udp any any eq bootps
access-list 102 permit udp any any eq tftp
!
interface Serial0/0
 no service-policy output policer
 rate-limit output access-group 100 80000 5000 5000 conform-action continue exceed-action drop
 rate-limit output access-group 101 24000 2000 2000 conform-action continue exceed-action drop
 rate-limit output access-group 102 24000 2000 2000 conform-action continue exceed-action drop
 rate-limit output 96000 6000 6000 conform-action transmit exceed-action drop

The continue action tells CAR to continue looking at rate-limit commands until it finds another match, rather than performing an action based on the first match like an ACL or MQC class.  Start the traffic flows on PC again using the same parameters:

flood.pl --port=53 --size=1496 --delay=125 10.1.12.2
flood.pl --port=67 --size=1496 --delay=125 10.1.12.2
flood.pl --port=69 --size=496 --delay=15 10.1.12.2

The CAR statistics on R1 are:

8-r1-car

This time, R1 polices subsets of traffic before policing the combined traffic.  We can see that 79,000 bps of DNS, 23,000 bps of DHCP, and 24,000 bps of TFTP have conformed, which approximately matches the CIR configured for each type of traffic.  The results of the subset policing are then offered to the ‘all traffic policer’.  The output shows that 126,000 bps of traffic has been offered to the ‘all traffic policer’, which approximately matches the combined CIRs of the individual subset policers (128,000 bps).  Of this, 95,000 has conformed and been sent, while 31,000 has exceeded and been dropped.  R2 verifies that the entire CIR is now being used and shows the amount of each traffic type that is being received:

8-r2-s0

8-r2-pmap

The problem of inefficient link usage has been solved; however, the problem of flows with smaller packet sizes receiving preference from the policer still remains.  All 24,000 bps of TFTP traffic that the subset policer offered to the ‘all traffic policer’ has been sent due to its much smaller packet sizes, just like the earlier tests showed.  A better solution (if we removed the requirement that only policing is allowed) would probably be to shape traffic outbound on S0/0 and use the queueing strategy on the shaping queues to control which packets are sent, rather than relying on the random nature of packet size.  Policing could still be used either inbound or outbound if the rate of certain types of traffic should be limited before being placed into the shaping queue system.

Posted in QoS | Leave a Comment »

TCP Header Compression

Posted by Andy on January 27, 2009

This post will take a look at an example of configuring TCP header compression.  TCP header compression can be used to compress the IP and TCP headers (40 bytes combined) down to typically between 3 and 5 bytes.  The topology for the example is shown below:

tcphc-topology

For testing purposes, we will be starting a telnet session and an HTTP download between the 2 routers.  Therefore, we will configure separate MQC classes to match each type of traffic and enable class-based TCP header compression on each of them.  TCP header compression configuration is required on each end of the link.  R1 will also be configured as an HTTP server.  The configuration for this is shown below:

R1:
class-map match-all HTTP
 match protocol http
class-map match-all Telnet
 match protocol telnet
!
policy-map test
 class HTTP
  compress header ip tcp
 class Telnet
  compress header ip tcp
!
interface Serial0/0
 ip address 10.1.12.1 255.255.255.0
 load-interval 30
 service-policy output test
!
ip http server
ip http path flash:

R2:
class-map match-all HTTP
 match protocol http
class-map match-all Telnet
 match protocol telnet
!
policy-map test
 class HTTP
  compress header ip tcp
 class Telnet
  compress header ip tcp
!
interface Serial0/0
 ip address 10.1.12.2 255.255.255.0
 load-interval 30
 service-policy output test

A large file (test.txt) has been placed in R1’s flash for R2 to download.  Now we will start the HTTP download and telnet session:


R1#telnet 10.1.12.2
R2#copy http://10.1.12.1/test.txt null:
 

 

The results with TCP header compression enabled are shown below:

tcphc-r1-pmap3

The efficiency improvement factor is the number of bytes that would have been sent without compression divided by the number of bytes actually sent.  For each of the classes, this works out to:

HTTP:   (16,397 + 484,998) / 484,998   = 1.03

Telnet:  (5,837 + 2,935) / 2,935   = 2.98

which matches the output shown above.  The reason HTTP is much less efficient is its much larger packet sizes.  The average packet size for each class in this example is:

HTTP:   (16,397 + 484,998) / 477   = 1051 bytes

Telnet:  (5,837 + 2,935) / 192   = 45 bytes

The IP and TCP headers make up only a small portion of the total HTTP packet, while they make up nearly the entire telnet packet.  In most cases, it probably would not be worthwhile to enable TCP header compression for HTTP traffic.
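If you want to reproduce the arithmetic, here is a quick Python check of both calculations, using the byte and packet counters from the output above:

# Efficiency improvement factor = bytes that would have been sent without
# compression divided by bytes actually sent; average packet size uses the
# same counters divided by the packet count.
def efficiency(saved_bytes, sent_bytes):
    return (saved_bytes + sent_bytes) / float(sent_bytes)

def avg_packet_size(saved_bytes, sent_bytes, packets):
    return (saved_bytes + sent_bytes) / float(packets)

print("HTTP:   factor %.2f, avg %.1f bytes" %
      (efficiency(16397, 484998), avg_packet_size(16397, 484998, 477)))
print("Telnet: factor %.2f, avg %.1f bytes" %
      (efficiency(5837, 2935), avg_packet_size(5837, 2935, 192)))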

 

Now let’s look at what would happen if we tried to enable TCP header compression on a link between 2 live routers as TCP traffic was being sent across the link.  First we will remove TCP header compression from both classes on both routers:

R1:
policy-map test
 class HTTP
  no compress header ip tcp
 class Telnet
  no compress header ip tcp

R2:
policy-map test
 class HTTP
  no compress header ip tcp
 class Telnet
  no compress header ip tcp

On R2, we will start an HTTP download from R1 again.  On R1, we will telnet to R2 and enable TCP header compression on both MQC classes on R2:

R2#copy http://10.1.12.1/test.txt null:

R1#telnet 10.1.12.2
Trying 10.1.12.2 ... Open
R2#conf t
 policy-map test
  class HTTP
   compress header ip tcp
  class Telnet
   compress header ip tcp

At this point, the HTTP download has stopped and R2 displays the following error message:

tcphc-r2-error

Meanwhile on R1, R2 no longer responds to the telnet session and we are unable to undo the configuration.  The reason for this is that R2 immediately begins sending compressed headers once the compress header ip tcp command is entered, and R1 does not know how to interpret them without a similar configuration.  We could exit the telnet session and configure R1 similarly at this point, but traffic has still been temporarily disrupted and our HTTP download has to be completely restarted.  Another option is to first configure R2 with passive TCP header compression, and then configure R1 for TCP header compression.  Passive TCP header compression only compresses outgoing headers if the incoming headers from that destination are compressed.  Unfortunately, this is only supported at the interface level, so it cannot be used on a subset of TCP traffic like class-based TCP header compression can.  To test this out we will first remove the class-based TCP header compression policy from each router interface:

R1:
interface Serial0/0
 no service-policy output test

R2:
interface Serial0/0
 no service-policy output test

On R2, we will start an HTTP download from R1 again.  On R1, we will telnet to R2 and enable passive TCP header compression:

R2#copy http://10.1.12.1/test.txt null:

R1#telnet 10.1.12.2
Trying 10.1.12.2 ... Open
R2#conf t
 interface Serial0/0
  ip tcp header-compression passive

Now we close the telnet session and configure TCP header compression on R1:

R1:
interface Serial0/0
 ip tcp header-compression

R1 and R2 both begin using TCP header compression at this point, and no traffic was disrupted in between.  The TCP header compression statistics on R1 are shown below:

tcphc-r1-shtcphc

Posted in QoS | Leave a Comment »

Class-Based Weighted Fair Queueing and Low Latency Queueing Tests

Posted by Andy on January 22, 2009

This post will be about testing class-based weighted fair queueing (CBWFQ).  The same UDP flood script and topology will be used that was used for testing WFQ.  The topology and initial configurations of each router are shown below:

cbwfq-topology1

R1:
interface FastEthernet0/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 speed 100
 full-duplex
 no keepalive
 no mop enabled
!
interface Serial1/0
 ip address 10.1.12.1 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R2:
interface Serial1/0
 ip address 10.1.12.2 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R1 and R2 are dynamips routers, and PC is a loopback interface connected to R1 in dynamips.  PC will generate traffic destined for R2’s S1/0 interface, which will allow queueing to be tested outbound on R1’s S1/0 interface.  R2 will drop the traffic because it does not have a route to reach PC (this is intentional so that the return traffic does not unnecessarily consume CPU).

First we will do a simple test to verify the operation of CBWFQ.  We will have 3 separate UDP traffic streams sending traffic from PC to R2 on ports 53 (DNS), 67 (DHCP), and 69 (TFTP).  Each traffic flow will send 1500-byte packets every 125 ms.  The total sent by each flow will be approximately 96 kbps, resulting in approximately 288 kbps of offered traffic.  We will shape traffic outbound on R1 S1/0 to a rate of 96 kbps to simulate a clock rate of 96 kbps.  CBWFQ will be applied to the shaping queues to examine how its scheduler allocates bandwidth to each flow.  Since class-based shaping supports CBWFQ on shaping queues, we will use that for our shaping policy.  For the first test, we will create a policy-map on R1 with 3 classes (one to match each type of traffic) and assign the entire 96k of interface bandwidth to the classes.  We will assign 48k to DNS, 32k to DHCP, and 16k to TFTP.  This will require that the max-reserved-bandwidth be increased to 100.  A policy-map will also be created on R2 and applied inbound on S1/0 to measure the amount of traffic of each type that makes it to R2.  The configuration for this is:

R1:
ip access-list extended DHCP
 permit udp any any eq bootps
ip access-list extended DNS
 permit udp any any eq domain
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all TFTP
 match access-group name TFTP
class-map match-all DHCP
 match access-group name DHCP
class-map match-all DNS
 match access-group name DNS
!
policy-map CBWFQ
 class DNS
  bandwidth 48
 class DHCP
  bandwidth 32
 class TFTP
  bandwidth 16
policy-map Shaper
 class class-default
  shape average 96000
  service-policy CBWFQ
!
interface Serial1/0
 bandwidth 96
 max-reserved-bandwidth 100
 service-policy output Shaper

R2:
ip access-list extended DHCP
 permit udp any any eq bootps
ip access-list extended DNS
 permit udp any any eq domain
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all TFTP
 match access-group name TFTP
class-map match-all DHCP
 match access-group name DHCP
class-map match-all DNS
 match access-group name DNS
!
policy-map Traffic-Meter
 class TFTP
 class DHCP
 class DNS
!
interface Serial1/0
 service-policy input Traffic-Meter

Now we’re ready to start the 3 traffic streams:

flood.pl --port=53 --size=1500 --delay=125 10.1.12.2
flood.pl --port=67 --size=1500 --delay=125 10.1.12.2
flood.pl --port=69 --size=1500 --delay=125 10.1.12.2

A show policy-map interface on R1 verifies that approximately 96k of each class of traffic is being received from PC:

cbwfq-1-r1pmap

 

On R2 we can see the actual amount of each type of traffic that is being sent across the link:

cbwfq-1-r2pmap

The 30 second offered rates almost exactly match the bandwidth that we allocated each class in the CBWFQ policy.  The packet counter also matches our policy exactly – DHCP was given twice as much bandwidth as TFTP and has sent exactly twice as many packets, and DNS has been given 3 times as much bandwidth as TFTP and is 1 packet away from sending 3 times as many packets (most likely the next packet sent will be DNS). 

Next, let’s allocate bandwidth in the same proportions, but only allocate a small amount of our total bandwidth.  In the last example we gave DHCP twice as much as TFTP and DNS three times as much as TFTP, so we will keep that ratio by giving TFTP 1% of the total bandwidth, DHCP 2%, and DNS 3%.  The bandwidth statements must all be removed from the classes first since only consistent units are allowed.  The configuration is:

R1:
policy-map CBWFQ
 class DNS
  bandwidth percent 3
 class DHCP
  bandwidth percent 2
 class TFTP
  bandwidth percent 1

Now look at the traffic coming in on R2:

cbwfq-2-r2pmap

The ratio is still exactly 1:2:3, even though 94% of the bandwidth was not allocated to any class.  The reason for this has to do with how the CBWFQ scheduling algorithm works.  CBWFQ assigns a sequence number to each packet just like WFQ.  CBWFQ is essentially a combination of dynamic conversations and user defined conversations (that’s what Internetwork Expert calls them in this article on CBWFQ; I think these names are somewhat misleading, which I’ll get to later, but for now we’ll stick with them).  The weights of dynamic conversations are calculated the same as for WFQ conversations, 32384 / (IPP + 1).  The weights of user defined conversations are calculated as:

Weight = Constant * InterfaceBandwidth / ClassBandwidth

 if a flat bandwidth value is used, or:

Weight = Constant * 100 / BandwidthPercent

if bandwidth percent or bandwidth percent remaining is used.  The constant used in the formula depends on the number of dynamic flows in the WFQ system.  The following table shows the constant that is used in the weight calculation for each number of WFQ flows:

WFQ flows    Constant
16           64
32           64
64           57
128          30
256          16
512          8
1024         4
2048         2
4096         1

Like WFQ, CBWFQ also assigns a conversation number to each conversation.  Dynamic conversations work just like conversations in WFQ.  Based on a hash of header fields, they are classified into a conversation number between 0 and N-1, where N is the number of queues in the WFQ system.  Conversations N through N+7 are reserved for link queues.  Conversation N+8 is the priority queue (if LLQ is added to CBWFQ).  Conversations N+9 and above are used for user defined conversations.  Going back to the last example with 3% given to DNS, 2% to DHCP, and 1% to TFTP, we can see the conversation number and weight values assigned to each conversation on R1:

cbwfq-2-r1queue1

We can see that the WFQ system on the shaping queues is using 32 dynamic queues.  The user defined conversations for our flows are 41, 42, and 43, which is consistent with the formula for conversation numbers.  Using the weight formula for classes configured with bandwidth percent, we get:

DNS = 64 * 100 / 3 = 2133.33

DHCP = 64 * 100 / 2 = 3200

TFTP = 64 * 100 / 1 = 6400

which matches each of the weights shown in the output.  In the first example, when we allocated all of the interface bandwidth to the 3 classes, the weights would have been much lower, but still in the same proportion to one another.  Therefore, if all traffic is accounted for in user classes, it makes no difference how much is allocated to each class – only the proportion that the class is allocated relative to the others.
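The conversation-number and weight arithmetic is easy to reproduce.  Below is a minimal Python sketch, using the constants from the table above (the mapping of classes to conversation numbers is assumed to follow the order they appear in the queueing output):

# CBWFQ weight and conversation-number arithmetic for user defined classes.
# CONSTANTS maps the number of dynamic WFQ queues to the constant from the
# table earlier in this post.
CONSTANTS = {16: 64, 32: 64, 64: 57, 128: 30, 256: 16,
             512: 8, 1024: 4, 2048: 2, 4096: 1}

def user_weight_from_percent(dynamic_queues, bandwidth_percent):
    return CONSTANTS[dynamic_queues] * 100.0 / bandwidth_percent

def user_conversation(dynamic_queues, class_index):
    # dynamic: 0..N-1, link queues: N..N+7, LLQ: N+8, user defined: N+9 and up
    return dynamic_queues + 9 + class_index

for index, (name, percent) in enumerate([("DNS", 3), ("DHCP", 2), ("TFTP", 1)]):
    print("%-4s conversation %d, weight %.2f" %
          (name, user_conversation(32, index),
           user_weight_from_percent(32, percent)))

Running this prints conversations 41, 42, and 43 with weights 2133.33, 3200.00, and 6400.00, matching the output above.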

 

Now let’s look at the criteria for being put into a dynamic conversation vs. a user defined conversation.  Consider the following configuration:

R1:
class-map match-any DNS-DHCP
 match access-group name DNS
 match access-group name DHCP
!
policy-map CBWFQ
 class DNS-DHCP
 class class-default
  bandwidth percent 10
policy-map Shaper
 class class-default
  shape average 96000
  service-policy CBWFQ
!
interface Serial1/0
 service-policy output Shaper

DNS and DHCP are both being classified into the user defined class ‘DNS-DHCP’ which does not have a bandwidth guarantee.  TFTP will be classified into class-default, which has been configured with a 10% bandwidth guarantee.  You might expect that DNS and DHCP will both be put into the same user defined conversation, and that TFTP will be put into a dynamic conversation.  The queueing information on R1 is shown below:

cbwfq-3-r1queue1

We can see that there are 16 dynamic queues in the WFQ system.  This means that user defined conversations will use a constant of 64 in the weight calculation and will start at conversation #25.  We can see that DNS and DHCP have each been given a separate queue, and the conversation numbers (0 and 13) fall within the dynamic queue range even though they were classified into a single user defined class.  The weight has also been calculated according to the weight formula for dynamic conversations, (32384 / (0 + 1)), giving them each a weight of 32,384.  Also, TFTP has been given a user defined conversation number even though it was classified into class-default.  The weight has been calculated according to the weight formula for user defined conversations with bandwidth percent configured, (64 * 100 / 10), resulting in a weight of 640.  Therefore, the type of conversation depends not on the class that the traffic is classified into, but on whether or not that class has a bandwidth guarantee.  The calculated weights shown in the output also point out another very surprising characteristic of CBWFQ – classes with a bandwidth guarantee generally have a much lower scheduling weight than classes without, even if the guarantee is very small.  Take a look at the traffic measurement information on R2:

cbwfq-3-r2pmap

Even though TFTP was only given bandwidth percent 10 in class-default, it has consumed almost the entire available bandwidth on the link.  Looking back at the weights on R1, we can see that DNS and DHCP have (32,384 / 640), or 50.6 times the scheduling weight of TFTP, allowing TFTP to send 50.6 times as many bytes.  The packet counters for packets received by R2 confirm this almost exactly (956 / 50.6 = 18.89).  Even if we had marked DNS or DHCP with IPP 7, the calculated weight would have been 4,048 (32,384 / (7 + 1)), which still would have allowed TFTP to consume the majority of the link bandwidth.
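The resulting bandwidth split can be estimated directly from the weights, since with equal packet sizes each flow’s share of bytes is proportional to the inverse of its weight.  A quick Python check (illustrative only):

# Byte share per flow is proportional to 1/weight when packet sizes are equal.
weights = {"DNS": 32384, "DHCP": 32384, "TFTP": 640}
inverse_total = sum(1.0 / w for w in weights.values())
for name, weight in sorted(weights.items()):
    share = (1.0 / weight) / inverse_total
    print("%-4s weight %5d -> about %4.1f%% of the link" % (name, weight, share * 100))
# TFTP ends up with roughly 96% of the bandwidth, i.e. 32384 / 640 = 50.6
# times as many bytes as DNS or DHCP, matching the counters on R2.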

 

Now let’s add a priority guarantee to one of the classes, turning our CBWFQ policy into an LLQ policy.  We will give DNS a priority guarantee of 64 kbps, DHCP 2% of the bandwidth, and TFTP 1% of the bandwidth.  The new policy map configuration on R1 is shown below:

R1:
policy-map LLQ
 class DNS
  priority 64 8000
 class DHCP
  bandwidth percent 2
 class TFTP
  bandwidth percent 1
policy-map Shaper
 class class-default
  shape average 96000
  service-policy LLQ
!
interface Serial1/0
 service-policy output Shaper

Notice that the policer on the priority class has also been configured with a Bc of 8000 bytes.  We’ll look at why this was necessary in a minute, but for now we will ignore it.  Each traffic flow will be sent with the same parameters (1500-byte packets + L2 header and 125 ms interpacket delay for approximately 96 kbps per flow).  The queueing information from the shaping queues on R1 is shown below:

llq-1-r1queue

With WFQ configured to use 16 dynamic queues, the LLQ conversation number should be 24 and we can see that it is.  Notice that the weight for the LLQ conversation is 0 – this explains why packets in the LLQ are always scheduled first if the LLQ has not exceeded the policing rate.  R2 shows how much of each type of traffic is being sent across the serial link:

llq-1-r2pmap

As expected, DNS consumes all of its 64 kbps of priority bandwidth, and DHCP and TFTP share the remaining bandwidth in a 2:1 ratio.

 

Next let’s see what happens if we don’t adjust Bc on the LLQ’s policer.  The configuration remains the same, other than accepting the default Bc generated by IOS for the LLQ class:

R1:
policy-map LLQ
 class DNS
  priority 64

After starting the 3 traffic streams again, take a look at the traffic arriving at R2:

llq-2-r2pmap

Even though we’ve given it 64 kbps of priority bandwidth, DNS is only sending 48 kbps across the serial link.  The reason for this is due to how the LLQ is being policed.  The default Bc for a policer on an LLQ class is 20% of the policed rate.  With a policed rate of 64 kbps, Bc defaults to:

Bc = (64,000 bits/sec) * (.2 sec) * (1 byte / 8 bits) = 1600 bytes

This is verified on R1:

llq-2-r1pmap

The DNS traffic stream is sending a 1500-byte packet (1504 with HDLC) every 125 ms.  Once congestion starts to occur, the policer will function as follows:

1. Policer starts with a full 1600 token bucket.

2. DNS packet arrives at time T.  Packet has size 1504 and bucket has 1600 tokens, so the packet is allowed to be sent and bucket is decremented to 96 tokens.

3. DNS packet arrives at time T + .125.  Bucket is replenished with a pro-rated number of tokens based on the InterpacketArrivalTime * PolicerRate (in bytes).  In this case, the new number of tokens in the bucket is:  96 + .125 * 8000  = 1096.  Packet size (1504) > bucket size (1096), so the packet is policed.

4. DNS packet arrives at time T + .250.  Bucket is replenished with a pro-rated number of tokens based on the InterpacketArrivalTime * PolicerRate (in bytes).  In this case, the new number of tokens in the bucket is: 1096 + .125 * 8000 = 2096.  However, since the bucket has maximum size 1600, the extra tokens spill out of the bucket and the number of tokens is set to 1600.  Packet size (1504) < bucket size (1600), so the packet is sent.  Bucket is decremented to 96 tokens.

As you can see, this cycle will continue forever.  The bucket reaches its maximum size roughly 190 ms into each cycle, but another packet does not arrive until T + 250 ms, so roughly 60 ms worth of tokens spill out of the bucket and are essentially wasted.  The net result is that every other packet is sent.  This explains why the amount of DNS traffic arriving at R2 (48 kbps) is half of the total amount of DNS traffic being sent (96 kbps).
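A small Python simulation of this token bucket (a sketch using the parameters above, not IOS internals) reproduces the every-other-packet pattern:

# Simulate the LLQ policer described above: 64 kbps rate, 1600-byte Bc,
# one 1504-byte frame offered every 125 ms while the interface is congested.
RATE = 64000 / 8.0          # token refill rate, bytes per second
BC = 1600                   # bucket depth, bytes
PKT = 1504                  # HDLC frame size, bytes
GAP = 0.125                 # interpacket arrival time, seconds

tokens, last, sent, dropped = BC, 0.0, 0, 0
t = 0.0
while t < 10.0:
    tokens = min(BC, tokens + (t - last) * RATE)   # replenish, capped at Bc
    last = t
    if PKT <= tokens:
        tokens -= PKT
        sent += 1
    else:
        dropped += 1
    t += GAP

print("sent", sent, "dropped", dropped)                  # every other packet is sent
print("throughput %.0f bps" % (sent * PKT * 8 / 10.0))   # roughly 48 kbps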

 

Next let’s look at how the LLQ class behaves when there is no congestion on the interface.  The configuration remains the same, but only the DNS traffic flow will be sent.  Let’s see how much traffic R2 is receiving:

llq-3-r2pmap2

All of the traffic makes it to R2.  This verifies that the priority queue of LLQ is only policed when there is congestion.

 

For the last test, we will look at how CBWFQ behaves in IOS version 12.4(20)T.  We will create a CBWFQ policy and first test it in 12.4(18), which is what all the previous tests were done in, and then test the same CBWFQ policy in 12.4(20)T.  Consider the following configuration:

R1:
policy-map CBWFQ
 class DNS
  bandwidth percent 2
 class class-default
  fair-queue 128
policy-map Shaper
 class class-default
  shape average 96000
  service-policy CBWFQ

Using the same 3 traffic streams, DNS will be classified into the DNS class and DHCP and TFTP will be classified into class-default.  Based on the results of our previous tests, we can expect that DNS will be assigned conversation # 137 (128+9) and given a scheduling weight of 1500 (30 * 100 / 2).  DHCP and TFTP should each be given separate conversation numbers between 0 and 127 and given a scheduling weight of 32,384.  Therefore, DNS should get roughly 21.5 times as much bandwidth as DHCP and TFTP.  This is confirmed by the output on R1 and R2:

cbwfq-5-r1queue

cbwfq-5-r2pmap1

Now let’s put the exact same CBWFQ configuration into 12.4(20)T.  The show traffic-shape queue command does not seem to work anymore in 12.4(20)T, so it is difficult to determine exactly how the CBWFQ algorithm schedules packets.  However, we can see the net result by looking at the incoming traffic on R2:

cbwfq-6-r2pmap

DNS has only been given approximately 2% of the total bandwidth, which was the minimum that we reserved for it in the bandwidth percent command.  All the remaining bandwidth has been divided between flows that did not have a bandwidth reservation given to them.  As you can see, the same configuration has a very different end result in 12.4(20)T.  DNS has gone from receiving nearly all the bandwidth, to receiving only 2% of it during congestion.

Posted in QoS | 4 Comments »

Weighted Fair Queueing

Posted by Andy on January 21, 2009

In my last post, I talked about using dynamips for testing queueing tools.  This one will take a look at using dynamips for testing weighted fair queueing (WFQ).  The network topology and initial configurations are shown below:

wfq-topology

R1:
interface FastEthernet0/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 speed 100
 full-duplex
 no keepalive
 no mop enabled
!
interface Serial0/0
 ip address 10.1.12.1 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run

R2:
interface Serial0/0
 ip address 10.1.12.2 255.255.255.0
 load-interval 30
 no keepalive
!
no cdp run
 

R1 and R2 are dynamips routers, and PC is a loopback interface connected to R1 in dynamips.  PC will generate traffic destined for R2’s S0/0 interface, which will allow queueing to be tested outbound on R1’s S0/0 interface.  R2 will drop the traffic because it does not have a route to reach PC (this is intentional so that the return traffic does not unnecessarily consume CPU).

In order to perform the tests, I wanted something simple that could generate a reliable and predictable amount of traffic through a dynamips network, and I decided to use this UDP flood script created by Ivan Pepelnjak.  While performing my initial tests to see if the script would work for my testing, I found that it had a very strange behavior – packets could only be sent at intervals roughly in increments of 1/64 (0.015625) of a second.  The UDP flood script allows the interpacket delay to be specified; however, packet captures taken at each interface in the path showed that this was being rounded up to the next 64th of a second.  The following examples show this.  First, I’ll send traffic from the PC to R2 with a destination port of 1000, packet size of 500, and interpacket delay of 50 ms:

flood.pl --port=1000 --size=500 --delay=50 10.1.12.2

Wireshark capture taken from PC (loopback):

wfq-pc-50ms

 

Wireshark capture taken from R2 S0/0:

wfq-r2-50ms

The time display here has been changed from the default of seconds since the beginning of the capture to seconds between packets.  50 ms rounded up to the next 64th of a second results in 4/64, or 0.0625.  As you can see, the interpacket arrival times are extremely close to this.  Now let’s see what happens if we bump the interpacket delay up to 60 ms:

flood.pl --port=1000 --size=500 --delay=60 10.1.12.2

Wireshark capture taken from PC (loopback):

wfq-pc-60ms

 

Wireshark capture taken from R2 S0/0:

wfq-r2-60ms

Even though we increased the interpacket delay in the script by 10 ms, the actual interpacket delays have not changed because 60 ms still falls below 62.5 ms.  Next let’s change the interpacket delay in the script to 63 ms – just over the actual interpacket delay value that we saw in the last 2 tests:

flood.pl --port=1000 --size=500 --delay=63 10.1.12.2

Wireshark capture taken from PC (loopback):

wfq-pc-63ms

 

Wireshark capture taken from R2 S0/0:

wfq-r2-63ms

63 ms rounded up to the next 64th of a second results in 5/64, or 0.078125.  Once again, the actual interpacket delays shown by Wireshark are extremely close to this.  I’m not sure what exactly causes this strange behavior, whether it is a limitation of the loopback interface driver or some other issue; however, the results are very consistent and predictable now that we know how to figure out the actual interpacket delay that is being used.
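For reference, here is a small Python helper that models the behavior observed above (this only reproduces the observation; the assumption is that the configured delay is rounded up to the next 1/64-second tick):

# Model of the observed behavior: the configured interpacket delay appears to
# be rounded up to the next 1/64 of a second (15.625 ms).
import math

def actual_delay_ms(configured_delay_ms):
    tick = 1000.0 / 64
    return math.ceil(configured_delay_ms / tick) * tick

for delay in (50, 60, 63, 125):
    actual = actual_delay_ms(delay)
    print("--delay=%d -> actual %.4f ms, %.1f packets/second"
          % (delay, actual, 1000.0 / actual))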

For my tests, I chose to set the interpacket delay in the script to 125 ms, since it is exactly equal to 8/64.  This will result in the interpacket delay set in the script and the actual interpacket delay being the same to make things less confusing.  I also decided to use 1500-byte packets, which will result in 1514-byte frames on the Ethernet link and 1504-byte frames on the serial (HDLC) link.  The total bandwidth used by such a flow on each link should be

Ethernet: 1514 bytes/packet * 8 bits/byte * 8 packets/second = 96,896 bps

HDLC: 1504 bytes/packet * 8 bits/byte * 8 packets/second = 96,256 bps

or roughly 96kb/sec each.  Before getting into any WFQ tests, let’s try sending 1 flow with these parameters and verify the results:

flood.pl --port=1000 --size=1500 --delay=125 10.1.12.2

Wireshark capture taken from PC (loopback):

wfq-pc-125ms

 

Wireshark capture taken from R2 S0/0:

wfq-r2-125ms

 

Output of show interfaces on R1 F0/0, R1 S0/0, and R2 S0/0:

wfq-shint-r1f0

wfq-shint-r1s0

wfq-shint-r2s0

The results are amazingly accurate.  Wireshark shows packets being sent almost exactly every 125 ms and show interfaces shows that the input rate on R1 F0/0, output rate on R1 S0/0, and input rate on R2 S0/0 are exactly 96k.  I checked the output of show interfaces several times over the span of several minutes, and the rate on each interface was never more than 1000 bits above or below 96k.  Now that we have a good way of generating predictable traffic, let’s move onto the WFQ tests.

 

WFQ Test #1

For the first test, we will generate 3 separate UDP flows to ports 67 (DHCP), 69 (TFTP), and 514 (Syslog).  Each flow will use the same parameters as the last test performed above to generate 96k worth of traffic each.  We will simulate a clock rate of 96k on the interface by shaping to 96k.  This will force the 3 flows to compete for 96k of total bandwidth and allow us to see how the WFQ scheduler allocates bandwidth to each flow.  The IP Precedence of each flow will be left at the default of 0.  WFQ assigns a weight to each flow using the formula 32384 / (IPP + 1).  Since the IP Precedence of each flow is 0, they should each be given the same weight, and since the length of packets in each flow is the same, an equal number of packets from each flow should be sent.  Since GTS supports WFQ on shaping queues, we will use that.  On R2, we will create a policy-map to match each type of traffic so that we can measure the amount of each that makes it to R2:

R1:
interface Serial0/0
 traffic-shape rate 96000

R2:
ip access-list extended DHCP
 permit udp any any eq bootps
ip access-list extended SYSLOG
 permit udp any any eq syslog
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all TFTP
 match access-group name TFTP
class-map match-all SYSLOG
 match access-group name SYSLOG
class-map match-all DHCP
 match access-group name DHCP
!
policy-map Traffic-Meter
 class TFTP
 class DHCP
 class SYSLOG
!
interface Serial0/0
 service-policy input Traffic-Meter


Now we’re ready to start each of the traffic streams:

 

flood.pl --port=67 --size=1500 --delay=125 10.1.12.2

flood.pl --port=69 --size=1500 --delay=125 10.1.12.2

flood.pl --port=514 --size=1500 --delay=125 10.1.12.2

 

Output of show interfaces on R1 F0/0, R1 S0/0, and R2 S0/0:

wfq-test1-shint-r1f0

wfq-test1-shint-r1s0

wfq-test1-shint-r2s0

As expected, R1 F0/0 shows 288k worth of input traffic (96k * 3).  R1 S0/0 shows 96k of output traffic and R2 S0/0 shows 96k of input traffic which matches the shaped rate.

Next, look at the currently active flows in the shaping queues on R1:

wfq-test1-r1queue2

This verifies that WFQ is assigning each flow a weight of 32384 using the formula 32384 / (IPP + 1), and therefore each of them receives an equal scheduling weight.

Now let’s look at the traffic coming in on R2:

wfq-test1-r2meter

WFQ has distributed an exactly equal share to each of the 3 flows (1499 packets each and roughly 32 kbps each over the last 30 seconds).  Pretty common knowledge, but what’s most impressive is that we were able to test WFQ using dynamips and it was accurate right down to the very packet.

 

WFQ Test #2

Let’s make things a little more interesting and change the IP Precedence and offered rates of our 3 flows.  For Syslog we will change packet size to 1000 and keep delay of 125 ms.  For TFTP we will keep packet size of 1500 and change delay to 60 ms (actual delay will be 62.5 ms, or 16 packets/second).  For DHCP we will keep both parameters the same.  The offered rate for each flow on the Ethernet link will now be:

Syslog: 1014 bytes/packet * 8 bits/byte * 8 packets/second = 64,896 bps

DHCP: 1514 bytes/packet * 8 bits/byte * 8 packets/second = 96,896 bps

TFTP: 1514 bytes/packet * 8 bits/byte * 16 packets/second = 193,792 bps

We’ll pretend that our Syslog flow is actually a high priority type of traffic such as video, DHCP is best effort traffic such as web, and TFTP is less than best effort such as peer-to-peer filesharing.  Syslog will be marked with IPP 4, DHCP with IPP 1, and TFTP with IPP 0.  The marking will be done inbound on R1 F0/0:

R1:
ip access-list extended DHCP
 permit udp any any eq bootps
ip access-list extended SYSLOG
 permit udp any any eq syslog
ip access-list extended TFTP
 permit udp any any eq tftp
!
class-map match-all TFTP
 match access-group name TFTP
class-map match-all SYSLOG
 match access-group name SYSLOG
class-map match-all DHCP
 match access-group name DHCP
!
policy-map Traffic-Marker
 class TFTP
  set ip precedence 0
 class DHCP
  set ip precedence 1
 class SYSLOG
  set ip precedence 4
!
interface FastEthernet0/0
 service-policy input Traffic-Marker

 

Now we can begin sending each of the traffic flows:

 

flood.pl --port=69 --size=1500 --delay=60 10.1.12.2

flood.pl --port=67 --size=1500 --delay=125 10.1.12.2

flood.pl --port=514 --size=1000 --delay=125 10.1.12.2

 

The marking policy-map on R1 verifies the amount of traffic being generated of each type and that they are being marked correctly:

wfq-test2-r1marking1

 

Next let’s look at the shaping queues on R1:

wfq-test2-r1queue

We can see that each of the weights matches what we expected:

Syslog = 32384 / (4 + 1) = 6476

TFTP = 32384 / (0 + 1) = 32384

DHCP = 32384 / (1 + 1) = 16192

Since the weight formula always uses the same numerator, the proportion of bandwidth given to a certain flow is approximately equal to the denominator (IPP + 1) of that flow divided by the sum of the denominators of all flows.  With a shaped rate of 96k (simulated clock rate of 96k), the bandwidth given to each flow (assuming each flow uses the full share it is given) should be:

Syslog = 5/8 * 96,000 = 60,000

TFTP = 1/8 * 96,000 = 12,000

DHCP = 2/8 * 96,000 = 24,000

The policy map that we created to meter traffic on R2 confirms this:

wfq-test2-r2meter

We can even verify it down to a per-packet granularity like the 1st test.  DHCP sends packets of exactly the same size as TFTP and is given half the weight (packets are scheduled twice as fast), and we can see that DHCP has sent exactly twice as many packets as TFTP (1327 * 2 = 2654).  Syslog’s packets are 1004 bytes on the serial link compared to TFTP’s 1504-byte packets.  Syslog is also given 1/5 of TFTP’s weight when assigning sequence numbers.  Therefore Syslog should be able to send (1504 / 1004) * 5, or approximately 7.49 times as many packets as TFTP sends.  If we calculate the number that Syslog should have been able to send using the number that TFTP sent, it works out to:

(1504 / 1004) * 5 * 1327 = 9939.28

This almost exactly matches the value of 9937 that is listed.  (This very small discrepancy is probably due to the fact that the counters in the policy map simply show the results at a single point in time; at any given point syslog could have transmitted as much as 7.49 packets more or less than it should have relative to TFTP, since the ratio is 7.49 packets to 1 and only whole packets are transmitted).
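A quick Python check of the arithmetic in this test (illustrative only; the IP Precedence values and frame sizes are taken from the output above):

# Bandwidth share per flow is (IPP + 1) / sum(IPP + 1); the packet ratio
# between Syslog and TFTP follows from their weights and frame sizes.
flows = {"Syslog": (4, 1004), "DHCP": (1, 1504), "TFTP": (0, 1504)}  # (IPP, HDLC frame bytes)
shaped_rate = 96000
denominator_sum = sum(ipp + 1 for ipp, _ in flows.values())

for name, (ipp, frame) in sorted(flows.items()):
    weight = 32384 // (ipp + 1)
    share = shaped_rate * (ipp + 1) / denominator_sum
    print("%-6s weight %5d, %d-byte frames, expected share %5d bps"
          % (name, weight, frame, share))

# Syslog packets per TFTP packet = (TFTP frame / Syslog frame) * (weight ratio of ~5)
ratio = (1504.0 / 1004) * 5
print("Syslog should send about %.2f packets per TFTP packet, or %.0f vs TFTP's 1327"
      % (ratio, ratio * 1327))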

Posted in QoS | Leave a Comment »

Testing Queueing Strategies with Dynamips

Posted by Andy on January 19, 2009

Dynamips works great for emulating the majority of Cisco router functions, but some things just don’t work.  I don’t know a ton about dynamips, but from my understanding it does not emulate the physical layer, and therefore the clock rate command does not actually limit the interface speed.  This creates a problem when trying to test software queueing tools, which are only utilized during congestion when the hardware queue fills, because the hardware queue never actually fills up.  From my experience, it seems that the only thing limiting output rate on an interface in dynamips is CPU usage on the computer running dynamips.  The following example demonstrates this:

R1:
interface Serial0/0
 ip address 10.1.1.1 255.255.255.0
 load-interval 30
 no keepalive
 clock rate 64000

R2:
interface Serial0/0
 ip address 10.1.1.2 255.255.255.0
 load-interval 30
 no keepalive
 clock rate 64000

r1-ping

r2-showint5

Each router’s interface has been configured with a 64kb/sec clock rate – however 1173kb/sec worth of traffic is being sent in each direction out of the interface.  There are 0 packets currently in the software queueing system and not a single packet has been dropped since the time the ping command was issued.  I don’t think there is a way to make software queueing occur using dynamips – however a good alternative for testing the various queueing tools is to use traffic shaping and enable the queueing tool on the shaping queues.  Each of the IOS shaping tools varies in which queueing tools it supports on the shaping queues – so different types of shaping may need to be used depending on which queueing tool you are trying to test.  The queueing schemes supported by each shaping tool are:

 

GTS – WFQ

FRTS – FIFO, WFQ, CBWFQ, LLQ, PQ, CQ

Class-based Shaping – FIFO, WFQ, CBWFQ, LLQ

 

A couple limitations that initially come to mind are:

1. The throughput in dynamips is limited

2. The actual rate at which traffic is sent is not limited – only the time at which traffic starts to be sent is limited

The first limitation can be dealt with by shaping to a low rate (simulating a low clock rate) so that the traffic being sent through the dynamips router exceeds the shaped rate, causing the shaping queues to fill.  The amount of traffic that can be sent through a dynamips network will probably vary depending on the number of routers running in dynamips and how good the PC running dynamips is.

The second limitation is a little bit more tricky.  An interface that is clocked at 64,000 bps should place 1 bit on the wire every 1/64,000 of a second.  A 1500-byte packet would take approximately 187 ms to be serialized onto such an interface (1500*8/64000).  However, if we were to configure shaping in dynamips to a rate of 64000 with a Bc of 12000 (causing Tc to be 187 ms), the packet would be sent as quickly as the PC’s CPU allows – approximately 1,200,000 bps in the simple network shown above on my PC – and then the interface would remain idle until Tc expired and the bucket was refilled.  In this case, the packet would be sent in approximately 10 ms (1500*8/1,200,000) and then no bits would be sent for 177 ms.  Now consider if instead of a 1500-byte packet, 2x 750 byte packets were being sent.  On the physical interface clocked at 64,000 bps, each packet would take roughly 93 ms (750*8/64,000) to serialize and would experience very low jitter.  On the dynamips router, each packet would take roughly 5 ms to be sent (750*8/1,200,000).  The first 2 packets would arrive 5 ms apart, but the 3rd would have to wait another 177 ms for Tc to expire, resulting in a huge amount of delay and jitter for the traffic.  This is similar to the problem that occurs when shaping is enabled on an interface where the clock rate is significantly faster than the shaped rate.  The best solution to this is to configure Bc so that Tc is set to a low value (less than 10 ms if possible). 
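The serialization and Tc arithmetic above is easy to sanity-check; here is a short Python version (the 1,200,000 bps figure is simply the approximate throughput observed in the earlier example):

# Serialization delay and shaping interval (Tc) arithmetic from the text above.
def serialization_ms(frame_bytes, rate_bps):
    return frame_bytes * 8 * 1000.0 / rate_bps

print("1500 bytes at 64 kbps:  %.1f ms" % serialization_ms(1500, 64000))     # ~187 ms
print("1500 bytes in dynamips: %.1f ms" % serialization_ms(1500, 1200000))   # ~10 ms
print("750 bytes at 64 kbps:   %.1f ms" % serialization_ms(750, 64000))      # ~94 ms
print("750 bytes in dynamips:  %.1f ms" % serialization_ms(750, 1200000))    # ~5 ms

bc_bits, cir_bps = 12000, 64000
print("Tc = Bc / CIR = %.1f ms" % (bc_bits * 1000.0 / cir_bps))              # 187.5 ms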

If you’re looking for a very accurate way to measure delay or jitter for a specific queueing scheme as if it were being used for software queueing, this probably won’t cut it even with a low Tc value.  However, if you’re looking to test the logic used by a queueing tool’s scheduling process or how a queueing tool divides bandwidth among different flows, using queueing on the shaping queues in dynamips works great.  This also allows other QoS tools that depend on congestion occurring – such as WRED – to be tested in dynamips.

Posted in QoS | Leave a Comment »

QoS Policy Propagation with BGP

Posted by Andy on January 14, 2009

QoS Policy Propagation with BGP (QPPB) allows an IP Precedence value and/or QoS-group value to be associated with a BGP route entry.  This allows QoS policies to easily be applied within and between autonomous systems.  Suppose an ISP (AS 23) has 2 customers (AS 4 and AS 5) as shown below in the diagram:

qppb-topology3

Customer 1  (AS 4) is paying for premium service and should have all traffic to and from their networks marked as IP Precedence 2.  Customer 2 (AS 5) is paying for the standard level of service and should have all traffic to and from their networks marked as IP Precedence 0.  AS 1 is used as a remote network that both Customer 1 and Customer 2 are communicating with so that we can test the results of the QPPB configuration.  The initial configurations of each router are:

R1:
interface Loopback0
 ip address 1.1.1.1 255.255.255.0
!
interface Serial0/0
 ip address 192.168.12.1 255.255.255.0
!
router bgp 1
 network 1.1.1.0 mask 255.255.255.0
 neighbor 192.168.12.2 remote-as 23

R2:
interface Serial0/0
 ip address 192.168.23.2 255.255.255.0
!
interface Serial0/1
 ip address 192.168.12.2 255.255.255.0
!
router bgp 23
 neighbor 192.168.12.1 remote-as 1
 neighbor 192.168.23.3 remote-as 23
 neighbor 192.168.23.3 next-hop-self
!
ip bgp-community new-format

R3:
interface Serial0/0
 ip address 192.168.23.3 255.255.255.0
!
interface Serial0/1
 ip address 192.168.34.3 255.255.255.0
!
interface Serial0/2
 ip address 192.168.35.3 255.255.255.0
!
router bgp 23
 neighbor 192.168.23.2 remote-as 23
 neighbor 192.168.23.2 next-hop-self
 neighbor 192.168.34.4 remote-as 4
 neighbor 192.168.35.5 remote-as 5
!
ip bgp-community new-format

R4:
interface Loopback0
 ip address 4.4.4.4 255.255.255.0
!
interface Serial0/0
 ip address 192.168.34.4 255.255.255.0
!
router bgp 4
 network 4.4.4.0 mask 255.255.255.0
 neighbor 192.168.34.3 remote-as 23

R5:
interface Loopback0
 ip address 5.5.5.5 255.255.255.0
!
interface Serial0/0
 ip address 192.168.35.5 255.255.255.0
!
router bgp 5
 network 5.5.5.0 mask 255.255.255.0
 neighbor 192.168.35.3 remote-as 23

First, we’ll configure R3 to set the community attribute of routes received from Customer 1 to 4:23 and Customer 2 to 5:23:


R3:
route-map Customer1 permit 10
 set community 4:23
!
route-map Customer2 permit 10
 set community 5:23
!
router bgp 23
 neighbor 192.168.34.4 route-map Customer1 in
 neighbor 192.168.35.5 route-map Customer2 in
 
We can verify that R3 is setting the community string appropriately for these routes:

qppb-r3-community

Alternatively, we could have used an access list to match specific routes or used an AS path access list to match on the AS path attribute.  The community attribute is not sent by default, so R2 has no knowledge of what R3 has set it to:

 qppb-r2-nocommunitystring1

To configure the community string to be sent, use the neighbor send-community command on R3 and then verify that R2 is now receiving it:

R3:
router bgp 23
 neighbor 192.168.23.2 send-community

qppb-r2-communitystring

Now we can configure R2 to mark these routes with the appropriate policy by matching the community attribute:

R2:
ip community-list 4 permit 4:23
ip community-list 5 permit 5:23
!
route-map QPPB-test permit 10
 match community 4
 set ip precedence 2
!
route-map QPPB-test permit 20
 match community 5
 set ip precedence 0
!
router bgp 23
 table-map QPPB-test

 

This causes R2 to apply the route-map to the BGP routes.  The results are stored in the Forwarding Information Base (FIB) used by CEF, which allows for IP Precedence and/or QoS-group information to be stored for a route.  We can verify that the routes have been marked on R2:

 

qppb-r2-cef

At this point, the routes have been marked correctly in the FIB, but no packets will be marked until we configure R2 to apply the policy to incoming traffic on an interface.  The bgp-policy command is used to apply QPPB policy to packets entering an interface.  When a packet enters an interface with bgp-policy configured, the router can perform a FIB lookup on either the source address, destination address, or both.  However, if both are used on the same interface, the router will perform 2 lookups, first on the source and then on the destination.  The packet will be classified first based on the source and then reclassified based on the destination.  To apply the appropriate policy on R2 to traffic to and from Customer 1 and Customer 2, we need to perform a source lookup on traffic entering S0/0 and a destination lookup on traffic entering S0/1:

R2:
interface Serial0/0
 bgp-policy source ip-prec-map
!
interface Serial0/1
 bgp-policy destination ip-prec-map

To verify that QPPB is marking packets appropriately, we will create a policy-map on R1 and R3 with 3 separate classes to match ICMP traffic with IP precedence 0, 2, and all other values.  Then we will apply the policy-map on R1 S0/0 and R3 S0/0 in each direction to check the markings being applied to traffic sent through R2: 

R1:
ip access-list extended ICMP-Prec0
 permit icmp any any precedence routine
ip access-list extended ICMP-Prec2
 permit icmp any any precedence immediate
ip access-list extended ICMP-PrecX
 deny icmp any any precedence routine
 deny icmp any any precedence immediate
 permit icmp any any
!
class-map match-all Prec0
 match access-group name ICMP-Prec0
class-map match-all Prec2
 match access-group name ICMP-Prec2
class-map match-all PrecX
 match access-group name ICMP-PrecX
!
policy-map CheckMarkings
 class Prec0
 class Prec2
 class PrecX
!
interface Serial0/0
 service-policy input CheckMarkings
 service-policy output CheckMarkings

 
R3:
ip access-list extended ICMP-Prec0
 permit icmp any any precedence routine
ip access-list extended ICMP-Prec2
 permit icmp any any precedence immediate
ip access-list extended ICMP-PrecX
 deny icmp any any precedence routine
 deny icmp any any precedence immediate
 permit icmp any any
!
class-map match-all Prec0
 match access-group name ICMP-Prec0
class-map match-all Prec2
 match access-group name ICMP-Prec2
class-map match-all PrecX
 match access-group name ICMP-PrecX
!
policy-map CheckMarkings
 class Prec0
 class Prec2
 class PrecX
!
interface Serial0/0
 service-policy input CheckMarkings
 service-policy output CheckMarkings

 

First we’ll test traffic in each direction between Customer 1 and AS 1 and clear the counters after each test:

qppb-r4-r1-ping

qppb-r4-r1-ping-r3

qppb-r4-r1-ping-r1

qppb-r1-r4-ping

qppb-r1-r4-ping-r1

qppb-r1-r4-ping-r3

As we can see, traffic between AS 1 and Customer 1 is remarked from its default precedence of 0 to a precedence of 2 in each direction.  Next let’s test the policy between AS 1 and Customer 2.  We’ll try having AS 1 and Customer 2 mark their traffic as precedence 4 and see if they can fool the ISP into getting better service:

qppb-r5-r1-ping1

qppb-r5-r1-ping-r31

qppb-r5-r1-ping-r1

qppb-r1-r5-ping

qppb-r1-r5-ping-r1

qppb-r1-r5-ping-r3

The traffic is remarked down to precedence 0, just as the policy specified.

Posted in BGP, QoS | Leave a Comment »