March 21, 2009

BGP Notes

BGP

Information

  • Each BGP entry takes about 240 bytes of memory in the BGP table and another 240 bytes in the IP routing table. Each path takes about 110 bytes.
  • BGP uses TCP port 179.
  • BGP is connection oriented. As long as one side can establish the connection, it will work.
  • Private AS numbers: 64512-65535 (last 1024 of 16 bit block)
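
The private range above can be checked with a few lines of Python. This is only a sketch: the bounds are copied from the note above, and the function name is made up for illustration.

```python
# Bounds copied from the note above (last 1024 numbers of the 16-bit block).
PRIVATE_ASN_FIRST = 64512
PRIVATE_ASN_LAST = 65535


def is_private_asn(asn):
    """Return True if asn falls inside the private range quoted in these notes."""
    return PRIVATE_ASN_FIRST <= asn <= PRIVATE_ASN_LAST


# Size of the range, to confirm the "last 1024" remark.
PRIVATE_ASN_COUNT = PRIVATE_ASN_LAST - PRIVATE_ASN_FIRST + 1
```

For example, `is_private_asn(65001)` is true while `is_private_asn(7675)` is not.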

BGP Route Selection Process

  • Only consider paths with reachable NEXT_HOP attributes
  • Do not consider iBGP path if not synchronized
  • Highest WEIGHT
  • Highest LOCAL_PREF
  • Prefer locally originated route
  • Shortest AS_PATH
  • Lowest ORIGIN code (IGP < EGP < incomplete)
  • Lowest Multi-Exit Discriminator (MED)
    1. IF bgp deterministic-med, order the paths before comparing
    2. IF bgp always-compare-med, then compare it for all paths
    3. Otherwise, considered only if paths are from the same neighbor AS
  • Prefer an External path over an Internal one
  • Lowest IGP metric to the NEXT_HOP
  • IF multipath is enabled, the router may install up to N parallel paths in the routing table
  • For eBGP paths, select the “oldest” (To minimize route-flap)
  • Lowest Router-ID (Originator-ID is considered for reflected routes)
  • Shortest Cluster-List (Client must be aware of RR attributes!)
  • Lowest neighbor IP address
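
A rough way to remember the order is to treat it as a sort key. The sketch below is a simplification, not how a router implements it: it assumes every candidate path already has a reachable next hop, is synchronized, and comes from the same neighbor AS (so MED is always comparable), and it leaves out the multipath, "oldest", cluster-list and neighbor-address steps. The `Path` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Lower rank wins: IGP < EGP < incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    """Hypothetical container for the attributes used in the comparison."""
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: Tuple[int, ...] = ()
    origin: str = "igp"
    med: int = 0
    external: bool = True
    igp_metric: int = 0
    router_id: Tuple[int, int, int, int] = (0, 0, 0, 0)


def preference_key(p):
    """Smaller tuple means more preferred path (simplified order)."""
    return (
        -p.weight,                 # highest WEIGHT
        -p.local_pref,             # highest LOCAL_PREF
        not p.locally_originated,  # locally originated first
        len(p.as_path),            # shortest AS_PATH
        ORIGIN_RANK[p.origin],     # lowest ORIGIN code
        p.med,                     # lowest MED (assumes same neighbor AS)
        not p.external,            # external over internal
        p.igp_metric,              # lowest IGP metric to NEXT_HOP
        p.router_id,               # lowest router ID
    )


def best_path(paths):
    """Pick the most preferred path from a non-empty list."""
    return min(paths, key=preference_key)
```

Because the tuple is compared left to right, a higher weight wins before local preference is even looked at, which mirrors the list above.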

iBGP

  • The NEXT_HOP attribute is not updated when a prefix is sent to an iBGP peer. You can use "next-hop-self" on the neighbor statement to solve any issues caused by that feature.
  • Applies the rule of iBGP split-horizon - an iBGP received prefix will not be propagated to another iBGP neighbor. Hence, iBGP requires a full mesh between BGP peers. There are two ways to circumvent this requirement:
    1. Route Reflectors (route server)
    2. Confederations (sub ASes)

eBGP

  • The NEXT_HOP attribute is updated when a prefix is sent to an eBGP peer (but NOT if it is a member of the same confederation).
  • By default the eBGP neighbor is assumed to be directly connected. Consequently, the "neighbor ebgp-multihop" command MUST be used if the peers are not directly connected. Note: loopbacks are never directly connected - so you need to use either ebgp-multihop or disable-connected-check when peering eBGP on loopbacks...

Route-Reflectors

  • The NEXT_HOP attribute is preserved (same as iBGP). Because of this, if a route-reflector client is peering with an eBGP peer, you will most likely have to change the next hop to reflect the address of the RRC. That is unless you redistribute the connected external routes, of course.

Confederations

  • The NEXT_HOP attribute is preserved between confederation members - this is different from a traditional eBGP peering session.
  • The MULTI_EXIT_DISC (MED) is preserved
  • The LOCAL_PREF is preserved

Communities

The community attribute is a transitive, optional attribute designed to group destinations in a certain community and apply certain policies (such as accept, prefer, or redistribute) to them. The following table shows the well-known BGP communities.

   <table width="100%" border="1" cellpadding="2" cellspacing="0" bgcolor="#ffffff" class="tiny">
     <tbody>
       <tr>
         <th height="" width="20%" bgcolor="#ccccff" colspan="1" rowspan="1"> 

Community

         <th height="" width="80%" bgcolor="#ccccff" colspan="1" rowspan="1">  
Description

       </tr>
       <tr valign="top">
         <td height="" colspan="1" bgcolor="#ffffff">AS:VAL</td>
         <td height="" colspan="1" bgcolor="#ffffff"> This format represents a 4-octet community value. `AS` is the high-order 2 octets and `VAL` is the low-order 2 octets, both written in decimal. This format is useful for defining AS-oriented policy values. For example, <span class="code">7675:80</span> can be used when AS 7675 wants to pass local policy value 80 to a neighboring peer. </td>
       </tr>
       <tr valign="top">
         <td width="20%" height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">LOCAL-AS</span></td>
         <td width="80%" height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">Use in confederation scenarios to prevent advertising routes outside the local autonomous system (AS).
               </span><span class="code">local-AS</span><span class="content"> represents the well-known community value </span><span class="code">NO_EXPORT_SUBCONFED</span><span class="content"> (0xFFFFFF03). Routes carrying this value must not be advertised to external BGP peers. A neighboring router in another sub-AS of the same confederation also counts as an external BGP peer here, so the route will not be announced to it. </span></td>
       </tr>
       <tr valign="top">
         <td width="20%" height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">NO-EXPORT</span></td>
         <td width="80%" height="" colspan="1" rowspan="1" bgcolor="#ffffff">Do not advertise to external BGP (eBGP) peers. Keep this route within an AS. 
           <span class="code">no-export</span> represents the well-known community value <span class="code">NO_EXPORT</span> (0xFFFFFF01). Routes carrying this value must not be advertised outside a BGP confederation boundary. If the neighboring BGP peer is part of the same BGP confederation, it is considered to be inside the confederation boundary, so the route will be announced to it. </td>
       </tr>
       <tr valign="top">
         <td width="20%" height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">NO-ADVERTISE</span></td>
         <td width="80%" height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">Do not advertise this route to any peer, internal or external. 
             </span><span class="code">no-advertise</span><span class="content"> represents the well-known community value </span><span class="code">NO_ADVERTISE</span><span class="content"> (0xFFFFFF02). Routes carrying this value must not be advertised to other BGP peers. </span></td>
       </tr>
       <tr valign="top">
         <td width="20%" height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">INTERNET</span></td>
         <td width="80%" height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">Advertise this route to the internet community, and any router that belongs to it. 
         </span><span class="code">internet</span><span class="content"> represents the well-known community value 0. </span></td>
       </tr>
       <tr valign="top">
         <td height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">NONE</span></td>
         <td height="" colspan="1" rowspan="1" bgcolor="#ffffff"><span class="content">Apply no community attribute when you want to clear the communities associated with a route.</span></td>
       </tr>
     </tbody>
   </table>
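
The AS:VAL format is just the 32-bit community value split into two 16-bit halves. A minimal Python sketch of the conversion follows; the function names are made up for illustration, and the constants are the hex values from the table above.

```python
# Well-known values as listed in the table above.
WELL_KNOWN = {
    "no-export": 0xFFFFFF01,
    "no-advertise": 0xFFFFFF02,
    "local-as": 0xFFFFFF03,
}


def encode_community(text):
    """Turn 'AS:VAL' into the 32-bit community value."""
    as_part, val_part = text.split(":")
    asn, val = int(as_part), int(val_part)
    if not (0 <= asn <= 0xFFFF and 0 <= val <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | val  # AS in the high-order 2 octets


def decode_community(value):
    """Turn a 32-bit community value back into 'AS:VAL'."""
    return "{}:{}".format(value >> 16, value & 0xFFFF)
```

`decode_community(WELL_KNOWN["no-export"])` gives `65535:65281`, which is how the well-known value looks when written in AS:VAL form.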

Configuration

   <table width="100%" border="0" cellpadding="0" cellspacing="2" class="tiny">
     <tr valign="top">
       <td width="50%" class="code">neighbor 1.1.1.1 remove-private-as </td>
       <td width="50%">

Remove private ASN from AS_PATH before sending to neighbor 1.1.1.1 (always done on the outbound)

     </tr>
     <tr valign="top">
       <td class="code">&nbsp;</td>
       <td>&nbsp;</td>
     </tr>
     <tr valign="top">
       <td class="code"> bgp confederation identifier 123
       <td>Make this router/AS a member of confederation 123 </td>
     </tr>
     <tr valign="top">
       <td class="code">bgp confederation peers 1 2 </td>
       <td>List the AS belonging to confederation 123</td>
     </tr>
     <tr valign="top">
       <td class="code">&nbsp;</td>
       <td>&nbsp;</td>
     </tr>
     <tr valign="top">
       <td class="code">. (period) </td>
       <td>Match any single character, including white space </td>
     </tr>
     <tr valign="top">
       <td class="code">* (asterisk) </td>
       <td>Matches 0 or more occurrences of the pattern </td>
     </tr>
     <tr valign="top">
       <td class="code">+ (plus) </td>
       <td>Matches 1 or more occurrences of the pattern </td>
     </tr>
     <tr valign="top">
       <td class="code">? (question mark) </td>
       <td>Matches 0 or 1 occurrence of the pattern </td>
     </tr>
     <tr valign="top">
       <td class="code">^ (caret) </td>
       <td>Marks the beginning of the string </td>
     </tr>
     <tr valign="top">
       <td class="code">$ (dollar sign) </td>
       <td>Marks the end of the string </td>
     </tr>
     <tr valign="top">
       <td class="code">_ (underscore)</td>
       <td>Matches the beginning of the string, the end of the string, white space, or a delimiter (comma, brace, parenthesis)</td>
     </tr>
     <tr valign="top">
       <td class="code">[] (brackets) </td>
       <td>Designate a range of single character patterns </td>
     </tr>
     <tr valign="top">
       <td class="code">- (hyphen) </td>
       <td>Specify a range of characters </td>
     </tr>
     <tr valign="top">
       <td class="code">() (parentheses) </td>
       <td>BGP specific symbols - parentheses designate a pattern as a confederation name </td>
     </tr>
     <tr valign="top">
       <td class="code">\ (backslash) </td>
       <td>Escape character (use before character to modify)</td>
     </tr>
     <tr valign="top">
       <td class="code">.*</td>
       <td>Match anything </td>
     </tr>
     <tr valign="top">
       <td class="code">&nbsp;</td>
       <td>&nbsp;</td>
     </tr>
     <tr valign="top">
       <td class="code">^$</td>
       <td>Match any paths originating in own AS </td>
     </tr>
     <tr valign="top">
       <td class="code">^100$</td>
       <td>Match paths originated by directly connected AS 100 (AS 100 is the only AS in the path) </td>
     </tr>
     <tr valign="top">
       <td class="code">_200_</td>
       <td>Match any path that has transited AS 200 </td>
     </tr>
     <tr valign="top">
       <td class="code">_300$</td>
       <td>Match all paths originating in AS 300 </td>
     </tr>
     <tr valign="top">
       <td class="code">^400_</td>
       <td>Match paths learned from directly connected AS 400 (AS 400 is first in the path) </td>
     </tr>
     <tr valign="top">
       <td width="50%" class="code">_[1234]00_</td>
       <td width="50%">

Match any path that has transited through AS 100, 200, 300 or 400

     </tr>
   </table> 
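
The AS-path expressions in the table can be tried out offline with Python's `re` module. This is a sketch under one assumption: the underscore is translated to "start, end, space or delimiter" as described in the table, and everything else is passed through unchanged (confederation parentheses are not handled). The helper names are made up.

```python
import re

# Python equivalent of the underscore: beginning of string, end of string,
# white space, or a delimiter (comma, brace, parenthesis).
_DELIM = r"(?:^|$|[ ,{}()])"


def cisco_to_python(pattern):
    """Translate an AS-path expression into a Python regex (underscore only)."""
    return pattern.replace("_", _DELIM)


def as_path_matches(pattern, as_path):
    """Return True if the AS path (space-separated ASNs) matches the pattern."""
    return re.search(cisco_to_python(pattern), as_path) is not None
```

For instance, `as_path_matches("_200_", "100 200 300")` is true, while `as_path_matches("_200_", "100 2000 300")` is false because the underscore forces a boundary on both sides of 200.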

Links

BGP 4 Case Studies

Understanding Route Aggregation in BGP

BGP Best Path Selection Algorithm

BGP FAQ


Appendix

  • The most common reasons for a prefix not showing up in the routing table are:
    • The router trying to advertise the prefix doesn't have it in its routing table - the prefix doesn't make it to the BGP table on the origin router.
    • The prefix is not synchronized: there must be a match for the prefix in the IP routing table in order for an internal (iBGP) path to be considered a valid path. If the matching route is learned from an OSPF neighbor, its OSPF router ID must match the BGP router ID of the iBGP neighbor (watch for this one on route reflectors).
    • There is no existing route to the BGP next hop.
  • A multihop session will not be established if the only route to the peer is a default route. A default route is never going to be used to establish a BGP session (iBGP/eBGP), and you will see the same (no route) output in the debugs, although you will be able to ping the BGP neighbor.
  • eBGP assumes directly connected peers. If doing eBGP through a FW, NAT device, etc., make sure you use the ebgp-multihop option.
  • If peering through a PIX (with identity NAT) and doing authentication, add the norandomseq keyword to the static so that the PIX stops offsetting the TCP sequence number, e.g. static (inside,outside) 172.16.11.1 172.16.11.1 netmask 255.255.255.0 norandomseq. Authentication will not work through NAT (breaks the MD5 hash).
  • Other gotchas with NAT are: (1) make sure the NAT device has routes to the networks you want to route to and (2) make sure the next hop is consistent with the NAT addresses.
  • With iBGP watch out for logical split horizon, since an iBGP router does not re-send the routes it received from iBGP peers to other iBGP peers. It requires a full mesh or a route-reflector.
  • In "show ip bgp", "*" means the prefix is valid and ">" means it is the best path, so "*>" marks the path that BGP offers to the routing table.
  • To appear as a different AS to a peer, you can use "neighbor ... local-as" or confederations.
  • Use "set origin" in a route-map to manipulate what the origin looks like in the BGP table.
  • If you have to do filtering and the question specifies "use a method that is flexible to change in the future", use a prefix-list - its entries have sequence numbers (although named access-lists now support sequencing too). The ip prefix-list entry 0.0.0.0/0 le 32 permits everything.
  • To use "aggregate-address" to source a summary in your local BGP table, you must have at least one of its subnets in the BGP table. Use the summary-only or suppress-map options to suppress more specific updates.
  • If "aggregate-address summary-only" is used, the sub-routes will show as suppressed on the originating router. If the question specifies that this should not be the case, use "aggregate-address" plus a filtering technique on the outbound.
  • Route filtering can be accomplished using route-map, distribute-list (access-list), filter-list (as-path), and prefix-list (range).
  • You can use an unsuppress-map on the neighbor statement to control the behavior of an aggregate-address. However, unsuppress-map and route-map cannot be used toward the same neighbor; if both are configured, the route-map will be ignored.
  • Do not apply both a neighbor distribute-list and a neighbor prefix-list command to a neighbor in any given direction (inbound or outbound). The two commands are mutually exclusive, so only one of them can be applied in each direction.
  • By default, iBGP redistribution into IGPs (RIP, OSPF & ISIS) is disabled because it can lead to major real-life problems. To enable redistribution of iBGP routes into an IGP, use the "bgp redistribute-internal" command under the BGP routing process of the router where you want to perform the redistribution.

    Neighbor formation

    BGP uses TCP 179. When R1 initiates the connection, it sends to destination port 179 on R2 from a random source port, and R2 replies to that random port. So the initiating router (client) ends up with a random port and the answering router (server) with port 179.

     

    If both initiate at the same time, the router with the higher BGP router-id is the client. Otherwise whoever initiates first is the client.

     

    The key requirement is that the other side sends its packets from the IP address you have configured in the neighbor statement. So if there are two connections between two routers, R1 can send packets from IP1 and receive packets on IP2, while R2 can receive on IP1 and IP2, as long as the bgp neighbors are configured properly and use update-source. Only one end is the server, so technically the match is only needed on that side, since the server decides whether to accept or refuse the connection.

     

    ![](BGP_files/image002.jpg)

    The address or interface a packet is received on doesn’t matter (as long as it exists on the router). The source address is key.

     

    eBGP

    eBGP sends out with TTL of 1. So if more than one hop, use ebgp-multihop

    eBGP does loop prevention by AS filtering. It will not accept path that has its own AS in it. This is done at accepting end, not advertising end. Other side will still advertise the prefix.
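
The receive-side check described above amounts to one membership test. A minimal sketch, with a made-up function name; the `allowed_occurrences` parameter is a hypothetical knob to model the case where a neighbor is told to tolerate its own AS in the path.

```python
def accept_ebgp_update(as_path, local_as, allowed_occurrences=0):
    """Receiver-side loop check: reject paths that already contain our own AS.

    as_path is a list of AS numbers, newest (neighbor) AS first.
    allowed_occurrences is a hypothetical knob: how many times our own AS
    may appear before the update is dropped (0 = default behaviour).
    """
    return as_path.count(local_as) <= allowed_occurrences
```

The sender is not part of this function at all, which matches the note: the other side still advertises the prefix and the drop happens on receipt.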

    eBGP updates next-hop of route advertised. It updates it to be the interface where it is peering. So if physical, then physical. If logical, logical.

     

    iBGP

    Loop prevention based on route suppression. Do not advertise iBGP learned route to another iBGP neighbor. So direct peering needed (or route reflection or confederation)

     

    There is no next hop modification. To do this, you can do one of two things

    • neighbor x.x.x.x next-hop-self
    • route-map & set ip next-hop x.x.x.x

     

    Route reflector – only thing not advertised is non-client route to non-clients.
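
The route reflector rule above fits in a tiny truth function. This is a sketch only: the role names are made up, and it assumes the sending and receiving peers are two different neighbors.

```python
ROLES = ("ebgp", "client", "non-client")


def reflector_advertises(learned_from, send_to):
    """Return True if a route reflector passes a route between two peers.

    Both arguments are one of ROLES. The only blocked combination is a
    route learned from a non-client being sent to another non-client.
    """
    if learned_from not in ROLES or send_to not in ROLES:
        raise ValueError("unknown peer role")
    return not (learned_from == "non-client" and send_to == "non-client")
```

So `reflector_advertises("client", "non-client")` is true, while `reflector_advertises("non-client", "non-client")` is false, which is ordinary iBGP behaviour.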

     

    Confederations

    Private range is last 1024 of range: 64512-65535

    Inter-sub-AS is eBGP, but neighbors within sub-AS must be fully meshed or need RR. So only behavior changed is between sub-AS, not within.

     

    router bgp sub-as

    bgp confederation identifier real-as

    bgp confederation peers other-sub-as

    neighbor x.x.x.x remote-as sub-as

     

    If in confederation, we see sub-AS in brackets () in as-path

    If outside, it is transparent and they don’t see the sub-AS in there.

     

    Sub-ASes do not update the next-hop value, even though they use eBGP peering between the confederation sub-ASes. It is eBGP peering only in the sense that a prefix received from one neighbor is sent on to the next.

     

    Synchronization

    If you have routes in the BGP table but they are not marked as best routes, check two things:

    1)      Do I have reachability to the next-hop

    2)      Synchronization

    This is an issue because in BGP we only advertise our best routes.

     

    There must be a match for prefix in routing table for BGP path to be considered valid path. If matching route learned from OSPF neighbor, OSPF router ID must match BGP router ID

     

    Designed to prevent traffic black holes

     

    Synchronization Solutions:

    • run BGP everywhere – transit path knows everything
    • redistribute BGP into IGP
    • Tunnel BGP over GRE/MPLS – hiding the final destination from transit path.

     

    By default, when you redistribute BGP into an IGP, only eBGP routes are sent. If you want to send iBGP routes too, then you need **bgp redistribute-internal**. This is so that you don’t run into loops.

     

    Path selection

    1. weight – inbound, locally significant
    2. local pref – local to AS
    3. locally originated – network, redistribute and route injection are preferred over aggregate-address
    4. AS-path – unless specifically told to ignore it. AS_SET counts as 1, no matter how many AS inside the set, and AS_CONFED’s are not included
    5. Origin – IGP over EGP over incomplete
    6. MED – if not set, assigned value 0, which is the best MED. Can change so that missing is worst with **bgp bestpath med missing-as-worst**. Also only compared if the first AS in both paths is the same (unless overridden via **bgp always-compare-med**)
    7. eBGP over iBGP
    8. shortest internal path
    9. oldest received
    10. lowest router-id
    11. minimum cluster-list length
    12. lowest neighbor address

     

    Inbound routing policy affects the outbound traffic. Weight and local preference are set inbound and affect how we send traffic out.

     

    Outbound routing policy affects the inbound traffic. For example, AS Path or MED are set outbound but affect how traffic is sent to you.

     

    Cannot apply attributes in the neighbor route-map. That is only for filtering prefix’s. Attributes are applied inbound or outbound anyway

     

    MED

    BGP compares multiple paths in the order they were received. By default, MED is only compared if the first AS in the AS-PATH is the same. However, if paths from the same AS were received at different times, they might not be adjacent in the BGP table, and a path in between might prevent them from being compared with each other.

     

    For example suppose these three path were available (listed in reverse order of reception like BGP would show):

    1)      AS Path: 500, med 150, external, RID: 3.3.3.3

    2)      AS Path: 100, med 200, external, RID: 1.1.1.1

    3)      AS Path: 500, med 100, internal, RID: 2.2.2.2

     

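    When comparing, BGP would compare 1 & 2 first. Since the AS is different, MED is ignored and it ends up deciding based on RID and picking 2. It would then compare 2 & 3, where the AS is again different, and pick 2 since it is an external route. Paths 1 and 3 (both AS 500) are never compared with each other on MED.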

     

    If **bgp always-compare-med** is enabled, then it will always compare the MED, regardless of AS. So in the above example, it would compare 1 & 2 on MED and pick 1 due to the lower MED, then compare 1 & 3 and pick 3 since its MED is lower still.

     

    If **bgp deterministic-med** is enabled, then it would group the prefix based on AS and then start the comparison once it has picked the best path per AS. So they would end up being arranged as:

    1)      AS Path: 500, med 150, external, RID: 3.3.3.3

    2)      AS Path: 500, med 100, internal, RID: 2.2.2.2

    3)      AS Path: 100, med 200, external, RID: 1.1.1.1

    First it would compare all in AS500, picking 2 due to lower MED. Similarly compare all from AS 100 and pick 3 since it is the only one. Then compare 2 & 3 and pick 3 since it is an external route (MED is ignored since no bgp always-compare-med. If that was enabled also then 2 would have been picked).
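
The three-path example can be replayed in a few lines of Python. This is a simplified model, not the full best-path algorithm: it only knows MED, external-over-internal and lowest RID, and assumes every earlier step is a tie. The class and function names are made up.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Path:
    neighbor_as: int
    med: int
    external: bool
    rid: str


def _rid_key(rid):
    """Compare router IDs numerically, octet by octet."""
    return tuple(int(octet) for octet in rid.split("."))


def better(a, b, always_compare_med=False):
    """Return the preferred of two paths (MED, then eBGP, then lowest RID)."""
    med_comparable = always_compare_med or a.neighbor_as == b.neighbor_as
    if med_comparable and a.med != b.med:
        return a if a.med < b.med else b
    if a.external != b.external:
        return a if a.external else b
    return a if _rid_key(a.rid) <= _rid_key(b.rid) else b


def best_path(paths, deterministic_med=False, always_compare_med=False):
    """Walk the list pairwise, newest first, as in the example above."""
    def pick(x, y):
        return better(x, y, always_compare_med)

    if deterministic_med:
        # Group by neighbor AS and keep one winner per group first.
        groups = {}
        for p in paths:
            groups.setdefault(p.neighbor_as, []).append(p)
        paths = [reduce(pick, group) for group in groups.values()]
    return reduce(pick, paths)


# The three paths from the example, in the order BGP would list them.
PATHS = [
    Path(500, 150, True, "3.3.3.3"),
    Path(100, 200, True, "1.1.1.1"),
    Path(500, 100, False, "2.2.2.2"),
]
```

With the defaults, `best_path(PATHS)` returns the AS 100 path (RID 1.1.1.1). With `always_compare_med=True` the first comparison picks 3.3.3.3 (MED 150 against 200) and the second then picks 2.2.2.2 (MED 100). With `deterministic_med=True` alone the AS 100 path wins as external, and with both options set the AS 500 internal path with MED 100 wins.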

     

    Summarization

    Aggregate-address – requires at least one subnet of aggregate installed in BGP table, not necessarily the IP routing table.
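
The precondition can be sketched with Python's `ipaddress` module. The function name is made up, and treating only strictly longer prefixes as components is an assumption of this sketch.

```python
import ipaddress


def aggregate_is_generated(aggregate, bgp_table):
    """Return True if at least one component subnet of the aggregate is present.

    bgp_table is a list of prefixes (strings) standing in for the BGP table.
    Assumption: only strictly more specific prefixes count as components.
    """
    agg = ipaddress.ip_network(aggregate)
    for prefix in bgp_table:
        net = ipaddress.ip_network(prefix)
        if (net.version == agg.version
                and net.prefixlen > agg.prefixlen
                and net.subnet_of(agg)):
            return True
    return False
```

For example, `aggregate_is_generated("10.0.0.0/8", ["10.1.1.0/24"])` is true, and it turns false as soon as the last component prefix leaves the table.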

     

    Can let component subnets through in two ways. Either apply a suppress-map, or use summary-only with an unsuppress-map. In the latter, show ip bgp will show the components with an s to indicate suppression, but if you look at **show ip bgp neighbors nei advertised-routes**, you can see them being advertised. Unsuppress-map is applied in the neighbor command while suppress-map is applied in the aggregate-address command.

     

    By default aggregate-address also wipes out the AS path information. If you want to retain the AS information from the component subnets, use the as-set keyword after aggregate-address

     

    Creating an aggregate with the as-set keyword increases the risk that another AS might drop it because it sees its own AS in the path. So combining summary-only and as-set could break reachability. Advertise-map comes to the rescue, allowing us to filter which ASes we allow into the AS-SET by selecting only those routes (using a route-map that matches an as-path list). The aggregate still encompasses all ASes, but the only ones showing in the AS-SET will be the ones matching our advertise-map criteria, so in effect we get to “define” the AS-SET. Basically, advertise-map means only advertise the aggregate if these specific routes exist, and not just any component route of the aggregate. The downside is therefore that if the routes matching the advertise-map disappear, the aggregate will be removed.

     

    Another way to allow an external AS to receive this aggregate with AS information is neighbor nei allowas-in, configured on the receiving router. This allows receiving the prefix from a neighbor even if our own AS is in the path.

     

    Conditional advertisement

    Used to control the direction of inbound traffic. Gives us more control than using MED/AS-PATH, because with those the other ASes in between can override our preference using weight/local pref and send traffic back to us via another inbound path. With conditional advertisement, there is only one way for the traffic to go, since the prefix is only advertised in one direction. The other direction is only advertised if the primary path fails.

     

    **neighbor ip advertise-map route-map1 non-exist-map route-map2**

     

    Route-map2 matches the prefix that represents the primary link and checks that it is available. If it isn’t, the router advertises the secondary path listed in route-map1.

    Both prefixes being matched need to exist in the BGP table, not routing table.
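
The advertise-map / non-exist-map logic can be sketched as a single decision in Python. The function name is made up, plain prefix lists stand in for the two route-maps, and only the prefixes controlled by the advertise-map are modelled (everything else would be advertised as usual).

```python
import ipaddress


def conditional_prefixes(bgp_table, advertise_map, non_exist_map):
    """Return the advertise-map prefixes that are sent to the neighbor.

    All three arguments are lists of prefix strings. If any prefix from
    non_exist_map is still in the BGP table, the primary path is alive and
    nothing from advertise_map is sent. Otherwise the advertise_map
    prefixes that exist in the BGP table are advertised.
    """
    table = {ipaddress.ip_network(p) for p in bgp_table}
    watched = {ipaddress.ip_network(p) for p in non_exist_map}
    if table & watched:
        return []
    return [p for p in advertise_map if ipaddress.ip_network(p) in table]
```

While the watched prefix is present the function returns an empty list; once it disappears from the table, the backup prefix is returned, provided it is itself in the BGP table.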

     

    ![](BGP_files/image004.jpg)

     

    R3 is advertising the network. It wants R5 to use R1 to get to it, so it prepends the AS path toward R5. However, R1 doesn’t like that and wants to use R5 to get to R3, so it sets a higher weight on all routes coming from R5. At this point, since R5’s best route is via R1, it will not advertise that route back to R1, and so R1 only has one route in its table: direct via R3. So you have to clear the neighborship with R3 until the order of operation is such that the weight kicks in first. Then R1 will use R5 to get to R3, and R5 will go direct.

     

    So if R3 really wanted to control the path, and not let R1 override it, it would conditionally advertise the network only out via R1. And advertise it out R5 when the link between R3 and R1 goes down only.

     

    Conditional Route Injection

    Prefix origination

    • Network
    • Redistribution
    • Conditional route injection

     

    Originate component subnets from a received aggregate subnet for traffic engineering

    bgp inject-map inject exist-map exist

     

    Inject-map: prefix-list to originate (set ip address prefix-list)

    Exist-map: match the aggregate prefix (match ip address prefix-list) and source of aggregate (match ip route-source prefix-list)

    The source has to be defined in a prefix-list (/32 mask obviously)

     

    Outbound route filtering

    Traditional route filtering, upstream provider sends

    • full bgp table
    • default only
    • default plus local
    • no complex view

     

    Inbound filtering on downstream neighbor is useless since full view must still be processed by BGP inbound RIB (there is inbound and outbound RIB)

     

    ORF allows downstream neighbor to control what upstream neighbor advertises

     

    Requires the IPv4 address-family.

    **address-family ipv4 unicast**

    **neighbor nei activate**

     

    Upstream neighbor should be configured to receive ORF

    neighbor downstream capability orf prefix-list receive

     

    Downstream neighbor should be configured to send ORF, and define the prefix-list

    neighbor upstream capability orf prefix-list send

    neighbor upstream prefix-list prefix in

     

    To push the updated filter and refresh the advertisements, you have to use clear ip bgp * in prefix-filter

     

    Communities

    Used to group prefixes

    Not sent to any neighbor by default. Need send-community command.

     

    Match using an ip community-list which is called inside route-map

    Set using set community in route-map

     

    Well known:

    • No-advertise: do not advertise to any neighbor
    • No-export: do not advertise to eBGP peers (will be passed between sub-ASes but won’t leave the main confederation)
    • Local-as: do not advertise out of sub-AS
    • Internet: like no community. Everything by default has it.

     

    The new community format (AA:NN) only changes how communities are displayed in the config and show output. Use ip bgp-community new-format to enable it. The underlying value is still the same.

     

    Misc

    Order of preference – inbound

    1)      route-map

    2)      filter-list

    3)      prefix-list or distribute-list

     

    Order of preference – outbound

    1)      prefix-list or distribute-list

    2)      filter-list

    3)      route-map

     

    If auto-summary is enabled and you redistribute connected routes into BGP, they come in as classful.

     

    If you have **no auto-summary** and a **network** command to advertise a prefix without a mask on it, and you don’t have the exact classful network in the routing table, then it will not be advertised.

     

    Can define dampening per prefix using a route-map. Default values set using bgp dampening do not apply to the route-map defined by bgp dampening route-map and so you have to manually define them using **set dampening** in the route-map.

     

    Local-AS is used during AS migration. Update **router bgp** with the new AS number and use **neighbor nei local-as** with the old AS until the neighbor updates its configuration.

    There is also a no-prepend option to the command, because by default any prefix coming in or going out to the neighbor will have the local-as prepended to it. When receiving, local-as is the most recent. When sending, local-as is again the most recent, and the router bgp AS is second.

     

    External BGP failover is detected based on the line protocol of the interface: when it goes down, the peer is considered down. **no bgp fast-external-fallover** makes the router wait for the holdtime before considering the peer down. Useful if a flaky connection goes up & down a lot and you don’t want the peering relationship to be broken each time.

     

     

     

    Tags: Cisco CCIE study BGP