Why is My Tunnel Bouncing Up and Down Every 30 Seconds?
When your underlay (the tunnel peers) and overlay (the tunneled traffic) both use the same routed protocol (IPv4, for example) and the same route table (VRF), you have a potential issue. Add a dynamic routing protocol, and the conditions are set for a tunnel peer to learn a route from its remote partner that overrides the next hop toward the remote tunnel endpoint itself.
If peer A relies on a default (0/0) route to send tunneled packets across the Internet (or any IP fabric) to peer B, then ANY route learned for B's tunnel endpoint that is installed as the best path on Router A, and that points into the tunnel between A and B itself, creates the recursion.
The GRE packets get routed across the GRE tunnel, instead of across the Internet. The tunnel times out, and then goes down. Once the tunnel goes down the OSPF or BGP session/state across the tunnel times out, and once this happens the dynamic routes learned across the tunnel get removed.
Router A can then suddenly talk GRE to Router B again, across the Internet. The tunnel comes back up, the OSPF adjacency or BGP peering re-establishes 10 to 20 seconds later, the offending route is learned again, and the process repeats.
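The flap condition can be checked with a few lines of Python. This is only a minimal sketch: the addresses (documentation ranges), the `Tunnel0` name, and the helper functions are hypothetical, and the table is a simple list rather than a real RIB.

```python
import ipaddress

def lookup(table, dest):
    """Longest-prefix-match lookup; returns (prefix, next_hop) or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, nh) for net, nh in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)

def tunnel_is_recursive(table, tunnel_dest, tunnel_name):
    """True if the best route to the tunnel endpoint points into the tunnel."""
    best = lookup(table, tunnel_dest)
    return best is not None and best[1] == tunnel_name

# Assumed values: 198.51.100.2 is Router B's tunnel endpoint,
# 203.0.113.1 is Router A's ISP gateway.
rib = [(ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1")]
before = tunnel_is_recursive(rib, "198.51.100.2", "Tunnel0")

# Router B advertises its WAN subnet across the tunnel (for example via OSPF).
rib.append((ipaddress.ip_network("198.51.100.0/24"), "Tunnel0"))
after = tunnel_is_recursive(rib, "198.51.100.2", "Tunnel0")

print(before, after)
```

With only the default route the endpoint resolves via the ISP gateway; once the more specific route through the tunnel is installed, the endpoint resolves through the tunnel it is supposed to carry.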
How to Prevent This?
Use a different routed protocol for the overlay than for the underlay. For example, you could tunnel IPv4 (overlay) inside an IPv6 tunnel (underlay), or carry IPv4 inside MPLS.
Or use two different route tables. In the enterprise world you would typically place your Internet-facing interface in a separate public VRF and point a 0/0 route in that VRF at your public IP gateway. Then, when you build your tunnel, the tunnel endpoints are in the public VRF (underlay), while the overlay (what is tunneled) is in the main/global table.
In the Service Provider (SP) world typically the main table would have the underlay, and client networks might be carried in private VRFs.
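Why separate tables help can be sketched in Python with one table per VRF. The VRF names, addresses, and `lookup` helper below are assumptions for illustration only, not any vendor's implementation.

```python
import ipaddress

def lookup(table, dest):
    """Longest-prefix-match lookup within a single table (VRF)."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, nh) for net, nh in table if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen, default=None)

# One routing table per VRF (hypothetical names and addresses).
vrfs = {
    "underlay": [(ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1")],
    "overlay": [],
}

# Routes learned across the tunnel are installed only in the overlay table.
vrfs["overlay"].append((ipaddress.ip_network("198.51.100.0/24"), "Tunnel0"))
vrfs["overlay"].append((ipaddress.ip_network("10.20.0.0/16"), "Tunnel0"))

# GRE packets toward the peer are resolved in the underlay table only.
tunnel_route = lookup(vrfs["underlay"], "198.51.100.2")
# Tunneled user traffic is resolved in the overlay table.
user_route = lookup(vrfs["overlay"], "10.20.1.5")

print(tunnel_route[1], user_route[1])
```

Even though the overlay learned a route covering the peer's address, the tunnel endpoint lookup never sees it, so the recursion cannot form.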
Note: If both the underlay and overlay are carried in the same table, we can fix the problem with /32 pin routes for the tunnel peers.
Remember, IPv4 routing is based on longest prefix match for the destination prefix.
On both Router A and Router B we create /32 routes for the opposite partner. The next hop of the routes MUST be pointed at the ISP next hop.
On Router A:
route Router_B_ip/32 next-hop via Router_A_isp_gw
On Router B:
route Router_A_ip/32 next-hop via Router_B_isp_gw
The static routes have the most specific prefix possible, /32, so no other route can be a longer match. This PINS or maps the tunnel traffic (GRE packets) to ALWAYS go out the desired ISP connection. If Router A has dual ISPs, you can create 2 tunnels to Router B this way and force tunnel 1 out ISP 1 and tunnel 2 out ISP 2.
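The static routes also have a low preference (administrative distance), 1 by default on Cisco IOS for example, which should beat an equally specific dynamically learned route, such as an OSPF-learned /32 (distance 110 on Cisco IOS) for the same address.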
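The selection logic behind the pin route can be sketched as follows. This is a simplified model, not a router's actual RIB/FIB code: it picks the longest prefix first and breaks ties with the lowest administrative distance. The `Route` class, the addresses, and the Cisco-style distances (1 for static, 110 for OSPF) are assumptions.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    next_hop: str
    distance: int  # administrative distance; lower wins

def best_route(rib, dest):
    """Longest prefix first, then lowest administrative distance."""
    addr = ipaddress.ip_address(dest)
    candidates = [r for r in rib if addr in r.prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.prefix.prefixlen, r.distance))

rib = [
    # Static default route toward the ISP gateway.
    Route(ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1", 1),
    # Hypothetical OSPF-learned /32 for the peer, pointing into the tunnel.
    Route(ipaddress.ip_network("198.51.100.2/32"), "Tunnel0", 110),
]
without_pin = best_route(rib, "198.51.100.2")

# Add the static /32 pin route toward the ISP gateway.
rib.append(Route(ipaddress.ip_network("198.51.100.2/32"), "203.0.113.1", 1))
with_pin = best_route(rib, "198.51.100.2")

print(without_pin.next_hop, with_pin.next_hop)
```

Without the pin route the OSPF /32 beats the default on prefix length; with it, both /32s tie on length and the static wins on distance.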
For enterprise networks this should never happen (we would NEVER place the public ISP WAN links/IPs of a router in OSPF). But in the SP world, if you are running GRE tunnels across an underlay, router loopbacks and links in your internal RIB could conflict with the tunnel endpoint routes. Typically in the SP world any tunnels would be placed inside another VRF just to prevent this conflict. An example might be a co-op backbone hauling the co-op's SCADA/netmon traffic inside tunnels. We would VRF all of the tunnels (or use EVPN) to prevent this.