The protocol standards for VRRP (and its Cisco-specific predecessor, HSRP) say that the virtual router IP address should be assigned a special MAC address, so that when there is a transition between the primary and backup routers, the other devices on the network do not need to learn a new MAC address to route traffic.
This is how the protocol is implemented on a Cisco router, and it works just great. When there is a router transition, your network switch learns that the MAC address has moved to a new port, and none of your computers on the network notice any problems.
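For reference, the special MAC address is derived from the group number rather than from the hardware. A minimal Python sketch, assuming IPv4 VRRP (which uses 00-00-5E-00-01-{VRID}) and HSRP version 1 (which uses 00-00-0C-07-AC-{group}); the function names are made up for illustration:

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Virtual MAC for an IPv4 VRRP group: 00:00:5e:00:01:{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return f"00:00:5e:00:01:{vrid:02x}"


def hsrp_v1_virtual_mac(group: int) -> str:
    """Virtual MAC for an HSRP version 1 group: 00:00:0c:07:ac:{group}."""
    if not 0 <= group <= 255:
        raise ValueError("HSRPv1 group must be in 0..255")
    return f"00:00:0c:07:ac:{group:02x}"


print(vrrp_virtual_mac(10))
print(hsrp_v1_virtual_mac(10))
```

Since the address depends only on the group number, whichever router is master answers on the same MAC, and only the switch has to notice that it moved ports.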
Alas, Linux does not support using more than one MAC address on the same interface, so keepalived, the Linux implementation of VRRP (used by Vyatta), uses a "trick" instead - it issues gratuitous ARPs to all the hosts on the network whenever it transitions to master or wins a router election.
This is nowhere near as reliable as it's supposed to be according to the standard - when using the virtual MAC address, only your network switch needs to do anything for a router transition. With the Linux keepalived, every host on your network needs to update its ARP table for the transition to work.
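To make the "trick" concrete, here is roughly what a gratuitous ARP looks like on the wire. This is only a sketch in Python of the generic packet layout, not keepalived's actual code; the function name, MAC address and IP address are placeholders, and it builds the request-style variant (a reply-style variant also exists):

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing that `ip` is at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth = broadcast + hw + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (unused, zeros) and target IP.
    # Sender IP == target IP is what makes the ARP "gratuitous".
    arp += hw + addr + b"\x00" * 6 + addr
    return eth + arp


frame = build_gratuitous_arp("00:11:22:33:44:55", "192.0.2.1")
print(len(frame), frame.hex())
```

Every host that receives this broadcast is expected to overwrite its cached MAC for that IP, which is exactly the step that has to succeed on each client.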
I'm currently supporting a network that is converting from Cisco to Vyatta - the catch is, during the interim they're using the Cisco as the backup failover if something goes wrong with the Vyatta. But thanks to the non-standard VRRP implementation of Linux/keepalived, there are a few problems:
* when there is a transition from Cisco master to Linux master, some traffic gets lost until all the clients have updated their ARP tables, because once the Cisco has resigned the master role it no longer accepts traffic on the virtual MAC address it used as master. Going the other way is OK as long as the Linux router is still up and routing: out-of-date clients are sending to the MAC address of the Linux router's regular interface on that subnet, so it still receives the traffic and routes it.
* sometimes the ARP tables of the network devices don't get updated fast enough, or don't get updated at all. If the transition was from the Cisco to the Linux router, then those devices are off the air. I'm still tracking this one down - I think the gratuitous ARP messages from the Linux box are either getting lost or ignored. It varies among different OSes - we haven't seen any Linux boxes fail to update their ARP caches, but have had trouble with Solaris and NetBSD.
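The asymmetry between the two directions can be written down as a toy model. This Python sketch only encodes the behaviour described above (it does not talk to any router); the MAC addresses and the `delivered` function are placeholders for illustration:

```python
VIRTUAL_MAC = "00:00:5e:00:01:01"   # placeholder: the VRRP virtual MAC used by the Cisco
LINUX_MAC = "00:11:22:33:44:55"     # placeholder: the Linux router's interface MAC


def delivered(cached_mac: str, master: str, linux_up: bool = True) -> bool:
    """Return True if a frame sent to `cached_mac` reaches a router that forwards it."""
    if cached_mac == VIRTUAL_MAC:
        # The Cisco only accepts the virtual MAC while it is master.
        return master == "cisco"
    if cached_mac == LINUX_MAC:
        # The Linux box owns this MAC regardless of VRRP state.
        return linux_up
    return False


# Stale client after Cisco -> Linux: still sending to the virtual MAC.
print(delivered(VIRTUAL_MAC, master="linux"))   # False
# Stale client after Linux -> Cisco: still sending to the Linux MAC.
print(delivered(LINUX_MAC, master="cisco"))     # True
```

So a client with a stale ARP entry is only cut off in the Cisco-to-Linux direction, or when the Linux router is down altogether.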
Any ideas on how to make the Cisco/Vyatta VRRP failover work more smoothly?