Problem with VRRP-A and Virtual Server IP (VIP)
Hi everyone:
We have a problem with our Thunder 930 cluster (two A10 Thunder 930s in an L3V scenario with VRRP-A and VCS). There are several partitions created on the cluster, and we are seeing strange behaviour when we change the VRRP-A priority in any of these partitions.
When we decrease the VRRP-A priority on an active partition, the MAC address of the floating IPs moves correctly to the standby device, and communication to the server side and to the upstream router works fine. However, the Virtual Server IPs (VIPs) stop working. In other words, everything below the cluster is OK, as well as upstream, but no communication or service can be established through the VIPs.
We see that packets destined to any of the VIPs are routed back to the upstream router by the new active device. The upstream router then sends the packets back to the active device, creating an L3 loop.
However, if instead of decreasing the priority we shut down all interfaces on the active device (or shut down the active device itself), the failover to the standby device occurs and everything works fine (including the VIPs).
Both devices run version 4.1.1-P6 (build 62). The last upgrade was done in 2018, from 2.7.1-GR1 (build 58) to the current version.
Has anyone seen behavior similar to this?
Below is an example of the configuration of one of the problem partitions.
interface ve 1/201
  name SERVER-SIDE
  ip address 10.10.10.2 255.255.255.0
!
interface ve 1/301
  name UPSTREAM
  ip address 20.20.20.2 255.255.255.0
!
interface ve 2/201
  name SERVER-SIDE
  ip address 10.10.10.3 255.255.255.0
!
interface ve 2/301
  name UPSTREAM
  ip address 20.20.20.3 255.255.255.0
!
vlan 1/201
  tagged trunk 10
  router-interface ve 201
  name SERVER-SIDE
!
vlan 1/301
  tagged trunk 10
  router-interface ve 301
  name UPSTREAM
!
vlan 2/201
  tagged trunk 9
  router-interface ve 201
  name SERVER-SIDE
!
vlan 2/301
  tagged trunk 9
  router-interface ve 301
  name UPSTREAM
!
vrrp-a vrid 5
  floating-ip 10.10.10.1
  floating-ip 20.20.20.1
  device-context 1
    blade-parameters
      priority 200
      tracking-options
        vlan 301 timeout 5 priority-cost 100
        vlan 201 timeout 5 priority-cost 100
  device-context 2
    blade-parameters
      priority 150
      tracking-options
        vlan 301 timeout 5 priority-cost 100
        vlan 201 timeout 5 priority-cost 100
!
device-context 1
  ip route 0.0.0.0 /0 20.20.20.20
!
device-context 2
  ip route 0.0.0.0 /0 20.20.20.20
Thanks in advance
Comments
Hello - This is an interesting scenario... One thought that comes to mind: I see you're using VRID 5 for VRRP-A. Do you have vrid 5 assigned to the VIPs that fail to fail over?
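For example, something along these lines (just a rough sketch with placeholder names and addresses, since I don't have your SLB config; the exact syntax may vary a bit by release):
! placeholder virtual-server shown for illustration only
slb virtual-server VIP-EXAMPLE 203.0.113.10
  vrid 5
  port 443 tcp
    service-group SG-EXAMPLE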
Mike
First of all, thanks for your reply, Mike.
About your suggestion, we don't have that command in the virtual-server configuration. When we try to configure it, we receive this error:
We use source-nat in our L3V scenario because servers in that partition need to reach virtual servers (VIPs) in the same partition.
So we tried removing the NAT configuration first, then configuring "vrid xx" on the virtual-server, and then configuring the NAT again, but we received this error:
Using the same image from the last post, this is an example of one of our slb configurations.
slb server server-01 10.10.10.5
  port 443 tcp
slb server server-02 10.10.10.6
  port 443 tcp
slb service-group SG-test_443 tcp
  method least-connection
  extended-stats
  member server-01 443
  member server-02 443
slb virtual-server VIP-test 30.30.30.10
  extended-stats
  port 443 tcp
    ha-conn-mirror
    extended-stats
    access-list 100 source-nat-pool source_nat
    service-group SG-test_443
access-list 100 remark acl_source_nat
access-list 100 permit ip 20.20.20.0 0.0.0.255 any
ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24
What do you think the mistake is?
Thanks in advance...
For the NAT pool, you will also need to set the VRID of the pool to match the VIP:
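Something along these lines, reusing the pool from your config (I'm showing vrid 5 here to match the VRRP-A vrid in your first post; use whatever VRID the VIP is actually assigned):
! the vrid on the pool must match the vrid assigned to the virtual-server
ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 5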
After that, you should be allowed to bind the pool to the vPort and test.
Hope this helps!
@mdunn, thanks for your reply.
Indeed, when we applied the command you suggested, the VIPs "switched" from the active context to the standby, according to the VRRP status.
However, this causes all traffic to the VIPs to be NATed, changing the source IP of every packet. Our intention was for source-NAT to be applied only when a server in a context needs to query a VIP in the same context.
In other words (using the example above), source-NAT should be applied only when a server with IP 10.10.10.x needs to access a VIP 30.30.30.x. When other IPs query the VIP, we don't want source-NAT to be applied.
Is there any way to do both: have the VIPs switch to standby according to VRRP, but apply source-NAT only to a group of source IPs rather than to every packet that reaches the VIP?
Thanks in advance...
Strange, I would expect the "access-list 100 source-nat-pool source_nat" config under the vPort to function the same regardless of the vrid assignment for the VIP. Would you mind sharing the current config of the VIP?
Thanks, @mdunn.
Using the same image from the previous post, this is an example of one of our slb configurations.
When we try to use "ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 4", all traffic to the VIP is NATed. But we need NAT only when a server in a context needs to reach a VIP in the same context. User traffic should not be NATed.
Any clue to what's going on?
Hello,
I tested this in my lab with version 4.1.4-GR1-P2 as well as 4.1.1-P12. In both scenarios, if the source IP missed the ACL, the traffic was not NAT'd. When I added the subnet of the source IP to the ACL, the traffic was NAT'd. Perhaps this is a bug with 4.1.1-P6, or there is something missing with your config. Here's what I tested:
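Roughly the following (a sketch reusing the names and addressing from your posts rather than my actual lab values; the key points are that the ACL matches only the sources you want NATed, here the 10.10.10.0/24 server subnet, and that the VIP and the pool carry the same vrid):
! ACL limits source-NAT to the server subnet only
access-list 100 remark acl_source_nat
access-list 100 permit ip 10.10.10.0 0.0.0.255 any
!
! pool vrid matches the vrid assigned to the virtual-server
ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 5
!
slb virtual-server VIP-test 30.30.30.10
  vrid 5
  port 443 tcp
    access-list 100 source-nat-pool source_nat
    service-group SG-test_443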
Another thought that comes to mind is maybe you have some different settings within "slb common" which are interfering.