Problem with VRRP-A and Virtual Server IP (VIP)

Hi everyone:


We have a problem with our Thunder 930 cluster (two A10 Thunder 930s in an L3V scenario with VRRP-A and VCS). There are several partitions created on the cluster, and we are seeing strange behaviour when we change the VRRP-A priority in any of these partitions.


When we decrease the VRRP-A priority on an active partition, the MAC address of the floating IPs moves correctly to the standby device, and communication to the server side and to the upstream router works fine. But the Virtual Server IPs (VIPs) stop working. This means that everything below the cluster is OK, as well as upstream, but no communication or service can be established through the VIPs.


We see that packets destined for any of the VIPs are routed back to the upstream router by the new active device. The upstream router then sends those packets back to the active device, creating an L3 loop.


However, if instead of decreasing the priority we shut down all interfaces on the active device (or shut down the currently active device itself), then failover to the standby device occurs and everything works fine, including the VIPs.
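For reference, the priority change that triggers the problem is simply lowering the value under the vrid's blade-parameters, for example (the value 100 is only an illustration; anything below the peer's priority has the same effect):

vrrp-a vrid 5
 device-context 1
  blade-parameters
   priority 100 <-- illustrative value, lower than the peer's 150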


Both devices run version 4.1.1-P6 (build 62). The last upgrade was done in 2018, from 2.7.1-GR1 (build 58) to the currently running version.


Has anyone seen behaviour similar to this?


Below is an example configuration from one of the problem partitions.



interface ve 1/201
 name SERVER-SIDE
 ip address 10.10.10.2 255.255.255.0
!
interface ve 1/301
 name UPSTREAM
 ip address 20.20.20.2 255.255.255.0
!
interface ve 2/201
 name SERVER-SIDE
 ip address 10.10.10.3 255.255.255.0
!
interface ve 2/301
 name UPSTREAM
 ip address 20.20.20.3 255.255.255.0
!
vlan 1/201
 tagged trunk 10
 router-interface ve 201
 name SERVER-SIDE
!
vlan 1/301
 tagged trunk 10
 router-interface ve 301
 name UPSTREAM
!
vlan 2/201
 tagged trunk 9
 router-interface ve 201
 name SERVER-SIDE
!
vlan 2/301
 tagged trunk 9
 router-interface ve 301
 name UPSTREAM
!
vrrp-a vrid 5
 floating-ip 10.10.10.1
 floating-ip 20.20.20.1
 device-context 1
  blade-parameters
   priority 200
   tracking-options
    vlan 301 timeout 5 priority-cost 100
    vlan 201 timeout 5 priority-cost 100
 device-context 2
  blade-parameters
   priority 150
   tracking-options
    vlan 301 timeout 5 priority-cost 100
    vlan 201 timeout 5 priority-cost 100
!
device-context 1
 ip route 0.0.0.0 /0 20.20.20.20
!
device-context 2
 ip route 0.0.0.0 /0 20.20.20.20


Thanks in advance

Comments

  • mdunn Member ✭✭

    Hello - this is an interesting scenario... One thought that comes to mind: I see you're using VRID 5 for VRRP-A. Do you have vrid 5 assigned to the VIPs that fail to fail over?

    slb virtual-server test 192.168.1.20
      vrid 5
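
    To confirm which device currently owns the vrid and how the virtual servers are bound, the standard show commands should be enough for a quick check:

    show vrrp-a
    show slb virtual-server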
    

    Mike

  • First of all, thanks for your reply, Mike.


    About your suggestion: we don't have that command in the virtual-server configuration. When we try to configure it, we receive this error:


    "You are trying to change the vrid of virtual server with nat pool on it, please delete nat pool first !"
    


    We use source-nat in our L3V scenario because servers in that partition need to reach virtual servers (VIPs) in the same partition.


    So we tried removing the NAT configuration first, then configuring "vrid xx" on the virtual-server, and then re-applying the NAT configuration, but we receive this error:

    "Invalid HA ID specified."
    

    Using the same setup as in my original post, this is an example of one of our SLB configurations.


    slb server server-01 10.10.10.5
     port 443 tcp

    slb server server-02 10.10.10.6
     port 443 tcp

    slb service-group SG-test_443 tcp
     method least-connection
     extended-stats
     member server-01 443
     member server-02 443

    slb virtual-server VIP-test 30.30.30.10
     extended-stats
     port 443 tcp
      ha-conn-mirror
      extended-stats
      access-list 100 source-nat-pool source_nat
      service-group SG-test_443

    access-list 100 remark acl_source_nat
    access-list 100 permit ip 20.20.20.0 0.0.0.255 any

    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24


    Where do you think the mistake is?

    Thanks in advance...

  • mdunn Member ✭✭

    When a NAT pool is involved, you will also need to set the VRID of the pool to match the VIP:

    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 4
    

    After that, you should be allowed to bind the pool to the vPort and test.
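    With the names from your config, the end state would look something like this (a sketch; I'm using vrid 5 from your partition's VRRP-A config rather than the generic vrid 4 above):

    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 5
    slb virtual-server VIP-test 30.30.30.10
     vrid 5 <-- matches the partition's vrrp-a vrid
     port 443 tcp
      access-list 100 source-nat-pool source_nat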

    Hope this helps!

  • @mdunn, thanks for your reply.


    Indeed, when we applied the command you suggested, the VIPs "switched" from the active context to the standby one, according to the VRRP status.


    However, this causes all traffic to the VIPs to be NATed, changing the source IP of every packet. Our intention was for source-NAT to be applied only when a server in a context queries a VIP of the same context.


    In other words (using the example above), source-NAT should be applied only when a server with an IP in 10.10.10.x needs to access a VIP in 30.30.30.x. When other IPs query the VIP, we don't want source-NAT to apply.
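
    In config terms, the intent would presumably be expressed by an ACL that matches only the server subnet, something like this sketch:

    access-list 100 permit ip 10.10.10.0 0.0.0.255 any <-- server subnet instead of the 20.20.20.0/24 entry we have bound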


    Is there any way to do both? That is, have the VIPs fail over to standby according to VRRP-A, while source-NAT is applied only to a specific group of source IPs and not to all packets that reach the VIP.


    Thanks in advance...

  • mdunn Member ✭✭

    Strange, I would expect the "access-list 100 source-nat-pool source_nat" config under the vPort to function the same regardless of the vrid assignment for the VIP. Would you mind sharing the current config of the VIP?

  • Thanks, @mdunn.

    Using the same setup as in the previous post, this is the current state of one of our SLB configurations.


    slb server server-01 10.10.10.5 
     port 443 tcp
    
    slb server server-02 10.10.10.6 
     port 443 tcp
    
    
    slb service-group SG-test_443 tcp 
     method least-connection 
     extended-stats 
     member server-01 443 
     member server-02 443
    
    slb virtual-server VIP-test 30.30.30.10
     extended-stats
     port 443 tcp
      ha-conn-mirror
      extended-stats
      access-list 100 source-nat-pool source_nat
      service-group SG-test_443
    
    access-list 100 remark acl_source_nat
    access-list 100 permit ip 20.20.20.0 0.0.0.255 any
    
    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24
    


    When we use "ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 4", all traffic to the VIP is NATed. But we need NAT only when a server in a context needs to reach a VIP in the same context. User traffic should not be NATed.


    Any clue as to what's going on?

  • mdunn Member ✭✭

    Hello,

    I tested this in my lab with version 4.1.4-GR1-P2 as well as 4.1.1-P12. In both scenarios, if the source IP missed the ACL, the traffic was not NAT'd. When I added the subnet of the source IP to the ACL, the traffic was NAT'd. Perhaps this is a bug in 4.1.1-P6, or there is something missing in your config. Here's what I tested:

    vrrp-a common
      device-id 1
      set-id 1
      disable-default-vrid
      enable
    !
    vrrp-a vrid 4
    !
    ip nat pool test 192.168.1.2 192.168.1.2 netmask /32 vrid 4
    !
    access-list 100 permit ip 20.20.20.0 0.0.0.255 any
    !
    access-list 100 permit ip 10.0.0.0 0.255.255.255 any <-- ACL I added / removed
    !
    slb virtual-server act_l4_slb_vip 10.13.16.195
      vrid 4
      port 80 tcp
        access-list 100 source-nat-pool test
        service-group sg_act_l4_slb_vip_80_tcp
    
    

    Another thought that comes to mind is maybe you have some different settings within "slb common" which are interfering.
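
    One quick way to verify whether a given flow is being NATed is to check the session table while a test connection is open (standard ACOS show commands; the forward and reverse session entries reveal whether the source address was translated to the pool address):

    show session
    show slb virtual-server VIP-test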

  • huzhiqi Member

    Maybe some issue.😋
