Problem with VRRP-A and Virtual Server IP (VIP)

Hi everyone:


We have a problem with our Thunder 930 cluster (two A10 Thunder 930s in an L3V scenario with VRRP-A and VCS). There are several partitions created on the cluster, and we are seeing strange behaviour when we change the VRRP-A priority in any of these partitions.


When we decrease the VRRP-A priority on an active partition, the MAC address of the floating IPs moves correctly to the standby device, and communication to the server side and to the upstream router works fine. But the Virtual Server IPs (VIPs) stop working. This means that everything below the cluster is OK, as well as upstream, but no communication or service can be established through the VIPs.


We see that packets destined for any of the VIPs are routed back to the upstream router by the new active device. The upstream router then sends those packets back to the active device, creating an L3 loop.


However, if instead of decreasing the priority we shut down all interfaces on the active device (or shut down the currently active device itself), then failover to the standby device occurs and everything works fine, including the VIPs.
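For reference, the priority change that triggers the problem is simply lowering the value under the vrid's blade-parameters, for example (the value 100 is only an illustration; anything below the peer's priority has the same effect):

vrrp-a vrid 5
 device-context 1
  blade-parameters
   priority 100 <-- illustrative value, lower than the peer's 150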


Both devices run version 4.1.1-P6 (build 62). The last upgrade was done in 2018, from 2.7.1-GR1 (build 58) to the currently running version.


Has anyone seen behaviour similar to this?


Below is an example configuration from one of the problem partitions.



interface ve 1/201
 name SERVER-SIDE
 ip address 10.10.10.2 255.255.255.0
!
interface ve 1/301
 name UPSTREAM
 ip address 20.20.20.2 255.255.255.0
!
interface ve 2/201
 name SERVER-SIDE
 ip address 10.10.10.3 255.255.255.0
!
interface ve 2/301
 name UPSTREAM
 ip address 20.20.20.3 255.255.255.0
!
vlan 1/201
 tagged trunk 10
 router-interface ve 201
 name SERVER-SIDE
!
vlan 1/301
 tagged trunk 10
 router-interface ve 301
 name UPSTREAM
!
vlan 2/201
 tagged trunk 9
 router-interface ve 201
 name SERVER-SIDE
!
vlan 2/301
 tagged trunk 9
 router-interface ve 301
 name UPSTREAM
!
vrrp-a vrid 5
 floating-ip 10.10.10.1
 floating-ip 20.20.20.1
 device-context 1
  blade-parameters
   priority 200
   tracking-options
    vlan 301 timeout 5 priority-cost 100
    vlan 201 timeout 5 priority-cost 100
 device-context 2
  blade-parameters
   priority 150
   tracking-options
    vlan 301 timeout 5 priority-cost 100
    vlan 201 timeout 5 priority-cost 100
!
device-context 1
 ip route 0.0.0.0 /0 20.20.20.20
!
device-context 2
 ip route 0.0.0.0 /0 20.20.20.20


Thanks in advance

Comments

  • mdunn Member ✭✭

    Hello - this is an interesting scenario... One thought that comes to mind: I see you're using VRID 5 for VRRP-A. Do you have vrid 5 assigned to the VIPs that fail to fail over?

    slb virtual-server test 192.168.1.20
      vrid 5
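
    To confirm which device currently owns the vrid and how the virtual servers are bound, the standard show commands should be enough for a quick check:

    show vrrp-a
    show slb virtual-server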
    

    Mike

  • First of all, thanks for your reply, Mike.


    About your suggestion: we don't have that command in the virtual-server configuration. When we try to configure it, we receive this error:


    "You are trying to change the vrid of virtual server with nat pool on it, please delete nat pool first !"
    


    We use source-nat in our L3V scenario because servers in that partition need to reach virtual servers (VIPs) in the same partition.


    So we tried removing the NAT configuration first, then configuring "vrid xx" on the virtual-server, and then re-applying the NAT configuration, but we receive this error:

    "Invalid HA ID specified."
    

    Using the same setup as in my original post, this is an example of one of our SLB configurations.


    slb server server-01 10.10.10.5
     port 443 tcp

    slb server server-02 10.10.10.6
     port 443 tcp

    slb service-group SG-test_443 tcp
     method least-connection
     extended-stats
     member server-01 443
     member server-02 443

    slb virtual-server VIP-test 30.30.30.10
     extended-stats
     port 443 tcp
      ha-conn-mirror
      extended-stats
      access-list 100 source-nat-pool source_nat
      service-group SG-test_443

    access-list 100 remark acl_source_nat
    access-list 100 permit ip 20.20.20.0 0.0.0.255 any

    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24


    Where do you think the mistake is?

    Thanks in advance...

  • mdunn Member ✭✭

    When a NAT pool is involved, you will also need to set the VRID of the pool to match the VIP:

    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 4
    

    After that, you should be allowed to bind the pool to the vPort and test.
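    With the names from your config, the end state would look something like this (a sketch; I'm using vrid 5 from your partition's VRRP-A config rather than the generic vrid 4 above):

    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 5
    slb virtual-server VIP-test 30.30.30.10
     vrid 5 <-- matches the partition's vrrp-a vrid
     port 443 tcp
      access-list 100 source-nat-pool source_nat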

    Hope this helps!

  • @mdunn, thanks for your reply.


    Indeed, when we applied the command you suggested, the VIPs "switched" from the active context to the standby one, according to the VRRP status.


    However, this causes all traffic to the VIPs to be NATed, changing the source IP of every packet. Our intention was for source-NAT to be applied only when a server in a context queries a VIP of the same context.


    In other words (using the example above), source-NAT should be applied only when a server with an IP in 10.10.10.x needs to access a VIP in 30.30.30.x. When other IPs query the VIP, we don't want source-NAT to apply.
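
    In config terms, the intent would presumably be expressed by an ACL that matches only the server subnet, something like this sketch:

    access-list 100 permit ip 10.10.10.0 0.0.0.255 any <-- server subnet instead of the 20.20.20.0/24 entry we have bound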


    Is there any way to do both? That is, have the VIPs fail over to standby according to VRRP-A, while source-NAT is applied only to a specific group of source IPs and not to all packets that reach the VIP.


    Thanks in advance...

  • mdunn Member ✭✭

    Strange, I would expect the "access-list 100 source-nat-pool source_nat" config under the vPort to function the same regardless of the vrid assignment for the VIP. Would you mind sharing the current config of the VIP?

  • Thanks, @mdunn.

    Using the same setup as in the previous post, this is the current state of one of our SLB configurations.


    slb server server-01 10.10.10.5 
     port 443 tcp
    
    slb server server-02 10.10.10.6 
     port 443 tcp
    
    
    slb service-group SG-test_443 tcp 
     method least-connection 
     extended-stats 
     member server-01 443 
     member server-02 443
    
    slb virtual-server VIP-test 30.30.30.10
     extended-stats
     port 443 tcp
      ha-conn-mirror
      extended-stats
      access-list 100 source-nat-pool source_nat
      service-group SG-test_443
    
    access-list 100 remark acl_source_nat
    access-list 100 permit ip 20.20.20.0 0.0.0.255 any
    
    ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24
    


    When we use "ip nat pool source_nat 30.30.30.254 30.30.30.254 netmask /24 vrid 4", all traffic to the VIP is NATed. But we need NAT only when a server in a context needs to reach a VIP in the same context. User traffic should not be NATed.


    Any clue as to what's going on?

  • mdunn Member ✭✭

    Hello,

    I tested this in my lab with version 4.1.4-GR1-P2 as well as 4.1.1-P12. In both scenarios, if the source IP missed the ACL, the traffic was not NAT'd. When I added the subnet of the source IP to the ACL, the traffic was NAT'd. Perhaps this is a bug in 4.1.1-P6, or there is something missing in your config. Here's what I tested:

    vrrp-a common
      device-id 1
      set-id 1
      disable-default-vrid
      enable
    !
    vrrp-a vrid 4
    !
    ip nat pool test 192.168.1.2 192.168.1.2 netmask /32 vrid 4
    !
    access-list 100 permit ip 20.20.20.0 0.0.0.255 any
    !
    access-list 100 permit ip 10.0.0.0 0.255.255.255 any <-- ACL I added / removed
    !
    slb virtual-server act_l4_slb_vip 10.13.16.195
      vrid 4
      port 80 tcp
        access-list 100 source-nat-pool test
        service-group sg_act_l4_slb_vip_80_tcp
    
    

    Another thought that comes to mind is maybe you have some different settings within "slb common" which are interfering.
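
    One quick way to verify whether a given flow is being NATed is to check the session table while a test connection is open (standard ACOS show commands; the forward and reverse session entries reveal whether the source address was translated to the pool address):

    show session
    show slb virtual-server VIP-test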

  • huzhiqi Member

    Maybe some issue.😋
