Kea DHCP HA failover for "sync-timeout": 6000 doesn't occur #7458

Open
2 tasks done
tom-citizencard opened this issue May 15, 2024 · 3 comments


@tom-citizencard

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

OPNsense version: OPNsense 24.1.6-amd64

We have a straightforward setup with CARP configured for WAN and LAN, which is working fine. We also set up Kea DHCP with HA. When the master/primary is switched off, failover to the backup only occurs after 5-6 unacked clients; it never occurs merely because "sync-timeout": 60000 has elapsed. This was tested a few times.

To Reproduce

Steps to reproduce the behavior:

  1. Enable Kea DHCP and set up HA.
  2. Turn off the "primary" Kea DHCP server. The "standby" server should take over after 60 seconds (60000 milliseconds), but this doesn't happen in our tests. The failover happens only after 5-6 unacked clients.

kea-ctrl-agent.conf on both servers:

{
    "Control-agent": {
        "http-host": "127.0.0.1",
        "http-port": 8000,
        "control-sockets": {
            "dhcp4": {
                "socket-type": "unix",
                "socket-name": "/var/run/kea4-ctrl-socket"
            },
            "dhcp6": {
                "socket-type": "unix",
                "socket-name": "/var/run/kea6-ctrl-socket"
            },
            "d2": {
                "socket-type": "unix",
                "socket-name": "/var/run/kea-ddns-ctrl-socket"
            }
        },
        "loggers": [
            {
                "name": "kea-ctrl-agent",
                "output_options": [
                    {
                        "output": "syslog"
                    }
                ],
                "severity": "INFO",
                "debuglevel": 0
            }
        ]
    }
}
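
For anyone reproducing this, it may help to ask each server which HA state it believes it is in, through the control agent configured above. The following is a minimal Python sketch, not a tested tool. It assumes the agent is reachable at http://127.0.0.1:8000/ as in the config above and that the installed Kea version accepts the `status-get` command for the `dhcp4` service. `build_command` and `send_command` are illustrative helper names, not part of Kea.

```python
import json
import urllib.request


def build_command(command, service="dhcp4"):
    """Build the JSON body the control agent forwards to a Kea daemon."""
    return {"command": command, "service": [service]}


def send_command(url, command, service="dhcp4", timeout=5.0):
    """POST a command to the control agent and return the decoded reply.

    The URL is an assumption taken from the http-host/http-port values
    in kea-ctrl-agent.conf; adjust it if your agent listens elsewhere.
    """
    body = json.dumps(build_command(command, service)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_command("http://127.0.0.1:8000/", "status-get")` on the standby while the primary is powered off should return a reply that includes the HA state of the local server and what it knows about its partner. The exact layout of that reply depends on the Kea version, so the sketch returns it undecoded beyond JSON parsing.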

kea-dhcp4.conf on "primary" / master:

{
    "Dhcp4": {
        "valid-lifetime": 1800,
        "interfaces-config": {
            "interfaces": ["em0"]
        },
        "lease-database": {
            "type": "memfile",
            "persist": true
        },
        "control-socket": {
            "socket-type": "unix",
            "socket-name": "/var/run/kea4-ctrl-socket"
        },
        "loggers": [
            {
                "name": "kea-dhcp4",
                "output_options": [
                    {
                        "output": "syslog"
                    }
                ],
                "severity": "INFO"
            }
        ],
        "subnet4": [
            {
                "id": 1,
                "subnet": "192.168.222.0/24",
                "option-data": [
                    {
                        "name": "domain-name-servers",
                        "data": "192.168.222.1"
                    },
                    {
                        "name": "routers",
                        "data": "192.168.222.1"
                    },
                    {
                        "name": "ntp-servers",
                        "data": "192.168.222.1"
                    },
                    {
                        "name": "domain-name",
                        "data": "citi.intranet"
                    }
                ],
                "pools": [
                    { "pool": "192.168.222.20 - 192.168.222.245" }
                ],
                "reservations": [
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.2",
                        "hostname": "OPNsense1.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.3",
                        "hostname": "OPNsense2.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.6",
                        "hostname": "srvr-2.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.5",
                        "hostname": "srvr-1.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.7",
                        "hostname": "srvr-3.citi.intranet"
                    }
                ]
            }
        ],
        "hooks-libraries": [
            {
                "library": "/usr/local/lib/kea/hooks/libdhcp_lease_cmds.so",
                "parameters": { }
            },
            {
                "library": "/usr/local/lib/kea/hooks/libdhcp_ha.so",
                "parameters": {
                    "high-availability": [ {
                        "this-server-name": "OPNsense1",
                        "mode": "hot-standby",
                        "heartbeat-delay": 10000,
                        "max-response-delay": 60000,
                        "max-ack-delay": 5000,
                        "max-unacked-clients": 5,
                        "sync-timeout": 60000,
                        "peers": [
                            {
                                "name": "OPNsense1",
                                "role": "primary",
                                "url": "http://192.168.222.2:8001/"
                            },
                            {
                                "name": "OPNsense2",
                                "role": "standby",
                                "url": "http://192.168.222.3:8001/"
                            }
                        ]
                    } ]
                }
            }
        ]
    }
}

kea-dhcp4.conf on "standby" / backup:

{
    "Dhcp4": {
        "valid-lifetime": 1800,
        "interfaces-config": {
            "interfaces": ["em0"]
        },
        "lease-database": {
            "type": "memfile",
            "persist": true
        },
        "control-socket": {
            "socket-type": "unix",
            "socket-name": "/var/run/kea4-ctrl-socket"
        },
        "loggers": [
            {
                "name": "kea-dhcp4",
                "output_options": [
                    {
                        "output": "syslog"
                    }
                ],
                "severity": "INFO"
            }
        ],
        "subnet4": [
            {
                "id": 1,
                "subnet": "192.168.222.0/24",
                "option-data": [
                    {
                        "name": "domain-name-servers",
                        "data": "192.168.222.1"
                    },
                    {
                        "name": "routers",
                        "data": "192.168.222.1"
                    },
                    {
                        "name": "ntp-servers",
                        "data": "192.168.222.1"
                    },
                    {
                        "name": "domain-name",
                        "data": "citi.intranet"
                    }
                ],
                "pools": [
                    { "pool": "192.168.222.20 - 192.168.222.245" }
                ],
                "reservations": [
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.2",
                        "hostname": "OPNsense1.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.3",
                        "hostname": "OPNsense2.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.6",
                        "hostname": "srvr-2.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.5",
                        "hostname": "srvr-1.citi.intranet"
                    },
                    {
                        "hw-address": "[mac]",
                        "ip-address": "192.168.222.7",
                        "hostname": "srvr-3.citi.intranet"
                    }
                ]
            }
        ],
        "hooks-libraries": [
            {
                "library": "/usr/local/lib/kea/hooks/libdhcp_lease_cmds.so",
                "parameters": { }
            },
            {
                "library": "/usr/local/lib/kea/hooks/libdhcp_ha.so",
                "parameters": {
                    "high-availability": [ {
                        "this-server-name": "OPNsense2",
                        "mode": "hot-standby",
                        "heartbeat-delay": 10000,
                        "max-response-delay": 60000,
                        "max-ack-delay": 5000,
                        "max-unacked-clients": 5,
                        "sync-timeout": 60000,
                        "peers": [
                            {
                                "name": "OPNsense1",
                                "role": "primary",
                                "url": "http://192.168.222.2:8001/"
                            },
                            {
                                "name": "OPNsense2",
                                "role": "standby",
                                "url": "http://192.168.222.3:8001/"
                            }
                        ]
                    } ]
                }
            }
        ]
    }
}
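
Since the two files are meant to be identical apart from "this-server-name", a quick mechanical comparison can rule out a mismatch between the peers. This is a minimal Python sketch under stated assumptions: both files parse as plain JSON (no comments), and only the first HA relationship is inspected. `ha_settings` and `ha_differences` are hypothetical helper names introduced for illustration.

```python
import json


def ha_settings(config):
    """Return the first "high-availability" entry of a parsed kea-dhcp4.conf, or None."""
    for library in config.get("Dhcp4", {}).get("hooks-libraries", []):
        relationships = library.get("parameters", {}).get("high-availability")
        if relationships:
            return relationships[0]
    return None


def ha_differences(primary, standby, ignore=("this-server-name",)):
    """Map each differing HA key to a (primary value, standby value) pair."""
    a = ha_settings(primary) or {}
    b = ha_settings(standby) or {}
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if key not in ignore and a.get(key) != b.get(key)
    }
```

Each argument is the result of `json.load` on one server's kea-dhcp4.conf. An empty dictionary means the HA parameters agree on both machines; for the two configs shown above that is the expected result.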

Expected behavior

If the "primary" Kea DHCP server is unavailable, the "standby" DHCP server should fail over after 60000 milliseconds (the default "sync-timeout": 60000), take over, and start serving leases.
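
As far as the Kea HA documentation describes it, "sync-timeout" bounds how long a lease database synchronization may take, while the transition to partner-down is governed by "max-response-delay", "max-ack-delay" and "max-unacked-clients". The sketch below is a simplified Python model of that decision, written to illustrate why the observed behaviour depends on client traffic. It is not Kea's actual code; `HaTimers` and `partner_considered_down` are hypothetical names, and the defaults are the values from the configs above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaTimers:
    max_response_delay_ms: int = 60000
    max_ack_delay_ms: int = 5000
    max_unacked_clients: int = 5


def partner_considered_down(cfg, ms_since_last_contact, client_secs):
    """Simplified model of the partner-down decision.

    client_secs holds, per distinct client, the DHCPv4 "secs" value of a
    query that was addressed to the silent partner.
    """
    # Step 1: communication counts as interrupted only after max-response-delay.
    if ms_since_last_contact <= cfg.max_response_delay_ms:
        return False
    # Step 2: with max-unacked-clients set to 0 no client evidence is required.
    if cfg.max_unacked_clients == 0:
        return True
    # Step 3: otherwise count clients that have waited longer than max-ack-delay.
    unacked = sum(1 for secs in client_secs if secs * 1000 > cfg.max_ack_delay_ms)
    return unacked > cfg.max_unacked_clients
```

In this model, `partner_considered_down(HaTimers(), 600000, [])` is false: ten minutes of silence with no client traffic never triggers failover, which matches the report. Six clients that have each been retrying for more than five seconds make it true, and so does setting `max_unacked_clients=0` once 60 seconds have passed.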

Relevant log files

log.txt

Environment

OPNsense version: OPNsense 24.1.6-amd64

@AdSchellevis
Member

not sure if this is new, but looking at https://kea.readthedocs.io/en/latest/arm/hooks.html#hot-standby-configuration "auto-failover": true might be missing. should be rather easy to test locally.

@tom-citizencard
Author

> not sure if this is new, but looking at https://kea.readthedocs.io/en/latest/arm/hooks.html#hot-standby-configuration "auto-failover": true might be missing. should be rather easy to test locally.

I just tried this and updated the "peers" section of the config on both servers to:

"peers": [
    {
        "name": "OPNsense1",
        "role": "primary",
        "url": "http://192.168.222.2:8001/",
        "auto-failover": true
    },
    {
        "name": "OPNsense2",
        "role": "standby",
        "url": "http://192.168.222.3:8001/",
        "auto-failover": true
    }
]

The service was then restarted on both machines (the configs were checked afterwards to ensure the OPNsense UI hadn't overwritten the changes) and the "primary" machine was switched off. Unfortunately this doesn't solve the problem: failover still doesn't occur automatically on the "standby", even after waiting 10 minutes. It does occur if I restart the Kea service on the "standby" machine or if 5-6 clients are unacked.

I did some research online, and some people suggest using "max-unacked-clients": 0, but this doesn't seem like a good solution to me: you risk the "standby" taking over when the "primary" isn't truly unavailable, which might result in duplicate leases.

@AdSchellevis
Member

Kea seems to be challenging, to say the least, unfortunately. If you have an idea of options to add or change, just ping me.

There's not much we can do at this stage, I'm afraid (the last feature we tried to add didn't appear to work either, for "reasons"). Kea's feature set looks large at first glance, but the functional part appears to be much smaller.
