Harvester 1.6.0-RC5: Bug With Static IP Assignment In OverlayNetwork
Introduction
Hey guys! Today, we're diving into a rather critical bug found in Harvester 1.6.0-RC5 that messes with static IP assignments when using OverlayNetwork. This issue can be a real headache, especially when you're trying to set up your VMs with specific IP configurations. Let's break down the problem, how to reproduce it, what the expected behavior should be, and some potential workarounds. We'll keep it casual and friendly, just like chatting with a tech buddy.
Describe the Bug
The core issue revolves around how Kubeovn handles IP assignments in a Harvester environment. When you create a VM and attach it to a subnet, Kubeovn steps in and assigns an IP address at the logical switch level. If DHCP is enabled within your subnet, this IP is then passed into a DHCP response to the guest OS. So far, so good, right? Well, here's where things get tricky.
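By the way, if you want to see what Kubeovn thinks it handed out, its CRDs are the quickest place to look. Here's a rough sketch, assuming kubectl access to the cluster and a subnet named test like the one used later in this post (the exact object names on your cluster may differ):
$ kubectl get subnets.kubeovn.io test -o wide    # the subnet's CIDR, gateway, and how many IPs are in use
$ kubectl get ips.kubeovn.io | grep staticip     # one IP object per attached interface, holding the Kubeovn-assigned address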
The Static IP Configuration Problem
The real problem pops up when you try to use static network configurations within your guest OS cloud-init config. There are two main issues at play:
- Incorrect IP Reflection in Harvester UI: The Harvester UI doesn't accurately reflect the static IP you've assigned. It stubbornly sticks to the Kubeovn-assigned IP. Now, this might seem like just a cosmetic issue, but it can lead to confusion and make troubleshooting harder.
- Connectivity Issues with Static IPs: VMs configured with static IPs can't reach certain IP addresses. They can ping and communicate with Harvester nodes within the 10.10.0.0/24 range, but they hit a brick wall when trying to reach the Harvester VIP (10.10.0.10) or the gateway IP (10.10.0.1). This is a major roadblock, as it effectively isolates these VMs from critical network services and external connectivity.
Diving into the Details
Let's look at a specific example. Imagine you're using a cloud-init config like this to set up a static IP:
version: 2
ethernets:
  enp1s0:
    dhcp4: false
    addresses:
      - 10.20.0.100/24
    gateway4: 10.20.0.1
    nameservers:
      addresses:
        - 1.1.1.1
        - 8.8.8.8
You'd expect your VM to come up with the IP 10.20.0.100 and be able to access the internet, right? But here's the kicker: while the VM correctly configures its network interface within the OS, it can't reach external IPs like 8.8.8.8 or the WAN gateway (10.10.0.1) that Harvester's nodes use to communicate with the outside world. It's like being in a room with the door locked.
On the flip side, VMs that use DHCP (especially with a forced default route and DNS) can reach these endpoints without any issues. This discrepancy points to a routing or configuration problem specifically affecting static IP setups.
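Incidentally, Kube-OVN ships a kubectl plugin with trace and tcpdump helpers that can help narrow down where the static VM's packets are dying. A rough sketch, assuming the kc ko alias used later in this post is available, and substituting your own virt-launcher pod name (the one below is a placeholder):
$ kubectl get pods -n default | grep staticip-test                         # find the VM's virt-launcher pod
$ kc ko trace default/virt-launcher-staticip-test-xxxxx 10.10.0.10 icmp    # simulate an ICMP packet to the Harvester VIP
$ kc ko tcpdump default/virt-launcher-staticip-test-xxxxx icmp             # or capture live ICMP on the VM's OVS port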
Here’s an example of a VM using DHCP with a fixed default route and DNS:
$ ssh ubuntu@10.20.0.7
ubuntu@10.20.0.7's password:
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-71-generic x86_64)
...
System information as of Thu Aug 7 14:44:14 UTC 2025
System load: 0.0 Processes: 140
Usage of /: 5.8% of 37.70GB Users logged in: 0
Memory usage: 4% IPv4 address for enp1s0: 10.20.0.7
Swap usage: 0%
...
Last login: Tue Aug 5 05:38:02 2025 from 10.10.0.50
ubuntu@test2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=111 time=23.2 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=111 time=22.3 ms
...
ubuntu@test2:~$ ip route
default via 10.20.0.1 dev enp1s0 proto dhcp src 10.20.0.7 metric 100
10.10.0.10 via 10.20.0.1 dev enp1s0 proto dhcp src 10.20.0.7 metric 100
10.20.0.0/24 dev enp1s0 proto kernel scope link src 10.20.0.7 metric 100
10.20.0.1 dev enp1s0 proto dhcp scope link src 10.20.0.7 metric 100
Now, let’s compare that to a static IP VM configured with 10.20.0.100 (but Kubeovn assigns it 10.20.0.2):
ubuntu@staticip-test:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
17 packets transmitted, 0 received, 100% packet loss, time 16405ms
ubuntu@staticip-test:~$ ip route
default via 10.20.0.1 dev enp1s0 proto static
10.20.0.0/24 dev enp1s0 proto kernel scope link src 10.20.0.100
ubuntu@staticip-test:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
link/ether 96:80:01:b3:99:43 brd ff:ff:ff:ff:ff:ff
inet 10.20.0.100/24 brd 10.20.0.255 scope global enp1s0
valid_lft forever preferred_lft forever
inet6 fe80::9480:1ff:feb3:9943/64 scope link
valid_lft forever preferred_lft forever
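One quick check worth doing on the static-IP VM before going further: does the gateway's MAC address ever get learned? If the neighbour entry for 10.20.0.1 sits in FAILED or INCOMPLETE, that points at an ARP problem rather than a routing one. These are just the probes, run from inside the guest (arping may need installing first, e.g. from the iputils-arping package):
ubuntu@staticip-test:~$ ip neigh show 10.20.0.1
ubuntu@staticip-test:~$ arping -I enp1s0 -c 3 10.20.0.1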
The Plot Thickens: Routing Issues
The mystery deepens when you move the static IP VM to a different host node. While it can still reach everything on the 10.20.0.x subnet and even the Harvester node IPs, it remains stubbornly unable to communicate with the Harvester VIP (10.10.0.10), the Harvester node gateway (10.10.0.1), or external IPs. It’s like the VM is in a network bubble.
Here’s the output from a VM on a different host node:
$ ip route
default via 10.20.0.1 dev enp1s0 proto static
10.20.0.0/24 dev enp1s0 proto kernel scope link src 10.20.0.100
$ ping 10.10.0.90
PING 10.10.0.90 (10.10.0.90) 56(84) bytes of data.
64 bytes from 10.10.0.90: icmp_seq=1 ttl=63 time=1.47 ms
64 bytes from 10.10.0.90: icmp_seq=2 ttl=63 time=0.601 ms
^C
--- 10.10.0.90 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.601/1.034/1.467/0.433 ms
$ ping 10.10.0.1
PING 10.10.0.1 (10.10.0.1) 56(84) bytes of data.
^C
--- 10.10.0.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2032ms
$ ping 10.10.0.10
PING 10.10.0.10 (10.10.0.10) 56(84) bytes of data.
^C
--- 10.10.0.10 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3087ms
It’s not yet clear whether this is a simple routing issue within the guest OS or something more fundamental. However, the fact that the UI doesn’t display the correct static IP (instead showing the Kubeovn-assigned IP) suggests there might be a deeper problem.
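One way to dig into the UI side is to look at the VirtualMachineInstance status, which the qemu-guest-agent populates when it's running. A quick check, assuming the VM is named staticip-test and lives in the default namespace:
$ kubectl get vmi staticip-test -n default \
    -o jsonpath='{range .status.interfaces[*]}{.name}{"\t"}{.ipAddress}{"\n"}{end}'
If this already reports 10.20.0.100 while the UI keeps showing 10.20.0.2, the stale value is a Harvester UI issue; if it shows 10.20.0.2 here too, the problem sits lower in the stack.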
Kubeovn's Perspective
When we inspect the logical switch using nbctl, we can see that the IP hasn't been updated to reflect the static assignment. This hints that the ARP (Address Resolution Protocol) responses from the VM might be getting ignored. Here's a snippet from nbctl show test:
$ kc ko nbctl show test
switch 27faa623-068a-4733-8052-da9173089d15 (test)
port test2.default.uplink.default.ovn
addresses: ["6e:38:1a:0a:d0:c1 10.20.0.7"]
port staticip-test.default.uplink.default.ovn
addresses: ["96:80:01:b3:99:43 10.20.0.2"]
port test-ovn-cluster
type: router
router-port: ovn-cluster-test
Notice how staticip-test is still listed with the Kubeovn-assigned IP (10.20.0.2), even though the guest OS is configured with 10.20.0.100.
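If you're comfortable poking at OVN directly, you can rewrite that port's address tuple by hand and see whether connectivity changes. To be clear, this is purely an experiment to help pin down the root cause, not a supported fix, and Kubeovn may well reconcile the value straight back:
$ kc ko nbctl lsp-set-addresses staticip-test.default.uplink.default.ovn "96:80:01:b3:99:43 10.20.0.100"
$ kc ko nbctl lsp-get-addresses staticip-test.default.uplink.default.ovn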
To Reproduce
Want to see this bug in action yourself? Here’s how you can reproduce it:
- Set up the Environment: You'll need a Harvester cluster running version 1.6.0-RC5, with a subnet configured that has DHCP enabled (and note that DHCP itself needs to be working correctly, which as of RC5 may require fixes of its own).
- Create Two VMs: Spin up two VMs within the same subnet.
- DHCP VM: Let this VM obtain its IP address via DHCP. This will serve as your baseline for comparison.
- Static IP VM: Configure this VM with a static IP. You can use the cloud-init config we discussed earlier:
version: 2
ethernets:
  enp1s0:
    dhcp4: false
    addresses:
      - 10.20.0.100/24
    gateway4: 10.20.0.1
    nameservers:
      addresses:
        - 1.1.1.1
        - 8.8.8.8
- Verify Connectivity (a condensed version of these checks is sketched right after this list):
- On the DHCP VM, check if it can reach external endpoints (like 8.8.8.8) and internal resources.
- Try to ping the static IP VM from the DHCP VM using its static IP assignment (10.20.0.100 in our example), not the Kubeovn-assigned IP.
- On the static IP VM, attempt to ping external endpoints (like 8.8.8.8). You should see that it fails.
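For reference, here's the whole verification pass compressed into a handful of commands (the addresses are the ones from this post's example and will differ in your lab):
# From the DHCP VM:
ping -c 3 8.8.8.8          # should succeed
ping -c 3 10.20.0.100      # the static VM, addressed by its static IP

# From the static-IP VM:
ping -c 3 10.10.0.90       # an individual Harvester node: works
ping -c 3 10.10.0.10       # the Harvester VIP: fails on 1.6.0-RC5
ping -c 3 10.10.0.1        # the gateway: fails
ping -c 3 8.8.8.8          # external endpoints: fail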
Expected Behavior
So, what should happen when you assign a static IP? Here’s the ideal scenario:
- Accurate UI Representation: The Harvester UI should display the correct IP address as reported by the qemu-guest-agent. If the agent isn't running or can't report the IP, it should fall back to the Kubeovn-assigned IP.
- Full Connectivity: Both VMs, regardless of whether they use DHCP or static IPs, should be able to reach all expected endpoints, including the public internet (assuming there's a route to it).
- (Optional) Kubeovn IP Table Update: Ideally, Kubeovn should update its internal IP table to accurately reflect the static IP assignment on the VM. This would ensure consistency and might help prevent routing issues.
Support Bundle for Troubleshooting
In this case, a support bundle isn't directly applicable, as the issue seems to be related to network configuration and routing.
Environment
- Harvester version: 1.6.0-rc5
- Underlying Infrastructure: NUCs
- Rancher version: N/A
Workaround and Mitigation
Unfortunately, as of now, there's no straightforward workaround for this issue. The best course of action is to avoid using static IP assignments in Harvester 1.6.0-RC5 when using OverlayNetwork. Relying on DHCP (with proper route and DNS configuration) might be a temporary solution until this bug is addressed.
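If you do fall back to DHCP, the network config is about as simple as it gets. Here's a minimal sketch of the network data you'd hand to the VM, in the same netplan v2 style as the static example above (the interface name and DNS servers are assumptions, so adjust them for your environment):
version: 2
ethernets:
  enp1s0:
    dhcp4: true
    nameservers:
      addresses:
        - 1.1.1.1
        - 8.8.8.8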
Conclusion
The static IP assignment bug in Harvester 1.6.0-RC5 is a significant issue that can impact network connectivity and management. By understanding the problem, how to reproduce it, and the expected behavior, we can better troubleshoot and work around it until a fix is available. Keep an eye out for official announcements or patches from the Harvester team, and let's hope this gets resolved soon!