Software Defined Networking using VXLAN
Thomas Richter, IBM Research and Development, Linux Technology Center
LinuxCon Edinburgh, 21-Oct-2013
© 2009 IBM Corporation
IBM Presentation Template Full Version
Agenda
■ VXLAN
– IETF Draft
– VXLAN Features in Linux Kernel 3.8 (DOVE Extension)
■ Principle of Operation
– VM Creation, Migration, Removal
■ Advanced Usage
– Multicast, Broadcast, VM Detection
■ Management Tools
■ Related and Future Work
Software Defined Networking using VXLAN, Thomas Richter ([email protected]), LinuxCon 2013
Virtualization in Data Center
[Figure: Host A and Host B each run several VMs (with applications) attached to a virtual bridge and a NIC; the NICs connect through a switch to the intranet/Internet.]
Data centers host multiple customers
Customers require
– Own network (logical)
– Individual address space (IP and MAC)
– No interconnection with other customers
→ Overlay Network
• Logical network on top of existing network infrastructure
Targets
– Central management and control
– Reliability
– Cover long distance between data centers
– Define optional policies (compression, encryption, ...)
VXLAN (IETF Draft)
Virtual eXtensible Local Area Network
– Encapsulates data packets
– Connection between end points (VTEP)
– VTEP connection via existing IP infrastructure
Provides
– 24 bit network identifier (VNI → defines VXLAN segment)
– VM to VM communication only within the same VXLAN segment
– VMs can use the same MAC/IP addresses in different VXLAN segments
– VM unaware of encapsulation
This Talk
– Explains recent extensions and typical traffic flow scenarios
– Mapping of VM addresses to VTEP
– Management of VTEP
VXLAN Details
[Figure: Host A (vxlan0 20.0.0.A, eth0 192.168.100.A) and Host B (vxlan0 20.0.0.B, eth0 192.168.100.B) are connected through a switch. Eth0 packet (outer): Ethernet + IP header, UDP, VXLAN header, VXLAN payload. The VXLAN payload is the vxlan0 packet (inner): Ethernet + IP header, ICMP.]
VXLAN device:
– Network device with IP, MAC address and VNI
# ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
# ip link set vxlan0 address 54:8:20:0:0:{A,B}
# ip address add 20.0.0.{A,B}/8 dev vxlan0
# ip link set up vxlan0
– Creates and connects to UDP socket endpoint (port 4789 assigned by IANA, port 8472 used by the kernel (vxlan))
– Joins multicast group
– Encapsulates all traffic with VXLAN header
– Uses UDP to forward traffic via eth0
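The encapsulation step can be illustrated with a minimal sketch, assuming the 8-byte header layout of the VXLAN draft (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits). The function names are illustrative only and are not taken from the kernel driver.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value
IANA_VXLAN_PORT = 4789       # IANA-assigned UDP port named on the slide


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit into 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(packet: bytes) -> int:
    """Return the VNI of a VXLAN packet after checking the valid-VNI flag."""
    word0, word1 = struct.unpack("!II", packet[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """UDP payload sent between VTEPs: VXLAN header followed by the inner frame."""
    return pack_vxlan_header(vni) + inner_frame
```

The receiving VTEP would call something like unpack_vni() to verify the VNI before stripping the first 8 bytes and handing the inner frame to the vxlan device.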
VXLAN Details (2)
[Figure: Host A (vxlan0 20.0.0.A, eth0 192.168.100.A) and Host B (vxlan0 20.0.0.B, eth0 192.168.100.B) are connected through a switch to the intranet/Internet.
ARP REQ, inner frame: ff:ff:ff:ff:ff:ff, 54:1:20:0:0:A, "Who has 20.0.0.B Tell 20.0.0.A". Outer packet: source IP, source MAC, multicast destination, UDP, VNI, ARP REQ.
ARP REPLY, inner frame: 54:1:20:0:0:B, 54:1:20:0:0:A, "20.0.0.B is at 54:1:20:0:0:B". Outer packet: unicast destination (1), UDP, VNI, ARP REPLY.
1) With learning enabled (multicast otherwise)]
Host-A: # ping 20.0.0.B
● Find vxlan0 interface and send out ARP request
● Vxlan driver adds vxlan header (VNI)
● No known destination MAC, use multicast address
● Eth0 sends packet
Host-B receives packet and forwards to udp port
● Vxlan driver verifies VNI and strips off vxlan header
● ARP request packet received and ARP reply packet generated
● Vxlan driver adds vxlan header (VNI)
● No known destination MAC, use multicast address (unicast to Host A if learning is enabled)
● Eth0 sends packet
Host-B: Vxlan device driver maintains a forwarding database (fdb)
Command bridge fdb show dev vxlan0
54:1:20:0:0:A dev vxlan0 dst 192.168.100.A self
0:0:0:0:0:0 dev vxlan0 dst 239.1.1.1 via eth0 self permanent → catch all
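The lookup behaviour described on this slide can be sketched as a toy model: a learned entry yields a unicast VTEP address, anything else falls back to the all-zero catch-all entry. This is a simplified illustration, not the kernel data structure; the class name and the concrete addresses are placeholders chosen for the example.

```python
ALL_ZERO_MAC = "00:00:00:00:00:00"  # catch-all entry, as in the fdb listing


class VxlanFdb:
    """Toy VXLAN forwarding database: inner MAC -> outer VTEP IP."""

    def __init__(self, default_dst, learning=True):
        self.learning = learning
        self.entries = {ALL_ZERO_MAC: default_dst}

    def receive(self, inner_src_mac, outer_src_ip):
        """Snoop an incoming packet and remember where its sender lives."""
        if self.learning:
            self.entries[inner_src_mac] = outer_src_ip

    def lookup(self, inner_dst_mac):
        """Return the outer destination IP for an inner destination MAC."""
        return self.entries.get(inner_dst_mac, self.entries[ALL_ZERO_MAC])


# Host B's view of the ARP exchange (placeholder addresses).
fdb_b = VxlanFdb(default_dst="239.1.1.1")
first_dst = fdb_b.lookup("54:01:20:00:00:0a")         # nothing learned: multicast group
fdb_b.receive("54:01:20:00:00:0a", "192.168.100.10")  # ARP request from Host A arrives
learned_dst = fdb_b.lookup("54:01:20:00:00:0a")       # now unicast to Host A's VTEP
```

With learning=False the receive() call is a no-op, so every lookup keeps returning the multicast group unless entries are added by other means.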
New VXLAN Features for Overlay Networks
Drawbacks:
– Missing control plane (no central control of VTEPs and table management)
– Depends on multicast routing support availability (wide area, routing table size)
– Mapping VNI to multicast address
VXLAN features released into Linux Kernel 3.8 (DOVE extensions)
– L3MISS: Destination VM IP address not in Neighbor table
• Trigger netlink message to user space
• Expect netlink reply to add dst VM IP address into Neighbor table
– L2MISS: MAC address not in VXLAN FDB
• Do not broadcast to any VTEP (multicast)
• Trigger netlink message to user space
• Expect netlink reply to add MAC address into VXLAN FDB
– NOLEARNING: Disable snooping of incoming packets
• No entry of MAC and destination VTEP address to VXLAN FDB
– Optimization (for virtual bridges)
• PROXY: Reply on Neighbor request when mapping found in VXLAN FDB
• RSC: If dst MAC refers to router, replace with VM dst MAC address
– saves 1st hop
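The L3MISS/L2MISS interaction can be sketched as follows. This is a toy model under stated assumptions: the two callbacks stand in for the netlink notification and the reply from user space, and the directory dictionaries are hypothetical data that an agent might hold. It does not use the real netlink API.

```python
class OverlayTables:
    """Toy neighbor table and VXLAN FDB with miss notifications."""

    def __init__(self, on_l3miss, on_l2miss):
        self.neighbors = {}          # VM IP  -> VM MAC
        self.fdb = {}                # VM MAC -> remote VTEP IP
        self.on_l3miss = on_l3miss   # stands in for the L3MISS netlink message
        self.on_l2miss = on_l2miss   # stands in for the L2MISS netlink message

    def resolve(self, dst_ip):
        """Return (VM MAC, VTEP IP), or None if user space has no answer."""
        if dst_ip not in self.neighbors:
            mac = self.on_l3miss(dst_ip)
            if mac is None:
                return None
            self.neighbors[dst_ip] = mac
        mac = self.neighbors[dst_ip]
        if mac not in self.fdb:
            vtep = self.on_l2miss(mac)   # no multicast fallback
            if vtep is None:
                return None
            self.fdb[mac] = vtep
        return mac, self.fdb[mac]


# Hypothetical directory held by a user-space agent.
directory_ip = {"20.0.0.2": "54:0b:20:00:00:0b"}
directory_mac = {"54:0b:20:00:00:0b": "192.168.100.2"}
tables = OverlayTables(directory_ip.get, directory_mac.get)
```

A call such as tables.resolve("20.0.0.2") fills both tables on first use; an unknown address returns None instead of triggering a broadcast.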
VXLAN Forwarding Database (FDB)
[Figure: a unicast MAC entry points to one remote VTEP (IP, VNI, port); a multicast MAC entry points to a chain of remote VTEPs (IP, VNI, port each). Label: DOVE Extensions.]
Maps destination VM MAC to VTEP IP
– Hashed, key is MAC address
– Size limitation possible
Contains destination
– IP Address
– VNI & port number
– Others: timestamps, flags
– Aging
Multiple destinations possible
– For multicast/all zero MAC address
– Transmit to several VTEP
– One copy per destination
Use iproute2 tool to create/delete FDB entries
– Command: bridge fdb add/del/append/replace …
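The multiple-destination behaviour can be sketched with a small model in which each MAC maps to a list of (VTEP IP, VNI, port) tuples. The method names mirror the add/del/append/replace verbs from the slide, but the exact semantics of the real bridge command may differ; this is an assumption-laden illustration.

```python
class MultiDestFdb:
    """Toy FDB: MAC -> list of (VTEP IP, VNI, UDP port) destinations."""

    def __init__(self):
        self.table = {}

    def add(self, mac, dst):
        if mac in self.table:
            raise KeyError(f"{mac} already present, use append or replace")
        self.table[mac] = [dst]

    def append(self, mac, dst):
        dsts = self.table.setdefault(mac, [])
        if dst not in dsts:
            dsts.append(dst)

    def replace(self, mac, dst):
        self.table[mac] = [dst]

    def delete(self, mac, dst):
        self.table[mac].remove(dst)
        if not self.table[mac]:
            del self.table[mac]

    def transmit(self, mac, frame):
        """Return one (destination, frame) copy per destination of mac."""
        return [(dst, frame) for dst in self.table.get(mac, [])]
```

Appending two VTEPs to the all-zero MAC and calling transmit() yields two copies of the frame, which corresponds to "one copy per destination".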
VM Creation
[Figure: Host A (NIC 192.168.100.A) runs VMA (20.0.0.A, 54:A:20:0:0:A); Host B (NIC 192.168.100.B) runs VMB (20.0.0.B, 54:B:20:0:0:B). Each VM is attached to a virtual bridge with a VXLAN device; the hosts are connected through a switch.]
Create virtual bridge with VXLAN device per VNI
# ip link add vxlan0 type vxlan id 1 l2miss l3miss rsc proxy nolearning
Neighbor & FDB Host A:
ARP: 20.0.0.B → 54:B:20:0:0:B (L3MISS netlink message)
FDB: 54:B:20:0:0:B → 192.168.100.B (L2MISS netlink message)
Neighbor & FDB Host B:
ARP: 20.0.0.A → 54:A:20:0:0:A (L3MISS netlink message)
FDB: 54:A:20:0:0:A → 192.168.100.A (L2MISS netlink message)
Traffic flow between VM A ↔ VM B
Can travel across internet
VM Migration
[Figure: VMA (20.0.0.A, 54:A:20:0:0:A) moves from Host A (NIC 192.168.100.A) to Host C (NIC 192.168.100.C). Host B (NIC 192.168.100.B) runs VMB (20.0.0.B, 54:B:20:0:0:B) on a virtual bridge with a VXLAN device. All hosts are connected through a switch.]
Create Virtual Bridge with VXLAN device per VNI
Host A: Delete Entries, Host C: Add Entries
● ARP: 20.0.0.B → 54:B:20:0:0:B
● FDB: 54:B:20:0:0:B → 192.168.100.B
Host B: Modify Entries
● ARP: 20.0.0.A → 54:A:20:0:0:A
● FDB: 54:A:20:0:0:A → 192.168.100.C
● Modify on all hosts part of the 20.x.x.x overlay network
Traffic flow between VM A ↔ VM B
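The table changes during a migration can be sketched as one function over per-host tables. This is a simplified model: it assumes the old host runs no other VM of this overlay (so its entries move entirely to the new host), and the host addresses and MAC strings are placeholders.

```python
MAC_A = "54:0a:20:00:00:0a"
MAC_B = "54:0b:20:00:00:0b"


def migrate_vm(tables, vm_mac, old_vtep, new_vtep):
    """Update per-host ARP/FDB tables after vm_mac moved to new_vtep.

    tables maps a VTEP IP to {"arp": {ip: mac}, "fdb": {mac: vtep_ip}}.
    """
    # Old host: delete entries. New host: add the same entries.
    tables[new_vtep] = tables.pop(old_vtep)
    # All other hosts: repoint the migrated MAC to the new VTEP.
    # The ARP entries stay as they are because the VM keeps IP and MAC.
    for vtep, host in tables.items():
        if vtep != new_vtep and vm_mac in host["fdb"]:
            host["fdb"][vm_mac] = new_vtep
    return tables


tables = {
    "192.168.100.1": {"arp": {"20.0.0.2": MAC_B}, "fdb": {MAC_B: "192.168.100.2"}},  # Host A
    "192.168.100.2": {"arp": {"20.0.0.1": MAC_A}, "fdb": {MAC_A: "192.168.100.1"}},  # Host B
}
migrate_vm(tables, MAC_A, "192.168.100.1", "192.168.100.3")  # VMA moves to Host C
```

After the call, Host B reaches VMA through 192.168.100.3 while its ARP entry is untouched, which is the location-independent addressing the talk describes.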
VM Removal
[Figure: Host C (NIC 192.168.100.C) runs VMA (20.0.0.A, 54:A:20:0:0:A); Host B (NIC 192.168.100.B) runs VMB (20.0.0.B, 54:B:20:0:0:B) on a virtual bridge with a VXLAN device. The hosts are connected through a switch.]
Delete Virtual Bridge with VXLAN device per VNI
Host C: Delete Entries
● ARP: 20.0.0.B → 54:B:20:0:0:B
● FDB: 54:B:20:0:0:B → 192.168.100.B
Host B: Delete Entries
● ARP: 20.0.0.A → 54:A:20:0:0:A
● FDB: 54:A:20:0:0:A → 192.168.100.C
● Modify on all hosts part of the 20.x.x.x overlay network
VM Broadcast/Multicast
[Figure: three hosts connected through a switch, each with a virtual bridge and VXLAN device.
Host A (NIC 192.168.100.A), VMA 20.0.0.A, 54:A:20:0:0:A
ARP: 20.0.0.B → 54:B:20:0:0:B, 20.0.0.C → 54:C:20:0:0:C
FDB: 54:B:20:0:0:B → 192.168.100.B, 54:C:20:0:0:C → 192.168.100.C
Host B (NIC 192.168.100.B), VMB 20.0.0.B, 54:B:20:0:0:B
ARP: 20.0.0.A → 54:A:20:0:0:A, 20.0.0.C → 54:C:20:0:0:C
FDB: 54:A:20:0:0:A → 192.168.100.A, 54:C:20:0:0:C → 192.168.100.C
Host C (NIC 192.168.100.C), VMC 20.0.0.C, 54:C:20:0:0:C
ARP: 20.0.0.A → 54:A:20:0:0:A, 20.0.0.B → 54:B:20:0:0:B
FDB: 54:A:20:0:0:A → 192.168.100.A, 54:B:20:0:0:B → 192.168.100.B
Broadcast FDB entries shown: FF:FF:.......:FF → 192.168.100.A, FF:FF:.......:FF → 192.168.100.C]
VM1: #ping -b 20.255.255.255
– Destination MAC ff:ff:ff:ff:ff:ff
– One entry per VTEP
Traffic flow between VM A ↔ VM B and VM C
Multiple Customer Setup
[Figure: Host A (NIC 192.168.100.A) runs VMX and VM1; Host B (NIC 192.168.100.B) runs VMY, VMZ and VM2. Each host has virtual bridges with VXLAN devices for VNI 3 and VNI 4, connected through a switch. Host B fans out incoming traffic based on the VNI in the VXLAN header. VM addresses shown: 20.0.0.A / 54:A:20:0:0:A (twice) on Host A; 20.0.0.B / 54:B:20:0:0:B (twice) and 21.0.0.B / 54:B:21:0:0:B on Host B.
Host A, VNI 3:
ARP: 20.0.0.B → 54:B:20:0:0:B
FDB: 54:B:20:0:0:B → 192.168.100.B
ARP: 21.0.0.B → 54:B:21:0:0:B
FDB: 54:B:21:0:0:B → 192.168.100.B VNI 4
Host A, VNI 4:
ARP: 20.0.0.B → 54:B:20:0:0:B
FDB: 54:B:20:0:0:B → 192.168.100.B
Host B, VNI 3:
ARP: 20.0.0.A → 54:A:20:0:0:A
FDB: 54:A:20:0:0:A → 192.168.100.A
Host B, VNI 4:
ARP: 20.0.0.A → 54:A:20:0:0:A
FDB: 54:A:20:0:0:A → 192.168.100.A]
Create Virtual Bridges with different VXLAN devices and VNIs
Traffic flow between VM1 ↔ VM2 and VMX ↔ VMY
– Isolation of logical networks (default configuration)
Cross logical network traffic possible (domain)
– Need configuration
– Add target VNI in VXLAN FDB
54:B:21:0:0:B → 192.168.100.B VNI 4
Multiple nets via IP routing VM X ↔ VM Z
External Connections
[Figure: Host A runs a gateway VM (GW) with two NICs, one towards the Internet and one (via macvtap) on 192.168.100.A, next to VMX. Host B (NIC 192.168.100.B) runs VMY, VMZ and VM2 on a virtual bridge with a VXLAN device. The hosts are connected through a switch.]
External Connections
– Legacy VM to Overlay Network VM
– Access to External Network
Create VM with access to both networks
– Configure as gateway
Traffic flow between
– VMX ↔ GW ↔ VM2/Internet
Control Plane: Neighbor and FDB Table Management
[Figure: Host A (NIC 192.168.100.A) runs VMA and an Agent; Host B (NIC 192.168.100.B) runs VMB on a virtual bridge with a VXLAN device and an Agent. Both Agents talk to the Agent Manager. The numbers 1 to 5 in the figure mark the steps listed below.
Host A: ARP: 20.0.0.B → 54:B:20:0:0:B, FDB: 54:B:20:0:0:B → 192.168.100.B
Host B: ARP: 20.0.0.A → 54:A:20:0:0:A, FDB: 54:A:20:0:0:A → 192.168.100.A]
Agent runs on each host
– Manipulates Neighbor and FDB entries
– Gets VM IP and MAC address
• DHCP Snooping/Gratuitous ARP
• IGMP Snooping
– Data Exchange with AM
– Agent registers for libvirtd migration events
Agent Manager
– Define logical networks
– Connects to all agents
– Multiple instances for reliability
– Defines Policy (ACL, firewall, encryption, gateways, …)
– Domains (One mngt for multiple VNI networks)
Steps
1) VM boot detected by Agent
2) Agent forwards IP/MAC to AM
3) Check policy and permissions
4) Notifies Agents
5) Agents add entries
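The five steps can be sketched as two cooperating classes. This is an in-process toy model: the class and method names are hypothetical, the "policy" is reduced to a set of permitted VNIs, and real agents would talk to the manager over a secured network connection and program the kernel tables instead of dictionaries.

```python
class AgentManager:
    """Toy Agent Manager: checks policy and notifies all other agents."""

    def __init__(self, allowed_vnis):
        self.allowed_vnis = set(allowed_vnis)
        self.agents = []

    def register(self, agent):
        self.agents.append(agent)

    def vm_detected(self, origin, vni, ip, mac):
        if vni not in self.allowed_vnis:          # step 3: policy and permissions
            return False
        for agent in self.agents:                 # step 4: notify agents
            if agent is not origin:
                agent.add_entries(vni, ip, mac, origin.vtep_ip)
        return True


class Agent:
    """Toy per-host agent holding neighbor and FDB entries."""

    def __init__(self, vtep_ip, manager):
        self.vtep_ip = vtep_ip
        self.manager = manager
        self.neighbors = {}   # (VNI, VM IP)  -> VM MAC
        self.fdb = {}         # (VNI, VM MAC) -> VTEP IP
        manager.register(self)

    def vm_booted(self, vni, ip, mac):
        # steps 1 and 2: detect the VM and forward IP/MAC to the manager
        return self.manager.vm_detected(self, vni, ip, mac)

    def add_entries(self, vni, ip, mac, vtep_ip):
        # step 5: add neighbor and FDB entries
        self.neighbors[(vni, ip)] = mac
        self.fdb[(vni, mac)] = vtep_ip


manager = AgentManager(allowed_vnis=[1])
agent_a = Agent("192.168.100.1", manager)
agent_b = Agent("192.168.100.2", manager)
accepted = agent_a.vm_booted(1, "20.0.0.1", "54:0a:20:00:00:0a")
```

After the call, agent_b holds entries that map the new VM to Host A's VTEP address, while agent_a itself needs no remote entry for its local VM.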
Remarks
Details
– Prevent fragmentation en route, set DF bit on VTEP
– UDP traffic between VTEP
Security
– Secure communication between Agent and Agent Manager
– Agent Manager data base protection
– Middle boxes (firewall, virus scanner) must be VXLAN aware
IPv6 support under work
– Multicast support missing
Iptables, ebtables, tc
– Available on host side
Alternatives (VLAN, IEEE 802.1Qbg):
– Need hardware configuration on devices
– Export VM MAC addresses to physical network (table size, STP)
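Because the DF bit is set on the outer packet, oversized packets are dropped rather than fragmented, so the MTU inside the overlay has to leave room for the encapsulation. A minimal calculation, assuming an IPv4 underlay without IP options and an untagged inner Ethernet frame:

```python
INNER_ETHERNET_HEADER = 14  # inner MAC header carried as payload
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IPV4_HEADER = 20      # assumes no IP options

VXLAN_OVERHEAD = (INNER_ETHERNET_HEADER + VXLAN_HEADER
                  + UDP_HEADER + OUTER_IPV4_HEADER)


def inner_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits into one unfragmented outer packet."""
    return underlay_mtu - VXLAN_OVERHEAD


def required_underlay_mtu(vm_mtu: int) -> int:
    """Underlay MTU needed so that VMs can keep using vm_mtu."""
    return vm_mtu + VXLAN_OVERHEAD
```

With a standard 1500-byte underlay this gives an inner MTU of 1450 bytes; keeping 1500 bytes inside the VMs would need an underlay MTU of 1550.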
Summary
Location independent addressing
– VM assigned addresses retained while moved in overlay network
Logical network scaling
– Independent of underlying physical network and protocols
– Use existing IP network infrastructure
– No VM addresses in external switches → table size, STP
– No VLAN limitation
– No multicast dependency
Address space isolation
– Different tenants can use same addresses
Related and Future Work
Related Work
– Overlay transport:
• Similar concept (encapsulation, inner and outer headers)
• NVGRE:
– RFC 2784 and RFC 2890
– GRE protocol (0x6558) over IP
• STT:
– Designed for NIC with TSO, LRO
– STT protocol (similar to TCP) over IP
Future Work
– Integration into Open Stack (See Reference Nr. 4) and Open vSwitch
Questions?
Send to: [email protected]
References
1) M. Mahalingam, D. Dutt et al, VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks (Version 5), 8-May-2013, http://datatracker.ietf.org, Note: This is work in progress
2) IBM: IBM SDN VE White Paper, Jun-2013, http://www-03.ibm.com/systems/networking/solutions/sdn.html
3) Rami Cohen, et al: An intent-based approach for network virtualization, IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 29-31-May-2013, pp 42-50
4) Rami Cohen, et al: Distributed Overlay Virtual Ethernet (DOVE) integration with Openstack, IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 29-31-May-2013, pp 1088-1089
5) Vivek Kashyap, Network Overlays, Network Virtualization and Lightning Talks, Linux Plumbers Conference, August 29-31, 2012, San Diego, CA, USA
References (2)
6) B. Davie, J. Gross, et al, A Stateless Transport Tunneling Protocol for Network Virtualization (STT) (Version 4), 13-Sep-2013, http://tools.ietf.org/html/draft-davie-stt, Note: This is work in progress
7) T. Narten, E. Gray, et al, Problem Statement: Overlays for Network Virtualization, 31-Jul-2013, http://tools.ietf.org/html/draft-nvo3-overlay-problem-statement-04, Note: This is work in progress
8) M. Sridharan, et al, NVGRE: Network Virtualization using Generic Routing Encapsulation (Version 3), 8-Aug-2013, http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-03, Note: This is work in progress
Acknowledgments
Vivek Kashyap, Gerhard Stenzel, Dirk Herrendörfer, Mijo Safradin, Sridhar Sumadrala, David Stevens, Vinit Jain: IBM Linux Technology Center, Data Center Networking
Trademarks
■ This work represents the view of the author and does not necessarily represent the view of IBM.
■ IBM is a registered trademark of International Business Machines Corporation in the United States and/or other countries.
■ UNIX is a registered trademark of The Open Group in the United States and other countries.
■ Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
■ Other company, product, and service names may be trademarks or service marks of others.
Glossary
Agent: Application to maintain Neighbor/FDB tables
AM: Agent Manager
DOVE: Distributed Overlay Virtual Ethernet
FDB: Forwarding Data Base
Layer 2: OSI Data Link Layer (Reliable Link between directly
connected nodes)
Layer 3: OSI Network Layer (IP addressing)
L2MISS: Destination MAC address unknown
L3MISS: Destination IP address unknown
LEARNING: Add new MAC/VTEP address in FDB
Multi Tenant: Software Instance used for several customers
NVGRE: Network Virtualization Generic Routing Encapsulation
OSI: Open Systems Interconnection
OTV: Overlay Transport Virtualization
RSC: Route Short Circuit
SDN: Software Defined Network
STT: Stateless Transport Tunneling
VNI: VXLAN Network Identifier or VXLAN Segment Identifier
VTEP: Virtual Tunnel End Point
VXLAN: Virtual extensible Local Area Network
BACKUP