We have an Intel dual-socket server with a single quad-core Westmere-EP CPU installed, and we're testing Vyatta VC6.2 on it. The board has dual igb (82575EB) 1 Gigabit Ethernet ports, and we added one Intel 82599EB dual-port 10G card.
We are focusing on 10Gb testing, so ignore the 1Gb ports.

Performance seems lower than it should be for routing traffic, compared with the references at the bottom. I've already played with CPU affinity via the "smp_affinity auto" parameter.
1. In the uni-directional case, it can't achieve the results in the Intel white paper at some packet sizes. [324176.pdf]
2. In the bi-directional case, it can't achieve the results in the Intel white paper at some packet sizes. [322973.pdf]
How do I attach the TestCenter result file on the forum?
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
◎ TestCenter generating 2 flows from both ports
port //1/1 - 192.168.2.10 ~ 192.168.2.11 /24 gw: 192.168.2.1
port //1/2 - 192.168.3.10 ~ 192.168.3.11 /24 gw: 192.168.3.1
◎ Dual 82599EB settings on VC6.2 (smp_affinity auto is enabled by default on the eth interfaces)
ethernet eth1 {
address 192.168.2.1/24
}
ethernet eth2 {
address 192.168.3.1/24
}
◎ CPU X5667 @ 3.07 GHz / 12M cache / QPI speed 6.4 GT/s
◎ Memory DDR3 2G x 2 @ 1333 MHz
◎ BIOS Setting
Hyper-threading disabled
EIST disabled
IOAT enabled
DCA enabled
NUMA enabled
◎ ixgbe driver information under VC6.2
$ sudo ethtool -i ethx
eth1
driver: ixgbe
version: 2.0.62-k2
firmware-version: 0.9-3
bus-info: 0000:03:00.0
eth2
driver: ixgbe
version: 2.0.62-k2
firmware-version: 0.9-3
bus-info: 0000:03:00.1
◎ Status of modules in Linux kernel
$ lsmod | grep -i ioatdma
ioatdma 36480 40
dca 2965 2 ioatdma,ixgbe
◎ Linux kernel version
$ uname -a
Linux vyatta 2.6.35-1-586vyatta #1 SMP Fri Feb 4 05:07:37 PST 2011 i686 GNU/Linux
INTEL 20 Gbps packet-forwarding on Vyatta network OS
http://download.intel.com/embedded/processor/solutionbrief/322973.pdf
Design considerations for efficient network applications with Intel® multi-core processor-based systems on Linux

TestCenter result files uploaded in xls format:
Bi-directional
http://www.vyatta.org/files/u36700/bidirectional.zip
Uni-directional
http://www.vyatta.org/files/u36700/unidirectional.zip
TestCenter result files uploaded in HTML format with pictures
(Firefox is recommended for best viewing):
Uni-directional
http://www.filefactory.com/file/cc1a316/n/unidirectional.zip
Bi-directional
http://www.filefactory.com/file/cc1a319/n/bidirectional.zip
What memory was used?
I had some discussions with the Linux Intel driver development team and they said that dual socket (especially on Westmere) is slower than a single socket unless all memory banks are fully populated with the more expensive (single rank?) variety. Otherwise the system becomes memory bandwidth limited. The approximate number for 2 socket fully populated was something like 32GB and over $12K.
By default Vyatta also has some overhead that may interfere. First, it always installs some basic iptables infrastructure that causes some performance loss even if it is not used. Second, the current versions of Vyatta are 32-bit, which limits memory usage on large systems.
Yes, Vyatta is 32-bit now. When will a 64-bit version be released? My system is dual socket but has only a single CPU installed, so will there really be the performance difference you mentioned? I can populate the DIMMs up to 32GB, but the 32-bit Vyatta network OS can only recognize up to 4GB.
Are there any other tips for improving performance, as the white paper describes?
20Gb bi-directional performance <-- this is the goal I would like to achieve.
Thanks a lot.
You could try setting transmit queue length to 0 which disables QoS and the overhead of maintaining a software queue. There is no Vyatta config option for this so you have to manually do it.
It is possible to manually remove all the iptables chains and unload the modules. It is kind of messy.
Vyatta hasn't released a 64 bit version yet, but you can build it yourself from source.
Hi -- It looks like you were running the "RFC-2544 zero-loss throughput test" configured via the wizard on your Spirent TestCenter. I would suggest that you instead run the "RFC-2544 frame loss test" configurable via the same wizard. The frame loss test provides more information, so we find it generally more useful. One difference is that the frame loss test will give you throughput at 10 different offered rates (10 % through 100 %) at all 7 frame sizes (64 byte through 1518).
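For reference when reading the frame loss results, the theoretical maximum frame rate of a 10 Gigabit Ethernet link at each frame size can be worked out from the per-frame wire overhead. Here is a minimal Python sketch, assuming the seven standard RFC 2544 Ethernet frame sizes and 20 bytes of overhead per frame (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The function and constant names are illustrative only.

```python
# Theoretical maximum frame rate of an Ethernet link for a given frame size.
LINK_BPS = 10_000_000_000            # 10 Gigabit Ethernet line rate in bits/s
WIRE_OVERHEAD_BYTES = 7 + 1 + 12     # preamble + start-of-frame delimiter + inter-frame gap
RFC2544_FRAME_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]


def max_frames_per_second(frame_bytes, link_bps=LINK_BPS):
    """Return the line-rate frame rate for one direction of the link."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for size in RFC2544_FRAME_SIZES:
        rate = max_frames_per_second(size)
        print(f"{size:5d} bytes: {rate:12.0f} frames/s per direction")
```

At 64 bytes this works out to roughly 14.88 million frames per second in each direction, which is why the small frame sizes are the hardest part of the test.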
In addition, there are a couple of important Vyatta configuration steps you need to take in order to get good results in a test like this:
1) Disable flow-control on all NIC ports participating in the test. If you don't do this, the NIC will send pause frames to the analyzer, lowering the results because they take up space on the wire and are not counted by the analyzer.
2) Set up static ARP entries for each of the analyzer "virtual hosts". This is important because otherwise the ARP translations in the Vyatta router will time out in the middle of your test, and will have a tough time getting re-established when the offered load exceeds the capacity of the box.
It is also important to configure the analyzer so that the traffic between each of the port-pairs is "multi-flow". By that, I mean that the packets exhibit a range of source IP addresses and source and dest UDP port numbers. This is important so that the RSS function in the NIC will distribute the received packets equally to all of its receive queues. Since each receive queue is served by a different CPU (when smp-affinity is set to "auto"), this ensures that the traffic is balanced across all CPUs. If the traffic load is "single-flow", then all traffic will end up in one queue and only one CPU will service the load.
Note that the TestCenter RFC-2544 wizard does not generate a multi-flow workload by default. You need to first set up the test by running the wizard, then edit the stream blocks that it generates. I usually make the traffic be UDP, set the UDP source port range from 1 - 65535, the dest port range from 1 - 65535, and the source IP address from X.X.X.2 - X.X.X.254 (where X.X.X.2 is the IPv4 address assigned to the analyzer port).
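To illustrate why the number of distinct flows matters, here is a rough Python sketch that spreads flows over receive queues by hashing the addresses and ports. It is only a stand-in: the real NIC uses its own RSS hash and indirection table, not SHA-256, and the queue count of 4 and the example addresses and ports are assumptions for illustration.

```python
import hashlib
from collections import Counter

NUM_QUEUES = 4  # assumption: one receive queue per core on a quad-core CPU


def pick_queue(src_ip, dst_ip, src_port, dst_port, num_queues=NUM_QUEUES):
    """Map a flow to a queue index. Stand-in hash, NOT the NIC's RSS hash."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_queues


def queue_usage(flows, num_queues=NUM_QUEUES):
    """Count how many flows land on each queue."""
    return Counter(pick_queue(*flow, num_queues=num_queues) for flow in flows)


# Two flows only, like a test with fixed addresses and ports.
few_flows = [
    ("192.168.2.10", "192.168.3.10", 1024, 1024),
    ("192.168.2.11", "192.168.3.11", 1024, 1024),
]

# Multi-flow: source IPs .2 through .254 and a spread of UDP ports.
many_flows = [
    (f"192.168.2.{host}", "192.168.3.10", port, 65535 - port)
    for host in range(2, 255)
    for port in range(1, 65535, 4096)
]

if __name__ == "__main__":
    print("few flows :", dict(queue_usage(few_flows)))
    print("many flows:", dict(queue_usage(many_flows)))
```

With two flows, at most two queues can ever receive packets no matter how the hash behaves, so at most two CPUs do the work. With thousands of flows, every queue gets a share.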
With these changes, you should be able to exceed the results shown in the Intel white paper because you have a more powerful processor than they used.
For a quick sanity check, run "mpstat -P ALL 5" on the Vyatta while the test is in progress. If you do not see high usage on all of the CPUs, then your traffic generator is probably not sending "multi-flow" traffic. You can also double-check what the simulated traffic looks like with tcpdump on the Vyatta.
For more details, run "ethtool -S eth0" (or eth1, eth2, etc.) while the test is in progress and you can see the individual queue statistics. With proper multi-flow, all the queues will be active.
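As a rough aid for reading that output, the following Python sketch pulls the per-queue receive packet counters out of `ethtool -S` text and reports which queues have seen no traffic. It assumes counter names of the form `rx_queue_N_packets`, which is what the ixgbe driver normally prints but may differ between driver versions. The sample text is made up for illustration and is not a measurement.

```python
import re

# Assumed counter naming: "rx_queue_<N>_packets: <count>"
QUEUE_COUNTER = re.compile(r"^\s*rx_queue_(\d+)_packets:\s*(\d+)\s*$")


def rx_packets_per_queue(ethtool_stats_text):
    """Return {queue index: received packet count} parsed from ethtool -S text."""
    counts = {}
    for line in ethtool_stats_text.splitlines():
        match = QUEUE_COUNTER.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def idle_queues(counts):
    """Return the sorted list of queues whose packet counter is zero."""
    return sorted(queue for queue, packets in counts.items() if packets == 0)


# Made-up sample showing what a single-flow load might look like.
SAMPLE = """\
NIC statistics:
     rx_packets: 1000
     rx_queue_0_packets: 1000
     rx_queue_1_packets: 0
     rx_queue_2_packets: 0
     rx_queue_3_packets: 0
"""

if __name__ == "__main__":
    counts = rx_packets_per_queue(SAMPLE)
    print("per-queue packets:", counts)
    print("idle queues:", idle_queues(counts))
```

The counters are cumulative, so on a box that has already carried traffic it is more telling to take two snapshots during the test and compare them.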
The mpstat command can't be found, so I used top to watch the running processes and CPU usage. I will use the method you provided in a later test.
The test uses just 2 traffic flows.
Thanks a lot.
Thanks. First question: does this command also need to be run on the other eth interface?
Second question: can 64-bit Vyatta give better performance results?
I will try all of the suggestions you mentioned and post the results later.
Put something sensible in /etc/apt/sources.list and do:
apt-get install sysstat
Or run the above command on a similar Squeeze system and copy the .deb file out of /var/cache/apt/archives to your test system.
If you only have two IP addresses in the test, you're not going to be making full use of the Ethernet hardware and will only be filling one queue, so you'll only get a fraction of the possible performance.
Thanks for your advice and direction.
I've hit a problem: when I run the command you gave me on both interfaces, several packets are lost at the end of each traffic test iteration.
Without the command, the result is normal.
sudo ip li set dev [eth0 or ethx] txq 0
Here is the result
Result
How do I check whether NAPI is supported and enabled on the NIC?
When testing routing throughput, the analyzer sometimes receives garbage packets that make the test hard to judge. Could someone help me shut down unnecessary protocols?
Here is TestCenter Log
"Unexpected Out of sequence frames received during current test iteration"
The ixgbe driver, which drives the 82599 NIC, employs NAPI. NAPI has been around for a long time now, so most drivers do support it. If you want to find out whether a particular driver uses NAPI, you need to read the source code to the driver.
I regularly compile unofficial 32-bit and 64-bit versions and publish them here:
http://ftp.het.net/iso/vyatta/
AFAIK BGP is faster on 64-bit.
The only other advantage is better/faster use of more than 4 GB of RAM.
Could you try one of the 64-bit images and run your test again?
I'm very interested in the results.
Regards
Danny
OK, are these the files?
vyatta-livecd-napa-20110710_i386
vyatta-livecd-napa-20110719_amd64.iso
Yes, I see I made a typo last week; the date should be 10 (not 19).
I will compile and upload new ones now.
Danny