[vyatta-svn] Linus' linux git repository cloned with vyatta additions: Changes to 'kernel.org'
Rick Balocca
rbalocca at suva.vyatta.com
Fri Dec 22 17:01:54 PST 2006
New branch 'kernel.org' available with the following commits:
commit e45116b8d71ece9dbe41b114368ff7aebe3ae41a
Author: Brice Goglin <Brice.Goglin at ens-lyon.org>
Date: Mon Dec 11 20:14:15 2006 +0100
[PATCH] Fix typo in 'EXPERIMENTAL' in CC_STACKPROTECTOR on x86_64
Fix typo in 'EXPERIMENTAL' in config CC_STACKPROTECTOR in arch/x86_64/Kconfig.
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 022416967a814aa1b3a9476a842c0947a1a9d784
Author: David Howells <dhowells at redhat.com>
Date: Mon Dec 11 13:16:05 2006 +0000
[PATCH] LOG2: Make powerpc's __ilog2_u64() take a 64-bit argument
Make powerpc's __ilog2_u64() take a 64-bit argument.
Signed-off-by: David Howells <dhowells at redhat.com>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
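A note on the LOG2 commit above: the following is a minimal userspace sketch of why the argument width matters for a 64-bit integer log2. It is not the powerpc code. The function names and the assumption that a narrower prototype would discard the upper 32 bits are illustrative only.

```python
def ilog2_u64(n: int) -> int:
    """Index of the highest set bit of a 64-bit value, or -1 for zero."""
    n &= 0xFFFFFFFFFFFFFFFF  # keep the value within 64 bits
    return n.bit_length() - 1


def ilog2_truncated(n: int) -> int:
    """Hypothetical model: the same computation after narrowing to 32 bits."""
    return ilog2_u64(n & 0xFFFFFFFF)


if __name__ == "__main__":
    big = 1 << 40
    print(ilog2_u64(big))        # 40
    print(ilog2_truncated(big))  # -1, because the upper bits were discarded
```

With a full 64-bit argument the helper sees bit 40; with a narrowed argument it sees zero.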
commit 69de7fc037c8cda6fd20a632d39461bf9d42b927
Merge: 116140b7f5c9182c86a4e419f81684209357aea7 99eeb8dfb1ce3df744e2e0d00dd627d7a8199ef0
Author: Linus Torvalds <torvalds at woody.osdl.org>
Date: Mon Dec 11 12:26:03 2006 -0800
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
AT91 MMC update for 2.6.19
mmc: Change SDHCI iomem error to a warning
mmc: fix "prev->state: 2 != TASK_RUNNING??" problem on SD/MMC card removal
AT91 MMC 5 : Minor cleanups
AT91 MMC 4 : Interrupt handler cleanup
AT91 MMC 3 : Move global mci_clk variable
AT91 MMC 2 : Use platform resources
AT91 MMC 1: Pass host structure.
commit 116140b7f5c9182c86a4e419f81684209357aea7
Merge: 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 8af905b4a403ce74b8d907b50bccc453a58834bc
Author: Linus Torvalds <torvalds at woody.osdl.org>
Date: Mon Dec 11 12:22:58 2006 -0800
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
[PATCH] smc91x: Kill off excessive versatile hooks.
[PATCH] myri10ge: update driver version to 1.1.0
[PATCH] myri10ge: fix big_bytes in case of vlan frames
[PATCH] myri10ge: Full vlan frame in small_bytes
[PATCH] myri10ge: drop contiguous skb routines
[PATCH] myri10ge: switch to page-based skb
[PATCH] myri10ge: add page-based skb routines
[PATCH] myri10ge: indentation cleanups
[PATCH] chelsio: working NAPI
[PATCH] MACB: Use __raw register access
[PATCH] MACB: Use struct delayed_work instead of struct work_struct
[PATCH] ucc_geth: Initialize mdio_lock.
[PATCH] ucc_geth: compilation error fixes
commit 8d610dd52dd1da696e199e4b4545f33a2a5de5c6
Author: Linus Torvalds <torvalds at woody.osdl.org>
Date: Mon Dec 11 12:12:04 2006 -0800
Make sure we populate the initroot filesystem late enough
We should not initialize rootfs before all the core initializers have
run. So do it as a separate stage just before starting the regular
driver initializers.
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 8993780a6e44fb4e7ed34e33458506a775356c6e
Author: Linus Torvalds <torvalds at woody.osdl.org>
Date: Mon Dec 11 09:28:46 2006 -0800
Make SLES9 "get_kernel_version" work on the kernel binary again
As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary. It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)
That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba6d71416712e798276bf7c40b64e6e5:
"[PATCH] Fix linux banner utsname information").
This just restores "linux_banner" as a static string, which should fix
the version finding. And /proc/version simply uses a different string.
To avoid wasting even that minuscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.
Cc: Andy Whitcroft <apw at shadowen.org>
Acked-by: Herbert Poetzl <herbert at 13thfloor.at>
Cc: Andi Kleen <ak at suse.de>
Cc: Andrew Morton <akpm at osdl.org>
Cc: Steve Fox <drfickle at us.ibm.com>
Acked-by: Olaf Hering <olaf at aepfle.de>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
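The signature search described in the commit above can be modelled roughly as follows. This is a sketch of the idea only, not the SLES9 "get_kernel_version" tool: the function body, the image bytes, and the version string "2.6.19-example" are all made up for illustration.

```python
from typing import Optional

SIGNATURE = b"Linux version "


def get_kernel_version(image: bytes) -> Optional[str]:
    """Return the token following the first "Linux version " in the image."""
    start = image.find(SIGNATURE)
    if start < 0:
        return None
    start += len(SIGNATURE)
    end = start
    # The version token ends at the first space, newline, or NUL byte.
    while end < len(image) and image[end] not in b" \n\x00":
        end += 1
    return image[start:end].decode("ascii", errors="replace")


# Hypothetical image contents, for illustration only.
STATIC_BANNER = b"\x00\x00Linux version 2.6.19-example (builder@host) #1\x00"
FORMAT_BANNER = b"\x00Linux version %s (%s) %s\x00"

if __name__ == "__main__":
    print(get_kernel_version(STATIC_BANNER))                  # 2.6.19-example
    print(get_kernel_version(FORMAT_BANNER))                  # %s
    # A format string placed before the static banner wins the search.
    print(get_kernel_version(FORMAT_BANNER + STATIC_BANNER))  # %s
```

The last case mirrors the remark about __initdata: whichever occurrence of the signature comes first in the binary is the one such a search reports.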
commit 8af905b4a403ce74b8d907b50bccc453a58834bc
Author: Paul Mundt <lethal at linux-sh.org>
Date: Mon Dec 11 19:30:06 2006 +0900
[PATCH] smc91x: Kill off excessive versatile hooks.
This looks like a result of too many auto-merges. The
CONFIG_ARCH_VERSATILE case was handled a total of 6 times.
This kills 5 of them.
Signed-off-by: Paul Mundt <lethal at linux-sh.org>
--
drivers/net/smc91x.h | 90 ---------------------------------------------------
1 file changed, 90 deletions(-)
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 5796df19824bef34aabf5656f447b3b170d34c3b
Author: Brice Goglin <brice at myri.com>
Date: Mon Dec 11 11:27:55 2006 +0100
[PATCH] myri10ge: update driver version to 1.1.0
Update driver version to 1.1.0.
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 13348beee529cd1200deeac161e1b2de0705b495
Author: Brice Goglin <brice at myri.com>
Date: Mon Dec 11 11:27:19 2006 +0100
[PATCH] myri10ge: fix big_bytes in case of vlan frames
Fix sizing of big_bytes in the case of vlan frames. The 4
VLAN_HLEN bytes were omitted, leading to sizing the big buffer
4 bytes smaller than it should be. Due to how rx buffers are
carved from pages, this was harmless for the common (9000, 1500)
byte MTUs, but could lead to data corruption for some MTUs.
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit de3c4507047f2457359551c49b093669acb4f190
Author: Brice Goglin <brice at myri.com>
Date: Mon Dec 11 11:26:38 2006 +0100
[PATCH] myri10ge: Full vlan frame in small_bytes
Receive full vlan frames into smalls when running with a jumbo MTU.
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 52ea6fb39b6fd08ec8718b92cddb3fed2165a921
Author: Brice Goglin <brice at myri.com>
Date: Mon Dec 11 11:26:12 2006 +0100
[PATCH] myri10ge: drop contiguous skb routines
Drop the old routines that used the physically contiguous skb now
that we use the physical pages. And rename myri10ge_page_rx_done()
to myri10ge_rx_done() as it was previously.
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit c7dab99b080accb2751c96bf66cd5ab12c78f8e4
Author: Brice Goglin <brice at myri.com>
Date: Mon Dec 11 11:25:42 2006 +0100
[PATCH] myri10ge: switch to page-based skb
Switch to physical page skb, by calling the new page-based
allocation routines and using myri10ge_page_rx_done().
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit dd50f3361f9f0bb407658e9087947c9bdcdefffc
Author: Brice Goglin <brice at myri.com>
Date: Mon Dec 11 11:25:09 2006 +0100
[PATCH] myri10ge: add page-based skb routines
Add physical page skb allocation routines and page based rx_done,
to be used by upcoming patches.
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 6250223e055764efcaef3809a9f2350edfc82bbc
Author: Brice Goglin <brice at myri.com>
Date: Mon Dec 11 11:24:37 2006 +0100
[PATCH] myri10ge: indentation cleanups
Indentation cleanups to synchronize to our tree which is automatically
indent'ed.
Signed-off-by: Brice Goglin <brice at myri.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 7fe26a60e08f38c797851fb3b444d753af616112
Author: Stephen Hemminger <shemminger at osdl.org>
Date: Fri Dec 8 11:08:33 2006 -0800
[PATCH] chelsio: working NAPI
This driver tries to enable/disable NAPI at runtime, but
does so in an unsafe manner, and the NAPI interrupt handling is
a mess. Replace it with a compile time selected NAPI implementation.
Signed-off-by: Stephen Hemminger <shemminger at osdl.org>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 0f0d84e52cb2a6e0b1d101484a92121410135da1
Author: Haavard Skinnemoen <hskinnemoen at atmel.com>
Date: Fri Dec 8 14:38:30 2006 +0100
[PATCH] MACB: Use __raw register access
Since macb is a chip-internal device, use __raw_readl and
__raw_writel instead of readl/writel. This will perform native-endian
accesses, which is the right thing to do on both AVR32 and ARM devices.
Signed-off-by: Haavard Skinnemoen <hskinnemoen at atmel.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit d836cae4f683211f14c1fd8184f478622b185164
Author: Haavard Skinnemoen <hskinnemoen at atmel.com>
Date: Fri Dec 8 14:37:35 2006 +0100
[PATCH] MACB: Use struct delayed_work instead of struct work_struct
The macb driver calls schedule_delayed_work() and friends, so we need
to use a struct delayed_work along with it. The conversion was
explained by David Howells on lkml Dec 5 2006:
http://lkml.org/lkml/2006/12/5/269
Signed-off-by: Haavard Skinnemoen <hskinnemoen at atmel.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 68dc44af632944dff6c8b36013d32a254fe62de4
Author: Scott Wood <scottwood at freescale.com>
Date: Thu Dec 7 13:31:26 2006 -0600
[PATCH] ucc_geth: Initialize mdio_lock.
Signed-off-by: Scott Wood <scottwood at freescale.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 1083cfe11285816fb2e2e36aad097f1c3b6db915
Author: Scott Wood <scottwood at freescale.com>
Date: Thu Dec 7 13:31:07 2006 -0600
[PATCH] ucc_geth: compilation error fixes
Fix compilation failures when building the ucc_geth driver with spinlock
debugging.
Signed-off-by: Scott Wood <scottwood at freescale.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
commit 99eeb8dfb1ce3df744e2e0d00dd627d7a8199ef0
Author: Andrew Victor <andrew at sanpeople.com>
Date: Mon Dec 11 12:40:23 2006 +0100
AT91 MMC update for 2.6.19
The driver is usable on the newer SAM9 processors so replace all text
references to AT91RM9200 with just AT91.
The controller bug where all the words are byte-swapped is fixed on the
AT91SAM9 processors. The byte-swapping work-around therefore only needs
to be done if cpu_is_at91rm9200().
[Original patch from Wojtek Kaniewski]
The AT91RM9200 and AT91SAM9260 processors support two MMC/SD slots - the
slot which is connected is now passed via the platform_data and the
correct slot selected in the AT91_MCI_SDCR register.
The driver should not be calling at91_set_gpio_output() since the VCC
pin should have already been configured as an output in the
processor/board setup code. The driver should call
at91_set_gpio_value().
Signed-off-by: Andrew Victor <andrew at sanpeople.com>
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit a98087cf81e91999a91ceedb2d2e3a95827c651f
Author: Pierre Ossman <drzeus at drzeus.cx>
Date: Thu Dec 7 19:17:20 2006 +0100
mmc: Change SDHCI iomem error to a warning
Some controllers report an invalid iomem size, but seem to work
correctly anyway. Change our current error to just a warning and
hope it doesn't cause too many problems.
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit 7b30d281b9c115890c75d11eaf06881261c256da
Author: Vitaly Wool <vitalywool at gmail.com>
Date: Thu Dec 7 20:08:02 2006 +0100
mmc: fix "prev->state: 2 != TASK_RUNNING??" problem on SD/MMC card removal
Currently on SD/MMC card removal the system exhibits the following message (the platform is ARM Versatile):
prev->state: 2 != TASK_RUNNING??
mmcqd/762[CPU#0]: BUG in __schedule at linux-2.6/kernel/sched.c:3826
(akpm: someone tried to fix this, but it's still wrong)
Signed-off-by: Vitaly Wool <vitalywool at gmail.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit f3a8efa90b1aab16ead76ad7e22d9c5fc2045400
Author: Andrew Victor <andrew at sanpeople.com>
Date: Mon Oct 23 14:53:20 2006 +0200
AT91 MMC 5 : Minor cleanups
A number of small cleanups to the AT91RM9200 MMC driver:
- fix warnings generated by pr_debug().
- prepend "AT91 MMC:" to printk() messages.
Signed-off-by: Andrew Victor <andrew at sanpeople.com>
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit df05a303e3b8a0c32764941200bec76d729126bc
Author: Andrew Victor <andrew at sanpeople.com>
Date: Mon Oct 23 14:50:09 2006 +0200
AT91 MMC 4 : Interrupt handler cleanup
This patch simplifies the AT91RM9200 MMC interrupt handler code so that
it doesn't re-read the Interrupt Status and Interrupt Mask registers
multiple times.
Also defined AT91_MCI_ERRORS instead of using the hard-coded 0xffff0000.
Signed-off-by: Andrew Victor <andrew at sanpeople.com>
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit 3dd3b039d489dfbc907c64a161fd2231ddcdea48
Author: Andrew Victor <andrew at sanpeople.com>
Date: Mon Oct 23 14:46:54 2006 +0200
AT91 MMC 3 : Move global mci_clk variable
Move the global 'mci_clk' variable into the local 'at91mci_host'
structure.
Signed-off-by: Andrew Victor <andrew at sanpeople.com>
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit 17ea0595f4e89932ac9297a3850fba8b4ecb461e
Author: Andrew Victor <andrew at sanpeople.com>
Date: Mon Oct 23 14:44:40 2006 +0200
AT91 MMC 2 : Use platform resources
Use the I/O base-address and IRQ passed to the driver via the
platform_device resources instead of using hardcoded values.
Signed-off-by: Andrew Victor <andrew at sanpeople.com>
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit e0b19b83656731fc93f9a82592ebcad82c3e0944
Author: Andrew Victor <andrew at sanpeople.com>
Date: Wed Oct 25 19:42:38 2006 +0200
AT91 MMC 1: Pass host structure.
The I/O base address is now stored in the 'at91mci_host' structure. We
therefore have to pass this structure to at91_mci_read() and
at91_mci_write().
Signed-off-by: Andrew Victor <andrew at sanpeople.com>
Signed-off-by: Pierre Ossman <drzeus at drzeus.cx>
commit 9202f32558601c2c99ddc438eb3218131d00d413
Author: Ralf Baechle <ralf at linux-mips.org>
Date: Sun Dec 10 18:43:59 2006 +0000
[MIPS] Export local_flush_data_cache_page for sake of IDE.
On a CPU with aliases the IDE core needs to flush caches in the special
IDE variants of insw, insl etc. If IDE support is built as a module this
will only work if local_flush_data_cache_page is exported to
modules.
As per policy export local_flush_data_cache_page as GPL symbol only.
Signed-off-by: Ralf Baechle <ralf at linux-mips.org>
commit f8bf35a9145b0831d7d110402662d9cff2d90bd9
Author: Ralf Baechle <ralf at linux-mips.org>
Date: Sun Dec 10 15:09:38 2006 +0000
[MIPS] Export pm_power_off
This is required for ipmi_poweroff.c to work as a module.
Signed-off-by: Ralf Baechle <ralf at linux-mips.org>
commit ae32ffd65bbcc32795bb9b58ed12941efeb03dff
Author: Ralf Baechle <ralf at linux-mips.org>
Date: Sun Dec 10 15:05:11 2006 +0000
[MIPS] Export csum_partial_copy_nocheck.
ibmtr.c and typhoon.c use it.
Signed-off-by: Ralf Baechle <ralf at linux-mips.org>
commit 2d911e9a4e74ddbd059f9dabea402a119ef22e3d
Author: Ralf Baechle <ralf at linux-mips.org>
Date: Sun Dec 10 15:02:17 2006 +0000
[MIPS] Move die and die_if_kernel() from system.h to ptrace.h
This eliminates the need to include ptrace.h into system.h and fixes a
harmless namespace conflict on the PC symbol in bpck.c.
Signed-off-by: Ralf Baechle <ralf at linux-mips.org>
commit 86384d544157db23879064cde36061cdcafc6794
Author: Ralf Baechle <ralf at linux-mips.org>
Date: Sun Dec 10 14:57:28 2006 +0000
[MIPS] Discard .exit.text at linktime.
This fixes fairly unobvious breakage of various drivers.
Signed-off-by: Ralf Baechle <ralf at linux-mips.org>
commit 5b1d221e6292f9fcf9f12d6c9e94ee9470ee2a24
Author: Ralf Baechle <ralf at linux-mips.org>
Date: Sat Dec 9 16:12:18 2006 +0000
[MIPS] Fix build of several IDE drivers by providing pci_get_legacy_ide_irq
Signed-off-by: Ralf Baechle <ralf at linux-mips.org>
commit 3263263f7091eccab6fdc23f28f09b17c0466629
Author: Herbert Xu <herbert at gondor.apana.org.au>
Date: Sun Dec 10 09:50:36 2006 +1100
[CRYPTO] dm-crypt: Select CRYPTO_CBC
As CBC is the default chaining method for cryptoloop, we should select
it from cryptoloop to ease the transition. Spotted by Rene Herman.
Signed-off-by: Herbert Xu <herbert at gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 0258736a0a2cde8ab30725b601aeca4cf8bc93ab
Author: Cal Peake <cp at absolutedigital.net>
Date: Sun Dec 10 06:22:05 2006 -0500
[PATCH] add MODULE_* attributes to bit reversal library
Add MODULE_* attributes to the new bit reversal library. Most notably
MODULE_LICENSE which prevents superfluous kernel tainting.
Signed-off-by: Cal Peake <cp at absolutedigital.net>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit edb16bec41db68b22799a5fbad82c3891e637565
Merge: bb7320d1d96dc2e479180ae8e7a112caf0726ace f0882589666440d573f657cb3a1d5f66f3caa157
Author: Linus Torvalds <torvalds at woody.osdl.org>
Date: Sun Dec 10 10:00:00 2006 -0800
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix several kprobes bugs.
[SPARC64]: Update defconfig.
[SPARC64]: dma remove extra brackets
[SPARC{32,64}]: Propagate ptrace_traceme() return value.
[SPARC64]: Replace kmalloc+memset with kzalloc
[SPARC]: Check kzalloc() return value in SUN4D irq/iommu init.
[SPARC]: Replace kmalloc+memset with kzalloc
[SPARC64]: Run ctrl-alt-del action for sun4v powerdown request.
[SPARC64]: Unaligned accesses to userspace are hard errors.
[SPARC64]: Call do_mathemu on illegal instruction traps too.
[SPARC64]: Update defconfig.
[SPARC64]: Add irqtrace/stacktrace/lockdep support.
commit bb7320d1d96dc2e479180ae8e7a112caf0726ace
Merge: 6aa8b732ca01c3d7a54e93f4d701b8aabbe60fb7 1de1bf06330920802d3b7646a088965bdd918356
Author: Linus Torvalds <torvalds at woody.osdl.org>
Date: Sun Dec 10 09:59:18 2006 -0800
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (132 commits)
V4L/DVB 4949b: Fix container_of pointer retreival
V4L/DVB (4949a): Fix INIT_WORK
V4L/DVB (4949): Cxusb: codingstyle cleanups
V4L/DVB (4948): Cxusb: Convert tuner functions to use dvb_pll_attach
V4L/DVB (4947): Cx88: trivial cleanups
V4L/DVB (4946): Cx88: Move cx88_dvb_bus_ctrl out of the card-specific area
V4L/DVB (4945): Cx88: consolidate cx22702_config structs
V4L/DVB (4944): Cx88: Convert DViCO FusionHDTV Hybrid to use dvb_pll_attach
V4L/DVB (4943): Cx88: cleanup dvb_pll_attach for lgdt3302 tuners
V4L/DVB (4953): Usbvision minor fixes
V4L/DVB (4951): Add version.h, since it is required for VIDIOC_QUERYCAP
V4L/DVB (4940): Or51211: Changed SNR and signal strength calculations
V4L/DVB (4939): Or51132: Changed SNR and signal strength reporting
V4L/DVB (4938): Cx88: Convert lgdt3302 tuning function to use dvb_pll_attach
V4L/DVB (4941): Remove LINUX_VERSION_CODE and fix identations
V4L/DVB (4942): Whitespace cleanups
V4L/DVB (4937): Usbvision cleanup and code reorganization
V4L/DVB (4936): Make MT4049FM5 tuner to set FM Gain to Normal
V4L/DVB (4935): Added the capability of selecting fm gain by tuner
V4L/DVB (4934): Usbvision radio requires GainNormal at e register
...
commit 6aa8b732ca01c3d7a54e93f4d701b8aabbe60fb7
Author: Avi Kivity <avi at qumranet.com>
Date: Sun Dec 10 02:21:36 2006 -0800
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel at lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero at arklinux.org: build fix]
[simon.kagstrom at bth.se: build fix, other fixes]
[uril at qumranet.com: KVM: Expose interrupt bitmap]
[akpm at osdl.org: i386 build fix]
[mingo at elte.hu: i386 fixes]
[rdreier at cisco.com: add log levels to all printks]
[randy.dunlap at oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony at codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv at qumranet.com>
Signed-off-by: Avi Kivity <avi at qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom at bth.se>
Cc: Bernhard Rosenkraenzer <bero at arklinux.org>
Signed-off-by: Uri Lublin <uril at qumranet.com>
Cc: Ingo Molnar <mingo at elte.hu>
Cc: Roland Dreier <rolandd at cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap at oracle.com>
Signed-off-by: Anthony Liguori <anthony at codemonkey.ws>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit f5f1a24a2caa299bb7d294aee92d7dd3410d9ed7
Author: Daniel Walker <dwalker at mvista.com>
Date: Sun Dec 10 02:21:33 2006 -0800
[PATCH] clocksource: small cleanup
Mostly changing alignment. Just some general cleanup.
[akpm at osdl.org: build fix]
Signed-off-by: Daniel Walker <dwalker at mvista.com>
Acked-by: John Stultz <johnstul at us.ibm.com>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 2b0137001de68153203dd3bc20e6d27eb7c9719c
Author: Daniel Walker <dwalker at mvista.com>
Date: Sun Dec 10 02:21:30 2006 -0800
[PATCH] clocksource: add usage of CONFIG_SYSFS
Simply adds some ifdefs to remove clocksource sysfs code when CONFIG_SYSFS
isn't turned on.
Signed-off-by: Daniel Walker <dwalker at mvista.com>
Acked-by: John Stultz <johnstul at us.ibm.com>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 2b2842146cb4105877c2be51d3857ec61ebd4ff9
Author: Arjan van de Ven <arjan at linux.intel.com>
Date: Sun Dec 10 02:21:28 2006 -0800
[PATCH] user of the jiffies rounding patch: Slab
This patch introduces users of the round_jiffies() function in the slab code.
The slab code has a few "run every second" timers for background work; these
are obviously not timing critical as long as they happen roughly at the right
frequency.
Signed-off-by: Arjan van de Ven <arjan at linux.intel.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6
Author: Arjan van de Ven <arjan at linux.intel.com>
Date: Sun Dec 10 02:21:26 2006 -0800
[PATCH] user of the jiffies rounding code: JBD
This patch introduces a user of the round_jiffies() function: the "5 second"
ext3/jbd wakeup.
While "every 5 seconds" doesn't sound like a problem, there can be many of these
(and these timers do add up over all the kernel). The "5 second" wakeup isn't
really timing sensitive; in addition even with rounding it'll still happen
every 5 seconds (with the exception of the very first time, which is likely to
be rounded up to somewhere closer to 6 seconds)
Signed-off-by: Arjan van de Ven <arjan at linux.intel.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 4c36a5dec25fb344ad76b11860da3a8b50bd1248
Author: Arjan van de Ven <arjan at linux.intel.com>
Date: Sun Dec 10 02:21:24 2006 -0800
[PATCH] round_jiffies infrastructure
Introduce a round_jiffies() function as well as a round_jiffies_relative()
function. These functions round a jiffies value to the next whole second.
The primary purpose of this rounding is to cause all "we don't care exactly
when" timers to happen at the same jiffy.
This avoids multiple timers firing within the second for no real reason;
with dynamic ticks these extra timers cause wakeups from deep CPU
sleep states and thus waste power.
The exact wakeup moment is skewed by the cpu number, to keep all cpus from
waking up at the exact same time (and hitting the same lock/cachelines
there).
[akpm at osdl.org: fix variable type]
Signed-off-by: Arjan van de Ven <arjan at linux.intel.com>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
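The rounding described in the round_jiffies commit above can be sketched in userspace as follows. This is a simplified model, not the kernel implementation: the always-round-up policy follows the commit text, while HZ=1000 and the skew of 3 jiffies per CPU are assumptions chosen for illustration.

```python
def round_jiffies(j: int, cpu: int = 0, hz: int = 1000, skew: int = 3) -> int:
    """Round j up to a whole second, offset by a small per-CPU skew.

    hz and skew are illustrative assumptions, not values from the commit.
    """
    shifted = j + cpu * skew       # shift so each CPU rounds on its own grid
    rem = shifted % hz
    if rem:
        shifted += hz - rem        # round up to the next multiple of hz
    return shifted - cpu * skew    # undo the shift; the result is never below j


if __name__ == "__main__":
    # Two unrelated timers on CPU 0 end up on the same jiffy.
    print(round_jiffies(1234), round_jiffies(1789))  # 2000 2000
    # The same request on CPU 1 fires a few jiffies apart from CPU 0.
    print(round_jiffies(1234, cpu=1))                # 1997
```

Timers that land on the same jiffy share one wakeup, and the per-CPU offset keeps different CPUs from all waking on exactly the same tick.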
commit 5466b456ed6748e0bfe02831e570004d4c04c1d7
Author: Vadim Lobanov <vlobanov at speakeasy.net>
Date: Sun Dec 10 02:21:22 2006 -0800
[PATCH] fdtable: Implement new pagesize-based fdtable allocator
This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries. The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.
The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:
pagesize / 4
pagesize / 2
pagesize <- memory allocator switch point
pagesize * 2
pagesize * 4
...etc...
Unlike the current implementation, this allocation scheme does not require a
loop to compute the optimal fdarray size, and can be done in efficient
straightline code.
Furthermore, since the fdarray overflows the pagesize boundary long before any
of the fdsets do, it makes sense to optimize run-time by allocating both
fdsets in a single swoop. Even together, they will still be, by far, smaller
than the fdarray. The fdtable->open_fds is now used as the anchor for the
fdset memory allocation.
Signed-off-by: Vadim Lobanov <vlobanov at speakeasy.net>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Al Viro <viro at zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar at in.ibm.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
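The size selection described in the fdtable allocator commit above can be sketched without a loop. This is a userspace model under stated assumptions, not the kernel code: a 4096-byte page, 8-byte pointers, and the helper names are all illustrative.

```python
def roundup_pow_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def fdarray_bytes(nr_fds: int, page_size: int = 4096, ptr_size: int = 8) -> int:
    """Pick the smallest size in the series pagesize/4, pagesize/2, pagesize,
    pagesize*2, ... that holds nr_fds pointers.

    page_size and ptr_size are assumptions for illustration.
    """
    unit = page_size // 4                  # smallest size considered
    needed = max(nr_fds, 1) * ptr_size     # bytes required by the fd pointers
    units = -(-needed // unit)             # ceiling division
    return unit * roundup_pow_of_two(units)


if __name__ == "__main__":
    for nr in (64, 128, 129, 512, 513):
        print(nr, fdarray_bytes(nr))
```

Because the candidate sizes are powers of two times pagesize/4, a single round-up to a power of two replaces the search loop.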
commit 4fd45812cbe875a620c86a096a5d46c742694b7e
Author: Vadim Lobanov <vlobanov at speakeasy.net>
Date: Sun Dec 10 02:21:17 2006 -0800
[PATCH] fdtable: Remove the free_files field
An fdtable can either be embedded inside a files_struct or standalone (after
being expanded). When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.
Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded. We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().
Signed-off-by: Vadim Lobanov <vlobanov at speakeasy.net>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Al Viro <viro at zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar at in.ibm.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit bbea9f69668a3d0cf9feba15a724cd02896f8675
Author: Vadim Lobanov <vlobanov at speakeasy.net>
Date: Sun Dec 10 02:21:12 2006 -0800
[PATCH] fdtable: Make fdarray and fdsets equal in size
Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).
In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.
Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.
Signed-off-by: Vadim Lobanov <vlobanov at speakeasy.net>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Al Viro <viro at zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar at in.ibm.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit f3d19c90fb117a5f080310a4592929aa8e1ad8e9
Author: Vadim Lobanov <vlobanov at speakeasy.net>
Date: Sun Dec 10 02:21:09 2006 -0800
[PATCH] fdtable: Delete pointless code in dup_fd()
The dup_fd() function creates a new files_struct and fdtable embedded inside
that files_struct, and then possibly expands the fdtable using expand_files().
The out_release error path is invoked when expand_files() returns an error
code. However, when this attempt to expand fails, the fdtable is left in its
original embedded form, so it is pointless to try to free the associated
fdarray and fdsets.
Signed-off-by: Vadim Lobanov <vlobanov at speakeasy.net>
Cc: Dipankar Sarma <dipankar at in.ibm.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Al Viro <viro at zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 5eb6c7a2ab413dea1ee6c08dd58263a1c2c2efa3
Author: Zach Brown <zach.brown at oracle.com>
Date: Sun Dec 10 02:21:07 2006 -0800
[PATCH] dio: lock refcount operations
The wait_for_more_bios() function name was poorly chosen. While looking to
clean it up I noticed that the dio struct refcounting between the bio
completion and dio submission paths was racy.
The bio submission path was simply freeing the dio struct if
atomic_dec_and_test() indicated that it dropped the final reference.
The aio bio completion path was dereferencing its dio struct pointer *after
dropping its reference* based on the remaining number of references.
These two paths could race and result in the aio bio completion path
dereferencing a freed dio, though this was not observed in the wild.
This moves the refcount under the bio lock so that bio completion can drop
its reference and decide to wake all in one atomic step.
Once testing and waking is locked dio_await_one() can test its sleeping
condition and mark itself uninterruptible under the lock. It gets simpler
and wait_for_more_bios() disappears.
The addition of the interrupt masking spin lock acquisition in dio_bio_submit()
looks alarming. This lock acquisition existed in that path before the recent
dio completion patch set. We shouldn't expect significant performance
regression from returning to the behaviour that existed before the
completion clean up work.
This passed 4k block ext3 O_DIRECT fsx and aio-stress on an SMP machine.
Signed-off-by: Zach Brown <zach.brown at oracle.com>
Cc: Badari Pulavarty <pbadari at us.ibm.com>
Cc: Suparna Bhattacharya <suparna at in.ibm.com>
Cc: Jeff Moyer <jmoyer at redhat.com>
Cc: <xfs-masters at oss.sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
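A minimal sketch of the locking pattern the "dio: lock refcount operations"
message describes, written in Python rather than kernel C. The Dio class, its
fields, drop_ref() and run() are hypothetical stand-ins, not the kernel's
struct dio. The only point illustrated is that the decrement and the decision
based on the remaining count happen in one critical section, so no path looks
at the object after giving up its reference.

```python
import threading

class Dio:
    """Hypothetical stand-in for a refcounted I/O descriptor."""
    def __init__(self, refs):
        self.lock = threading.Lock()   # plays the role of the bio lock
        self.refcount = refs
        self.freed = False

def drop_ref(dio):
    """Drop one reference and return the count that remained.

    The decrement and the read of the remaining count share one critical
    section, so the caller never inspects dio after its reference is gone.
    """
    with dio.lock:
        dio.refcount -= 1
        remaining = dio.refcount
        if remaining == 0:
            dio.freed = True           # only the last holder tears it down
    return remaining

def run(n_threads):
    """Drop n_threads references concurrently; return the dio and sorted results."""
    dio = Dio(n_threads)
    results = []
    results_lock = threading.Lock()

    def worker():
        remaining = drop_ref(dio)
        with results_lock:
            results.append(remaining)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dio, sorted(results)
```

Exactly one caller sees a remaining count of zero, which is what lets that
caller alone decide to wake or free.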
commit 8459d86aff04fa53c2ab6a6b9f355b3063cc8014
Author: Zach Brown <zach.brown at oracle.com>
Date: Sun Dec 10 02:21:05 2006 -0800
[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED
The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
conditionals in sync. direct_io_worker() knew when finished_one_bio() was
going to call aio_complete(). It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.
Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio(). direct_io_worker() would return < 0, its callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.
The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain. It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.
This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio. Instead we return the return
code of the operation and let the aio core call aio_complete(). This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.
Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it. XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.
Signed-off-by: Zach Brown <zach.brown at oracle.com>
Cc: Badari Pulavarty <pbadari at us.ibm.com>
Cc: Suparna Bhattacharya <suparna at in.ibm.com>
Acked-by: Jeff Moyer <jmoyer at redhat.com>
Cc: <xfs-masters at oss.sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
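The invariant restated in the message above can be sketched as a small model,
assuming a simplified, single-threaded view of the refcount. worker_finish()
and its parameters are hypothetical and only illustrate the rule: the
submission path returns -EIOCBQUEUED only when it is not the last to drop the
reference, and whoever drops the last reference reports the result. The value
529 for EIOCBQUEUED is taken to be the kernel-internal errno value and should
be read as an assumption of this sketch.

```python
EIOCBQUEUED = 529  # assumed value of the kernel-internal errno

def worker_finish(refcount, must_wait, result):
    """Model the end of the submission path.

    refcount counts the worker's own reference plus one per bio in flight.
    must_wait is True for sync ops, partial writes that fall back to
    buffered I/O, and aio ops that saw errors during submission.
    Returns (return_value, refcount_after, completer).
    """
    if must_wait:
        refcount = 1   # waiting lets every in-flight bio drop its reference
    refcount -= 1      # the worker drops its own reference
    if refcount == 0:
        # The worker was last: return the result, the aio core completes.
        return result, 0, "aio core"
    # Bios still hold references: the last bio completion will complete.
    return -EIOCBQUEUED, refcount, "bio completion"
```

In this model an aio op that ends up waiting returns its own result code, so
completion is never reported from two places.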
commit 20258b2b397031649e4a41922fe803d57017df84
Author: Zach Brown <zach.brown at oracle.com>
Date: Sun Dec 10 02:21:01 2006 -0800
[PATCH] dio: remove duplicate bio wait code
Now that we have a single refcount and waiting path we can reuse it in the
async 'should_wait' path. It continues to rely on the fragile link between
the conditional in dio_complete_aio() which decides to complete the AIO and
the conditional in direct_io_worker() which decides to wait and free.
By waiting before dropping the reference we stop dio_bio_end_aio() from
calling dio_complete_aio() which used to wake up the waiter after seeing the
reference count drop to 0. We hoist this wake up into dio_bio_end_aio() which
now notices when it's left a single remaining reference that is held by the
waiter.
Signed-off-by: Zach Brown <zach.brown at oracle.com>
Cc: Badari Pulavarty <pbadari at us.ibm.com>
Cc: Suparna Bhattacharya <suparna at in.ibm.com>
Acked-by: Jeff Moyer <jmoyer at redhat.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 0273201e693fd62381f6b1e85b15ffc117d8a46e
Author: Zach Brown <zach.brown at oracle.com>
Date: Sun Dec 10 02:20:59 2006 -0800
[PATCH] dio: formalize bio counters as a dio reference count
Previously we had two confusing counts of bio progress. 'bio_count' was
decremented as bios were processed and freed by the dio core. It was used to
indicate final completion of the dio operation. 'bios_in_flight' reflected
how many bios were between submit_bio() and bio->end_io. It was used by the
sync path to decide when to wake up and finish completing bios and was ignored
by the async path.
This patch collapses the two notions into one notion of a dio reference count.
bios hold a dio reference when they're between submit_bio and bio->end_io.
Since bios_in_flight was only used in the sync path it is now equivalent to
dio->refcount - 1 which accounts for direct_io_worker() holding a reference
for the duration of the operation.
dio_bio_complete() -> finished_one_bio() was called from the sync path after
finding bios on the list that the bio->end_io function had deposited.
finished_one_bio() can not drop the dio reference on behalf of these bios now
because bio->end_io already has. The is_async test in finished_one_bio()
meant that it never actually did anything other than drop the bio_count for
sync callers. So we remove its refcount decrement, don't call it from
dio_bio_complete(), and hoist its call up into the async dio_bio_complete()
caller after an explicit refcount decrement. It is renamed dio_complete_aio()
to reflect the remaining work it actually does.
Signed-off-by: Zach Brown <zach.brown at oracle.com>
Cc: Badari Pulavarty <pbadari at us.ibm.com>
Cc: Suparna Bhattacharya <suparna at in.ibm.com>
Acked-by: Jeff Moyer <jmoyer at redhat.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
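The relationship between the old bios_in_flight counter and the new reference
count can be sketched as follows. DioRef and its methods are hypothetical
names for illustration only; the sketch assumes, as the message states, that
the worker holds one reference for the duration of the operation and each bio
holds one between submission and its end_io callback.

```python
class DioRef:
    """Hypothetical model of the single dio reference count."""
    def __init__(self):
        self.refcount = 1          # held by the worker for the whole op

    def submit_bio(self):
        self.refcount += 1         # a bio takes a reference when submitted

    def end_io(self):
        self.refcount -= 1         # and drops it when its end_io runs

    def bios_in_flight(self):
        # Equivalent of the old sync-path counter
        return self.refcount - 1
```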
commit 17a7b1d74b1207f8f1af40b5d184989076d08f8b
Author: Zach Brown <zach.brown at oracle.com>
Date: Sun Dec 10 02:20:56 2006 -0800
[PATCH] dio: call blk_run_address_space() once per op
We only need to call blk_run_address_space() once after all the bios for the
direct IO op have been submitted. This removes the chance of calling
blk_run_address_space() after spurious wake ups as the sync path waits for
bios to drain. It's also one less difference between the sync and async paths.
In the process we remove a redundant dio_bio_submit() that its caller had
already performed.
Signed-off-by: Zach Brown <zach.brown at oracle.com>
Cc: Badari Pulavarty <pbadari at us.ibm.com>
Cc: Suparna Bhattacharya <suparna at in.ibm.com>
Acked-by: Jeff Moyer <jmoyer at redhat.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 6d544bb4d9019c3a0d7ee4af1e4bbbd61a6e16dc
Author: Zach Brown <zach.brown at oracle.com>
Date: Sun Dec 10 02:20:54 2006 -0800
[PATCH] dio: centralize completion in dio_complete()
There have been a lot of bugs recently due to the way direct_io_worker() tries
to decide how to finish direct IO operations. In the worst examples it has
failed to call aio_complete() at all (hang) or called it too many times
(oops).
This set of patches cleans up the completion phase with the goal of removing
the complexity that led to these bugs. We end up with one path that
calculates the result of the operation after all of the bios have completed.
We decide when to generate a result of the operation using that path based on
the final release of a refcount on the dio structure.
I tried to progress towards the final state in steps that were relatively easy
to understand. Each step should compile but I only tested the final result of
having all the patches applied.
I've tested these on low end PC drives with aio-stress, the direct IO tests I
could manage to get running in LTP, orasim, and some home-brew functional
tests.
In http://lkml.org/lkml/2006/9/21/103 IBM reports success with ext2 and ext3
running DIO LTP tests. They found that XFS bug which has since been addressed
in the patch series.
This patch:
The mechanics which decide the result of a direct IO operation were duplicated
in the sync and async paths.
The async path didn't check page_errors which can manifest as silently
returning success when the final pointer in an operation faults and its
matching file region is filled with zeros.
The sync path and async path differed in whether they passed errors to the
caller's dio->end_io operation. The async path was passing errors to it which
trips an assertion in XFS, though it is apparently harmless.
This centralizes the completion phase of dio ops in one place. AIO will now
return EFAULT consistently and all paths fall back to the previously sync
behaviour of passing the number of bytes 'transferred' to the dio->end_io
callback, regardless of errors.
dio_await_completion() doesn't have to propagate EIO from non-uptodate bios
now that it's being propagated through dio_complete() via dio->io_error. This
lets it return void which simplifies its sole caller.
Signed-off-by: Zach Brown <zach.brown at oracle.com>
Cc: Badari Pulavarty <pbadari at us.ibm.com>
Cc: Suparna Bhattacharya <suparna at in.ibm.com>
Acked-by: Jeff Moyer <jmoyer at redhat.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
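A sketch of what a single result calculation for both the sync and async
paths might look like. The function name dio_result() and the exact
precedence order (submission error, then page_errors, then io_error, then
bytes transferred) are assumptions made for illustration; the message only
states that page_errors and io_error are now consulted in one place.

```python
import errno

def dio_result(ret, page_errors, io_error, transferred):
    """Return the first non-zero error source, else the bytes transferred.

    ret:         error from the submission path (0 if none)
    page_errors: error from faulting user pages (0 if none)
    io_error:    error recorded by bio completion (0 if none)
    """
    for candidate in (ret, page_errors, io_error):
        if candidate != 0:
            return candidate
    return transferred
```

With one function used by every path, a faulting user pointer yields -EFAULT
for AIO as well instead of silently reporting success.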
commit 1757128438d41670ded8bc3bc735325cc07dc8f9
Author: NeilBrown <neilb at suse.de>
Date: Sun Dec 10 02:20:52 2006 -0800
[PATCH] md: assorted md and raid1 one-liners
Fix a few bugs that meant that:
- superblocks weren't always written at exactly the right time (this
could show up if the array was not written to - writing to the array
causes lots of superblock updates and so hides these errors).
- restarting device recovery after a clean shutdown (version-1 metadata
only) didn't work as intended (or at all).
1/ Ensure superblock is updated when a new device is added.
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
The body of this if takes one of two branches depending on whether
MD_RECOVERY_SYNC is set, so testing it in the clause of the if
is wrong.
3/ Flag superblock for updating after a resync/recovery finishes.
4/ If we find the need to restart a recovery in the middle (version-1
metadata only) make sure a full recovery (not just as guided by
bitmaps) does get done.
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit c2b00852fbae4f8c45c2651530ded3bd01bde814
Author: NeilBrown <neilb at suse.de>
Date: Sun Dec 10 02:20:51 2006 -0800
[PATCH] md: return a non-zero error to bi_end_io as appropriate in raid5
Currently raid5 depends on clearing the BIO_UPTODATE flag to signal an error
to higher levels. While this should be sufficient, it is safer to explicitly
set the error code as well - less room for confusion.
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit b8c6b645563d641df91fdcfd84a9c73c91d75b61
Author: NeilBrown <neilb at suse.de>
Date: Sun Dec 10 02:20:50 2006 -0800
[PATCH] md: remove some old ifdefed-out code from raid5.c
There are some vestiges of old code that was used for bypassing the stripe
cache on reads in raid5.c. This was never updated after the change from
buffer_heads to bios, but was left as a reminder.
That functionality has now been implemented in a completely different way, so
the old code can go.
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit fdee8ae4498c48b44c0eac592f9c6ed24c4517c1
Author: Jeff Garzik <jeff at garzik.org>
Date: Sun Dec 10 02:20:50 2006 -0800
[PATCH] MD: conditionalize some code
The autorun code is only used if this module is built into the static
kernel image. Adjust #ifdefs accordingly.
Signed-off-by: Jeff Garzik <jeff at garzik.org>
Acked-by: NeilBrown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit b875e531fc82db592d6093594593d5cafde0a1cd
Author: NeilBrown <neilb at suse.de>
Date: Sun Dec 10 02:20:49 2006 -0800
[PATCH] md: fix innocuous bug in raid6 stripe_to_pdidx
stripe_to_pdidx finds the index of the parity disk for a given stripe. It
assumes raid5 in that it uses "disks-1" to determine the number of data disks.
This is incorrect for raid6 but fortunately the two usages cancel each other
out. The only way that 'data_disks' affects the calculation of pd_idx in
raid5_compute_sector is when it is divided into the sector number. But as
that sector number is calculated by multiplying in the wrong value of
'data_disks' the division produces the right value.
So it is innocuous but needs to be fixed.
Also change the calculation of raid_disks in compute_blocknr to make it
more obviously correct (it seems at first to always use disks-1 too).
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
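The cancellation described in the stripe_to_pdidx message can be checked with
a little arithmetic. The helper names below are hypothetical and the chunk
size of 128 sectors is an arbitrary choice; the sketch only shows that
multiplying a stripe number by some data disk count and then dividing by the
same count recovers the stripe, whether or not that count is the right one
for the array.

```python
def sector_for_stripe(stripe, data_disks, sectors_per_chunk):
    """Sector handed to the lookup: the stripe scaled by the disk count."""
    return stripe * data_disks * sectors_per_chunk

def stripe_for_sector(sector, data_disks, sectors_per_chunk):
    """Recover the stripe by dividing the chunk number by the disk count."""
    chunk_number = sector // sectors_per_chunk
    return chunk_number // data_disks

def round_trip(stripe, data_disks, sectors_per_chunk=128):
    """Multiply in data_disks, then divide it back out."""
    sector = sector_for_stripe(stripe, data_disks, sectors_per_chunk)
    return stripe_for_sector(sector, data_disks, sectors_per_chunk)
```

For a six-disk raid6 array the correct data disk count is 4 while "disks-1"
gives 5; round_trip() returns the original stripe for both values, which is
why the bug had no visible effect.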
commit 5248861511d6aae4997a5aa7152824d87587b0b6
Author: Raz Ben-Jehuda(caro) <raziebe at gmail.com>
Date: Sun Dec 10 02:20:48 2006 -0800
[PATCH] md: enable bypassing cache for reads
Call the chunk_aligned_read where appropriate.
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 46031f9a38a9773021f1872abc713d62467ac22e
Author: Raz Ben-Jehuda(caro) <raziebe at gmail.com>
Date: Sun Dec 10 02:20:47 2006 -0800
[PATCH] md: allow reads that have bypassed the cache to be retried on failure
If a bypass-the-cache read fails, we simply try again through the cache. If
it fails again it will trigger normal recovery procedures.
update 1:
From: NeilBrown <neilb at suse.de>
1/
chunk_aligned_read and retry_aligned_read assume that
data_disks == raid_disks - 1
which is not true for raid6.
So when an aligned read request bypasses the cache, we can get the wrong data.
2/ The cloned bio is being used-after-free in raid5_align_endio
(to test BIO_UPTODATE).
3/ We forgot to add rdev->data_offset when submitting
a bio for aligned-read
4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
so we need to invalidate the segment counts.
5/ We don't de-reference the rdev when the read completes.
This means we need to record the rdev so it is still
available in the end_io routine. Fortunately
bi_next in the original bio is unused at this point so
we can stuff it in there.
6/ We leak a cloned bio if the target rdev is not usable.
From: NeilBrown <neilb at suse.de>
update 2:
1/ When aligned requests fail (read error) they need to be retried
via the normal method (stripe cache). As we cannot be sure that
we can process a single read in one go (we may not be able to
allocate all the stripes needed) we store a bio-being-retried
and a list of bios-that-still-need-to-be-retried.
When we find a bio that needs to be retried, we should add it to
the list, not to the single bio...
2/ We were never incrementing 'scnt' when resubmitting failed
aligned requests.
[akpm at osdl.org: build fix]
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit f679623f50545bc0577caf2d0f8675b61162f059
Author: Raz Ben-Jehuda(caro) <raziebe at gmail.com>
Date: Sun Dec 10 02:20:46 2006 -0800
[PATCH] md: handle bypassing the read cache (assuming nothing fails)
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 23032a0eb97c8eaae8ac9d17373b53b19d0f5413
Author: Raz Ben-Jehuda(caro) <raziebe at gmail.com>
Date: Sun Dec 10 02:20:45 2006 -0800
[PATCH] md: define raid5_mergeable_bvec
This will encourage read requests to be on only one device, so we will often be
able to bypass the cache for read requests.
Signed-off-by: Neil Brown <neilb at suse.de>
Cc: Jens Axboe <jens.axboe at oracle.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 0d4ca600fcf5c5f3a0c195ccf37e989b83451dd4
Author: NeilBrown <neilb at suse.de>
Date: Sun Dec 10 02:20:44 2006 -0800
[PATCH] md: tidy up device-change notification when an md array is stopped
An md array can be stopped leaving all the settings still in place, or it can
be torn down and destroyed. set_capacity and other change notifications only
happen in the latter case, but should happen in both.
Signed-off-by: Neil Brown <neilb at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit a3d899839064b6924c3d8a6404dae14c79f657fd
Author: Paul Mackerras <paulus at samba.org>
Date: Sun Dec 10 02:20:42 2006 -0800
[PATCH] Fbdev driver for IBM GXT4500P videocards
This is an fbdev driver for the IBM GXT4500P display card found in some IBM
System P (pSeries) machines. These cards have hardware 2D and 3D
capabilities, but the driver does not use them; it just exports a dumb
framebuffer.
Signed-off-by: Paul Mackerras <paulus at samba.org>
Acked-by: James Simmons <jsimmons at infradead.org>
Cc: "Antonino A. Daplas" <adaplas at pol.net>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit ee2f344b33b507af23610c8fdfdde38d7c10fb33
Author: Alan Cox <alan at lxorguk.ukuu.org.uk>
Date: Sun Dec 10 02:20:39 2006 -0800
[PATCH] ide-cd: Handle strange interrupt on the Intel ESB2
The ESB2 appears to emit spurious DMA interrupts when configured for native
mode and handling ATAPI devices. Stratus were able to pin this bug down and
produce a patch. This is a rework which applies the fixup only to the ESB2
(for now). We can apply it to other chips later if the same problem is found.
This code has been tested and confirmed to fix the problem on the tested
systems.
Signed-off-by: Alan Cox <alan at redhat.com>
(Most of the hard work done by Stratus however)
Cc: Jens Axboe <axboe at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 33859f7f9788da2ac9aa23be4dc8e948112809ca
Author: Miguel Ojeda Sandonis <maxextreme at gmail.com>
Date: Sun Dec 10 02:20:38 2006 -0800
[PATCH] kernel/sched.c: whitespace cleanups
[akpm at osdl.org: additional cleanups]
Signed-off-by: Miguel Ojeda Sandonis <maxextreme at gmail.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 62ab616d54371a65f595c199aad1e1755b837d25
Author: Chen, Kenneth W <kenneth.w.chen at intel.com>
Date: Sun Dec 10 02:20:36 2006 -0800
[PATCH] sched: optimize activate_task for RT task
An RT task does not participate in interactiveness priority and thus shouldn't
be bothered with timestamp and p->sleep_type manipulation when the task is
being put on the run queue. Bypass all of them with a single if (rt_task)
test.
Signed-off-by: Ken Chen <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 06066714f6016cffcb249f6ab21b7919de1bc859
Author: Chen, Kenneth W <kenneth.w.chen at intel.com>
Date: Sun Dec 10 02:20:35 2006 -0800
[PATCH] sched: remove lb_stopbalance counter
Remove scheduler stats lb_stopbalance counter. This counter can be
calculated by: lb_balanced - lb_nobusyg - lb_nobusyq. There is no need to
create a gazillion counters when we can derive the value.
Signed-off-by: Ken Chen <kenneth.w.chen at intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
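The derivation given in the lb_stopbalance message is a single subtraction. A
tiny sketch, with a hypothetical function name and made-up counter values
used purely as an example:

```python
def lb_stopbalance(lb_balanced, lb_nobusyg, lb_nobusyq):
    """Derive the removed counter from the three that remain."""
    return lb_balanced - lb_nobusyg - lb_nobusyq
```

For instance, with lb_balanced = 100, lb_nobusyg = 30 and lb_nobusyq = 20 the
derived value is 50.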
commit 783609c6cb4eaa23f2ac5c968a44483584ec133f
Author: Siddha, Suresh B <suresh.b.siddha at intel.com>
Date: Sun Dec 10 02:20:33 2006 -0800
[PATCH] sched: decrease number of load balances
Currently at a particular domain, each cpu in the sched group will do a
load balance at the frequency of balance_interval. The more cores and
threads there are, the more cpus will be in each sched group at the SMP and
NUMA domains, and we end up spending quite a bit of time doing load
balancing in those domains.
Fix this by making only one cpu (the first idle cpu, or the first cpu in the
group if all the cpus are busy) in the sched group do the load balance at
that particular sched domain. This load will slowly percolate down to the
other cpus within that group (when they do load balancing at lower
domains).
Signed-off-by: Suresh Siddha <suresh.b.siddha at intel.com>
Cc: Christoph Lameter <clameter at engr.sgi.com>
Cc: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
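The selection rule in the message above (first idle cpu, else first cpu in
the group) can be sketched like this. balancer_cpu() and should_balance() are
hypothetical helpers working on plain lists and sets, not on kernel cpumasks.

```python
def balancer_cpu(group_cpus, idle):
    """Pick the one cpu in the group that balances at this domain.

    group_cpus: ordered list of cpu ids in the sched group
    idle:       set of cpu ids that are currently idle
    """
    for cpu in group_cpus:
        if cpu in idle:
            return cpu          # first idle cpu wins
    return group_cpus[0]        # all busy: fall back to the first cpu

def should_balance(this_cpu, group_cpus, idle):
    """True only for the cpu chosen to balance on behalf of the group."""
    return this_cpu == balancer_cpu(group_cpus, idle)
```

Every other cpu in the group skips the balance at that domain and picks up
load later when it balances at lower domains.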
commit b18ec80396834497933d77b81ec0918519f4e2a7
Author: Mike Galbraith <efault at gmx.de>
Date: Sun Dec 10 02:20:31 2006 -0800
[PATCH] sched: improve migration accuracy
Co-opt rq->timestamp_last_tick to maintain a cache_hot_time evaluation
reference timestamp at both tick and sched times to prevent said reference,
formerly rq->timestamp_last_tick, from being behind task->last_ran at
evaluation time, and to move said reference closer to current time on the
remote processor, intent being to improve cache hot evaluation and
timestamp adjustment accuracy for task migration.
Fix minor sched_time double accounting error which occurs when a task
passing through schedule() does not schedule off, and takes the next timer
tick.
[kenneth.w.chen at intel.com: cleanup]
Signed-off-by: Mike Galbraith <efault at gmx.de>
Acked-by: Ingo Molnar <mingo at elte.hu>
Acked-by: Ken Chen <kenneth.w.chen at intel.com>
Cc: Don Mullis <dwm at meer.net>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 08c183f31bdbb709f177f6d3110d5f288ea33933
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:29 2006 -0800
[PATCH] sched: add option to serialize load balancing
Large sched domains can be very expensive to scan. Add an option SD_SERIALIZE
to the sched domain flags. If that flag is set then we make sure that no
other such domain is being balanced.
[akpm at osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
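One way to picture the SD_SERIALIZE behaviour is a try-lock around the
balancing step. This is a sketch under stated assumptions: the flag value,
the global lock and try_balance() are all hypothetical, and the real patch
may serialize differently. The idea shown is only that a domain carrying the
flag skips its balance when another flagged domain is already being balanced.

```python
import threading

SD_SERIALIZE = 0x1                 # hypothetical flag value for this sketch
_balancing = threading.Lock()      # held while a serialized domain balances

def try_balance(domain_flags, do_balance):
    """Run do_balance() unless serialization says to skip.

    Returns True if the balance ran, False if it was skipped because
    another serialized domain was being balanced.
    """
    serialize = bool(domain_flags & SD_SERIALIZE)
    if serialize and not _balancing.acquire(blocking=False):
        return False               # someone else is balancing: skip
    try:
        do_balance()
        return True
    finally:
        if serialize:
            _balancing.release()
```

Domains without the flag never touch the lock, so small domains keep
balancing concurrently.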
commit 1bd77f2da58e9cdd1f159217887343dadd9af417
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:27 2006 -0800
[PATCH] sched: call tasklet less frequently
Trigger softirq less frequently
Before this patch we trigger the softirq using an offset of sd->interval.
However, if the queue is busy then it is sufficient to schedule the softirq
with sd->interval * busy_factor.
So we modify the calculation of the next time to balance by taking
the interval added to last_balance again. This is only the
right value if the idle/busy situation continues as is.
There are two potential trouble spots:
- If the queue was idle and now gets busy then we call rebalance
early. However, that is not a problem because we will then use
the longer interval for the next period.
- If the queue was busy and becomes idle then we potentially
wait too long before rebalancing. However, when the task
goes idle then idle_balance is called. We add another calculation
of the next balance time based on sd->interval in idle_balance
so that we will rebalance soon.
V2->V3:
- Calculate rebalance time based on current jiffies and not
based on the jiffies at the last time we load balanced.
We no longer rely on staggering and therefore we can
afford to do this now.
V3->V4:
- Use functions to do jiffy comparisons.
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
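A sketch of the next-balance calculation and of a wraparound-safe jiffy
comparison, as described in the message above. The 32-bit counter width and
the names next_balance() and time_after_eq() as written here are assumptions
of this sketch; the comparison imitates the usual signed-difference trick
rather than quoting kernel code.

```python
JIFFIES_BITS = 32                      # assumed width of the tick counter
MASK = (1 << JIFFIES_BITS) - 1

def time_after_eq(a, b):
    """True if tick a is at or after tick b, tolerating counter wraparound."""
    return ((a - b) & MASK) < (1 << (JIFFIES_BITS - 1))

def next_balance(last_balance, interval, busy, busy_factor):
    """Next tick at which to balance: a longer step while the queue is busy."""
    step = interval * busy_factor if busy else interval
    return (last_balance + step) & MASK
```

Comparing through time_after_eq() rather than with a plain >= keeps the
decision correct when the counter wraps between last_balance and now.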
commit c9819f4593e8d052b41a89f47140f5c5e7e30582
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:25 2006 -0800
[PATCH] sched: use softirq for load balancing
Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
softirq.
We calculate the earliest time for each layer of sched domains to be rescanned
(this is the rescan time for idle) and use the earliest of those to schedule
the softirq via a new field "next_balance" added to struct rq.
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
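The "earliest of those" rule for scheduling the softirq can be sketched as a
minimum over the domain layers. The helper names and the tuple layout are
hypothetical, and counter wraparound is deliberately ignored to keep the
example short.

```python
def earliest_next_balance(domains):
    """Earliest rescan time over all sched domain layers.

    domains: iterable of (last_balance, interval) pairs, one per layer.
    Wraparound of the tick counter is ignored in this sketch.
    """
    return min(last + interval for last, interval in domains)

def softirq_due(now, next_balance):
    """True once the current tick has reached the stored next_balance."""
    return now >= next_balance
```

The result would be stored in the per-runqueue next_balance field, so the
tick only has to compare one value before raising the softirq.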
commit e418e1c2bf1a253916b569370653414eb28597b6
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:23 2006 -0800
[PATCH] sched: move idle status calculation into rebalance_tick()
Perform the idle state determination in rebalance_tick.
If we separate balancing from sched_tick then we also need to determine the
idle state in rebalance_tick.
V2->V3
Remove useless idle != 0 check. Checking nr_running seems
to be sufficient. Thanks Suresh.
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 7835b98bc6de2ca10afa45572d272304b000b048
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:22 2006 -0800
[PATCH] sched: extract load calculation from rebalance_tick
A load calculation is always done in rebalance_tick() in addition to the real
load balancing activities that only take place when certain jiffy counts have
been reached. Move that processing into a separate function and call it
directly from scheduler_tick().
Also extract the time slice handling from scheduler_tick and put it into a
separate function. Then we can clean up scheduler_tick significantly. It
will no longer have any gotos.
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit fe2eea3fafb3df2f5b8a55a48bcbb0d23b3b5618
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:21 2006 -0800
[PATCH] sched: disable interrupts for locking in load_balance()
Interrupts must be disabled for run queue locks if we want to run
load_balance() with interrupts enabled.
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 4211a9a2e94a34df8c02bc39b7ec10678ad5c2ab
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:19 2006 -0800
[PATCH] sched: remove staggering of load balancing
Timer interrupts already are staggered. We do not need an additional layer of
time staggering for short load balancing actions that take a reasonably small
portion of the time slice.
For load balancing on large sched_domains we will add a serialization later
that avoids concurrent load balance operations and thus has the same effect as
load staggering.
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 571f6d2fb0b1c04798df783db2ba85e96bcce43d
Author: Christoph Lameter <clameter at sgi.com>
Date: Sun Dec 10 02:20:13 2006 -0800
[PATCH] sched: avoid taking rq lock in wake_priority_sleeper
Avoid taking the run queue lock in wake_priority_sleeper if there are no
running processes.
Signed-off-by: Christoph Lameter <clameter at sgi.com>
Cc: Peter Williams <pwil3058 at bigpond.net.au>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Cc: Christoph Lameter <clameter at sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha at intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen at intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit ac7d550499e225efb51a53d0b00667f26b93bdff
Author: Siddha, Suresh B <suresh.b.siddha at intel.com>
Date: Sun Dec 10 02:20:12 2006 -0800
[PATCH] sched domain: increase the SMT busy rebalance interval
With SMT, if the logical processor is busy, load balance happens every
8msec(min)-16msec(max). There is no need to do this so often, as this is just
for fairness (to maintain uniform runqueue lengths) and the default time slice
is 100msec anyhow.
The appended patch increases this interval to 64msec(min)-128msec(max) when the
logical processor is busy.
Signed-off-by: Suresh Siddha <suresh.b.siddha at intel.com>
Cc: Nick Piggin <nickpiggin at yahoo.com.au>
Acked-by: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 054b9108e01ef27e2e6b32b4226abb6024626f06
Author: Kirill Korotaev <dev at openvz.org>
Date: Sun Dec 10 02:20:11 2006 -0800
[PATCH] move_task_off_dead_cpu() should be called with disabled ints
move_task_off_dead_cpu() requires interrupts to be disabled, while
migrate_dead() calls it with interrupts enabled. Added appropriate
comments to the functions and added BUG_ON(!irqs_disabled()) to
double_rq_lock() and double_lock_balance(), which are the original sources
of such bugs.
Signed-off-by: Kirill Korotaev <dev at openvz.org>
Acked-by: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 6711cab43ed5e60bf51e3dbbce6395e87d4e9805
Author: Siddha, Suresh B <suresh.b.siddha at intel.com>
Date: Sun Dec 10 02:20:07 2006 -0800
[PATCH] sched domain: move sched group allocations to percpu area
Move the sched group allocations to the percpu area. This minimizes
cross-node memory references and also cleans up the sched group allocation
for the allnodes sched domain.
Signed-off-by: Suresh Siddha <suresh.b.siddha at intel.com>
Acked-by: Ingo Molnar <mingo at elte.hu>
Acked-by: Christoph Lameter <clameter at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit cc2a73b5caf065f8612fcb5df5bd2f5e25881d99
Author: Robert P. J. Day <rpjday at mindspring.com>
Date: Sun Dec 10 02:20:00 2006 -0800
[PATCH] sched.c: correct comment for this_rq_lock()
Acked-by: Ingo Molnar <mingo at elte.hu>
Signed-off-by: Robert P. J. Day <rpjday at mindspring.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 596afa41b21a414e523936b27100456f0f85e411
Author: Ralf Baechle <ralf at linux-mips.org>
Date: Sun Dec 10 02:19:58 2006 -0800
[PATCH] Don't build some broken ISDN drivers on big endian MIPS
Signed-off-by: Ralf Baechle <ralf at linux-mips.org>
Cc: Karsten Keil <kkeil at suse.de>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit cf709844d8a8fa21c59772d1a069ae0efa15e981
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:56 2006 -0800
[PATCH] io-accounting: add to getdelays
Wire up the IO accounting into getdelays.c.
Usage:
To display I/O stats for each exiting task:
vmm:/home/akpm> ./getdelays -m0,1,2,3 -i -l
cpumask 0 maskset 1
printing IO accounting
listen forever
rm: read=8192, write=0, cancelled_write=0
cvs: read=733184, write=4255744, cancelled_write=4096
make: read=217088, write=0, cancelled_write=0
cc1: read=4263936, write=12288, cancelled_write=0
as: read=811008, write=8192, cancelled_write=0
gcc: read=323584, write=0, cancelled_write=12288
cc1: read=0, write=8192, cancelled_write=0
as: read=4096, write=4096, cancelled_write=0
gcc: read=16384, write=0, cancelled_write=4096
as: read=4096, write=4096, cancelled_write=0
gcc: read=16384, write=0, cancelled_write=8192
ld: read=1011712, write=16384, cancelled_write=0
collect2: read=626688, write=0, cancelled_write=0
gcc: read=204800, write=0, cancelled_write=0
cc1: read=0, write=8192, cancelled_write=0
as: read=4096, write=4096, cancelled_write=0
gcc: read=16384, write=0, cancelled_write=8192
ld: read=8192, write=16384, cancelled_write=0
collect2: read=49152, write=0, cancelled_write=0
gcc: read=0, write=0, cancelled_write=0
cc1: read=0, write=4096, cancelled_write=0
ld: read=4096, write=12288, cancelled_write=0
collect2: read=49152, write=0, cancelled_write=0
gcc: read=0, write=0, cancelled_write=0
To display I/O stats for a particular presently-running task:
vmm:/home/akpm> ./getdelays -i -p $(pidof crond)
printing IO accounting
crond: read=61440, write=0, cancelled_write=0
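The per-task lines printed above can be totalled per command with a short
script. This is only a sketch: it assumes the "comm: read=N, write=N,
cancelled_write=N" layout shown in the sample output, and the names LINE_RE
and summarize() are made up for illustration and are not part of getdelays.

```python
import re
from collections import defaultdict

# Matches lines such as "cvs: read=733184, write=4255744, cancelled_write=4096".
# The layout is assumed from the sample output in the commit message.
LINE_RE = re.compile(
    r"^(?P<comm>[^:]+): read=(?P<read>\d+), write=(?P<write>\d+), "
    r"cancelled_write=(?P<cancelled>\d+)$"
)


def summarize(lines):
    """Sum read/write/cancelled_write bytes per command name."""
    totals = defaultdict(lambda: {"read": 0, "write": 0, "cancelled_write": 0})
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip banner lines such as "listen forever"
        t = totals[m.group("comm")]
        t["read"] += int(m.group("read"))
        t["write"] += int(m.group("write"))
        t["cancelled_write"] += int(m.group("cancelled"))
    return dict(totals)


# A few lines copied from the sample output above.
sample = [
    "printing IO accounting",
    "rm: read=8192, write=0, cancelled_write=0",
    "cvs: read=733184, write=4255744, cancelled_write=4096",
    "as: read=811008, write=8192, cancelled_write=0",
    "as: read=4096, write=4096, cancelled_write=0",
]

for comm, t in sorted(summarize(sample).items()):
    print(comm, t["read"], t["write"], t["cancelled_write"])
```

Totalling by command name merges unrelated invocations of the same program,
which is probably fine for a quick look at a build but not for billing.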
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit d2f7bf13461e8ead863126ee1e8ba92105959ecc
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:55 2006 -0800
[PATCH] getdelays: various fixes
- Various cleanups
- Report errors to stderr, not stdout
- A printf was missing a \n and was hiding from me.
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 4a7864ca638e0a38307962ee8ef122822a351b65
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:53 2006 -0800
[PATCH] io-accounting: via taskstats
Deliver IO accounting via taskstats.
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit f2f1f8a3b86ccc5e998dc70a3ba35af199fdbc58
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:50 2006 -0800
[PATCH] cleanup taskstats.h
Fix weird whitespace mangling in taskstats.h
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit aba76fdb8a5fefba73d3490563bf7c4da37b1a34
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:48 2006 -0800
[PATCH] io-accounting: report in procfs
Add a simple /proc/pid/io to show the IO accounting fields.
Maybe this shouldn't be merged in mainline - the preferred reporting channel
is taskstats. But given the poor state of our userspace support for
taskstats, this is useful for developer-testing, at least. And it improves
the chances that the procps developers will wire it up into top(1). Opinions
are sought.
The patch also wires up the existing IO-accounting fields.
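For developer testing, the file can be read with a few lines of userspace
code. The sketch below assumes one "name: value" pair per line and the field
names read_bytes, write_bytes and cancelled_write_bytes; neither the layout
nor the names are spelled out in this commit message, so treat both as
assumptions. parse_proc_io() and read_proc_io() are hypothetical helpers.

```python
def parse_proc_io(text):
    """Parse 'name: value' lines into a dict of integers."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines without a colon
        fields[name.strip()] = int(value.strip())
    return fields


def read_proc_io(pid="self"):
    """Read and parse /proc/<pid>/io (needs a kernel with IO accounting)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


# Made-up sample in the assumed format; not captured from a real run.
SAMPLE = """read_bytes: 61440
write_bytes: 0
cancelled_write_bytes: 0
"""

print(parse_proc_io(SAMPLE))
```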
It's a bit racy on 32-bit machines: if process A reads process B's
/proc/pid/io while process B is updating one of those 64-bit counters, process
A could see an intermediate result.
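The race can be pictured with a toy model in which a 64-bit counter is kept
as two 32-bit words and the reader's two loads straddle the writer's update.
This is a deterministic illustration of one bad interleaving, not kernel
code; SplitCounter and torn_read() are invented for the example.

```python
MASK32 = 0xFFFFFFFF


class SplitCounter:
    """A 64-bit counter stored as two 32-bit words, as on a 32-bit machine."""

    def __init__(self, value=0):
        self.lo = value & MASK32
        self.hi = (value >> 32) & MASK32

    def value(self):
        return (self.hi << 32) | self.lo


def torn_read(counter, new_value):
    """Reader loads the low word, the writer stores new_value, then the
    reader loads the high word."""
    lo = counter.lo                            # reader: first 32-bit load
    counter.lo = new_value & MASK32            # writer: store low word
    counter.hi = (new_value >> 32) & MASK32    # writer: store high word
    hi = counter.hi                            # reader: second 32-bit load
    return (hi << 32) | lo


c = SplitCounter(0xFFFFFFFF)       # old value: just below 4GB
seen = torn_read(c, 0x100000000)   # writer bumps the counter to exactly 4GB
print(hex(seen), hex(c.value()))   # the reader's value is neither old nor new
```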
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 98c4d57decf97bf8ddfe948a3266aa56b38b1a51
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:47 2006 -0800
[PATCH] io-accounting: direct-io
Account for direct-io.
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 6f88cc2e9c29c181557b477ee396375906acbc90
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:44 2006 -0800
[PATCH] io-accounting-read-accounting cifs fix
CIFS implements ->readpages and doesn't use read_cache_pages(). So wire the
read IO accounting up within CIFS.
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: Steven French <sfrench at us.ibm.com>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 8bde37f08fe3340435f4320b5a092eeb55acebfd
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:40 2006 -0800
[PATCH] io-accounting-read-accounting nfs fix
nfs's ->readpages uses read_cache_pages(). Wire it up there.
[wfg at mail.ustc.edu.cn: account only successful nfs/fuse reads]
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Fengguang Wu <wfg at mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit faccbd4b26df7bd977cee33d4145155d0ef95c87
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:35 2006 -0800
[PATCH] io-accounting: read accounting
Wire up read accounting for block devices, within submit_bio().
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit e08748ce01e02f0ec154b141f392ccb9555333f4
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:31 2006 -0800
[PATCH] io-accounting: write-cancel accounting
Account for the number of bytes of writeout which this process caused to not
happen after all.
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 55e829af06681e5d731c03ba04febbd1c76ca293
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:27 2006 -0800
[PATCH] io-accounting: write accounting
Accounting writes is fairly simple: whenever a process flips a page from clean
to dirty, we accuse it of having caused a write to underlying storage of
PAGE_CACHE_SIZE bytes.
This may overestimate the amount of writing: the page-dirtying may cause only
one buffer_head's worth of writeout. Fixing that is possible, but probably a
bit messy and isn't obviously important.
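As a rough userspace model of the rule (not the kernel implementation): keep
a set of dirty page indexes and charge the writer one full page each time a
page goes from clean to dirty. PAGE_SIZE = 4096 is an assumed value standing
in for PAGE_CACHE_SIZE, and the Task and PageCache classes are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; stands in for PAGE_CACHE_SIZE


class Task:
    def __init__(self):
        self.write_bytes = 0


class PageCache:
    """Toy page cache that only tracks which page indexes are dirty."""

    def __init__(self):
        self.dirty = set()

    def write(self, task, offset, length):
        """Dirty every page touched by [offset, offset + length) and charge
        the task a full page for each clean-to-dirty transition."""
        if length <= 0:
            return
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for index in range(first, last + 1):
            if index not in self.dirty:
                self.dirty.add(index)
                task.write_bytes += PAGE_SIZE


cache = PageCache()
task = Task()
cache.write(task, 0, 512)     # dirties page 0: charged a full page
cache.write(task, 512, 512)   # page 0 is already dirty: no further charge
cache.write(task, 4000, 200)  # spans pages 0 and 1: only page 1 is new
print(task.write_bytes)
```

The first call shows the overestimate: 512 bytes written, a whole page
charged.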
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 8c08540f8755c451d8b96ea14cfe796bc3cd712d
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:24 2006 -0800
[PATCH] clean up __set_page_dirty_nobuffers()
Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
and a few other places. No functional changes.
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 7c3ab7381e79dfc7db14a67c6f4f3285664e1ec2
Author: Andrew Morton <akpm at osdl.org>
Date: Sun Dec 10 02:19:19 2006 -0800
[PATCH] io-accounting: core statistics
The present per-task IO accounting isn't very useful. It simply counts the
number of bytes passed into read() and write(). So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.
(David Wright had some comments on the applicability of the present logical IO accounting:
For billing purposes it is useless but for workload analysis it is very
useful
read_bytes/read_calls      average read request size
write_bytes/write_calls    average write request size
read_bytes/read_blocks     ie logical/physical; can indicate hit rate or thrashing
write_bytes/write_blocks   ie logical/physical; a guess, since pdflush writes can
                           be missed
I often look for logical larger than physical to see filesystem cache
problems. And the bytes/cpusec can help find applications that are
dominating the cache and causing slow interactive response from page cache
contention.
I want to find the IO intensive applications and make sure they are doing
efficient IO. Thus the acctcms(sysV) or csacms command would give the high
IO commands).
This patchset adds new accounting which tries to be more accurate. We account
for three things:
reads:
attempt to count the number of bytes which this process really did cause
to be fetched from the storage layer. Done at the submit_bio() level, so it
is accurate for block-backed filesystems. I also attempt to wire up NFS and
CIFS.
writes:
attempt to count the number of bytes which this process caused to be sent
to the storage layer. This is done at page-dirtying time.
The big inaccuracy here is truncate. If a process writes 1MB to a file
and then deletes the file, it will in fact perform no writeout. But it will
have been accounted as having caused 1MB of write.
So...
cancelled_writes:
account the number of bytes which this process caused to not happen, by
truncating pagecache.
We _could_ just subtract this from the process's `write' accounting. But
that means that some processes would be reported to have done negative
amounts of write IO, which is silly.
So we just report the raw number and punt this decision up to userspace.
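One policy userspace might pick, sketched here purely as an illustration, is
to subtract the cancelled bytes and clamp the result at zero. The function
name net_write_bytes() is invented; the input numbers are taken from the cvs
and gcc lines of the getdelays sample output in the "add to getdelays"
commit.

```python
def net_write_bytes(write_bytes, cancelled_write_bytes):
    """Subtract cancelled writes from accounted writes, clamped at zero so a
    task that truncated someone else's dirty pages never goes negative."""
    return max(0, write_bytes - cancelled_write_bytes)


# cvs: write=4255744, cancelled_write=4096
print(net_write_bytes(4255744, 4096))
# gcc: write=0, cancelled_write=12288 (would be -12288 without the clamp)
print(net_write_bytes(0, 12288))
```

Clamping hides the fact that the task cancelled more than it wrote, so a tool
that cares about that should keep reporting the raw counter as well.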
Now, we _could_ account for writes at the physical I/O level. But
- This would require that we track memory-dirtying tasks at the per-page
level (would require a new pointer in struct page).
- It would mean that IO statistics for a process are usually only available
long after that process has exited. Which means that we probably cannot
communicate this info via taskstats.
This patch:
Wire up the kernel-private data structures and the accessor functions to
manipulate them.
Cc: Jay Lan <jlan at sgi.com>
Cc: Shailabh Nagar <nagar at watson.ibm.com>
Cc: Balbir Singh <balbir at in.ibm.com>
Cc: Chris Sturtivant <csturtiv at sgi.com>
Cc: Tony Ernst <tee at sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin at bull.net>
Cc: David Wright <daw at sgi.com>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 47694bb86af3648d4ec34c7afd46653cefc9b359
Author: Sergei Shtylyov <sshtylyov at ru.mvista.com>
Date: Sun Dec 10 02:19:13 2006 -0800
[PATCH] pdc202xx_new: fix PLL/timing issues
Fix the CRC errors in the higher UltraDMA modes with the Promise PDC20268
and newer chips that always occur on non-x86 machines and when there are
more than 2 adapters on x86 machines. Fix the overclocking issue for
PDC20269 and newer chips that occurs when an UltraDMA/133 capable drive is
connected. Here's the summary of changes:
- add code to detect the PLL input clock and set up its output clock,
remove the PowerMac hacks;
- replace the macros accessing the indexed registers with functions, switch to
using them where appropriate, gather the PIO/MWDMA/UDMA timings into tables;
- rewrite the speedproc() handler to set the drive's transfer mode first, and
then override the timing registers set by hardware on UltraDMA/133 chips;
- use better criterion for determining higher UltraDMA modes, and add comment
concerning the doubtful value of the code enabling IORDY/prefetch;
- replace the stupid 'pdcnew_new_' prefixes with mere 'pdcnew_';
- get rid of unneeded spaces, parens and type casts, clean up some printk's,
add some new lines here and there...
This work is loosely based on these former patches by Albert Lee:
[1] http://marc.theaimsgroup.com/?l=linux-ide&m=110992442032300
[2] http://marc.theaimsgroup.com/?l=linux-ide&m=110992457729382
[3] http://marc.theaimsgroup.com/?l=linux-ide&m=110992474205555
[4] http://marc.theaimsgroup.com/?l=linux-ide&m=111019224802939
Some PLL clock detection code was backported from his pata_pdc2027x driver...
This code has been successfully tested by me on PDC2026[89] chips.
I tried to keep this rework as several patches but it made no sense: [2] was
largely a modification of the non-working timing override code, [3] by itself
extended the overclocking issue to the case of non-UltraDMA/133 drives, and
finally, the cleanup patch based on [1] ended up rejected...
Signed-off-by: Sergei Shtylyov <sshtylyov at ru.mvista.com>
Cc: Albert Lee <albertcc at tw.ibm.com>
Acked-by: Alan Cox <alan at lxorguk.ukuu.org.uk>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz at elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm at osdl.org>
Signed-off-by: Linus Torvalds <torvalds at osdl.org>
commit 58f64d83c37f5073a01573d27043c9c0ccc764f1
Author: David Woodhouse <dwmw2 at infradead.org>
Date: Sun Dec 10 02:19:11 2006 -0800
[PATCH] Fix noise in futex.h
There are some kernel-only bits in the middle of <linux/futex.h> which
should be removed in what we export to userspace.
Signed-off-by: David Woodhouse <dwmw2 at infradead.org>
Signed-of