hostap_cs causes kernel oops on 2.6.26 with senao nl-2511cd

Bug #254837 reported by Dan Taylor
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Fix Released | Medium | Tim Gardner |
  Jaunty | Fix Released | Medium | Colin Ian King |
  Karmic | Fix Released | Medium | Tim Gardner |
linux-ubuntu-modules-2.6.24 (Ubuntu) | Invalid | Undecided | Unassigned |
  Jaunty | Invalid | Undecided | Unassigned |
  Karmic | Invalid | Undecided | Unassigned |

Bug Description

SRU Justification:

Impact: Booting with the Senao NL-2511CD (PRISM II compatible) wireless card can generate a kernel oops in the hostap interrupt handler.

Spurious shared interrupts or early probing interrupts can cause the hostap interrupt handler to oops before the driver has fully configured the IO base port addresses. In some cases the oops occurs because the hardware shares an interrupt line; in other cases it is due to a race condition between probing for the hardware and configuring the IO base port. The latter occurs because probing is required to determine the hardware port address, which is only determined when the probe can interrupt the hardware (a catch-22).

Fix: This patch catches this not-yet-configured condition in the interrupt handler to avoid the oops.
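As a rough illustration of the guard described in the Fix line, here is a user-space C model. It is a sketch under stated assumptions, not the actual SAUCE patch: `struct sketch_dev`, its fields, `fake_read_status` and `sketch_interrupt` are hypothetical stand-ins, and the `SKETCH_IRQ_*` enum only mimics the kernel's `IRQ_NONE`/`IRQ_HANDLED` return values. The idea is that the handler checks whether the IO base has been assigned before it touches any hardware state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
typedef enum { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 } sketch_irqreturn_t;

/* Hypothetical, heavily simplified device state (not the real hostap structs). */
struct sketch_dev {
    unsigned long base_addr;                         /* 0 until probing assigns the IO port */
    unsigned int (*read_status)(unsigned long base); /* NULL until the device is configured */
    unsigned int handled_count;                      /* events actually serviced */
    unsigned int early_count;                        /* interrupts seen before configuration */
};

/* Simulated hardware: reports one pending event whenever base is non-zero. */
static unsigned int fake_read_status(unsigned long base)
{
    return base != 0 ? 1u : 0u;
}

static sketch_irqreturn_t sketch_interrupt(int irq, void *dev_id)
{
    struct sketch_dev *dev = dev_id;

    /* Guard: an interrupt arriving before the IO base is set up (an early
     * probe interrupt, or another device on a shared line) must not touch
     * the hardware or call through an unset pointer. */
    if (dev->base_addr == 0 || dev->read_status == NULL) {
        dev->early_count++;
        /* Stands in for a rate-limited printk in a real driver. */
        fprintf(stderr, "irq %d: interrupt before device configured, ignoring\n", irq);
        return SKETCH_IRQ_NONE;
    }

    if (dev->read_status(dev->base_addr) == 0)
        return SKETCH_IRQ_NONE; /* nothing pending: not our interrupt */

    dev->handled_count++;
    return SKETCH_IRQ_HANDLED;
}
```

Whether an early interrupt should be reported as `IRQ_NONE` or `IRQ_HANDLED` is a design choice; this sketch reports `IRQ_NONE` so that, on a shared line, the interrupt is left to whichever device actually raised it.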

Testcase: Without the patch a kernel oops occurs on boot when the card is installed. With the patch, there is no kernel oops and the wireless card works.

---

Binary package hint: linux-image-2.6.26-5-generic

Hello,
I have a Senao NL-2511CD Plus Ext2 on a Dell Inspiron 4150 notebook running Ubuntu Intrepid (development release) with the latest kernel 2.6.26-5. I have observed this behavior in kernel releases since Gutsy (when I started using Linux on this laptop with this card).

Whenever the card is inserted at boot, a kernel panic occurs in the hostap_cs driver (see screenshot).
If I insert the card once all the modules are loaded, it detects, works and acts normally with no errors in dmesg.

I am aware of the existing bug where the orinoco drivers are loaded along with the hostap drivers when detecting these cards, and I have blacklisted the orinoco drivers. Regardless, this happens with or without the orinoco drivers blacklisted.

Here is the output from hostap_diag:

NICID: id=0x800c v1.0.0 (PRISM II (2.5) PCMCIA (SST parallel flash))
PRIID: id=0x0015 v1.1.1
STAID: id=0x001f v1.8.2 (station firmware)

I had the stock firmware on the card and flashed it to see if it made a difference, and it did not.

What REALLY threw me for a loop was that when I tried to get console output via serial to post the debugging output of this crash (booting with the Linux option console=ttyS0,9600,8,n,1), it detected the card and did not crash while booting! But I got a framebuffer driver working nicely and managed to snap a picture.

Revision history for this message
Dan Taylor (slash) wrote :
Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Colin Ian King (colin-king) wrote :

Hi,

Can you boot the machine and then insert the card and attach the following details:

dmesg > dmesg.log

cat /proc/interrupts > interrupts.log

Meanwhile, I will see if I can get some debug put into a kernel image for you to test.

Colin

Changed in linux:
assignee: ubuntu-kernel-team → colin-king
status: Triaged → In Progress
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks for the information.

I have put a couple of kernel image deb packages into:
http://people.ubuntu.com/~cking/debug-254837-1/

If you can install the appropriate deb using sudo dpkg -i and reboot with the device attached and send me the console output again, it may help me corner the bug.

Colin

Revision history for this message
Dan Taylor (slash) wrote :

Here's the new output as requested. Note that, as with the previous screenshot, I had to add radeonfb to the initramfs and boot with video=radeonfb in order to get the crash output to fit on the screen.

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Dan,

Thanks for the feedback. One more thing, can you add to your boot line the following:

ignore_loglevel

and redo the test and attach the screenshot - and if there is any easy way of getting more lines on the console by changing font size or screen resolution it will be helpful - I may get some more details that may help corner this.

Thanks. Colin

Revision history for this message
Dan Taylor (slash) wrote :

Here's the new screenshot. The Radeon framebuffer automatically sizes to the maximum resolution it supports, so unfortunately I can't get any more information on the screen than what is there. If only it would crash when I change the console output to serial, this would be a lot less tedious :)

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi again, sorry to take so long to get back to you.

I have put another kernel image deb package, this time into:

http://people.ubuntu.com/~cking/debug-254837-2/

I've reduced the stack trace and added some more debug - hopefully we can get a capture of some of my debug.

Please can you install the deb using sudo dpkg -i, reboot with the device attached, and send me the console output again.

Also, with the device removed, can you attach the output of the following commands:

sudo lspci -vvnn
sudo lspci -vn

Thank you.

Colin

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :

I'd also like to note that the issue still remains in kernel 2.6.27

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi again,

I have put another kernel image deb package, this time into:

http://people.ubuntu.com/~cking/debug-254837-3

I hope this will give me a little more information on why the interrupt handler is jumping to address 0x00000000.

If you can try this out and attach the screen shot, that would be helpful. Thanks! Colin

Revision history for this message
Dan Taylor (slash) wrote :

Sorry for the belated response. Here's the screenshot with the new kernel image, as requested.

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi there,

can you add "lapic" to the kernel boot line and re-try.

Thanks.

Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

Hi again,

I have put another kernel image deb package, this time into:

http://people.ubuntu.com/~cking/debug-254837-4

This has some more sanity checking to see why the interrupt handler is jumping to a zero address, and also some checks on the interrupt vector to see if they are getting corrupted during the kernel init stages.

Thanks for testing this and your patience.

Colin

Revision history for this message
Dan Taylor (slash) wrote :

There was a lot of information coming across the screen, so I managed to snap 3 photos and got as much as I could.
(kernel was booted with lapic as before)

Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks for this information - it was a lot of help for the next round of debugging.

I have put another kernel image deb package, this time into:

http://people.ubuntu.com/~cking/debug-254837-5

The debug will cause a kernel panic when it detects the zeroing of the interrupt vectors from 16 upwards; hopefully the final screenshot is all I need. I've put more debug into the yenta socket code to try and get some more information about where this bug is occurring.
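The check described above (watching for handler-table entries from vector 16 upwards being zeroed) can be sketched in user-space C as a snapshot comparison. This is a hypothetical model, not the code in the debug kernel, which was not posted in this report: the table size of 32, the `sketch_*` names, and the use of plain function pointers as "vectors" are all assumptions, and where the debug kernel panics, the sketch simply returns the offending vector number.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_VECTORS 32    /* hypothetical table size for the model */
#define SKETCH_FIRST_CHECKED 16 /* only vectors 16 and upwards are checked */

typedef void (*sketch_handler_t)(void);

/* A do-nothing handler used to populate the model table. */
static void sketch_dummy_handler(void)
{
}

/* Fill every slot of a table with the same handler. */
static void sketch_fill(sketch_handler_t table[SKETCH_NR_VECTORS], sketch_handler_t h)
{
    for (int v = 0; v < SKETCH_NR_VECTORS; v++)
        table[v] = h;
}

/* Compare a snapshot taken early in boot (`before`) with the live table
 * (`now`). Returns the first vector >= SKETCH_FIRST_CHECKED whose handler
 * was set in the snapshot but is now NULL, or -1 if none was zeroed.
 * A debug kernel would panic at this point; the sketch only reports it. */
static int sketch_find_zeroed_vector(const sketch_handler_t before[SKETCH_NR_VECTORS],
                                     const sketch_handler_t now[SKETCH_NR_VECTORS])
{
    for (int v = SKETCH_FIRST_CHECKED; v < SKETCH_NR_VECTORS; v++) {
        if (before[v] != NULL && now[v] == NULL)
            return v;
    }
    return -1;
}
```

Vectors below 16 are deliberately ignored, matching the "16 upwards" range mentioned in the comment.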

Thank you for your help and patience while I try and corner this bug.

Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Dan,

Is it possible for you to see if you can enable your local APIC in your BIOS set up? The kernel is reporting it is disabled, which does not seem correct to me.

Meanwhile, I have a good understanding now of why the OOPS is occurring and I am opening up a discussion with the maintainer of the code.

Colin

Revision history for this message
Colin Ian King (colin-king) wrote :

I have put another kernel image deb package, this time into:

http://people.ubuntu.com/~cking/debug-254837-6

This tweaks a possible bug in the yenta_socket driver, and also has a patch to stop the kernel from oopsing. I've bumped this to the latest Intrepid kernel too. Please let me know of the results. Thanks!

Revision history for this message
Dan Taylor (slash) wrote :

Unfortunately the BIOS has no option to enable/disable APIC, or to adjust any IRQ settings for that matter (it is flashed with the latest BIOS available from Dell for this laptop).

I've booted with the new kernel image and attached is a new screenshot. It still hangs up on detecting the card.
Thanks for all your help!

Dan

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Dan,

One more try... I think this kernel may get a little further now:

http://people.ubuntu.com/~cking/debug-254837-7

Please let me know of outcome. Colin.

Revision history for this message
Dan Taylor (slash) wrote :

Still hangs - here's the screenshot

Dan

Revision history for this message
Colin Ian King (colin-king) wrote :

Dan, can you try pci=noacpi in the boot line?

Revision history for this message
Colin Ian King (colin-king) wrote :

...and if the above fails, try assign-busses in the boot line too.

Revision history for this message
Dan Taylor (slash) wrote :

I tried pci=noacpi by itself, still locked.
Trying pci=noacpi assign-busses had the same result.
Attached is the screenshot with assign-busses

Revision history for this message
wondra (wondra) wrote :

Greetings,
have you seen this article?
http://www.aircrack-ng.org/doku.php?id=hostap

What you are experiencing may be the shared irq bug in hostap. The patch they have there helped me. I cannot confirm the script, because I didn't have orinoco_cs compiled before I patched the kernel, but I managed to insert the card after some fiddling with modprobe and cardctl, so it may work as well.

The kernel is also 2.6.24, though the distro is far from Ubuntu.

wondra

Revision history for this message
Colin Ian King (colin-king) wrote :

Many thanks wondra. I will investigate this patch.

Revision history for this message
Colin Ian King (colin-king) wrote :

Dan,

I'd like to explore the fact that your hardware is generating spurious interrupts. I'd just like you to try various boot options to make certain I've covered the IRQ side of things a bit more thoroughly. Can you try the following boot options:

irqfixup=1
(and possibly irqfixup=2)

pci=biosirq

pci=usepirqmask

pci=noacpi

pci=nobios

acpi=noirq

No need to attach a console picture. Just let me know if any work.

Colin

Revision history for this message
Dan Taylor (slash) wrote :

Colin,

I've tried everything you have suggested, but none of these boot options have worked.

Dan

Revision history for this message
Colin Ian King (colin-king) wrote :

I think I need to understand a little more about your machine's DSDT. Can you do the following:

sudo cat /proc/acpi/dsdt > dsdt.dat

and disassemble the dsdt file with:

sudo apt-get install iasl
iasl -d dsdt.dat

and attach the disassembled dsdt.dsl file.

Thanks, Colin

Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

Unfortunately it seems this bug is still an issue. Can you confirm this issue exists with the most recent Jaunty Jackalope 9.04 release - http://www.ubuntu.com/news/ubuntu-9.04-desktop . Please let us know your results. Thanks.

Revision history for this message
Dan Taylor (slash) wrote : Re: [Bug 254837] Re: hostap_cs causes kernel oops on 2.6.26 with senao nl-2511cd

Yes, I can confirm the bug is still there in 9.04 with
2.6.28-11-generic. As before, the wireless card works fine if inserted
after boot.

Dan

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Dan,

Can you confirm this issue exists with the Karmic Alpha 9.10 release. Images for testing are available at http://cdimage.ubuntu.com/daily-live/current/ . Please let us know your results.

If the issue remains while still running Karmic, please run the following command which will automatically gather and attach updated debug information:

apport-collect -p linux 254837

Thanks.

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Revision history for this message
Dan Taylor (slash) wrote : apport-collect data

Architecture: i386
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=3a75c586-8bc0-4918-b8b0-2300c167c3ff
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Dell Computer Corporation Inspiron 4150
Package: linux (not installed)
ProcCmdLine: root=UUID=dde38425-230a-4000-81f7-7f1c0d3da80c ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, no user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-4.23-generic
RelatedPackageVersions: linux-backports-modules-2.6.31-4-generic N/A
Uname: Linux 2.6.31-4-generic i686
UserGroups:

dmi.bios.date: 05/15/2003
dmi.bios.vendor: Dell Computer Corporation
dmi.bios.version: A06
dmi.board.vendor: Dell Computer Corporation
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Computer Corporation
dmi.modalias: dmi:bvnDellComputerCorporation:bvrA06:bd05/15/2003:svnDellComputerCorporation:pnInspiron4150:pvr:rvnDellComputerCorporation:rn:rvr:cvnDellComputerCorporation:ct8:cvr:
dmi.product.name: Inspiron 4150
dmi.sys.vendor: Dell Computer Corporation

Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Revision history for this message
Dan Taylor (slash) wrote :
Changed in linux (Ubuntu):
status: Incomplete → New
tags: added: apport-collected
Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks Dan. After some digging around, I suspect this problem occurs because the device shares an interrupt line with other devices, and the interrupt handler for this driver is trying to handle an interrupt before the rest of the driver is fully initialised, which leads to the oops.
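To illustrate why a shared line exposes the handler early, here is a user-space model of shared-IRQ dispatch in which every handler registered on a line is called for every interrupt on that line, regardless of which device raised it. All types and functions (`sk_irq_line`, `sk_request_shared_irq`, `sk_dispatch`, the bridge and wlan devices) are hypothetical; the kernel's real dispatch path is considerably more involved. The point is only that the wireless driver's handler runs as soon as it is registered, even if setup has not finished.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_IRQ_NONE = 0, SK_IRQ_HANDLED = 1 } sk_irqreturn_t;
typedef sk_irqreturn_t (*sk_handler_fn)(int irq, void *dev_id);

struct sk_action { sk_handler_fn handler; void *dev_id; };

#define SK_MAX_ACTIONS 4
struct sk_irq_line {
    struct sk_action actions[SK_MAX_ACTIONS];
    int nr_actions;
    unsigned int unhandled; /* interrupts that no handler claimed */
};

/* Register one more handler on a shared line. */
static int sk_request_shared_irq(struct sk_irq_line *line, sk_handler_fn fn, void *dev_id)
{
    if (line->nr_actions >= SK_MAX_ACTIONS)
        return -1;
    line->actions[line->nr_actions].handler = fn;
    line->actions[line->nr_actions].dev_id = dev_id;
    line->nr_actions++;
    return 0;
}

/* Every handler on the line runs, whoever actually raised the interrupt. */
static sk_irqreturn_t sk_dispatch(struct sk_irq_line *line, int irq)
{
    int ret = SK_IRQ_NONE;
    for (int i = 0; i < line->nr_actions; i++)
        ret |= line->actions[i].handler(irq, line->actions[i].dev_id);
    if (ret == SK_IRQ_NONE)
        line->unhandled++;
    return (sk_irqreturn_t)ret;
}

/* Hypothetical other device on the same line (e.g. a socket bridge). */
struct sk_bridge { int pending; };
static sk_irqreturn_t sk_bridge_handler(int irq, void *dev_id)
{
    struct sk_bridge *b = dev_id;
    (void)irq;
    if (!b->pending)
        return SK_IRQ_NONE;
    b->pending = 0;
    return SK_IRQ_HANDLED;
}

/* Hypothetical wireless device whose handler is registered before setup ends. */
struct sk_wlan { int configured; int pending; int calls_before_config; };
static sk_irqreturn_t sk_wlan_handler(int irq, void *dev_id)
{
    struct sk_wlan *w = dev_id;
    (void)irq;
    if (!w->configured) {
        /* Called for someone else's interrupt before setup finished; an
         * unguarded handler would touch unconfigured hardware here. */
        w->calls_before_config++;
        return SK_IRQ_NONE;
    }
    if (!w->pending)
        return SK_IRQ_NONE;
    w->pending = 0;
    return SK_IRQ_HANDLED;
}
```

In the model, an interrupt raised by the bridge still invokes the wireless handler, which is exactly the window in which an unguarded handler can oops.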

Which version is your current kernel so that I can build one with some debug to verify my assumption?

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Dan Taylor (slash) wrote : Re: [Bug 254837] Re: hostap_cs causes kernel oops on 2.6.26 with senao nl-2511cd

2.6.31-5-generic

- Dan

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Dan,

Please download and install the appropriate deb package from http://people.canonical.com/~cking/sru-254837/ and re-test. Please attach the dmesg as this contains some debug information to verify my assumptions about this bug.

Thanks!

Revision history for this message
Dan Taylor (slash) wrote :

Colin,

I installed the kernel package you posted ( linux-image-2.6.31-5-generic_2.6.31-5.24_i386.deb ) and to my surprise it actually booted, detected the wireless card, and worked like a champ without any kernel panics.

Attached is the dmesg output as you requested.

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Dan,

Thanks for testing - it helped me identify exactly why the bug occurred and how to optimally fix it. Now I have a kernel fix without all the extra debug messages. Can you please download and test the appropriate kernel from:

http://people.canonical.com/~cking/sru-254837/karmic

Please let me know if this fixes the bug for sure, and also attach the dmesg output so I can sanity check my fix. Then I can get this patch pushed so you and others can benefit from it in subsequent kernel releases. Thank you for your help.

Revision history for this message
Dan Taylor (slash) wrote :

Colin, I am happy to report that it booted perfectly with no problems whatsoever. Attached is the dmesg output.
- Dan

Stefan Bader (smb)
description: updated
Stefan Bader (smb)
Changed in linux-ubuntu-modules-2.6.24 (Ubuntu):
status: New → Invalid
Changed in linux-ubuntu-modules-2.6.24 (Ubuntu Jaunty):
status: New → Invalid
Changed in linux (Ubuntu Jaunty):
assignee: nobody → Colin King (colin-king)
importance: Undecided → Medium
status: New → Fix Committed
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Karmic):
assignee: Colin King (colin-king) → Tim Gardner (timg-tpi)
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted linux into jaunty-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.31-10.30

---------------
linux (2.6.31-10.30) karmic; urgency=low

  [ Amit Kucheria ]

  * [Config] Enable CONFIG_USB_DEVICEFS
    - LP: #417748
  * [Config] Populate the config-update template a bit more

  [ Andy Whitcroft ]

  * rebase to v2.6.31-rc9
  * [Config] update configs following rebase to v2.6.31-rc9
  * [Config] update ports configs following rebase to v2.6.31-rc9

  [ Colin Ian King ]

  * SAUCE: wireless: hostap, fix oops due to early probing interrupt
    - LP: #254837

  [ Jerone Young ]

  * [Upstream] ACPI: Add Thinkpad T400 & Thinkpad T500 to OSI(Linux)
    white-list
    - LP: #281732
  * [Upstream] ACPI: Add Thinkpad X200, X200s, X200t to OSI(Linux)
    white-list
    - LP: #281732
  * [Upstream] ACPI: Add Thinkpad X300 & Thinkpad X301 to OSI(Linux)
    white-list
    - LP: #281732
  * [Upstream] ACPI: Add Thinkpad R400 & Thinkpad R500 to OSI(Linux)
    white-list
    - LP: #281732
  * [Upstream] ACPI: Add Thinkpad W500, W700, & W700ds to OSI(Linux)
    white-list
    - LP: #281732

  [ John Johansen ]

  * SAUCE: AppArmor: Fix profile attachment for regexp based profile names
    - LP: #419308
  * SAUCE: AppArmor: Return the correct error codes on profile
    addition/removal
    - LP: #408473
  * SAUCE: AppArmor: Fix OOPS in profile listing, and display full list
    - LP: #408454
  * SAUCE: AppArmor: Fix mapping of pux to new internal permission format
    - LP: #419222
  * SAUCE: AppArmor: Fix change_profile failure
    - LP: #401931
  * SAUCE: AppArmor: Tell git to ignore generated include files
    - LP: #419505

  [ Stefan Bader ]

  * [Upstream] acpi: video: Loosen strictness of video bus detection code
    - LP: #333386
  * SAUCE: Remove ov511 driver from ubuntu subdirectory

  [ Tim Gardner ]

  * [Config] Exclude char-modules from non-x86 udeb creation
  * SAUCE: Notify the ACPI call chain of AC events
  * [Config] CONFIG_SATA_VIA=m
    - LP: #403385
  * [Config] Build in all phylib support modules.
  * [Config] Don't fail when sub-flavour files are missing
    - LP: #423426
  * [Config] Set CONFIG_LSM_MMAP_MIN_ADDR=0
    - LP: #423513

  [ Upstream ]

  * Rebased against v2.6.31-rc9

 -- Andy Whitcroft <email address hidden> Mon, 07 Sep 2009 11:33:45 +0100

Changed in linux (Ubuntu Karmic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.28-15.52

---------------
linux (2.6.28-15.52) jaunty-proposed; urgency=low

  [ Stefan Bader ]

  * Revert "SAUCE: ACPI: Populate DIDL before registering ACPI video device
    on Intel"
    - LP: #423296
  * SAUCE: Allow less restrictive acpi video detection
    - LP: #333386

  [ Upstream Kernel Changes ]

  * include drivers/pci/hotplug/* in -virtual package
    - LP: #364916
  * ext4: don't call jbd2_journal_force_commit_nested without journal
    - LP: #418197
  * ext4: fix ext4_free_inode() vs. ext4_claim_inode() race
    - LP: #418197
  * ext4: fix bogus BUG_ONs in in mballoc code
    - LP: #418197
  * ext4: fix typo which causes a memory leak on error path
    - LP: #418197
  * ext4: Fix softlockup caused by illegal i_file_acl value in on-disk
    inode
    - LP: #418197
  * ext4: Fix sub-block zeroing for writes into preallocated extents
    - LP: #418197
  * jbd2: Call journal commit callback without holding j_list_lock
    - LP: #418197
  * ext4: Print the find_group_flex() warning only once
    - LP: #367065
  * ext4: really print the find_group_flex fallback warning only once
    - LP: #367065

linux (2.6.28-15.51) jaunty-proposed; urgency=low

  [ Colin Ian King ]

  * SAUCE: wireless: hostap, fix oops due to early probing interrupt
    - LP: #254837

  [ Leann Ogasawara ]

  * Add the atl1c driver to support Atheros AR8132
    - LP: #415358
  * Updating configs to enable the atl1c driver
    - LP: #415358

  [ Stefan Bader ]

  * Revert "SAUCE: input: Blacklist digitizers from joydev.c"
    - LP: #300143
  * SAUCE: Fix the exported name for e1000e-next
    - LP: #402890
  * SAUCE: Fix incorrect stable backport to bas_gigaset
    - LP: #417732
  * SAUCE: Remove the atl2 driver from the ubuntu subdirectory
    - LP: #419438

linux (2.6.28-15.50) jaunty-proposed; urgency=low

  [ Colin Ian King ]

  * SAUCE: radio-maestro: fix panics on probe failure
    - LP: #357724
  * SAUCE: HDA Intel, sigmatel: Enable speakers on HP Mini 1000
    - LP: #318942

  [ Jerone Young ]

  * SAUCE: Fix Soltech TA12 volume hotkeys not sending key release in
    Jaunty
    - LP: #397499

  [ John Johansen ]

  * SAUCE: remove AppArmor debug check for calls from interrupt context
    - LP: #350789

  [ Manoj Iyer ]

  * SAUCE: Fix kernel panic when SELinux is enabled.
    - LP: #395219

  [ Matthew Garrett ]

  * SAUCE: ACPI: Populate DIDL before registering ACPI video device on
    Intel

  [ Michael Frey (Senior Manager, MID ]

  * SAUCE: Fix for internal microphone for Dell Mini10V
    - LP: #394793

  [ Tim Gardner ]

  * SAUCE: Added e1000e from sourceforge.
    - LP: #402890

  [ Upstream Kernel Changes ]

  * Input: synaptics - report multi-taps only if supported by the device
    - LP: #399787
  * ftdi_sio: fix kref leak
    - LP: #396930, #376128
  * IPv6: add "disable" module parameter support to ipv6.ko
    - LP: #351656

 -- Stefan Bader <email address hidden> Thu, 27 Aug 2009 15:09:06 +0200

Changed in linux (Ubuntu Jaunty):
status: Fix Committed → Fix Released