Getting Linux to boot on Lenovo ThinkPad E485/E585

Lenovo finally fixed this after one year! Install the UEFI update (BIOS 1.54 according to the comments below) and remove the kernel boot parameter entries described here.
 
Please complain in the Lenovo forums about this issue: https://forums.lenovo.com/t5/ThinkPad-11e-Windows-13-E-and/ThinkPad-E485...
 
It took me a long time to get any Linux distribution to work, with a lot of trial and error. My results are:
  • Trying to boot 32-bit Linux resulted in immediate reboots, with no output before that happened.
  • Most 64-bit Linux distributions can be booted by adding noapic (not noacpi!) intremap=off ivrs_ioapic[32]=00:14.0 ivrs_ioapic[33]=00:00.1 to the kernel boot parameters.
  • Recent Ubuntu-based (and most likely also Debian-based) distributions also need the boot parameter spec_store_bypass_disable=prctl or spec_store_bypass_disable=seccomp. Other distributions/kernels from 2018 may need this, too. With Ubuntu 18.04 kernel 4.15.0-33 this seems to be no longer needed. So first try omitting this parameter and check that all devices you need are initialized (-> dmesg). If devices are missing, try seccomp, which disables the mitigation for less code, and if devices are still missing, try prctl.
 
With these parameters everything I tested works. I originally noted as the only drawback that the fan(s?) run at full speed, but the CPU fan works perfectly: if the system is cold enough, it stops.
Without noapic the system crashes before giving any output in debug mode. Without spec_store_bypass_disable there is a null pointer dereference in the USB driver/subsystem, with a stack trace looking like
  • null pointer dereference in _raw_spin_lock
  • speculative_store_bypass
  • ssb_prctl_set
  • arch_seccomp_spec_mitigate
  • do_seccomp
  • SyS_seccomp
  • do_syscall_64
  • entry_SYSCALL_64_after_hwframe
Sometimes the crash was triggered in switch_to_xtra or xhci_pci_suspend.
 

Update

After more testing I narrowed the workaround down to amd_iommu=off and then to intremap=off.

The final error is "timer doesn't work through interrupt-remapped IO-APIC".

 

[Image: crash stack trace]

 

2nd Update

When booting Ubuntu, there is now a stack trace during GPU initialization:

[    1.973028] fbcon: amdgpudrmfb (fb0) is primary device
[    1.992315] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 1us * 100 tries - tgn10_lock line:566
[    1.992384] WARNING: CPU: 7 PID: 199 at /build/linux-uT8zSN/linux-4.15.0/drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:190 generic_reg_wait+0xe8/0x120 [amdgpu]
[    1.992384] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc chash i2c_algo_bit aesni_intel ttm aes_x86_64 drm_kms_helper crypto_simd glue_helper syscopyarea cryptd psmouse sysfillrect sysimgblt ahci fb_sys_fops i2c_piix4 libahci drm sdhci_pci r8169 nvme sdhci mii nvme_core wmi video i2c_scmi
[    1.992409] CPU: 7 PID: 199 Comm: systemd-udevd Not tainted 4.15.0-23-generic #25-Ubuntu
[    1.992410] Hardware name: LENOVO 20KU000NGE/20KU000NGE, BIOS R0UET45W (1.25 ) 06/22/2018
[    1.992459] RIP: 0010:generic_reg_wait+0xe8/0x120 [amdgpu]
[    1.992460] RSP: 0018:ffffa857c29bf198 EFLAGS: 00010297
[    1.992462] RAX: 0000000000000001 RBX: 0000000000000065 RCX: 0000000000000001
[    1.992463] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000246
[    1.992463] RBP: ffffa857c29bf1d8 R08: 0000000000000000 R09: 000000000000005e
[    1.992464] R10: 0000000000000002 R11: 0000000000000396 R12: 0000000000000001
[    1.992465] R13: ffff9aadbc798280 R14: 0000000000000100 R15: 0000000000000001
[    1.992466] FS:  00007f85d38bc680(0000) GS:ffff9aadbebc0000(0000) knlGS:0000000000000000
[    1.992467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.992468] CR2: 00005644b2ea6cf8 CR3: 0000000430046000 CR4: 00000000003406e0
[    1.992469] Call Trace:
[    1.992524]  tgn10_lock+0xa2/0xb0 [amdgpu]
[    1.992574]  program_all_pipe_in_tree+0x804/0x8b0 [amdgpu]
[    1.992621]  ? amdgpu_cgs_write_register+0x14/0x20 [amdgpu]
[    1.992668]  ? generic_reg_update_ex+0xe6/0x150 [amdgpu]
[    1.992714]  ? amdgpu_cgs_read_register+0x14/0x20 [amdgpu]
[    1.992763]  dcn10_apply_ctx_for_surface+0x498/0x4f0 [amdgpu]
[    1.992811]  dc_commit_state+0x2aa/0x500 [amdgpu]
[    1.992862]  amdgpu_dm_atomic_commit_tail+0x2cd/0xa50 [amdgpu]
[    1.992899]  ? amdgpu_bo_pin_restricted+0x1b5/0x2a0 [amdgpu]
[    1.992948]  ? dm_plane_helper_prepare_fb+0x181/0x240 [amdgpu]
[    1.992957]  commit_tail+0x42/0x70 [drm_kms_helper]
[    1.992963]  drm_atomic_helper_commit+0x10c/0x120 [drm_kms_helper]
[    1.993010]  amdgpu_dm_atomic_commit+0x87/0xa0 [amdgpu]
[    1.993026]  drm_atomic_commit+0x51/0x60 [drm]
[    1.993031]  restore_fbdev_mode_atomic+0x178/0x1e0 [drm_kms_helper]
[    1.993037]  restore_fbdev_mode+0x32/0x140 [drm_kms_helper]
[    1.993043]  ? _cond_resched+0x19/0x40
[    1.993048]  drm_fb_helper_restore_fbdev_mode_unlocked.part.32+0x28/0x80 [drm_kms_helper]
[    1.993053]  drm_fb_helper_set_par+0x43/0x70 [drm_kms_helper]
[    1.993058]  fbcon_init+0x493/0x670
[    1.993062]  visual_init+0xdc/0x140
[    1.993065]  do_bind_con_driver+0x207/0x420
[    1.993067]  do_take_over_console+0x82/0x1a0
[    1.993070]  do_fbcon_takeover+0x5c/0xb0
[    1.993072]  fbcon_event_notify+0x58d/0x780
[    1.993077]  notifier_call_chain+0x4c/0x70
[    1.993078]  blocking_notifier_call_chain+0x43/0x60
[    1.993081]  fb_notifier_call_chain+0x1b/0x20
[    1.993082]  register_framebuffer+0x24d/0x360
[    1.993088]  __drm_fb_helper_initial_config_and_unlock+0x1fc/0x400 [drm_kms_helper]
[    1.993093]  drm_fb_helper_initial_config+0x35/0x40 [drm_kms_helper]
[    1.993130]  amdgpu_fbdev_init+0xcd/0x100 [amdgpu]
[    1.993166]  amdgpu_device_init+0xe6c/0x1620 [amdgpu]
[    1.993202]  amdgpu_driver_load_kms+0x8b/0x2e0 [amdgpu]
[    1.993210]  drm_dev_register+0x149/0x1d0 [drm]
[    1.993245]  amdgpu_pci_probe+0x113/0x150 [amdgpu]
[    1.993250]  local_pci_probe+0x47/0xa0
[    1.993253]  pci_device_probe+0x145/0x1b0
[    1.993257]  driver_probe_device+0x31e/0x490
[    1.993258]  __driver_attach+0xa7/0xf0
[    1.993260]  ? driver_probe_device+0x490/0x490
[    1.993262]  bus_for_each_dev+0x70/0xc0
[    1.993264]  driver_attach+0x1e/0x20
[    1.993265]  bus_add_driver+0x1c7/0x270
[    1.993266]  ? 0xffffffffc0671000
[    1.993268]  driver_register+0x60/0xe0
[    1.993269]  ? 0xffffffffc0671000
[    1.993270]  __pci_register_driver+0x5a/0x60
[    1.993310]  amdgpu_init+0x96/0xa9 [amdgpu]
[    1.993314]  do_one_initcall+0x52/0x19f
[    1.993318]  ? __vunmap+0x81/0xb0
[    1.993320]  ? _cond_resched+0x19/0x40
[    1.993323]  ? kmem_cache_alloc_trace+0xa6/0x1b0
[    1.993326]  ? do_init_module+0x27/0x209
[    1.993328]  do_init_module+0x5f/0x209
[    1.993330]  load_module+0x191e/0x1f10
[    1.993334]  ? ima_post_read_file+0x96/0xa0
[    1.993336]  SYSC_finit_module+0xfc/0x120
[    1.993337]  ? SYSC_finit_module+0xfc/0x120
[    1.993340]  SyS_finit_module+0xe/0x10
[    1.993341]  do_syscall_64+0x73/0x130
[    1.993344]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[    1.993345] RIP: 0033:0x7f85d33c6839
[    1.993346] RSP: 002b:00007fff88d70fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    1.993347] RAX: ffffffffffffffda RBX: 00005644b2e8bfc0 RCX: 00007f85d33c6839
[    1.993348] RDX: 0000000000000000 RSI: 00007f85d30a50e5 RDI: 0000000000000014
[    1.993348] RBP: 00007f85d30a50e5 R08: 0000000000000000 R09: 00007fff88d710e0
[    1.993349] R10: 0000000000000014 R11: 0000000000000246 R12: 0000000000000000
[    1.993350] R13: 00005644b2e8aef0 R14: 0000000000020000 R15: 00005644b2e8bfc0
[    1.993351] Code: 31 f6 44 8b 45 10 44 89 e1 48 c7 c7 a1 87 60 c0 89 45 d4 52 48 c7 c2 e8 0d 60 c0 e8 83 ad c9 ff 41 83 7d 20 01 58 8b 45 d4 74 02 <0f> 0b 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 c7 45 c4 23 
[    1.993375] ---[ end trace 7e3e1ff95baa3ffb ]---

That's funny, a colleague had the same issue with systemd-udevd today on one of our servers, with a completely different driver.

3rd Update (2018-07-29)

OK, this is really a firmware bug: the ACPI IVRS table lacks at least one entry. Adding ivrs_ioapic[32]=00:14.0 instead of intremap=off is sufficient to make the system boot until Lenovo releases a UEFI update with a working IVRS table. At least UEFI 1.27 (2018-07-24) needs this override. And spec_store_bypass_disable=prctl is still needed for Ubuntu & co.

The clue is the line "[Firmware Bug]: AMD-Vi: IOAPIC[32] not in IVRS table". I decompiled the ACPI tables and started to read the AMD documentation, but in the end I just guessed the 32 from the error message and 00:14.0 from the lspci output and the Stack Overflow/Ubuntu forum entries. Interesting stuff, but too much to read in the little time I had.
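To illustrate how the error message points at the missing entry, here is a minimal Python sketch that pulls the IOAPIC IDs out of a kernel log and formats matching ivrs_ioapic overrides. The function names and the timestamp in the sample line are made up for illustration, and the device ID still has to be supplied by hand (00:14.0 was a guess from lspci, not something the kernel reports):

```python
import re

# Message format quoted above; the number in brackets is the IOAPIC ID
# that the kernel could not find in the IVRS table.
FIRMWARE_BUG = re.compile(
    r"\[Firmware Bug\]: AMD-Vi: IOAPIC\[(\d+)\] not in IVRS table"
)


def missing_ioapic_ids(kernel_log):
    """Return the sorted, de-duplicated IOAPIC IDs reported as missing."""
    return sorted({int(m.group(1)) for m in FIRMWARE_BUG.finditer(kernel_log)})


def ivrs_override(ioapic_id, devid):
    """Format one ivrs_ioapic kernel parameter (devid is chosen by the user)."""
    return "ivrs_ioapic[{}]={}".format(ioapic_id, devid)


# Sample input; the timestamp is illustrative only.
sample_log = "[    0.000000] [Firmware Bug]: AMD-Vi: IOAPIC[32] not in IVRS table\n"

ids = missing_ioapic_ids(sample_log)
overrides = [ivrs_override(i, "00:14.0") for i in ids]
print(ids)
print(overrides)
```

On a running system the log text could come from the output of dmesg instead of the sample string.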

What was helpful is the Linux boot parameter amd_iommu_dump=1 which will dump information from the IVRS table:

[    0.851042] AMD-Vi: Using IVHD type 0x11
[    0.851401] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[    0.851401] AMD-Vi:        mmio-addr: 00000000feb80000
[    0.851430] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:01.0 flags: 00
[    0.851431] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.6
[    0.851870] AMD-Vi:   DEV_ALIAS_RANGE                 devid: ff:00.0 flags: 00 devid_to: 00:14.4
[    0.851871] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.7
[    0.851875] AMD-Vi:   DEV_SPECIAL(HPET[0])           devid: 00:14.0
[    0.851876] AMD-Vi:   DEV_SPECIAL(IOAPIC[33])                devid: 00:14.0
[    0.851877] AMD-Vi:   DEV_SPECIAL(IOAPIC[34])                devid: 00:00.1
[    1.171028] AMD-Vi: IOMMU performance counters supported
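For comparing such dumps between machines it helps to reduce them to the DEV_SPECIAL entries. The following Python sketch does that with a regular expression; the pattern is only derived from the output shown above (it is not an official format description), and the function name is hypothetical:

```python
import re

# Matches lines such as:
#   AMD-Vi:   DEV_SPECIAL(IOAPIC[33])                devid: 00:14.0
# The pattern is inferred from the dump above, not from kernel documentation.
DEV_SPECIAL = re.compile(
    r"DEV_SPECIAL\((\w+)\[(\d+)\]\)\s+devid:\s+([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def special_entries(dump):
    """Return (kind, index, devid) tuples for every DEV_SPECIAL line."""
    return [(kind, int(index), devid)
            for kind, index, devid in DEV_SPECIAL.findall(dump)]


# The three DEV_SPECIAL lines from the Lenovo dump above.
lenovo_dump = """\
[    0.851875] AMD-Vi:   DEV_SPECIAL(HPET[0])           devid: 00:14.0
[    0.851876] AMD-Vi:   DEV_SPECIAL(IOAPIC[33])                devid: 00:14.0
[    0.851877] AMD-Vi:   DEV_SPECIAL(IOAPIC[34])                devid: 00:00.1
"""

entries = special_entries(lenovo_dump)
for entry in entries:
    print(entry)
```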
 

Resolving devid 00:14.0 was easy via lspci:

 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)

but for 00:00.1 I have not found the device. If anybody knows how to list all device IDs and their associated devices/drivers/etc., please mail me.
 

4th Update (2018-09-01)

At least with Ubuntu kernel 4.15.0-33, spec_store_bypass_disable is no longer needed, at least on my E485. My assumption is that this was a bug in the earlier kernels with Spectre/Meltdown fixes. Anyway, if you cannot boot or not all devices initialize, try the option seccomp, and if that does not help, try prctl. The option seccomp disables fewer Spectre mitigations than prctl:

https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/SpectreAndMeltdown/Mi...
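To see which Speculative Store Bypass mode is actually in effect after changing the parameter, the kernel's vulnerabilities directory in sysfs can be read. This is a minimal Python sketch under the assumption that the running kernel exposes the spec_store_bypass file (older kernels may not); the function name is hypothetical:

```python
from pathlib import Path

# sysfs file reporting the Speculative Store Bypass status.
# Assumption: the running kernel provides it; if not, None is returned.
SSB_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/spec_store_bypass")


def ssb_status(path=SSB_STATUS):
    """Return the kernel's one-line status text, or None if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


print(ssb_status())
```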

5th Update (2018-09-07) 

Just installed BIOS/UEFI 1.32, which fixes the keyboard issue. But Linux still needs the IVRS table override.

I am currently testing the Dell Latitude 5495, which boots Linux out of the box but also has a few warnings/errors in the kernel log. I checked its IVRS table:

[    0.000000] AMD-Vi: Using IVHD type 0x11
[    0.000000] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[    0.000000] AMD-Vi:        mmio-addr: 00000000fd900000
[    0.000000] AMD-Vi:   DEV_SELECT_RANGE_START devid: 00:01.0 flags: 00
[    0.000000] AMD-Vi:   DEV_RANGE_END devid: ff:1f.6
[    0.000000] AMD-Vi:   DEV_ALIAS_RANGE devid: ff:00.0 flags: 00 devid_to: 00:14.4
[    0.000000] AMD-Vi:   DEV_RANGE_END devid: ff:1f.7
[    0.000000] AMD-Vi:   DEV_SPECIAL(HPET[0]) devid: 00:14.0
[    0.000000] AMD-Vi:   DEV_SPECIAL(IOAPIC[32]) devid: 00:14.0
[    0.000000] AMD-Vi:   DEV_SPECIAL(IOAPIC[33]) devid: 00:00.1
[    0.000000] [Firmware Bug]: AMD-Vi: IOAPIC[4] not in IVRS table
[    0.000000] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found
[    0.000000] AMD-Vi: Disabling interrupt remapping
 

Compare the indexes with the Lenovo IVRS table. My uneducated guess is that these entries should be the same for all mainboards with (Mobile?) Ryzen processors as I assume (guess!! correct me if wrong) that these are CPU internal devices. Maybe the Lenovo IVRS entries are just off by one.
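The off-by-one guess can be checked mechanically. This small Python sketch uses only the IOAPIC entries copied from the two dumps above; keep in mind that the Dell table triggers its own "[Firmware Bug]" message, so it serves as a comparison, not as a known-good reference:

```python
# DEV_SPECIAL(IOAPIC[...]) entries copied from the two dumps above:
# device ID -> IOAPIC index.
lenovo = {"00:14.0": 33, "00:00.1": 34}
dell = {"00:14.0": 32, "00:00.1": 33}

# Difference between the Lenovo and Dell index for each device.
offsets = {devid: lenovo[devid] - dell[devid] for devid in lenovo}
off_by_one = all(delta == 1 for delta in offsets.values())

# Overrides that would give the Lenovo the same indexes as the Dell.
overrides = sorted(
    "ivrs_ioapic[{}]={}".format(dell[devid], devid) for devid in lenovo
)

print(offsets)
print(off_by_one)
print(overrides)
```

The resulting overrides are the same two parameters listed at the top of this article.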

6th Update (2018-09-10)

Pilatomic pointed out in the comments that adding ivrs_ioapic[33]=00:00.1 solves a couple of issues, in his case issues with the SD card reader. For me this works flawlessly with all my microSDHC cards; the only write errors I got were caused by ejecting the cards without unmounting them ;-) Unfortunately I have found no way to remove entry 34 via kernel parameters; the parser only accepts well-formed parameters. So you will have to wait for Lenovo to fix this ACPI table. I think by now they should know how to fix this, and it is not a big change. But perhaps they do not care about Linux.
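To make the two overrides permanent on an Ubuntu-style installation, they can be appended to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub. The following Python sketch only transforms the text of such a file; the function name is hypothetical, the sample file content is invented, and other distributions may use a different file or variable:

```python
import re


def add_kernel_params(grub_text, params, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Append params to the quoted value of `key`, skipping ones already set."""
    pattern = re.compile(
        r'^' + re.escape(key) + r'="(.*)"[ \t]*$', re.MULTILINE
    )

    def extend(match):
        current = match.group(1).split()
        for param in params:
            if param not in current:
                current.append(param)
        return key + '="' + " ".join(current) + '"'

    new_text, count = pattern.subn(extend, grub_text)
    if count == 0:
        raise ValueError(key + " not found")
    return new_text


# Invented sample content in the style of /etc/default/grub.
sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
wanted = ["ivrs_ioapic[32]=00:14.0", "ivrs_ioapic[33]=00:00.1"]

updated = add_kernel_params(sample, wanted)
print(updated)
```

After writing the changed text back, the GRUB configuration has to be regenerated (update-grub on Ubuntu) before the parameters take effect.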

Second Last Update (2018-09-10)

The story ends here for me. I just discarded all data on the SSDs, reimaged the NVMe with the factory image, and cleaned up the screen, keyboard, touchpad and case. Hopefully I will find a buyer for this fine notebook.

Why not the last update? I hope somebody will notify me when Lenovo releases a new UEFI version which fixes the IVRS ACPI table and Linux distributions will work out of the box. Then there will be a last update....

Another Update (2019-01-11)

According to this comment the IVRS overrides may not be needed anymore. And there are additional settings which should be set.

 

 

 

Comments

OK, for anyone who cares, I got Mint 19 working: they had an updated Mint 19 ISO (v2 in the file name) and it solved the EFI errors on install.

I also had random crashes that were rather annoying: the machine just locked up randomly and I had to power cycle it. I see some people were playing around with C-states to fix it, but I found that once I disabled TPM in the BIOS the freezes stopped?! -- well, for 2.5 days so far, being on almost all day each day.

Lenovo ThinkPad A485 has the same boot problem that is worked around by the ivrs_ioapic kernel parameter.

However, Ubuntu recently started listing the A485 as a certified compatible notebook with kernel 4.15.0-1021-oem https://certification.ubuntu.com/hardware/201808-26387/

Perhaps with this kernel the workarounds are no longer needed?

Would be nice to get the confirmation that the issue is fixed. I sold the E485, so I cannot test it.

Just installed Ubuntu 18.04.1 with kernel 4.15.0-39-generic on an E485 with BIOS 1.45.
It works, but the ivrs_ioapic and spec_store_bypass_disable kernel parameters are still mandatory.

They are still needed!

This is no longer needed (as of 28.12.18, bios version 1.48), running 4.19 with no additional boot parameters or deactivated iommu

(4.20's amdgpu crashes for me though :p)

I'd like to know how you got this working on kernel 4.19.
I just installed Ubuntu 18.10, upgraded to kernel 4.19.13, and I can't get it to boot without the GRUB parameters.

Well, have you installed the BIOS update from 28 Dec 2018?

I forgot to mention I am running the latest BIOS version, 1.48. There are other people experiencing the same issue, with talk of kernel 4.20 including a fix.
See here
https://forums.lenovo.com/t5/ThinkPad-11e-Windows-13-E-and/ThinkPad-E485...

In the German-speaking ThinkPad forum someone posted a message with options that make the E585 run with kernel 4.20, make X run with kernel 4.20, and fix resume from suspend. No need to understand the whole posting; the important stuff is only the boot parameters:

ivrs_ioapic[32]=00:14.0 ivrs_ioapic[33]=00:00.1 clocksource=hpet libata.force=1:nohrst iommu=pt

https://thinkpad-forum.de/threads/217551-Boot-Parameter-f%FCr-E585-bzw-f...

Thanks to all of you for continuing to investigate! Really having a great time running Fedora (kernel >= 4.20) on my E485.

Thank you for the post, eazrael.

In lieu of a fix from Lenovo, which does not seem forthcoming, I attempted to fix this issue myself by modifying the latest BIOS image and flashing it. Unfortunately, the E485 refused to flash my altered image and returned a message that the digital signature was invalid; clearly all BIOS updates must be signed by Lenovo.

You wrote about trying to remove entry #34, which is superfluous if you are correct about this being an off-by-one error. Do you believe this entry causes any problems? If so I could look into a patch for the kernel command-line override to allow it to blank certain entries.

Hi,
I think if you are able to develop patches for the Linux kernel and patch BIOS images, your skills are far beyond mine :-) There have been reports that kernel version 4.20 fixes this issue, so if I had the time (and still had this notebook) I would try to bisect the change(s) that made this notebook work. Interest in this blog article has also waned, so my guess is that this issue is solved.

Haha, I appreciate the compliment, but I am no elite hacker, and my expertise is almost surely inferior to yours (after all, I needed your blog to make my laptop boot). My expertise in "patch[ing] BIOS images" extends as far as using vim's hex mode to edit what I hope are the correct strings in the binary file.

But in regard to the IVRS table issue, I am in fact running the latest 4.20.x kernel and it has not been fixed (I have constant PCIe errors filling up my logs, random lockups, can't suspend at all, etc.), nor have I seen any suggestions that it has been in the development kernels. The Lenovo community forum thread (https://forums.lenovo.com/t5/ThinkPad-11e-Windows-13-E-and/ThinkPad-E485...) is chugging along as much as ever and is now 16 (!) pages in length, with a steady supply of anger to inspire new posts, so I imagine people are using that thread instead of your blog.

Anyway, the reason I commented here was to try and find out whether the extra entry in the IVRS table was causing (some of?) these problems, as I'd rather not go to the effort of submitting a kernel patch (the relevant code is almost incomprehensible) to allow its removal if that would have no effect.

Hm I may have another idea. I did not investigate this in depth as I did not expect to be able to edit the firmware, but Linux might be able to replace the ACPI tables during runtime:
https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt
https://01.org/linux-acpi/documentation/overriding-dsdt
and you can decompile the acpi tables:
https://blog.fpmurphy.com/2014/12/decompiling-acpi-tables.html (there's a small example in the second link)
Maybe interesting to have a look at this.

I replaced my E485 with a Dell Latitude 5495, which is also a Ryzen Mobile model. While it booted out of the box, the longer I use it, the more bugs and quirks I find. A few days ago I uninstalled Linux as it was almost unusable, and again the IOMMU was probably involved. Maybe disabling the IOMMU in Linux might be a good idea. Adding "pcie_aspm=off" also improved some things.

And to be fair, recent Intel notebooks can also cause a lot of issues, it's really disheartening.

I already have decompiled the DSDT table and I couldn't find anything related to the IVRS, although there was far too much data to look through. Looking through the links you posted, they all refer to the DSDT; however, I wonder if the other tables in /sys/firmware/acpi/tables/ (which includes the IVRS) can be replaced using the initrd in the same manner? If so, that would work, but why is there a command-line override system at all?

I also found out about the pcie_aspm=off option to disable ASPM. It initially allowed the laptop to suspend but appeared to have stopped working after a kernel update, so I removed it.

It certainly is distressing for FOSS users.

And to add insult to injury, the sole reason I purchased a Ryzen laptop over a Coffee Lake or whatever bizarre-sounding codename Intel currently uses is that I thought Linux on AMD would eclipse any Intel + NVIDIA option.

From what I gather, all of the aforementioned tables can be replaced at boot time. I tried decompiling the IVRS table but the result didn't at all resemble what it looks like in reality.

Here are the contents of the ACPI IVRS table (hexdump):

00000000 49 56 52 53 d0 00 00 00 02 ca 4c 45 4e 4f 56 4f |IVRS......LENOVO|
00000010 54 50 2d 52 30 55 20 20 80 14 00 00 50 54 45 43 |TP-R0U  ....PTEC|
00000020 02 00 00 00 41 30 20 00 00 00 00 00 00 00 00 00 |....A0 .........|
00000030 10 b0 48 00 02 00 40 00 00 00 b8 fe 00 00 00 00 |..H...@.........|
00000040 00 00 00 00 6e 8f 04 80 03 08 00 00 04 fe ff 00 |....n...........|
00000050 43 00 ff 00 00 a4 00 00 04 ff ff 00 00 00 00 00 |C...............|
00000060 48 00 00 00 00 a0 00 02 48 00 00 d7 21 a0 00 01 |H.......H...!...|
00000070 48 00 00 00 22 01 00 01 11 b0 58 00 02 00 40 00 |H...".....X...@.|
00000080 00 00 b8 fe 00 00 00 00 00 00 00 00 00 02 04 00 |................|
00000090 da 4a 29 22 ef 77 4f 00 00 00 00 00 00 00 00 00 |.J)".wO.........|
000000a0 03 08 00 00 04 fe ff 00 43 00 ff 00 00 a4 00 00 |........C.......|
000000b0 04 ff ff 00 00 00 00 00 48 00 00 00 00 a0 00 02 |........H.......|
000000c0 48 00 00 d7 21 a0 00 01 48 00 00 00 22 01 00 01 |H...!...H..."...|

If we can determine how the relevant entries correspond to the above data, we should be able to modify this table and, from what I gather, override it in the manner you describe; assuming that this is the place where those entries are stored, that is.
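One possible reading of the dump, sketched in Python: after the 36-byte ACPI header and 12 more bytes (IVinfo plus reserved), the table holds two IVHD blocks (types 0x10 and 0x11), and the 8-byte entries of type 0x48 are the "special device" entries carrying a handle, a device ID and a variety (1 = IOAPIC, 2 = HPET). The offsets and sizes are assumptions based on a reading of the AMD IOMMU specification and the Linux parser, and the helper names are hypothetical; it is a sketch, not a validated parser:

```python
import struct

# Hexdump from above, one row per line.
IVRS_HEX = """
49 56 52 53 d0 00 00 00 02 ca 4c 45 4e 4f 56 4f
54 50 2d 52 30 55 20 20 80 14 00 00 50 54 45 43
02 00 00 00 41 30 20 00 00 00 00 00 00 00 00 00
10 b0 48 00 02 00 40 00 00 00 b8 fe 00 00 00 00
00 00 00 00 6e 8f 04 80 03 08 00 00 04 fe ff 00
43 00 ff 00 00 a4 00 00 04 ff ff 00 00 00 00 00
48 00 00 00 00 a0 00 02 48 00 00 d7 21 a0 00 01
48 00 00 00 22 01 00 01 11 b0 58 00 02 00 40 00
00 00 b8 fe 00 00 00 00 00 00 00 00 00 02 04 00
da 4a 29 22 ef 77 4f 00 00 00 00 00 00 00 00 00
03 08 00 00 04 fe ff 00 43 00 ff 00 00 a4 00 00
04 ff ff 00 00 00 00 00 48 00 00 00 00 a0 00 02
48 00 00 d7 21 a0 00 01 48 00 00 00 22 01 00 01
"""
table = bytes.fromhex("".join(IVRS_HEX.split()))

FIRST_IVHD = 48                          # assumed: 36-byte header + 12 bytes
IVHD_HEADER_LEN = {0x10: 24, 0x11: 40}   # assumed header sizes per block type
VARIETY = {1: "IOAPIC", 2: "HPET"}


def fmt_devid(devid):
    """Format a 16-bit device ID as bus:device.function."""
    return "{:02x}:{:02x}.{}".format(devid >> 8, (devid >> 3) & 0x1F, devid & 7)


def special_devices(data):
    """Return (block type, variety, handle, devid) for each 0x48 entry."""
    found = []
    total = struct.unpack_from("<I", data, 4)[0]   # table length from header
    pos = FIRST_IVHD
    while pos < total:
        block_type, _flags, block_len = struct.unpack_from("<BBH", data, pos)
        if block_len == 0:
            break                                  # malformed, avoid looping
        if block_type in IVHD_HEADER_LEN:
            entry = pos + IVHD_HEADER_LEN[block_type]
            while entry < pos + block_len:
                entry_type = data[entry]
                if entry_type == 0x48:
                    handle = data[entry + 4]
                    devid = struct.unpack_from("<H", data, entry + 5)[0]
                    kind = VARIETY.get(data[entry + 7], "unknown")
                    found.append((block_type, kind, handle, fmt_devid(devid)))
                # assumed: types 0x00-0x3f are 4 bytes, 0x40-0x7f are 8 bytes
                entry += 4 << (entry_type >> 6)
        pos += block_len
    return found


entries = special_devices(table)
for item in entries:
    print(item)
```

Read this way, both blocks contain HPET[0] and IOAPIC[33] at 00:14.0 and IOAPIC[34] at 00:00.1, which matches the amd_iommu_dump=1 output in the article; the handle is the fifth byte of each 0x48 entry (0x21 and 0x22). Any edited table would also need its ACPI checksum byte recomputed, which this sketch does not do.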

Please send me the binary dump to impressum.cjqqtlthsznog4@evilazrael.de. Please add instructions on how you dumped the data.

With Bios Version 1.54 they finally fixed it:
CHANGES IN THIS RELEASE
Version 1.54

[Important updates]
Nothing.

[New functions or enhancements]
Nothing.

[Problem fixes]
- Sync IOAPICID in IVRS and APIC ACPI tables (Linux).
- Enhance to address vulnerability security issue.

Thanks for the Notification :-)

Can confirm 1.54 bios fixes all issues on 18.04.3. no extra grub parameters are needed

just for the record, I haven't looked into BIOS 1.54 yet, however the system runs stable with intel_pstate=disable;
the only problem I'm still having is the graphics card seems to run at lower clock speed (971 MHz @ max), possibly due to a lack
of sufficient power supply; don't know if that holds true for the new bios version, perhaps I'm gonna have to
adjust that manually via ppfeaturemask for AMD APUs and then set clock speeds individually.
