Troubleshooting PCIe Bus Error severity Corrected on Ubuntu and Linux Mint
Recently, I was trying to install Linux Mint on several machines at my institute. On some of them the installation failed and the screen filled with ‘PCIe Bus Error’ messages. I have also seen a similar issue with Ubuntu 18.04.
I was stuck on it for more than a month. After trying many solutions and comparing observations (the solution is the same, but the symptoms and the treatment may differ), I found something that helped me and that I think could help other Ubuntu and Linux Mint users.
Observations about PCIe Bus Error severity Corrected
It happened on my HP system, and there seem to be some compatibility issues with the HP hardware. The PCIe Bus Error is basically the Linux kernel reporting a hardware issue; the ‘Corrected’ severity means the hardware recovered from the error on its own.
This error reporting turns into a nightmare because of how frequently the system generates the messages. I have noticed on various Linux forums that many HP users have encountered this error; HP probably needs to improve Linux support for its hardware.
Do note that this doesn’t necessarily mean that you cannot use Linux on your HP system. You might be able to use Linux like everyone else. It’s just that seeing this message flashing on the screen on every boot is annoying and sometimes, it could lead to bigger troubles.
If the system keeps reporting the error, the log files keep growing. If you have limited space on the root partition, your system could get stuck at a black screen displaying the PCIe error message and fail to boot.
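To get a feel for how quickly the messages pile up, something like the following may help. It is only a sketch: count_pcie_errors is a hypothetical helper name, and it assumes the messages contain the text "PCIe Bus Error" as shown on screen and that the kernel log lives at /var/log/kern.log (the usual location on Ubuntu and Mint).

```shell
# Hypothetical helper: count the lines mentioning "PCIe Bus Error" in a log file.
# The default path /var/log/kern.log is an assumption about your setup.
count_pcie_errors() {
    log_file="${1:-/var/log/kern.log}"
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'PCIe Bus Error' "$log_file"
}

# Possible usage: number of logged errors, size of the logs, free space on root
# count_pcie_errors
# du -sh /var/log
# df -h /
```

If the count jumps by thousands between two runs a few minutes apart, the logs are probably what is eating your disk space.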
Now that you know a few things, let’s see how to tackle this error.
Handling PCIe Bus Error messages if you can boot into your Linux system
If you see the PCIe Bus Error message on the screen while booting but you are still able to log in, you can work around this annoyance.
You can do little on the hardware compatibility front. I mean you (most probably) cannot go ahead and start coding drivers for your hardware or fix the existing driver code. If your system works fine, your main concern should be that excessive error reporting doesn’t eat up the disk space.
In that regard, you can pass a parameter to the Linux kernel and ask it to stop reporting the PCIe errors. To do that, you need to edit the grub configuration.
Basically, you just have to use a text editor for editing the file.
First things first: make a backup of your grub config file so that you can revert if something goes wrong. Open a terminal and use the following command:
cp /etc/default/grub ~/grub.back
Now open the file with Gedit for editing:
sudo gedit /etc/default/grub
Look for the line that has GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
Add pci=noaer to this line. AER stands for Advanced Error Reporting, and ‘noaer’ asks the kernel not to use or log Advanced Error Reporting. The changed line should look like this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"
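If you would rather not edit the file by hand (for example, when fixing several machines), the same change can be scripted. The sketch below is an illustration, not a tested tool: add_kernel_param is a hypothetical helper, it relies on GNU sed's -i option, and it assumes the parameter contains no slashes or regex special characters.

```shell
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a grub config file,
# unless the parameter is already there.
add_kernel_param() {
    grub_file="$1"
    param="$2"
    # Leave the file alone if the parameter is already on the line
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$grub_file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$grub_file"
}

# Possible usage: work on a copy, then install it with sudo
# cp /etc/default/grub /tmp/grub.new
# add_kernel_param /tmp/grub.new pci=noaer
# sudo cp /tmp/grub.new /etc/default/grub
```

Working on a copy keeps the helper free of sudo and lets you inspect the result before it replaces the real file.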
Once you have saved the file, update grub with this command:
sudo update-grub
Restart Ubuntu and you shouldn’t see the ‘PCIe Bus Error severity Corrected’ messages anymore.
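After the reboot, you can check whether the kernel actually received the parameter. A minimal sketch, assuming a hypothetical helper name has_kernel_param; /proc/cmdline is where Linux exposes the parameters the running kernel was booted with.

```shell
# Hypothetical helper: succeed if the given parameter appears on the
# kernel command line. The optional second argument exists only so the
# function can be tried on a sample file; it defaults to /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # -w matches whole words only, -F treats the parameter as a fixed string
    grep -qwF -- "$param" "$cmdline_file"
}

# Possible usage:
# has_kernel_param pci=noaer && echo "pci=noaer is active"
```

If the parameter is missing, the most likely causes are a typo in /etc/default/grub or a forgotten update-grub.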
If this doesn’t fix the issue for you, you can try to change other kernel parameters.
Further troubleshooting: Disable MSI
Now you are resorting to trial and error. You may try disabling MSI (Message Signaled Interrupts). Though the Linux kernel has supported MSI for several years now, a faulty MSI implementation by some hardware manufacturers may lead to the PCIe errors.
The drill is practically the same as you saw in the previous section. You edit the grub configuration and make the GRUB_CMDLINE_LINUX_DEFAULT line look like this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nomsi"
Update grub and reboot the system:
sudo update-grub
Even further troubleshooting: Disable mmconf
I know it’s getting repetitive, but if you are still facing the issue, it could be worth giving this a last try. This time, disable mmconf with a kernel parameter.
mmconf stands for memory-mapped configuration, a method the kernel uses to access PCI configuration space. If you have an old computer, a buggy BIOS may lead to this issue.
The steps remain the same. Just change the GRUB_CMDLINE_LINUX_DEFAULT line in your grub config to look like this, then update grub and reboot as before:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nommconf"
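Since each attempt replaces the previous pci= value rather than adding to it, switching between noaer, nomsi and nommconf can also be scripted. This is a sketch under assumptions: set_pci_option is a hypothetical helper, it needs GNU sed, and it expects the line to end with the closing double quote (no trailing spaces or comments).

```shell
# Hypothetical helper: make pci=<value> the only pci= option on the
# GRUB_CMDLINE_LINUX_DEFAULT line of the given grub config file.
set_pci_option() {
    grub_file="$1"
    value="$2"
    # First expression: drop any existing pci=... token and the spaces before it.
    # Second expression: append the new pci=<value> before the closing quote.
    sed -i \
        -e "/^GRUB_CMDLINE_LINUX_DEFAULT=/s/ *pci=[^ \"]*//g" \
        -e "/^GRUB_CMDLINE_LINUX_DEFAULT=/s/\"\$/ pci=${value}\"/" \
        "$grub_file"
}

# Possible usage on a copy of the config:
# cp /etc/default/grub /tmp/grub.new
# set_pci_option /tmp/grub.new nommconf
# sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```

Trying one option at a time makes it easier to tell which one actually silenced the errors.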
Can’t boot! How to edit grub config now?
In some cases, if you are not able to boot at all, perhaps your root partition is out of space. The idea here is to delete old log files, see if you can boot again and, if you can, change the grub config.
If you are stuck at boot with log messages on the screen, do a hard reboot (use the power button to turn the machine off and on again). When it powers on, choose recovery mode from the grub screen. It should be under Advanced options.
If your system doesn’t show the grub screen, press and hold the Shift key at boot. On some systems, pressing the Esc key brings up the grub screen.
Under Advanced options, select the recovery mode entry, then choose the option to drop into a root shell.
If you use the ls command to find large files, you’ll see that syslog and kern.log take up a huge amount of space:
ls -s -S /var/log
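The ls command above only looks at the top level of /var/log. To include files in subdirectories as well, a small helper along these lines could be used. It is a sketch: largest_files is a hypothetical name, and the -printf option is a GNU find extension (available on Ubuntu and Mint).

```shell
# Hypothetical helper: list the N largest files under a directory,
# biggest first, one per line as "<size in bytes><TAB><path>".
largest_files() {
    dir="${1:-/var/log}"
    count="${2:-5}"
    # Print size and path for every regular file, sort numerically
    # in descending order, and keep the first N lines
    find "$dir" -type f -printf '%s\t%p\n' | sort -rn | head -n "$count"
}

# Possible usage:
# largest_files /var/log 5
```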
You can empty the log files from the Linux command line this way:
> /var/log/syslog
> /var/log/kern.log
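The same step can be wrapped in a small function. This is a sketch rather than a definitive procedure: empty_logs is a hypothetical helper, and the remount line in the usage comments assumes the recovery shell has mounted the root filesystem read-only, which may or may not be the case on your system.

```shell
# Hypothetical helper: truncate the named log files inside a directory
# to zero bytes without deleting them.
empty_logs() {
    log_dir="$1"
    shift
    for name in "$@"; do
        # ": >" truncates the file in place, so the file itself (with its
        # ownership and permissions) stays where the logging daemon expects it
        : > "$log_dir/$name"
    done
}

# Possible usage in the recovery root shell:
# mount -o remount,rw /      # only needed if root is mounted read-only
# empty_logs /var/log syslog kern.log
```

Truncating instead of deleting means the logging service can keep writing to the same files after the reboot.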
Once that is done, reboot your system. You should be able to log in. You should quickly change the grub parameters as discussed above. Adding pci=noaer should help you in this case.
I know it’s more of a workaround than a solution. But this problem troubled me for a long time, and this is what got me around the error. Otherwise, I would have had to reinstall the system.
I just wanted to share what worked for me with the community here. I hope it helps you as well.
This article is written by Arun Shrimali. Arun is IT Head at Resonance Institute in India and he tries to implement Open Source Software across his organization.
The article has been edited by Abhishek Prakash.
via: https://itsfoss.com/pcie-bus-error-severity-corrected/