TranslateProject/sources/tech/20200511 Start using systemd as a troubleshooting tool.md
DarkSun cdc3319ccb 选题: 20200511 Start using systemd as a troubleshooting tool
sources/tech/20200511 Start using systemd as a troubleshooting tool.md
2020-05-12 00:59:21 +08:00

22 KiB

Start using systemd as a troubleshooting tool

While systemd is not really a troubleshooting tool, the information in its output points the way toward solving problems. Magnifying glass on code

No one would really consider systemd to be a troubleshooting tool, but when I encountered a problem on my webserver, my growing knowledge of systemd and some of its features helped me locate and circumvent the problem.

The problem was that my server, yorktown, which provides name services, DHCP, NTP, HTTPD, and SendMail email services for my home office network, failed to start the Apache HTTPD daemon during normal startup. I had to start it manually after I realized that it was not running. The problem had been going on for some time, and I recently got around to trying to fix it.

Some of you will say that systemd itself is the cause of this problem, and, based on what I know now, I agree with you. However, I had similar types of problems with SystemV. (In the first article in this series, I looked at the controversy around systemd as a replacement for the old SystemV init program and startup scripts. If you're interested in learning more about systemd, read the second and third articles, too.) No software is perfect, and neither systemd nor SystemV is an exception, but systemd provides far more information for problem-solving than SystemV ever offered.

Determining the problem

The first step to finding the source of this problem is to determine the httpd service's status:

[root@yorktown ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-04-16 11:54:37 EDT; 15min ago
     Docs: man:httpd.service(8)
  Process: 1101 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
 Main PID: 1101 (code=exited, status=1/FAILURE)
   Status: "Reading configuration..."
      CPU: 60ms

Apr 16 11:54:35 yorktown.both.org systemd[1]: Starting The Apache HTTP Server...
Apr 16 11:54:37 yorktown.both.org httpd[1101]: (99)Cannot assign requested address: AH00072: make_sock: could not bind to address 192.168.0.52:80
Apr 16 11:54:37 yorktown.both.org httpd[1101]: no listening sockets available, shutting down
Apr 16 11:54:37 yorktown.both.org httpd[1101]: AH00015: Unable to open logs
Apr 16 11:54:37 yorktown.both.org systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Apr 16 11:54:37 yorktown.both.org systemd[1]: httpd.service: Failed with result 'exit-code'.
Apr 16 11:54:37 yorktown.both.org systemd[1]: Failed to start The Apache HTTP Server.
[root@yorktown ~]#

This status information is one of the systemd features that I find much more useful than anything SystemV offers. The amount of helpful information here leads me easily to a logical conclusion that takes me in the right direction. All I ever got from the old chkconfig command is whether or not the service is running and the process ID (PID) if it is. That is not very helpful.

The key entry in this status report shows that HTTPD cannot bind to the IP address, which means it cannot accept incoming requests. This indicates that the network is not starting fast enough to be ready for the HTTPD service to bind to the IP address because the IP address has not yet been set. This is not supposed to happen, so I explored my network service systemd startup configuration files; all appeared to be correct with the right "after" and "requires" statements. Here is the /lib/systemd/system/httpd.service file from my server:

# Modifying this file in-place is not recommended, because changes                                                                                    
# will be overwritten during package upgrades.  To customize the                                                                                      
# behaviour, run "systemctl edit httpd" to create an override unit.                                                                                  
                                                                                                                                                     
# For example, to pass additional options (such as -D definitions) to                                                                                
# the httpd binary at startup, create an override unit (as is done by                                                                                
# systemctl edit) and enter the following:                                                                                                            
                                                                                                                                                     
#       [Service]                                                                                                                                    
#       Environment=OPTIONS=-DMY_DEFINE                                                                                                              
                                                                                                                                                     
[Unit]                                                                                                                                                
Description=The Apache HTTP Server                                                                                                                    
Wants=httpd-init.service                                                                                                                              
After=network.target remote-fs.target nss-lookup.target httpd-init.service                                                                            
Documentation=man:httpd.service(8)                                                                                                                    
                                                                                                                                                     
[Service]                                                                                                                                            
Type=notify                                                                                                                                          
Environment=LANG=C                                                                                                                                    
                                                                                                                                                     
ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND                                                                                                      
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful                                                                                                      
# Send SIGWINCH for graceful stop                                                                                                                    
KillSignal=SIGWINCH                                                                                                                                  
KillMode=mixed                                                                                                                                        
PrivateTmp=true                                                                                                                                      
                                                                                                                                                     
[Install]                                                                                                                                            
WantedBy=multi-user.target

The httpd.service unit file explicitly specifies that it should load after the network.target and the httpd-init.service (among others). I tried to find all of these services using the systemctl list-units command and searching for them in the resulting data stream. All were present and should have ensured that the httpd service did not load before the network IP address was set.

First solution

A bit of searching on the internet confirmed that others had encountered similar problems with httpd and other services. This appears to happen because one of the required services indicates to systemd that it has finished its startup—but it actually spins off a child process that has not finished. After a bit more searching, I came up with a circumvention.

I could not figure out why the IP address was taking so long to be assigned to the network interface card. So, I thought that if I could delay the start of the HTTPD service by a reasonable amount of time, the IP address would be assigned by that time.

Fortunately, the /lib/systemd/system/httpd.service file above provides some direction. Although it says not to alter it, it does indicate how to proceed: Use the command systemctl edit httpd, which automatically creates a new file (/etc/systemd/system/httpd.service.d/override.conf) and opens the GNU Nano editor. (If you are not familiar with Nano, be sure to look at the hints at the bottom of the Nano interface.)

Add the following text to the new file and save it:

[root@yorktown ~]# cd /etc/systemd/system/httpd.service.d/
[root@yorktown httpd.service.d]# ll
total 4
-rw-r--r-- 1 root root 243 Apr 16 11:43 override.conf
[root@yorktown httpd.service.d]# cat override.conf
# Trying to delay the startup of httpd so that the network is
# fully up and running so that httpd can bind to the correct
# IP address
#
# By David Both, 2020-04-16

[Service]
ExecStartPre=/bin/sleep 30

The [Service] section of this override file contains a single line that delays the start of the HTTPD service by 30 seconds. The following status command shows the service status during the wait time:

[root@yorktown ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/httpd.service.d
           └─override.conf
           /usr/lib/systemd/system/httpd.service.d
           └─php-fpm.conf
   Active: activating (start-pre) since Thu 2020-04-16 12:14:29 EDT; 28s ago
     Docs: man:httpd.service(8)
Cntrl PID: 1102 (sleep)
    Tasks: 1 (limit: 38363)
   Memory: 260.0K
      CPU: 2ms
   CGroup: /system.slice/httpd.service
           └─1102 /bin/sleep 30

Apr 16 12:14:29 yorktown.both.org systemd[1]: Starting The Apache HTTP Server...
Apr 16 12:15:01 yorktown.both.org systemd[1]: Started The Apache HTTP Server.
[root@yorktown ~]#

And this command shows the status of the HTTPD service after the 30-second delay expires. The service is up and running correctly:

[root@yorktown ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/httpd.service.d
           └─override.conf
           /usr/lib/systemd/system/httpd.service.d
           └─php-fpm.conf
   Active: active (running) since Thu 2020-04-16 12:15:01 EDT; 1min 18s ago
     Docs: man:httpd.service(8)
  Process: 1102 ExecStartPre=/bin/sleep 30 (code=exited, status=0/SUCCESS)
 Main PID: 1567 (httpd)
   Status: "Total requests: 0; Idle/Busy workers 100/0;Requests/sec: 0; Bytes served/sec:   0 B/sec"
    Tasks: 213 (limit: 38363)
   Memory: 21.8M
      CPU: 82ms
   CGroup: /system.slice/httpd.service
           ├─1567 /usr/sbin/httpd -DFOREGROUND
           ├─1569 /usr/sbin/httpd -DFOREGROUND
           ├─1570 /usr/sbin/httpd -DFOREGROUND
           ├─1571 /usr/sbin/httpd -DFOREGROUND
           └─1572 /usr/sbin/httpd -DFOREGROUND

Apr 16 12:14:29 yorktown.both.org systemd[1]: Starting The Apache HTTP Server...
Apr 16 12:15:01 yorktown.both.org systemd[1]: Started The Apache HTTP Server.

I could have experimented to see if a shorter delay would work as well, but my system is not that critical, so I decided not to. It works reliably as it is, so I am happy.

Because I gathered all this information, I reported it to Red Hat Bugzilla as Bug 1825554. I believe that it is much more productive to report bugs than it is to complain about them.

The better solution

A couple of days after reporting this as a bug, I received a response indicating that systemd is just the manager, and if httpd needs to be ordered after some requirements are met, it needs to be expressed in the unit file. The response pointed me to the httpd.service man page. I wish I had found this earlier because it is a better solution than the one I came up with. This solution is explicitly targeted to the prerequisite target unit rather than a somewhat random delay.

From the httpd.service man page:

Starting the service at boot time

The httpd.service and httpd.socket units are disabled by default. To start the httpd service at boot time, run: systemctl enable httpd.service. In the default configuration, the httpd daemon will accept connections on port 80 (and, if mod_ssl is installed, TLS connections on port 443) for any configured IPv4 or IPv6 address.

If httpd is configured to depend on any specific IP address (for example, with a "Listen" directive) which may only become available during start-up, or if httpd depends on other services (such as a database daemon), the service must be configured to ensure correct start-up ordering.

For example, to ensure httpd is only running after all configured network interfaces are configured, create a drop-in file (as described above) with the following section:

[Unit] After=network-online.target Wants=network-online.target

I still think this is a bug because it is quite common—at least in my experience—to use a Listen directive in the httpd.conf configuration file. I have always used Listen directives, even on hosts with only a single IP address, and it is clearly necessary on hosts with multiple network interface cards (NICs) and internet protocol (IP) addresses. Adding the lines above to the /usr/lib/systemd/system/httpd.service default file would not cause problems for configurations that do not use a Listen directive and would prevent this problem for those that do.

In the meantime, I will use the suggested solution.

Next steps

This article describes a problem I had with starting the Apache HTTPD service on my server. It leads you through the problem determination steps I took and shows how I used systemd to assist. I also covered the circumvention I implemented using systemd and the better solution that followed from my bug report.

As I mentioned at the start, it is very likely that this is the result of a problem with systemd, specifically the configuration for httpd startup. Nevertheless, systemd provided me with the tools to locate the likely source of the problem and to formulate and implement a circumvention. Neither solution really resolves the problem to my satisfaction. For now, the root cause of the problem still exists and must be fixed. If that is simply adding the recommended lines to the /usr/lib/systemd/system/httpd.service file, that would work for me.

One of the things I discovered during this is process is that I need to learn more about defining the sequences in which things start. I will explore that in my next article, the fifth in this series.

Resources

There is a great deal of information about systemd available on the internet, but much is terse, obtuse, or even misleading. In addition to the resources mentioned in this article, the following webpages offer more detailed and reliable information about systemd startup.

  • The Fedora Project has a good, practical guide to systemd. It has pretty much everything you need to know in order to configure, manage, and maintain a Fedora computer using systemd.
  • The Fedora Project also has a good cheat sheet that cross-references the old SystemV commands to comparable systemd ones.
  • For detailed technical information about systemd and the reasons for creating it, check out Freedesktop.org's description of systemd.
  • Linux.com's "More systemd fun" offers more advanced systemd information and tips.

There is also a series of deeply technical articles for Linux sysadmins by Lennart Poettering, the designer and primary developer of systemd. These articles were written between April 2010 and September 2011, but they are just as relevant now as they were then. Much of everything else good that has been written about systemd and its ecosystem is based on these papers.


via: https://opensource.com/article/20/5/systemd-troubleshooting-tool

作者:David Both 选题:lujun9972 译者:译者ID 校对:校对者ID

本文由 LCTT 原创编译,Linux中国 荣誉推出