2018/05/30

Wednesday, May 30, 2018 (#150)

Solved

There were, I think, two main problems:

  • The Let's Encrypt configuration had not been properly set up after the server migration in April
  • Nginx was correctly configured to respond to IPv6 on port 443 (https), but not on port 80 (http).
    • This was the problem that took the most time to identify, as it caused the letsencrypt command to fail even though the URL appeared to be returning the proper file.
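
A mismatch like this can be caught by requesting the same URL once per address family and comparing the results. The following is a minimal sketch, assuming curl is installed; http_status and compare_families are hypothetical helper names, not part of any tool used in this incident.

```shell
#!/bin/sh
# http_status FAMILY URL
# Print the HTTP status code returned for URL when curl is forced to use
# IPv4 (FAMILY=4) or IPv6 (FAMILY=6). curl prints 000 if it cannot connect.
http_status() {
    family=$1
    url=$2
    curl "-$family" --silent --output /dev/null --max-time 10 \
         --write-out '%{http_code}' "$url"
}

# compare_families URL
# Fetch URL over both address families, print both status codes, and
# succeed only if they are identical.
compare_families() {
    v4=$(http_status 4 "$1")
    v6=$(http_status 6 "$1")
    echo "IPv4: $v4  IPv6: $v6"
    [ "$v4" = "$v6" ]
}
```

Running compare_families against a challenge URL on port 80 from a machine that has IPv6 connectivity should show whether the two families disagree. From an IPv4-only connection the IPv6 request cannot connect, so the check has to be run from an IPv6-enabled host.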

Confounding factors included:

  • HSTS was turned on. (This is a policy that browsers cache from the Strict-Transport-Security response header rather than part of the SSL cert, so it persists after the server stops redirecting.) Existing browser sessions therefore insisted on self-redirecting from http to https even when I had turned that off in Nginx. Workaround: only attempt access in an anonymous browser window. (Using wget to do the test also might have worked; ultimately, it was use of wget from an IPv6-enabled server that provided the final clue.)
  • My home internet (and apparently everyone else's) uses only IPv4, while Let's Encrypt uses IPv6 if it is available – so Let's Encrypt was consistently reporting a problem that we weren't seeing, leading us to think (basically) that we were misunderstanding the messages.
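
Both confounding factors can be checked from the command line without a browser. This is a rough sketch, assuming curl and dig are available; hsts_header and has_aaaa are hypothetical helper names.

```shell
#!/bin/sh
# hsts_header URL
# Print the Strict-Transport-Security response header for URL, if the
# server sends one. Fails (non-zero exit) when the header is absent.
hsts_header() {
    curl --silent --head --max-time 10 "$1" \
        | grep -i '^strict-transport-security:'
}

# has_aaaa DOMAIN
# Succeed if DOMAIN publishes at least one AAAA (IPv6) record, which is
# the case in which an IPv6-capable client may connect over IPv6.
has_aaaa() {
    [ -n "$(dig +short AAAA "$1")" ]
}
```

For example, hsts_header https://toot.cat would show whether the policy is still being sent, and has_aaaa toot.cat would show whether IPv6 is in play at all for the domain.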

Cleanup to do:

  • It's not clear whether Let's Encrypt is now set up properly; I did the challenge installation by hand.
  • I turned off all http -> https redirects in Nginx in order to be absolutely certain that Nginx wasn't causing the redirects I was still seeing (because of HSTS).
    • 2018-06-28 This was causing full-page 403 errors for some users; I reverted the Nginx config file and it seems to be fixed now.
      • commented out the port 80 no-redirect block
      • uncommented the old port 80 redirect block
      • added an IPv6 listen directive to the uncommented block
      • /etc/init.d/nginx reload
  • A user now reports some IPv6 issues on the https side as well. (Addressing this first.)
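
To double-check that a config file has both kinds of listener for a given port, a plain text match is often enough. The sketch below assumes listen directives are written one per line in the common "listen 80;" and "listen [::]:80;" forms; listens_on_both is a hypothetical helper, and the sample server block is illustrative rather than a copy of the real toot.cat config.

```shell
#!/bin/sh
# listens_on_both CONF PORT
# Succeed only if CONF contains both an IPv4 and an IPv6 listen directive
# for PORT. This is a text match, not a parser: it does not check that the
# two directives are in the same server block.
listens_on_both() {
    conf=$1
    port=$2
    grep -Eq "^[[:space:]]*listen[[:space:]]+${port}([[:space:];]|\$)" "$conf" &&
    grep -Eq "^[[:space:]]*listen[[:space:]]+\[::\]:${port}([[:space:];]|\$)" "$conf"
}

# Hypothetical port 80 redirect block, written to a temporary file so the
# check can be tried without touching /etc/nginx.
sample=$(mktemp)
cat > "$sample" <<'EOF'
server {
    listen 80;
    listen [::]:80;
    server_name toot.cat;
    return 301 https://$host$request_uri;
}
EOF

if listens_on_both "$sample" 80; then
    echo "port 80: IPv4 and IPv6 listeners present"
else
    echo "port 80: missing an IPv4 or IPv6 listener"
fi
rm -f "$sample"
```

Against the real file this would be run as listens_on_both /etc/nginx/sites-available/tootcat.conf 80, and again with 443 for the https side.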

Relevant Files

  • /etc/cron.daily/letsencrypt-renew - the cron job to check/renew the cert
  • /etc/letsencrypt/renewal/toot.cat.conf - Let's Encrypt configuration
  • /etc/nginx/sites-available/tootcat.conf - Nginx configuration
  • /var/log/nginx/access.log - Nginx access log (showed 404 when accessing via IPv6)
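
The access log is where the IPv6 404 became visible. A small filter along these lines can make that easier to spot; it is a sketch that assumes Nginx's default "combined" log format (client address in field 1, request path in field 7, status in field 9), and challenge_hits is a hypothetical helper name.

```shell
#!/bin/sh
# challenge_hits LOGFILE
# For every ACME challenge request in an Nginx access log, print the
# client's address family, its address, the response status, and the path.
challenge_hits() {
    awk '$7 ~ /^\/\.well-known\/acme-challenge\// {
        # An address containing a colon is treated as IPv6.
        family = ($1 ~ /:/) ? "IPv6" : "IPv4"
        print family, $1, $9, $7
    }' "$1"
}
```

Running challenge_hits /var/log/nginx/access.log right after a renewal attempt lists which clients asked for the challenge file and what each one was served.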

Narration / Notes

Phase 1

It looks like nginx was set to use a different set of certificate files than the ones Let's Encrypt was set to renew.

Tentatively, LE goes through all the .conf files in /etc/letsencrypt/renewal and renews each one.

There was only one, and it pointed at files in /etc/letsencrypt/live/tootcat2.hypertwins.net/

I've changed it to point to /etc/letsencrypt/live/toot.cat/

Nginx also looks for 2 cert files in /etc/letsencrypt/live/toot.cat/, so now at least we're matched.
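
The comparison I did by eye can be sketched as a script. This assumes the Nginx config names the certificate with an ssl_certificate directive and that the renewal file has a top-level "fullchain = ..." line; nginx_cert, renewal_cert, and certs_match are hypothetical helper names.

```shell
#!/bin/sh
# nginx_cert CONF
# Print the first ssl_certificate path found in an Nginx config file.
# The pattern requires whitespace after the directive name, so
# ssl_certificate_key lines are not matched.
nginx_cert() {
    sed -n 's/^[[:space:]]*ssl_certificate[[:space:]]\{1,\}\([^;]*\);.*/\1/p' "$1" \
        | head -n 1
}

# renewal_cert CONF
# Print the fullchain path from a Let's Encrypt renewal config file.
renewal_cert() {
    sed -n 's/^fullchain[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# certs_match NGINX_CONF RENEWAL_CONF
# Show both paths and succeed only if they are non-empty and identical.
certs_match() {
    n=$(nginx_cert "$1")
    r=$(renewal_cert "$2")
    echo "nginx:   $n"
    echo "renewal: $r"
    [ -n "$n" ] && [ "$n" = "$r" ]
}
```

If Nginx is pointed at a file other than fullchain.pem, the comparison would need to use the matching key from the renewal file instead.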

When I try to renew with letsencrypt renew, I get:

root@tootcat2:/# letsencrypt renew
Processing /etc/letsencrypt/renewal/toot.cat.conf
2018-05-31 00:11:15,708:ERROR:letsencrypt.error_handler:Encountered exception during recovery
2018-05-31 00:11:15,709:ERROR:letsencrypt.error_handler:Missing --webroot-path for domain: toot.cat
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/letsencrypt/error_handler.py", line 74, in call_registered
    self.funcs[-1]()
  File "/usr/lib/python2.7/dist-packages/letsencrypt/auth_handler.py", line 280, in _cleanup_challenges
    self.dv_auth.cleanup(dv_c)
  File "/usr/lib/python2.7/dist-packages/letsencrypt/plugins/webroot.py", line 139, in cleanup
    root_path = self._get_root_path(achall)
  File "/usr/lib/python2.7/dist-packages/letsencrypt/plugins/webroot.py", line 108, in _get_root_path
    .format(achall.domain))
PluginError: Missing --webroot-path for domain: toot.cat
2018-05-31 00:11:15,711:WARNING:letsencrypt.cli:Attempting to renew cert from /etc/letsencrypt/renewal/toot.cat.conf produced an unexpected error: Missing --webroot-path for domain: toot.cat. Skipping.

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/toot.cat/fullchain.pem (failure)
1 renew failure(s), 0 parse failure(s)
root@tootcat2:/# 

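The key piece of information there seems to be "Missing --webroot-path for domain", which the traceback shows being raised by the webroot plugin when it has no webroot path for toot.cat. What also seems to be happening is that Nginx is redirecting from http to https even though the file exists.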
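
One thing worth checking for this error is whether the renewal file records a webroot for the domain at all. The sketch below assumes the renewal file stores per-domain webroots in a [[webroot_map]] section with "domain = path" lines, which is my understanding of the format and should be verified against the actual file; webroot_for is a hypothetical helper name.

```shell
#!/bin/sh
# webroot_for RENEWAL_CONF DOMAIN
# Print the webroot mapped to DOMAIN in the [[webroot_map]] section of a
# renewal config. Exits non-zero if no mapping for DOMAIN is found.
webroot_for() {
    awk -v domain="$2" '
        /^\[\[webroot_map\]\]/ { in_map = 1; next }   # section starts
        /^\[/                  { in_map = 0 }         # any other section ends it
        in_map {
            split($0, kv, / *= */)
            if (kv[1] == domain) { print kv[2]; found = 1 }
        }
        END { exit !found }
    ' "$1"
}
```

For example, webroot_for /etc/letsencrypt/renewal/toot.cat.conf toot.cat would print the configured directory, or fail if the mapping is missing.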

Phase 2

It turned out there was an HSTS policy that was forcing the browser to redirect even though Nginx wasn't, I think? But opening a toot.cat URL in an anonymous window fixed that. However, it still wasn't finding the test file, so I changed the webroot on both Nginx and Let's Encrypt to /var/www/challenges, and then was able to access the test file.

...but Let's Encrypt still returns this:

root@tootcat2:/var/www/challenges# letsencrypt renew
Processing /etc/letsencrypt/renewal/toot.cat.conf
2018-05-31 00:49:39,460:ERROR:letsencrypt.error_handler:Encountered exception during recovery
2018-05-31 00:49:39,460:ERROR:letsencrypt.error_handler:Missing --webroot-path for domain: toot.cat
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/letsencrypt/error_handler.py", line 74, in call_registered
    self.funcs[-1]()
  File "/usr/lib/python2.7/dist-packages/letsencrypt/auth_handler.py", line 280, in _cleanup_challenges
    self.dv_auth.cleanup(dv_c)
  File "/usr/lib/python2.7/dist-packages/letsencrypt/plugins/webroot.py", line 139, in cleanup
    root_path = self._get_root_path(achall)
  File "/usr/lib/python2.7/dist-packages/letsencrypt/plugins/webroot.py", line 108, in _get_root_path
    .format(achall.domain))
PluginError: Missing --webroot-path for domain: toot.cat
2018-05-31 00:49:39,462:WARNING:letsencrypt.cli:Attempting to renew cert from /etc/letsencrypt/renewal/toot.cat.conf produced an unexpected error: Missing --webroot-path for domain: toot.cat. Skipping.

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/toot.cat/fullchain.pem (failure)
1 renew failure(s), 0 parse failure(s)
root@tootcat2:/var/www/challenges# 

Phase 3

root@tootcat2:/var/www/challenges/.well-known# letsencrypt renew
Processing /etc/letsencrypt/renewal/toot.cat.conf
2018-05-31 01:30:48,656:WARNING:letsencrypt.cli:Attempting to renew cert from /etc/letsencrypt/renewal/toot.cat.conf produced an unexpected error: urn:acme:error:rateLimited :: There were too many requests of a given type :: Error creating new authz :: too many failed authorizations recently: see https://letsencrypt.org/docs/rate-limits/. Skipping.

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/toot.cat/fullchain.pem (failure)
1 renew failure(s), 0 parse failure(s)
root@tootcat2:/var/www/challenges/.well-known# 
root@tootcat2:/var/www/challenges/.well-known# letsencrypt --staging renew
Processing /etc/letsencrypt/renewal/toot.cat.conf
2018-05-31 01:36:59,134:WARNING:letsencrypt.cli:Attempting to renew cert from /etc/letsencrypt/renewal/toot.cat.conf produced an unexpected error: You should register before running non-interactively, or provide --agree-tos and --email <email_address> flags. Skipping.

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/toot.cat/fullchain.pem (failure)
1 renew failure(s), 0 parse failure(s)
root@tootcat2:/var/www/challenges/.well-known# 

Phase 4

root@tootcat2:/var/www/challenges/.well-known# letsencrypt --dry-run renew
Processing /etc/letsencrypt/renewal/toot.cat.conf
2018-05-31 01:42:09,149:WARNING:letsencrypt.client:Registering without email!
2018-05-31 01:42:09,992:WARNING:letsencrypt.cli:Attempting to renew cert from /etc/letsencrypt/renewal/toot.cat.conf produced an unexpected error: Missing command line flag or config entry for this setting:
Please read the Terms of Service at https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf. You must agree in order to register with the ACME server at https://acme-staging.api.letsencrypt.org/directory

(You can set this with the --agree-tos flag). Skipping.
** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
**          (The test certificates below have not been saved.)

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/toot.cat/fullchain.pem (failure)
** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
**          (The test certificates above have not been saved.)
1 renew failure(s), 0 parse failure(s)
root@tootcat2:/var/www/challenges/.well-known# 
root@tootcat2:/var/www/challenges/.well-known# letsencrypt --dry-run --agree-tos renew
Processing /etc/letsencrypt/renewal/toot.cat.conf
2018-05-31 01:52:59,407:WARNING:letsencrypt.cli:Attempting to renew cert from /etc/letsencrypt/renewal/toot.cat.conf produced an unexpected error: Failed authorization procedure. toot.c$
t (http-01): urn:acme:error:unauthorized :: The client lacks sufficient authorization :: Invalid response from http://toot.cat/.well-known/acme-challenge/tZFwZb9H2at2brdJYexpRqTDYOigSbT$
J_6oL3AXwBQ: [404 errors]. Skipping.
** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
**          (The test certificates below have not been saved.)

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/toot.cat/fullchain.pem (failure)
** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
**          (The test certificates above have not been saved.)
1 renew failure(s), 0 parse failure(s)

[...]

root@tootcat2:/var/www/challenges/.well-known# letsencrypt --dry-run --agree-tos --manual renew
Processing /etc/letsencrypt/renewal/toot.cat.conf
2018-05-31 01:59:54,703:WARNING:letsencrypt.cli:Attempting to renew cert from /etc/letsencrypt/renewal/toot.cat.conf produced an unexpected error: Missing command line flag or config entry for this setting:
NOTE: The IP of this machine will be publicly logged as having requested this certificate. If you're running letsencrypt in manual mode on a machine that is not your server, please ensure you're okay with that.

Are you OK with your IP being logged?


(You can set this with the --manual-public-ip-logging-ok flag). Skipping.
** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
**          (The test certificates below have not been saved.)

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/toot.cat/fullchain.pem (failure)
** DRY RUN: simulating 'letsencrypt renew' close to cert expiry
**          (The test certificates above have not been saved.)
1 renew failure(s), 0 parse failure(s)
root@tootcat2:/var/www/challenges/.well-known# 

Phase 5

Relevant excerpts from Discord:

Ok, yeah, same error. It's getting a 404. But the URL it claims to be accessing is returning the challenge.
(Just verified again, pasting URL from the logfile.)
It'd be nice if it would include the whole URL, including the domain and protocol.
Wait, it did. I'm just tired.
It's as if the LE remote client is reaching a different server.
Is there any logging of IP address...
Can't see any.
Okay, next idea; check Nginx server log and see what URL is being requested. If any.
[...]
I'm trying to do a close tracking of what request the server sees when I do the test run, but my ADD gets worse when I'm tired and I forget what I was doing between one screen and the next.
Ok, I've caught the server actually returning a 404!
Checking URL...
URL is fine, except I'm making an assumption about the domain.
I think that must be the problem.
Somehow.
There is another domain on that server, and it goes to a different webroot.
oh, ffu.... this MIGHT be an IPv6 issue. >.<
[...]
I can test this from another DO server. They all have IPv6 and tend to default to it.
[...]
It's IPv6.
[...]
I just got a 404 from the same URL that works from here.
Ok, this should be tractable now.
I just have to find an example of correct ipv6 Nginx config.
[...]
I can probably just copy from the https section.
Found.
Now restart nginx...
Or reload, I guess.
200 OK
So... do I dare try LE the easy way? No, I'd better do it manually again.
Actually, better do a manual dry-run first.
TEST SUCCESSFUL
[...]
Now the real thing (but manually again, because I have zero faith that the other part of the problem is solved).
heh... URL ends in UwU...
[...]
I got a success message from LE, but still getting cert error in browser.
owait, probly need to restart nginx.
<hopes that's all it is>
Yeah, the standard cron script does that too. Ok...
OMGSUCCESS.