Yesterday at 01:00.07 my HA (Raspberry Pi 4, 8 GB) crashed, or at least it dropped off the network and was unresponsive in physical tests. This has never happened before, and I am trying to work out how to identify what caused it.
- Everything was up to date as of last week.
- Looking at it, it seemed to be running, but:
- It was not connecting to the network: no IP, no ping possible.
- It was not actually running any automations, even ones that do not need the network (e.g. flow meter turning on a relay).
- The SSD was active (green/red light as usual) and had plenty of free storage.
- Eventually, while debugging, I added an additional power source for the SSD.
After repeated power cycles (about 10 during the day, with roughly an hour between each to be sure) it actually started. It took 3 more attempts after I added the extra power source on the SSD, so I don’t think that was the solution since it did not work right away, but I left it connected “just in case”.
On startup:
- The immediate error in the interface was that the supervisor for Bluetooth did not start, so I disabled it. (I think this was just a side effect of it having trouble starting, but maybe this was the cause?)
- No logs or errors showed up on startup that I had not seen previously.
- It was very slow and behaving strangely. Thinking I should reflash the card, I ran a backup first to make sure I had the data. The backup is now 2 GB, whereas it had swelled quite suddenly to 4.8 GB in recent weeks, filling my local card, so a few weeks ago I moved the data to the SSD. (Figuring out what was causing the growth was on my to-do list, but I never identified it, and now it seems to be gone?) So what did I lose, and which backup should I restore from?
- I then did a full reboot.
- It now seems to be running fine, e.g. water flow is triggering the relay and I can see it running physically.
- It did not crash again at 01:00.xx.
- Processor logs show that CPU usage was quite high on Monday at 01:00.xx, and there are even some readings after that point, but nothing else was recorded (e.g. water flow), so it seems something else broke. Disconnected from the network? (Unfortunately I only turned on the processor logs, along with storage space, a few days ago, so I cannot tell whether this is normal for a “Monday @ 01:00.xx”.)
- Logs for the water flow counter show it was working, and the relay state indicates it likely turned on, but I know it did not actually turn on in the physical tests, which I tried multiple times.
Any help is appreciated, e.g.:
How can I identify what could be the cause? Or should I just do a restore?
Any ideas on how to figure out what data has been pruned between backups (the now-missing 2.8 GB!?), and which would you restore from: the 4.8 GB backup from a few days ago or the current 2 GB one?
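
To make the "what was pruned" question concrete, here is a rough sketch of how the two backups could be compared. It assumes both backups are ordinary tar archives that can be listed without a password and that each contains one inner archive per component (core config, each add-on, and so on); I have not confirmed that for my own files. The function names are made up for illustration. Sizes reported for inner archives are their compressed sizes, so this only shows which component shrank, not which files inside it.

```python
import tarfile


def member_sizes(archive_path):
    """Return {member name: size in bytes} for the regular files in a tar archive."""
    # "r:*" lets tarfile detect compression (none, gzip, bz2, xz) on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        return {m.name: m.size for m in archive.getmembers() if m.isfile()}


def size_changes(old_path, new_path):
    """List (name, old_size, new_size) for members whose size differs.

    A size of 0 means the member is missing from that archive (or empty).
    The members that shrank the most come first.
    """
    old = member_sizes(old_path)
    new = member_sizes(new_path)
    changes = [
        (name, old.get(name, 0), new.get(name, 0))
        for name in sorted(old.keys() | new.keys())
        if old.get(name, 0) != new.get(name, 0)
    ]
    # Sort by size delta (new - old), most negative first.
    return sorted(changes, key=lambda change: change[2] - change[1])
```

Calling `size_changes` with the path of the 4.8 GB backup and then the path of the 2 GB backup, and printing the first few entries, should point at whichever component accounts for the missing 2.8 GB.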