UPS and Monitoring
After much deliberation over the years, I have finally invested in a UPS to stop my network equipment from being affected by the short power cuts we seem to have each year. I debated for quite a while over the merits of getting a UPS when the Tesla Powerwall 2 can be made into an off-grid system, and I may still get the Tesla Powerwall 2 Backup Gateway at some point, but a recent double power outage had me press the buy button 🙁
I bought the SMT750RMI2UC because it can be rack mounted (eventually) and because I don’t like the shape of the ones that are designed for desktop usage. The delivery guy who dropped off the box made it look effortless to pick up. I tried picking it up and failed on the first go, as I wasn’t expecting it to be quite so heavy… 😂 That, and covid made me lose a lot of strength 😢
After a couple of hours re-arranging the makeshift server cupboard, the UPS was up and running with all of the key devices in the cupboard powered by it. P.S. The featured pic is from Pexels and doesn’t represent my server cupboard 😂
Having the UPS up and running is one thing, but I need to know when it’s running normally and when it has taken over from the mains, so on one of the servers in the cupboard I’ve installed apcupsd to monitor the UPS every minute. Until I set up apcupsd, I hadn’t realised that ssmtp on the server wasn’t correctly configured. apcupsd was sending emails because I hadn’t finished configuring it to use the USB cable, but they were going to root@hostname, which obviously isn’t a valid email address! This particular issue took me quite a while to work out. I eventually found this really helpful Ask Ubuntu post https://askubuntu.com/questions/643873/how-to-get-ssmtp-to-map-local-user-to-email-address-for-the-to-field and got ssmtp set up correctly.
The next step was to get data from the UPS into Splunk every minute. It seems quite a few people have solved the same problem, but I didn’t find any programs that I wanted to use, so I decided to write a script that cron could execute every minute to send the data to Splunk in JSON format. Then I needed a way of receiving a notification in case the email didn’t get triggered. Plus, email is a bit clunky; I much prefer the little ping notification from Slack 😀
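As a rough illustration of the conversion step, here is a minimal sketch that turns "KEY : value" lines into one JSON object. It assumes the status output has one pair per line, and unlike plain string concatenation it escapes double quotes and backslashes in the values. The helper names status_to_json and trim_and_escape are made up for this example, and the sample input is invented to resemble status lines rather than copied from a real UPS.

```shell
#!/bin/bash
# Sketch only: status_to_json and trim_and_escape are hypothetical helper names,
# and the sample input at the bottom is made up to resemble "KEY : value" lines.

# Trim surrounding whitespace, then escape backslashes and double quotes for JSON.
trim_and_escape() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Read "KEY : value" lines on stdin and print a single JSON object.
status_to_json() {
    local json line key value
    json=""
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip anything that is not a key/value line.
        case "$line" in
            *:*) ;;
            *) continue ;;
        esac
        # Split on the first colon only, so values such as timestamps stay intact.
        key=$(printf '%s\n' "${line%%:*}" | trim_and_escape)
        value=$(printf '%s\n' "${line#*:}" | trim_and_escape)
        json="${json}\"${key}\":\"${value}\","
    done
    # Remove the trailing comma before closing the object.
    printf '{%s}\n' "${json%,}"
}

printf 'STATUS   : ONLINE \nBCHARGE  : 100.0 Percent\nDATE     : 2021-03-01 12:00:01 +0000\n' | status_to_json
```

For the sample input this prints {"STATUS":"ONLINE","BCHARGE":"100.0 Percent","DATE":"2021-03-01 12:00:01 +0000"}. Control characters are not escaped, so it is still only a sketch rather than a full JSON encoder.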
crontab entry:
* * * * * /home/victoria/scripts/cron/apc-ups.sh > /home/victoria/scripts/cron/apc-ups.sh.log
apc-ups.sh:
#!/bin/bash

# Delete the previous run's output; -f avoids an error when the file does not exist yet.
function remove_output_file() {
    rm -f "${OUTPUT_FILE}"
}

# Dump the current UPS status into the output file and echo it to the log.
function gather_data_from_ups() {
    echo "gathering data from the ups"
    /sbin/apcaccess status > "${OUTPUT_FILE}"
    echo "found this data and put it into file"
    cat "${OUTPUT_FILE}"
}

# Stop early if apcaccess produced no output.
function validate_output_file_has_data() {
    linesFound=$(wc -l < "${OUTPUT_FILE}")
    echo "lines found '${linesFound}'"
    if [ "${linesFound}" -eq "0" ]; then
        echo "Failed to gather data from the UPS successfully"
        exit 1
    fi
}

# Append each "NAME : value" line to jsonOutput as a "NAME":"value" pair.
# Values are not JSON-escaped, so this relies on the output containing no double quotes or backslashes.
function generate_output_json() {
    while read -r line || [[ -n "$line" ]]; do
        echo "line '$line'"
        # Everything before the first colon is the name; everything after it is the value.
        fieldName=$(echo "$line" | cut -d":" -f 1 | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
        fieldValue=$(echo "$line" | cut -d":" -f 1 --complement | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
        echo "fieldName '${fieldName}' fieldValue '${fieldValue}'"
        jsonOutput="${jsonOutput}\"${fieldName}\":\"${fieldValue}\","
    done < "${OUTPUT_FILE}"
    # Drop the trailing comma and close the JSON object.
    jsonOutput="${jsonOutput::-1}}"
    echo "json = ${jsonOutput}"
}

# Send the JSON object to the Splunk HTTP Event Collector.
function post_to_splunk() {
    curl -s -k "${SPLUNK_COLLECTOR_URL}" \
        -H "Authorization: Splunk ${SPLUNK_TOKEN}" \
        --data "{\"sourcetype\": \"json_no_timestamp\", \"event\": ${jsonOutput}}"
}

# Post a plain text message to the Slack webhook.
function post_slack_message() {
    local message="$1"
    local payload="{\"text\": \"${message}\"}"
    curl -s \
        -X POST \
        -H 'Content-type: application/json' \
        --data "${payload}" \
        "${SLACK_WEBHOOK_URL}"
}

# Alert once when the status changes. The alert file records that an alert is outstanding,
# so the same message is not repeated every minute.
function send_alert_to_slack() {
    status=$(echo "${jsonOutput}" | jq --raw-output .STATUS)
    echo "STATUS is '${status}'"
    if [ "${status}" == "ONLINE" ]; then
        if [ -f "${ALERT_FILE}" ]; then
            echo "Sending notification UPS back up and running"
            post_slack_message "${SLACK_MESSAGE_MAINS}"
            rm "${ALERT_FILE}"
        else
            echo "skipping sending alert for ups online as alert file not found"
        fi
    elif [ "${status}" == "ONBATT" ]; then
        if [ -f "${ALERT_FILE}" ]; then
            echo "skipping sending alert for ups on battery status as alert file found"
        else
            echo "Sending alert for UPS status on battery"
            post_slack_message "${SLACK_MESSAGE_ON_BATTERY}"
            touch "${ALERT_FILE}"
        fi
    else
        if [ -f "${ALERT_FILE}" ]; then
            echo "skipping sending alert for ups error status as alert file found"
        else
            echo "Sending alert for UPS status"
            post_slack_message "${SLACK_MESSAGE_UNKNOWN}"
            touch "${ALERT_FILE}"
        fi
    fi
}

ALERT_FILE=/home/victoria/scripts/cron/apc-ups.sh.alert
OUTPUT_FILE=/home/victoria/scripts/cron/apc-status.output
SLACK_MESSAGE_MAINS="<!channel> UPS is back on mains :large_green_square:"
SLACK_MESSAGE_ON_BATTERY="<!channel> WARNING: UPS is running from battery! :large_red_square:"
SLACK_MESSAGE_UNKNOWN="<!channel> WARNING: UPS status unexpected - please check :large_yellow_square:"
SLACK_WEBHOOK_URL="<WEBHOOK URL>"
SPLUNK_COLLECTOR_URL="http://<HOST:PORT>/services/collector"
SPLUNK_TOKEN="<TOKEN UUID>"

echo "starting script apc-ups.sh"
remove_output_file
gather_data_from_ups
validate_output_file_has_data
jsonOutput="{"
generate_output_json
# Alert first; otherwise, when Splunk is down, the Splunk call might fail and the alert would not be sent.
send_alert_to_slack
post_to_splunk
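The alerting logic in send_alert_to_slack is a small state machine: the alert file records whether an alert is outstanding, so a message goes out only when the state changes. The sketch below mirrors that branching but prints a decision instead of calling Slack, which makes it easy to replay a sequence of statuses. The names next_action and replay are made up for this example, and the marker file is a temporary file rather than the real alert file.

```shell
#!/bin/bash
# Sketch only: next_action and replay are hypothetical names. They mirror the
# branching in send_alert_to_slack but print a decision instead of posting to Slack.

# Print which notification (if any) is due for a status, given the marker file.
next_action() {
    local status="$1" alert_file="$2"
    if [ "$status" = "ONLINE" ]; then
        # Only announce recovery if an alert was sent earlier.
        if [ -f "$alert_file" ]; then echo "recovered"; else echo "none"; fi
    elif [ -f "$alert_file" ]; then
        # An alert is already outstanding, so stay quiet.
        echo "none"
    elif [ "$status" = "ONBATT" ]; then
        echo "on_battery"
    else
        echo "unexpected"
    fi
}

# Replay a sequence of statuses, updating the marker file the way the script does.
replay() {
    local alert_file="$1" status action
    shift
    for status in "$@"; do
        action=$(next_action "$status" "$alert_file")
        case "$action" in
            recovered) rm -f "$alert_file" ;;
            on_battery|unexpected) touch "$alert_file" ;;
        esac
        echo "${status} -> ${action}"
    done
}

marker=$(mktemp)
rm -f "$marker"   # start with no outstanding alert
replay "$marker" ONLINE ONBATT ONBATT ONLINE
rm -f "$marker"
```

The replay prints "ONLINE -> none", "ONBATT -> on_battery", "ONBATT -> none" and "ONLINE -> recovered", one per line. Because a single marker file covers both alert types, a move from an unexpected status straight to ONBATT stays silent, which may or may not be what you want.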
Splunk dashboard XML:
<dashboard theme="dark">
<label>UPS</label>
<row>
<panel>
<title>Status</title>
<single>
<search>
<query>index=ups
| fields STATUS
| tail 1
| eval text=case(STATUS=="ONBATT","Battery",STATUS=="ONLINE","Mains",true(),"?")
| fields text
| eval range=case(text=="Battery","severe",text=="Mains","low",true(),"high")</query>
<earliest>rt-30m</earliest>
<latest>rt</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="colorBy">value</option>
<option name="colorMode">block</option>
<option name="drilldown">none</option>
<option name="height">200</option>
<option name="numberPrecision">0</option>
<option name="rangeColors">["0x53a051","0x0877a6","0xdc4e41"]</option>
<option name="rangeValues">[1,2]</option>
<option name="refresh.display">progressbar</option>
<option name="showSparkline">1</option>
<option name="showTrendIndicator">1</option>
<option name="trellis.enabled">0</option>
<option name="trellis.scales.shared">1</option>
<option name="trellis.size">medium</option>
<option name="trendColorInterpretation">standard</option>
<option name="trendDisplayMode">absolute</option>
<option name="unitPosition">after</option>
<option name="useColors">0</option>
<option name="useThousandSeparators">1</option>
</single>
</panel>
</row>
<row>
<panel>
<title>Battery Percentage</title>
<single>
<search>
<query>index=ups
| rex field=BCHARGE "(?&lt;battery_percentage&gt;.*) Percent"
| fields battery_percentage
| tail 1</query>
<earliest>rt-30m</earliest>
<latest>rt</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="colorMode">block</option>
<option name="drilldown">none</option>
<option name="height">200</option>
<option name="numberPrecision">0</option>
<option name="rangeColors">["0x53a051","0xdc4e41","0xf1813f","0xf8be34","0x53a051"]</option>
<option name="rangeValues">[0,25,50,75]</option>
<option name="refresh.display">progressbar</option>
<option name="unit">%</option>
<option name="unitPosition">after</option>
<option name="useColors">1</option>
<option name="useThousandSeparators">1</option>
</single>
</panel>
<panel>
<title>Time Remaining</title>
<single>
<search>
<query>index=ups
| rex field=TIMELEFT "(?&lt;battery_time_left&gt;.*) Minutes"
| fields battery_time_left
| tail 1</query>
<earliest>rt-30m</earliest>
<latest>rt</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="colorMode">block</option>
<option name="drilldown">none</option>
<option name="height">200</option>
<option name="numberPrecision">0</option>
<option name="rangeColors">["0x53a051","0xdc4e41","0xf1813f","0x53a051"]</option>
<option name="rangeValues">[0,20,45]</option>
<option name="refresh.display">progressbar</option>
<option name="unit">Mins</option>
<option name="useColors">1</option>
</single>
</panel>
</row>
<row>
<panel>
<title>On Battery Events Last 90 Days</title>
<viz type="heat-map-viz.heat-map-viz">
<search>
<query>index=ups STATUS=ONBATT
| timechart count as "onbattery" by status</query>
<earliest>rt-90d</earliest>
<latest>rtnow</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="drilldown">none</option>
<option name="heat-map-viz.heat-map-viz.colorCritical">#DC4E41</option>
<option name="heat-map-viz.heat-map-viz.colorHigh">#F1813F</option>
<option name="heat-map-viz.heat-map-viz.colorLow">#53A051</option>
<option name="heat-map-viz.heat-map-viz.colorMedium">#F8BE34</option>
<option name="heat-map-viz.heat-map-viz.convertTimeToUTC">false</option>
<option name="heat-map-viz.heat-map-viz.enableAnimation">true</option>
<option name="heat-map-viz.heat-map-viz.enableShades">false</option>
<option name="heat-map-viz.heat-map-viz.hideCellBorders">false</option>
<option name="heat-map-viz.heat-map-viz.labelCritical">Critical</option>
<option name="heat-map-viz.heat-map-viz.labelFontSize">8</option>
<option name="heat-map-viz.heat-map-viz.labelHigh">High</option>
<option name="heat-map-viz.heat-map-viz.labelLow">Low</option>
<option name="heat-map-viz.heat-map-viz.labelMedium">Medium</option>
<option name="heat-map-viz.heat-map-viz.legendPosition">top</option>
<option name="heat-map-viz.heat-map-viz.legendText">categories</option>
<option name="heat-map-viz.heat-map-viz.reverseNegativeShade">false</option>
<option name="heat-map-viz.heat-map-viz.shape">square</option>
<option name="heat-map-viz.heat-map-viz.showDateInTooltip">true</option>
<option name="heat-map-viz.heat-map-viz.showLegend">false</option>
<option name="heat-map-viz.heat-map-viz.showValues">false</option>
<option name="heat-map-viz.heat-map-viz.tokenLabel">onbattery</option>
<option name="heat-map-viz.heat-map-viz.tokenTime">hm_token_time</option>
<option name="heat-map-viz.heat-map-viz.tokenValue">hm_token_value</option>
<option name="heat-map-viz.heat-map-viz.tooltipDateFormat">dd-mm-yyyy</option>
<option name="heat-map-viz.heat-map-viz.valHigh">75</option>
<option name="heat-map-viz.heat-map-viz.valLow">0</option>
<option name="heat-map-viz.heat-map-viz.valMedium">40</option>
<option name="heat-map-viz.heat-map-viz.yaxiswidth">custom</option>
<option name="heat-map-viz.heat-map-viz.yaxiswidthpx">0</option>
<option name="refresh.display">progressbar</option>
<option name="trellis.enabled">0</option>
<option name="trellis.scales.shared">1</option>
<option name="trellis.size">medium</option>
</viz>
</panel>
</row>
</dashboard>
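The rex commands in the two battery panels take the number off the front of values shaped like "100.0 Percent" and "45.0 Minutes". The same extraction could be done in the shell before the event is sent, so that Splunk receives a bare number. This is only a sketch of that alternative, not what the script above does; leading_number is a made-up helper name and the sample values are invented.

```shell
#!/bin/bash
# Sketch only: leading_number is a hypothetical helper, and the sample values are
# made up to match the "<number> Percent" / "<number> Minutes" shape the rex commands expect.

# Print everything before the first space, i.e. the number in front of the unit.
leading_number() {
    local value="$1"
    printf '%s\n' "${value%% *}"
}

leading_number "100.0 Percent"   # prints 100.0
leading_number "45.0 Minutes"    # prints 45.0
```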