
Heartbeat

Serge Rudziankou 3 weeks ago in Master modules / MD5 updated 2 weeks ago 2

Hello,

I am working on a hydraulic control system based on MD5 and MC43. The main control logic runs in MC43, while MD5 handles non-safety UI functions.

To ensure safe operation, I implemented a simple heartbeat between MD5 and MC43:

Setup:

  • Heartbeat: digital signal toggling TRUE/FALSE every ~100 ms in MD5
  • Transmitted via J1939 to MC43
  • MC43 monitors time since last change
  • System cycle time: 50 ms
  • Safety action: if no change for >500 ms, MC43 disables outputs
  • Logging is active on both MD5 and MC43
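For readers following along, the monitoring rule in the list above can be sketched in Python. This is only an illustration of the described logic, not the actual IQAN application; the class name `HeartbeatWatchdog` and the helper `heartbeat_at` are hypothetical, and the simulated freeze at 1000 ms is made up.

```python
class HeartbeatWatchdog:
    """Trips when the monitored value has not changed for longer than timeout_ms."""

    def __init__(self, timeout_ms=500):
        self.timeout_ms = timeout_ms
        self.last_value = None
        self.last_change_ms = None

    def update(self, value, now_ms):
        """Call once per receiver cycle; returns True while outputs may stay enabled."""
        if self.last_change_ms is None or value != self.last_value:
            # First sample or a changed value: restart the timer.
            self.last_value = value
            self.last_change_ms = now_ms
        return (now_ms - self.last_change_ms) <= self.timeout_ms


def heartbeat_at(t_ms, freeze_after_ms=1000):
    """Hypothetical sender: toggles every 100 ms, then freezes (simulated fault)."""
    t = min(t_ms, freeze_after_ms)
    return (t // 100) % 2 == 1


wd = HeartbeatWatchdog(timeout_ms=500)
trip_time = None
for now in range(0, 3000, 50):  # receiver cycle time: 50 ms
    if not wd.update(heartbeat_at(now), now):
        trip_time = now  # here the real controller would disable outputs
        break

print(trip_time)
```

In this simulation the last change is seen at 1000 ms, so the watchdog trips on the first cycle more than 500 ms later.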

Problem:

The heartbeat signal is intermittently lost or appears frozen for short periods (typically 100–3000 ms, sometimes longer), triggering safety shutdowns.

Test setup (minimal system):

  • Only MD5 and MC43 connected
  • 30 cm CAN cable
  • One 120 Ω termination resistor at each end of the bus (two in total)
  • No other CAN nodes
  • Minimal application: only one J1939 message + heartbeat logic
  • No HMI logic or additional communication

The issue still occurs in this simplified configuration.

Tests performed:

  • Different CAN bit rates (including 250, 500, and 1000 kbit/s)
  • J1939 and generic CAN communication modes
  • Same behavior observed across configurations
  • No communication errors reported in internal Parker logs

Question:

Has anyone observed similar intermittent J1939 communication drops between MD5 and MC43 under otherwise stable conditions?

+1

I suspect the problem could be the logic that relies on a single toggling bit. If the MC43FS cycle time isn't faster than the MD5's, consecutive samples can show the same value even though several messages were received in between.
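A small simulation may make this sampling effect concrete. It is a sketch with illustrative timing values (not taken from the poster's configuration), and `sampled_toggle` is a hypothetical helper name.

```python
def sampled_toggle(toggle_period_ms, sample_period_ms, duration_ms):
    """Return the toggle bit as seen by a receiver that samples periodically."""
    return [(t // toggle_period_ms) % 2
            for t in range(0, duration_ms, sample_period_ms)]


# Receiver faster than the toggle: every state change is visible.
fast_receiver = sampled_toggle(100, 50, 600)

# Receiver at twice the toggle period: the bit looks frozen.
slow_receiver = sampled_toggle(100, 200, 600)

print(fast_receiver)
print(slow_receiver)
```

With the slow receiver every sample lands on the same phase of the toggle, so the value never appears to change even though the sender is running normally.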


A quick thing to try for troubleshooting purposes could be to change the logic in your MC43FS to just look at a timeout on the JFIN/GFIN, then you know if the message is received. 


Another recommendation for troubleshooting would be to take a CAN trace to see if the CAN message actually shows up. 
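Once a trace is exported, gaps can be located with a few lines of script. The sketch below assumes the frame timestamps for the heartbeat message have already been extracted into a list of milliseconds; the function name `find_gaps` and the sample data are made up for illustration.

```python
def find_gaps(timestamps_ms, max_gap_ms):
    """Return (start, end, gap) tuples where consecutive frames are too far apart."""
    gaps = []
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        if cur - prev > max_gap_ms:
            gaps.append((prev, cur, cur - prev))
    return gaps


# Made-up timestamps: frames roughly every 50 ms with one long silence.
trace = [0, 50, 101, 149, 900, 950]
gaps = find_gaps(trace, max_gap_ms=500)
print(gaps)
```

If the trace shows no gaps while the receiver still reports a frozen value, the message is on the bus and the problem is more likely in the content or in the receiving logic.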

Some fluctuation in time between CAN messages from an MD5 is expected, but it shouldn't go silent for this long. 


If you combine the MD5 and MC43FS in the same IQAN multi-master system, you get heartbeat functionality on the diagnostics bus. Each master module reports this on the system information channel "multi master state".

Hi Gustav. Thanks for the reply.

Regarding your first point, I agree that a simple toggling signal could be missed if the MD5 changes state several times between MC43 cycles. To rule that out, I replaced the toggling bit with an 8-bit counter (0–255) that is transmitted every cycle. The result was exactly the same: the counter still occasionally "freezes" on the MC43 side.

As for relying only on the JFIN/GFIN timeout, that isn't sufficient for my application. It's not enough for me to know that a CAN message was received. I need to know that the application running on the MD5 is alive, processing operator inputs, and sending up-to-date data.

For example, if the operator is performing a potentially hazardous action and releases the button, I want the machine to stop immediately. I don't want a situation where the display freezes, keeps transmitting the last valid command, and the machine continues moving for another second or longer.

That's why I'm using a changing heartbeat/counter instead of just monitoring message reception.
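The difference between the two checks can be sketched as follows. This is a hypothetical illustration in Python, not IQAN logic: `LinkMonitor` and its status strings are invented names, and the frozen counter in the usage example is simulated.

```python
class LinkMonitor:
    """Tracks message reception and content freshness as two separate timers."""

    def __init__(self, timeout_ms=500):
        self.timeout_ms = timeout_ms
        self.last_rx_ms = 0
        self.last_counter = None
        self.last_change_ms = 0

    def on_frame(self, counter, now_ms):
        """Call for every received frame carrying the 8-bit counter."""
        self.last_rx_ms = now_ms
        if counter != self.last_counter:
            self.last_counter = counter
            self.last_change_ms = now_ms

    def status(self, now_ms):
        if now_ms - self.last_rx_ms > self.timeout_ms:
            return "bus_silent"      # no frames at all
        if now_ms - self.last_change_ms > self.timeout_ms:
            return "sender_frozen"   # frames arrive, content is stale
        return "ok"


mon = LinkMonitor(timeout_ms=500)
for t in range(0, 1000, 50):
    # Simulated sender: counter advances until 250 ms, then sticks at 5.
    mon.on_frame(min(t // 50, 5), t)

print(mon.status(950))   # frames still arriving, counter stale
print(mon.status(1600))  # no frames since 950 ms
```

Logging which of the two states occurs during a trip would show whether frames stop arriving or the counter stops advancing, which a message timeout alone cannot tell apart.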

I've tested the system for several days, both in the full configuration with all devices connected and in a minimal setup consisting of only the MD5, MC43, a short CAN cable, and an almost empty application containing only the communication monitoring logic.

Test parameters:

  • System cycle time: 50 ms
  • J1939 and Generic Bus
  • CAN speeds tested: 250 kbps and 500 kbps
  • Counter updated and transmitted every 50 ms
  • Fault triggered if no update was received for more than 500 ms

In all cases, the results were nearly identical: between 3 and 10 watchdog trips per day. The vast majority of interruptions lasted less than one second.

Since the behavior is the same regardless of application size, CAN speed, or network load, I've decided for now to simply increase the timeout and treat it as a system characteristic.