You're right. My best guess is that they would have to manually set the timing or frame exposure for each animation to get that effect, which would take a lot of work. The team would probably prefer to spend that time and money elsewhere.
And yeah, soundalikes just don't hit quite the same, but...