inkedotly

Time code audio and video issue, any help appreciated


I have a video and sound file both with SMPTE time code. These were collected some time ago and cannot be re-collected. There was an LED that lit/sent a TTL pulse to the audio that allowed me to manually align the tracks, and I realized that there is a "drift" between the time code in the audio and video recordings. What can be the cause of this "drift"? What was done wrong in the setup? The recording setup consisted of a Horita time code generator (https://www.bhphotovideo.com/c/product/23700-REG/Horita_TG50_TG_50_SMPTE_LTC_Reader.html) with its TC generated output split to an AJA recorder (https://www.bhphotovideo.com/c/product/735435-REG/AJA_KI_PRO_MINI_R0_Ki_Pro_Mini_Compact.html) and the microphone. Any help would be appreciated, thanks.


Hi inkedotly, in a nutshell, the problem is this: Timecode in the modern world of file-based recorders is a positional reference only, and applies only to the start of the file on each recorder (just like the "LED that lit/sent a TTL pulse to the audio", which we'd call a "Bloop Box" and which is an alternative to the film-style clapperboard). The advantage of timecode over these other positional references is that each take has a unique start reference, which may make it easier in Post Production to sort through all the matching files and allows a degree of automation in the process. After that, you rely on the two recorders (audio and video) to maintain EXACTLY the same speed, or you get drift over time. In normal drama film production, for example, the individual takes or shots are quite short, and drift doesn't have time to manifest itself if you are using decent equipment - but I suspect your Research applications mean continuous takes, maybe of an hour or more. Even the best equipment will have slight differences in the calibration of its internal 'clock' (its electronic timing rate), so each device's idea of how long an hour is will differ slightly, and that gives you a drift in sync.
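To put rough numbers on this, here is a minimal sketch of how a small clock-rate difference between two free-running recorders accumulates over a take. The 20 ppm figure and the 25 fps frame rate are assumptions chosen purely for illustration, not measurements of any gear mentioned in this thread.

```python
def drift_seconds(duration_s: float, clock_error_ppm: float) -> float:
    """Sync offset accumulated after duration_s between two free-running
    recorders whose clock rates differ by clock_error_ppm parts per million."""
    return duration_s * clock_error_ppm * 1e-6


def drift_frames(duration_s: float, clock_error_ppm: float, fps: float) -> float:
    """The same offset expressed in video frames at the given frame rate."""
    return drift_seconds(duration_s, clock_error_ppm) * fps


if __name__ == "__main__":
    ASSUMED_PPM = 20.0  # hypothetical clock mismatch, for illustration only
    ASSUMED_FPS = 25.0  # hypothetical frame rate, for illustration only
    for label, duration in [("2-minute take", 120.0), ("1-hour take", 3600.0)]:
        frames = drift_frames(duration, ASSUMED_PPM, ASSUMED_FPS)
        print(f"{label}: about {frames:.2f} frames of drift")
```

The offset grows linearly with take length, which is why the same pair of recorders can look perfectly in sync on short takes and visibly out of sync on long ones.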

 

I'm guessing that when you say you sent timecode to "the microphone", this was a Zoom recorder or similar, with the timecode going on an audio track? While there are ways to extract that using software to sync up the start points of the files (as an alternative to the LED flash and pulse method you used), its existence on the recording does nothing to correct the timing of the audio recorder - and even the most expensive audio field production gear will have the same issue, to a lesser extent.

 

So - what to do about it? It's a really common problem, and any reasonably competent Audio Post Production person will be able to get your audio back in sync as long as there are obvious visual / audio events at each end of the material to match up. The techniques are either time-stretching (or shrinking) the audio a tiny bit, or cutting it into smaller chunks and slipping the head of each of those chunks into sync, then patching the holes if required. Note that this needs to be done in Audio editing software rather than Video editing software, as on the whole Audio editing software has finer time resolution. The reason it's a commonly seen problem is, as I mentioned above, that with modern professional gear the drift is negligible over the short (say a couple of minutes) shot lengths most common in film production where separate audio and video recorders are used - and in the Documentary world, where takes are longer, recording audio onto the camera's recording medium has been typical, which avoids the problem. Thus many camera operators are simply unaware it'll be problematic over longer takes. I personally work in long-form Concert filming and similar, and am constantly having to explain this to professional camera crews who are at the top of their game, so don't be surprised that whoever captured your material got caught out.
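If the goal is only to correlate event times rather than to re-render the audio, the time-stretch idea reduces to a straight-line mapping fitted through one sync event near each end of the recording (for instance, two LED/TTL pulses). The sketch below assumes the drift is constant over the take; the function names and the example times are hypothetical.

```python
def fit_linear_sync(audio_start: float, video_start: float,
                    audio_end: float, video_end: float) -> tuple[float, float]:
    """Fit video_t = scale * audio_t + offset through two sync events.

    Each event is the same physical moment (e.g. an LED flash) read once on
    the audio timeline and once on the video timeline, in seconds.
    """
    if audio_end == audio_start:
        raise ValueError("the two sync events must be at different times")
    scale = (video_end - video_start) / (audio_end - audio_start)
    offset = video_start - scale * audio_start
    return scale, offset


def audio_to_video_time(audio_t: float, scale: float, offset: float) -> float:
    """Map a time on the audio timeline onto the video timeline."""
    return scale * audio_t + offset


if __name__ == "__main__":
    # Hypothetical readings of the same two events on each timeline.
    scale, offset = fit_linear_sync(10.0, 12.0, 3610.0, 3615.6)
    print(f"stretch factor: {scale:.6f}, offset: {offset:.3f} s")
    print(f"audio event at 1800.0 s -> video at "
          f"{audio_to_video_time(1800.0, scale, offset):.3f} s")
```

The `scale` value is also the stretch factor an audio editor would apply to the whole clip. If the drift is not constant, the same fit can be applied piecewise between more sync events, which corresponds to the cut-and-slip approach.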

 

How to avoid the problem in the future? The simplest way of course is to record video and audio to the same machine (camera or KiPro) - it can't drift with itself. There are plenty of reasons you'd prefer to use a separate recorder, such as needing more audio tracks, but if long takes are a reality for you, it's going to get expensive if you want drift-free results. Basically what you need to do is 'lock' the internal clocks of the camera and audio recorder together in perfect sync - and as we have learned, Timecode doesn't do that, being a positional reference, not a sync reference. What you need is what the Camera types call Genlock, and the Audio types call Wordclock. They are not the same signal, but serve the same purpose. This means you need a camera and audio recorder that can accept these signals and sync to them, and some sort of scheme to generate the Genlock / Wordclock. There are many options (and even some high-end cameras, such as those made by Arri, that can indeed extract and sync / genlock to the timing information within a Timecode signal, if it's coming from an extremely stable source), so I can't really offer solutions here. You should find a good, experienced local film sound professional to help you out. For most researchers / documentary producers, this is outside their budget, so the other option is to accept that drift is a reality and budget to correct it in post production as described above.

 

All the best with it!


Thanks so much for your detailed explanation, Nick! If I am understanding correctly, the time code generator that I had (with its split output to the camera and microphone) is not really keeping these two devices in sync throughout a recording session, but rather only at the very beginning. Then, as the frames advance, the time code generator assigns an address to each frame but due to the fact that the clock that advances the frames in the camera drifts from that of the recorder, we see drift.

 

This makes me curious as to what information is contained in the signal that the time code generator actually outputs and how the timecode assignment actually works? Since the TC generator output is split on a cable, both devices would be receiving the same exact information at any given moment in time.


As was said, that TC info is not doing anything but labelling the frames of digital information. In this context it has nothing to do with controlling the speed at which those frames are made - that function is solely the domain of the recording device's sample clock. Until you get those clocks under control, either by using clocks that are super stable (TCXO type) or by connecting them in a master-slave arrangement so one follows the other, real long-term sync between the recorders is impossible.


The Timecode coming into each device (video and audio) isn't actually labelling EACH frame - the device just looks at it at the exact moment a recording is started, reads the time data (in the format Hours:Minutes:Seconds:Frames, repeating as many times a second as suggested by the frame-rate you chose to use) and timestamps the file with that as its start time. The vast majority of modern file-based recorders never look at the incoming timecode again during the recording of that take - whatever's playing them back just extrapolates from the start time. It's like the driver and conductor of a train both setting their watches to the station clock when they leave LA, then writing down their arrival time in NYC from their watches - the numbers will probably be different.

Note that in the 'olden days' of linear recording media (ie Tape and actual Film), things were different - older texts can add to the confusion here.

To give any more advice than we have already done would need more specific information about your setup - we know the KiPro Mini on the video side, but what specifically are you recording the sound on - and what editing software are you trying to combine them with?

10 hours ago, nickreich said:

The Timecode coming into each device (video and audio) isn't actually labelling EACH frame - the device just looks at it at the exact moment a recording is started, reads the time data (in the format Hours:Minutes:Seconds:Frames, repeating as many times a second as suggested by the frame-rate you chose to use) and timestamps the file with that as it's start time. The vast majority of modern file-base recorders never look at the incoming timecode again during the recording of that take - whatever's playing them back just extrapolates from the start time. It's like the driver and conductor of a train both setting their watches to the station clock when they leave LA, then writing down their arrival time in NYC from their watches - the numbers will probably be different.

 

 

To be slightly more accurate: the audio recorder actually looks at the incoming TC, works out a value based on the TC value and the current frame rate, and 'stamps' a number into the file header that is expressed as 'samples since midnight'.
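As an illustration of that start stamp, here is a minimal sketch that converts a timecode into samples since midnight and then extrapolates a timecode from a sample position, the way a player would. It assumes non-drop-frame timecode at an integer frame rate running at its true rate (for example 25 fps); drop-frame and 1000/1001 rates are deliberately ignored, and the function names are hypothetical.

```python
def timecode_to_frames(tc: str, fps: int) -> int:
    """Convert 'HH:MM:SS:FF' (non-drop-frame) to a frame count since midnight."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid timecode {tc!r} for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def samples_since_midnight(tc: str, fps: int, sample_rate: int) -> int:
    """Start stamp a recorder could write into the file header."""
    return timecode_to_frames(tc, fps) * sample_rate // fps


def extrapolated_timecode(start_samples: int, sample_index: int,
                          fps: int, sample_rate: int) -> str:
    """Timecode a player would display at sample_index, derived only from the
    start stamp and the sample count, never from the incoming timecode."""
    total_frames = (start_samples + sample_index) * fps // sample_rate
    total_seconds, frames = divmod(total_frames, fps)
    hours = total_seconds // 3600
    minutes = total_seconds // 60 % 60
    seconds = total_seconds % 60
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


if __name__ == "__main__":
    start = samples_since_midnight("01:00:00:00", fps=25, sample_rate=48000)
    print(start)
    print(extrapolated_timecode(start, 48000 * 90, fps=25, sample_rate=48000))
```

Because the displayed timecode is computed from the sample count, it advances at whatever rate the recorder's own clock actually ran, which is exactly where the drift comes from.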

 

Also, I couldn't help but think that the driver and conductor of the train will arrive in NYC at different times because one is at the front of the train, and the other (much) further behind in the rear of the train ;-).

 

Kindest, Simon B


The sound recording device that was used for the experiment (https://www.avisoft.com/usg/usg116h.htm) took the LTC as a digital input and recorded it to a separate channel on the wav file. The Horita TC generator outputted LTC on its RCA-out port which was hardwired via a splitter to both the Sound recording device and the video recording device.

 

I am looking at the video file TC in Premiere pro, and the audio file in the specialized software that comes with the sound recording device. I need to have the correct time to correlate events in the video with sound events in the audio file, but don't necessarily need to align the tracks.

 

I believe that in my case, it was actually the linear version that I was using since the LTC was recorded on a separate audio track in the video and audio files... This should mean that LTC recorded is independent of the video frame rate. To try to isolate the cause of the issue, I tried to count the relative frames advanced from time zero at the beginning of video vs. the frames advanced from time zero as determined by the SMPTE time code. There is zero drift when I compare the beginning and end of the video for relative vs. SMPTE. If the frame generation were drifting from the set frame rate, one would expect the time code to drift vs. the relative frames but that is not the case... Yet, there is definitely a drift over time between the video and audio files. 


As to the original posting by Mr Inkedotly.......

 

In what country was the original footage shot?

What was the camera?

What was the sound recording device?

What fps was the camera running at?

What sample rate and TC fps was the sound recording device running at?

 

Is the 'drift' a gradual thing - is it that the sync for each shot starts off quite close together, and the amount of drift grows slowly as the shot continues?

Does the sound run in advance of the picture, ie does the sound 'lead' the picture, and the error becomes greater as the shot progresses?

 

The most likely situation is that the 'camera' was running at 29.97fps (or maybe 23.98fps), which we would refer to as being 'pulled down'. This is where a camera takes 30 frames of video, but it takes just longer than a second to do so, so if it were a pendulum, it would be ticking just slightly slower than once per second. This is a historical thing to do with USA colour video/TV: the frame rate was slowed very slightly (by a factor of 1000/1001) to allow the colour information to be added to what was originally a 30fps black and white picture signal.

 

Your sound recording device may well have been recording and playing back at 'normal' speed, ie a second = a second, so basically the pictures are being recorded slower than the sound. The amount of error will be 0.1% (about 3.6 seconds per hour).

 

A simple solution might be to slow down (or speed up) the audio clips by 0.1%, and see if this solves the drifting issue. You should be able to do this easily in almost any video editing software. You are unlikely to notice the difference in pitch in the audio.
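For reference, the pull-down factor is exactly 1000/1001, so the speed error is 1/1001, or roughly 0.1%. A minimal sketch of that arithmetic, using exact fractions to avoid rounding (the function names are hypothetical):

```python
from fractions import Fraction

# Pull-down factor applied to the nominal frame rates (30 -> 29.97, 24 -> 23.976).
PULLDOWN = Fraction(1000, 1001)


def pulled_down_rate(nominal_fps: int) -> Fraction:
    """Actual frame rate of a 'pulled down' camera, as an exact fraction."""
    return nominal_fps * PULLDOWN


def speed_error_percent() -> float:
    """Speed difference between pulled-down and nominal rates, in percent."""
    return float((1 - PULLDOWN) * 100)


def pulldown_drift_seconds(duration_s: float) -> float:
    """Offset accumulated over duration_s when one device runs pulled down
    and the other runs at the nominal rate."""
    return float(Fraction(duration_s) * (1 - PULLDOWN))


if __name__ == "__main__":
    print(f"30 fps pulled down: {float(pulled_down_rate(30)):.5f} fps")
    print(f"speed error: {speed_error_percent():.4f} %")
    print(f"drift after one hour: {pulldown_drift_seconds(3600):.3f} s")
```

If the measured drift is close to this figure, a frame-rate mismatch is a more likely explanation than ordinary clock tolerance.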

 

I hope that this helps, and good luck,

 

Simon B


On a side note building off of what Nick said:

 

I'm a big fan of using timecode boxes that are all jammed from one source (a separate master timecode generator or the audio recorder) and then those boxes live on the cameras. The cameras are set to external jam. If you have one on your audio recorder it is also set to external jam. This helps eliminate TC drift and allows everything to have as accurate a start timecode on the files as possible. It's even better if you can wirelessly jam everything all at once. 

The thing is, doing it this way can cost money, but it can be a life saver....as long as the person in charge of the TC knows what they are doing. It isn't that hard. When I was doing freelance work I had people let me be the "timecode god". They told me what frame rate we were rolling at, and I made sure everything was on the same page. Never had an issue. 
 

30 minutes ago, inkedotly said:

The sound recording device that was used for the experiment (https://www.avisoft.com/usg/usg116h.htm) took the LTC as a digital input and recorded it to a separate channel on the wav file. The Horita TC generator outputted LTC on its RCA-out port which was hardwired via a splitter to both the Sound recording device and the video recording device.

 

I am looking at the video file TC in Premiere pro, and the audio file in the specialized software that comes with the sound recording device. I need to have the correct time to correlate events in the video with sound events in the audio file, but don't necessarily need to align the tracks.

 

 

The spec for the sound recording device shows that the available sampling rates do NOT include 48kHz (or 96kHz), which are the 'standard' sampling rates for video editing software. It could be that your NLE software, Premiere Pro, is able to sample rate convert your audio clips on the way in, but it might need to know what your original sample rate was, and what your target sample rate should be.

 

What frame rate was the camera shooting at?

What sample rate was your audio recorder running at?

 

What picture frame rate is your Premiere Pro session running at?

What audio sample rate is your Premiere Pro session running at?

 

If, for instance, you shot pictures at 200fps, and sound at 50kHz, and you then load them both into a typical video editing session at, say, 25fps, 48kHz, without any rate conversion, the pictures will last 8 times longer than the clip took to film (ie a 10 second camera run would last 80 seconds), and the sound will last just longer than 10 seconds, something like 10.4 seconds.
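The arithmetic behind that example can be sketched as follows. It assumes the editing session simply plays every recorded frame and sample at the session rate with no conversion; the function names are hypothetical.

```python
def video_playback_seconds(record_s: float, shot_fps: float,
                           timeline_fps: float) -> float:
    """Duration of a clip when every recorded frame is played at timeline_fps."""
    return record_s * shot_fps / timeline_fps


def audio_playback_seconds(record_s: float, recorded_rate_hz: float,
                           assumed_rate_hz: float) -> float:
    """Duration of a clip when every recorded sample is played at assumed_rate_hz."""
    return record_s * recorded_rate_hz / assumed_rate_hz


if __name__ == "__main__":
    print(f"video: {video_playback_seconds(10, 200, 25):.1f} s")
    print(f"audio: {audio_playback_seconds(10, 50_000, 48_000):.2f} s")
```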

 

Again, I hope that this helps..... sb

11 hours ago, nickreich said:

The Timecode coming into each device (video and audio) isn't actually labelling EACH frame - the device just looks at it at the exact moment a recording is started, reads the time data (in the format Hours:Minutes:Seconds:Frames, repeating as many times a second as suggested by the frame-rate you chose to use) and timestamps the file with that as it's start time. The vast majority of modern file-base recorders never look at the incoming timecode again during the recording of that take - whatever's playing them back just extrapolates from the start time. It's like the driver and conductor of a train both setting their watches to the station clock when they leave LA, then writing down their arrival time in NYC from their watches - the numbers will probably be different.

Note that in the 'olden days' of linear recording media (ie Tape and actual Film), things were different - older texts can add to the confusion here.

To give any more advice than we have already done would need more specific information about your setup - we know the KiPro Mini on the video side, but what specifically are you recording the sound on - and what editing software are you trying to combine them with?

The net effect of using TC, as I said, is that each frame ends up with a unique identifier, otherwise editing would not work. In any case, all the TC provides is a unique address for each moment of the recording, and it has nothing to do with the speed the recording was made at.

5 hours ago, Bash said:

The spec for the sound recording device shows that the available sampling rates do NOT include 48kHz (or 96kHz) which are the 'standard' sampling rates for video editing softwares. It could be that your NLE software, Premiere Pro, might be able to sample rate convert your audio clips on the way in, but it might need to know what your original sample rate was, and what your target sample rate might be.

 


That is true, it is absolutely bizarre that this recorder lacks any of the usual standard rates!! :-o

It's only 16-bit too (or 8-bit?? what the hell!!), with no 24-bit option. Strange, very strange.


So with the scientific/logging USB interface you are using, as with most more typical 'musician' USB interfaces, the interface's internal clock is the source of timing for the computer doing the recording. It has a proprietary sync system so you can link multiple units for multi-channel logging, but this does not seem at initial glance to be able to accept an industry-standard sync signal such as Wordclock, and as Bash notes, the available samplerates do not include those typically used in the film / music industries. This makes any suggestion of using wordclock / genlock between the camera and audio recorder impractical - although I'd contact Avisoft for comment.

I note in their LTC tech note they warn of a possible DRIFT, for the reasons I've outlined, of somewhere in the vicinity of 300ms (0.3 sec) per 10 mins recording time - certainly in the ballpark of what I'd expect of decent camera and audio gear running un-sync'd (timecode or not). If the drift is greatly more than that, then Bash's suggestion that Premiere Pro is incorrectly assuming the audio is a particular samplerate, and playing it as such, may be a cause.

However, my reading of your workflow description is that you are not actually importing the audio into the video editing software - you are simply using the timecode display in the video software to then go find a matching audio event to investigate, using the timecode display in the Avisoft playback application - rather than trying to play them in sync? Here's where another complication raises its head. I assume that the Avisoft recorder software will happily play back audio through the computer's headphone socket without the recording interface attached. In that case, you need to know how the software handles its timecode display on playback. Is it reading a recorded timecode track in real-time, or doing what most film-industry recorders do: reading a start time stamp at the head of the file, then extrapolating based on the sample count from there? If it's the latter, the stability and accuracy of the timing 'clock' generated by the computer will always be different to that generated by the USB audio interface when plugged in. If you have the interface, and the software will play back with it connected, maybe try that, and see if there's a difference in the TC location of an easily-locatable event towards the end of a long recording.

These scientific logging recorders are something most of us here don't come across. The KiPro Mini video recorder will be getting its sync from the Camera via the SDI or HDMI signal carrying the pictures, so camera and KiPro can be considered as a single unit. If that camera happens to be a DSLR stills-type camera, don't expect much in the way of stability / accuracy from its internal clock either.
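To compare a measured drift against the 300 ms per 10 minutes figure quoted above, it helps to normalise both to parts per million. This is a minimal sketch; the 1000/1001 reference value is included only as a comparison point for a possible pull-down mismatch, and the names are hypothetical.

```python
def drift_ppm(drift_s: float, duration_s: float) -> float:
    """Express an observed drift over a given duration in parts per million."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return drift_s / duration_s * 1e6


# Rate difference between a nominal and a 1000/1001 pulled-down device, in ppm.
PULLDOWN_PPM = 1e6 / 1001


if __name__ == "__main__":
    quoted = drift_ppm(0.3, 600.0)  # the 300 ms per 10 minutes figure
    print(f"quoted drift: {quoted:.0f} ppm")
    print(f"pull-down mismatch would be about {PULLDOWN_PPM:.0f} ppm")
```

Measuring your own drift between two LED/TTL events and running it through `drift_ppm` gives a single number to compare against both values.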


I think that the fundamental problem here is that currently, we have no idea what the camera shooting frame rate was, nor what the sound recording sampling rate was. We may well be able to help somewhat more, if we know the above numbers ;-) sb

 

Mr Ironfilm - the audio record software is clearly a rather more 'scientific' thing than what we are used to in film and TV world. This is not the end of the world, we just need to know what sampling rate it was running at!!!

 

Ho hum. Happy days.... sb


Eeh, Bash..

Since when does sampling rate have something to do with drift? (Unless someone changed the sample rate flag without re-sampling, thus changing the duration of a clip...)

As Nick explained, drift occurs due to non-precise clocks / lack of genlock on long takes.

 

 

1 hour ago, Bouke said:

Eeh, Bash..

Since when has sampling rate something to do with drift? (Unless someone changed the sampling rate indicator while not re-sample, thus changing the duration of a clip...)

 

 

 

If you load a, say, 48.048 audio clip into a 48k project without sample rate converting it then it will play at a different speed to that which it was recorded at, and it will 'drift'. So.... sampling rate can have quite a lot to do with drift ;-)

 

Remember that the audio recorder the OP used does not run at any of our 'regular' sample rates.

 

sb


Hi Bash,

Well, it's not that if you load a 48.048 clip that it will play at a different rate. It will play at 48.048. BUT (And that is what is happening  in our world,) if you record at 48K and THEN flag it as if it was recorded at 48.048, then it will play at a different rate. (Same as shooting slomo, high framerate, tell player to play at normal rate.)

 

 

15 hours ago, Bouke said:

Hi Bash,

Well, it's not that if you load a 48.048 clip that it will play at a different rate. It will play at 48.048. BUT (And that is what is happening  in our world,) if you record at 48K and THEN flag it as if it was recorded at 48.048, then it will play at a different rate. (Same as shooting slomo, high framerate, tell player to play at normal rate.)

 

 

 

In truth, I think we are both correct - if you import a 48.048k clip into a 48k project without sample rate converting then it will play at off speed. If you flag a 48k file as a 48.048k file, and import it and tell it to play at normal rate (ie with sample rate conversion), then it will also play at off speed.

 

I think my greater point was that the software used to record the OP's audio is recording at weird sample rates, and so depending on how you import into the video edit project it is likely to play at an off speed, and so is also likely to drift wrt pictures. sb
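Both cases discussed above come down to which rate the samples end up being played at. A minimal sketch, assuming an importer either resamples from the rate written in the file header (preserving the header-implied duration) or plays the samples as-is at the project rate; the function name and parameters are hypothetical.

```python
def duration_after_import(n_samples: int, header_rate_hz: float,
                          project_rate_hz: float, resample: bool) -> float:
    """Playback duration in seconds of a clip inside a project.

    resample=True:  samples are converted from header_rate_hz to the project
                    rate, so the duration is whatever the header implies.
    resample=False: samples are played one-for-one at project_rate_hz.
    """
    if resample:
        return n_samples / header_rate_hz
    return n_samples / project_rate_hz


if __name__ == "__main__":
    # Case 1: 10 s genuinely recorded at 48.048k, dropped into a 48k project.
    n = 48_048 * 10
    print(duration_after_import(n, 48_048, 48_000, resample=True))   # true speed
    print(duration_after_import(n, 48_048, 48_000, resample=False))  # off speed

    # Case 2: 10 s recorded at 48k but flagged as 48.048k, then resampled.
    n = 48_000 * 10
    print(duration_after_import(n, 48_048, 48_000, resample=True))   # off speed
```

In both off-speed cases the error is about 0.1%, so over a long recording it shows up as a steadily growing offset rather than an obvious pitch change.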


Well, If I don't know the fps / sr / amount of drift over what period / if the drift is constant or not, I'd rather not guess what is happening :-)

Depending on the amount of footage, I think we all would have fixed it in less time than guessing here :-)

 

On 1/5/2018 at 2:44 AM, Bouke said:

Depending on the amount of footage, I think we all would have fixed it in less time than guessing here :-)

 

But where is the fun in that?!

