Monday, January 2, 2006

Idx/Sub Subtitle File Time Adjust (ISFiTA) v0.01

This is a quick and dirty (even the acronym) Perl script I use to time-shift/stretch idx/sub subtitle files.



#!/usr/bin/perl


###########################################################################################
#
# Idx/Sub Subtitle File Time Adjust (ISFiTA) v0.01
#
# Author: Tsan-Kuang Lee
# Date: Jan 2, 2006
#
# What does it do?
#
# A video file may have different subtitle files, which differ not only in format
# but also in their timestamps. Usually this is because the creators of
# the subtitle files synced their subtitles to different video files (of the
# same content, but some with, say, trailers, and others without).
#
# For example, there are two video files: v1 and v2. Let's say v1 has 2 more minutes
# of trailers at the beginning, before the main content. Subtitle creators make s1 for
# v1 and s2 for v2.
#
# However, we only have v1 and s2 at hand. We want s2 to sync with v1.
#
# If s2 is in a plain-text subtitle format (e.g. SRT, TXT, SSA), many excellent
# utilities already exist, such as SubCreator, http://www.radioactivepages.com/
# ISFiTA deals with the idx/sub format.
#
#
# How to use it?
#
# ISFiTA simply reads the idx file (it's a plain text file) and recalculates the
# timestamps according to the parameters we provide. We need to provide
# several parameters. Here's what you would usually type at the command prompt:
#
# % perl ISFiTA-v0.01.pl 3 film0.idx 00:03:55:735 01:32:53:067 film.idx 00:03:34:535 01:32:32:067
#
# Here's what they mean:
#
# "%" is the command prompt;
#
# "perl ISFiTA-v0.01.pl" launches this script;
#
# "3" is the index code of the language you want to deal with. If you open the idx file
# with a text editor, you would see something like "id: zh, index: 3". zh is the language
# code for Chinese (Zhong-wen), and "3" is what we want here.
#
# "film0.idx" is the original idx file we want to read from.
#
# "00:03:55:735" is the timestamp (the last three digits are in milliseconds) of one
# specific subtitle line in "film0.idx". Let's say it maps to the line "Let the show begin!".
# Many tools let you see which line "00:03:55:735" maps to, e.g. VobSub. (Of course, you need
# to have film0.sub for VobSub to display the image.)
#
# "01:32:53:067" maps to another line in "film0.idx", say "See you next time!".
#
# "film.idx" is the target file we want to write to.
#
# "00:03:34:535" is the target timestamp for "Let the show begin!". You can get this target
# timestamp from watching the video file, or from a correctly synced subtitle file.
#
# "01:32:32:067" is the target timestamp for "See you next time!".
#
# "Let the show begin!" doesn't have to be the first line in the whole subtitle; likewise,
# "See you next time!" doesn't have to be the last. However, the more apart they are from
# each other, the more accurate the calculation, since minor timing errors in the
# two reference points shrink to negligibly small differences.
#
# Don't forget to provide film.sub to complete the idx/sub pair. Just use the original sub and
# correctly rename it.
#
# About
#
# This is a quick and dirty perl script (even its acronym sucks) I use to time-shift/stretch
# idx/sub subtitle files. Do whatever you want with it, at your own risk. If you do improve
# it or rewrite it, I encourage you to share it with the public. Authors of free utilities have
# my highest respect.
#
# Todos for you
#
# Here are some suggestions if you want to contribute something to the Open Source world, or the
# freeware world:
#
# SubCreator has much more sophisticated transformations. For example, you can decide the time
# range in which you want to adjust the time code.
#
# Obviously Perl is not user-friendly for most people. You may want to rewrite it in another
# language and compile it into an executable.
#
# A GUI would make it more approachable.
#
# Bug the authors of VobSub, SubCreator, etc. to include idx/sub time adjust/stretch functions
# into their utilities.
#
#
###########################################################################################


# Command line arguments
unless (@ARGV == 7) { die "Usage: $0 lang_index in_file in_start in_end out_file out_start out_end\n(timestamp format: hh:mm:ss:mmm, where mmm is milliseconds)\n" }
($lang_index, $in_file, $in_start_timestamp, $in_end_timestamp, $out_file, $out_start_timestamp, $out_end_timestamp) = @ARGV;

# Open files
unless (open INFILE, "<", $in_file) {
    die "Couldn't open input file $in_file: $!; aborting";
}
unless (open OUTFILE, ">", $out_file) {
    die "Couldn't open output file $out_file: $!; aborting";
}


# Calculate the necessary transformation (shift, slope).
# Simple linear transformation formula:
#
# new_bx = (ax - a1) * (b2-b1)/(a2-a1) + b1 , where
# a1 = in_start ; a2 = in_end
# b1 = out_start; b2 = out_end
# ax = old_time_point
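#
# Worked example (a sketch using the sample command line from the header;
# the 00:45:00:000 time point is made up purely for illustration):
#
#   a1 = 00:03:55:735 =  235735 ms     a2 = 01:32:53:067 = 5573067 ms
#   b1 = 00:03:34:535 =  214535 ms     b2 = 01:32:32:067 = 5552067 ms
#   slope = (b2-b1)/(a2-a1) = 5337532 / 5337332, roughly 1.0000375
#
#   ax     = 00:45:00:000 = 2700000 ms
#   new_bx = (2700000 - 235735) * slope + 214535, roughly 2678892.3 ms
#          = 00:44:38:892 (the fractional millisecond is dropped)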

$shift_milliseconds = &convert_to_milliseconds($out_start_timestamp);
$slope = (&convert_to_milliseconds($out_end_timestamp) - $shift_milliseconds) / (&convert_to_milliseconds($in_end_timestamp) - &convert_to_milliseconds($in_start_timestamp));
$in_start_milliseconds = &convert_to_milliseconds($in_start_timestamp);


# start processing

$section = "header";
while (<INFILE>)
{
    if ($section eq "header" || $section eq "other_lang")
    {
        # check language id, e.g.
        # id: zh, index: 2
        # (\d+ and \s* keep a trailing carriage return out of the captured index)
        if ($_ =~ m/^id: *.+, index: *(\d+)\s*$/)
        {
            if ($1 eq $lang_index)
            {
                $section = "processing";
                print "Processing $_";
            }
            else
            {
                $section = "other_lang";
            }
        }
    }
    elsif ($section eq "processing")
    {
        # stop when another language id tag appears
        if ($_ =~ m/^id: *.+, index: *(.+)$/)
        {
            $section = "other_lang";
        }
        elsif ($_ =~ m/^(timestamp: )(\d\d:\d\d:\d\d:\d\d\d)(,.+)$/s)
        {
            # only process lines with timestamps, e.g.
            # timestamp: 01:22:20:634, filepos: 00011a000
            $_ = $1 . &calculate_new_timestamp($2) . $3;
        }
    }
    # every line is copied to the output, adjusted or not
    print OUTFILE $_;
}

sub calculate_new_timestamp
{
    my ($old_timestamp) = @_;
    my $new_milliseconds = (&convert_to_milliseconds($old_timestamp) - $in_start_milliseconds) * $slope + $shift_milliseconds;
    return &convert_to_timestamp($new_milliseconds);
}

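sub convert_to_milliseconds
{
    my ($string) = @_;
    # abort on a malformed timestamp instead of reusing stale capture groups
    $string =~ m/(\d+):(\d+):(\d+):(\d+)/
        or die "Bad timestamp '$string' (expected hh:mm:ss:mmm)\n";
    my $milliseconds = $4 + $3*1000 + $2*1000*60 + $1*1000*60*60;
    return $milliseconds;
}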

sub convert_to_timestamp
{
    my ($milliseconds) = @_;

    # Perl's "use integer" pragma would give integer division, but enabling it
    # for the whole file would also truncate the slope calculation.
    # Therefore, we subtract the remainder before each division. floor() is
    # possible but we don't use it here for cross-distribution compatibility
    # (some perl distributions don't have that package installed).
    # Note that % works on the integer part of its operands, so a fractional
    # millisecond is dropped.

    my $res_h = $milliseconds % (60*60*1000);
    my $hour = ($milliseconds - $res_h) / (1000*60*60);
    my $res_m = $res_h % (1000*60);
    my $min = ($res_h - $res_m) / (1000*60);
    my $res_s = $res_m % 1000;
    my $sec = ($res_m - $res_s) / 1000;
    my $ms = $res_s;

    return sprintf("%02d:%02d:%02d:%03d", $hour, $min, $sec, $ms);
}
