[Obsolete] Converting subtitles from SVT Play

Posted 2015-09-04 20:40 by Traesk. Edited 2020-08-10 13:54 by Traesk. 1594 views.

Update 2020-08-10: SVT has switched from using their own tags to instead using proper WebVTT format, with formatting tags like <c.magenta>, <c.green> etc. This means you can now either use the subs directly as WebVTT (in a supported player) or use a tool like Subtitle Edit to convert them to SRT.

Update 2019-06-01: Updated the PHP script to recognise the new akamaized.net URLs.

SVT Play and Öppet Arkiv use a slightly modified version of the SRT format, with some additional tags from WebVTT, to display subtitles. These files can be downloaded directly by finding the link in the page source or by using a service such as pirateplay.se. They work pretty well out of the box, but can be modified to better match the intended style. Just like with the mp4 video itself, SVT uses its Flash-based video player to parse the content of the subtitle and apply the formatting. They do have a beta of HTML5 playback, but according to their FAQ it does not support subtitles yet. There are two differences in SVT's format compared to the normal SRT format:

<30> through <37> tags

These tags are SVT-specific and are used to apply colours to the subtitle. I downloaded SVT's Flash player from http://media.svt.se/swf/video/svtplayer-2015.01.swf and opened it with JPEXS Free Flash Decompiler. Browsing to scripts -> se -> svt -> utils -> subtitle -> SubtitleUtils, I found:
[..]
private function translateAnsiColor(param1:String) : String
      {
         var param1:String = this.replaceAnsiColor(param1,"30","#000000");
         param1 = this.replaceAnsiColor(param1,"31","#FF0000");
         param1 = this.replaceAnsiColor(param1,"32","#00FF00");
         param1 = this.replaceAnsiColor(param1,"33","#FFFF00");
         param1 = this.replaceAnsiColor(param1,"34","#0000FF");
         param1 = this.replaceAnsiColor(param1,"35","#FF00FF");
         param1 = this.replaceAnsiColor(param1,"36","#00FFFF");
         param1 = this.replaceAnsiColor(param1,"37","#FFFFFF");
         return param1;
      }
      
      private function replaceAnsiColor(param1:String, param2:String, param3:String) : String
      {
         var param1:String = param1.replace(new RegExp("<" + param2 + ">","g"),"<font color=\"" + param3 + "\">");
         param1 = param1.replace(new RegExp("</" + param2 + ">","g"),"</font>");
         return param1;
      }
[..]
Here we can see that, for example, <31>*text*</31> is replaced with <font color="#FF0000">*text*</font> to change the subtitle's colour to red. The resulting markup works well in ordinary SRT files, so if we want the correct colours we have to do the same conversion ourselves. Example:
186
00:19:29.280 --> 00:19:37.040
–Pappa, i går såg jag en häst.
<33>–Jättekul. Klappade du den?</33>
to:
186
00:19:29.280 --> 00:19:37.040
–Pappa, i går såg jag en häst.
<font color="#FFFF00">–Jättekul. Klappade du den?</font>
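The replacement can also be done in a single pass over the text. This is only a minimal sketch, assuming the colour table from the decompiled player above is complete; the function name svtColourTagsToFont is made up for this example and is not part of SVT's player or of my converter:

```php
<?php
// Hypothetical helper for illustration, not taken from SVT's player.
function svtColourTagsToFont(string $sub): string
{
    // Colour table copied from translateAnsiColor() in the decompiled player.
    $colours = [
        "30" => "#000000", "31" => "#FF0000", "32" => "#00FF00", "33" => "#FFFF00",
        "34" => "#0000FF", "35" => "#FF00FF", "36" => "#00FFFF", "37" => "#FFFFFF",
    ];

    // Matches <30>..<37> and </30>..</37>.
    // Group 1 is the optional slash, group 2 is the colour number.
    return preg_replace_callback(
        '%<(/?)(3[0-7])>%',
        function (array $m) use ($colours): string {
            return $m[1] === "/"
                ? "</font>"
                : '<font color="' . $colours[$m[2]] . '">';
        },
        $sub
    );
}

// Prints: <font color="#FFFF00">–Jättekul. Klappade du den?</font>
echo svtColourTagsToFont("<33>–Jättekul. Klappade du den?</33>"), "\n";
```

Tags outside the 30 to 37 range are left untouched, so any other markup in the file passes through unchanged.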

WebVTT-tags

WebVTT is a standard for displaying subtitles in HTML5. It's based on and very similar to SRT. SVT uses some tags specific to WebVTT, and these are also parsed by the Flash player rather than by the browser itself. An example would be:
248
00:28:08.480 --> 00:28:11.880 A:middle L:50%
SVT Programtextning:
lisbet olofsdotter
...where "A:middle L:50%" is positioning in WebVTT. Positioning tags are the only WebVTT tags I've seen used in SVT's subtitles. More info about these tags here.
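To make the structure of such a line concrete, here is a rough sketch that splits a timing line into its start time, end time and settings. It assumes timestamps always look like hh:mm:ss.mmm and that settings are space-separated key:value pairs, which is what I've seen in SVT's files but is not guaranteed. The function name parseCueTimingLine is hypothetical:

```php
<?php
// Hypothetical helper for illustration only.
// Returns null if the line is not a timing line.
function parseCueTimingLine(string $line): ?array
{
    // Assumption: timestamps are always hh:mm:ss.mmm, as in SVT's files.
    $pattern = '%^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})(?: (.+))?$%';
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null;
    }

    $settings = [];
    // Anything after the end time is treated as "key:value" pairs, e.g. "A:middle L:50%".
    foreach (preg_split('%\s+%', $m[3] ?? "", -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        $parts = explode(":", $pair, 2);
        if (count($parts) === 2) {
            $settings[$parts[0]] = $parts[1];
        }
    }

    return ["start" => $m[1], "end" => $m[2], "settings" => $settings];
}

print_r(parseCueTimingLine("00:28:08.480 --> 00:28:11.880 A:middle L:50%"));
```

For the example above, the settings come out as A => middle and L => 50%. A plain SRT-style timing line gives an empty settings array.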

I tried saving the subtitle as .vtt and opening it in MPC-HC, VLC, Firefox and Internet Explorer. All of them seemed to ignore these WebVTT tags, and the browsers also ignored the <font> tag. I tried adding "WEBVTT" at the beginning of the file, as the standard requires (SVT's files do not have this line), but that did not help. The only difference was that MPC-HC could no longer open it as a subtitle at all. Even when opening the file with Subtitle Edit, these tags were ignored. I'm not a WebVTT guru, but it seems to me that it's better to just save it as SRT, at least until players add proper support for these tags.

So, what about converting these tags to SRT tags? While the following tags do not seem to be "officially" part of SRT, it's possible to set the position using either {\pos(x,y)} for an exact position, or {\anN} to align the text left/center/right and bottom/middle/top. Reportedly, the \an tags position the text like this on the screen (\an1 is bottom left, etcetera):
{\an7} {\an8} {\an9}
{\an4} {\an5} {\an6}
{\an1} {\an2} {\an3}
However, as these tags do not exactly match the WebVTT tags in function, and their compatibility is questionable, I think it's just a waste of time to attempt converting the positioning. Also, SVT's parsing is done by the Flash player, so it might not even be consistent with how it's supposed to work according to the WebVTT spec.
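For reference, the grid above follows a simple pattern: the column gives 1 to 3 and each row up adds 3. A small sketch of that arithmetic, where anTag is a hypothetical helper and the layout is assumed to be exactly as reported above:

```php
<?php
// Hypothetical helper: builds an {\anN} tag from a horizontal and a vertical alignment.
// Assumes the reported grid, where \an1 is bottom left and \an9 is top right.
function anTag(string $horizontal, string $vertical): string
{
    $columns = ["left" => 1, "center" => 2, "right" => 3];
    $rows = ["bottom" => 0, "middle" => 3, "top" => 6];

    if (!isset($columns[$horizontal], $rows[$vertical])) {
        throw new InvalidArgumentException("Unknown alignment: $horizontal/$vertical");
    }

    return "{\\an" . ($columns[$horizontal] + $rows[$vertical]) . "}";
}

echo anTag("center", "top"), "\n";  // {\an8}
echo anTag("left", "bottom"), "\n"; // {\an1}
```

This only covers building the tag. Deciding which cell a WebVTT setting such as "A:middle L:50%" should map to is the part that does not translate cleanly.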

I did add a feature to my converter to remove these kinds of tags (everything remaining on the line after "00:00:00.000 --> 00:00:00.000"). Again, it seems like most players ignore the extra tags anyway, so there is really no reason to remove them other than wanting a clean SRT.
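As a standalone illustration of the stripping, here is a sketch using a stricter pattern than the one in my script below. It assumes hh:mm:ss.mmm timestamps, and stripCueSettings is a hypothetical name. Unlike a pattern ending in .+$, it stops before any \r, so Windows line endings are preserved:

```php
<?php
// Hypothetical helper: removes everything after the end time on each timing line.
function stripCueSettings(string $sub): string
{
    // Group 1 is "start --> end"; the rest of the line (up to \r or \n) is dropped.
    $pattern = '%^(\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3})[ \t]+[^\r\n]*%m';
    return preg_replace($pattern, '$1', $sub);
}

$cue = "248\n00:28:08.480 --> 00:28:11.880 A:middle L:50%\nSVT Programtextning:\nlisbet olofsdotter";
echo stripCueSettings($cue), "\n";
```

The timing line becomes "00:28:08.480 --> 00:28:11.880" and the other lines are left as they were.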

PHP Script

Here is the source code to my converter.
<?php

error_reporting(-1);
ini_set("display_errors", 1);

//Does the actual converting
function convert($sub){
	$colours = array(
		"30" => "000000",
		"31" => "FF0000",
		"32" => "00FF00",
		"33" => "FFFF00",
		"34" => "0000FF",
		"35" => "FF00FF",
		"36" => "00FFFF",
		"37" => "FFFFFF",
	);

	//Replace all <3x> tags with the correct colours, one by one
	foreach ($colours as $num => $colour){
		$sub = preg_replace("%<$num>%", "<font color=\"#$colour\">", $sub);
		$sub = preg_replace("%</$num>%", "</font>", $sub);
	}

	//Strips WebVTT-tags if requested (remove everything on the line after "00:00:00.000 --> 00:00:00.000")
	if (isset($_POST['strip_vtt'])){
		$sub = preg_replace("%(^[0-9-:.]+[ \->]+[0-9-:.]+)( .+)$%m", "\\1", $sub);
	}

	return $sub;
}

//Zips files to send to user
function addtozip($filename, $contents){
	global $outputfile;
	$zip = new ZipArchive();
	$zip->open($outputfile, ZipArchive::CREATE);
	$zip->addFromString($filename, $contents);
	$zip->close();
}

//Scan through the folder containing the zips and delete all older than an hour
function cleanup(){
	$svtdownload_contents = scandir("./svtdownload/");
	foreach ($svtdownload_contents as $i){
		$svtdownload_file = "./svtdownload/" . $i;
		if (is_file($svtdownload_file)){
			if(filemtime($svtdownload_file) < time()-3600){
				unlink($svtdownload_file);
			}
		}
	}
}

//If user has submitted the form with data, process it
if (isset($_POST['upload_sub']) && (!empty($_POST['sub_text']) || $_FILES['sub_file']['error'] == 0)){
	//If a file was uploaded...
	if ($_FILES['sub_file']['error'] == 0){
		cleanup();
		//Set the path for the zip that we'll create later
		$outputfile = "./svtdownload/" . str_replace(array(".", ":"), "", $_SERVER['REMOTE_ADDR']) . time() . ".zip";
		//Get the filetype of the uploaded file
		$finfo = finfo_open(FILEINFO_MIME_TYPE);
		$filetype = finfo_file($finfo, $_FILES['sub_file']['tmp_name']);
		finfo_close($finfo);
		//If user uploaded a text file (srt). Some subs are detected as x-pascal for some reason.
		if ($filetype === "text/plain" || $filetype === "text/x-pascal"){
			//Get the content, convert it, and zip it
			if($uploaded_sub = file_get_contents($_FILES['sub_file']['tmp_name'])){
				$finished_sub = convert($uploaded_sub);
				addtozip($_FILES['sub_file']['name'], $finished_sub);
			}
		}
		//If user uploaded a zip file
		else if ($filetype === "application/zip"){
			//Open the uploaded zip
			$uploaded_zip = new ZipArchive();
			$uploaded_zip->open($_FILES['sub_file']['tmp_name']);
			//Loop through every file in the zip
			for($i = 0; $i < $uploaded_zip->numFiles; $i++){
				//Get info about the current file
				$stat = $uploaded_zip->statIndex($i);
				//Get the filetype for the current file, process it if text
				$finfo = finfo_open(FILEINFO_MIME_TYPE);
				$filetype = finfo_buffer($finfo, $uploaded_zip->getFromIndex($i));
				finfo_close($finfo);
				//Get the content, convert it, and zip it
				if ($filetype === "text/plain" || $filetype === "text/x-pascal"){
					$finished_sub = convert($uploaded_zip->getFromIndex($i));
					addtozip($stat['name'], $finished_sub);
				}
			}
		}
		else{
			$error = "<br /><span class=\"red_color\">Wrong filetype. Please upload text-subtitle or zip.</span><br />";
		}
		//Redirect user to zip, if exists
		if(file_exists($outputfile)){
			header("Location: $outputfile");
		}
		else if (!isset($error)){
			$error = "<br /><span class=\"red_color\">Something went wrong :(</span><br />";	
		}
	}
	//If user did not upload a file but filled in the text-field
	else if (!empty($_POST['sub_text'])){
		cleanup();
		$textbox_echo = null;
		$counter = 1;
		//Set the path for the zip that we'll create later
		$outputfile = "./svtdownload/" . str_replace(array(".", ":"), "", $_SERVER['REMOTE_ADDR']) . time() . ".zip";
		//Loop through each line and check if it's an url
		$all_lines = explode("\n", trim($_POST['sub_text']));
		foreach ($all_lines as $i){
			//Remove \n from the string
			$i = trim($i);
			//New url discovered 2019 to akamaized, keeping old as alternative if it's still used.
			$url_regex = "%^(https?://svt-vod-5m\.akamaized\.net/|https?://media\.svt\.se/download/).*\.wsrt$%";
			//Check if it links to a wsrt on one of the known hosts (akamaized or media.svt.se/download/)
			if (preg_match($url_regex, $i) === 1){
				//Set variable to skip converting from text-input later
				$inputbyurl = true;
				//Fetch the sub, convert it and zip it
				if($sub = file_get_contents($i)){
					$finished_sub = convert($sub);
					addtozip(str_pad($counter, 2, "0", STR_PAD_LEFT) . ". " . pathinfo($i, PATHINFO_BASENAME), $finished_sub);
					$counter++;
				}
			}
		}
		//If no links were found, suppose input is a sub and convert it
		if(!isset($inputbyurl)){
			$finished_sub = convert($_POST['sub_text']);
			$textbox_echo = htmlentities($finished_sub);
		}
		else{
			//Redirect user to zip, if exists
			if(file_exists($outputfile)){
				header("Location: $outputfile");
			}
			else{
				$error = "<br /><span class=\"red_color\">Something went wrong :(</span><br />";	
			}
		}
		
		$output = "\n\t<div class=\"sub_container\">
		Results:<br />
		<textarea class=\"sub_textbox\">" . $textbox_echo . "\t\t</textarea>
	</div>";
	}
}

echo "<!DOCTYPE html>
<html>
<head>
	<meta charset=\"utf-8\" />
	<title>SVT Subs Converter</title>
	<style>
		.sub_textbox {
			width:500px;
			min-width:200px;
			height: 200px;
			min-height: 100px;
		}
		.sub_container{
			background-color: #F8F8F8;
			padding: 10px;
		}
		.red_color{
			color: red;
		}
	</style>
</head>
<body>
	<div class=\"sub_container\">
		" . htmlentities("This simple tool will replace SVT:s <30> to <37> tags with their respective colours. Source code and technical details ") . "<a href='/view/27/'>here</a>.<br />\n\t\t";
		if (isset($error)){
			echo $error;
		}
		echo "<form method=\"post\" enctype=\"multipart/form-data\">
			<br />
			Upload file, either single file or zipped:
			<br />
			<input type=\"file\" name=\"sub_file\" />
			<br />
			<br />
			...or input subtitle as text, or paste 1 url per line:
			<br />
			<textarea class=\"sub_textbox\" name=\"sub_text\">" . htmlentities(isset($_POST['sub_text']) ? $_POST['sub_text'] : "") . "</textarea>
			<br />
			<br />
			<input type=\"checkbox\" name=\"strip_vtt\" id=\"strip_vtt\" />
			<label for=\"strip_vtt\">Remove WebVTT specific tags (most players just seem to ignore these, doesn't do any harm to keep them)</label>
			<br />
			<br />
			<input type=\"submit\" name=\"upload_sub\" value=\"Submit\" />
			<button type=\"reset\" value=\"Reset\">Reset</button>
		</form>
	</div>";

	if (isset($output)){
		echo $output;
	}

echo "\n</body>
</html>";

?>
