[Obsolete] MSDN Subscriber Downloads: Folder Structure Generator / Crawler

Posted 2015-11-26 22:36 by Traesk. Edited 2020-08-10 12:28 by Traesk. 2257 views.

Update 2019-06-01: This script no longer works. MSDN Subscriber Downloads was replaced by My Visual Studio some time ago.

Purpose: I have been hoarding original files from MSDN for a while. I used to recreate the site's folder structure on my disk, like "MSDN\Operating Systems\MS-DOS\MS-DOS 6.22 (English)\en_msdos622.exe", and save the release information from the site to a text file. When I got my own MSDN account and started downloading lots of files, this became unmanageable. 2-3 years ago I wrote a script that does this for me, and I thought I would share it in case anyone has use for this specific code or the concept itself.
Scope/limitations: The sole purpose of this is to crawl the whole "MSDN Subscriber Downloads" section and create a zip file containing the structure and information of the releases. It is slow, unoptimised and has limited options, but it serves its purpose. This tool would be better suited as an application, but we'll use PHP.
Method: Create a PHP-script that queries Microsoft for the desired information and saves it to a zip.

For an example of what the finished output looks like, please download this from Mega.

Microsoft's website

If we open our browser and its developer tools, we can see what is sent to and received from the site. If we start by browsing to https://msdn.microsoft.com/en-us/subscriptions/downloads/ and click "Product Categories", we will see that it sends a request to https://msdn.microsoft.com/en-us/subscriptions/json/GetProductCategories?brand=MSDN&localeCode=en-us, which lists all categories on the website in JSON format.

Example response, GetProductCategories:
[{"ExtensionData":{},"Brand":3,"Name":" New Products","ProductGroupId":65},{"ExtensionData":{},"Brand":3,"Name":"Applications","ProductGroupId":1},{"ExtensionData":{},"Brand":3,"Name":"Business Solutions","ProductGroupId":29},{"ExtensionData":{},"Brand":1,"Name":"Designer Tools","ProductGroupId":62},{"ExtensionData":{},"Brand":3,"Name":"Developer Tools","ProductGroupId":18},{"ExtensionData":{},"Brand":1,"Name":"MSDN Library","ProductGroupId":35},{"ExtensionData":{},"Brand":3,"Name":"Operating Systems","ProductGroupId":36},{"ExtensionData":{},"Brand":3,"Name":"Servers","ProductGroupId":42},{"ExtensionData":{},"Brand":3,"Name":"Tools and Resources","ProductGroupId":59}]
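As a minimal sketch of how the first two calls connect (not the script's own code, and these endpoints no longer respond), the categories response can be decoded and turned into one GetProductFamiliesForCategory URL per category. The function name category_family_urls is hypothetical; skipping " New Products" mirrors what the script does.

```php
<?php
// Hypothetical helper: turn a GetProductCategories response into one
// GetProductFamiliesForCategory URL per category (endpoints are long gone).
function category_family_urls(string $categoriesJson): array
{
    $base = 'https://msdn.microsoft.com/en-us/subscriptions/json/GetProductFamiliesForCategory';
    $categories = json_decode($categoriesJson, true);
    if (!is_array($categories)) {
        return []; // not valid JSON
    }
    $urls = [];
    foreach ($categories as $category) {
        $name = trim($category['Name']); // " New Products" has a leading space
        if ($name === 'New Products') {
            continue; // skipped, as in the original script
        }
        // ProductGroupId from this response becomes categoryId in the next request
        $query = http_build_query(['brand' => 'MSDN', 'categoryId' => $category['ProductGroupId']]);
        $urls[$name] = $base . '?' . $query;
    }
    return $urls;
}
```

For "Operating Systems" (ProductGroupId 36) this yields the same URL as the example in the next paragraph.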
Further, if we click on a category (like "Operating Systems"), we will see that it requests https://msdn.microsoft.com/en-us/subscriptions/json/GetProductFamiliesForCategory?brand=MSDN&categoryId=$prodid, where $prodid is the ProductGroupId from GetProductCategories. Example: https://msdn.microsoft.com/en-us/subscriptions/json/GetProductFamiliesForCategory?brand=MSDN&categoryId=36 for Operating Systems.

Example response, GetProductFamiliesForCategory:
[{"ProductFamilyId":153,"Title":"Compute Cluster Pack","ProductGroupId":0},{"ProductFamilyId":155,"Title":"MS-DOS","ProductGroupId":0},{"ProductFamilyId":163,"Title":"Small Business Server 2003 R2","ProductGroupId":0},{"ProductFamilyId":606,"Title":"Windows 10","ProductGroupId":0},{"ProductFamilyId":625,"Title":"Windows 10, Version 1511","ProductGroupId":0},{"ProductFamilyId":147,"Title":"Windows 3.1 (16-bit)","ProductGroupId":0},{"ProductFamilyId":148,"Title":"Windows 3.11 (16-bit)","ProductGroupId":0},{"ProductFamilyId":152,"Title":"Windows 3.11 for Workgroups (16-bit)","ProductGroupId":0},{"ProductFamilyId":149,"Title":"Windows 3.2 (16-bit)","ProductGroupId":0},{"ProductFamilyId":350,"Title":"Windows 7","ProductGroupId":0},{"ProductFamilyId":481,"Title":"Windows 8","ProductGroupId":0},{"ProductFamilyId":524,"Title":"Windows 8.1","ProductGroupId":0},{"ProductFamilyId":545,"Title":"Windows 8.1 with Update","ProductGroupId":0},{"ProductFamilyId":141,"Title":"Windows Advanced Server","ProductGroupId":0},{"ProductFamilyId":156,"Title":"Windows CE .NET Platform Builder 4.1","ProductGroupId":0},{"ProductFamilyId":157,"Title":"Windows CE .NET Platform Builder 4.2","ProductGroupId":0},{"ProductFamilyId":160,"Title":"Windows CE DirectX Kit","ProductGroupId":0},{"ProductFamilyId":347,"Title":"Windows Essential Business Server 2008","ProductGroupId":0},{"ProductFamilyId":349,"Title":"Windows Home Server","ProductGroupId":0},{"ProductFamilyId":438,"Title":"Windows Home Server 2011","ProductGroupId":0},{"ProductFamilyId":19,"Title":"Windows Internet Explorer 7","ProductGroupId":0},{"ProductFamilyId":342,"Title":"Windows Internet Explorer 8","ProductGroupId":0},{"ProductFamilyId":138,"Title":"Windows Server 2003","ProductGroupId":0},{"ProductFamilyId":137,"Title":"Windows Server 2003 R2","ProductGroupId":0},{"ProductFamilyId":164,"Title":"Windows Server 2008","ProductGroupId":0},{"ProductFamilyId":351,"Title":"Windows Server 2008 R2","ProductGroupId":0},{"ProductFamilyId":483,"Title":"Windows Server 2012","ProductGroupId":0},{"ProductFamilyId":488,"Title":"Windows Server 2012 Essentials","ProductGroupId":0},{"ProductFamilyId":522,"Title":"Windows Server 2012 R2","ProductGroupId":0},{"ProductFamilyId":523,"Title":"Windows Server 2012 R2 Essentials","ProductGroupId":0},{"ProductFamilyId":548,"Title":"Windows Server 2012 R2 Essentials with Update","ProductGroupId":0},{"ProductFamilyId":546,"Title":"Windows Server 2012 R2 with Update","ProductGroupId":0},{"ProductFamilyId":571,"Title":"Windows Server Technical Preview","ProductGroupId":0},{"ProductFamilyId":142,"Title":"Windows Services for UNIX 1.0","ProductGroupId":0},{"ProductFamilyId":143,"Title":"Windows Services for UNIX 2.0","ProductGroupId":0},{"ProductFamilyId":144,"Title":"Windows Services for UNIX 3.0","ProductGroupId":0},{"ProductFamilyId":145,"Title":"Windows Services for UNIX 3.5","ProductGroupId":0},{"ProductFamilyId":341,"Title":"Windows Small Business Server 2008","ProductGroupId":0},{"ProductFamilyId":426,"Title":"Windows Small Business Server 2011","ProductGroupId":0},{"ProductFamilyId":368,"Title":"Windows Storage Server 2008","ProductGroupId":0},{"ProductFamilyId":369,"Title":"Windows Storage Server 2008 R2","ProductGroupId":0},{"ProductFamilyId":439,"Title":"Windows Thin PC","ProductGroupId":0},{"ProductFamilyId":146,"Title":"Windows Vista","ProductGroupId":0},{"ProductFamilyId":140,"Title":"Windows XP","ProductGroupId":0}]
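A sketch of what happens with this response, assuming we already know which category it belongs to: each entry becomes ProductFamilyId => title and category, the same shape loopcatpro() in system.php builds, with slashes turned into dashes so the names are safe as folder names. The helper name family_map is hypothetical, and the iconv transliteration to ISO-8859-1 that the script also does is deliberately left out here.

```php
<?php
// Hypothetical helper: map a GetProductFamiliesForCategory response to
// ProductFamilyId => ['title' => ..., 'category' => ...].
function family_map(string $familiesJson, string $category): array
{
    $families = json_decode($familiesJson, true);
    if (!is_array($families)) {
        return []; // not valid JSON
    }
    $map = [];
    foreach ($families as $family) {
        // "/" would create an extra folder level inside the zip, so replace it
        $map[$family['ProductFamilyId']] = [
            'title'    => str_replace('/', '-', trim($family['Title'])),
            'category' => str_replace('/', '-', trim($category)),
        ];
    }
    return $map;
}
```

Keying on ProductFamilyId lets each release later find its folder through its own ProductFamilyId field.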
If we click on a product (like "MS-DOS"), it sends a POST request to https://msdn.microsoft.com/en-us/subscriptions/json/GetFileSearchResult with a JSON request body describing what we want, such as ProductFamilyId and Languages. The response is a list of releases. It contains all the information about the releases except, for some reason, which subscriptions have access to each release.

Example request body, GetFileSearchResult:
{"Languages":"en","Architectures":"","ProductFamilyIds":"","FileExtensions":"","MyProducts":false,"ProductFamilyId":155,"SearchTerm":"","Brand":"MSDN","PageIndex":0,"PageSize":10,"FileId":0}
Example response, GetFileSearchResult:
{"Files":[{"FileId":2736,"DownloadProvider":4,"NotAuthorizedReasonId":null,"FileName":"en_msdos60.exe","Description":"MS-DOS 6.0 (English)","Notes":null,"Sha1Hash":"877b0b8e391ed07cb83214cb09e8f3b10c4b206f","ProductFamilyId":155,"PostedDate":"\/Date(971411760000)\/","LanguageCodes":["en"],"Languages":["English"],"Size":"5 MB","IsAuthorization":false,"BenefitLevels":null,"IsProductKeyRequired":true},{"FileId":2737,"DownloadProvider":4,"NotAuthorizedReasonId":null,"FileName":"en_msdos622.exe","Description":"MS-DOS 6.22 (English)","Notes":null,"Sha1Hash":"d01aa47a5d85908185f8987e972afc66dc92a735","ProductFamilyId":155,"PostedDate":"\/Date(971411760000)\/","LanguageCodes":["en"],"Languages":["English"],"Size":"11 MB","IsAuthorization":false,"BenefitLevels":null,"IsProductKeyRequired":true}],"LanguageContext":[{"DisplayName":"English","Value":"en","IsApplied":true},{"DisplayName":"Arabic","Value":"ar","IsApplied":false},{"DisplayName":"Chinese - Simplified","Value":"cn","IsApplied":false},{"DisplayName":"Danish","Value":"da","IsApplied":false},{"DisplayName":"Dutch","Value":"nl","IsApplied":false},{"DisplayName":"Finnish","Value":"fi","IsApplied":false},{"DisplayName":"French","Value":"fr","IsApplied":false},{"DisplayName":"German","Value":"de","IsApplied":false},{"DisplayName":"Hebrew","Value":"he","IsApplied":false},{"DisplayName":"Italian","Value":"it","IsApplied":false},{"DisplayName":"Japanese","Value":"ja","IsApplied":false},{"DisplayName":"Korean","Value":"ko","IsApplied":false},{"DisplayName":"Norwegian","Value":"no","IsApplied":false},{"DisplayName":"Portuguese-Brazil","Value":"pt","IsApplied":false},{"DisplayName":"Russian","Value":"ru","IsApplied":false},{"DisplayName":"Spanish","Value":"es","IsApplied":false},{"DisplayName":"Swedish","Value":"sv","IsApplied":false}],"ArchitectureContext":[{"DisplayName":"32-bit","Value":"x86","IsApplied":false}],"ProductFamilyContext":[{"DisplayName":"MS-DOS","Value":"155","IsApplied":false}],"FileExtensionContext":[{"DisplayName":".exe","Value":".exe","IsApplied":false}],"TotalResults":2,"MyProductsContext":false}
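The script builds this request body by concatenating strings. A sketch of the same body built with json_encode instead, which avoids quoting mistakes; the field names and order are copied from the example request above, the language codes would come from the Value fields of LanguageContext, and clamping PageSize to 100 reflects the limit described later in this post. The function name search_body is hypothetical.

```php
<?php
// Hypothetical helper: build the GetFileSearchResult POST body.
function search_body(int $familyId, array $languages, int $pageIndex, int $pageSize = 100): string
{
    return json_encode([
        'Languages'        => implode(',', $languages), // e.g. "en,sv"
        'Architectures'    => '',
        'ProductFamilyIds' => '',
        'FileExtensions'   => '',
        'MyProducts'       => false,
        'ProductFamilyId'  => $familyId,
        'SearchTerm'       => '',
        'Brand'            => 'MSDN',
        'PageIndex'        => $pageIndex,
        'PageSize'         => min($pageSize, 100), // at most 100 per page
        'FileId'           => 0,
    ]);
}
```

With search_body(155, ['en'], 0, 10) the output is byte-for-byte the example request body shown above.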
To get the final piece of information, click "Details". This queries https://msdn.microsoft.com/en-us/subscriptions/json/GetFileDetail, again sending what we're looking for in the request body. The response also contains everything GetFileSearchResult returned about the release, plus the BenefitLevels.

Example request body, GetFileDetail:
{"fileId":"2736","brand":"MSDN"}
Example response, GetFileDetail:
{"FileId":2736,"DownloadProvider":4,"NotAuthorizedReasonId":null,"FileName":"en_msdos60.exe","Description":"MS-DOS 6.0 (English)","Notes":null,"Sha1Hash":"877b0b8e391ed07cb83214cb09e8f3b10c4b206f","ProductFamilyId":155,"PostedDate":"\/Date(971411760000)\/","LanguageCodes":["en"],"Languages":["English"],"Size":"5 MB","IsAuthorization":false,"BenefitLevels":["\tVS Enterprise with MSDN (Retail)","\tVS Enterprise with MSDN (VL)","DreamSpark Premium","MCT Developer Software \u0026 Services","MCT Software \u0026 Services","MSDN OS (Retail)","MSDN OS (VL)","MSDN Platforms","MSDN Platforms","VS Enterprise with MSDN (BizSpark Administrator)","VS Enterprise with MSDN (BizSpark Member)","VS Enterprise with MSDN (MPN)","VS Enterprise with MSDN (MPN)","VS Enterprise with MSDN (MPN)","VS Enterprise with MSDN (MPN)","VS Enterprise with MSDN (NFR FTE)","VS Enterprise with MSDN (Retail)","VS Enterprise with MSDN (Retail)","VS Enterprise with MSDN (VL)","VS Enterprise with MSDN (VL)","VS Enterprise with MSDN (VL)","VS Enterprise with MSDN (VL)","VS Enterprise with MSDN (VL)","VS Pro with MSDN (Retail)","VS Pro with MSDN (VL)","VS Pro with MSDN (VL) MSD","VS Test Pro with MSDN (Retail)","VS Test Pro with MSDN (VL)","VS Test Pro with MSDN (VL)"],"IsProductKeyRequired":true}
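BenefitLevels in this response has entries with a leading tab and several repeated entries ("MSDN Platforms" appears twice, for example). A small sketch of cleaning it up: trimming matches what the script does, while removing duplicates is an optional extra the original script does not do. The helper name benefit_levels is hypothetical.

```php
<?php
// Hypothetical helper: extract the subscription levels from GetFileDetail.
function benefit_levels(string $detailJson, bool $dedupe = true): array
{
    $detail = json_decode($detailJson, true);
    // BenefitLevels is null in GetFileSearchResult, so treat missing as empty
    $levels = array_map('trim', $detail['BenefitLevels'] ?? []);
    // array_unique keeps the first occurrence; array_values reindexes 0..n-1
    return $dedupe ? array_values(array_unique($levels)) : $levels;
}
```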
So this is what we have to work with.

Creating the script

The full PHP script is posted further below, but here I will explain in general how it works.
Again, this is not very optimised and is created for one sole purpose. There is a lot more that could be done here, like searching or caching results in a database, but that is not covered here.
These are the major steps in the script:
* Create a zip
* Query GetProductCategories to get a list of Categories, like "Operating Systems" - Once
* Loop through the categories using GetProductFamiliesForCategory to get a list of available products in each category, like "MS-DOS" - Once for every category
* Query GetFileSearchResult to get a list of available languages for the product, used in the next step since we have to specify which languages to search for. - Once for every product
* Loop through the products using GetFileSearchResult to get a list of releases, like "MS-DOS 6.22 (English)" - Once for every product
GetFileSearchResult is a bit special. You can't get a single list of all releases; each request returns one page of at most 100 releases, so we also have to loop through every page.
* Loop through the releases using GetFileDetail to get the information about which subscriptions have access to this release (BenefitLevels) - Once for every release
If we don't want to use the information in BenefitLevels, we wouldn't need to loop through every single file. (My script would have to be modified for this)
* Format the results from GetFileDetail to add the folder itself, a nice looking nfo-file, a sha1-file, and the raw JSON-data to the zip.
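The paging step in the list above can be sketched as a loop that computes the page count from TotalResults after the first response, and re-requests a page if it contains duplicate or already-seen FileIds. This is a simplified sketch, not the script's own loop: $fetchPage is a hypothetical callable standing in for the actual GetFileSearchResult request, and unlike the script (which retries indefinitely) it gives up after a fixed number of retries.

```php
<?php
// Hypothetical helper: $fetchPage($pageIndex, $pageSize) must return the
// decoded GetFileSearchResult JSON (with 'Files' and 'TotalResults').
function fetch_all_pages(callable $fetchPage, int $pageSize = 100, int $maxRetries = 5): array
{
    $files = [];     // FileId => release record
    $pageIndex = 0;
    $totalPages = 1; // corrected after the first response
    $retries = 0;

    while ($pageIndex < $totalPages) {
        $page = $fetchPage($pageIndex, $pageSize);
        $totalPages = max(1, (int) ceil($page['TotalResults'] / $pageSize));

        $ids = array_column($page['Files'], 'FileId');
        $unique = count(array_unique($ids)) === count($ids);
        $unseen = array_intersect_key(array_flip($ids), $files) === [];

        if (!$unique || !$unseen) {
            if (++$retries > $maxRetries) {
                throw new RuntimeException("Page $pageIndex kept returning duplicates");
            }
            continue; // request the same page again
        }

        foreach ($page['Files'] as $file) {
            $files[$file['FileId']] = $file;
        }
        $retries = 0;
        $pageIndex++;
    }
    return $files;
}
```

Keying the result on FileId makes a release that shows up twice impossible to process twice, which is the same goal as the script's duplicate check before the GetFileDetail loop.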

Issues

* The MSDN site itself is very slow. Querying information about all the tens of thousands of releases takes a while. I attempted to make a multithreaded version that makes several queries at the same time, but it didn't work quite the way I had hoped.
* At some point I noticed that MSDN doesn't return exactly the same list of releases for every query, so some were missed. I tried to mitigate this by checking that the releases on each page are unique and retrying the page if they are not; the script also stops if the final list contains duplicates.
* The formatting of the descriptions and other fields is not very consistent. I had to fix this case by case whenever I saw something wrong, but there are probably some I missed. It may or may not look bad on the website, but it does in plain text. In particular, I had issues with redundant whitespace, incorrect character encoding from Microsoft, and bugs in how PHP handles the (correct) encoding.
* PHP was sometimes unable to write to the zip. I solved that by letting 1000 releases accumulate and then closing and reopening the archive (ZipArchive writes pending changes on close), instead of doing that for every release.
* The file path sometimes becomes too long for Windows to handle. NTFS supports paths longer than the classic limit of about 260 characters (MAX_PATH), but Windows Explorer does not. A workaround is to use a program that can handle such files, like Total Commander. Last time I checked, WinRAR wasn't able to unpack these files, but 7-Zip was if you click Extract rather than using drag-and-drop.
* There are so many files on MSDN that I can't possibly verify that my script catches them all and that they all look perfect. I've fixed the problems I've encountered, but I still find new ones now and then (usually due to bad formatting by Microsoft).
* Some older releases use a 12-digit timestamp instead of a 13-digit one. The value is in milliseconds and the script takes the first 10 digits as seconds, so in that case a 0 is added to the beginning of the timestamp first.
* Different timezones. Example: #6410 was released at UNIX time 1086134100, which is June 1st 2004, 23:55 GMT. The website shows it to me (GMT +1) as released on the 2nd, but through a proxy in the UK (GMT) it shows as released on the 1st. If the timestamp were in GMT -x, it would convert to the 2nd for the UK as well. Hence it must be GMT. That also makes more sense, as Microsoft would have posted it in the afternoon US Pacific time rather than at midnight. The script uses gmdate to calculate the date, which always returns the result in GMT.
* Because the output is plain text, any links in a release's description are missing from the nfo file.
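For the two timestamp issues above, a sketch of an alternative to the zero-padding: PostedDate is milliseconds since the UNIX epoch, so integer division by 1000 gives seconds regardless of whether the value has 12 or 13 digits, and gmdate() keeps the output in GMT. This swaps in integer division for the script's substr approach; the helper name posted_date_gmt is hypothetical.

```php
<?php
// Hypothetical helper: "/Date(971411760000)/" -> "10/13/2000" (GMT).
function posted_date_gmt(string $posted): ?string
{
    // Capture the digits after "/Date("; anything after them is ignored.
    if (preg_match('~/Date\((-?\d+)~', $posted, $m) !== 1) {
        return null; // unexpected format
    }
    $seconds = intdiv((int) $m[1], 1000); // milliseconds -> seconds
    // gmdate() formats in GMT regardless of the server's timezone setting.
    return gmdate('m/d/Y', $seconds);
}
```

With this, the 12-digit example from the script's comments, 952053900000, gives 03/03/2000, and release #6410 (1086134100000) gives 06/01/2004.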

Stats

Measured on my server with a 100 Mbit/s connection:
* ~26 000 releases on MSDN
* The zip with all information is about 60 MiB
* It takes ~7½ hours to run the script
* ~7 hours are spent networking with Curl
* 1½ hours are spent just establishing connections to Microsoft (included in the ~7 hours)
* ~25 minutes are spent on things other than networking
* It takes about a second per release (~26 000 releases in ~27 000 seconds)

PHP Script

index.php
<?php header('Content-Type: text/html; charset=utf-8');?>
<!DOCTYPE HTML>
<html>
<head>
	<meta charset="utf-8">
	<title>MSDN Structure Creator</title>
</head>
<body>
<pre>
<?php

//Starting timer, setting script limit, memory limit and error_reporting
$starttime = microtime(true);
set_time_limit(172800);
ini_set('memory_limit', '-1');
error_reporting(E_ALL);
//Loads the system class
require_once "system.php";
$system = new SYSTEM();
//Create the ZipArchive if it doesn't exist. If it exists: die. (Same case as below, since Linux file names are case-sensitive)
if (file_exists("MSDN.zip")){
	die("Dying: MSDN.ZIP Exists");
}
$zip = new ZipArchive();
$zip->open('MSDN.zip', ZipArchive::CREATE);
$zip->addEmptyDir("SCRIPT IN PROGRESS");

//Set URLs from where to get the data (2/4)
$url = "https://msdn.microsoft.com/en-us/subscriptions/json/GetFileSearchResult";
$urelf = "https://msdn.microsoft.com/en-us/subscriptions/json/GetFileDetail";

//Timer, for stats
$curl_times = [
"get_langlist" => 0,
"get_fileinfo" => 0,
"get_filedetail" => 0,
"get_categories" => 0,
"get_prodfamilies" => 0,
"total_time" => 0,
"connect_time" => 0
];

//Get a list of categories and loop through them to get id, name and category for all available products
$productinfo = $system->loopcatpro();

//Just some variables
//Max 100 allowed
$pagesize = 100;
#$fileid = 0;
$totalresults = NULL;
$processed = NULL;
$json['Files'] = array();

//Looping through all products and put all fileinfo in an array for later use
foreach ($productinfo as $product => $j){
	$totpages = NULL;
	$pageindex = 0;
	//Basic query to get available languages
	$jsonlang = $system->curlie($url, FALSE, '{"Languages":"","Architectures":"","ProductFamilyIds":"","FileExtensions":"","MyProducts":false,"ProductFamilyId":'.$product.',"SearchTerm":"","Brand":"MSDN","PageIndex":0,"PageSize":1,"FileId":0}', "get_langlist");
	//Format all available languages
	//Comma-separated list of language codes, e.g. "en,ar,cn"
	$searchlang = implode(",", array_column($jsonlang['LanguageContext'], 'Value'));
	//Makes sure to get the info from all resulting pages
	while (($pageindex+1 <= $totpages) || ($totpages == NULL)){
		$getjson = $system->curlie($url, FALSE, '{"Languages":"'.$searchlang.'","Architectures":"","ProductFamilyIds":"","FileExtensions":"","MyProducts":false,"ProductFamilyId":'.$product.',"SearchTerm":"","Brand":"MSDN","PageIndex":'.$pageindex.',"PageSize":'.$pagesize.',"FileId":0}', "get_fileinfo");
		$totpages = ceil($getjson['TotalResults'] / $pagesize);
		if ($totpages <= 0){
			$totpages = 1;
		}
		//If all results are unique, increase page index next loop
		if(count(array_unique($getjson['Files'], SORT_REGULAR)) === count($getjson['Files'])){
			$pageindex = $pageindex+1;
			//Merging array for use some lines below
			$json['Files'] = array_merge($json['Files'], $getjson['Files']);
		}
		//Else, try it again
		else{
			trigger_error("Found " . count(array_unique($getjson['Files'], SORT_REGULAR)) . " uniques of " . count($getjson['Files']) . ", retrying");
			continue;
		}
	}
	//File-counter
	$totalresults = $totalresults+$getjson['TotalResults'];
}

//Check for duplicate releases, if found: Die
if(count(array_unique($json['Files'], SORT_REGULAR)) < count($json['Files']))
{
	print_r($json['Files']);
	die();
}

//Loop through the results (releases), file after file
foreach ($json['Files'] as $i => $j){
	//Counter
	$processed = $processed+1;
	//Unset variables from previous loop
	unset($id, $jsonf, $filename, $title, $notes, $hash, $size, $ext, $lang1, $lang2, $langnum, $date, $benefit, $wr, $product, $jsoncurl, $curl, $zipfiles);
	//Set the ID...
	$id = $json['Files'][$i]['FileId'];
	//Initiate a new Curl-session to fetch details about the file
	$jsoncurl = $system->curlie($urelf, TRUE, '{"fileId":"'.$id.'","brand":"MSDN"}', "get_filedetail");
	//Array decoded from json
	$jsonf = $jsoncurl['json'];
	//Raw json
	$curl = $jsoncurl['curl'];
	//Set $filename; non-breaking spaces ("0xC2 0xA0" in UTF-8) are converted to regular spaces (0x20)
	$filename = preg_replace('%\xc2\xa0%', ' ', trim($jsonf['FileName']));
	//Trim whitespace, remove multiple whitespaces and set as $title
	$title = preg_replace('/\\s+/', ' ', trim($jsonf['Description']));
	if (!empty ($jsonf['Notes'])){
		//Replace all linebreaks in html-code with whitespace, replace <br>, <ul>/</ul> and </li> with newline, decode htmlentities back to html and strip html
		$notes = strip_tags(html_entity_decode(preg_replace("%<br>|<br />|<br/>|</li>|</p>|<p.*?>|<ul.*?>|</ul>%", "\r\n", preg_replace("%\n|\r\n|\r\r\n%", ' ', $jsonf['Notes'])), ENT_QUOTES, 'UTF-8'));
		//Replaces the following with a single space (0x20): runs of horizontal whitespace, "0xC2 0xA0" (result of &nbsp; -> UTF-8)
		//Replaces the following with a single newline (\r\n): multiple newlines, whitespace followed by \r\n, \r\n followed by whitespace
		//Trims the string to remove whitespace in the start and end of the string. Adds linebreaks before and after the string
		$pattern = array('%\xc2\xa0%', '%[^\S\r\n]+%', '%[^\S\r\n]\r\n%', '%\r\n[^\S\r\n]%', '%[\r\n]+%');
		$replacement = array(' ', ' ',  "\r\n", "\r\n", "\r\n");
		$notes = "\r\n" . preg_replace($pattern, $replacement, trim($notes)) . "\r\n";
	}
	else{
		$notes = NULL;
	}
	//More variables
	$hash = $jsonf['Sha1Hash'];
	$size = $jsonf['Size'];
	$product = $jsonf['ProductFamilyId'];
	//File extension in capitals, e.g. "EXE" (pathinfo also handles 4-letter extensions like .vhdx)
	$ext = strtoupper(pathinfo($filename, PATHINFO_EXTENSION));
	//$date = gmdate("m/d/Y", substr(trim($jsonf['PostedDate'], "/Date(".")"), 0, 10));
	//Strip non-numbers from Release Date
	$date = trim($jsonf['PostedDate'], "/Date(".")");
	//Some older releases (like #2063) use 12 char timestamp instead of 13. If so, add a 0 to the beginning. Ex: 952053900000 -> 0952053900000
	if (strlen($date) < 13){
		$date = "0" . $date;
	}
	//The value is in milliseconds, so the first 10 digits are whole seconds. gmdate = force GMT
	$date = gmdate("m/d/Y", substr($date, 0, 10));
	//lang1 = what you see on the site before clicking "Details". More than one language = "Multiple Languages", otherwise just the single language. Ex: "Multiple Languages", "English"
	//lang2 = all included languages. Ex: "English", "English, Russian, Swedish"
	//Count number of languages used to set/format the languages-variables. Several or no languages -> "Multiple Languages"
	$langnum = count($jsonf['Languages'])-1;
	if ($langnum != 0){
		$lang1 = "Multiple Languages";
	}
	//Format language-output
	if (!empty($jsonf['Languages'])){
		$lang2 = NULL;
		//Loop through languages
		foreach ($jsonf['Languages'] as $i => $j){
			//If not at the last language, add a comma and whitespace to the end
			if ($i != $langnum){
				$lang2 = $lang2."$j, ";
			}
			//If there is only one language, set both lang1 and lang2 to that language
			else if ($i == 0 && $langnum == 0){
				$lang1 = $lang2 = $j;
			}
			//If more than one language and we're at the last one, just print the language without adding anything
			else {
				$lang2 = $lang2.$j;
			}
		}
		$lang2 = "\r\nLanguages: " . $lang2;
	}
	else{
		$lang2 = NULL;
	}

	//Loop through BenefitLevels to get $benefit. Needs trimming due to \t in #6215
	$benefit = NULL;
	foreach ($jsonf['BenefitLevels'] as $i => $j){
		$benefit = $benefit.trim($j)."\r\n";
	}
	
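	//Every 1000 releases, close and reopen the ZIP so pending files are written to disk
	if ($processed % 1000 === 0){
		//close() returns a bool; open() returns true or an integer error code, so compare strictly
		if ($zip->close() !== true) {
			trigger_error("\nUnable to close ZIP\n");
		}
		if ($zip->open('MSDN.zip') !== true) {
			trigger_error("\nUnable to open ZIP\n");
		}
	}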
	
	//Output - NFO w/ UTF-8 BOM
	$zipfiles['content']['nfo'] = "\xEF\xBB\xBFhttp://msdn.microsoft.com/en-us/subscriptions/downloads/#FileId=$id

$title
$ext | $lang1 | Release Date: $date | $size
$notes
File Name: $filename$lang2
SHA1: $hash

Available to these Subscription Levels:
$benefit";

	//Output - SHA1. Format is: "$hash *$relative_path_to_file"
	$zipfiles['content']['sha1'] = "$hash *$filename";
	//Prepare filepath and remove slashes in $title (system.php only sanitizes the product and category names, not $title)
	//Removed iconv("UTF-8", "ISO-8859-1//TRANSLIT", $title) due to errors with #15069
	$search = array("/", "..." );
	$replace = array("-", NULL);
	$zipfiles['path'] = "MSDN/" . $productinfo[$product]['category'] . "/" . $productinfo[$product]['title'] . "/" . str_replace($search, $replace, $title) . "/";

	//Removing colons in the filepath, sets variables for tozip-function
	$zipfiles['path'] = str_replace(":", " -", $zipfiles['path']);
	$zipfiles['path'] = str_replace("®", "(R)", $zipfiles['path']);
	$zipfiles['filename'] = $filename;
	$zipfiles['content']['json'] = $curl;
	//Send array to function to output files inside zip-archive
	$system->tozip($zipfiles);
}

//Write the last batch, then delete the placeholder folder from the ZIP
$zip->close();
$zip->open('MSDN.zip');
$zip->deleteName("SCRIPT IN PROGRESS/");
$zip->close();

//Debugging
$endtime = microtime(true);
$loadtime = round($endtime-$starttime, 2);
$other = $loadtime-$curl_times['total_time'];
trigger_error("Processed $processed files of $totalresults results in $loadtime seconds.\nA total of " .$curl_times['total_time']. " seconds was spent networking,\nof which " .$curl_times['connect_time']. " seconds were spent establishing connections to Microsoft.\n$other seconds were spent on other things.");
trigger_error(print_r($curl_times, true));

echo "</pre>
</body>
</html>"

?>
system.php
<?php

class system {
	//Curl
	function curlie($url, $retcurl, $postfields, $requestor){
		//Get timer
		global $curl_times;
		//Initiate Curl and set options
		$ch = curl_init($url);
		//Enable fetch-to-variable
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		//Connect timeout to 60 seconds
		curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 60);
		//Script timeout
		curl_setopt($ch, CURLOPT_TIMEOUT, 0);
		//Disable SSL-verification. Required
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
		//Accept any compression curl supports (CURLOPT_ENCODING sets Accept-Encoding, not the character set)
		curl_setopt($ch, CURLOPT_ENCODING, "");
		if (isset($postfields)){
			//Enable Post
			curl_setopt($ch, CURLOPT_POST, 1);		
			//Searchstring/Postcontent
			curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
		}
		else{
			$postfields = NULL;
		}
		//Request content as json/UTF-8
		curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json; charset=UTF-8'));

		//Will keep looping until Curl is successful with HTTP_CODE 200
		while(1) {
			//Executing curl
			if (($curl = curl_exec($ch)) == true){
				//Getting HTTP_CODE
				$httpcode = curl_getinfo($ch)["http_code"];
				if ($httpcode == 200){
					break;
				}
				else{
					trigger_error("HTTP_CODE: $httpcode <br />" . print_r(curl_getinfo($ch), true) . $postfields);
				}
			}
			//Print message if Curl failed
			else {
				trigger_error("Curl failed. ".curl_error($ch)." URL: $url Retcurl: $retcurl Postfields: $postfields <br />" . print_r(curl_getinfo($ch), true) . $postfields);
			}
		}

		//Add total_time and connect_time to the total
		unset($total_time);
		$total_time = curl_getinfo($ch)["total_time"];
		$curl_times["total_time"]=$curl_times["total_time"]+$total_time;
		$curl_times["connect_time"]=$curl_times["connect_time"]+curl_getinfo($ch)["connect_time"];
		$curl_times[$requestor]=$curl_times[$requestor]+$total_time;

		$json = json_decode($curl, true);

		//Debug
		$file = 'curl.log';
		file_put_contents($file, print_r(curl_getinfo($ch), true) . print_r($postfields, true) . "\n" . "\n", FILE_APPEND);
		
		//Close Curl-session
		curl_close($ch);
		//Return the decoded JSON, or both the decoded JSON and the raw response if $retcurl is true
		if (!$retcurl){
			return $json;
		}
		return array("json" => $json, "curl" => $curl);
		
		
	}

	//Get a list of categories and loop through them to get id, name and category for all available products
	function loopcatpro(){
		//Get categories
		$urlcat = "https://msdn.microsoft.com/en-us/subscriptions/json/GetProductCategories?brand=MSDN&localeCode=en-us";
		$jsoncat = $this->curlie($urlcat, FALSE, NULL, "get_categories");
		//Loop through all categories and put all products in an array with ID, name and category
		foreach ($jsoncat as $i => $j){
			//Does not make any difference, but skipping New Products as it's unnecessary. Trim because it's called " New Products", but that might change
			if (trim($jsoncat[$i]['Name']) == "New Products"){
				continue;
			}
			//Variables for request-URL
			$prodid = $jsoncat[$i]['ProductGroupId'];
			//Get contents of category
			$purl = "https://msdn.microsoft.com/en-us/subscriptions/json/GetProductFamiliesForCategory?brand=MSDN&categoryId=$prodid";
			$jsonprod = $this->curlie($purl, FALSE, NULL, "get_prodfamilies");
			//Loop through all products in current category
			foreach ($jsonprod as $k => $l){
					$productid = $jsonprod[$k]['ProductFamilyId'];
					//Convert from UTF-8 to ISO-8859-1 for filepath, and replace slashes with dashes
					$productinfo[$productid]['title'] = iconv("UTF-8", "ISO-8859-1//TRANSLIT", str_replace("/", "-", trim($jsonprod[$k]['Title'])));
					$productinfo[$productid]['category'] = iconv("UTF-8", "ISO-8859-1//TRANSLIT", str_replace("/", "-", trim($jsoncat[$i]['Name'])));
			}
		}
		return $productinfo;
	}
	
	//Add to zip-file
	function tozip($zipfiles){
		//Enable Zip-functions
		global $zip;
		//File path and filename
		$path = $zipfiles['path'];
		$filename = $zipfiles['filename'];
		//Write content to Zip
		foreach ($zipfiles['content'] as $i => $j){
			$zip->addFromString("$path$filename.$i", $j);
		}
	}
	
}
?>
