What I Learnt Last Week

Tuesday, January 17, 2006

SMH in single page

A little while ago smh.com.au (Sydney Morning Herald) changed their web format (as did Fairfax's other newspaper, theage.com.au. Now some articles are spread over multiple pages in a cynical move to increase their advertising revenue. Even pressing the 'print' button doesn't display the whole article - grr.

What I Did
It was time to learn just how Greasemonkey works and how I can fix smh to display the whole article on a single page.

Mark Pilgrim has made an excellent site at diveintogreasemonkey.org where I learnt everything I needed to know. I originally thought that I would have to parse a page, look for the link to the next page and use the GM_xmlhttprequest() method to fetch the page and then insert its contents into page 1. When I looked at the html of the page, I noticed that smh actually loads all pages in the first page, but makes them invisible by setting style.display="none" on the containing div element. This is done so that print will work. Print uses a different style sheet that sets style.display="inline". I thought for a second about just writing a new style sheet. I'm sure the Web Developer extension or many others would make it simple to switch a style sheet, but I want to write a greasemonkey script darn it!

The script is really simple. I created a file called fairfaxfix.user.js (all greasemonkey scripts have to have the .user.js extension). I then open the script with Firefox and select Tools > Install This User Script.

// ==UserScript==
// @name Fairfax Fix
// @namespace joelhockey.com
// @description Convert all Fairfax (Sydney Morning Herald, and The Age) articles to be single page and remove some ads
// @include *smh.com.au/*
// @include *theage.com.au/*
// ==/UserScript==

// Al pages exist, they are inside divs with style.display="none".
// This makes it really easy to make everything visible.

pages = 5
if (typeof totalpagespagination != "undefined") {
pages = totalpagespagination
}

for (i = 2; i <= pages; i++) {
e = document.getElementById('contentSwap' + i)
if(e) {
e.style.display = "inline"
}
}

// while we're doing stuff, get rid of some annoying ads
ads = ['adSpotIsland', 'AdPlaceholder-popunder']
for (i in ads) {
e = document.getElementById(ads[i])
if(e) {
e.parentNode.removeChild(e)
}
}


// get rid of pagecount as well, just to use some XPath
es = document.evaluate('//*[@class="pagecount"]',
document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null)

for (var i = 0; i < es.snapshotLength; i++) {
e = es.snapshotItem(i);
e.parentNode.removeChild(e)
}


What I Learnt
Greasemonkey. I guess I've now had an introduction to greasemonkey, and I like it. I'm sure I'll be using it to scratch any web UI itches I get in the future. I have added my SMH single page script to the main greasemonkey script repository at userscripts.org.

XPath. I've seen a bit of XPath, but I've never actually written any before. It was simple for the simple thing that I wanted to do which is a good sign. I should say that I'm not the world's greatest fan of XML. I works for hierarchical data that has to contain unicode, but I see it used in too many places where there are simpler solutions.

One More Thing: I want to get rid of the annoying pop-up that smh puts up on the first page you visit every time you start a new session. I haven't taken much time to look at the code, but if anyone knows how I could change to script to get rid of that, please let me know. I thought Firefox is meant to block pop-ups - how does smh get around this?

Update: I tracked down an iframe with id AdPlaceholder-popunder that loads a javascript page which is doing the pop-up. It is simple to remove this element. I also checked Firefox 1.5 and it is detecting the pop-up and stopping it. My old version isn't quite as smart. I guess I'll have to check if all the extensions I use work OK with version 1.5 and upgrade.

0 Comments:

Post a Comment

<< Home