Oct 29, 2006

Some interesting tweaks for Firefox

For when I get around to installing it:

(This is from the Slashdot article Firefox 2 Downloads Top 2 million in 24 hours.)

Annoyances

(Score:5, Informative)
by teslatug (543527) on Saturday October 28, @10:57AM (#16622640)
Here are some of the settings that I've gathered so far to get Firefox 2.0 to my liking:

In about:config
* browser.tabs.closeButtons to 3 for one close tab button
* browser.tabs.selectOwnerOnClose to false for successive reading and closing
* browser.tabs.tabMinWidth to 20 for displaying tab scrolling in extreme cases only
* browser.urlbar.hideGoButton to true, no use for the Go button
* dom.disable_window_* prefs to true, to fix various window annoyances
* network.prefetch-next to false for not wasting my bandwidth

In userChrome.css, for disabling the List all tabs button, which annoys me when using the close button:
/* Disable Container box for "List all Tabs" Button */
.tabs-alltabs-stack {
  display: none !important;
}

Feel free to add your own to the thread.

Oct 27, 2006

LaTeX document fidelity

Here's something I've been meaning to talk about. Came across this comment in an MSDN blog yesterday. The blogger, Rick Schaut, is a software design engineer for Microsoft Word on the Mac. He makes the following claim:
"To relate this back to the equation editing problem, the problem with any TeX-based way is that you won't get identical layout from one platform to another. TeX (whichever flavor you're talking about) is designed to maximize the quality of the output for each given platform, but that sacrifices some aspects of layout compatibility.

"Suppose, for example, you have two people working on a paper. One uses a Windows computer, the other uses a Macintosh. Should that paper include a rather lengthy equation, there's a good chance that the equation might fit on one line when the document is opened on the Mac yet not fit on one line when the document is opened on the Windows computer."
Here, Mr Schaut, you are happily wrong :-) TeX and LaTeX cleverly outsource the work of device- and operating system-independent document rendering; they merely specify the document's internal structure. Once a TeX processing system has turned the document into a PostScript or PDF file, you can merrily distribute it anywhere you want with full assurance that the rendering will be inviolate, down to the last full stop on the last line of each paragraph.

And of course, unless I'm missing something here, any plain LaTeX source file will compile to the exact same PDF file whether it's on Windows or the Mac, provided both machines have the same fonts and package versions installed; the processing that TeX applies to the source file is the same regardless of operating system.

Of course, there is the entirely separate issue of distributing LaTeX source files or DVI files of your documents; but given all the potential incompatibilities you can face there, why even bother to distribute anything other than PDFs? LaTeX + PDF is the only sane way to go.

Oct 25, 2006

Temptation, thy name is Internet Explorer 7

OK, I know I said I'd upgrade to IE7 later, but I seriously couldn't resist the temptation -- especially since I have nothing to lose by upgrading now; IE is not my default browser and I'm not using anything that might not be ready for the upgrade. So how am I finding it?

Internet Explorer 7 feels much better than 6. Tab support is awesome, and rather faster than it was before, although still not as fast as Firefox's. And the keyboard shortcuts for switching among tabs should definitely be simpler than Ctrl-Tab and Ctrl-Shift-Tab -- they should really be Ctrl-PgDown and Ctrl-PgUp like they are in Firefox. That said, I've just discovered the Quick Tabs feature (shortcut: Ctrl-Q), which may just be the best tab-related feature I've seen yet in a browser. Press the key and IE shows a tab containing thumbnail previews of each of your tabs, and you can go to the tab you want with a single click. This is amazingly fast, and really cool.

Everything else they've done, including getting rid of the menu bar and putting the address bar right below the title bar along with the navigation buttons, I applaud because it just increases the screen viewing space in the browser. Who uses the menus in their browser all that much, anyway? And if you need the menu bar, you can just right-click on any toolbar and turn it on. Or just press the Alt key to turn it on temporarily.

IE still doesn't have an extension manager as convenient as the one in Firefox, but that's OK, I've invested a lot in Firefox, including future plans with Greasemonkey; and I wouldn't really switch back to IE even if it had awesome extensions. (Well, maybe if they had a Greasemonkey-compatible extension....)

So to conclude? IE7 feels like a hit. Definitely go for it. But as for Firefox 2? I've downloaded the installer, but I'll still wait out the couple of weeks to get it through Mozilla's updating channels once they've made sure all the best extensions are compatible with it.

Eid mubarak

Eid day yesterday was great. I think Bangalis really know how to turn in a good Eid party, if they're in the mood. Some friends got hold of a Bangladeshi biryani chef, who cooked us a big pot of awesome chicken biryani. We got together at their place and had a good lunch, and a nice break from the pressure of the upcoming exams. I know there's a risk of all these breaks from the pressure turning into a permanent break from studies -- but hopefully we'll keep things under control here.

Firefox 2 & Internet Explorer 7

Firefox 2 is coming out in a few hours, but I'm really very satisfied with 1.5. I read though that 1.5 will soon have a minor update that paves the way for an automatic update to version 2, in a couple of weeks. I'm taking that upgrade strategy, because I'm a little worried that my current extensions, especially the very useful SessionSaver extension, will need some time to adapt properly to FF2. So hopefully in a couple of weeks that will be the case. But I'll probably wait a wee bit more even then while checking up on their status, i.e. what people have to say about them.


Not to sound nostalgic, but it seems like just yesterday that I downloaded the all-new Internet Explorer 5 over my slow dial-up line in Sharjah and tried it out, after reading a glowing review full of lavish screenshots in the Emirates' Windows User Magazine. And it was an awesome browser, the best of the best after using Netscape's weird-looking offering. But yeah, anyway ... IE7 is out now, and I know I'm going to upgrade. But similarly to Firefox, I'll wait till Microsoft pushes IE7 out to us through its Windows Update facility -- it gives everyone some breathing room and just feels right.

I won't really be using IE7 -- not with FF2 probably installed by then -- but obviously, it will be necessary to have it because of the bugfixes and interesting new features to try out.

It strikes me now that in about a couple of weeks, I'll be flying back to Dhaka, where my laptop will be totally cut off from internet access -- our desktop PC in Dhaka is the only machine which will be connected (hopefully, anyway). So it might be more than a month before any software gets updated on the laptop. That's fine by me, I guess. I'll still be able to try out FF2 on the home desktop. Hm, might download it now and take it to Dhaka to install, to save the download hassle when I get there.

Oct 20, 2006

Bayesian multi-category classification in Gmail?

What this would do is something like what POPFile does for email clients like Thunderbird or Outlook or whatever: automatically categorise incoming emails based on keywords they contain. POPFile is trainable and it's supposed to reach a pretty high accuracy after a couple of weeks. POPFile doesn't actually put your emails into different folders in your email program; it just marks them as belonging to one category or other (say `Work', `Family', `Junk', `Stamp collecting', etc.). Then you set up your email program so that it puts these emails into different folders, or deletes them, or forwards them, or whatever.

What POPFile does

OK first of all, you're asking why would you want some software like POPFile to classify emails for you when you can just set up filters in your email program to do that based on who they're from, what the subject is, and so on? The reason is your email client's filters are static: they do not learn about new family members who are sending you email, nor about new correspondents from your workplace, nor about new junk mailers you have to deal with all the time. In fact, programs like Thunderbird already have Bayesian filtering to deal with this problem of continually-changing junk mails -- it's just that POPFile goes one step further, to try to identify your mails as belonging to arbitrary categories that you set up.

The idea is, once you've set up POPFile to recognise email from your family, from your work, and from your stamp collecting buddies, it will correctly identify these different types of emails, say, about 99.99% of the time. The rest of the time, which is presumably a piffling amount of time, you'll be telling POPFile something like `no, this isn't junk, it's just my little brother, mark it as ``Family'' '. And POPFile will continue to learn, using the Bayesian statistical analysis.

So the end result is, you can set up your email programs to put email marked `Work' in the right folder, and so on, without having to worry about updating your filters all the time.
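To make the train-and-correct idea concrete, here is a minimal sketch of a multi-category naive Bayes classifier in Python. It is not POPFile's actual code (which is written in Perl); the class name, the whitespace tokenizer, the Laplace smoothing and the sample messages are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Toy multi-category naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.message_counts = Counter()          # category -> messages seen

    @staticmethod
    def tokenize(text):
        # Assumption: plain lowercase whitespace splitting is good enough here.
        return text.lower().split()

    def train(self, text, category):
        """Record one message as belonging to the given category."""
        self.message_counts[category] += 1
        self.word_counts[category].update(self.tokenize(text))

    def classify(self, text):
        """Return the most probable category, or None if untrained."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())
        best_category, best_score = None, -math.inf
        for category, seen in self.message_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior: how common this category has been so far.
            score = math.log(seen / total_messages)
            for word in self.tokenize(text):
                # Laplace (add-one) smoothing so unseen words never give log(0).
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


classifier = NaiveBayesClassifier()
classifier.train("quarterly report meeting deadline", "Work")
classifier.train("mum says dinner on sunday", "Family")
classifier.train("rare stamp auction catalogue", "Stamp collecting")

guess = classifier.classify("dinner with mum")
print(guess)

# Correcting a wrong guess is just more training on the right category.
classifier.train("little brother wants a lift on sunday", "Family")
```

The point of the design is that a correction costs nothing more than another call to train, which is why such a classifier keeps adapting to new correspondents without anyone editing filter rules.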

Now here's my question: why should email program users get these benefits exclusively? Why can't we have something like this for webmail users? Specifically, Gmail users (like me, and it seems half the world nowadays)? Maybe we can. It boils down to three things: Gmail's JavaScript functions, POPFile's statistical categorisation methods, and Firefox's Greasemonkey extension.

What Greasemonkey does

Basically, Greasemonkey allows you to customise Web pages in Firefox in almost unlimited ways with a little JavaScript programming, using that page's Document Object Model and any JavaScript functions defined in it. Check out http://persistent.info/archives/2005/03/01/gmail-searches for an idea about just how powerful Greasemonkey is, and what it can do to Gmail.

I've tried out the above hack, and it actually does work, with a few hiccups. Furthermore, I've tried programming Greasemonkey scripts myself and I can tell you it's a really powerful way of customising websites which you love to make them even more useful. There are actually a ton of scripts people have written out there, and the best place to get them is userscripts.org. Check it out.

POPFile for Gmail?

OK, now we know we can extend Gmail's functionality in amazing ways with Greasemonkey hacks like the one above. Essentially, Greasemonkey is giving us the means to program a user interface for the new Bayesian classification features we want in Gmail. Greasemonkey scripts are written in JavaScript. Now I'm pretty sure the `business logic' of POPFile, which is currently written in Perl, can be ported to JavaScript without too much trouble. The end result: an interface in Gmail that tags incoming messages and quickly allows you to check for and correct mistakes, training it, and bringing the convenience of automatic Bayesian classification to Gmail. Anybody up for it?

Oct 19, 2006

The Trojan War was never this good

Read Dan Simmons' Ilium and Olympos a couple of months ago, but haven't gotten round to talking about them till now. First of all, it's true that they're actually one book published as two, probably because if they were published in one piece nobody would buy a book that fat, and sales would be half as much as they were with two books instead of one.

Second, the book is not about the gods and Trojans and Greeks of the recreated Trojan war battlefield of far-future Mars; it's really about the future of humanity and what shape it might take. Simmons draws from a lot of literary sources, primarily Shakespeare (The Tempest) but also Proust (stuff I'm not familiar with) and Wells (i.e. his Time Machine Eloi and Morlocks ideas).

The thing is, the story starts off with the scholic Thomas Hockenberry telling of the recreated war, and it's immediately gripping, especially to a guy like me who grew up reading his sci-fi on one hand and Greek/Norse/Egyptian mythology on the other. It's gripping for all the reasons the original mythologies are gripping -- the heroes and their stories are larger than life, etc. But the Trojan War storyline intercuts with that of the humans on Earth and the Moravecs on Jupiter, which takes the wind out of it somewhat, because you have all these new characters you didn't know before that you have to deal with, and you just want to get back to reading what Achilles did next.

Achilles by the way is the most interesting character in the story and Simmons lavishes him with detailed description, enough to satisfy any geek. Achilles the man-killer, Achilles the god-killer, Achilles the fleet-footed, Achilles this, and Achilles that. For some reason I kept imagining Brad Pitt as Achilles throughout the story, and it fit, right to the end. (But Eric Bana as Hector didn't -- Hector needs a stronger jawline, and a taller, more muscular figure).

The stories do converge, but they approach convergence from different points, and there's a lot of suspense. I won't bother with a detailed analysis of the thing here, but it's definitely enjoyable. I do want to talk about some of Simmons' ideas for the future of humanity though. Humans ten thousand years in the future are a sad, childlike lot, with every need catered to by robot servants; they don't know how to read because they don't need to, and they spend most of their time partying and pursuing other pleasures. Sounds perfect, but there's no intellectual stuff, no advanced thought. Simmons has characters in the books disparagingly refer to them as `post-literate'. Ouch.

But these Eloi do have an interesting feature: they have been genetically modified to contain a hundred cybernetic functions, like a map/locator function that projects holographic images of the person being located; body status query functions; and advanced stuff like infonet access, the infonet being a semi-conscious web of information evolved from the internet which now blankets the planet. This infonet is extremely powerful -- it contains a huge amount of data, like information about every molecule in every cell of a tree the infonet user might be looking at. It's described as being totally overwhelming. You see the information, but you don't understand most of the knowledge contained in it. Oh, and you activate these functions by visualising combinations of coloured geometric shapes in your mind's eye. At least, until you can do it without thinking.

The `old-style', Earth-human protagonists introduced have a destiny to fulfill -- to recover the ability to use these advanced functions and recover the technological knowledge lost to the human race. But that's about it. There is some stuff about recovering some ten thousand humans encoded in a tachyon beam orbiting the Earth, but that's just another problem in the myriad collection of problems and mysteries the humans are faced with.

The infonet plays a large part in the book, actually -- combined with some really wild interpretations of quantum theory and post-human technology. It's a good read, but I still think the Trojan War part of the story should have been a different story altogether -- or rather, the story of the old-style humans on Earth should have been a different story, say The Final Fax. The Trojan War parts of the books would have made a kick-ass movie -- especially Achilles' visit to the pit of Tartarus in Hades, in the presence of the original Greek gods, the Titans, imprisoned there by Zeus.

Oct 17, 2006

If only they used this instead of E-Views

(Interesting note: just found out that the Cochrane behind the famous Cochrane-Orcutt method was at Monash, http://www.buseco.monash.edu.au/depts/ebs/.)

We started doing the basics of econometrics -- things like regression and ANOVA -- in Excel last year, but moved to E-Views this semester to do more advanced stuff like time-series analysis. That's too bad, because there's a much better program we can use: R (http://www.r-project.org/). The main reason is it's free -- we can download and use it at home, so we don't have to depend on the computer labs being open and free to get our assignments done. Here's a good article that talks about why R is great: http://jackman.stanford.edu/papers/download.php?i=22.

And yes, I know R is mostly command line and teaching it at Monash would take up too much of our time, taking our focus away from the econometrics theory. But R can be customised and tailored to the Monash courses with a little effort; and it has a Tcl/Tk widget set built-in which can be used to implement graphical versions of the stuff they teach us using E-Views -- things like restricted model F tests (Wald tests), AR(1) estimation, weighted OLS estimation, things like that.

That said, I'm still learning R and it's sometimes been frustrating to try matching my results on time series data to what my textbook, Wooldridge, says I should get. Things like ARMA(p, q) estimation seem to be built in to non-obvious places like the gls function in the nlme package. But it works, for the most part. Using Excel after R -- especially R's matrix handling -- feels like going backwards now.

Sep 13, 2006

Uptime on Windows

Here's a VBScript script which runs in the Windows Scripting Host and shows you how long your computer has been running:


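' Show how long this computer has been running, using WMI.
Option Explicit
Dim strComputer, objWMIService, colOses, objOs
Dim diffMin, diffDays, diffHours, uptimeStr

strComputer = "."
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")

Set colOses = objWMIService.ExecQuery("SELECT LastBootUpTime FROM Win32_OperatingSystem")
For Each objOs In colOses
    uptimeStr = ""

    ' Whole minutes between the last boot and now
    diffMin = DateDiff("n", wmiDateStringToDate(objOs.LastBootUpTime), Now)

    ' Peel off whole days, then whole hours; what is left is minutes
    diffDays = Fix(diffMin / (60 * 24))
    diffMin = diffMin - diffDays * 24 * 60
    If diffDays >= 1 Then
        uptimeStr = uptimeStr & CStr(diffDays) & "d "
    End If
    diffHours = Fix(diffMin / 60)
    diffMin = diffMin - diffHours * 60
    If diffHours >= 1 Then
        uptimeStr = uptimeStr & CStr(diffHours) & "h "
    End If
    ' Also print minutes when nothing else was printed (uptime under a minute)
    If diffMin >= 1 Or uptimeStr = "" Then
        uptimeStr = uptimeStr & CStr(diffMin) & "min"
    End If

    WScript.Echo "Uptime: " & Trim(uptimeStr)
Next

' WMI dates look like yyyymmddHHMMSS.mmmmmm+UUU. Build the date from its
' numeric parts so the result does not depend on the regional date format.
Function wmiDateStringToDate(dtmDate)
    wmiDateStringToDate = _
        DateSerial(CInt(Left(dtmDate, 4)), CInt(Mid(dtmDate, 5, 2)), CInt(Mid(dtmDate, 7, 2))) + _
        TimeSerial(CInt(Mid(dtmDate, 9, 2)), CInt(Mid(dtmDate, 11, 2)), CInt(Mid(dtmDate, 13, 2)))
End Function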


Save it as a VBS file and try running it. Because it runs in the Windows Scripting Host, the uptime script can (generally) be run just by double-clicking on the file in Windows. If that doesn't work, somehow Windows' connection between the VBS file format and the WScript.exe program has been severed, and you'll have to run WScript.exe with the script name as an argument.
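For anyone who wants to check the arithmetic without a Windows machine, here is a rough Python sketch of the same two steps: turning a WMI-style date string into a date, and breaking the elapsed minutes into days, hours and minutes. The function names and the sample boot time are made up for illustration, and the sketch does not query WMI at all.

```python
from datetime import datetime


def wmi_date_to_datetime(wmi_date):
    """Parse the leading yyyymmddHHMMSS part of a WMI date string.

    The fractional seconds and UTC offset that follow are ignored,
    just as the VBScript helper ignores them.
    """
    return datetime.strptime(wmi_date[:14], "%Y%m%d%H%M%S")


def format_uptime(boot_time, now):
    """Format the time between boot_time and now as e.g. '2d 2h 15min'."""
    total_minutes = int((now - boot_time).total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    # Always show minutes when nothing else would be shown.
    if minutes or not parts:
        parts.append(f"{minutes}min")
    return " ".join(parts)


# Hypothetical boot time and "current" time, chosen only for the example.
boot = wmi_date_to_datetime("20060911083000.500000+600")
print("Uptime:", format_uptime(boot, datetime(2006, 9, 13, 10, 45, 0)))
```

Using divmod for the days and hours steps does the same job as the Fix-and-subtract pairs in the script.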

The script is basically a product of my current fascination with the Windows scripting environment. There's a lot of documentation available, especially the Microsoft Windows 2000 Scripting Guide, which has been the most useful to me in understanding Windows' built-in scripting architecture.

Other ideas on what to do with this tool are kind of floating around in my head right now: using it to automatically download and tabulate exchange rates from Yahoo, then analysing the data with Excel; recording system uptime and usage statistics like how often and how long I use the computer; creating a script to quickly log in to Windows (Live?) Messenger and send a message to someone; rewriting the sparklines document in straight VBScript to run in the Windows Scripting Host environment, instead of having to open up the sparklines.doc document every time I want to create some sparklines.

All pretty cool ideas, at least from my point of view. And beyond them I might even look into accessing the Windows common controls and trying to create real graphical programs using just WSH. But that's in the far future.

May 2, 2006

Live word count script for OpenOffice.org

UPDATE 9 Sep 2010: Did something I've been meaning to do for a while and wrote up an awesome wiki intro for the Live Word Count script in its new BitBucket home: http://bitbucket.org/yawaramin/oo.o-live-word-count/wiki/Home. As a consequence, I'm removing all the duplicate installation and usage instructions from this page. Please check out the BitBucket wiki--that's where all the action is!

UPDATE 17 Mar 2010: Moved script to new home at BitBucket, in case I need to make any further changes/improvements. Small fix to make sure script works both when started from the Macro Selector dialog box and from a toolbar button. Oh yeah, to add a toolbar button to start the macro, see instructions below.

UPDATE 13 Mar 2010: Slight change to wordCount macro to handle being started from a toolbar button.

UPDATE 2 Dec 2009: Confirmed that the script works with OpenOffice.org 3.1.1 on Mac OS X 10.6 (Snow Leopard). As usual, see below for where to put the script in a Mac.

Also, I didn't realise this until now, but I've been cited in Linux Pro Magazine! Yay! :-)

UPDATE 30 May 2009: Just tried the script out again with OpenOffice.org 3.1.0 on Windows Vista; works fine. Please see the paragraph after next for the right place to put this script in Windows.

HERE'S something I worked on a long time ago but am finding very useful, a script or macro which displays a dialog box with a continuously-updating document (or selection) word count.

[...]

Feb 6, 2006

Styling Office XML Documents

This post has been due for several days now. Been doing more research into Office 2003's XML file formats. The primary port of call for all budding Office 2003 XML developers is Office 2003 XML Reference Schemas. This is where you can download the schemas -- the formal descriptions -- and the explanatory documentation on the XML document formats for Word, Excel and others. Another important link is to the page for O'Reilly's new book, Office 2003 XML. There is a download for a sample chapter, Chapter 2: The WordprocessingML Vocabulary. Obviously these are very important references for someone who is just entering the field.

When I posted my last entry, I had already created the style file that tells Word how to display the raw account listing. I just wanted to play around with it a little bit, especially to see if I could get the table formatting right. The formatting as it currently is, is OK; but I wanted to customise it a little bit.

By now I've realised that mastery of tables in WordprocessingML will take some time and (at least) a couple of good references (see links above). So I'll just go ahead with the original plan.

Before I list the actual XSL transformations file that does the magic, I want to actually show its results, to get some oohs and aahs from the audience. Here they are:



[Screenshot: alist] The account listing as shown by Word when Word has no way of knowing how else to show it.

[Screenshot: alist_transformed] The account listing with an XSL transformation applied by Word. That is, when the XSL file tells Word how to display it.



OK, here is the XSL style file:


<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:al="http://yawar.blogspot.com">
<xsl:template match="/">
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve">
<o:DocumentProperties>
<o:Title>Account Listing</o:Title>
<o:Author>Yawar Amin</o:Author>
</o:DocumentProperties>
<w:fonts>
<w:defaultFonts w:ascii="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"/>
</w:fonts>
<w:styles>
<w:style w:type="paragraph" w:default="on" w:styleId="Normal">
<w:name w:val="Normal"/>
<w:rPr>
<wx:font wx:val="Times New Roman"/>
<w:sz w:val="24"/>
<w:sz-cs w:val="24"/>
<w:lang w:val="EN-GB" w:fareast="EN-US" w:bidi="AR-SA"/>
</w:rPr>
</w:style>
<w:style w:type="paragraph" w:styleId="Heading1">
<w:name w:val="heading 1"/>
<wx:uiName wx:val="Heading 1"/>
<w:basedOn w:val="Normal"/>
<w:next w:val="Normal"/>
<w:rsid w:val="00B04D4D"/>
<w:pPr>
<w:pStyle w:val="Heading1"/>
<w:keepNext/>
<w:pBdr>
<w:top w:val="dotted" w:sz="4" wx:bdrwidth="10" w:space="1" w:color="auto"/>
</w:pBdr>
<w:spacing w:before="240" w:after="60"/>
<w:jc w:val="center"/>
<w:outlineLvl w:val="0"/>
</w:pPr>
<w:rPr>
<wx:font wx:val="Times New Roman"/>
<w:b/>
<w:b-cs/>
<w:kern w:val="32"/>
<w:sz w:val="48"/><w:sz-cs w:val="48"/>
</w:rPr>
</w:style>
<w:style w:type="table" w:styleId="MyTableContemporary">
<w:name w:val="My Table Contemporary"/>
<w:basedOn w:val="TableNormal"/>
<w:rPr>
<wx:font wx:val="Times New Roman"/>
</w:rPr>
<w:tblPr>
<w:tblInd w:w="0" w:type="dxa"/>
<w:tblBorders>
<w:insideH w:val="single" w:sz="18" wx:bdrwidth="45" w:space="0" w:color="FFFFFF"/>
<w:insideV w:val="single" w:sz="18" wx:bdrwidth="45" w:space="0" w:color="FFFFFF"/>
</w:tblBorders>
<w:tblCellMar>
<w:top w:w="0" w:type="dxa"/>
<w:left w:w="108" w:type="dxa"/>
<w:bottom w:w="0" w:type="dxa"/>
<w:right w:w="108" w:type="dxa"/>
</w:tblCellMar>
</w:tblPr>
<w:tblStylePr w:type="firstRow">
<w:rPr>
<w:b/>
<w:b-cs/>
<w:color w:val="auto"/>
</w:rPr>
<w:tblPr/>
<w:tcPr>
<w:tcBorders>
<w:tl2br w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/>
<w:tr2bl w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/>
</w:tcBorders>
<w:shd w:val="pct-20" w:color="000000" w:fill="FFFFFF" wx:bgcolor="F2F2F2"/>
</w:tcPr>
</w:tblStylePr>
<w:tblStylePr w:type="band1Horz">
<w:rPr>
<w:color w:val="auto"/>
</w:rPr>
<w:tblPr/>
<w:tcPr>
<w:tcBorders>
<w:tl2br w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/>
<w:tr2bl w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/>
</w:tcBorders>
<w:shd w:val="pct-5" w:color="000000" w:fill="FFFFFF" wx:bgcolor="FFFFFF"/>
</w:tcPr>
</w:tblStylePr>
<w:tblStylePr w:type="band2Horz">
<w:rPr>
<w:color w:val="auto"/>
</w:rPr>
<w:tblPr/>
<w:tcPr>
<w:tcBorders>
<w:tl2br w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/>
<w:tr2bl w:val="none" w:sz="0" wx:bdrwidth="0" w:space="0" w:color="auto"/>
</w:tcBorders>
<w:shd w:val="pct-20" w:color="000000" w:fill="FFFFFF" wx:bgcolor="F2F2F2"/>
</w:tcPr>
</w:tblStylePr>
</w:style>
</w:styles>
<w:docPr>
<w:view w:val="print"/>
<w:zoom w:percent="100"/>
<w:doNotEmbedSystemFonts/>
<w:validateAgainstSchema/>
<w:saveInvalidXML w:val="off"/>
<w:ignoreMixedContent w:val="off"/>
<w:alwaysShowPlaceholderText w:val="off"/>
</w:docPr>
<w:body>
<wx:sect>
<w:sectPr>
<w:pgSz w:w="11909" w:h="16834" w:orient="portrait" w:code="9"/>
</w:sectPr>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/>
</w:pPr>
<w:r>
<w:t>ACCOUNT LISTING</w:t>
</w:r>
</w:p>
<w:p></w:p>
<w:tbl>
<w:tblPr>
<w:tblStyle w:val="MyTableContemporary"/>
<w:tblW w:w="5000" w:type="pct"/>
<w:tblLook w:val="01E0"/>
</w:tblPr>
<w:tblGrid>
<w:gridCol w:w="2832"/>
<w:gridCol w:w="3238"/>
<w:gridCol w:w="2089"/>
<w:gridCol w:w="3061"/>
</w:tblGrid>
<w:tr>
<w:tc>
<w:p>
<w:pPr>
<w:jc w:val="right"/>
</w:pPr>
<w:r>
<w:t>Account ID</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t>Holder Name</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:pPr>
<w:jc w:val="right"/>
</w:pPr>
<w:r>
<w:t>Balance</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t>Debit/Credit</w:t>
</w:r>
</w:p>
</w:tc>
</w:tr>
<xsl:for-each select="al:accountlist/al:account">
<xsl:sort select="al:holdername"/>
<w:tr>
<w:tc>
<w:p>
<w:pPr>
<w:jc w:val="right"/>
</w:pPr>
<w:r>
<w:t><xsl:value-of select="al:accid"/></w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t><xsl:value-of select="al:holdername"/></w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:pPr>
<w:jc w:val="right"/>
</w:pPr>
<w:r>
<w:t><xsl:value-of select="al:balance"/></w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t><xsl:value-of select="al:drcr"/></w:t>
</w:r>
</w:p>
</w:tc>
</w:tr>
</xsl:for-each>
</w:tbl>
</wx:sect>
</w:body>
</w:wordDocument>
</xsl:template>
</xsl:stylesheet>


Yeah, whew! That was intense. Mostly though, it was the Word XML markup, which I won't even try to explain now. But for more on WordprocessingML, please check out the latest article at Brian Jones' blog. It's got an excellent mid-level overview.

I've made the XSL instructions bold so you can pick them out clearly and marvel at how few of them there are. (By the way, learned the XSLT at the W3Schools' XSLT Tutorial.) Basically, they say the same thing I described towards the end of my last entry.
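For readers who don't speak XSLT, the handful of XSL instructions in the listing amount to "loop over the accounts, sorted by holder name, and emit one table row per account". Here is a rough Python equivalent of just that logic, using the standard library's ElementTree. The element names and namespace come from the stylesheet above; the function name and the exact shape of the sample document are assumptions for illustration, and no WordprocessingML is produced.

```python
import xml.etree.ElementTree as ET

# Namespace used by the al: prefix in the stylesheet.
NS = {"al": "http://yawar.blogspot.com"}

# Assumed sample input, deliberately not in alphabetical order.
SAMPLE = """<al:accountlist xmlns:al="http://yawar.blogspot.com">
  <al:account>
    <al:accid>2345678901</al:accid>
    <al:holdername>Ms Y</al:holdername>
    <al:balance>96000</al:balance>
    <al:drcr>Cr</al:drcr>
  </al:account>
  <al:account>
    <al:accid>3456789012</al:accid>
    <al:holdername>Dr Z</al:holdername>
    <al:balance>45009.87</al:balance>
    <al:drcr>Dr</al:drcr>
  </al:account>
  <al:account>
    <al:accid>1234567890</al:accid>
    <al:holdername>Mr X</al:holdername>
    <al:balance>100000</al:balance>
    <al:drcr>Cr</al:drcr>
  </al:account>
</al:accountlist>"""


def account_rows(xml_text):
    """Return one (accid, holdername, balance, drcr) tuple per account."""
    root = ET.fromstring(xml_text)
    # Counterpart of <xsl:for-each select="al:accountlist/al:account">
    accounts = root.findall("al:account", NS)
    # Counterpart of <xsl:sort select="al:holdername"/>
    accounts.sort(key=lambda a: a.findtext("al:holdername", default="", namespaces=NS))
    fields = ("accid", "holdername", "balance", "drcr")
    # Counterpart of the four <xsl:value-of> instructions
    return [
        tuple(a.findtext(f"al:{name}", default="", namespaces=NS) for name in fields)
        for a in accounts
    ]


rows = account_rows(SAMPLE)
for row in rows:
    print(" | ".join(row))
```

Everything else in the stylesheet is literal Word markup that gets copied to the output around these values.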

Sorry about the underlines in the listing -- have been exploring off-the-top-of-my-head ways to best show listings in HTML, and this is the best compromise I've been able to find between source code editability and web page readability. Will update later if I find anything better. Leave comments with any ideas you might have. Update: well, got rid of the underlines with some cool new hacks I didn't know about before. Check out the rule for the <pre> tag in my stylesheet.


So what has this exercise accomplished? We see that Word has become, as a result of customers' demands on it, a full-fledged XML transformation and validation engine. With this power, businesses have an amazing new ability to juggle information around, push it into and pull it out of Office documents, change it, and just generally go crazy with it.

I know that the trend in business lately (or rather, forever in our age) has been to just throw more technology at every new problem that comes up. Am I complaining? No way. Bring it on!

Feb 1, 2006

Sparklines, internship ends, MS Office XML documents

In all the furore over my new and continually-evolving design I've been neglectful of my Sparklines code. Well, the good news is I've been working on it so intensively that the current version of sparklines.doc is so much more functional than the code I've posted here that I've seriously been thinking about deleting the code from the last two sparklines entries. But hell, it's amusing to look at.

The bad news is I've been so busy with even more exciting stuff for the last few days -- like my blog and conversion of the bank's daily reports into parseable XML form -- that the work I'm doing on sparklines.doc has slowed to a crawl. BUT to be fair, it meets my needs fully.

I'll take a brief interlude here and talk about some of the stuff I tried to do during my internship at ONE Bank, Dhanmondi branch. It'll lead up directly to why I'm so hyped-up about MS Office's new XML file format -- and this is weird, because just a few days ago I'd have told you OpenOffice.org's XML file format is better than Microsoft's. Now I'm strongly inclined to say otherwise.

At the bank, as I (think I) mentioned in Sparklines: can't resist, they have a lot of computer-generated output put in their hard drives daily. I guess their database-querying and -reporting software is tasked to process the day's transactions and output reports on the states of the various accounts, clients and such, every night. Now these reports are in plain-text format and currently the people in my branch, whenever they need to look up some information, just open up the report files in Wordpad and do a search for it.

This simple searching of plain-text files is well and good for small-scale information needs like looking up the account number of an account holder who can't recall the number, finding the interest rates offered on different types of deposits and loans, and sometimes also finding out historical interest rates. But it quickly starts sucking up your time if you have to keep doing things like:

  • prepare a monthly report on deposit mobilisation -- that is, a tally of the people who opened and closed deposit accounts, along with their account balances, and total amount of money deposited and withdrawn thus;

  • prepare reports with tallies of amounts grouped by type (deposit/loan), interest rate, and then economic sector code, as required by Bangladesh Bank;

  • prepare credit risk grading reports;

  • create mass-mailings to send out to account-holders and prospective clients;

  • email daily lists of transactions to the companies with which the bank has bill-collection arrangements;

  • and many more types of documents that the employees of each branch routinely have to prepare.


The common theme running through all of these different tasks is: the user has to process information output from the central database(s) in different ways and create documents showing these data in a nicely formatted way. And this has to be done month after month, with a lot of the document staying basically the same -- the changing data being the newly-processed information.

To me, the processes above are screaming to be automated. And this is where Office's new XML file formats come in. From what I've read about Office's (2003 and above) capabilities in Brian Jones' blog, Word lets you define arbitrary arrangements for your data and then lets you tell it how to format and display the data. This is done through the magic of XML schemas and stylesheets. For details, check out the article. But in short, suppose you start out with some raw data you're working on, information about some accounts:









Account ID   Holder Name   Balance    Debit/Credit
1234567890   Mr X          100000     Cr
2345678901   Ms Y          96000      Cr
3456789012   Dr Z          45009.87   Dr

You have this data in XML format, obviously ideal because of its parseability to both humans and computers. Say, this is your XML:

<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<al:accountlist xmlns:al="http://yawar.blogspot.com">
  <al:account>
    <al:accid>1234567890</al:accid>
    <al:holdername>Mr X</al:holdername>
    <al:balance>100000</al:balance>
    <al:drcr>Cr</al:drcr>
  </al:account>
  <al:account>
    <al:accid>2345678901</al:accid>
    <al:holdername>Ms Y</al:holdername>
    <al:balance>96000</al:balance>
    <al:drcr>Cr</al:drcr>
  </al:account>
  <al:account>
    <al:accid>3456789012</al:accid>
    <al:holdername>Dr Z</al:holdername>
    <al:balance>45009.87</al:balance>
    <al:drcr>Dr</al:drcr>
  </al:account>
</al:accountlist>

Now, you need a way to tell Word (or any other XML-processing program) what kind of values to expect in each field so that it doesn't goof up on bad data: the account ID should be a sequence of ten digits; the name should be a string; the balance a real number (greater than zero), and the Debit/Credit field should be either `Dr' or `Cr', and nothing else. In fact, we could really just use `d' and `c', but Dr and Cr are time-honoured abbreviations of the words. Turns out the way to do this is through another XML file, a schema definition file.

More about schemas at MSDN's Advanced XML Support in Word and the W3C's XML Schema Primer.


Schema generator at XSD Inference Demo.

More tools, including one that validates your XML file against its schema, at XML Tools.


The schema definition for our account listing should be something like:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://yawar.blogspot.com"
            xmlns:al="http://yawar.blogspot.com"
            elementFormDefault="qualified">
  <xsd:element name="accountlist">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:all>
              <xsd:element name="accid">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:pattern value="[0-9]{10}" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element name="holdername" type="xsd:string" />
              <xsd:element name="balance">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:decimal">
                    <xsd:minInclusive value="0" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element name="drcr">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:pattern value="[DC]r" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Yes, it looks rather daunting, but it's not that hard; I whipped this schema up myself browsing through W3Schools' Schema tutorials.
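To make the schema's rules a bit more concrete, here's a minimal Python sketch of the same checks. It's only an approximation and not a real XSD validator: the function name validate_account and the wording of the messages are my own inventions for illustration, and Python's Decimal accepts a few forms (such as 1e5) that xsd:decimal would reject.

```python
import re
from decimal import Decimal, InvalidOperation

# Same patterns as the xsd:pattern facets in the schema
ACCID_RE = re.compile(r"[0-9]{10}")
DRCR_RE = re.compile(r"[DC]r")


def validate_account(accid, holdername, balance, drcr):
    """Return a list of problems found in one account record (empty if it looks valid)."""
    problems = []
    if not ACCID_RE.fullmatch(accid):
        problems.append("accid must be exactly ten digits")
    if not isinstance(holdername, str):
        problems.append("holdername must be a string")
    try:
        # Mirrors xsd:minInclusive value="0": zero is allowed, negatives are not
        if Decimal(balance) < 0:
            problems.append("balance must not be negative")
    except InvalidOperation:
        problems.append("balance must be a decimal number")
    if not DRCR_RE.fullmatch(drcr):
        problems.append("drcr must be 'Dr' or 'Cr'")
    return problems


if __name__ == "__main__":
    print(validate_account("1234567890", "Mr X", "100000", "Cr"))
    print(validate_account("12345", "Ms Y", "-5", "Xr"))
```

The point of returning a list rather than raising on the first error is that a report-processing script would probably want to log every bad field in a record at once.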

The last piece of the puzzle is, how do we tell Word how to format and display our nice XML file? The answer is the standardised XML Stylesheet Language, XSL. Yet another piece of XML coding, this file instructs Word on how to create a Word XML document on-the-fly from the XML data file that you have (the accounts listing file). Let me try a whimsical explanation here. Imagine the stylesheet file is talking to Word, giving running instructions as the input file is being processed.

`Started reading the document? OK, write the heading, ``ACCOUNT LISTING''. Format it with the ``Heading 1'' style. Now leave a blank line and start a four-column table, with column headers ``Account ID'', ``Holder Name'', ``Balance'' and ``Debit/Credit''.

`Now for each <account>, create a new table row, and: put the contents of the <accid> in the first column; the contents of the <holdername> in the second column; <balance> in the third; and <drcr> in the fourth. Oh, and sort the table rows by account holder name.'

And remember, this is a Word document that is being created -- not an HTML file. Yeah, you can do all that with XSL inside Word!
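The running instructions above can be mimicked outside Word, too. What follows is a hedged Python sketch, not the XSL stylesheet itself and not anything Word does internally: it reads an accounts listing in the XML format shown earlier, sorts the accounts by holder name, and builds a plain-text table under an ``ACCOUNT LISTING'' heading. The names account_rows and render are made up for the example.

```python
import xml.etree.ElementTree as ET

NS = {"al": "http://yawar.blogspot.com"}
FIELDS = ("accid", "holdername", "balance", "drcr")
HEADERS = ("Account ID", "Holder Name", "Balance", "Debit/Credit")

SAMPLE = """<al:accountlist xmlns:al="http://yawar.blogspot.com">
  <al:account>
    <al:accid>1234567890</al:accid><al:holdername>Mr X</al:holdername>
    <al:balance>100000</al:balance><al:drcr>Cr</al:drcr>
  </al:account>
  <al:account>
    <al:accid>2345678901</al:accid><al:holdername>Ms Y</al:holdername>
    <al:balance>96000</al:balance><al:drcr>Cr</al:drcr>
  </al:account>
  <al:account>
    <al:accid>3456789012</al:accid><al:holdername>Dr Z</al:holdername>
    <al:balance>45009.87</al:balance><al:drcr>Dr</al:drcr>
  </al:account>
</al:accountlist>"""


def account_rows(xml_text):
    """One tuple per <account>, sorted by holder name (the 'sort the table rows' step)."""
    root = ET.fromstring(xml_text)
    rows = [
        tuple(acct.findtext("al:" + field, namespaces=NS) for field in FIELDS)
        for acct in root.findall("al:account", NS)
    ]
    return sorted(rows, key=lambda row: row[1])


def render(rows):
    """Heading, blank line, column headers, then one line per account."""
    widths = [
        max([len(header)] + [len(row[i]) for row in rows])
        for i, header in enumerate(HEADERS)
    ]

    def fmt(cells):
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    return "\n".join(["ACCOUNT LISTING", "", fmt(HEADERS)] + [fmt(r) for r in rows])


if __name__ == "__main__":
    print(render(account_rows(SAMPLE)))
```

A real stylesheet would emit Word's own XML for the heading style and the table; the sketch only shows the shape of the transformation (select, sort, lay out).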

As an aside, for someone like me, who cut his teeth on LaTeX and then a little bit of DocBook (SGML and XML) with PassiveTeX, Jade, Apache FOP, you name it, Word's new XML capabilities just blow me away. It looks like Word has become the powerful XML processing and transformation engine that documentation writers have always dreamed of.


Soon, I'll post the stylesheet file I've created to do the transformation, and hopefully graphical comparisons of the different views of the same XML document.

Jan 26, 2006

The Wheel of Time -- The Eye of the World

Wheeling Round and Round

Finished Robert Jordan's The Eye of the World and it was a whopper. The story itself is 782 pages. Not the longest I've read, but remarkable because the whole book is nothing more than a setup, even a leaflet, for the rest of the series. And wheels within wheels: almost the whole of the book is a setup for the last couple of chapters, where it really gets exciting.

The book as a whole is a long journey, a long series of hair-breadth escapes, interspersed with threatening dreams, drawn out but at the same time picking up more and more pace, until the explosive ending. The ending makes you want to go out and get the next book pretty much immediately.

But that's not the first thing that struck me, by far, while I was reading it. That would be the similarities to Tolkien's Lord of the Rings. Here are a basic few:
  • Two Rivers = The Shire
  • Tam al'Thor = Bilbo, brings back `ring' (either Rand or the sword, or both, depending on how you look at it) from his adventures abroad
  • Fellowship sets out on quest
  • Mischievous Mat Cauthon = Mischievous Pippin Took
  • Moiraine = Gandalf
  • Lan = Aragorn
  • Sauron = Ba'alzamon
  • Fades hunting our `hobbits' = Ringwraiths
  • Trollocs = orcs
  • Padan Fain = Gollum
  • Journey to Blight = Trip to Mordor. Pack light, heroes! :-)
  • Children of the Light capture Perrin & Egwene = Faramir's gang captures Frodo, Sam & Gollum. OK, this is stretching it a bit
  • Green Man = Tom Bombadil, only sadder
  • Green Man = Ent
  • Egwene sounds like Éowyn
Um, am I forgetting anything?

Anyway, I do appreciate that there are definitely big differences. Jordan writes in more modern prose, with more short, sharp sentences for dramatic effect. Short. Dramatic. And he avoids, for the most part, Tolkien's rambling descriptions of this valley here, that nook and cranny there, that seem to go on for days. Oh, and a blessed avoidance of accented characters in names. But they're more than made up for with a liberal dose of apostrophes. Check out the names of some of the main Trolloc tribes (and I've thrown in their roots in monster names): Ahf'frait (afreet), Al'ghol (ghoul), Bhan'sheen (banshee), Dha'vol (devil), Dhai'mon (guess this one), Dhjin'nen (djinn), Ghar'ghael (gargoyle), Ghob'hlin (again, guess), Gho'hlem (golem), Ghraem'lan (gremlin).

But I digress. There is the One Power, a mystical force which comes from the True Source of the universe, drives the eternal Wheel of Time, and empowers a few chosen individuals with great power but at the risk of death and/or madness. But then again, it's like Tolkien's One Ring where it gives you power against the bad guy but the price is high. The real revelation is the turning of the Wheel of Time, where apparently the ages come and go and come again; nothing new ever happens. Civilisations rise and fall, and fall some more, in the eternal battle (you know the one, Good v Evil). Mankind continues to lose science and technology because it just can't get a firm foothold on the Earth before it's all toppled away again. Bleak outlook, really. But then I've heard there are thirteen books in this series, each one presumably as fat as the first. With that kind of length, what else could Jordan be doing but telling the story of the liberation of humanity from the yoke of the Wheel? Guess I'll have to find out. But it's what I would do.

Jan 22, 2006

New style, cont.

After a lot of high-flying coding trying to get cookies to work (to
remember which user has seen which posts and/or comments) and at the
same time be compatible with Internet Explorer, I've decided to BAD (Bypass
All Difficulties) and just show the posts and comments by default,
letting users hide them if they want. Code is so much simpler, and at
the same time IE users get to at least read the posts, even if they
don't get the cool clicking and hiding/showing effects.

Jan 19, 2006

New style

After what seems like an eternity with the old ready-made style, have finally gotten down and dirty with Blogger's internals. The inspiration was Gmail's message display interface, which also led me to suggest such an interface for the next version of Thunderbird in the website maintained by the developers, here. Also led me to thinking about how to implement something like it with HTML. Plucked up some courage reading up on JavaScript, the DOM, and CSS, then gave it a try; rather aborted results can be seen here.

Then realised that Blogger's template system provides pseudo-HTML tags which automatically pull blog posts and comments out of the Blogger database -- so basically we have this big database of items which we can pull out and display, rather as if they were emails. Of course, they're a little more complicated than emails (because each post can have one or more comments), which leads to some code complexity; but on the whole it was surprisingly easy. Guess I have XML/CSS/JavaScript and their amazing expressiveness to thank for that.

One thing to note though is that the site doesn't work very well at all on Internet Explorer, even the version 6 that I have running on this XP Service Pack 2 machine. Tried a perfunctory hack to solve the problem, but hasn't worked. Oh well, will tackle it later, I guess. Meanwhile, I recommend all my beloved viewers (anybody out there? :-) use Firefox or Opera, the two best browsers available today.

Jan 12, 2006

Thunderbird rocks

Set up Thunderbird to handle my Gmail account as well as the ISP-provided POP3 account. Works great and, what's more, allows me to sign and/or encrypt outgoing messages with Thunderbird's Enigmail extension which gives Thunderbird OpenPGP support.

Also set up the BDComics RSS feed (Tools > Account Settings..., then Add Account...), making it a hell of a lot easier to navigate all the great comics links put up there.

Dec 30, 2005

Sparklines: satisfaction and disappointment

Could go into a whole diatribe about the paradoxical human condition of conflicting feelings but will keep it simple. Have achieved what I set out to do: sparkline the bank's account opening activity during a given period. Well, have achieved it roughly, anyway. But generating sparklines is a big hassle: code for Office 2000 or XP and above? (2000 doesn't have a feature which makes the user's life a hell of a lot easier.) When generating multiple sparklines at the same time, scale them all to the same scale or different (as they are above)? Currently I'm coding for MSO2000, though there's no guarantee the code will actually work; and scaling to different scales even when generating in the same batch. Will have to change both these settings because the alternatives are so much more helpful.

On the plus side, wrote a clever little toolbar that manages generated sparklines -- i.e. selecting and deleting them -- almost well enough to be called a sparkline manager.

Here are the macros that pull in and parse the plain-text reports generated daily by the bank's software:

Sub inputData(fname, aDoc)
Set fso = CreateObject("Scripting.FileSystemObject")
Set fin = fso.OpenTextFile(fname, 1)
branchNamesStore = ""
openingCountsStore = ""
printingCounts = False
weHaveTabs = False

Do While fin.AtEndOfStream <> True
aLine = fin.ReadLine

Dim branchNamePoses(5)
branchNamePoses(0) = InStr(1, aLine, " BRANCH ")
branchNamePoses(1) = InStr(1, aLine, " BRANCH" + vbTab)
branchNamePoses(2) = InStr(1, aLine, " BRAN ")
branchNamePoses(3) = InStr(1, aLine, " BRAN" + vbTab)
branchNamePoses(4) = InStr(1, aLine, " BRANC ")
branchNamePoses(5) = InStr(1, aLine, " BRANC" + vbTab)

numAcctsPos = InStr(1, aLine, "OPENED :")
lineHasComma = InStr(1, aLine, ",")
branchName = ""
branchNameTemp = ""
numAcctsStr = ""

For posCounter = 0 To UBound(branchNamePoses)
If branchNamePoses(posCounter) <> 0 And lineHasComma = 0 Then
If posCounter Mod 2 = 0 Then ' No tabs in this file
i = branchNamePoses(posCounter) - 1
Do While Mid(aLine, i, 1) <> " "
branchName = branchName & Mid(aLine, i, 1)
i = i - 1
Loop
branchNameTemp = branchName
branchName = StrReverse(branchNameTemp)
branchNamesStore = branchNamesStore & " " & branchName
Else ' Tabs in the file
weHaveTabs = True
i = branchNamePoses(posCounter) - 1
Do While Mid(aLine, i, 1) <> vbTab
branchName = branchName & Mid(aLine, i, 1)
i = i - 1
Loop
branchNameTemp = branchName
branchName = StrReverse(branchNameTemp)
branchNamesStore = branchNamesStore & " " & branchName
End If
ElseIf InStr(1, aLine, "HEAD OFFICE") <> 0 And lineHasComma = 0 And InStr(1, branchNamesStore, "HEADOFFICE") = 0 Then
branchNamesStore = branchNamesStore & " HEADOFFICE"
Exit For
End If
Next posCounter

If numAcctsPos <> 0 Then
i = numAcctsPos + 9
Do While i < Len(aLine) + 1
curChar = Mid(aLine, i, 1)
If weHaveTabs Then
If curChar <> vbTab Then numAcctsStr = numAcctsStr & curChar
Else
If curChar <> " " Then
numAcctsStr = numAcctsStr & curChar
Else
If Len(numAcctsStr) >= 2 Then Exit Do
End If
End If
i = i + 1
Loop
openingCountsStore = openingCountsStore & " " & numAcctsStr
End If
Loop
fin.Close

branchNames = Strings.Split(Trim(branchNamesStore))
openingCounts = Strings.Split(Trim(openingCountsStore))
Debug.Assert UBound(branchNames) = UBound(openingCounts)

For Each par In aDoc.Paragraphs
docLine = Mid(par.Range.Text, 1, Len(par.Range.Text) - 1)
If docLine = "\begin{acc_opening_counts}" Then
printingCounts = True
GoTo nextItem
ElseIf docLine = "\end{acc_opening_counts}" Then
printingCounts = False
GoTo nextItem
End If

typedNumber = False
If printingCounts Then
curBranch = Trim(par.Range.Words.First)
par.Range.Select
Selection.EndKey Unit:=wdLine
For i = 0 To UBound(branchNames)
If branchNames(i) = curBranch Then
typedNumber = True
Selection.TypeText Text:=" " & openingCounts(i)
GoTo nextItem
End If
Next i
If typedNumber = False Then Selection.TypeText Text:=" 0"
End If
nextItem:
Next
End Sub

Sub importStats()
Dim theDoc As Document
Set theDoc = ActiveDocument

Set fso = CreateObject("Scripting.FileSystemObject")
monthFolder = "\\accounts\MB_REPORT 2004\YEAR2004\DECEMBER2004\"

For i = 1 To 31
' Format(i, "00") zero-pads the day, so days 1 to 9 need no special case
fname = monthFolder & "Dhanmondi Branch 2004-12-" & Format(i, "00") & "\AC_OPEN_ALL"
If fso.FileExists(fname) Then
inputData fname, theDoc
Else
Debug.Print fname, "does not exist"
End If
Next i
End Sub


May not look like much but was a bitch to write thanks to the lack of regular expressions in vanilla Office VBA. This was the first half. The second half was even harder because even more ill-defined -- almost no one's ever done it before.
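For comparison, here's a rough sketch of how the same extraction might look in a language that does have regular expressions, Python in this case. The report lines in the usage example are made up (I can't post the bank's real files), and the two patterns are my guess at the layout based on what the macro searches for: a branch name followed by BRANCH, BRAN or BRANC, and a count after ``OPENED :''. The name parse_report is hypothetical.

```python
import re

# Branch name = the word just before " BRANCH", " BRAN" or " BRANC",
# which must itself be followed by a space or a tab (as in the macro).
BRANCH_RE = re.compile(r"(\S+) BRAN(?:CH|C)?[ \t]")
OPENED_RE = re.compile(r"OPENED :\s*(\d+)")


def parse_report(lines):
    """Return {branch name: number of accounts opened} for one day's report."""
    names, counts = [], []
    for line in lines:
        # Like the macro, ignore lines with commas when looking for branch
        # names: those are assumed to be per-account detail lines.
        if "," not in line:
            match = BRANCH_RE.search(line)
            if match:
                names.append(match.group(1))
            elif "HEAD OFFICE" in line and "HEADOFFICE" not in names:
                names.append("HEADOFFICE")
        match = OPENED_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    # Names and counts are paired up in the order they appear in the file
    return dict(zip(names, counts))


if __name__ == "__main__":
    sample = [
        "  DHANMONDI BRANCH   ",
        "  ACCOUNTS OPENED : 12",
        "  GULSHAN BRAN\t",
        "  ACCOUNTS OPENED :  7",
        "  HEAD OFFICE",
        "  ACCOUNTS OPENED : 3",
        "  0012345678, MR X, SAVINGS BRANCH ",
    ]
    print(parse_report(sample))
```

Two patterns replace the six InStr searches and both character-by-character loops, which is pretty much the whole argument for regular expressions here.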

The sparkline generator:
Microsoft Word Object ThisDocument

Private Sub Document_Close()
myDocumentClose
End Sub

Private Sub Document_Open()
myDocumentOpen
End Sub

Module NewMacros


Function sizeof(arr)
sizeof = UBound(arr) - LBound(arr) + 1 ' Number of elements in the array
End Function

Function sort(arr As Variant, Optional SortAscending As Boolean = True)
' Chris Rae's VBA Code Archive - http://chrisrae.com/vba
' By Chris Rae, 19/5/99. My thanks to
' Will Rickards and Roemer Lievaart
' for some fixes.
ToSort = arr
Dim AnyChanges As Boolean
Dim BubbleSort As Long
Dim SwapFH As Variant
Do
AnyChanges = False
For BubbleSort = LBound(ToSort) To UBound(ToSort) - 1
If (ToSort(BubbleSort) > ToSort(BubbleSort + 1) And SortAscending) _
Or (ToSort(BubbleSort) < ToSort(BubbleSort + 1) And Not SortAscending) Then
' These two need to be swapped
SwapFH = ToSort(BubbleSort)
ToSort(BubbleSort) = ToSort(BubbleSort + 1)
ToSort(BubbleSort + 1) = SwapFH
AnyChanges = True
End If
Next BubbleSort
Loop Until Not AnyChanges
sort = ToSort
End Function

Function arrayMin(theArr)
Dim arr()
ReDim arr(UBound(theArr))

For i = LBound(theArr) To UBound(theArr)
arr(i) = Val(theArr(i))
Next i

If sizeof(arr) = 1 Then
arrayMin = arr(LBound(arr))
ElseIf sizeof(arr) = 2 Then
If arr(LBound(arr)) < arr(UBound(arr)) Then
smaller = arr(LBound(arr))
Else ' Also covers the case where both values are equal
smaller = arr(UBound(arr))
End If
arrayMin = smaller
Else
sortedArr = sort(arr)
arrayMin = sortedArr(LBound(sortedArr))
End If
End Function

Function arrayMax(theArr)
Dim arr()
ReDim arr(UBound(theArr))

For i = LBound(theArr) To UBound(theArr)
arr(i) = Val(theArr(i))
Next i

If sizeof(arr) = 1 Then
arrayMax = arr(LBound(arr))
ElseIf sizeof(arr) = 2 Then
If arr(LBound(arr)) < arr(UBound(arr)) Then
bigger = arr(UBound(arr))
Else ' Also covers the case where both values are equal
bigger = arr(LBound(arr))
End If
arrayMax = bigger
Else
sortedArr = sort(arr)
arrayMax = sortedArr(UBound(sortedArr))
End If
End Function

Function scaleHeight(num, max, theHeight) As Double
If max = 0 Then
scaleHeight = 0
Else
scaleHeight = theHeight - (num / max) * theHeight
End If
End Function

Function lineChart(aLine, theHeight, widthMul, showAvg, vertPos As Single, ByRef header, Optional ByVal scaleSame As Boolean, Optional scaleMax, Optional scaleMin)
If Right(aLine, 1) = vbCr Then
theLine = Left(aLine, Len(aLine) - 1)
Else
theLine = aLine
End If
theSeries = Split(theLine) ' Contains the header label
Dim numSeries() ' Does not hold the label

numNils = 0
For counter = 1 To UBound(theSeries)
If theSeries(counter) = "nil" Then numNils = numNils + 1
Next counter

ReDim numSeries(UBound(theSeries) - numNils - 1)
For i = numNils + 1 To UBound(theSeries)
numSeries(i - numNils - 1) = Val(theSeries(i))
Next i

If scaleSame Then
min = scaleMin
tempMax = scaleMax
Else
min = arrayMin(numSeries)
tempMax = arrayMax(numSeries)
End If
max = tempMax - min

For i = 0 To UBound(numSeries)
tempNum = numSeries(i) - min
numSeries(i) = tempNum
Next

If showAvg Then
sum = 0
For Each elem In numSeries
sum = sum + elem
Next
avg = sum / (UBound(numSeries) + 1) ' The array is zero-based, so the count is UBound + 1
avgHeight = scaleHeight(avg, max, theHeight)
End If

With ActiveDocument.shapes.BuildFreeform(msoEditingAuto, (numNils * widthMul) + 100, scaleHeight(numSeries(0), max, theHeight) + vertPos)
For i = 1 To UBound(numSeries)
.AddNodes msoSegmentLine, msoEditingAuto, ((numNils + i) * widthMul) + 100, scaleHeight(numSeries(i), max, theHeight) + vertPos
Next i
freeformName = .ConvertToShape.Name
End With

With ActiveDocument.shapes.AddShape(msoShapeOval, (numNils + i - 1) * widthMul - 2 + 100, scaleHeight(numSeries(i - 1), max, theHeight) + vertPos - 2, 4, 4)
.Fill.Visible = msoTrue
.Fill.Solid
.Fill.ForeColor.RGB = RGB(51, 102, 255)
.Line.ForeColor.RGB = RGB(51, 102, 255)
dotName = .Name
End With

With ActiveDocument.shapes.AddTextbox(msoTextOrientationHorizontal, (numNils + i - 1) * widthMul + 5 + 100, scaleHeight(numSeries(i - 1), max, theHeight) + vertPos - 7.5, 50, 15)
.TextFrame.TextRange.Text = strings.Trim(Str(numSeries(i - 1) + min))
.TextFrame.TextRange.Font.Size = 8
.TextFrame.TextRange.Font.Color = RGB(51, 102, 255)
.Fill.ForeColor.RGB = RGB(255, 255, 255)
.Line.Visible = False
.Fill.Transparency = 1#
textBoxName = .Name
End With

If showAvg Then
With ActiveDocument.shapes.AddLine(numNils * widthMul + 100, avgHeight + vertPos, (numNils + i - 1) * widthMul + 100, avgHeight + vertPos)
.Line.ForeColor.RGB = RGB(153, 51, 0)
.Line.DashStyle = msoLineRoundDot
avgLineName = .Name
End With
End If

header = theSeries(0)
If showAvg Then
retval = Array(freeformName, dotName, textBoxName, avgLineName)
Else
retval = Array(freeformName, dotName, textBoxName)
End If

For Each elem In retval
Debug.Print elem
Next
lineChart = retval
End Function

Sub selectChart()
Dim ctl As CommandBarComboBox
Set ctl = CommandBars("Sparklines").Controls(2)
If ctl.ListCount < 1 Then Exit Sub

lineArr = Split(ctl.List(ctl.ListIndex), ":")

theNames = Split(Trim(lineArr(1)), ",")
Dim shapeNames() As Variant
ReDim shapeNames(UBound(theNames))
For i = 0 To UBound(shapeNames)
shapeNames(i) = theNames(i)
Next i

ActiveDocument.shapes.Range(shapeNames).Select
End Sub

Sub deleteChart()
Dim ctl As CommandBarComboBox
Set ctl = CommandBars("Sparklines").Controls(2)
If ctl.ListCount < 1 Then Exit Sub

selectChart
Selection.Delete
ctl.RemoveItem ctl.ListIndex
End Sub

Sub moveChartRight()
Dim ctl As CommandBarComboBox
Set ctl = CommandBars("Sparklines").Controls(2)
If ctl.ListCount < 1 Then Exit Sub

selectChart
Selection.MoveRight
End Sub

Sub moveChartLeft()
Dim ctl As CommandBarComboBox
Set ctl = CommandBars("Sparklines").Controls(2)
If ctl.ListCount < 1 Then Exit Sub

selectChart
Selection.MoveLeft
End Sub

Sub refresh()
myDocumentClose
myDocumentOpen
End Sub

Sub myDocumentOpen()
CommandBars.Add(Name:="Sparklines", Temporary:=False).Visible = True

With CommandBars("Sparklines")
With .Controls.Add(Type:=msoControlButton, Temporary:=False)
.Caption = "Line Chart..."
.Style = msoButtonCaption
.OnAction = "lineChartGui"
End With
.Controls.Add Type:=msoControlDropdown
With .Controls.Add(Type:=msoControlButton, Temporary:=False)
.Caption = "Select"
.Style = msoButtonCaption
.OnAction = "selectChart"
.Enabled = True
End With
With .Controls.Add(Type:=msoControlButton, Temporary:=False)
.Caption = "Delete"
.Style = msoButtonCaption
.OnAction = "deleteChart"
.Enabled = True
End With
With .Controls.Add(Type:=msoControlButton, Temporary:=False)
.Caption = "Refresh"
.Style = msoButtonCaption
.OnAction = "refresh"
.TooltipText = "Clears names of all charts from the list, whether charts are still in document or not"
.Enabled = True
End With
End With
End Sub

Sub myDocumentClose()
Dim sl As CommandBar

On Error Resume Next
Set sl = CommandBars("Sparklines")
If Not sl Is Nothing Then sl.Delete
End Sub

Sub lineCharts(theHeight, widthMul, showAvg)
Dim sl As CommandBar
On Error GoTo makeToolbar
Set sl = CommandBars("Sparklines")

continueWithTb:
On Error GoTo 0 ' From here on, errors should surface normally
howMany = Selection.Range.Paragraphs.Count
If howMany < 1 Then Exit Sub

lines = Split(Selection.Range.Text, vbCr)
theHeader = ""
For i = 0 To howMany - 1
theShapes = lineChart(lines(i), theHeight, widthMul, showAvg, 100 + i * (theHeight + 15), theHeader)
shapesStrTemp = ""
For Each elem In theShapes
shapesStrTemp = shapesStrTemp & "," & elem
Next
shapesStr = Right(shapesStrTemp, Len(shapesStrTemp) - 1)
sl.Controls(2).AddItem theHeader & ":" & shapesStr
Next i

Exit Sub
makeToolbar:
' The toolbar doesn't exist yet: create it, fetch it, then carry on
myDocumentOpen
Set sl = CommandBars("Sparklines")
Resume continueWithTb
End Sub

Sub lineChartGui()
frmLineChart.Show
End Sub


Yup, very complicated. Hopefully will become simpler and simpler in future iterations. Sometimes wonder why I don't just switch to automating Excel charts.
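Stripped of all the Word plumbing, the core of the generator is just the scaling arithmetic in scaleHeight and lineChart: shift the series so its minimum sits at zero, then map each value to a vertical offset measured from the top of a box of the given height. Here's a small Python sketch of that one idea; the function names are mine, and it leaves out the nil handling, the shared-scale option and the average line.

```python
def scale_height(value, span, height):
    """Vertical offset from the top of the box; bigger values sit higher (smaller offset)."""
    if span == 0:
        # Same fallback as the macro when max is 0: avoid dividing by zero
        return 0.0
    return height - (value / span) * height


def sparkline_points(values, height, width_mul):
    """(x, y) pairs for a series, one point every width_mul units along x."""
    lowest = min(values)
    span = max(values) - lowest
    return [
        (i * width_mul, scale_height(v - lowest, span, height))
        for i, v in enumerate(values)
    ]


if __name__ == "__main__":
    for point in sparkline_points([2, 4, 3], height=50, width_mul=1):
        print(point)
```

The y axis is flipped because drawing coordinates in Word grow downwards, which is why the formula subtracts from the height instead of multiplying directly.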

Dec 26, 2005

Sparklines: can't resist


When I started looking at ways to automate the graphing of the bank's accounts opening data, I originally started out with a 3-D line chart powered by a PivotTable. But have since realised that this is a perfect area of application for sparklines, Edward Tufte's `intense, simple, word-sized graphics'. For example, see above.

They're usually supposed to be surrounded by more context, but basically that is their size and general appearance.

Sparklines have so much potential in charting huge amounts of data; couldn't resist spending a lot of thought and time trying to figure out what would be the best way to implement them. First decided on plain HTML and CSS generated by Python, and spent a lot of time on it before deciding it was too tedious because I had to get Python to generate each and every dot making up the lines. Python is very good, but after a while I realised I should use an environment which already provided vector-based drawing tools which could be automated.

The obvious choice turned out to be Microsoft Word, because of how common it is, especially here in Bangladesh. After some hacking, came up with the following code:

Const theHeight = 50
Const widthMul = 1

Function scaleHeight(num, max) As Double
num = Val(num)

scaleHeight = theHeight - (num / max) * theHeight
End Function

Sub genSl()
Dim c As Shape ' Holds the canvases one by one
min = 0
max = 0

Dim theArray()
howMany = Selection.Range.Paragraphs.Count
ReDim theArray(howMany - 1)
Dim canvasNames()
ReDim canvasNames(howMany - 1)

For i = 0 To howMany - 1
theArray(i) = Strings.Split(Selection.Range.Paragraphs(i + 1).Range.Text)
For j = 1 To UBound(theArray(i))
If Val(theArray(i)(j)) < min Then min = Val(theArray(i)(j))
If Val(theArray(i)(j)) > max Then max = Val(theArray(i)(j))
Next j
Next i
max = max - min

For i = 0 To howMany - 1
' For each paragraph in the selection a sparkline is drawn
Set c = ActiveDocument.Shapes.AddCanvas(100, i * (theHeight + 20) + 200, widthMul * (UBound(theArray(i)) + 1) + 55, theHeight + 15)
canvasNames(i) = c.Name

With c.CanvasItems.BuildFreeform(msoEditingAuto, 0, scaleHeight(theArray(i)(1), max) + 7.5)
For j = 2 To UBound(theArray(i))
' j starts from 2 because the first point (index 1) was plotted in the BuildFreeform function
.AddNodes msoSegmentLine, msoEditingAuto, j * widthMul, scaleHeight(theArray(i)(j), max) + 7.5
Next j
.ConvertToShape
End With

j = j - 1
With c.CanvasItems.AddShape(msoShapeOval, j * widthMul - 2, scaleHeight(theArray(i)(j), max) + 7.5 - 2, 4, 4)
.Fill.Visible = msoTrue
.Fill.Solid
.Fill.ForeColor.RGB = RGB(51, 102, 255)
.Line.ForeColor.RGB = RGB(51, 102, 255)
End With

With c.CanvasItems.AddTextbox(msoTextOrientationHorizontal, j * widthMul + 5, scaleHeight(theArray(i)(j), max) + 7.5 - 7.5, 50, 15)
.TextFrame.TextRange.Text = Strings.Trim(Str(theArray(i)(j)))
.TextFrame.TextRange.Font.Size = 8
.TextFrame.TextRange.Font.Color = RGB(51, 102, 255)
.Fill.ForeColor.RGB = RGB(255, 255, 255)
.Line.Visible = False
End With
Next i

ActiveDocument.Shapes.Range(canvasNames).Select
End Sub

Sub showMarkers(n As Integer)
pWidth = ActiveDocument.PageSetup.PageWidth
pHeight = ActiveDocument.PageSetup.PageHeight

Dim l As Shape
For i = 1 To Int(pWidth / n)
Set l = ActiveDocument.Shapes.AddLine(i * n, 0, i * n, 10)
Set l = ActiveDocument.Shapes.AddLine(i * n, pHeight, i * n, pHeight - 10)
Next i
For i = 1 To Int(pHeight / n)
Set l = ActiveDocument.Shapes.AddLine(0, i * n, 10, i * n)
Set l = ActiveDocument.Shapes.AddLine(pWidth, i * n, pWidth - 10, i * n)
Next i
End Sub

Sub doShowMarkers()
Call showMarkers(10)
End Sub

If you're interested in using them, put them in some module in one of your document templates (if in the Normal template, they will be available to all documents). Then put some data and numbers in the document itself, arranged in a certain way. The above sparklines were generated from the following data:

DSE 2 3 4 7 3 7 4 119 3
DSEGeneralIndex 749.11 768.03 795.05 763.7 752.91 792.56 874.57 870.46 874.22 842.36 845.07 848.41 807.6 806.92 750.84 787.94 791.7
DSE20Index 942.46 958.2 1004.56 963.88 920.73 973.88 1134.34 1094.45 1085.97 1052.47 1051.48 1054.89 1004.61 1021.5 948.27 964.13 964.32
RandomIndex 642.2 221.5 2

That is, each series is in its own paragraph (paragraphs not separated by blank lines), starting with a label and followed by the items in the series, each separated from the next by a single space. To chart the data, select it all. If the selection contains a single data series, then a single sparkline will be drawn, and so on.
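If the input format isn't clear from the description, this short Python sketch shows how I'd read it: one series per line, the first word is the label, everything after it is a number. It's only an illustration of the layout, not part of the Word macros, and parse_series is a made-up name.

```python
def parse_series(text):
    """Map each label to its list of values; blank lines are skipped."""
    series = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        label, *values = parts
        series[label] = [float(v) for v in values]
    return series


if __name__ == "__main__":
    data = "DSE 2 3 4 7 3 7 4 119 3\nRandomIndex 642.2 221.5 2"
    for label, values in parse_series(data).items():
        print(label, values)
```

Because the label is split off on whitespace, it can't contain spaces, which is why the sample data uses names like DSEGeneralIndex.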

Need to work more on the code and especially on the GUI front-end. But for now it works OK.
Will upload it to a public server after working on it some more.

Dec 22, 2005

New ideas

As usual, haven't posted in a long time. Never found much to talk about, but nowadays I find myself looking at problems and inconveniences in my life, and others', and thinking of ways to solve them.

Example. With the abolishing of rickshaws from the main road leading up to New Market and Nilkhet, the road in front of New Market has become more jammed than ever with parked cars and stationary rickshaws. Right now it is a two-way street, with two lanes on each side and a lane for parking cars. A simple way to solve the jam would be to allow only cars on the side further away from NM, and only rickshaws on the side closer to it. Sure, cars would have to exit through the other, further side on their way out, but then, that's what they're there for.

Work

Started at One Bank in the beginning of December. Worked, or observed, my way through a lot of stuff but I've finally seen what to me is the most interesting part of it all: the raw data generated by the computer system of the bank's daily activities. These data are in the form of plain text files arranged into folders, essentially by date. They are just crying to be pulled in and processed programmatically by Excel or some such program. For example, there are daily data files about fixed deposits which mature on the day; and new accounts (including loans) which were opened on the previous day.

In the new accounts example, the information in each file (each day's report) includes a grouping of accounts by branch, count of new accounts in each branch, and detailed information on each account (one account per line). The way it is arranged makes it possible to parse it and pull out the most useful data -- for example, the count of new accounts opened in each branch. If one does this every day to keep current, one can graph the daily account opening activity for each branch, and what's more, put these graphs together into a combined `3-D' graph for ease of comparison. This gives, over time, a nice high-level view of account activity throughout the bank.
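To show what I mean by keeping a daily tally, here's a small Python sketch (not the Excel macro I'm actually working on). It assumes each day's report has already been boiled down to a branch-to-count mapping; the branch names and numbers in the usage example are invented, and branch_series is a hypothetical name.

```python
def branch_series(daily_counts):
    """Turn a list of per-day {branch: count} dicts into {branch: [count for each day]}.

    A branch missing from a day's report is assumed to have opened no accounts that day.
    """
    branches = sorted({branch for day in daily_counts for branch in day})
    return {b: [day.get(b, 0) for day in daily_counts] for b in branches}


if __name__ == "__main__":
    days = [
        {"DHANMONDI": 3, "GULSHAN": 1},
        {"DHANMONDI": 5},
        {"GULSHAN": 2, "HEADOFFICE": 4},
    ]
    for branch, counts in branch_series(days).items():
        print(branch, counts)
```

Each branch's list is then one line on the graph, with one point per day.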

This is exactly what I am now trying to do with Excel and a well-crafted macro at the bank. Have made some progress, and think the parsing bit is taken care of thanks to Excel's, well, excellent plain text file importing/parsing capabilities. But a lot of it is still left, including programmatically generating pivot tables and charts for new months. Should be quite a challenge. If they let me do this, even intermittently, it should make it very interesting at work. Don't know who it will really help, though, to be realistic. At this point it's just a shiny toy, a very high-level view which branch employees may not find ultimately useful and thus may lose interest in rapidly.

But still look forward to exploring more of the daily reports and perhaps even getting something useful out of them.

Nov 9, 2005

Evolution v intelligent design, notes from Slashdot

Kansas' education board has decided that its students should be taught about intelligent design, an alternative `theory' to evolution. Here's a posting from Slashdot (http://science.slashdot.org/science/05/11/08/2338233.shtml?tid=123&tid=14) which describes exactly how I feel about this:



Re:You are only hurting yourself you know.... (Score:4, Insightful)

by Decaff (42676) on Tuesday November 08, @10:15PM (#13985193)

Interesting comment--considering that they are teaching Intelligent Design alongside Evolutionary Theory. Your comment seems to indicate that, by teaching ONLY Evolution, that's how we develop Independent Thinking? Tell one side of a story? Somehow, that seems more like indoctrination to me.

You are missing the point. These classes are supposed to be science lessons, not philosophy or religion. There are plenty of alternative ideas to evolution that can be discussed in biology classes, such as the ideas that fossils aren't old and the Earth was created recently. These areas are testable, and examining the data that suggests they are false can be highly educational - students learn about rock strata and radioactive dating.

Intelligent design is not testable. It is nothing more than a series of statements of incredulity - that because we don't yet understand everything about the evolution of life then there must have been intervention by a `designer'. This isn't science. Intelligent design might be science if there was some sort of valid consistent test for the existence of a designer, but there isn't. Also, because it is likely that there will always be some area of evolution or of biology that is not fully understood, there will always be some room for someone to say `that must be designed'. This means that Intelligent Design is never refutable; again, making it meaningless in the context of science.

Science teaching should include the idea that we are simply currently ignorant about some things. Coming up with untestable, irrefutable explanations to cover that ignorance is dishonest and should not be part of the process.

Imagine this sort of approach being used in other areas of science (e.g. `We don't yet fully understand the origin of comets, so aliens or gods must have made them') and the results are silly in the extreme.


Here's another that echoes it:



Re:You are only hurting yourself you know.... (Score:4, Insightful)

by Flower (31351) on Tuesday November 08, @10:25PM (#13985290)


What independent thinking? ID certainly doesn't promote it. It provides the ultimate out in the search for truth. It's too hard right now to explain *this* so the obvious answer is God did it! (And don't even try to claim it is some ambiguous creator that spontaneously created the eye. The second some pagan asserts that it was the Goddess who made it happen you'll see every ID proponent in Kansas heading out to smite that heretic down.)

ID's greatest sin is that it closes doors to scientific research. If God miraculously intervened and created the eye then there is no reason to try to find an explanation. God did it so leave it alone and don't question it. Obviously if a million believers can't figure it out what could a scientist accomplish? And if this can be done in evolution then why can't it be done in other sciences? The creation of the universe is too complex to really comprehend so all this fluff about researching gravity really doesn't have to be done because we can just attribute the really interesting mysteries to God.

ID isn't science. It's the same old shit that pioneers in science had to fight against and be abused by centuries ago.


You have to love Slashdot: they put into words exactly how I feel about this stuff. I don't even have to raise a finger and type the stuff out, it's so perfect. Best of all, their sense of humour:



Hey Kansas! (Score:5, Funny)

by Anonymous Coward on Tuesday November 08, @09:30PM (#13984798)

We're becoming a laughingstock of not only the nation, but of the world, and I hate that

HAHAHAHAHAHAHAHAAHAHA!!!!!!!

-- The World


And also:



Re:Misleading headline (Score:4, Funny)

by c0d3h4x0r (604141) on Tuesday November 08, @10:17PM (#13985212)


2) It redefined the meaning of science. According to the new definition, science is no longer limited to searching for natural explanations for natural phenomena.

Excellent! So now student `science' fair projects can be about... well, pretty much anything!


Richard Feynman must be turning in his grave.