Jul 7, 2014

Easily Authenticate when Pushing to a Git Remote

SOMETIMES when you’re working with git repositories, the remote doesn’t support pushing and pulling over SSH–it only supports HTTPS. Or you don’t have an SSH key properly set up, or you do but it’s not working for some reason. Whatever the cause, the upshot is that whenever you push, you’re forced to type in your username and password. This sucks, understandably. But do you happen to use a password manager like KeePass on Windows? Because if you do, you can authenticate absolutely painlessly. Here’s how.

Set up a KeePass entry for the git server you’re pushing to. Let’s use GitHub as an example. Create a GitHub entry with your username and password, then make sure the entry has these two properties:

Auto-Type: {USERNAME}{ENTER}{PASSWORD}{ENTER}

Auto-Type-Window: Git Bash

If you’re using git on Windows, most likely you’re using Git Bash. Of course, if you’re using Posh Git, then just change the window name to whatever is appropriate.

Now, when you do a ‘git push’ and the git remote asks you to authenticate, simply press your global auto-type combo (by default it’s Ctrl-Alt-A) and KeePass will log you in immediately. No SSH necessary.

That’s convenience.

May 13, 2014

Stack Overflow and its Discontents

LIKE many others, I’ve come to rely on Stack Overflow (SO) as an amazing repository of technical knowledge. How did it become so comprehensive? My guess is it was the ‘many eyes make bugs shallow’ principle: many contributors building something together, a lot like Wikipedia. SO is nearly always great when you’re searching for an answer, but not so great when you’re trying to contribute and build a profile for yourself among the other members. I’ll explain why, but first let me clarify something as I see it: SO may look like a simple Q&A site, but it’s really a permanent knowledge base with the knowledge organised in the form of questions and answers. That’s important: the Q&A format is just that, a knowledge delivery mechanism. So with that in mind, we can examine how this affects people’s actions on the site.

Lots of people have discussed why SO punishes or rewards contributors the way it does, but one widely-held belief is that there is a subset of users (usually the moderators and other high-reputation users) intent on seeing SO’s core mission carried out: that the site becomes a repository of useful and generally applicable questions and answers. To keep it that way, this subset performs triage on the questions that are asked: they make judgment calls on whether or not the questions are good ones. When you’re a beginner, there are no bad questions. But when you’re building a long-lasting repository, there are bad questions.

Generally speaking, bad questions on SO could be any of:

  • Duplicate of an already-answered question
  • Not phrased as a question
  • Not clear what the asker wants to find out
  • Asker shows no signs of having done any research or made any attempt to solve the problem
  • Question is about a specific corner case which could easily be solved if the asker understood a more general principle
  • Code examples are incomplete and can’t be compiled; error messages are not quoted verbatim but only vaguely described
  • Or, at the other end of the spectrum: many screenfuls of code or log output pasted verbatim, in their entirety, without any attempt to zoom in on the source of the issue

Any of these questions will annoy the mods, because they’ll invariably get answered (people are incentivised to answer no matter how bad the question is), and those bad questions and their answers then raise the noise level and make it difficult for people trying to find answers to the genuinely good questions. (Good search features, and even Google, can only take you so far.)

So with this in mind, we can start to understand the mods’ behaviour of seemingly penalising users–usually newer ones who haven’t absorbed the culture yet and are treating SO as a source of answers to one-off questions. It’s not–the questions are meant to become part of a knowledge base on the subject and potentially live forever. Under these conditions, it’s very difficult to justify questions with the above bad qualities, especially if we can guide new users towards improving their question quality (and lots of people are trying to do this).

So, remember, the mods’ goal is to build a generally-useful knowledge base. With this as a given, the questions (subjects of knowledge) that are of a low quality will tend to get weeded out: either by being downvoted, or closed. The people who’re doing the downvoting and closing don’t have the end goal of punishing the askers; their goal is to weed out the bad questions. That the askers lose rep points is a side effect of the voting and rep system. Which is fair: if my peers judge me as not contributing good material, then I should have less cred. But the primary goal on everyone’s mind is having good material on the site, not punishing me.

Having said all that, I want to address the fact that votes on each question and answer are essentially on an infinite scale going in both directions. A given question or answer can be upvoted or downvoted many times over, and every one of those votes affects the poster’s rep–all from a single posting. That’s skewed, because users of the site are more likely to see higher-voted questions and answers than lower-voted ones. That’s simply how the search facilities work by default: understandably and helpfully, users get to see highly-regarded content before they see the dregs of the site. But this means that highly-upvoted content will always get more exposure, and therefore continuously be exposed to more upvotes, while downvoted content will get less exposure and fewer downvotes. This skew disproportionately rewards the experts and inflates their rep.
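To see why this compounds, here’s a toy simulation in Python (entirely my own construction, not SO’s actual ranking algorithm): two equally good posts, where one happens to get an early upvote and is therefore ranked higher.

    import random

    # Two equally good posts; 'A' got an early lucky upvote.
    scores = {'A': 1, 'B': 0}

    for _ in range(1000):
        # Readers mostly click on the higher-ranked post...
        ranked = sorted(scores, key=scores.get, reverse=True)
        seen = ranked[0] if random.random() < 0.9 else ranked[1]
        # ...and upvote what they read half the time, whichever it is.
        if random.random() < 0.5:
            scores[seen] += 1

    print(scores)  # e.g. {'A': 452, 'B': 48}: the early lead compounds

The posts are identical in quality, yet the early leader ends up with roughly nine times the votes–which is exactly the exposure bias described above.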

Let’s ignore the downvoted content for a second and think about the upvoted content: it is continuously getting upvoted, just for being there. Meanwhile, the person who posted that content could very well have not made a single contribution after that (taking this to the logical extreme). That’s an unfair advantage, and thus a bad indicator of that person’s cred in the community.

It’s clear at this point that the SO rep system is not going to be recalibrated yet again (barring some really drastic decision) to fix this bias, so let’s imagine for a second what a rep system would look like that actually did fix it. My idea is that such a rep system would reward (or punish) a user for a single contribution by a single point only, determined by the sign of the net vote count. So, if a post had +1 and -1, the reward to the contributor is nothing. If the post has +100 and -12, the reward to the contributor is +1. And if the post has +3 and -5, the reward is -1. If there’s a tie, the next person who comes along has the power to break that tie and thus ‘reward’ or ‘punish’ the contributor. Usually, of course, the situation won’t be a tie–usually there’s pretty good consensus about whether a contribution is good or not (to verify this, simply log in to Stack Overflow and click on the score of any question or answer on the question’s page–it’ll show you the upvotes and downvotes separately).
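To make the proposal concrete, here’s a minimal sketch in Python (my own illustration of the idea, not anything SO implements):

    def rep_change(upvotes, downvotes):
        # Reward or punish the contributor by at most one point
        # per post: the sign of the net vote count.
        net = upvotes - downvotes
        return (net > 0) - (net < 0)  # +1, -1, or 0 on a tie

    rep_change(1, 1)     # ->  0: tie, no reward
    rep_change(100, 12)  # -> +1
    rep_change(3, 5)     # -> -1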

The sum of the net effect on reputation from each of a contributor’s posts shouldn’t become the final measure of their profile rep, though. That doesn’t give the community an easy way to tell apart a person with +100/-99 rep (a polarising figure) from someone with +1/0 (a beginner, an almost-unknown). Instead, users should be able to judge contributions as +1 (helpful), -1 (not helpful), or 0 (undecided). And each net reputation change from a contribution should form a part of a triune measure of rep: percentage helpful, percentage not helpful, and percentage undecided.

The three parts of the measure are equally important here. The vote on the merit of a contribution is nothing but a survey; and any statistician will tell you that in a survey question you must offer the respondent the choice of giving a ‘Don’t know’ answer. In this case that’s the ‘undecided’ option. If we don’t offer this option, we are missing critical information about the response to the contribution–we can’t tell apart people who simply didn’t vote, or those who tried to express their uncertainty in the question/answer by not voting.
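Again purely as a sketch (all names hypothetical), the triune profile measure could then be computed like this:

    def profile_rep(post_outcomes):
        # post_outcomes holds one net outcome per contribution:
        # +1 (helpful), -1 (not helpful) or 0 (undecided).
        n = len(post_outcomes)
        helpful = sum(1 for o in post_outcomes if o > 0)
        unhelpful = sum(1 for o in post_outcomes if o < 0)
        undecided = n - helpful - unhelpful
        return (100.0 * helpful / n,
                100.0 * unhelpful / n,
                100.0 * undecided / n)

    profile_rep([+1] * 100 + [-1] * 99)  # polarising: (~50.3, ~49.7, 0.0)
    profile_rep([+1])                    # near-unknown: (100.0, 0.0, 0.0)

Note that under a simple net sum, both of those users would show +1 and look identical; the triune measure makes the difference immediately visible.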

This way, everyone immediately sees how often a particular user contributes valuable content, as opposed to unhelpful or dubious content. And the primary measure of rep is therefore not something that can grow in an unbounded way: the most anyone can ever achieve is 100% helpfulness. That too, I think, should be quite rare. The best contributors will naturally tend to have higher helpfulness percentages, but it won’t be so much a rat race as a level marker within the community, tempered by their levels of ‘unhelpfulness’ or people’s indecision about their contributions.

So much for user profile rep. I do think that the scores on questions and answers should behave in the traditional SO way: all votes should be counted as individual votes, instead of (as with user profile rep) being summed into a net positive/negative/undecided vote. The reason for the difference is that the votes are the measure of each contribution’s merit; if (for example) you have two similar contributions, their vote scores should be a good way to judge between them. Again, vote scores should be presented in the form of a three-fold percentage measure of helpfulness, unhelpfulness, and undecidedness (with vote counts available on, say, mouseover). This is consistent with user profile rep and puts an upper bound on the score shown on each question or answer. That’s a good thing because, most of the time, unbounded scores are simply extra information the site user won’t really process. The site itself can easily search and present content in order of most highly-voted, but the reader just needs to judge merit on a fixed scale. Anything extra is just cognitive burden on the reader.

So to recap, if we’re to implement an unbiased user profile rep, we need to count the net helpfulness of each contribution once only. But for content scoring, we can count each vote individually. And to present everything in a uniform way and with low cognitive burden, we should show all scores as a three-fold measure of percentage helpfulness, unhelpfulness, and undecidedness.

Once we have this system in place, we can start doing interesting things, like automatically closing questions which sink below a threshold of, say, 10% helpfulness (they’ll start out with 100% helpfulness because the original asker’s vote is automatically counted as helpful–otherwise they wouldn’t have asked). And we can do away with reputation loss from downvoting, since a downvote will usually have no effect on the recipient’s profile rep, and only one unit of negative effect on the contribution being downvoted.
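In terms of the sketches above, the auto-close rule is just a threshold check (again, illustrative only):

    def should_auto_close(helpful, unhelpful, undecided, threshold=10.0):
        # Close the question once the helpful share of its votes
        # sinks below the threshold percentage.
        total = helpful + unhelpful + undecided
        return total > 0 and 100.0 * helpful / total < threshold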

Achieving an unbiased measure of rep is tricky. But I think we can do better than SO’s current system by rebalancing the ‘rewards’ and ‘punishments’, and bounding the primary measures between 0 and 100% helpfulness so that we don’t end up with another Jon Skeet on our hands (I kid–Jon Skeet is great :-)

Dec 9, 2013

Man of Steel

IT TOOK me a while to write about Zack Snyder’s Man of Steel (MoS) because I was trying to articulate what it meant to me. And I think I’ve got it: MoS is our generation’s Superman anthem.

Let me explain. The Donner movies* were an anthem of the previous generation: Clark Kent as the Everyman, Superman as the benevolent big Boy Scout. Snyder has reimagined Clark as an outsider trying to find himself, someone who’s a little lost in the world. And a lot of us can relate to that, especially in this post-recession age.

With MoS, Snyder and Zimmer have quite literally given us an anthem for this era: brash, bold, perhaps worlds-spanning. And with it, there’s the wild element of danger and uncertainty, because Clark doesn’t yet have full control over his powers, his emotions, or his moral compass.

Speaking of moral compass, I honestly don’t have a problem with the way it ended with Zod. There’s precedent for it in the comics, and I felt it was a nod to that. My problem was with the way they used Metropolis as the Kryptonian battleground. And maybe I’m being overly sentimental here, but I would’ve thought that Clark would try his hardest to keep those things from happening to densely-populated urban centres, especially Metropolis. But then again, maybe it just goes to show how we’re not in Kansas any more, in terms of who and what this Superman is. It’s a brave new world.

* I count Superman Returns as one of the Donner movies because it explicitly tried to follow that continuity and approach to Superman/CK.

Nov 2, 2013

Notes on The Master and Margarita

‘MANUSCRIPTS don’t burn.’–Woland, The Master and Margarita

Recently I re-read this classic, long my favourite book, and I re-discovered why that is. It always amazes me how Bulgakov changes his tone and phrasing, here switching to an everyday, very Russian dry humour and wit, and there to an almost science-fiction exposition. Some notes:

‘… Let me see it.’ Woland held out his hand, palm up.

‘Unfortunately, I cannot do that,’ replied the master, ‘because I burned it in the stove.’

‘Forgive me, but I don’t believe you,’ Woland replied, ‘that cannot be: manuscripts don’t burn.’

Ideas: the most incredible and indestructible creation of humankind. Once an idea is created, it can never be destroyed.

‘No, because you’ll be afraid of me. It won’t be very easy for you to look me in the face now that you’ve killed him.’

‘Quiet,’ replied Pilate. ‘Take some money.’

Levi shook his head negatively, and the procurator went on:

‘I know you consider yourself a disciple of Yeshua, but I can tell you that you learned nothing of what he taught you. For if you had, you would certainly take something from me. Bear in mind that before he died he said he did not blame anyone.’ Pilate raised a finger significantly, Pilate’s face was twitching. ‘And he himself would surely have taken something. You are cruel, and he was not cruel….’

How incredible it would be to not be cruel and petty in this world.

‘… You uttered your words as if you don’t acknowledge shadows, or evil either. Kindly consider the question: what would your good do if evil did not exist, and what would the earth look like if shadows disappeared from it? Shadows are cast by objects and people. … Do you want to skin the whole earth, tearing all the trees and living things off it, because of your fantasy of enjoying bare light? You’re a fool.’

Ah, how clean-cut good and evil are to the Devil … on a grand scale, you need evil to balance out the good. Of course, this is ignoring the minutiae … maybe the Devil’s not in the details, but rather in the grand scheme of things?

‘… If it is true that cowardice is the most grievous vice, then the dog at least is not guilty of it. Storms were the only thing the brave dog feared. Well, he who loves must share the lot of the one he loves.’

An echo of the master and Margarita?

Sep 13, 2013

Excel Gotcha–Fractional Numbers

TODAY I learned (the hard way) about a subtle gotcha in Microsoft Excel. It seems that, to the VLOOKUP function looking for a matching value in a range, two seemingly equal fractional numbers are not always equal.

Fractional numbers–stored internally as binary floating-point numbers–sometimes give Excel a headache. If you want to see this for yourself, try out the following exercise:

[image: the exercise setup]

You will find yourself with the following error:

[image: the resulting error]

Even more nefariously, if you use the range lookup option:

=VLOOKUP(3.3, tblLookup, 2, TRUE)

Excel will give you an actively incorrect result:

[image: the incorrect result]
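The underlying cause is binary floating-point representation, and it isn’t unique to Excel. A quick illustration of the same effect in Python:

    # Decimal fractions like 3.3 usually have no exact binary
    # representation, so two values that both display as '3.3'
    # can differ in their last few bits.
    x = 3.3
    y = 1.1 * 3
    print(x == y)  # False
    print(y)       # 3.3000000000000003

A lookup key typed in directly and a table value that was computed can thus disagree at the bit level, even though both look like 3.3 on screen.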

So, be extremely wary of using fractional numbers as lookup keys in VLOOKUP functions. If you must, then use the techniques described in Microsoft’s Knowledge Base article on this issue.

Dec 10, 2012

Analysing Transit Spending with Presto

ABOUT a year ago I started using a new transit fare payment card called Presto. Presto is a top-up card–you pay as you spend–and it promised to unify fare payments for public transit systems throughout the Greater Toronto Area. By and large, it has lived up to that promise. But it does have its fair share of discontents.

One of the things that impressed me when I first got set up and registered on their website was that they have a table of raw data on your transit usage–what service was used, where you travelled from, date and time, amount paid, etc. It was limited to only the past three months’ worth of data, though. No problem, I thought: I’ll just log in every three months, copy the accrued data into an Excel file, and save it that way. And for the most part it worked.

With this raw data, I imagined that I’d be able to do analysis on things like my travel patterns, bus timings (e.g. if some are regularly late), and of course how much I’m spending per week/month compared to how much I’d be spending with more conventional tickets or weekly/monthly passes.

I made a mistake, though. When copying and saving the data in a nicely formatted Excel file, I didn’t notice that Excel wasn’t properly understanding the date format (dd/mm/yyyy) used in the data and was converting it into bogus dates. On top of that, I deleted some (to me) extraneous columns in the data, like the column which contained the month in which the transaction took place. So I ended up not being able to do month-to-month-type analyses on a large subset of the original data.

Having learned my lesson, I’ve taken care not to delete any columns in the fresh data I’m copying down now–and to format things like transaction times and months only when I’m absolutely sure Excel has understood them properly.

With the new, better data I managed to do the spending analysis I originally wanted. And though I’d been a bit sceptical of the savings with Presto for me personally, I became a believer after seeing the figures below.

Before I get into the analysis though, a quick explanation of exactly how payment with Presto works: when you first buy the card at a transit kiosk, you pay some amount out of pocket to load the card initially. When you board a bus/train, you tap the Presto card on a special reader and it deducts the amount of the fare from the card. You then periodically top up the card to keep it from going down to a zero balance. You can set it up to automatically charge your credit card a certain amount whenever the Presto balance reaches a lower threshold that you set. This is really convenient, and I’ve set mine up to auto-load $40 when the balance goes down to $20.
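As a rough sketch of that auto-load behaviour with my settings (the numbers below are just for illustration):

    balance = 22.50   # current Presto balance
    fare = 3.00       # a single bus/train fare

    balance -= fare       # tap the card: the fare is deducted (19.50 left)
    if balance <= 20.00:  # that crosses my $20 threshold...
        balance += 40.00  # ...so $40 goes on the credit card and gets loaded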

One more thing to keep in mind before the analysis: all of these months were more or less full working months for me, in which I worked at least 20 days and therefore took at least 40 transit rides throughout the month.

The Analysis Result

[image: PivotTable with one line entry per month]

The above screenshot shows an Excel PivotTable with one line entry per month. So to go through the figures for September: the ‘E-Purse Fare Payment’ column says I spent $89.96 in transit fare in September; ‘E-Purse Load Value’ says my credit card was charged $80 total throughout September and that balance was added to my Presto card; and the ‘Grand Total’ of -$9.96, being negative, means I actually spent about $10 more than I loaded onto the card that month (since I had some balance left over from August). And so on for the following months.

How does this compare to what I would have spent on the more conventional weekly or monthly passes? Mississauga Transit (MiWay) weekly passes are $29, so a month’s worth of them would run roughly $116, give or take a few days. A monthly pass would be $120. These are basically the most affordable options if you need to ride 40 times throughout the month–you can’t get cheaper than that if you’re working full-time.

Given the above, September was really a very good month–lots of savings, with my credit card charged only $80. October was not so great, but not exactly horrible. Note that my credit card was still only charged $120–the extra $2.80 was loaded by accident by someone who borrowed the card. That $120 is no more than a monthly pass. Now let me explain why the spend was greater in October:

[image: the same PivotTable, broken down by transit system]

The above screenshot shows the same PivotTable as before, just broken down by transit system.

The person who borrowed my card spent about $11 on GO Transit, leaving $117 for my regular travel to/from work, and other places. Still competitive with the weekly and monthly passes.

November was competitive at a spend of $114.37–less than the MiWay weekly/monthly passes despite some extra travel on other transit systems. Again, my credit card was charged exactly $120. The remaining $5.63 was carried over as balance into December–something that is simply not possible with monthly/weekly passes.

December, so far, is looking good. Presto auto-loaded early in the month, so for now the credit card has been charged a bit more than I’ve spent on transit this month. But that will have evened out by the time December ends.

All in all, a lot of value for money and a bunch of other benefits (check out the Presto website for details).

Process

The steps from raw Presto usage data to finished PivotTable are fairly simple–if you’re an Excel user, you should be able to mostly figure it out. But the really quick summary:

  • Log in to the Presto website, go to the transactions page, and show all transactions from the past three months.
  • Drag-select (with the mouse) the entire transactions table, including column headings, and paste it into Excel. Excel should understand the tabular format and get it more or less right.
  • Get rid of all formatting, then convert the range to an Excel table (select any cell in the data, press Ctrl-T).
  • Format the ‘Loyalty Month’ column (should be column H) with the custom number format yyyy-mm. This uniquely identifies the month and year.
  • Finally, create a PivotTable from this raw data with the following specifications:

[image: PivotTable field specifications]
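If you’d rather script the whole thing, roughly the same monthly summary can be produced in Python with pandas. This is only a sketch under assumptions: the CSV file name and the ‘Date’, ‘Transaction Type’, and ‘Amount’ column names below are stand-ins for whatever the Presto export actually calls them.

    import pandas as pd

    # Load the saved transactions; Presto dates are dd/mm/yyyy,
    # hence dayfirst=True (the very thing Excel tripped over earlier).
    txns = pd.read_csv('presto_transactions.csv',
                       parse_dates=['Date'], dayfirst=True)
    txns['Month'] = txns['Date'].dt.to_period('M')

    # One row per month, one column per transaction type (fare
    # payments vs. load values), mirroring the PivotTable above.
    summary = txns.pivot_table(index='Month', columns='Transaction Type',
                               values='Amount', aggfunc='sum')
    print(summary)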

Happy analysis! :-)

Dec 8, 2012

Tweet from the Browser Address Bar

THIS will work on Firefox and should also work on Chrome with a little adjustment. You can start posting a tweet straight from the browser address bar instead of having to navigate to the Twitter website and click on the new tweet button.

In Firefox, create a new bookmark with the following properties:

Name: Tweet (or similar)

Location: https://twitter.com/intent/tweet?source=webclient&text=%s

Keyword: tw (or whatever you like)

Description: Tweet something

Now, with this bookmark saved, go to the address bar and type ‘tw Just testing’, then press Enter. A new tweet composition page should show up with the words ‘Just testing’. Finish off the tweet as desired and click Tweet.
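Behind the scenes, Firefox substitutes whatever you type after the keyword for the %s in the bookmark’s location, so the example above should open something like:

https://twitter.com/intent/tweet?source=webclient&text=Just%20testing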

Is the tweet composition page unavoidable? Maybe not, but I don’t see an easy way to tweet directly from the address bar. And maybe that’s for the best–giving you a chance to finalise things before you publish for the world to see.