At last I have managed to get a Google Page Speed score of 99 and a YSlow
score of 97 for this blog. As mentioned earlier, this blog is generated
using Pelican and deployed on the Heroku Cedar stack, which
supports Python applications. It is served by a great WSGI app called
‘static’, together with gunicorn and gevent. I had to make a lot of
changes in static to make this possible.
As we are serving static content, there is no need to compress the content on
each and every request. We can generate gzipped copies along with the
other static content and serve them when requested. This approach, in my opinion,
is faster than the on-the-fly gzip compression used by nginx and Apache, because
it saves the CPU time spent compressing the content on each request. I used the
gzip_cache plugin to generate a gzipped version of all my content. The next
step was to serve this gzipped content when requested. static does not support
this by default, so I had to modify it a little: when a request asks for
gzipped content, it now tries to find the gzipped copy of the file and serves that.
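The lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual change made in static: the function name `pick_file` and the ".gz" suffix convention are assumptions, and the Accept-Encoding parsing is deliberately crude.

```python
import os


def pick_file(path, accept_encoding):
    """Return (file_to_serve, extra_headers) for a static file request.

    If the client accepts gzip and a pre-compressed sibling "<path>.gz"
    exists on disk, that copy is chosen; otherwise the original file is.
    """
    # Crude parsing that ignores q-values (an assumption to keep this short).
    encodings = [token.split(";")[0].strip().lower()
                 for token in accept_encoding.split(",")]
    gz_path = path + ".gz"  # assumed naming convention for the gzipped copy
    if "gzip" in encodings and os.path.isfile(gz_path):
        return gz_path, [("Content-Encoding", "gzip"),
                         ("Vary", "Accept-Encoding")]
    return path, [("Vary", "Accept-Encoding")]
```

The Content-Type header should still describe the original file (for example text/html), since Content-Encoding only describes how the body is packed.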
Browser caching is handled purely by the HTTP server serving the content. Again I had to
make a few changes in static to enable it. I tried to keep the
syntax similar to Apache’s ExpiresByType: the expiry time can be specified in
seconds for each MIME type.
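A per-type expiry table of this kind might look like the sketch below. The setting name `EXPIRES_BY_TYPE`, the function `caching_headers` and the durations are all made up for illustration; the real configuration syntax in the modified static is not shown in this post.

```python
import mimetypes
from email.utils import formatdate

# Hypothetical settings: seconds to cache each MIME type.
EXPIRES_BY_TYPE = {
    "text/css": 30 * 24 * 3600,
    "image/png": 30 * 24 * 3600,
    "text/html": 3600,
}


def caching_headers(filename, now, expires_by_type=EXPIRES_BY_TYPE):
    """Build Cache-Control and Expires headers for a file, if configured.

    `now` is a Unix timestamp; passing it in keeps the function testable.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    seconds = expires_by_type.get(mime_type)
    if seconds is None:
        return []  # no caching rule for this MIME type
    return [
        ("Cache-Control", "max-age=%d" % seconds),
        # HTTP dates are always expressed in GMT.
        ("Expires", formatdate(now + seconds, usegmt=True)),
    ]
```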
Setting a character set is again handled purely by the HTTP server, and I had to make a few
changes in static to make it possible. Just like the Expires headers, I
tried to keep the syntax similar to Apache’s AddCharset: a charset can be
set for filename patterns.
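As a rough sketch of charset-by-pattern matching (the names `CHARSETS` and `content_type` are hypothetical, and the patterns are examples rather than this blog's real settings):

```python
import fnmatch

# Hypothetical settings: (filename pattern, charset) pairs, first match wins.
CHARSETS = [
    ("*.html", "utf-8"),
    ("*.css", "utf-8"),
    ("*.js", "utf-8"),
]


def content_type(filename, mime_type, charsets=CHARSETS):
    """Append a charset parameter to the MIME type if a pattern matches."""
    for pattern, charset in charsets:
        # fnmatchcase gives the same result on every platform.
        if fnmatch.fnmatchcase(filename, pattern):
            return "%s; charset=%s" % (mime_type, charset)
    return mime_type
```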
I used the assets plugin, which in turn uses webassets, to combine and minify
resources. This is done offline, so there is no minification or combining
overhead at request time.
Lossless compression of images was done using jpegtran and optipng.
This task was automated by writing a Pelican plugin. Again, this is done offline,
so no CPU time is needed to serve optimized images.
This blog template was designed using Twitter Bootstrap and lots
of custom CSS. Even after combining and minification, the size was 130KB. I
used mincss to find unused CSS and remove it. Now the CSS is just 14KB
(4KB gzipped). I had to re-add some styles which were used on other pages.
Once again, this is done offline and at design time only.
What’s still missing?
Specify image dimensions
Because this is a responsive design, it is not possible to send all images with image
dimensions specified. The images resize themselves according to the screen
size. We could use some JavaScript to determine the screen size and
resize images accordingly, but this would have its own overhead.
Leverage browser caching for external resources
This blog uses only one external resource, ga.js, which is the JavaScript
file used by Google Analytics. It comes with an Expires header of 12 hours.
There has been a lot of discussion about caching it and serving it from one’s own
servers, but I guess anything like this would be overkill. ga.js is so
common that the browser has probably downloaded it already while visiting some other website.
Using CDN for static content
This task is on my to-do list and I am still looking for a good (preferably free) CDN.
This blog post is a continuation of Part-I.
The sample data has now grown to 150K Pakistani tweeps.
Followers
Follower count is no longer a good measure of influence. On average, each
Pakistani tweep is followed by 129 users. The majority of Pakistanis
(about three quarters) have fewer than 50 followers. Half of Pakistani Twitter
users have fewer than 10 followers. There are about 10,000 tweeps with
no followers and about 12,000 tweeps with a single follower. This is a
very strange trend. If you look closely at these accounts, you’ll notice
that most of them have the default DP and default background. It seems like
these are fake accounts, created by the social media cells of different political
parties to increase the follower counts of their leaders on Twitter.
On the other side, there are just 24 Pakistanis with more than 50,000
followers. Most of them are politicians and TV anchors. Just 2331
tweeps have more than 1000 followers.

Klout
Klout is a more reliable measure of social media influence. Out of 150,000 Pakistani
tweeps, about 40,000 do not have any Klout score. About 70,000 have a Klout score
between 11 and 20. The average Klout score is 16.72. About 12,000 have the minimum
possible score of 10.
Only 22 users have a score above 70.

Here is the list of most influential Pakistanis (klout: 70+)
Note: This score may have changed when you’re reading this article.
For this analysis, the profile descriptions of about 150,000 Pakistani tweeps were
used. Out of 150K, only about 77K (about 51%) users have set the description
field in their Twitter profiles.
Excluding punctuation and stopwords, the following is the list of the words most
commonly used by Pakistani tweeps in their profiles.
- love
- pakistan
- student
- follow
- like
- life
- pakistani
- engineer
- cricket
- social
The tools used were FreqDist and the stopwords corpus from nltk.
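The counting step can be illustrated with the standard library alone. The sketch below uses collections.Counter as a stand-in for nltk’s FreqDist and a tiny hand-written stopword set in place of nltk’s stopwords corpus; both substitutions, the function name `top_words` and the sample descriptions are assumptions made so the example runs without extra downloads.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; the real analysis used nltk's corpus.
STOPWORDS = {"i", "a", "am", "and", "the", "of", "in", "to", "my", "is", "for"}


def top_words(descriptions, n=10, stopwords=STOPWORDS):
    """Return the n most common non-stopword words across descriptions."""
    counts = Counter()
    for text in descriptions:
        if not text:
            continue  # profile without a description
        # Lowercase and keep runs of letters only, which drops punctuation.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts.most_common(n)


sample = ["I love Pakistan", "Student of engineering, love cricket",
          None, "love life"]
print(top_words(sample, 1))  # [('love', 3)]
```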
Background
I do not want to start this blog post by bashing Posterous.
Posterous is a great blogging tool for quickly making blog posts.
A couple of years back, some of its unique features convinced me to move
my blog from WordPress to Posterous. Posterous offered a custom domain name for
free whereas WordPress was charging for it. I really liked the email-to-blog-post
feature, although I never used it other than testing it a couple of times.
Another amazing feature of Posterous was detecting external objects like
YouTube videos and GitHub gists and making beautiful widgets for them.
Posterous provides some nice templates but I wanted to have more control over
presentation. A few days back YouTube was blocked in Pakistan. Some
misconfiguration caused problems in loading other Google sites. This affected
sites using resources from Google, like the JavaScript for Google Analytics,
Google Maps etc. The same thing happened with my site. The template I was using
was loading some resources from Google. I don’t know why, but they were there
and there was no way to remove them. So the end result was a slowly loading page
for the Pakistani audience.
Another problem was how Posterous modifies the HTML of the blog post. Again, I
wanted to have more control over my blog post presentation. Inserting a table in
a blog post was not a trivial task. The WYSIWYG editor cannot handle tables, even if
they are copy-pasted. I had to draft the HTML manually and paste it into the HTML part of
the WYSIWYG editor. And it gets modified when rendered :-(
Using SSGs
The idea of a static site generator (SSG) is amazing. Why do I need a dynamic
setup for content which is hardly going to be modified in a month? I tried
Jekyll and Pelican and decided to use Pelican. Why Pelican? Because of
my bias towards Python. Jekyll is an equally good or maybe better SSG.
Being a geek, I like writing in plain text editors more than WYSIWYG editors.
Writing in Markdown and reStructuredText is fun. One can keep one’s energy
focused on writing rather than formatting the content. My content is saved
as content, not as HTML markup. It gets better revision management using git or
any other version control system. It can easily be imported into any other
application. The content is saved in files, not in a DB. I can write offline
and publish when I am online.
I have full control over the page rendered. I can design and optimize it
as I want. I do not have to worry about security or scaling as all the content
is purely static.
User Experience & Minimalism
I am not a UX expert but I do not want a lot of distractions in my content.
Here is what I did to improve UX:
- Removed comments; users can tweet their feedback.
- No Facebook like or Twitter tweet button.
- No tags, category or author name with each post.
- Using grey instead of pure black for text.
- Worked on typography.
- Using typogrify.
Migration
Jekyll provides a Posterous importer but Pelican does not. Currently Pelican
provides only the following importers:
- WordPress
- Dotclear
- RSS/Atom feed
For Posterous I had to write my own importer, which consumes the Posterous API.
Here is the code:
def posterous2fields(api_token, email, password):
    """Imports posterous posts"""
    import base64
    from datetime import datetime, timedelta
    import simplejson as json
    import urllib2

    def get_posterous_posts(api_token, email, password, page = 1):
        # HTTP Basic auth header built from the account credentials
        base64string = base64.encodestring('%s:%s' % (email, password)).replace('\n', '')
        url = "http://posterous.com/api/v2/users/me/sites/primary/posts?api_token=%s&page=%d" % (api_token, page)
        request = urllib2.Request(url)
        request.add_header("Authorization", "Basic %s" % base64string)
        handle = urllib2.urlopen(request)
        posts = json.loads(handle.read())
        return posts

    # Walk through the pages until the API returns an empty page
    page = 1
    posts = get_posterous_posts(api_token, email, password, page)
    while len(posts) > 0:
        posts = get_posterous_posts(api_token, email, password, page)
        page += 1

        for post in posts:
            slug = post.get('slug')
            if not slug:
                # slugify is expected to be in scope in the importer module
                slug = slugify(post.get('title'))
            tags = [tag.get('name') for tag in post.get('tags')]
            # display_date ends with a UTC offset; shift the time back to UTC
            raw_date = post.get('display_date')
            date_object = datetime.strptime(raw_date[:-6], "%Y/%m/%d %H:%M:%S")
            offset = int(raw_date[-5:])
            delta = timedelta(hours = offset / 100)
            date_object -= delta
            date = date_object.strftime("%Y-%m-%d %H:%M")

            yield (post.get('title'), post.get('body_cleaned'), slug, date,
                post.get('user').get('display_name'), [], tags, "html")
The above code produces Pelican fields which can later be passed to
fields2pelican, which uses pandoc to transform the HTML content
to Markdown or reStructuredText.
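The date handling in the importer can be tried in isolation. The sketch below is a Python 3 restatement of that one step, not part of the importer itself; the helper name `to_utc_string` and the sample timestamps are assumptions, with the input shape inferred from the slicing in the code above. Unlike the whole-hour division used there, it also keeps the minutes part of the offset.

```python
from datetime import datetime, timedelta


def to_utc_string(raw_date):
    """Convert 'YYYY/MM/DD HH:MM:SS +HHMM' to 'YYYY-MM-DD HH:MM' in UTC."""
    # Everything except the trailing " +HHMM" is the local wall-clock time.
    local = datetime.strptime(raw_date[:-6], "%Y/%m/%d %H:%M:%S")
    offset = raw_date[-5:]  # e.g. "+0500"
    sign = -1 if offset[0] == "-" else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    # Local time minus the signed offset gives UTC.
    return (local - sign * delta).strftime("%Y-%m-%d %H:%M")


print(to_utc_string("2012/10/05 14:30:00 +0500"))  # 2012-10-05 09:30
```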
Deployment
The site is deployed on the Heroku Cedar stack, which supports Python
applications. It is served by a great WSGI app called ‘static’, together with
gunicorn and gevent.
Update: Using my own fork of static for performance tweaks.
Here is the list of domains that are managed by MarkMonitor and have their
nameservers pointing to dns1.freehostia.com and dns2.freehostia.com:
- biofreeze.com.pk
- blackstone.pk
- blogspot.pk
- itunes.pk
- gmails.pk
- zynga.com.pk
- chrome.com.pk
- chrome.pk
- visa.com.pk
- bx.com.pk
- abbvie.com.pk
- abbvie.pk
- cgma.pk
- chacos.com.pk
- cimacpa.pk
- cisco.pk
- ciscosystems.pk
- blogspot.com.pk
- cpacima.pk
- cpaintl.pk
- cpaldglobal.pk
- cpalwglobal.pk
- drivealliance.pk
- eastman.biz.pk
- eastman.net.pk
- eastman.org.pk
- ebay.pk
- monatin.pk
- everyblock.pk
- youtube.pk
- 3com.web.pk
- hp.web.pk
- revlon.pk
- streetwear.pk
- windows7.pk
- windows8.pk
- windowsrt.pk
- yahoo.pk
- yahoomaktoob.pk
- zynga.pk
- firstdirect.com.pk
- flickr.pk
- fordgofurther.pk
- gbuzz.pk
- gmailbuzz.pk
- gmail.pk
- googlebrowser.com.pk
- google.pk
- googlebuzz.pk
- googlechrome.com.pk
- abbviepharmaceuticals.pk
- abbviepharmaceuticals.com.pk
- hewlettpackard.pk
- hexagon.com.pk
- hsbcamanah.biz.pk
- hotmail.com.pk
- hpcloud.com.pk
- hp.com.pk
- hpscalene.com.pk
- hsbc.biz.pk
- hsbcadvance.com.pk
- hsbc.pk
- hsbcpremier.com.pk
- hsbcprivatebank.biz.pk
- hsbcamanah.com.pk
- hsbcdirect.com.pk
- hsbcnet.com.pk
- hsbcpremier.biz.pk
- hsbcpremier.pk
- hsbcprivatebank.com.pk
- investdirect.biz.pk
- investdirect.com.pk
- ipod.pk
- jaiku.pk
- kellyservices.com.pk
- maktoob.pk
- markmonitor.pk
- microsoftsmartglass.com.pk
- microsoftsmartglass.pk
- xboxsmartglass.com.pk
- xboxsmartglass.pk
- msn.org.pk
- windowsstore.pk
- windowsstore.com.pk
- opteron.com.pk
- parkplaza.pk
- paypal.pk
- postini.pk
- scalene.com.pk
- schwab.biz.pk
- schwab.com.pk
- sonystyle.com.pk
- streetwear.com.pk
- theworldslocalbank.com.pk
- genapp.pk
- genapp.com.pk
- generationapp.pk
- generationapp.com.pk
- windows.com.pk
- windows7.com.pk
- windows8.com.pk
- 3com.biz.pk
- 3com.fam.pk
- 3com.net.pk
- 3com.org.pk
- gchrome.com.pk
- aicpacima.pk
- apple.pk
- google.com.pk
- microsoft.pk
According to some reports there are about 1.9M Twitter users in
Pakistan. This was mentioned by someone in #SOCMM12 but there doesn’t
seem to be any source for this information.
I have been collecting Twitter data for quite some time. The sample data
contains more than 100K Pakistani Twitter users crawled using the Twitter
API. Only public profiles which mention Pakistan or the name of some Pakistani
city were considered for this analysis. This data
contains almost all active tweeple of Pakistan.
Here are the results of the data analysis:
Geographical location:
About half of Pakistani tweeps live in major cities like Karachi, Lahore
and Islamabad/Rawalpindi.
- 24.5% in Punjab (just 64.4% of them in Lahore, the rest in other cities
of Punjab)
- 21.6% in Sindh (with 92.6% of them in Karachi)
- 10.0% in Islamabad
- 2.6% in KPK (with 58.7% of them in Peshawar)
- 0.96% in Balochistan
- 0.45% in Azad Kashmir
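The "about half" figure can be checked from the percentages in the list above. This is only a back-of-the-envelope calculation; Rawalpindi is not broken out separately in the list, so it is left out here and the real share of the major cities would be somewhat higher.

```python
# Shares of the sample, taken from the list above (in percent).
punjab, lahore_within_punjab = 24.5, 64.4
sindh, karachi_within_sindh = 21.6, 92.6
islamabad = 10.0

# Share of all tweeps living in each city.
lahore = punjab * lahore_within_punjab / 100
karachi = sindh * karachi_within_sindh / 100
major_cities = lahore + karachi + islamabad

print("Lahore:  %.1f%%" % lahore)        # Lahore:  15.8%
print("Karachi: %.1f%%" % karachi)       # Karachi: 20.0%
print("Total:   %.1f%%" % major_cities)  # Total:   45.8%
```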
Gender:
gender.c, with custom names database of more than 5,000 names, was
used on first names in twitter profiles:
Names (first name):
Most common male names:
- Syed
- Ali
- Abdul
- Usman
- Imran
- Bilal
- Hassan
- Waqas
- Mohammad
- M
- Salman
Most common female names:
- Ayesha
- Sana
- Fatima
- Amna
- Noor
- Sara
- Hina
- Sarah
- Maryam
- Rabia
- Hira
- Sidra
According to www.stopbadware.org and ESET, the following government
websites contain malware and are NOT safe for your computer.
- phc.gos.pk (Shaheed Benazir Bhutto Housing Cell)
- gulbergtownlahore.gov.pk (Gulberg Town Lahore)
- wasafaisalabad.gop.pk (WASA Faisalabad)
- moip.gov.pk (Ministry of Industries)
- khushabpolice.gov.pk (Khushab Police)
- sped.gos.pk (Special Education Department, Government of Sindh)
- fatada.gov.pk (FATA Development Authority)
- bisp.gov.pk (Benazir Income Support Programme)
Getting backlinks from government websites has always been a dream of
SEO experts. In Pakistan, many government websites link to a lot
of other websites. I surveyed 1072 government websites of different
departments and grouped the links into 4 categories:
- Some company or freelancer made the website for the department and
put a link back to their company website intentionally.
- Company or freelancer used some open source software or components
and didn’t remove the links. In some cases the theme or template used
to develop the government website was a free template linking back to
the original developer.
- Website working as an online directory and linking to Pakistani
banks, newspapers, organizations or telling people that there exist
sites like Google, Wikipedia etc.
- In rare cases, a virus-infected website contains links added by
hackers.
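One way such a survey could be automated is to pull every link out of a page and keep only the hosts that do not belong to the site itself. This is an assumption for illustration; the post does not say how the links were actually collected, and the names `ExternalLinkParser` and `external_hosts` are hypothetical. Fetching the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkParser(HTMLParser):
    """Collect hostnames of <a href> links that point outside own_domain."""

    def __init__(self, own_domain):
        super().__init__()
        self.own_domain = own_domain.lower()
        self.external = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        # Relative links have no host; subdomains count as the same site.
        is_own = host == self.own_domain or host.endswith("." + self.own_domain)
        if host and not is_own:
            self.external.add(host)


def external_hosts(html, own_domain):
    parser = ExternalLinkParser(own_domain)
    parser.feed(html)
    return sorted(parser.external)
```

Sorting the resulting hosts into the four categories would still need a manual look at each one.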
Although I have the complete list with me, I will be sharing only a
few interesting ones:
Category 1:
- udb.gov.pk -> http://www.e-creatorz.com/
- prsp-cmiphc.gov.pk -> http://www.usmangee.20m.com/
- veharipolice.gov.pk -> http://hifidesigners.com/
- yasat.gop.pk -> http://www.easy-sol.com
- pard.gov.pk -> http://www.egravity.net
- pwd.gok.pk -> http://www.vertexcreations.com
- qtp.gob.pk -> http://infotechnosolutions.com
- pqa.gov.pk -> http://www.laksol.com
- ntc.gov.pk -> http://www.cronomagic.com
- nha.gov.pk -> http://jicstech.com/main/
- pitb.gov.pk -> http://shoutbox.ticketmy.com/widget/demo/w-embed
- punjabpolice.gov.pk ->
http://shoutbox.ticketmy.com/widget/demo/w-bottom
- ombudsmanpunjab.gov.pk ->
http://shoutbox.ticketmy.com/widget/demo/w-bottom
- ppf.gop.pk -> http://shoutbox.ticketmy.com/widget/demo/w-bottom
- newmurree.gop.pk -> http://www.nexgeninc.com
- fvdp.gop.pk -> http://www.afcwebs.com
- balochistan.gov.pk -> http://about.me/jaannaseer
- bahawalpurpolice.gov.pk -> http://manaz.8m.com/
- sindhagrimarketing.gov.pk -> http://www.gexton.com
- shydo.gov.pk -> http://www.solutioners.com.pk
- sdukp.gov.pk -> http://www.zaamtech.com
- sccdp.gos.pk -> http://msarfarazkha786.blogspot.com/
- rescue.gov.pk -> http://www.levantech.com
- pakboi.gov.pk -> http://www.viperwebsites.com/
- pabalochistan.gov.pk -> http://www.aesthetictech.net
- fdma.gov.pk -> http://www.rswebsols.com
- livestockpunjab.gov.pk -> http://www.121solutionz.com
- larkana.gov.pk -> http://www.ampleteknologies.com
- home.gos.pk -> http://www.ampleteknologies.com
- gsp.gov.pk -> http://www.bohradevelopers.com/
- expopakistan.gov.pk -> http://a2zcreatorz.com/
- enercon.gov.pk -> http://www.360technologies.net
- animalhusbandry.gok.pk -> http://www.gutscheingirl.de
- cplc-lahore.gop.pk -> http://www.infobytesolutions.com
- sindhcoal.gos.pk -> http://www.spatsoltech.com
- customstraining.gov.pk -> http://absorbtechnologies.com/
- npcih.gov.pk -> http://ahadsol.com/ (The whole site runs under
this .gov.pk domain)
Category 2:
- yasat.gop.pk -> http://www.dynamicdrive.com/forums/
- pcsir-frc.gov.pk -> http://wordpress.org/
- suparco.gov.pk -> http://www.opencube.com
- moicop.gov.pk -> http://www.joomlaworks.gr
- lyallpurmuseum.gov.pk -> http://wordpress.org/
- sbi.gos.pk -> http://www.american-chillers.com
- lrh.gov.pk -> http://www.freshjoomlatemplates.com
- punjabfoodauthority.gov.pk -> http://www.rockettheme.com/
- afic.gov.pk -> http://www.dhtml-menu-builder.com
- pmic.gov.pk -> http://demo.icetheme.com/it_icemag
- urbandirectorate.gos.pk ->
http://joomla-extensions.kubik-rubik.de/
- fcbalochistan.gov.pk -> http://www.joomla.org/
- lieda.gov.pk -> http://www.freewebsitetemplates.com
- ajkepa.gov.pk -> http://www.zootemplate.com
- fic.gop.pk -> http://wowslider.com
- lyallpurmuseum.gov.pk -> http://www.caretakerjobs.net/
- lyallpurmuseum.gov.pk -> http://www.louisianamatch.com/
- lyallpurmuseum.gov.pk ->
http://www.californiamatch.com/meet/san-francisco-singles
- lyallpurmuseum.gov.pk ->
http://www.certifiedpublicaccountants.com/florida
Category 3:
- vehari.gov.pk -> http://wp.me/p18wNc-aJ
- vehari.gov.pk -> http://www.standardchartered.com.pk/
- rtokarachi.gov.pk -> http://www.thefreedictionary.com
- ltulahore.gov.pk -> http://en.wikipedia.org/
- tmkhan.gos.pk -> http://www.paperpk.com/
- larkano.gov.pk -> http://hotmail.com
- home.gos.pk -> http://www.hotmail.com
Category 4:
- qtp.gob.pk -> http://www.toponlinepoker.org/
- lrh.gov.pk -> http://pokerfreaks.net/
- pap.gov.pk -> http://www.g-ksa.net/vb
- pap.gov.pk -> http://www.girls-top.net
- pap.gov.pk -> http://bnat-games.girls-top.net
- pap.gov.pk -> http://chat.te3p-qlbe.net
- pap.gov.pk -> http://www.chat-qloob.com
- pap.gov.pk -> http://www.girls-top.net
- pap.gov.pk -> http://www.g-ksa.net
- pap.gov.pk -> http://www.upg-ksa.com
- pap.gov.pk -> http://www.up-5.net
- pap.gov.pk -> http://www.bn00.com
- pap.gov.pk -> http://www.i00p.com
- pap.gov.pk -> http://daleel.girls-top.net/newlink.html
- ntcip.gov.pk -> http://www.zettu.net
- ntcip.gov.pk -> http://www.filme-porno.cc
- ntcip.gov.pk -> http://www.filmele-online.org
- ntcip.gov.pk -> http://www.download-muzica.org
- ntcip.gov.pk -> http://www.logibic.ro
- ntcip.gov.pk -> http://www.porneata.com
- ntcip.gov.pk -> http://ujocuri.ro
- ntcip.gov.pk -> http://www.shadowaura.com/
- healthkp.gov.pk -> http://www.960watch.com
- healthkp.gov.pk ->
http://www.airforce1fashion.com/air-force-1-classic-low-c-237.html
- healthkp.gov.pk -> http://www.nikedunkshow.com/
There are about 1000 domains registered by different departments of
the Government of Pakistan. Most of them seem to follow no usability
guidelines. A lot of the websites themselves need to improve their usability,
but this blog post is dedicated to the usability of domain names only.
Looking at these domains, it seems as if some random guy randomly
chose the domain name for each department. There are domains like
rdcgp.gov.pk, febgif.gov.pk and qmcbvh.gop.pk, and there are a lot of other domains
which look like random characters.
There is a very nice article on usability of domains
(http://www.usability.gov/articles/newsletter/pubs/032007news.html).
A few of the characteristics of a usable domain name are:
- short (12 characters or less)
- guessable
- easy to spell
- easy to type
- easy to say and pronounce
- memorable
Unfortunately, there are very few government domains which have all of
these characteristics. As mentioned before, most of the domains are
abbreviations or short forms of the department they represent.
Most of them include the name of the province, when they could have used
the province-specific domain (such as gos.pk or gop.pk) instead. For example,
advocategeneralsindh.gos.pk could have been advocategeneral.gos.pk, and
financedeptsindh.gov.pk could have been finance.gos.pk. Why do they need to
mention dept in the domain name? chiefministerpunjab.gov.pk could have been
chiefminister.gop.pk or cm.gop.pk or both.
Many use punctuation in domain names, making them harder to remember, e.g.
sindh-katchiabadies.gov.pk and minister-wpcc.gov.pk.
More than 20% of the domains are longer than 12 characters, e.g.
khyberpakhtunkhwapolice.gov.pk (which could have been police.gkp.pk),
infokhyberpakhtunkhwa.gov.pk and visitgilgitbaltistan.gov.pk.
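Two of these checks are easy to automate. The sketch below assumes that the 12-character limit applies to the first label only (the part before .gov.pk, .gop.pk and so on), which is my reading of the guideline rather than something it states; the function name `name_problems` is hypothetical.

```python
def name_problems(domain, max_length=12):
    """List the mechanical usability problems of a domain's first label."""
    label = domain.split(".")[0]  # e.g. "cm" for "cm.gop.pk"
    problems = []
    if len(label) > max_length:
        problems.append("longer than %d characters" % max_length)
    if "-" in label:
        problems.append("contains a hyphen")
    return problems


for name in ("khyberpakhtunkhwapolice.gov.pk",
             "sindh-katchiabadies.gov.pk",
             "police.gkp.pk"):
    print(name, name_problems(name))
```

Properties like guessable, easy to say and memorable cannot be checked this way and still need a human judgment.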
I don’t know why, but most of the government websites do not open without
prepending www. to their domain name.
There needs to be some government or semi-government agency which defines
guidelines for usable domain names and helps departments in
choosing a usable domain name.
NADRA’s CNIC (Computerized National Identity Card), NICOP (National
Identity Card for Overseas Pakistanis) and POC (Pakistan Origin Card)
contain a 2D barcode. This 2D barcode contains a lot of information
about the card bearer.
This 2D barcode is built on an ISO standard. It can be read
by a 2D barcode reader device made for this purpose, or by processing a
scanned image of the card. I don’t know how, but someone managed
to put an ad for a CNIC reader, with NADRA written on it, on OLX
(http://karachi.olx.com.pk/bar-code-reader-nadra-iid-261512018). One
can easily find barcode readers for this ISO standard on Alibaba or eBay
for 150-300 USD. Here is the information encoded in the barcode:
- Receipt number (Parchi Number) which includes timestamp of your
visit to NADRA’s office
- CNIC Number
- Family Number (Khandan Number)
- Date of Birth
After this there is some gibberish, which I was able to decode easily.
This encoded part contains the following information in Urdu:
- Full Name
- Father’s Name
- Full Address with District and Tehsil information
I did this decoding only for research/educational purposes. I am not
sure if it is legal or not, so I am not publishing the decoding details here.
What is the utility?
- Healthcare: A patient comes with their CNIC, the receptionist scans the card
and all history/lab reports are available immediately.
- Security: We normally see a guard at the entrance of corporate
offices or some residential areas sitting with a register in his
hand, writing down name/address of all visitors. We can reduce the
time and effort of this whole process.
- Anywhere else where a user’s information is entered manually.
Note
Removed the barcode standard name for security reasons.