It was a Saturday morning in November 2012 when I started noticing tweets
about the Google Pakistan and Microsoft Pakistan websites getting hacked. I
immediately checked both websites, and they really were showing a message
from some Turkish hacker. I ran nslookup and found that the nameservers had
been changed to those of a free hosting service provider. Obviously, Google
and Microsoft were not hosting their websites on a free web host. In fact, they
were not the only ones affected: it was PKNIC, the .pk domain registry, that
had been hacked. I quickly did a reverse whois lookup and checked a few of the
resulting domains at random. All of them were showing the same page. There were
284 domains pointing to those specific nameservers. What? 284 domains hacked,
and people were talking about just 2 of them? This had to be big news. I
quickly tweeted this:
The tweet went viral and was picked up by many news agencies and blogs. There
are still many tweets in Twitter search results:
Many credited me, and many presented it as their own news without mentioning the source.
Here are some of them:
- “it appears that 279 other sites in Pakistan were hacked by a group that appears to be Turkish and calls itself Eboz. Little else is known about Eboz” Techcrunch
- “Google, Apple, eBay and Yahoo were among almost 300 sites affected by a hack attack in Pakistan.” BBC
- “including google.com.pk, apple.pk, microsoft.pk and yahoo.pk. 284 sites were affected in total.” Slashdot
- “284 Pakistani domain names reportedly hijacked, affecting Google, Apple, and Microsoft” The Verge
- “Eboz has hacked over 284 .PK TLD’s this morning, and some of them are major websites like Google.com.pk, Apple.pk, PayPal.pk” gadgec
- “Google’s Pakistan site, 277 others hacked by Turkish hacker group Eboz” first post
- “Today could be the biggest event of the year in Pakistan, due to a change in the DNS entries for 284 Pakistani domains managed by MarkMonitor.” neowin
- “Microsoft.pk and 284 Other .PK Domains Get Hacked” PTE TECH
- “Yes, Google.Com.PK along with 284 other .PK domains were hacked today” Pro Pakistani
- “Yes, google.com.pk, along with 284 other .pk domains, was hacked today, reported Propakistani, a technology blog based in Islamabad.” Tribune Pakistan
- “A total number of 258 web pages with ‘pk’ domain names, managed by MarkMonitor, such as ‘.com.pk’, ‘.pk’ and ‘org.pk’ were hijacked on 23 November” New Europe
And some blogs & news sites in other languages which I don’t understand:
- “Πάνω από 280 δημοφιλή web sites στο Πακιστάν, έπεσαν θύματα τούρκων hackers, μεταξύ αυτών και δημοφιλείς υπηρεσίες όπως οι πακιστανικές σελίδες των Apple, Google, Microsoft και Sony.” PC Magazine Greece
- מבוכה גדולה לענקיות האינטרנט: יותר מ-280 שמות דומיין פקיסטניים פופולארים (pk.), נפרצו אמש (שבת) מסיבות שאינן ברורות עדיין. Geek Time Israel
Beyond that, the 284 figure was also published in print media. Here is a
news item from The News Pakistan (published by Pakistan’s largest newspaper group):
So, as you can see, each and every news site and blog was after the story, and
everyone was publishing it in their own words. What went wrong here? Did
anyone ask any of these blogs or news sites for a list of the 284 hacked
domains? Did they publish such a list?
The confession part
I tweeted and went for breakfast. After breakfast I decided to publish the
list of hacked domains. As I started reviewing the list, I noticed that I had
made a big mistake while counting. There were 2 name servers pointing to that
specific free hosting provider, and I had counted all the domains pointing to
either of those 2 name servers. So there were actually just 142 domains, each
one counted twice.
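The double counting is easy to reproduce. Below is a minimal sketch, assuming the reverse whois results come back as one list of domains per name server. The three sample domains and the variable names are hypothetical and only illustrate the mistake.

```python
# Hypothetical reverse-whois results: one list of domains per name server.
# Every affected domain lists BOTH name servers, so it shows up in both lists.
domains_by_nameserver = {
    "dns1.freehostia.com": ["google.com.pk", "microsoft.pk", "apple.pk"],
    "dns2.freehostia.com": ["google.com.pk", "microsoft.pk", "apple.pk"],
}

# Naive count: add up the rows of every result list (the mistake).
naive_count = sum(len(domains) for domains in domains_by_nameserver.values())

# Safer count: merge the lists into a set so each domain is counted once.
unique_domains = set()
for domains in domains_by_nameserver.values():
    unique_domains.update(domains)

print(naive_count)          # 6
print(len(unique_domains))  # 3
```

With two name servers per domain, the naive figure is exactly double the real one, which is how 142 domains turned into 284.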
Now I was extra careful before publishing anything. I checked the name
server change history of all of those domains and noticed that only 110 had
been changed in the last 24 hours. What about the remaining 32 domains pointing
to those name servers? All of them were showing real websites hosted by that
free hosting provider, and they were not hacked. I verified twice and published
the list here. My blog was getting a huge traffic spike at that time. A
lot of news sites and blogs picked up the list immediately and updated their
news articles. This is how the online news world works. They pick up news
items from whatever source they can find and publish them immediately without
I have finally managed to get a Google Page Speed score of 99 and a YSlow
score of 97 for this blog. As mentioned earlier, this blog is generated
using Pelican and deployed on the Heroku Cedar stack, which
supports Python applications. It is served by a great WSGI app called
‘static‘, together with gunicorn and gevent. I had to make a lot of
changes to static to make this possible.
Since we are serving static content, there is no need to compress it on each
and every request. We can generate gzipped copies along with the other static
content and serve them when requested. This approach, in my opinion, is faster
than the on-the-fly gzip compression used by nginx and Apache, because it saves
the CPU time spent compressing the content on each request. I used the
gzip_cache plugin to generate gzipped versions of all my content. The next
step was to serve this gzipped content when requested. Static does not support
this by default, so I had to modify it a little. When a request that accepts
gzipped content is received, it now tries to find the gzipped copy of the content.
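The lookup can be sketched roughly as follows. This is not the actual patch to static; the function name `choose_variant` and the `.gz` suffix convention are assumptions made for illustration.

```python
import os


def choose_variant(path, accept_encoding):
    """Return (file_to_send, extra_headers) for a static file request.

    If the client accepts gzip and a pre-built ``path + '.gz'`` exists,
    that file is sent as-is instead of compressing on the fly.
    """
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = [("Vary", "Accept-Encoding")]
    # Simplified parsing: q-values such as "gzip;q=0" are ignored.
    encodings = [e.split(";")[0].strip().lower()
                 for e in accept_encoding.split(",")]
    gz_path = path + ".gz"
    if "gzip" in encodings and os.path.isfile(gz_path):
        headers.append(("Content-Encoding", "gzip"))
        return gz_path, headers
    return path, headers
```

The Content-Type header should still describe the original file (for example `text/html`), since only the transfer encoding changes.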
This is purely handled by the HTTP server serving the content. Again, I had to
make a few changes to static to enable caching. I tried to keep the
syntax similar to Apache’s
ExpiresByType. The expiry time can be specified in
seconds for each MIME type.
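A rough sketch of the idea, assuming a plain dictionary as the configuration format. The dictionary, its values and the function name `caching_headers` are hypothetical and not the syntax used in the modified static.

```python
import mimetypes
from email.utils import formatdate

# Hypothetical configuration: lifetime in seconds per MIME type.
EXPIRES_BY_TYPE = {
    "text/css": 30 * 24 * 3600,
    "image/png": 30 * 24 * 3600,
    "text/html": 3600,
}


def caching_headers(filename, now):
    """Build Expires / Cache-Control headers for ``filename``.

    ``now`` is a Unix timestamp; files whose type is not configured
    get no caching headers at all.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    seconds = EXPIRES_BY_TYPE.get(mime_type)
    if seconds is None:
        return []
    return [
        ("Expires", formatdate(now + seconds, usegmt=True)),
        ("Cache-Control", "max-age=%d" % seconds),
    ]
```

A server would call something like `caching_headers("index.html", time.time())` and append the result to the response headers.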
Again, this is purely handled by the HTTP server, and I had to make a few
changes to static to make it possible. As with the Expires headers, I
tried to keep the syntax similar to Apache’s
AddCharset. A charset can be
set for filename patterns.
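Pattern-based charsets could look something like this. It is only a sketch: the rule list and the function name `content_type_with_charset` are made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical rules: (filename pattern, charset); the first match wins.
CHARSET_RULES = [
    ("*.html", "utf-8"),
    ("*.css", "utf-8"),
    ("*.js", "utf-8"),
]


def content_type_with_charset(filename, mime_type):
    """Append a charset parameter when a pattern matches the file name."""
    for pattern, charset in CHARSET_RULES:
        if fnmatch(filename, pattern):
            return "%s; charset=%s" % (mime_type, charset)
    # Binary files such as images are left without a charset.
    return mime_type


print(content_type_with_charset("index.html", "text/html"))
# text/html; charset=utf-8
```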
I used the assets plugin, which in turn uses webassets, to combine and minify
resources. This is done offline, so no minification & combining
Lossless compression of images was done using jpegtran and optipng.
I automated this task by writing a Pelican plugin. Again, this is done offline,
so no CPU time is needed to serve optimized images.
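The core of such an automation step might look like the sketch below. It is not the plugin itself: the function names are hypothetical, and it assumes the `jpegtran` and `optipng` command line tools are installed and that this jpegtran build can safely write to its own input file.

```python
import os
import shutil
import subprocess


def optimizer_command(path):
    """Return the command line for losslessly optimizing ``path``, or None."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".jpg", ".jpeg"):
        # -copy none drops metadata, -optimize rebuilds the Huffman tables.
        return ["jpegtran", "-copy", "none", "-optimize",
                "-outfile", path, path]
    if ext == ".png":
        return ["optipng", path]
    return None


def optimize_images(output_dir):
    """Walk the generated site and optimize every JPEG and PNG in place."""
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            command = optimizer_command(os.path.join(root, name))
            # Skip other file types and tools that are not installed.
            if command and shutil.which(command[0]):
                subprocess.check_call(command)
```

Running `optimize_images("output")` once after the site is generated keeps all of the work at build time.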
The blog template was designed using Twitter Bootstrap and lots
of custom CSS. Even after combining and minification, the CSS was 130KB. I
used mincss to find and remove unused CSS. Now the CSS is just 14KB
(4KB gzipped). I had to re-add some styles which were used on other pages.
Once again, this is done offline and at design time only.
What’s still missing?
Specify image dimensions
Because the design is responsive, it is not possible to send all images with
their dimensions specified. The images resize themselves according to the screen
resize images accordingly, but this would have its own overheads.
Leverage browser caching for external resources
file used by Google Analytics. It comes with Expires headers of 12 hours.
There has been a lot of discussion about caching it and serving it from one’s
own servers, but I guess anything like this would be overkill. ga.js is so
common that the visitor has probably already downloaded it from some other website.
Using CDN for static content
This task is on my to-do list, and I am still looking for a good (preferably free) CDN.
This blog post is a continuation of Part-I.
The sample data has now been increased to 150K Pakistani tweeps.
Follower count is no longer a good influence measure. On average, each
Pakistani tweep is followed by 129 users. The majority of Pakistanis
(about three quarters) have fewer than 50 followers. Half of Pakistani Twitter
users have fewer than 10 followers. There are about 10,000 tweeps with
no followers and about 12,000 tweeps with a single follower. This is a
very strange trend. If you look closely at these accounts, you’ll notice
that most of them have the default DP and default background. It seems that
these are fake accounts, created by the social media cells of different
political parties to increase the follower counts of their leaders on Twitter.
On the other side, there are just 24 Pakistanis with more than 50,000
followers. Most of them are politicians and TV anchors. Just 2331
tweeps have more than 1000 followers.
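An average of 129 alongside a median under 10 is what a heavily skewed distribution looks like: a handful of huge accounts pull the mean up. A small sketch with made-up numbers shows the effect. The ten follower counts below are hypothetical and are not taken from the sample data.

```python
from statistics import mean, median

# Hypothetical follower counts for ten accounts: mostly tiny, one celebrity.
followers = [0, 1, 1, 3, 5, 8, 12, 40, 120, 1100]

average = mean(followers)
middle = median(followers)
# Share of accounts below a threshold, e.g. "fewer than 50 followers".
share_under_50 = sum(1 for count in followers if count < 50) / len(followers)

print(average)         # 129
print(middle)          # 6.5
print(share_under_50)  # 0.8
```

This is why the median and the threshold shares describe the typical tweep better than the average does.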
Klout is a more reliable measure of social media influence. Out of 150,000
Pakistani tweeps, about 40,000 do not have any Klout score. About 70,000 have a
Klout score between 11 and 20. The average Klout score is 16.72. About 12,000
have the minimum possible score of 10.
Only 22 users have scored above 70.
Here is the list of the most influential Pakistanis (Klout: 70+)
Note: These scores may have changed by the time you read this article.
For this analysis, the descriptions of about 150,000 Pakistani tweeps were
used. Out of 150K, only about 77K (about 51%) users have set the description
field in their Twitter profiles.
Excluding punctuation and stopwords, the following is the list of words most
commonly used by Pakistani tweeps in their profiles.
The tools used were FreqDist and the stopwords corpus from nltk.
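The counting step can be sketched with the standard library alone. This is a stand-in rather than the original script: `collections.Counter` is swapped in for nltk's `FreqDist`, which offers a similar counting interface, and the small stopword set and the sample descriptions are hypothetical.

```python
import string
from collections import Counter

# Tiny stand-in for the stopword list that nltk provides.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "i", "am", "is", "to"}

# Hypothetical profile descriptions.
descriptions = [
    "Student, blogger and cricket fan.",
    "Proud Pakistani. Student of engineering",
    "I am a software engineer and a blogger",
]


def profile_words(text):
    """Lowercase the text, strip punctuation and drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return [word for word in words if word not in STOPWORDS]


frequencies = Counter()
for description in descriptions:
    frequencies.update(profile_words(description))

print(frequencies.most_common(2))  # [('student', 2), ('blogger', 2)]
```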
I do not want to start this blog post by bashing Posterous.
Posterous is a great tool for quickly making blog posts.
A couple of years back, some of its unique features convinced me to move
my blog from WordPress to Posterous. Posterous offered a custom domain name for
free, whereas WordPress was charging for it. I really liked the email-to-blog-post
feature, although I never used it beyond testing it a couple of times.
Another amazing feature of Posterous was detecting external objects like
YouTube videos, GitHub gists etc. and making beautiful widgets for them.
Posterous provides some nice templates, but I wanted more control over
presentation. A few days back, YouTube was blocked in Pakistan. Some
misconfiguration caused problems in loading other Google sites; this affected
Google Maps etc. The same thing happened with my site. The template I was using
was loading some resources from Google. I don’t know why, but they were there
and there was no way to remove them. So the end result was a slowly loading
page for the Pakistani audience.
Another problem was how Posterous modifies the HTML of a blog post. Again, I
wanted more control over the presentation of my blog posts. Inserting a table
in a blog post was not a trivial task. The WYSIWYG editor cannot handle tables,
even if they are copied and pasted. I had to draft the HTML manually and paste
it into the HTML view of the WYSIWYG editor. And it gets modified when rendered :-(
The idea of an SSG (static site generator) is amazing. Why do I need a dynamic
setup for content which is hardly going to be modified once a month? I tried
Jekyll & Pelican and decided to use Pelican. Why Pelican? Because of
my bias towards Python. Jekyll is an equally good, or maybe better, SSG.
Being a geek, I like writing in plain text editors more than in WYSIWYG editors.
Writing in Markdown and reStructuredText is fun. One can keep one’s energy
focused on writing rather than on formatting the content. My content is saved
as content, not as HTML markup. It gets better revision management using git or
any other version control system. It can easily be imported into any other
application. The content is saved in files, not in a DB. I can write offline
and publish when I am online.
I have full control over the page rendered. I can design and optimize it
as I want. I do not have to worry about security or scaling as all the content
is purely static.
User Experience & Minimalism
I am not a UX expert but I do not want a lot of distractions in my content.
Here is what I did to improve UX:
- Removed comments; users can tweet their feedback.
- No Facebook like or Twitter tweet buttons.
- No tags, category or author name with each post.
- Using grey instead of pure black for text.
- Worked on typography
- Using typogrify
Jekyll provides a Posterous importer but Pelican does not. Currently Pelican
provides only the following importers:
- RSS/Atom feed
For Posterous I had to write my own importer, which consumes the Posterous API.
Here is the code:
def posterous2fields(api_token, email, password):
    """Imports posterous posts"""
    import base64
    import json
    import urllib.request
    from datetime import datetime, timedelta

    def get_posterous_posts(api_token, email, password, page=1):
        # HTTP Basic auth: base64 of "email:password"
        credentials = ("%s:%s" % (email, password)).encode("utf-8")
        base64string = base64.b64encode(credentials).decode("ascii")
        url = "http://posterous.com/api/v2/users/me/sites/primary/posts?api_token=%s&page=%d" % (api_token, page)
        request = urllib.request.Request(url)
        request.add_header("Authorization", "Basic %s" % base64string)
        handle = urllib.request.urlopen(request)
        return json.loads(handle.read().decode("utf-8"))

    page = 1
    posts = get_posterous_posts(api_token, email, password, page)
    # Keep fetching until the API returns an empty page
    while len(posts) > 0:
        for post in posts:
            slug = post.get('slug')
            if not slug:
                # slugify is assumed to be in scope (e.g. from pelican.utils)
                slug = slugify(post.get('title'))
            tags = [tag.get('name') for tag in post.get('tags')]
            # The last 5 characters of display_date are the UTC offset
            raw_date = post.get('display_date')
            date_object = datetime.strptime(raw_date[:-6], "%Y/%m/%d %H:%M:%S")
            offset = int(raw_date[-5:])
            # Convert to UTC (only whole-hour offsets are handled)
            delta = timedelta(hours=offset // 100)
            date_object -= delta
            date = date_object.strftime("%Y-%m-%d %H:%M")
            # The [] is the categories field (assumed empty)
            yield (post.get('title'), post.get('body_cleaned'), slug, date,
                   post.get('user').get('display_name'), [], tags, "html")
        page += 1
        posts = get_posterous_posts(api_token, email, password, page)
The above code produces Pelican fields which can later be passed to
fields2pelican, which uses pandoc to transform the HTML content
to Markdown or reStructuredText.
The site is deployed on the Heroku Cedar stack, which supports Python
applications. It is served by a great WSGI app called ‘static‘, together with
gunicorn and gevent.
Update: Using my own fork of static for performance tweaks.
Here is the list of domains that are managed by MarkMonitor and have their
nameservers pointing to dns2.freehostia.com & dns1.freehostia.com
According to some reports, there are about 1.9M Twitter users in
Pakistan. This was mentioned by someone at #SOCMM12, but there doesn’t
seem to be any source for this information.
I have been collecting Twitter data for quite some time. The sample data
contains more than 100K Pakistani Twitter users crawled using the Twitter
API. Only public profiles that mention Pakistan or the name of some Pakistani
city were considered for this analysis. This data
contains almost all active tweeple of Pakistan.
Here are the results of data analysis:
About half of Pakistani tweeps live in major cities like Karachi, Lahore
- 24.5% in Punjab (just 64.4% of them in Lahore, rest in other cities)
- 21.6% in Sindh (with 92.6% of them in Karachi)
- 10.0% in Islamabad
- 2.6% in KPK (with 58.7% of them in Peshawar)
- 0.96% in Balochistan
- 0.45% in Azad Kashmir
gender.c, with a custom names database of more than 5,000 names, was
used on the first names in Twitter profiles:
Names (first name):
Most common male names:
Most common female names:
According to www.stopbadware.org and ESET, the following government
websites contain malware and are NOT safe for your computer.
- phc.gos.pk (Shaheed Benazir Bhutto Housing Cell)
- gulbergtownlahore.gov.pk (Gulberg Town Lahore)
- wasafaisalabad.gop.pk (WASA Faisalabad)
- moip.gov.pk (Ministry of Industries)
- khushabpolice.gov.pk (Khushab Police)
- sped.gos.pk (Special Education Department, Government of Sindh)
- fatada.gov.pk (FATA Development Authority)
- bisp.gov.pk (Benazir Income Support Programme)
Getting backlinks from government websites has always been a dream of
SEO experts. In Pakistan, many government websites link to a lot
of other websites. I surveyed 1072 government websites of different
departments and grouped the links into 4 categories:
- Some company or freelancer made the website for the department and
intentionally put a link back to their own website.
- The company or freelancer used some open source software or components
and didn’t remove the links. In some cases the theme or template used
to develop the government website was a free template linking back to
the original developer.
- The website works as an online directory, linking to Pakistani
banks, newspapers and organizations, or telling people that sites
like Google, Wikipedia etc. exist.
- In rare cases, a virus-infected website
contains links added by the hackers.
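A survey like this starts with pulling the outbound links out of each page. The sketch below shows one way that step could be done with the standard library. It is not the script used for the survey, and the sample page and the `example.gov.pk` domain are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def external_links(html, site_domain):
    """Return absolute links whose host is outside ``site_domain``."""
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for href in collector.links:
        host = urlparse(href).hostname
        # Relative links have no host and stay on the same site.
        if host and host != site_domain and not host.endswith("." + site_domain):
            result.append(href)
    return result


# Hypothetical page footer of a government site.
page = """
<a href="/contact.html">Contact</a>
<a href="http://www.example.gov.pk/news">News</a>
<a href="http://www.example.com/">Designed by Example Web Studio</a>
"""
print(external_links(page, "example.gov.pk"))  # ['http://www.example.com/']
```

Sorting the resulting links into the four categories still needs a human eye, since a developer credit and a spam link look the same to a parser.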
Although I have the complete list, I will be sharing only a
few interesting ones:
- udb.gov.pk -> http://www.e-creatorz.com/
- prsp-cmiphc.gov.pk -> http://www.usmangee.20m.com/
- veharipolice.gov.pk -> http://hifidesigners.com/
- yasat.gop.pk -> http://www.easy-sol.com
- pard.gov.pk -> http://www.egravity.net
- pwd.gok.pk -> http://www.vertexcreations.com
- qtp.gob.pk -> http://infotechnosolutions.com
- pqa.gov.pk -> http://www.laksol.com
- ntc.gov.pk -> http://www.cronomagic.com
- nha.gov.pk -> http://jicstech.com/main/
- pitb.gov.pk -> http://shoutbox.ticketmy.com/widget/demo/w-embed
- punjabpolice.gov.pk ->
- ombudsmanpunjab.gov.pk ->
- ppf.gop.pk -> http://shoutbox.ticketmy.com/widget/demo/w-bottom
- newmurree.gop.pk -> http://www.nexgeninc.com
- fvdp.gop.pk -> http://www.afcwebs.com
- balochistan.gov.pk -> http://about.me/jaannaseer
- bahawalpurpolice.gov.pk -> http://manaz.8m.com/
- sindhagrimarketing.gov.pk -> http://www.gexton.com
- shydo.gov.pk -> http://www.solutioners.com.pk
- sdukp.gov.pk -> http://www.zaamtech.com
- sccdp.gos.pk -> http://msarfarazkha786.blogspot.com/
- rescue.gov.pk -> http://www.levantech.com
- pakboi.gov.pk -> http://www.viperwebsites.com/
- pabalochistan.gov.pk -> http://www.aesthetictech.net
- fdma.gov.pk -> http://www.rswebsols.com
- livestockpunjab.gov.pk -> http://www.121solutionz.com
- larkana.gov.pk -> http://www.ampleteknologies.com
- home.gos.pk -> http://www.ampleteknologies.com
- gsp.gov.pk -> http://www.bohradevelopers.com/
- expopakistan.gov.pk -> http://a2zcreatorz.com/
- enercon.gov.pk -> http://www.360technologies.net
- animalhusbandry.gok.pk -> http://www.gutscheingirl.de
- cplc-lahore.gop.pk -> http://www.infobytesolutions.com
- sindhcoal.gos.pk -> http://www.spatsoltech.com
- customstraining.gov.pk -> http://absorbtechnologies.com/
- npcih.gov.pk -> http://ahadsol.com/ (The whole site runs under
this .gov.pk domain)
- yasat.gop.pk -> http://www.dynamicdrive.com/forums/
- pcsir-frc.gov.pk -> http://wordpress.org/
- suparco.gov.pk -> http://www.opencube.com
- moicop.gov.pk -> http://www.joomlaworks.gr
- lyallpurmuseum.gov.pk -> http://wordpress.org/
- sbi.gos.pk -> http://www.american-chillers.com
- lrh.gov.pk -> http://www.freshjoomlatemplates.com
- punjabfoodauthority.gov.pk -> http://www.rockettheme.com/
- afic.gov.pk -> http://www.dhtml-menu-builder.com
- pmic.gov.pk -> http://demo.icetheme.com/it_icemag
- urbandirectorate.gos.pk ->
- fcbalochistan.gov.pk -> http://www.joomla.org/
- lieda.gov.pk -> http://www.freewebsitetemplates.com
- ajkepa.gov.pk -> http://www.zootemplate.com
- fic.gop.pk -> http://wowslider.com
- lyallpurmuseum.gov.pk -> http://www.caretakerjobs.net/
- lyallpurmuseum.gov.pk -> http://www.louisianamatch.com/
- lyallpurmuseum.gov.pk ->
- lyallpurmuseum.gov.pk ->
- vehari.gov.pk -> http://wp.me/p18wNc-aJ
- vehari.gov.pk -> http://www.standardchartered.com.pk/
- rtokarachi.gov.pk -> http://www.thefreedictionary.com
- ltulahore.gov.pk -> http://en.wikipedia.org/
- tmkhan.gos.pk -> http://www.paperpk.com/
- larkano.gov.pk -> http://hotmail.com
- home.gos.pk -> http://www.hotmail.com
- qtp.gob.pk -> http://www.toponlinepoker.org/
- lrh.gov.pk -> http://pokerfreaks.net/
- pap.gov.pk -> http://www.g-ksa.net/vb
- pap.gov.pk -> http://www.girls-top.net
- pap.gov.pk -> http://bnat-games.girls-top.net
- pap.gov.pk -> http://chat.te3p-qlbe.net
- pap.gov.pk -> http://www.chat-qloob.com
- pap.gov.pk -> http://www.girls-top.net
- pap.gov.pk -> http://www.g-ksa.net
- pap.gov.pk -> http://www.upg-ksa.com
- pap.gov.pk -> http://www.up-5.net
- pap.gov.pk -> http://www.bn00.com
- pap.gov.pk -> http://www.i00p.com
- pap.gov.pk -> http://daleel.girls-top.net/newlink.html
- ntcip.gov.pk -> http://www.zettu.net
- ntcip.gov.pk -> http://www.filme-porno.cc
- ntcip.gov.pk -> http://www.filmele-online.org
- ntcip.gov.pk -> http://www.download-muzica.org
- ntcip.gov.pk -> http://www.logibic.ro
- ntcip.gov.pk -> http://www.porneata.com
- ntcip.gov.pk -> http://ujocuri.ro
- ntcip.gov.pk -> http://www.shadowaura.com/
- healthkp.gov.pk -> http://www.960watch.com
- healthkp.gov.pk ->
- healthkp.gov.pk -> http://www.nikedunkshow.com/
There are about 1000 domains registered by different departments of
the Government of Pakistan. Most of them seem to follow no usability
guidelines. Although a lot of these websites need to improve their usability
in general, this blog post is dedicated to the usability of domain names only.
Looking at these domains, it seems as if some random person picks the
domain name for each department at random. There are domains like
rdcgp.gov.pk, febgif.gov.pk and qmcbvh.gop.pk. There are a lot of other
domains which look like random characters.
There is a very nice article on usability of domains
A few of the characteristics of a usable domain name are:
- short (12 characters or less)
- easy to spell
- easy to type
- easy to say and pronounce
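Only some of these characteristics can be checked mechanically. The sketch below tests the length rule and flags hyphens, under the assumption that the name being judged is the leftmost label of the domain. The function name and the messages are made up for illustration, and whether a name is easy to spell or say still needs a human.

```python
def usability_issues(domain, max_length=12):
    """Check the leftmost label of ``domain`` against two simple rules."""
    name = domain.lower().split(".")[0]
    issues = []
    if len(name) > max_length:
        issues.append("longer than %d characters" % max_length)
    # The hyphen is the only punctuation mark allowed in a domain label.
    if "-" in name:
        issues.append("contains a hyphen")
    return issues


print(usability_issues("khyberpakhtunkhwapolice.gov.pk"))
# ['longer than 12 characters']
print(usability_issues("cm.gop.pk"))
# []
```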
Unfortunately, there are very few government domains which have all of
these characteristics. As mentioned before, most of the domains are
abbreviations or short forms of the departments they represent.
Most of them include the name of the province, when they could have used
the second-level domain of their own province, e.g. advocategeneralsindh.gos.pk
could have been advocategeneral.gos.pk, and financedeptsindh.gov.pk could have
been finance.gos.pk. Why do they need to mention dept in the domain name?
chiefministerpunjab.gov.pk could have been chiefminister.gop.pk or
cm.gop.pk, or both.
Many use punctuation in domain names, making them harder to remember. e.g.
More than 20% of the domains are longer than 12 characters. e.g.
khyberpakhtunkhwapolice.gov.pk could have been police.gkp.pk,
I don’t know why, but most government websites do not open unless
www. is prepended to their domain name.
There needs to be some government or semi-government agency which defines
guidelines for usable domain names and helps departments
choose one.