Most commonly used words in Pakistani tweeps’ profiles

For this analysis, the profile descriptions of about 150,000 Pakistani tweeps were used. Of those 150K users, only about 77K (about 51%) have set the description field in their Twitter profiles.

Excluding punctuation and stopwords, the following are the words most commonly used by Pakistani tweeps in their profiles.

  1. love
  2. pakistan
  3. student
  4. follow
  5. like
  6. life
  7. pakistani
  8. engineer
  9. cricket
  10. social

The analysis used FreqDist and the stopwords corpus from nltk.
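The counting step can be sketched roughly as follows. This is not the original script: it swaps in the standard library's `collections.Counter` for nltk's FreqDist, and a tiny hand-written stopword set for nltk's much larger stopwords corpus. The function name `top_words` and the sample descriptions are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); nltk's English list is far larger.
STOPWORDS = {"i", "a", "and", "the", "to", "of", "in", "my", "is", "am"}


def top_words(descriptions, n=10):
    """Return the n most common non-stopword words across profile descriptions."""
    counts = Counter()
    for text in descriptions:
        if not text:
            continue  # many users leave the description field empty
        # Lowercase and keep alphabetic runs only, which drops punctuation.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical sample data, not taken from the real data set.
sample = ["I love Pakistan and cricket", None, "Student. Love life!", ""]
print(top_words(sample, 3))
```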

Moving my blog from Posterous to Pelican

Background

I do not want to start this blog post by bashing Posterous. Posterous is a great blogging tool for quickly making blog posts. A couple of years back, some of its unique features convinced me to move my blog from WordPress to Posterous. Posterous offered a custom domain name for free, whereas WordPress was charging for it. I really liked the email-to-blog-post feature, although I never used it beyond testing it a couple of times. Another amazing feature of Posterous was detecting external objects like YouTube videos, GitHub gists etc. and making beautiful widgets for them.

Posterous provides some nice templates, but I wanted more control over presentation. A few days back, YouTube was blocked in Pakistan, and some misconfiguration caused problems loading other Google sites as well. This affected sites that use resources from Google, such as the JavaScript for Google Analytics, Google Maps etc. The same thing happened with my site. The template I was using was pulling in some resources from Google. I don’t know why, but they were there and there was no way to remove them. The end result was a slow-loading page for my Pakistani audience.

Another problem was how Posterous modifies the HTML of the blog post. Again, I wanted more control over my blog post presentation. Inserting a table in a blog post was not a trivial task. The WYSIWYG editor cannot handle a table, even if it is copy-pasted. I had to draft the HTML manually and paste it into the HTML part of the WYSIWYG editor. And it gets modified when rendered :-(

Using SSGs

The idea of an SSG (static site generator) is amazing. Why do I need a dynamic setup for content that hardly changes from one month to the next? I tried Jekyll and Pelican and decided to use Pelican. Why Pelican? Because I am biased towards Python. Jekyll is an equally good, or maybe better, SSG.

Being a geek, I like writing in plain text editors more than in WYSIWYG editors. Writing in Markdown and reStructuredText is fun. I can keep my energy focused on writing rather than on formatting the content. My content is saved as content, not as HTML markup. It gets better revision management through git or any other version control system. It can easily be imported into any other application. The content is saved in files, not in a DB. I can write offline and publish when I am online.

I have full control over the page rendered. I can design and optimize it as I want. I do not have to worry about security or scaling as all the content is purely static.

User Experience & Minimalism

I am not a UX expert but I do not want a lot of distractions in my content. Here is what I did to improve UX:

Migration

Jekyll provides a Posterous importer but Pelican does not. Currently Pelican provides only the following importers:

For Posterous I had to write my own importer, which consumes the Posterous API. Here is the code:

def posterous2fields(api_token, email, password):
    """Imports posterous posts"""
    import base64
    from datetime import datetime, timedelta
    import simplejson as json
    import urllib2

    def get_posterous_posts(api_token, email, password, page = 1):
        base64string = base64.encodestring('%s:%s' % (email, password)).replace('\n', '')
        url = "http://posterous.com/api/v2/users/me/sites/primary/posts?api_token=%s&page=%d" % (api_token, page)
        request = urllib2.Request(url)
        request.add_header("Authorization", "Basic %s" % base64string)
        handle = urllib2.urlopen(request)
        posts = json.loads(handle.read())
        return posts

    page = 1
    while True:
        # Fetch one page at a time; an empty page means there are no more posts.
        posts = get_posterous_posts(api_token, email, password, page)
        if not posts:
            break
        page += 1

        for post in posts:
            slug = post.get('slug')
            if not slug:
                # slugify is not defined in this function; it is assumed to come
                # from the surrounding importer module (e.g. pelican.utils).
                slug = slugify(post.get('title'))
            tags = [tag.get('name') for tag in post.get('tags')]
            # display_date ends with a UTC offset; strip it, parse the rest,
            # then shift the timestamp to UTC.
            raw_date = post.get('display_date')
            date_object = datetime.strptime(raw_date[:-6], "%Y/%m/%d %H:%M:%S")
            offset = int(raw_date[-5:])
            # Integer division (Python 2): minutes in the offset are not handled.
            delta = timedelta(hours = offset / 100)
            date_object -= delta
            date = date_object.strftime("%Y-%m-%d %H:%M")

            yield (post.get('title'), post.get('body_cleaned'), slug, date,
                post.get('user').get('display_name'), [], tags, "html")

The above code produces Pelican fields, which can later be passed to fields2pelican, which uses pandoc to transform the HTML content into Markdown or reStructuredText.
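The date handling in the importer is the fiddliest part, so here is a small standalone sketch of the same idea in Python 3, where `%z` parses the offset directly and minutes are handled too. The helper name `to_utc_string` and the sample date strings are hypothetical; the input format is only inferred from the strptime pattern used in the importer.

```python
from datetime import datetime, timezone


def to_utc_string(raw_date):
    """Convert e.g. '2012/10/01 14:30:00 -0700' to 'YYYY-MM-DD HH:MM' in UTC."""
    # %z understands offsets such as -0700 or +0530.
    aware = datetime.strptime(raw_date, "%Y/%m/%d %H:%M:%S %z")
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")


print(to_utc_string("2012/10/01 14:30:00 -0700"))  # 2012-10-01 21:30
print(to_utc_string("2012/10/01 14:30:00 +0530"))  # 2012-10-01 09:00
```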

Deployment

The site is deployed on the Heroku Cedar stack, which supports Python applications. It is served by a great WSGI app called ‘static’, together with gunicorn and gevent.
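To give an idea of what a static-file WSGI app does, here is a minimal sketch using only the standard library. It is not the API of the ‘static’ package, and it skips caching headers, directory indexes and streaming; the name `make_static_app` is made up for illustration.

```python
import mimetypes
import os


def make_static_app(root):
    """Return a WSGI app that serves files from `root` (illustrative helper)."""
    root = os.path.realpath(root)

    def app(environ, start_response):
        # Map "/" to index.html, as most static hosts do.
        rel = environ.get("PATH_INFO", "/").lstrip("/") or "index.html"
        path = os.path.realpath(os.path.join(root, rel))
        # Refuse anything that resolves outside the site root.
        inside = path.startswith(root + os.sep)
        if not inside or not os.path.isfile(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found"]
        ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path, "rb") as fh:
            body = fh.read()
        start_response("200 OK", [("Content-Type", ctype),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


# "output" is assumed to be the folder holding the generated site.
app = make_static_app("output")
```

Any WSGI server can run such an app. With gunicorn, something like `gunicorn -k gevent mysite:app` would do, where `mysite` is a hypothetical module name.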

Update: Using my own fork of static for performance tweaks.

110 .PK domains managed by MarkMonitor got hacked by Turkish hackers

Google.com.pk hacked screenshot

Here is the list of domains that are managed by MarkMonitor and have their nameservers pointing to dns2.freehostia.com & dns1.freehostia.com

  1. biofreeze.com.pk
  2. blackstone.pk
  3. blogspot.pk
  4. itunes.pk
  5. gmails.pk
  6. zynga.com.pk
  7. chrome.com.pk
  8. chrome.pk
  9. visa.com.pk
  10. bx.com.pk
  11. abbvie.com.pk
  12. abbvie.pk
  13. cgma.pk
  14. chacos.com.pk
  15. cimacpa.pk
  16. cisco.pk
  17. ciscosystems.pk
  18. blogspot.com.pk
  19. cpacima.pk
  20. cpaintl.pk
  21. cpaldglobal.pk
  22. cpalwglobal.pk
  23. drivealliance.pk
  24. eastman.biz.pk
  25. eastman.net.pk
  26. eastman.org.pk
  27. ebay.pk
  28. monatin.pk
  29. everyblock.pk
  30. youtube.pk
  31. 3com.web.pk
  32. hp.web.pk
  33. revlon.pk
  34. streetwear.pk
  35. windows7.pk
  36. windows8.pk
  37. windowsrt.pk
  38. yahoo.pk
  39. yahoomaktoob.pk
  40. zynga.pk
  41. firstdirect.com.pk
  42. flickr.pk
  43. fordgofurther.pk
  44. gbuzz.pk
  45. gmailbuzz.pk
  46. gmail.pk
  47. googlebrowser.com.pk
  48. google.pk
  49. googlebuzz.pk
  50. googlechrome.com.pk
  51. abbviepharmaceuticals.pk
  52. abbviepharmaceuticals.com.pk
  53. hewlettpackard.pk
  54. hexagon.com.pk
  55. hsbcamanah.biz.pk
  56. hotmail.com.pk
  57. hpcloud.com.pk
  58. hp.com.pk
  59. hpscalene.com.pk
  60. hsbc.biz.pk
  61. hsbcadvance.com.pk
  62. hsbc.pk
  63. hsbcpremier.com.pk
  64. hsbcprivatebank.biz.pk
  65. hsbcamanah.com.pk
  66. hsbcdirect.com.pk
  67. hsbcnet.com.pk
  68. hsbcpremier.biz.pk
  69. hsbcpremier.pk
  70. hsbcprivatebank.com.pk
  71. investdirect.biz.pk
  72. investdirect.com.pk
  73. ipod.pk
  74. jaiku.pk
  75. kellyservices.com.pk
  76. maktoob.pk
  77. markmonitor.pk
  78. microsoftsmartglass.com.pk
  79. microsoftsmartglass.pk
  80. xboxsmartglass.com.pk
  81. xboxsmartglass.pk
  82. msn.org.pk
  83. windowsstore.pk
  84. windowsstore.com.pk
  85. opteron.com.pk
  86. parkplaza.pk
  87. paypal.pk
  88. postini.pk
  89. scalene.com.pk
  90. schwab.biz.pk
  91. schwab.com.pk
  92. sonystyle.com.pk
  93. streetwear.com.pk
  94. theworldslocalbank.com.pk
  95. genapp.pk
  96. genapp.com.pk
  97. generationapp.pk
  98. generationapp.com.pk
  99. windows.com.pk
  100. windows7.com.pk
  101. windows8.com.pk
  102. 3com.biz.pk
  103. 3com.fam.pk
  104. 3com.net.pk
  105. 3com.org.pk
  106. gchrome.com.pk
  107. aicpacima.pk
  108. apple.pk
  109. google.com.pk
  110. microsoft.pk

Twitter users in Pakistan

According to some reports, there are about 1.9M Twitter users in Pakistan. This was mentioned by someone at #SOCMM12, but there doesn’t seem to be any source for this figure.

I have been collecting Twitter data for quite some time. The sample contains more than 100K Pakistani Twitter users crawled using the Twitter API. Only public profiles that mention Pakistan or the name of a Pakistani city were considered for this analysis. This data contains almost all active tweeple of Pakistan.

Here are the results of data analysis:

Geographical location:

About half of Pakistani tweeps live in major cities like Karachi, Lahore and Islamabad/Rawalpindi.

  • 24.5% in Punjab (just 64.4% of them in Lahore, the rest in other cities of Punjab)
  • 21.6% in Sindh (with 92.6% of them in Karachi)
  • 10.0% in Islamabad
  • 2.6% in KPK (with 58.7% of them in Peshawar)
  • 0.96% in Balochistan
  • 0.45% in Azad Kashmir
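The "about half" figure can be checked from the percentages above. The sketch below multiplies each province's share by its main city's share. Rawalpindi is not broken out in the list, so it is left out, and the function name `city_shares` is made up for illustration.

```python
def city_shares():
    """Share of all sampled tweeps (in %) living in each major city."""
    return {
        "Karachi": 21.6 * 92.6 / 100,   # 92.6% of Sindh's 21.6%
        "Lahore": 24.5 * 64.4 / 100,    # 64.4% of Punjab's 24.5%
        "Islamabad": 10.0,              # listed directly
    }


shares = city_shares()
for city, pct in shares.items():
    print(city, round(pct, 1))
print("Total", round(sum(shares.values()), 1))  # Total 45.8
```

So roughly 46% of the sampled tweeps are in these three cities alone.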

Gender:

gender.c, with custom names database of more than 5,000 names, was used on first names in twitter profiles:

Gender of twitter users in Pakistan
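The idea behind name-based gender detection can be sketched as a simple lookup on the first word of the display name. This is only an illustration of the approach, not how gender.c works internally. The tiny name sets below reuse a few names from this post, whereas the real database had more than 5,000 entries, and `guess_gender` is a made-up helper.

```python
# Small illustrative name sets (the real custom database was much larger).
MALE_NAMES = {"syed", "ali", "usman", "imran", "bilal"}
FEMALE_NAMES = {"ayesha", "sana", "fatima", "amna", "hina"}


def guess_gender(display_name):
    """Guess gender from the first name in a profile's display name."""
    parts = display_name.strip().split()
    if not parts:
        return "unknown"
    first = parts[0].lower()
    if first in MALE_NAMES:
        return "male"
    if first in FEMALE_NAMES:
        return "female"
    return "unknown"  # brands, nicknames and names missing from the sets


print(guess_gender("Ali Khan"))
print(guess_gender("sana ahmed"))
print(guess_gender("GizmoCrazed"))
```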

Names (first name):

Most common male names:

  • Syed
  • Ali
  • Abdul
  • Usman
  • Imran
  • Bilal
  • Hassan
  • Waqas
  • Mohammad
  • M
  • Salman

Most common female names:

  • Ayesha
  • Sana
  • Fatima
  • Amna
  • Noor
  • Sara
  • Hina
  • Sarah
  • Maryam
  • Rabia
  • Hira
  • Sidra

Verified Twitter Accounts:

There are only 6 of them:

Who has the most followers in Pakistan?

As you can see from the results, follower count is no longer a measure of influence. Our politicians and media persons are chasing follower counts. To achieve this, they hire services which artificially inflate the follower count using various techniques or bogus accounts. Looking at the followers list of any famous politician, you’ll realize that many of the followers have the default display picture, no followers of their own, very few meaningless statuses and only a few tweeple they follow. I am trying to make a list of tweeps with their Klout scores. This list will be published soon.

Here is a list of top Pakistani tweeple according to their follower count:

Note: The follower count may have increased or decreased by the time you’re reading this.

Note2: If you feel that I have missed any person, please feel free to comment and I will update the list.

  1. @ImranKhanPTI - 410807 followers
  2. @KamalFaridi - 361204 followers
  3. @tariqnoorkhan - 264988 followers
  4. @AliZafarsays - 185403 followers
  5. @MubasherLucman - 150700 followers
  6. @ilovebenazir - 137361 followers
  7. @IhtishamKhan4U - 129611 followers
  8. @wasimakramlive - 111193 followers
  9. @fbhutto - 107669 followers
  10. @GizmoCrazed - 96746 followers
  11. @HamidMirGEO - 93936 followers
  12. @twitter_ur - 91618 followers
  13. @wordinvestor - 81079 followers
  14. @marvi_memon - 74384 followers
  15. @farhanmasood - 74039 followers
  16. @najamsethi - 69335 followers
  17. @ReallyVirtual - 67225 followers
  18. @PTIofficial - 57486 followers
  19. @husainhaqqani - 57301 followers
  20. @CMShehbaz - 55129 followers
  21. @sanabucha - 52948 followers
  22. @SenRehmanMalik - 50572 followers
  23. @TaimurAsad - 49346 followers
  24. @sherryrehman - 42597 followers
  25. @Emateinc - 41304 followers
  26. @BakhtawarBZ - 40733 followers
  27. @FarrukhSiddiqui - 38740 followers
  28. @TalatHussain12 - 38517 followers
  29. @AajKamranKhan - 37570 followers
  30. @HassanNisarPK - 36504 followers
  31. @zainshahzad3 - 36220 followers
  32. @DawarAShah - 35550 followers
  33. @PunjabiProblems - 34796 followers
  34. @Nadsms - 34631 followers
  35. @LinkedInExpert1 - 34147 followers
  36. @etribune - 32022 followers
  37. @SocializedWeb - 31959 followers
  38. @tweetfestme - 31134 followers
  39. @Asma_Jahangir - 29912 followers
  40. @BoomPakistan - 29755 followers
  41. @JahanzaibPTI - 29108 followers
  42. @marvisirmed - 28903 followers
  43. @ashraf_chaudhry - 28514 followers
  44. @onlywaqas - 28463 followers
  45. @shehrbanotaseer - 27868 followers
  46. @MaryamNSharif - 27574 followers
  47. @stepan_manukyan - 27404 followers
  48. @ShkhRasheed - 26863 followers
  49. @dawn_com - 26499 followers
  50. @Mehmal - 26293 followers
  51. @NasimZehra - 26064 followers
  52. @sharmilafaruqi - 25424 followers
  53. @NadeemfParacha - 25292 followers

When did Pakistanis join Twitter?

Only 7 accounts were registered in 2006:

  • @vak - Wed Aug 30 07:27:50 +0000 2006
  • @ayazshah - Mon Oct 02 10:53:34 +0000 2006
  • @KO - Tue Oct 10 08:35:25 +0000 2006
  • @efrg - Thu Nov 23 08:32:41 +0000 2006
  • @Nash - Fri Nov 24 20:15:26 +0000 2006
  • @marsonearth - Tue Dec 05 14:03:11 +0000 2006
  • @FreshFarhan - Sat Dec 23 18:36:01 +0000 2006

And then:

  • In 2007, 324 accounts were registered
  • In 2008, 1294 accounts were registered
  • In 2009, 13677 accounts were registered
  • In 2010, 24035 accounts were registered
  • In 2011, 31659 accounts were registered
  • In 2012, 34908 accounts were registered
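The per-year counts come from the account creation timestamp, which appears in the format shown in the 2006 list. Here is a minimal sketch of how such strings can be grouped by year. The function name `registrations_per_year` is made up, and the sample reuses three timestamps from the 2006 list.

```python
from collections import Counter
from datetime import datetime

# Format of timestamps such as "Wed Aug 30 07:27:50 +0000 2006".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def registrations_per_year(created_at_values):
    """Count how many accounts were created in each year."""
    years = (datetime.strptime(value, CREATED_AT_FORMAT).year
             for value in created_at_values)
    return Counter(years)


sample = [
    "Wed Aug 30 07:27:50 +0000 2006",
    "Mon Oct 02 10:53:34 +0000 2006",
    "Sat Dec 23 18:36:01 +0000 2006",
]
print(registrations_per_year(sample))  # Counter({2006: 3})
```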

Who tweets a lot?

This information is not important; I am just sharing it for fun. Most of the accounts in this list are bots, so I am skipping the bots and sharing only the real persons :-)

Update: See [Part-II]

These government websites are not safe for your computer

According to www.stopbadware.org and ESET, the following government websites contain malware and are NOT safe for your computer.

Government website backlinks (Pakistan)

Getting backlinks from government websites has always been a dream of SEO experts. In Pakistan, many government websites link to a lot of other websites. I surveyed 1072 government websites of different departments and sorted the links into 4 categories:

  1. Some company or freelancer made the website for the department and put a link back to their company website intentionally.
  2. Company or freelancer used some open source software or components and didn’t remove the links. In some cases the theme or template used to develop the government website was a free template linking back to the original developer.
  3. Website working as an online directory and linking to Pakistani banks, newspapers, organizations or telling people that there exist sites like Google, Wikipedia etc.
  4. A rare case, but sometimes a virus-infected website contains links added by hackers.

I have the complete list, but I will share only a few interesting ones:
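How the links were collected is not described here. One plausible way to pull the outbound links out of a page is sketched below with Python's standard HTML parser. The helper names and the sample HTML are made up for illustration; only the udb.gov.pk to e-creatorz.com pairing comes from the survey results.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, site_host):
    """Return links that point to a host other than site_host or its subdomains."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href in parser.hrefs:
        host = urlparse(href).hostname
        if host is None:
            continue  # relative link, stays on the same site
        if host == site_host or host.endswith("." + site_host):
            continue  # same site or one of its subdomains
        found.append(href)
    return found


# Hypothetical page footer for illustration.
page = ('<a href="/about">About</a> '
        '<a href="http://www.udb.gov.pk/contact">Contact</a> '
        '<a href="http://www.e-creatorz.com/">Designed by</a>')
print(external_links(page, "udb.gov.pk"))  # ['http://www.e-creatorz.com/']
```

Sorting each link into one of the four categories would still need a human look at the page.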

Category 1:

  • udb.gov.pk -> http://www.e-creatorz.com/
  • prsp-cmiphc.gov.pk -> http://www.usmangee.20m.com/
  • veharipolice.gov.pk -> http://hifidesigners.com/
  • yasat.gop.pk -> http://www.easy-sol.com
  • pard.gov.pk -> http://www.egravity.net
  • pwd.gok.pk -> http://www.vertexcreations.com
  • qtp.gob.pk -> http://infotechnosolutions.com
  • pqa.gov.pk -> http://www.laksol.com
  • ntc.gov.pk -> http://www.cronomagic.com
  • nha.gov.pk -> http://jicstech.com/main/
  • pitb.gov.pk -> http://shoutbox.ticketmy.com/widget/demo/w-embed
  • punjabpolice.gov.pk -> http://shoutbox.ticketmy.com/widget/demo/w-bottom
  • ombudsmanpunjab.gov.pk -> http://shoutbox.ticketmy.com/widget/demo/w-bottom
  • ppf.gop.pk -> http://shoutbox.ticketmy.com/widget/demo/w-bottom
  • newmurree.gop.pk -> http://www.nexgeninc.com
  • fvdp.gop.pk -> http://www.afcwebs.com
  • balochistan.gov.pk -> http://about.me/jaannaseer
  • bahawalpurpolice.gov.pk -> http://manaz.8m.com/
  • sindhagrimarketing.gov.pk -> http://www.gexton.com
  • shydo.gov.pk -> http://www.solutioners.com.pk
  • sdukp.gov.pk -> http://www.zaamtech.com
  • sccdp.gos.pk -> http://msarfarazkha786.blogspot.com/
  • rescue.gov.pk -> http://www.levantech.com
  • pakboi.gov.pk -> http://www.viperwebsites.com/
  • pabalochistan.gov.pk -> http://www.aesthetictech.net
  • fdma.gov.pk -> http://www.rswebsols.com
  • livestockpunjab.gov.pk -> http://www.121solutionz.com
  • larkana.gov.pk -> http://www.ampleteknologies.com
  • home.gos.pk -> http://www.ampleteknologies.com
  • gsp.gov.pk -> http://www.bohradevelopers.com/
  • expopakistan.gov.pk -> http://a2zcreatorz.com/
  • enercon.gov.pk -> http://www.360technologies.net
  • animalhusbandry.gok.pk -> http://www.gutscheingirl.de
  • cplc-lahore.gop.pk -> http://www.infobytesolutions.com
  • sindhcoal.gos.pk -> http://www.spatsoltech.com
  • customstraining.gov.pk -> http://absorbtechnologies.com/
  • npcih.gov.pk -> http://ahadsol.com/ (The whole site runs under this .gov.pk domain)

Category 2:

  • yasat.gop.pk -> http://www.dynamicdrive.com/forums/
  • pcsir-frc.gov.pk -> http://wordpress.org/
  • suparco.gov.pk -> http://www.opencube.com
  • moicop.gov.pk -> http://www.joomlaworks.gr
  • lyallpurmuseum.gov.pk -> http://wordpress.org/
  • sbi.gos.pk -> http://www.american-chillers.com
  • lrh.gov.pk -> http://www.freshjoomlatemplates.com
  • punjabfoodauthority.gov.pk -> http://www.rockettheme.com/
  • afic.gov.pk -> http://www.dhtml-menu-builder.com
  • pmic.gov.pk -> http://demo.icetheme.com/it_icemag
  • urbandirectorate.gos.pk -> http://joomla-extensions.kubik-rubik.de/
  • fcbalochistan.gov.pk -> http://www.joomla.org/
  • lieda.gov.pk -> http://www.freewebsitetemplates.com
  • ajkepa.gov.pk -> http://www.zootemplate.com
  • fic.gop.pk -> http://wowslider.com
  • lyallpurmuseum.gov.pk -> http://www.caretakerjobs.net/
  • lyallpurmuseum.gov.pk -> http://www.louisianamatch.com/
  • lyallpurmuseum.gov.pk -> http://www.californiamatch.com/meet/san-francisco-singles
  • lyallpurmuseum.gov.pk -> http://www.certifiedpublicaccountants.com/florida

Category 3:

  • vehari.gov.pk -> http://wp.me/p18wNc-aJ
  • vehari.gov.pk -> http://www.standardchartered.com.pk/
  • rtokarachi.gov.pk -> http://www.thefreedictionary.com
  • ltulahore.gov.pk -> http://en.wikipedia.org/
  • tmkhan.gos.pk -> http://www.paperpk.com/
  • larkano.gov.pk -> http://hotmail.com
  • home.gos.pk -> http://www.hotmail.com

Category 4:

  • qtp.gob.pk -> http://www.toponlinepoker.org/
  • lrh.gov.pk -> http://pokerfreaks.net/
  • pap.gov.pk -> http://www.g-ksa.net/vb
  • pap.gov.pk -> http://www.girls-top.net
  • pap.gov.pk -> http://bnat-games.girls-top.net
  • pap.gov.pk -> http://chat.te3p-qlbe.net
  • pap.gov.pk -> http://www.chat-qloob.com
  • pap.gov.pk -> http://www.g-ksa.net
  • pap.gov.pk -> http://www.upg-ksa.com
  • pap.gov.pk -> http://www.up-5.net
  • pap.gov.pk -> http://www.bn00.com
  • pap.gov.pk -> http://www.i00p.com
  • pap.gov.pk -> http://daleel.girls-top.net/newlink.html
  • ntcip.gov.pk -> http://www.zettu.net
  • ntcip.gov.pk -> http://www.filme-porno.cc
  • ntcip.gov.pk -> http://www.filmele-online.org
  • ntcip.gov.pk -> http://www.download-muzica.org
  • ntcip.gov.pk -> http://www.logibic.ro
  • ntcip.gov.pk -> http://www.porneata.com
  • ntcip.gov.pk -> http://ujocuri.ro
  • ntcip.gov.pk -> http://www.shadowaura.com/
  • healthkp.gov.pk -> http://www.960watch.com
  • healthkp.gov.pk -> http://www.airforce1fashion.com/air-force-1-classic-low-c-237.html
  • healthkp.gov.pk -> http://www.nikedunkshow.com/