Thursday, August 11

Technical Goals for 2011: Mid Year Review

Although we are already 2 months past the middle of the year, I decided to do a "mid year" review of my technical goals for this year to see how I am doing. Here we go:
  • HAML, Sass & CoffeeScript: No progress :(
  • SproutCore & Backbone.js: After a cursory look, I decided to explore Backbone.js for our upcoming project at Pothi.com. Still getting a feel for it.
  • Learn Haskell: Deferred
  • Git: Since Drupal moved to Git, it is hard to ignore it and I am beginning to get familiar with it. But given that we use trac, I don't think we will be leaving SVN anytime soon.
  • Android: No progress
  • SQL: Finally starting to dig into this. It was not a very planned effort, but in 6 months I managed to understand a few more nuts and bolts of MySQL. Ran a few EXPLAINs finally :). Learnt to generate the slow query log and also managed to fix some obnoxious queries sitting in Ubercart.
  • Beautiful Code: I realized that "finishing" Beautiful Code is not the right way to approach it. I read it one chapter at a time whenever I feel like it, and also re-read the earlier chapters to better understand what they are saying. Meanwhile, instead of "Founders At Work", I have acquired "Hackers & Painters" and will be reading that.
This doesn't look impressive at all. But there are two new sets of tools, not in my initial plans, that I have come to know and use. One is job queues/message queues. I used Celery to build a system for crunching the Streaming API of Twitter. I am also planning to check out beanstalkd and other job queues.

The other set is deployment & configuration management Tools like Cap, Fabric, Puppet and Chef. Puppet and Chef were not immediately useful for the task at hand but Fabric and Cap were. I decided to stick with Fabric since it is more generic in nature and it is Python. I am consciously trying to not get into one more language. Want to spend some time to get better with Python.

A related discovery was Vagrant which is a tool for provisioning Virtual Machines using Oracle VirtualBox. Using a combo of VirtualBox and Fabric, I am setting up a "few clicks" server for building and deploying for Pothi.com. Will also use it for all other projects once the basic code base is more stable.

The surprising thing is that looking back at my initial list, I now feel that it was all over the place with no core theme tying everything together. If I somehow manage to get to all of them by the year end, I would have gained familiarity with some good tools but I doubt I would be any good at using them. The tools need to be something that I can get down to doing real work with. Also, they should be something that improves my work. The things I ended up playing with fit that pattern. They fix some basic loopholes in my tool chain. So keeping that in mind, here is the new list:
  • CoffeeScript & Sass
  • Backbone.js
  • Celery/ MessageQueue
  • Fabric/Vagrant
On a related note, I submitted a talk proposal to PyCon India 2011 and it has been accepted. The talk is about various ways of implementing a Naive Bayes classifier in Python and comparing their performance and pros & cons. So I will probably be in Pune for 2-3 days in September. This is going to be my first technical talk since grad school. I need to finish the work I want to cover in the talk and then start working on the presentation.

And while I have been busy with this stuff, Jaya has been killing it with Python. She has been super productive since she started using Python a couple of months back and has already automated a significant part of our backend operations, thus opening up a lot of bandwidth. If I needed any more proof that Python is the one general purpose language everyone should pick up, I now have it! :)

Tuesday, May 10

The saga of plain text password

Recently, one of the major Indian payment gateways, CCAvenue was reported to have been hacked. Medianama has good coverage of it including an interview with the very bureaucratic sounding CEO of the company.

While a payment gateway getting hacked is big news, the bigger revelation was the clear text passwords that came out of the compromised database. There have been a lot of comments and discussions about this all over the startup blogs. Reading through those comments, it appeared to me that there is a lot of confusion regarding passwords and how to securely store and transmit them. Saurabh Nanda has a good little primer about things to read. This is my attempt to clarify some of the things involved.

First a few basics. Any situation that involves passwords has 2 parties. The aim is to establish identity between parties. For simplicity, we will assume that it is the user that wants to establish his identity with the service. Login/Password system works on the basis of a shared secret. You tell the service the correct shared secret and it identifies you. One important thing to remember here is that the secret is being established between the user and the service only and not between the user and the employees of the company providing the service. For example, we would not want a database admin in Google to be able to read all our mails.

On to the specific questions.

Is it ever OK to store passwords in plain text?

There are 2 scenarios of an application dealing with passwords. One is of a web app like Gmail that allows people to sign up and hence must keep track of login passwords. The other is of intermediate apps like browsers that store users' credentials for various services to make it more convenient for users. The answer for the first kind of application is "never". The answer for the second kind of application is "if done properly".

Basically, the applications in the second category need access to the original password when logging the user in to the service provider. So they cannot hash the password and store that. They can certainly encrypt it, but that may or may not be any more secure than a plain text password stored properly. Even here, adoption of OAuth is reducing the need to deal with user passwords directly.

What is the difference between Encryption and Hashing?

Encrypted things can be decrypted if the key is available. So it only pushes back the question of security one layer deep. How do you ensure the security of the key? You can encrypt that also but then you need to safeguard the key-key. You get the drift.

Hashing is a one way road. A good hashing function has 2 properties. First, it is extremely unlikely for 2 different inputs to produce the same hash value. Second, given the hash value, it is extremely difficult (computationally) to retrieve the input string. Because of the first property, we can safely store the hash value in place of the original password. Whenever we need to match, we can just compute the hash of what the user typed and match it to the stored value. Because of the second property, even if someone gets access to the hash values, they cannot recover the password easily.
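To make the store-and-match idea concrete, here is a minimal Python sketch. The function names `hash_password` and `matches` are made up for illustration, and plain SHA-256 is used only to show the mechanics; it is not meant as a recommendation for real password storage.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the hex SHA-256 digest of the password (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def matches(candidate: str, stored_hash: str) -> bool:
    """Hash what the user typed and compare it with the stored value."""
    # compare_digest compares in constant time, so the comparison itself
    # does not leak how many leading characters matched.
    return hmac.compare_digest(hash_password(candidate), stored_hash)


# The service keeps only this value, never the password itself.
stored = hash_password("correct horse")

print(matches("correct horse", stored))  # True
print(matches("wrong guess", stored))    # False
```

Note that the service never needs to reverse the hash: it only repeats the same computation on the login attempt and compares the results.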

Are all hashing functions created equal?

No. In the past decade, commonly used hash functions like MD5 and SHA-1 have been successfully attacked. So the current recommendation is to use SHA-256 or SHA-512. However, the suggested best choice for password hashing is bcrypt. The advantage of bcrypt is that you can tune it to be as slow as required. How does that help? Well, it significantly increases the resources required to mount a brute force attack. Normal hash functions are built to be very fast. As a result, an attacker can compute hashes of millions of passwords per second. With bcrypt, that number goes down several notches and makes the approach infeasible.
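The "tune it to be as slow as required" idea can be sketched in Python. bcrypt itself is a third-party package, so this sketch swaps in PBKDF2 from the standard library, which exposes a similar knob (an iteration count). The helper names, the random salt and the default of 200,000 iterations are assumptions made for the example, not something bcrypt or this post prescribes.

```python
import hashlib
import hmac
import os


def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a digest using PBKDF2-HMAC-SHA256 (stand-in for bcrypt)."""
    # A higher iteration count makes every single guess more expensive.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )


def make_record(password: str, iterations: int = 200_000):
    """Build the tuple a service would store instead of the password."""
    salt = os.urandom(16)  # assumed per-user random salt
    return salt, iterations, slow_hash(password, salt, iterations)


def verify(password: str, record) -> bool:
    """Recompute the digest with the stored parameters and compare."""
    salt, iterations, digest = record
    return hmac.compare_digest(slow_hash(password, salt, iterations), digest)
```

Raising `iterations` slows down the legitimate login check by a few milliseconds, but it multiplies the cost of every guess in a brute force attack by the same factor.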

Why is it OK to receive a one time password in plain text?

Since the service should never store the original password in plain text, if the user forgets the password, the only way out is for him to choose a new password. To allow that the service needs a way to establish his identity. This can be done with the help of security questions. A more popular way is to send a mail to the registered email address of the user.

Now if the reset password email contains your original password in plain text, that is a huge red flag. This means that the site stored your original password. Remember that it is not possible to recover original password from the hash.

But it is OK if the mail contains a one time password in plain text. This password is not meant to be stored or used more than once. So even if someone gets access to it later, there is no issue. On the other hand, if someone intercepts your mail and gets access to it before you do, even a hashed/encrypted password will not make much of a difference. The way to prevent the snooping is to use https for your mail and other sensitive connections.
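A one time password of this kind could be sketched in Python as below. The in-memory dictionary and the function names are hypothetical stand-ins for whatever storage and mail code a real service has. The choice to invalidate the token on the first attempt, right or wrong, is also just one possible design.

```python
import hashlib
import secrets

# Hypothetical in-memory store: email -> hash of the outstanding token.
pending_resets = {}


def issue_reset_token(email: str) -> str:
    """Create a random one time token; only its hash is kept server side."""
    token = secrets.token_urlsafe(32)
    pending_resets[email] = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token  # this is what would be mailed to the user


def redeem_reset_token(email: str, token: str) -> bool:
    """Accept the token at most once."""
    # pop() removes the entry, so a second attempt always fails.
    expected = pending_resets.pop(email, None)
    if expected is None:
        return False
    supplied = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return secrets.compare_digest(supplied, expected)
```

Because the token is random and dies after one use, a copy of the mail found later is worthless, which is why sending it in plain text is acceptable while sending the original password is not.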

Perhaps this is nothing new for most people but given that even Reddit guys were found storing plain text passwords, it is always good to double check on your security practices.

Monday, March 28

Cry of Anguish


The strength to rise after defeat is no longer in me,
That sweet tenderness of the heart is no longer in me.
Now, torn to shreds by the blows of the world,
I have become a fierce, violent, howling man-beast.
Now all my senses are busy guarding me.

The sweet hum of the bee is the twang of the Gandiva,
The flash of lightning is now a call to battle,
All around me I see the armies of my enemies,
The sound of breath is like a scream in the void.
My pride was like Abhimanyu's, now my anguish is like that of Drona's son!

Thursday, March 17

The Saga of Static IP

Recently we decided to get a static IP for our office broadband connection. We are a long time Airtel customer and usually not very annoyed with their service. They are quick to respond to complaints and things mostly work as they should.

We placed a request for a static IP and were told that it would require a 1 hour downtime to set up. 1 hour is no big deal and so we asked them to go ahead. Our connection went out at around 5pm on Thursday evening. Someone was coming to set up the router for static IP.

The person arrived at 9pm and started configuring the router confidently. We were hoping for a fairly quick and smooth process but suddenly disaster struck! The 4-5 steps which he had been taught didn't get the link up. After that it was one hour of him calling various people, trying out some really weird configurations and generally hitting refresh. After struggling with it for an hour, he told us that our router did not "support" the static IP. He promised to come back the next morning with another router. However, he would not be able to come before 11 or 12. So much for a 1 hour downtime!

The next morning, we were running the office on our mobile 3G connections and a Reliance Net Connect stick and waiting for him to turn up. When I called him at 11, he said that he had a meeting in the morning and wouldn't be able to come. He was sending someone else. This other person came around lunch time and managed to get the Internet running, but the router he brought was not a wireless router. We had a wireless router earlier and half of our machines run on wireless. We didn't have enough cables to connect all the machines. He again promised to come back with a wireless router soon. I had a hunch that I was making a mistake believing him but had no option. Our office had wires all around now with half of the machines off the network.

When we called him the next day, pat came the reply that the wireless router was out of stock. They had no idea when it would be back in stock and when they could give us one. Remember that we had actually paid extra to Airtel to get a wireless router. We lodged another complaint with the customer service. They promised to resolve it by the evening but it was clear from their tone that they really didn't consider this a serious problem. As a result of the complaint we again got a couple of calls from the local guys and they repeated the excuse that the wireless router was not in stock. Now I would have readily believed them if there was even an iota of sincerity in their voice. But it seemed like I was needlessly harassing them for wireless when I should have been thankful that at least my Internet was working!

After realizing that we were not going to get anything useful by breaking our heads with them, we decided to give the wireless router, which supposedly didn't "support" static IP, a chance. With some googling and common sense, we had our static IP and wireless running in half an hour! So much pain and frustration for something which should not have been a problem in the first place, if only Airtel had taken the time to train their field staff well.

Consider the situation. Airtel probably uses a handful of router models - 5 or maybe 10. The 2-3 most basic things to do with a router are setting it up for a dynamic address, setting it up for a static address and setting up the wireless. How difficult is it to equip all their field staff with printed instruction sheets for these 3 basic tasks for these 5-10 models? When taking the request for the static IP, they had asked us the model number of the router, so they already had that information. The only thing that the guy needed to do in our case was to delete the old config on the router and create a new one from scratch instead of trying to modify the old one. How difficult is it to mention this one fact in the instruction sheet for our model number? It would have saved them multiple phone calls to the support center, multiple field trips and an annoyed customer.

But instead of investing in things like this, they recently invested 100s of crores in changing the logo and brand identity. Somehow they fail to understand that a shiny logo and funky tune cannot make up for such bad service experiences. I can only shake my head in disbelief and frustration!

Saturday, March 12

From Low Priced Editions to Fair Priced Editions

A major group of Indian publishers is up in arms against a proposed amendment to the copyright act of India. Put simply, the said amendment allows the import of any edition of a title into India even if specific Indian editions are already available.

There are some genuine points both for and against the issue. However the debate has long since devolved into fear mongering and finger pointing. One of the interesting claims of the publishers is that the said amendment will also legalize the export of Low Priced Editions (LPEs) of text books and technical books back to the USA and UK. As a result, publishers in those markets are likely to stop giving licenses for LPEs.

I personally think that it is very far fetched. There is enough protection against such imports in USA/UK markets. Some short sighted foreign publishers might pull out but then that should not be the guiding factor of our policies anyway. However the reaction from publishers set me thinking in another direction.

Given its status as the outsourcing hub, a very young population and a growing number of people comfortable with reading in English, India is a big market for Technical Books and Text Books. Why is it then that Indian publishers are happy to be the printers of LPEs rather than develop their own titles in this market? It is estimated that 70% of the Indian book market is Text Books. This includes everything from Primary to Higher Education. Most of the text book publishers of India seem to be focused on the school segment. The titles that do come out in the Higher Education segment are not up to the mark: bad quality of writing & bad production value. And I am yet to come across a solid technical book (IT and CS are the areas I can vouch for) by Indian authors, published by an Indian publisher. Most of the known names (O'Reilly/Shroff, Pearson, Prentice Hall) basically bring out LPEs of titles originally written and produced outside.

With the growing number of good techies in India, there should be no dearth of possible authors in India for technical subjects. Recently one of the startup founders wrote a book on SaaS. There is also an increasing number of open source contributors in India. However, due to the fast changing nature of technology, the technical publishing is also a very quick moving market. To survive it today requires adoption of technology, quick adaptability to market and out of the box thinking. The competition is intense, especially from the increasingly high quality free content available online. Indian authors will typically need more hand-holding as compared to their foreign counterparts. But the size of the opportunity seems to be large enough to be worth the risk.

We can either let someone else do all the hard work and be happy publishing LPEs or we can go out and carve out a piece for ourselves. Then we can throw away this tag of LPE and have our own Fair Priced Editions. Given the amount of changes happening in the publishing industry currently, I believe that there is a window of opportunity here. I just hope that there are people in Indian publishing industry who see the possible threat to LPEs as an opportunity!

Thursday, February 3

Finding Geeks!

That the education system in India is screwed up is no secret. When companies like Infosys, TCS and Wipro say that a large number of engineers graduating every year are unemployable, it sets a pretty low bar on the quality of technical education.

The low standards of technical education are felt most acutely by the entrepreneurs trying to build technology companies in India. On one hand, very few good engineers survive the education system. On the other, the few who survive are claimed by the companies like Google, Yahoo, Amazon and other well established firms.

Now it would be excellent to tackle this problem head on and fix the education system in a fundamental way. However I have been thinking about a much smaller problem recently. As a matter of immediate relief to the high technology startups, is it possible to help more geeks/hackers to survive the current system? What is the most effective way of doing that? Out of that huge swamp of mediocrity, how can we give a helping hand to those who want to excel?

An immediate question arises: why do I even think that something like this can be done? My inspiration, ironically, comes from some of the recent laments about the state of CS education in India and the fact that we have hordes and hordes of students signed up for these courses who do not have any passion or inclination for CS. However, in the true spirit of optimism, I invite you to see this glass as half full instead of half empty.

I spent about 2 years in a grad school in the USA. One of the important concerns of the universities there was that fewer and fewer students were opting for computer science in high schools. As a result, the pool from which the top undergraduate programs used to get their students was shrinking. One of the reasons was that CS was not thought of as a cool subject anymore. It was associated with the geeks and nerds and with having to sit in a cubicle all day staring into a monitor. Many outreach programs were being run by the universities to reach out to high school students, convey the excitement of CS and show them the cool things they could do with it.

In India, we have exactly the opposite situation. We don't need any outreach programs to convince people to come and try computer science. The number of people attempting JEE is now up to 500,000 from 120,000 about a decade back. Every year thousands of students enroll in CS programs and lakhs in engineering programs, all across the country. Yes, many of them do it because of the herd mentality. Most of them have no passion for it. But at the end of the day, we do get a big pool of people who have signed up for a technical education. The question to ask is, how can we leverage this situation?

Let me first say something about passion. When I joined IIT in 1999, I had almost never touched a computer. I used to find it difficult to control the mouse to click on a link. I had never written a program. I had no idea what computer science was. I opted for it only because people around similar ranks used to do that. 2.5 years later, I qualified for the ACM ICPC World finals. That is because once I took to computers, I loved them. Fortunately, I was in an environment that provided ample opportunities to develop my interests and an awesome set of peers to interact with. ICPC played a major role in motivating me to become better at coding & algorithms. The take away is that passion can arise after having experienced something. If we insisted that only those interested in CS choose CS, in Indian context we would be doing ourselves a big disservice.

Coming back to the big pool of students, one of the ways to hack this problem is building what I will refer to as geek magnets. What is a geek magnet? It is an activity/group which attracts geeks or possible geeks with a lot more force than non-geeks. Two examples are ACM-ICPC & Google Summer of Code (GSoC). Another example is Y Combinator (YC). IITs are NOT an example. Being completely optional is a primary criterion. In fact, an ordinary person should not see any value in participating in it and even if he does, it should be offset by the requirement of huge effort on his/her part.

Even inside IIT, participation in ICPC was the single biggest differentiator of hackers from non-hackers (in CS). It also attracted students from all over India who were excellent geek material. While some of them came from well known colleges, a lot of them came from not so famous places. Unfortunately, there was no community effort around ICPC and so there is no strong legacy.

GSoC was not around when I was in college otherwise I am pretty sure I would have signed up. Fortunately the program also has a strong community component which helps participants connect even after the actual program is over. The program manages to attract students from every corner of India. Best thing is that once someone from one college has participated, you will find regular participation from that college in coming years.

I believe that creating more such "Geek Magnets" and sending them out fishing in the big pool is one way of finding and nurturing the hackers/geeks hidden in the vast pool of students we currently have. We need ways of pulling out all these minds and giving them spaces to connect with each other and with other geeks. It is the peer interaction that they are lacking in their present situation. We need to provide ways for them to try out and experience geeky activities. Fortunately, with the wide availability of the internet, creating and sustaining such initiatives is possible now.

Here is a more specific example. Many academic conferences in CS run competitions to gauge the state of the art in a chosen focus area. For example, the Workshop on Statistical Machine Translation runs a competition to build MT systems every year. Participants are usually the research groups working in the domain. How about a similar but limited scope competition targeted specifically towards Indian students? It would allow them to go beyond their curriculum and experience building a real life system. The submissions can be open sourced and made available for future participants to come and see. Finding a small amount of money to fund the prizes should not be a big deal, especially when so many companies want to find and recruit these guys.

So to summarize, in my opinion and in Charles Dickens's paraphrased words, "It is the best of times, it is the worst of times..". While having a large unmotivated student crowd in a bad education system is a huge problem, it is also a very fertile ground for designing, executing and testing strategies to identify and nurture future geeks. As entrepreneurs and hackers, it does not suit us to sit on the fence and wait for the govt to build a better education system or for society to change its attitude. The wide internet penetration among the student population provides us with enough leverage to build parallel systems to find those of our kind and help them survive the education system.

So are you up for a little hacking of the education system?

Saturday, January 1

The Technical Resolutions for 2011

When running your own company, every year is full of new challenges and learnings. And so was 2010. However, looking back, I feel that I didn't gain much on the technical side.

Recap

The two major things that happened on Pothi.com this year were e-books and the launch of online distribution. Our e-book platform is in its first iteration and still very primitive. We have identified many issues by now which will get fixed in the next installment. Online distribution required a lot of overhaul to the basic pricing mechanism, which meant a bunch of new Drupal modules. And as the year ends, I am just about wrapping up the new sales dashboard which was long overdue!

We also worked on a couple of interesting projects over the summer with 2 interns, both of which were in Python/Django. One of them made further improvements to the blog2book platform. It is a shame that it has not been pushed to production yet, primarily due to the memory troubles we are having on the VPS, which I have been unable to resolve with Apache. The other project was also close to my heart but has been deprioritized for the moment. We cannot stretch ourselves too thin. Right? :)

The third set of technical work came from the consulting gig where I am working on some NLP/ML stuff. For the first time, I worked with Lucene and returned to Java after a gap of a few years. Lucene is a nice technology and replacing the Google CSE based search on Pothi.com is high on my agenda for the next year. I also did some work on anaphora resolution which I hope will bear fruit in 2011!

So apart from Lucene, I mostly stuck to the well known grounds of Drupal and Django. Not to say that there are no technical challenges in those two but a geek always needs new toys to play with :). So for the next year, I have made a list of technical stuff I want to explore:


HAML, Sass & CoffeeScript
Basically HAML, Sass and CoffeeScript are better ways of writing HTML, CSS and JS respectively. I have the basic idea about the first 2 but have never used them in my projects. This year I will try them out in some live project.


Explore SproutCore, Backbone.js
Some of the coming projects on Pothi.com will be heavy on the client side. I have been eyeing both SproutCore and Backbone.js for some time. Now to get my hands dirty :)


Learn Haskell - write a non-trivial application in it
Since I started learning Python last year, my affection for functional programming has been increasing. In fact, I have started using more and more array_reduce, array_map and array_filter combined with closures in PHP as well. Also, returning to Java has only made me more impatient with verbose languages. So time to dig into some Haskell, I say!


Explore Git. Release at least one open source project on Github
Yes, shame on me. But I am an SVN user and I like Trac. However, Git combined with Github is more than a version control system. It is now a social system and I am already feeling left out. Need to get on it asap.


Explore the android platform. Develop at least one app.
I bought an LG Optimus One P500 Android phone a couple of weeks back. Already addicted to Angry Birds :). But it looks interesting as a platform. Also constrained resources == more fun programming.


Learn SQL better. Learn MySQL better. Learn all about database optimization.
So I have never run an EXPLAIN on any SQL query. Part of the blame must go to Drupal, where I survived 1.5 years and many modules before writing any database queries. But now that I am doing more DB stuff in Drupal also, this will change.


Finish reading Beautiful Code. Read Founders At Work.
Beautiful Code was one of my rare impulse purchases online. I have read a few chapters and am enjoying it. If I manage to finish it within 2011, I will buy Founders At Work.

I know it doesn't look very ambitious. Most of this stuff is already old and past the cutting edge. But let's say this is the lowest to which the bar is set. Let us see how high I can go.

On this note, I wish all of you a very happy and productive new year! Hack on!