Friday, February 4, 2011

The War Between Search Engine Giants: Google and Bing

The search engine industry is heating up these days, with a war under way between the two search giants, Bing and Google. Did I say war? Yes, you got that right! Last week the two companies were engaged in a heated public exchange on Twitter, with Google accusing Bing of copying its results.

Here's the link to Google's official view: Google's statement. Google performed some experiments that led it to conclude that Bing results come directly from Google, and whoa, that caused an outrage in the search engine industry. This was soon followed by a defense from Microsoft claiming to set the record straight. That worked for a while, because Microsoft had some convincing arguments that the use of clickthrough data does not amount to copying results. As any search engine expert knows, this makes sense: many SIGIR, ECIR, WWW, and WSDM papers use clickthrough data too. In simple words, it is a learning activity performed to improve the relevance of search results. Here are the punch lines from Bing:

"We do look at anonymous click stream data as one of more than a thousand inputs into our ranking algorithm. We learn from our customers as they traverse the web, a common practice in helping to improve a wide array of online services. We have been clear about this for a couple of years (see Directions on Microsoft report, June 15, 2009)."

It seemed that it had all ended, but Matt Cutts does not want to let go of it that easily, and his latest post on the subject includes a forty-minute video with some good and, well, some not-so-good points. Here's the video:

The fight does not seem likely to end soon, and the search engine industry now stands at a crucial point, with experts in the field divided between pro-Google and pro-Bing camps. One thing is for sure: the war between these corporate giants will give us many interesting insights into how the search engine industry works. Feel free to add any comment or point of view.

Sunday, February 28, 2010

[Video]: Perspective of Search Engines on Web Spamming

The following is the keynote talk given by Matt Cutts at the Web 2.0 Expo, "What Google Knows About Spam".

Thursday, January 14, 2010

Microsoft caught violating open source code licenses

Why do smart people prefer open source? Learn from Microsoft, the double-faced king of proprietary solutions. They preach what they never practice, and practice what they never admit. Microsoft CEO Steve Ballmer called Linux a cancer[1], while the same cancer is now known and proven to be flowing in the veins of Microsoft[2]. People like me are now really wondering how Steve Ballmer will save Microsoft from his self-proclaimed cancer. The story came out when Microsoft was caught red-handed ripping off GPL-licensed code while packaging a quick release of the Windows 7 USB/DVD Download Tool (WUDT)[3][4].
If you think this is the only such story about Microsoft, have a look at this one, where Microsoft was caught using Linux device drivers (network)[5][6].
Though this article is written in a humorous way, the reality of such a crime could not be explained any better at this point. Readers are encouraged to share their comments and feedback.

Friday, January 8, 2010

Learning from the Civilization of Giants: Beyond Google's Philosophy

Google's PageRank has revolutionized the world of search engines bringing new dimensions to the problem of finding web pages' importance, but over the years the research community has identified some problems in the famous PageRank algorithm: the problem of Spamming.

Spamming has been an issue of interest for much of history, and it is only Islam that kept its teachings well protected against spamming and spammers. The Quran (the holy book of Islam) is not my example, as it is protected by Allah (no human involvement). But the Ahadith (traditions of Muhammad RasulAllah SAW) and the science of their preservation form a unique body of text that has been maintained successfully over a period of a thousand years. The way this knowledge was categorized and purified for the generations to come is an exceptional work of Muslim scholars and maintainers (the philosophy of maintaining a balance is a deep, innovative and powerful science that was once practiced successfully in the history of the world). It can therefore serve as a foundation for "modern information retrieval well protected against spammers."

If you still don't get me, here is the real deal: the Bible, the Torah and other scriptures could not manage to survive against spammers, but the Ahadith, through their unique science, fared better. Now recall that we are computer scientists, aren't we? The World Wide Web suffers from the same old problem in its search results: the problem of untrusted information.

As the Google guys say, the foundation of their ranking idea comes from voting (some overstate it by calling it a democratic process, with which I disagree [1]). I am shocked that researchers in the field of information retrieval are not aware of the masterpiece produced by the civilization of Islam; otherwise they would not have met the ghost of Web spamming in 2005 as they did. Yes, it is still an area of active research, as the problem has not been solved. There is really no loss of pride in learning the trick from the greatest civilization (Islam)[2], but if that does not happen, then I am sure it will take another Islamic civilization to settle things for good. The philosophical ideas adopted in the West are too outdated to handle this research problem and can't breathe new life into it any more, because they have reached a point where settling for one patch after another is considered the norm. At the least, a thorough survey should be carried out before announcing that "patch is our remaining option". In the information retrieval research area only very limited literature is available, as it is an evolving field, and for this reason I feel that the world of science should not be so biased as to ignore the greatest civilization of history and its contributions[2]; otherwise that would lead to "reinvention of the wheel".

Here I want to draw attention to one person who shared something similar to my opinion: Carly Fiorina, ex-CEO of Hewlett-Packard (HP). In her talk "TECHNOLOGY, BUSINESS AND OUR WAY OF LIFE: WHAT'S NEXT"[3] she brings out this point of learning from Islamic civilization. The following is an extract from her talk:

I’ll end by telling a story.

There was once a civilization that was the greatest in the world.

It was able to create a continental super-state that stretched from ocean to ocean, and from northern climes to tropics and deserts. Within its dominion lived hundreds of millions of people, of different creeds and ethnic origins.

One of its languages became the universal language of much of the world, the bridge between the peoples of a hundred lands. Its armies were made up of people of many nationalities, and its military protection allowed a degree of peace and prosperity that had never been known. The reach of this civilization’s commerce extended from Latin America to China, and everywhere in between.

And this civilization was driven more than anything, by invention. Its architects designed buildings that defied gravity. Its mathematicians created the algebra and algorithms that would enable the building of computers, and the creation of encryption. Its doctors examined the human body, and found new cures for disease. Its astronomers looked into the heavens, named the stars, and paved the way for space travel and exploration.

Its writers created thousands of stories. Stories of courage, romance and magic. Its poets wrote of love, when others before them were too steeped in fear to think of such things.

When other nations were afraid of ideas, this civilization thrived on them, and kept them alive. When censors threatened to wipe out knowledge from past civilizations, this civilization kept the knowledge alive, and passed it on to others.

While modern Western civilization shares many of these traits, the civilization I’m talking about was the Islamic world from the year 800 to 1600, which included the Ottoman Empire and the courts of Baghdad, Damascus and Cairo, and enlightened rulers like Suleiman the Magnificent.

Although we are often unaware of our indebtedness to this other civilization, its gifts are very much a part of our heritage. The technology industry would not exist without the contributions of Arab mathematicians. Sufi poet-philosophers like Rumi challenged our notions of self and truth. Leaders like Suleiman contributed to our notions of tolerance and civic leadership.

And perhaps we can learn a lesson from his example: It was leadership based on meritocracy, not inheritance. It was leadership that harnessed the full capabilities of a very diverse population–that included Christianity, Islamic, and Jewish traditions.

This kind of enlightened leadership — leadership that nurtured culture, sustainability, diversity and courage — led to 800 years of invention and prosperity.

Muslims lost their civilization and their system of Caliphate by overlooking the fundamentals of their civilization, and finally they lost their say in this world. However, this should not cause scientists to ignore the past contributions of Islamic civilization.

Many civilizations have passed in this world, and the current civilizations will perish too, but the language of reason is always respected among sensible people.


Monday, November 23, 2009

[24-Nov-2009] Pic of the day: Chrome's mission: Making Windows obsolete

Video of demo:



Some people are already convinced that Google will fail with its Chrome operating system. Others think that Chrome can't possibly be a threat to Windows. Both groups are so, so wrong.

First, for those who think that Chrome is simply a failure from the word "go", their reasoning is pathetically flawed. They argue that Chrome will fail because it's based on Linux. What century are these people from?

The specific complaints, such as "From power management to display support, Linux has long been a minefield of buggy code and half-baked device driver implementations." reveal that they're coming from people who know nothing whatsoever about Linux. Linux is tried and proven.

You don't have to believe me, though. Just look at the world around you. Linux rules on devices from your TiVo DVR to your Droid smartphone to you name it. Linux kicks rump and takes names on supercomputers, where nothing else is even competitive. And Linux rules stock markets, where failure is never an option.

The only place where Linux hasn't been a strong competitor has been on the desktop. There are many reasons why desktop Linux hasn't done well: number one has been Microsoft's desktop monopoly. With Google's backing, however, Chrome avoids the Linux desktop's real problems.

The other complaint, that somehow the Web interface isn't sufficient, also flies in the face of reality. Google has been showing us for years now that almost everything you can do on a computer, you can do with a Web interface. So what if the interface itself isn't groundbreaking?

What is revolutionary is that Google isn't trying to fight with Microsoft in a mano-a-mano battle for the desktop. No one, especially not Google, is claiming that Chrome OS is a direct competitor to Windows 7. At the high end, where power users use applications like Autodesk or Photoshop, Chrome simply won't play.

Instead, Google is saying that, for most users, most of the time, Windows is obsolete. And it's not just Windows: Google is telling us that we don't need Office, Outlook, and all the other day-in, day-out Windows applications, either.

Google suggests that inexpensive Chrome OS devices, not Windows PCs, are all that most people need for most of their home and office computing. With Chrome OS devices and Web-based services, you won't need to pay the Windows tax or buy Microsoft Office.

It's a radical approach. Google is saying: sure, go ahead and use Windows where you have to — but keep in mind that, for your second computer, or if you don't need high-end PC-specific applications, Chrome OS is all you'll need.

I can see this working. Chrome OS is faster, safer and cheaper. In addition, unlike Windows PCs, Chrome laptops won't require monthly maintenance to keep them running well. In short, Google is trying to make Windows, and all the software that goes with it, obsolete for most users, most of the time.

I like this plan — I like this plan a lot. Rather than trying to take Windows head on, Google is using 21st century technology to reinvent the desktop operating system and question just how important the 1980s style desktop is today. You'll know it's working even before the first Chrome OS netbooks appear if Microsoft revamps Windows 7 Starter Edition to make it more fully functional and cheaper. Keep your eyes on Chrome OS and Microsoft's reactions against it. I'll be very interested to see how this plays out.

Friday, November 13, 2009

[13-Nov-2009] Tech Talk of the Day: Google File System A Critical Analysis

Google File System: A Critical Analysis

Who does not know Google? It would not be wrong to say that Google has become a vital need in today’s information age. But have you ever wondered about the driving force behind Google, about what makes Google stand out? There are many answers to this question, but one of the most prominent pieces of research by Google engineers is the Google File System (GFS), presented in a paper at the well-known SOSP conference in 2003.

GFS is a distributed file system highly customized for Google's computing needs and for clusters composed of thousands of commodity disks. GFS uses a simple master-server architecture based on replication and auto-recovery for reliability, and is designed for high aggregate throughput. The file system is proprietary and has been used to serve Google’s unique application workloads and data processing needs.

Why GFS?
Traditional file systems are not suitable for the scale at which Google generates and processes data: multi-gigabyte files are common. Google also uses inexpensive commodity storage, which makes component failures all the more common. Google's data update patterns are specific: most updates append data to the end of a file. Traditional file systems do not guarantee consistency in the face of multiple concurrent updates, and using locks to achieve consistency hampers scalability because the locks become a concurrency bottleneck.

GFS Details
The diagram below presents the fundamental architecture of Google File System:

A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients. Files are divided into fixed-size chunks, and each chunk is identified by a chunk handle. A large chunk size is chosen for better performance.

The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers. The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas. All metadata is kept in the master’s memory. The first two types (namespaces and file-to-chunk mapping) are also kept persistent by logging mutations to an operation log stored on the master’s local disk and replicated on remote machines. The master does not store chunk location information persistently.

GFS client code linked into each application implements the file system API and communicates with the master and chunkservers to read or write data on behalf of the application. Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers.
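The metadata tables described above can be pictured with a small in-memory model. This is only an illustrative sketch, not Google's code: the `Master` class, its method names, and the server names are made up for the example, and the 64 MB chunk size is the figure reported in the GFS paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size reported in the GFS paper


@dataclass
class Master:
    """Toy model of the master's in-memory metadata (hypothetical names)."""
    # File path -> ordered list of chunk handles. In GFS this mapping is
    # also made persistent through the operation log.
    file_chunks: Dict[str, List[int]] = field(default_factory=dict)
    # Chunk handle -> chunkserver addresses. Not persisted by the master.
    chunk_locations: Dict[int, List[str]] = field(default_factory=dict)
    next_handle: int = 1

    def create_chunk(self, path: str, servers: List[str]) -> int:
        """Append a new chunk to `path` and record where its replicas live."""
        handle = self.next_handle
        self.next_handle += 1
        self.file_chunks.setdefault(path, []).append(handle)
        self.chunk_locations[handle] = list(servers)
        return handle

    def lookup(self, path: str, offset: int) -> Tuple[int, List[str], int]:
        """Translate (file, byte offset) into (handle, replicas, offset in chunk)."""
        index = offset // CHUNK_SIZE
        handle = self.file_chunks[path][index]
        return handle, self.chunk_locations[handle], offset % CHUNK_SIZE


master = Master()
master.create_chunk("/logs/web.log", ["cs1", "cs2", "cs3"])
master.create_chunk("/logs/web.log", ["cs2", "cs4", "cs5"])
handle, servers, inner = master.lookup("/logs/web.log", CHUNK_SIZE + 10)
print(handle, servers, inner)  # 2 ['cs2', 'cs4', 'cs5'] 10
```

After such a lookup the client would talk to one of the returned chunkservers directly, which is why the master only ever sees small metadata requests.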

Permissions for modifications are handled by a system of time-limited, expiring "leases": the master grants permission to a process for a finite period of time, during which it will not grant any other process permission to modify the chunk. The modifying chunkserver, which is always the primary chunk holder, then propagates the changes to the chunkservers holding the backup copies. The changes are not saved until all chunkservers acknowledge, thus guaranteeing the completion and atomicity of the operation.
Programs access the chunks by first querying the master for the locations of the desired chunks. If the chunks are not being operated on (that is, there are no outstanding leases), the master replies with the locations, and the program then contacts the chunkserver and receives the data from it directly (similar to Kazaa and its supernodes).
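The lease idea can be sketched in a few lines. This is a simplified model of the description above, not the real GFS protocol: `LeaseTable` and its methods are hypothetical names, and the 60-second default is the initial lease timeout mentioned in the GFS paper. The clock is injectable so the example does not have to wait for real time to pass.

```python
import time


class LeaseTable:
    """Toy model of time-limited leases: at most one holder per chunk."""

    def __init__(self, duration=60.0, clock=time.monotonic):
        self.duration = duration  # lease lifetime in seconds
        self.clock = clock        # injectable so the example can fake time
        self.leases = {}          # chunk handle -> (holder, expiry time)

    def grant(self, handle, holder):
        """Return True if `holder` now holds the lease on `handle`."""
        now = self.clock()
        current = self.leases.get(handle)
        if current is not None and current[1] > now and current[0] != holder:
            return False  # someone else holds an unexpired lease
        self.leases[handle] = (holder, now + self.duration)
        return True

    def outstanding(self, handle):
        """True while some holder has an unexpired lease on `handle`."""
        current = self.leases.get(handle)
        return current is not None and current[1] > self.clock()


now = [0.0]  # fake clock, advanced by hand
table = LeaseTable(duration=60.0, clock=lambda: now[0])
print(table.grant(7, "cs1"))  # True
print(table.grant(7, "cs2"))  # False
now[0] = 61.0                 # the first lease has expired
print(table.grant(7, "cs2"))  # True
```

Because a lease simply expires, the master never has to contact a crashed holder to take the permission back; it only has to wait out the timeout.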

Critical Analysis

1) The authors and developers of Google File System make trade-offs aggressively to their advantage. Unfortunately, the only other people in the world who could benefit from these decisions were other people at Google, or perhaps their direct competitors (and not for long, it appears).

2) The chunkservers run as user-level server processes, which is less efficient than implementing the file system directly in the kernel.

3) Most consistency checks are pushed to the application, which needs to maintain ids/checksums to ensure that the records are consistent. Google built not only the file system but also all of the applications running on top of it. While adjustments were continually made in GFS to make it more accommodating to new use cases, the applications themselves were also developed with the various strengths and weaknesses of GFS in mind, and this approach makes the life of the application developer quite difficult.

4) Clients that cache chunk locations could potentially read from a stale replica.

5) One flaw of the design is the decision to have a single master, which limits the availability of the system. Although the authors argue that a takeover can happen within seconds, I believe that the most important implication is that a failed master might mean that some operations are lost, if they have not been recorded in the log. Relying on a quorum among multiple masters seems a straightforward extension and can provide better performance.
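To make point 3 concrete, here is a minimal sketch of the kind of bookkeeping an application might do on its own records: each record carries an id and a checksum, and the reader drops corrupt records and duplicates. The record layout and the function names are assumptions made for this example, not the format used by Google's applications.

```python
import struct
import zlib

HEADER = ">QII"  # record id (8 bytes), payload length (4), CRC32 (4)
HEADER_SIZE = struct.calcsize(HEADER)


def pack_record(record_id: int, payload: bytes) -> bytes:
    """Frame a record as header + payload so a reader can validate it."""
    header = struct.pack(HEADER, record_id, len(payload), zlib.crc32(payload))
    return header + payload


def read_valid_records(data: bytes):
    """Yield (id, payload) for intact records, skipping corrupt ones and duplicates."""
    seen = set()
    pos = 0
    while pos + HEADER_SIZE <= len(data):
        record_id, length, crc = struct.unpack_from(HEADER, data, pos)
        payload = data[pos + HEADER_SIZE : pos + HEADER_SIZE + length]
        pos += HEADER_SIZE + length
        if len(payload) != length or zlib.crc32(payload) != crc:
            continue  # truncated or corrupt payload
        if record_id in seen:
            continue  # same record written more than once
        seen.add(record_id)
        yield record_id, payload


log = pack_record(1, b"a") + pack_record(2, b"b") + pack_record(2, b"b")
print(list(read_valid_records(log)))  # [(1, b'a'), (2, b'b')]
```

Note that this sketch only detects damaged payloads; a damaged header would break the framing, which hints at how much care the application developer has to take once these checks are no longer the file system's job.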

Friday, October 2, 2009

[01-Oct-2009] Interview of the Day: Future of Programming

Sorry for the delay in posting. Being a computer science researcher myself, I had certain tasks to accomplish, so I was away from my blog for a bit. But now I am back, this time with a whole new series of "Interviews."

Today I share an interview with Paul Graham, the inventor of Arc, from the ACM student magazine. But this series does not stop here: I will publish interviews with Computer Science students, industry professionals and researchers, who will share their viewpoints and experiences as a contribution to the Computer Science community as a whole.

Paul Graham was co-founder of Viaweb, the first ASP; discovered the algorithm that inspired the current generation of spam filters, is co-founder of Y Combinator, a new seed venture firm, started the Spam Conference and the Startup School, is working on a new Lisp dialect called Arc, wrote two books on Lisp and a book of essays called Hackers & Painters, and is writing a new book about startups. He has a PhD in CS from Harvard and studied painting at RISD and the Accademia in Florence.

In the following interview Graham discusses the future of programming, outsourcing, and Y Combinator.

Where do you see programming as a discipline in five, ten, or twenty years?

I think in the future programmers will increasingly use dynamic languages. You already see this now: everyone seems to be migrating to Ruby, which is more or less Lisp minus macros. And Perl 6, from what I've heard, seems to be even more Lisplike. It's even going to have continuations.

Another trend I expect to see a lot of is Web-based applications. Microsoft managed to keep a lid on these for a surprisingly long time, by controlling the browser and making sure it couldn't do much. But now the genie is out of the bottle, and it's not going back in.

I don't think even now Microsoft realizes the danger they're in. They're worrying about Google. And they should. But they should worry even more about thousands of twenty year old hackers writing Ajax applications. Desktop software is going to become increasingly irrelevant.

What has your experience developing a new programming language, Arc, been like?

Interrupted. I haven't spent much time on it lately. Part of the problem is that I decided on an overambitious way of doing it. I'm going back to McCarthy's original axiomatic approach. The defining feature of Lisp, in his 1960 paper, was that it could be written in itself. The language spec wasn't a bunch of words. It was code.

Of course as soon as his grad students got hold of this theoretical construct and turned it into an actual programming language, that plan came to a halt. It had to, with the hardware available then. But with the much faster hardware we have now, you could have working code as the entire language spec.

I hope to get back to work on Arc soon. One of the reasons Y Combinator operates in 3-month cycles is that it leaves me some time to work on other stuff. (The other is that it's actually the right way to do seed investing.)

What is starting a startup incubator like?

Y Combinator is not really an incubator. Incubators interfere a lot in the startups they fund, even to the point of making you work in their building (which is where the name "incubator" comes from). I think the reason we get called an incubator is that we fund startups at the very beginning, and till now the only companies doing that have been incubators. Really, we're a new kind of thing, but because there's only one of us, there's no name for it.

Several things have surprised me about it. The biggest surprise is that it worked, or seems to be working so far. We had no idea what would happen if we just gave smart hackers some money and let them work on whatever they wanted. Fortunately the first batch turned out really well.

Another surprise is how much work it was. I'd hoped it would be a part-time job, but it hasn't been so far.

I'm also surprised at how fun it's been. I really like the founders. Many of them have become personal friends. And most of their startups are working on interesting, novel stuff. There's a new startup boom happening now, so there's a feeling of excitement around the Web generally, but it's especially concentrated when you have eight startups founded at the same time by young guys who all (now) know one another.

Why did you start Y Combinator?

Originally it started almost by accident. I gave a talk at Harvard about how to start a startup. In it I said that would-be founders should get their initial funding from individual rich people called "angels," and that the best angels were people who'd made their money in technology. And then, worried that I'd be deluged with business plans, I added: "but not me." I was kind of joking, but not entirely.

Afterward I felt bad about this. So I figured out a way to give seed money to startups without being deluged with pitches. We would start a company to do it, and tell people to send the pitches to the company. Of course I end up reading them in the end, but it gets concentrated into a couple weekends a year.

So the original motivation for Y Combinator was to avoid work, but as so often happens, I got sucked into it and I'm constantly coming up with new schemes that require me to do more work. Like the Startup School we just organized this October.

One of the startups we funded this summer was started by two guys who were in the audience at that original Harvard talk. And better still they're one of the more successful startups. Their site, Reddit, is so useful that almost everyone who was around Y Combinator this summer is now genuinely addicted to it, including me. It's the first site I look at every morning and the last I look at every night.

What advice can you give to aspiring entrepreneurs?

I've written a lot about this, so generally I'd advise reading the essays about startups on Especially "How to Start a Startup" and "Hiring is Obsolete."

The most important piece of advice is just: go do it. A lot of people in their early twenties are intimidated by the idea of starting a company and feel they're not ready. Actually they have a huge advantage they don't even know they have: they're not tied down.

If you don't have kids yet, you can (a) work long hours without feeling you're neglecting them, (b) live on nothing, (c) move anywhere, and (d) afford to fail. The last is the most important of all, because it means you can take risks, and risk and reward are always proportionate.

What is your position on outsourcing programming/tech jobs, and where will this lead the US?

I'm in favor of free trade in this as in everything else. If you can get a job done cheaper in another country, great. Protectionism almost always turns out to be a loss, even for the country that's supposedly being protected. It may benefit some small group within the country, but usually at the expense of everyone else.

In any case, I don't think outsourcing per se is much of a threat. I bet much of the time it's just a symptom of using a language that's not abstract enough. In effect you're using the programmers in India or wherever as human compilers.

The danger to the US is not the outsourcing of implementation, but that whole applications will get designed and implemented entirely overseas. But if other countries can develop software better than us, they deserve to win.

My guess is that they won't be able to, incidentally. You need a special environment to develop really novel technology. It's not just that you won't necessarily find this environment in India or China; you don't find it in 99% of the US either.

What motivates or inspires your work on a daily basis?

I keep having ideas for new things to do. It's almost pathological. Mostly bad ideas, of course. But I have various tricks for filtering out those. (One of the best is asking friends.)

At any given time I'm in the grip of some scheme or other. These vary greatly in size. Some take a couple hours and others take years. The scheduling algorithm is totally random. I just work on whichever I feel like at the moment.

This may sound disorganized, but I've found that planning doesn't work well. It forces you to work on stuff you're not interested in, and then you do a bad job.

The main motivator is the schemes themselves. Once you have an idea, it would be a shame to waste it. But if there is an underlying goal, it's to make stuff that will last. That's one reason I avoid writing about politics. A lot of famous writers wasted years and years writing about controversies of their time that no one cares about now, because they were just cases of the star-bellied sneetches versus the plain-bellied ones.

What is your problem-solving approach or strategy?

That's a hard one to answer. I have a thousand and one tricks.

One thing I try to do is treat the world like math. Good mathematicians are good at visualizing problems. They can see how things must be. Actually writing down the steps must often be mere transcription, or at least, implementation.

I try to understand non-math things so well that I can rotate and rearrange them in my head like that -- so I can see how things must be, then just write it down.

For example, I try to understand history so well that I can run thought experiments in my head. How would a Roman legionary or a medieval Flemish merchant seem if we could bring one forward in a time machine? If someone like Hitler took over the US now, who would be the first recruits marching with him in the street, and who would resist? Was the European domination of the rest of the world inevitable, or due to one or two random events in Chinese politics? (Diamond wrote about the easy question. The real question is: why not China?)

What advice do you have for our readers to succeed in the current tech job market?

Are they sure they want jobs? Maybe some of them would prefer to start their own companies.

In either case the single most important thing is to work on one's own projects. When we hired hackers at our startup, this was practically the whole interview: what have you built on your own, outside of school or work?

We asked that partly to tell if someone was a real hacker, since anyone who likes to hack will invariably be working on schemes of their own. (Unless they've been working at a startup, which could well absorb 100% of one's energy.)

A lot of employers have learned this test. Both Yahoo and Google seem hot to hire people who've made a name for themselves by creating admired open-source projects -- to say nothing of venture capitalists.

The other reason to work on your own projects is that that's the best way to learn. You learn by doing, and you'll work more energetically on something that interests you, and that you own.


If you are interested in having your interview on this blog, just drop a comment with your email and you will be contacted. Looking forward to hearing from you.