gmark-this

Dear readers,
A couple days ago I managed to make available my first Chrome extension. GMark This is a Google Bookmarks extension for Google Chrome. It aims to provide a lightweight and elegant interface between Bookmarks and Chrome by providing simple ways of bookmarking new pages and searching bookmarks. Searches are also integrated with Chrome’s Omnibox.

GMark This was motivated by the lack of a simple extension to manage Google Bookmarks on Chrome. For users with a significant amount of labels and bookmarks, using traditional options like YAGBE is not feasible. Loading all your labels on a pop-up and browsing through them is simply not efficient. That’s why GMark This aims to provide the simplest integration between Bookmarks and Chrome, without overheads.

Currently GMark This aims to provide a simple integration between Bookmarks and Chrome, thus, its features also aim to be as simple as possible.

  • New bookmark creating through a browser action button.
  • Bookmark search through Chrome’s Omnibox.

You can check it out at http://code.google.com/p/gmark-this/

The Open Proxy Saga

A couple weeks ago I was messing with a few Apache configs, trying a few things that could improve the server performance. Everything was fine until late last week when I noticed that the page was really slow. Initially I thought it was a connectivity issue but after a couple hours I decided to troubleshoot it. First thing to do is check the logs for any possible explanation. Found two interesting messages:

[error] server reached MaxClients setting, consider raising the MaxClients setting

That is interesting specially because I tweaked the MaxClients setting not too long ago and the traffic has not increased significantly since then. The second interesting information was the number of GETs to external domains. That can’t be right. Why users would be requesting pages from other domains?

First thing a thought was ‘Damn, I’m serving as an open proxy!’, and I was right! Went to check the Apache configs and found:

ProxyRequests On

ProxyRequests was set to On, meaning that Apache was serving as an open proxy.

Second thing I went to check was the server statistics. Interestingly 3 days ago the memory usage increased significantly and and also the bandwidth utilization. More memory was coming from more Apache processes, showing exactly when it started. But how I started to get so many requests so quickly? Goggled it. My IP was listed on several open proxy lists, containing even the status, latency and even the last check time. That is awesome! Probably they have bots port scanning all around. One of these bots found my IP and published it somewhere and this list was replicated and replicated from here to Japan!

Obviously I don’t want to be serving as and open proxy for several reasons, so I went and changed the ProxyRequests back to Off. Right after I changed it, I saw the logs growing enormously. That’s when I noticed the extent of the problem. I was serving hundreds of concurrent users, a pretty good burn test for the server. And guess what, after days like that, it was still rock solid!

Now the second part of the saga. After turning ProxyRequests back to Off, besides the huge increase on logging (error only), CPU spiked to a load average of 22 on a 4 proc server. That’s a lot for those not familiar with Linux. An increase on logging is expected, since we’re having far more errors now, were users requests for other domains are failing. An increase on CPU usage was also expected, since the number of requests to my main page increased significantly (failed proxy requests are redirected to the default Apache site), but not as much as 22.

Checking the logs again I’ve noticed a huge number of errors stating that the URL was too long. All of these ‘long’ URLs had the same format, an external domain, followed by ‘http’ in a loop, like ‘http://www.google.com/httphttphttphttphttphttphttphttphttphttp…’. That was strange, why would someone requests a website like this. Then I decided to try using my server as a proxy. The same thing happened, tried google.com and I was being automatically got to a redirect loop that and appending ‘http’ to my requests until reaching a limit of 20 or so redirects. This means that every proxy request by users was generating over 20 requests on my server. Next step is to check why that was happening. Time to ‘telnet’ my server on port 80:

GET http://www.google.com/ HTTP/1.1
Host:www.google.com

That returned and HTTP 301 (Moved Permanently) response, moving to the same domain, but appending ‘http’ to the address. Good, same behavior we had on the browser. Now why is this happening. Looking a little bit further into this, I’ve found that when Apache gets a request for a domain that not in your virtual host list, it responds with the default virtual host, or the first virtual host loaded if you have not defined that explicitly. My default virtual host is my main website, a WordPress based site. Analyzing this further, I’ve found that when WordPress receives a request for an unknown page, it redirects to a standard page, instead of returning an HTTP 404 (Not Found) error. That is called canonical URL redirection and is used for a number of reasons, from enabling alternative URLs to ‘fancy’ permalinks. That explains the loop. Apache opens the default website with WordPress, which redirects automatically to the same non existent domain, just appending the requested page to the address. Since the ‘:’ char is not valid in an address, WordPress stops there. Since the user still has my server set as proxy, the process starts again, but that time with an extra ‘http’, and so on, on a infinite loop. So how do we disable that??

Found a simple how-to at velvetblues.com. You simply have to add the following line to your templates ‘functions.php’:

remove_filter('template_redirect','redirect_canonical');

Tried that and it worked! Now when I try to use my Apache server as proxy, all requests return a WordPress page mentioning that the page was not found. Problem solved!

Not so much, we still have the part three of the saga. I waited a few minutes and checked the server statistics again. CPU usage reduced significantly, to a load average of 5. Still a lot, but much better than 22 we had before. Server was responding quickly, but I’m still not satisfied. I don’t like the fact that a lot of leechers are consuming a lot of resources on my server. How can I improve that assuming that leechers will keep trying to access my server as a proxy for a while before figuring out it is working anymore. To solve this we have plenty of options, from simple ones to more complex ones like adding modules to Apache to ‘iptables’ block users that try to request domains that are not on the virtual hosts list. I don’t want to waste too much time on this since it’s not critical, so I opted for a very simple solution. Dynamic pages are very resource intensive compared to static pages. I don’t really care about serving a ‘nice’ page to users that trying to use my server as a proxy. So why not show these users a simple html page instead of my WordPress website? Well, Apache serves the default website to virtual hosts not matching any virtual host on the list, so I decided to simply change the default website to the default and well known Apache page ‘It Works!’. To do so, I just had to enable the default Apache site that was already there, just not enabled.

Guess what?? It worked. Requests to non-mapped domains were being served with a simple ‘It Works!’ page. Waited for a few minutes and checked the server statistics again, and wow, load average went to 0.1. Problem solved. Serving simple static pages reduced the CPU usage drastically. Now I just have to deal with the error log file.

That was easy, since now all ‘undesired’ users were being ‘redirected’ to the default Apache website, it was just a matter of changing the error log level. Just went and changed the following line on the default configuration:

LogLevel crit

This will only log critical errors, which are not the errors we’re having now, solving the log file issue. Ohh, just remember to comment the CustomLog line too, to avoid access logging, which is even worse.

Cheers, Martin

Chrome or Firefox?

I’ve been a Chrome user for quite a while and before that, Firefox for a long time. I decided to make the move almost 2 years ago, right after extensions were introduced. The main reasons to do so were the start time (cold and warm), which by that time were blazing fast on Chrome compared to the slow Firefox and the cool new Chrome start page. So far I’ve been happy with Chrome besides a few problems with it’s extension capabilities, but the recent launch of Firefox 7 made me question my decision. The new Firefox is as fast, if not faster than Chrome on start time and apparently consumes significantly less resources. That, combined with more powerful extensions is a recipe for a great browser.

Now the requirements. What is really important to me is, besides the speed and footprint, are a few particular extensions. Good support to Google Bookmarks, Evernote, Read it Later, a Session Manager, AdBlock, a bulk media downloader and a Twitter client.

So far I came to the following analysis:

ChromeFirefox
Google BookmarksPartially. YAGBE works but it’s not great.Yes, thru GMarks
EvernoteYes, thru official extensionYes, thru official extension
Read it LaterYesYes
Session ManagerYesYes
AdBlockYesYes
Media DownloaderNo. Apparently this is a Chrome API limitationYes, thru DownThem All or Download Helper
Twitter ClientYes, TweetDeck appNo. Could not find a web app at the same level as
TweetDeck.
Nice Start PageYes.Yes, thru myfav.es.

So what is your opinion? Have any extension suggestions? Firefox or Chrome?

LoadRunner and IE8

Had this small problem today and I believe it would be interesting to share the solution. I was “forced” to update to Internet Explorer 8 (IE8) and for my surprise, VuGen crashes when trying to record anything with it.

My first thought was to check for patches. No patches, LoadRunner was already on version 9.52 (9.50 + 9.51 patch + 9.52 patch). Checking some forum posts I’ve found that the issue can be caused by the “Disable Execute Bit” (DEP) functionality. So how to disable it??

You have to open the boot.ini file (C:\boot.ini) and add the following string to your boot line:

/noexecute=alwaysoff

Usually it will become something like that:

multi(0)disk(0)rdisk(0)partition(1)\WINNT=”Microsoft Windows” /noexecute=alwaysoff /fastdetect

I’m not sure if the same solution can be applied to Windows Vista or 7. Also I don’t know if older LoadRunner versions are affected too, but this worked for me. :-)

Data Generator

My colleague and friend Mauricio (@mvgiacomello) shared a useful tip on Data Generation.

There is a web application that can generate test data based on a set of parameters. It can export the data on HTML, Excel, CSV, SQL and XML.

Their website has a demo that can generate up to 200 records. If you need more, you can download the code and deploy to your own server!

Here is what they say:

Ever needed custom formatted sample / test data, like, bad? Well, that’s the idea of the Data Generator. It’s a free, open source script written in JavaScript, PHP and MySQL that lets you quickly generate large volumes of custom data in a variety of formats for use in testing software, populating databases, and scoring with girls.

You can check it out at: http://www.generatedata.com/#generator