Archive Page 3

We use ConfigObj configuration files pretty extensively at WebMynd; it would be nice to use the ConfigParser module available in Python’s standard library, but the extra features ConfigObj has, such as lists, multi-line strings and nested sections, make it hard to say no to the richer library…

Unfortunately, TextMate doesn’t come with support for ConfigObj syntax, but the editor’s excellent Bundle Editor allowed me to fix that pretty easily.

Here is an example ConfigObj file as I see it in TextMate, with two different “Font & Color” schemes:

TextMate language definitions use regular expressions to categorise text in a file (into keywords, constants, variables and so on). The regexes I’ve put together for this ConfigObj bundle are somewhat fragile – if you try to break it you probably will.

However, it should be good enough for the majority of configuration in the majority of files. As an added bonus, ConfigObj syntax is a superset of INI syntax, so you get the full poly-chromatic experience in .cfg and .ini files alike!
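To give a flavour of how regex-based categorisation works, here is a minimal sketch in Python. The patterns and category names are hypothetical simplifications written for this illustration; they are not the regexes shipped in the bundle, and they are just as easy to break. The sample text shows the list values and nested sections that ConfigObj adds on top of INI syntax.

```python
import re

# Hypothetical, simplified patterns (NOT the ones used in the TextMate bundle).
# Rules are tried in order; the first match wins.
RULES = [
    ("comment", re.compile(r"^\s*#.*$")),
    # [section] or [[nested section]]: the number of brackets gives the depth
    ("section", re.compile(r"^\s*(\[+)\s*([^\[\]]+?)\s*(\]+)\s*$")),
    # key = value, where the value may be a comma-separated list
    ("key", re.compile(r"^\s*([^=\[\]#]+?)\s*=\s*(.*)$")),
]


def classify(line):
    """Return the name of the first rule that matches, or 'other'."""
    for name, pattern in RULES:
        if pattern.match(line):
            return name
    return "other"


SAMPLE = """\
# connection settings
[server]
hosts = alpha, beta, gamma
[[limits]]
max_connections = 100
"""


def classify_text(text):
    """Pair every line of text with its category."""
    return [(classify(line), line) for line in text.splitlines()]
```

Calling `classify_text(SAMPLE)` tags each line as a comment, section or key, which is roughly the information an editor needs in order to pick a colour for it.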

If you’re a TextMate user, download this file, unzip it and double-click on ConfigObj.tmbundle.


Today we launched our first customized search enhancer in collaboration with Fluther. It offers all the usual WebMynd features but is especially applicable to Fluther users and to those who want to tap into their networks’ and others’ knowledge through their question and answer service. Fluther are distributing the extension on their own homepage and have written about it here.

This is the first example of others taking advantage of WebMynd’s personalized search interface on Google (and other search engines shortly) to better deliver their service to their users.

If you would like WebMynd to include your content or service, please get in touch.


WebMynd personalizes your search with the information sources you most value in the places that you expect. For the moment that means we embed search results from sources such as Twitter, Amazon, YouTube, Flickr, Wikipedia, your web history, your top sites and others on the right hand side of Google and let you configure it.

But we know there are many other sources of information that you may use that we do not yet include. We’d love to hear your suggestions on what other sources you would find useful, or to hear from you if you’re a site-owner with unique content that we should include for our users. Just email us anytime at founders@webmynd.com with your suggestions.

We’ve also just released an update of WebMynd with more configuration options, a better UI for changing record modes and much better performance. So if you already use WebMynd be sure to download this latest version.


We have just released a version of WebMynd which takes us beyond visual web history which we described as a ‘DVR for the web’ when we launched in January. You can download and try out the update now.

It includes a completely re-designed Google interface with aggregation of many search tools such as Flickr, Wikipedia, Twitter search, LinkedIn and many more. Many of the sources offer results that are not usually surfaced by Google. If you can’t find what you’re looking for by searching, WebMynd lets you post to Twitter to ask for help from your network right from the search results page. As well as aggregating different search tools, WebMynd uses your web history to improve your search by showing you results from ‘Your Top Sites’, namely the sites that you visit most frequently – this is powered by Yahoo! BOSS.

We’d love to hear what you think and get your suggestions on other search tools to include.
WebMynd currently supports Firefox 3 on Windows, MacOS and Linux.


Scaling on EC2

23Jun08

Like any application developed for a platform, the success of a Firefox Add-on is closely tied to the popularity and distribution you get from the underlying delivery mechanism. So, when we honed down the WebMynd feature set, improving the product enough to get on Mozilla’s Recommended List, we were delighted by our increasing user numbers. A couple of weeks later, Firefox 3 was released, and we got a usage graph like this:

[Figure: WebMynd usage statistics]

With a product like WebMynd, where part of the service we provide is to save and index a person’s web history, this sort of explosive expansion brings with it some growing pains. Performance was a constant battle for us, even with the relatively low user numbers of the first few months. This was due mainly to some poor technology choices; thankfully, the underlying architecture we chose from the start has proven to be sound.

I would not say that we have completely solved the difficult problem in front of us – we are still not content with the responsiveness of our service, and we’re open about the brown-outs we still sometimes experience – but we have made huge progress and learned some invaluable lessons over the last few months.

What follows is a high level overview of some of the conclusions we’ve arrived at today, best practices that work for us and some things to avoid. In later weeks, I plan to follow up with deeper dives into certain parts of our infrastructure as and when I get a chance!

Scaling is all about removing bottlenecks

This sounds obvious, but should strongly influence all your technology and architecture decisions.

Being able to remove bottlenecks means you need to be able to swap out discrete parts which aren’t performing well enough, and swap in bigger, faster, better parts which will perform as required. This will move the bottleneck somewhere else, at which point you need to swap out discrete parts which aren’t performing well enough, and swap in bigger, faster, better parts… well you get the idea. This cycle can be repeated ad infinitum until you’ve optimised the heck out of everything and you’re just throwing machines at the problem.

At WebMynd, for our search backend, we’ve done this four or five times already in the five months we’ve been alive, and I think I still have some iterations left in me. Importantly, I wouldn’t say that any of these iterations were a mistake. In a parallel to the Y Combinator ethos of launching a product early, scaling should be an iterative process with as tight a feedback loop as possible. Premature optimisation of any part of the service is a waste of time and is often harmful.

Scaling relies on having discrete pieces with clean interfaces, which can be iteratively improved.
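As a rough illustration of what a clean interface buys you, here is a minimal Python sketch. The `SearchBackend` interface, both implementations and the sample URLs are hypothetical, made up for this example rather than taken from our codebase; the point is only that the calling code does not change when the part behind the interface is swapped.

```python
from collections import defaultdict
from typing import Dict, List, Protocol, Set, Tuple


class SearchBackend(Protocol):
    """Hypothetical interface: the only thing the rest of the app relies on."""

    def index(self, user_id: str, url: str, text: str) -> None: ...

    def search(self, user_id: str, word: str) -> List[str]: ...


class ScanBackend:
    """First iteration: store every page and scan linearly at query time."""

    def __init__(self) -> None:
        self._pages: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def index(self, user_id: str, url: str, text: str) -> None:
        self._pages[user_id].append((url, text.lower()))

    def search(self, user_id: str, word: str) -> List[str]:
        word = word.lower()
        return sorted(url for url, text in self._pages[user_id]
                      if word in text.split())


class InvertedIndexBackend:
    """Later iteration: do the work at index time so that lookups are cheap."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def index(self, user_id: str, url: str, text: str) -> None:
        for word in text.lower().split():
            self._index[(user_id, word)].add(url)

    def search(self, user_id: str, word: str) -> List[str]:
        return sorted(self._index.get((user_id, word.lower()), set()))


def remember_and_find(backend: SearchBackend, word: str) -> List[str]:
    """Application code: written once, works with either backend."""
    backend.index("alice", "http://example.com/a", "Scaling on EC2")
    backend.index("alice", "http://example.com/b", "ConfigObj syntax")
    backend.index("bob", "http://example.com/c", "Scaling Solr")
    return backend.search("alice", word)
```

Swapping `ScanBackend()` for `InvertedIndexBackend()` moves the cost from query time to index time without touching `remember_and_find`, which is the kind of iteration described above.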

Horizontal is better than vertical

One of the reasons Google triumphed in the search engine wars was that their core technology was designed from the ground up to scale horizontally across cheap hardware. Compare this with their competitors’ approach, which was in general to scale vertically – using larger and larger monolithic machines glued together organically. Other search engines relied on improving hardware to cope with demand, but when the growth of the internet outstripped available hardware, they had nowhere to go. Google was using inferior pieces of hardware, but had an architecture and infrastructure allowing for cheap and virtually limitless scaling.

Google’s key breakthroughs were the Google File System and MapReduce, which together allow them to horizontally partition the problem of indexing the web. If you can architect your product in such a way as to allow for similar partitioning, scaling will be all the easier. It’s interesting to note that some of the current trends of Web 2.0 products are extremely hard to horizontally partition, due to the hyper-connectedness of the user graph (witness Twitter).

The problem WebMynd is tackling is embarrassingly partitionable. Users have their individual slice of web history, and these slices can be moved around the available hardware at will. New users equals new servers.
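A minimal sketch of this kind of per-user partitioning, in Python, might look like the following. The `UserDirectory` class, the server naming scheme and the capacity figure are all assumptions made for illustration; this is not a description of how our production system assigns users.

```python
class UserDirectory:
    """Hypothetical lookup table mapping each user to the server holding their slice."""

    def __init__(self, capacity_per_server):
        self.capacity = capacity_per_server
        self.servers = []    # server names, in the order they were added
        self.location = {}   # user_id -> server name

    def _load(self, server):
        """Number of users currently placed on the given server."""
        return sum(1 for s in self.location.values() if s == server)

    def assign(self, user_id):
        """Return the user's server, placing new users where there is room."""
        if user_id in self.location:
            return self.location[user_id]
        for server in self.servers:
            if self._load(server) < self.capacity:
                break
        else:
            # Every existing server is full: new users equals new servers.
            server = f"search-{len(self.servers) + 1}"
            self.servers.append(server)
        self.location[user_id] = server
        return server

    def move(self, user_id, server):
        """Relocate a user's slice; no other user is affected."""
        if server not in self.servers:
            self.servers.append(server)
        self.location[user_id] = server
```

With a capacity of two users per server, assigning three users puts the first two on `search-1` and the third on a freshly added `search-2`. Because no slice depends on any other, `move` only has to update one entry in the directory (plus, in a real system, copy that user's data).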

Hardware is the lowest common denominator

By running your application on virtual machines using EC2, you are viewing the hardware you’re running on as a commodity which can be swapped in and out at the click of a button. This is a useful mental model to have, where the actual machine images you’re running on are just another component in your architecture which can be scaled up or down as demand requires. Obviously, if you’re planning on scaling horizontally, you need to be building on a substrate which has low marginal cost for creating and destroying hardware – marginal cost in terms of time, effort and capex.

A real example

To put the above assertions into context, I’ll use WebMynd’s current architecture:

[Figure: WebMynd architecture]

The rectangles represent EC2 instances. Their colour represents their function. The red arrow in the top right represents incoming traffic. Other arrows represent connectedness and flows of information.

This is a simplified example, but here’s what the pieces do in general terms:

  • All traffic is currently load balanced by a single HAProxy instance
  • All static content is served from a single nginx instance (with a hot failover ready)
  • Sessions are distributed fairly across lots of TurboGears application servers, on several machines
  • The database is a remote MySQL instance
  • Search engine updates are handled asynchronously through a queue
  • Search engine queries are handled synchronously over a direct TurboGears / Solr connection (not shown)
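The split between asynchronous updates and synchronous queries in the last two points can be sketched in a few lines of Python. This is a toy under loud assumptions: an in-process `queue.Queue` stands in for the real message queue, a plain dictionary stands in for Solr, and all function names and URLs are hypothetical.

```python
import queue
import threading

index = {}               # stand-in for the search engine: url -> page text
updates = queue.Queue()  # stand-in for the message queue


def handle_page_view(url, text):
    """Web tier: enqueue the update and return straight away."""
    updates.put((url, text))


def indexer():
    """Worker: drain the queue and apply updates to the index."""
    while True:
        item = updates.get()
        if item is None:  # sentinel telling the worker to stop
            updates.task_done()
            break
        url, text = item
        index[url] = text
        updates.task_done()


def search(word):
    """Queries go directly to the index, bypassing the queue."""
    word = word.lower()
    return sorted(url for url, text in index.items() if word in text.lower())


def run_demo():
    worker = threading.Thread(target=indexer, daemon=True)
    worker.start()
    handle_page_view("http://example.com/ec2", "Scaling on EC2")
    handle_page_view("http://example.com/cfg", "ConfigObj syntax")
    updates.join()      # wait until the worker has caught up
    updates.put(None)   # then shut it down
    worker.join()
    return search("ec2")
```

The request handler never waits for indexing, so a slow search tier cannot slow down page capture. The price is a delay between a page being seen and it becoming searchable.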

One shouldn’t be timid in trying new things to find the best solution; almost all of these parts have been iterated on like crazy. For example, we’ve used Apache with mod_python, Apache with mod_proxy and Apache with mod_wsgi. We’ve used TurboLucene, looked very hard at Xapian and tried various configurations of Solr.

For the queue, I’ve written my own queuing middleware, I’ve used ActiveMQ running on an EC2 instance and I’m now in the process of moving to Amazon’s SQS. We chose SQS because, although ActiveMQ is free as in both beer and speech, it has an ongoing operations cost in terms of time, which is one thing you’re always short of during hyper-growth.

The two parts which are growing the fastest are the web tier (the TurboGears servers) and the search tier (the Solr servers). However, as we can iterate on our implementations and rapidly horizontally scale on both of those parts, that growth has been containable, if not completely pain free.

 
Amazon’s Web Services give growing companies the ideal building blocks to scale and keep up with demand. By iteratively improving the independent components in our architecture, we have grown to meet the substantial challenge of providing the WebMynd service to our users.
 


We have been listening closely to what all of our great users have been saying and are rolling out our next major version, WebMynd 0.4. The new features include…

1. An iPhone-style fling interface to navigate pages. Man, is it fun!
2. A simple and powerful organization system that stacks all of your pages from the same website. It combats all the clutter and endless scrolling in the old version.
3. Enhanced Google search that reminds you of pages you have seen before. Great for finding that site you remember seeing but Google isn’t pulling up.

Let us know what you think of the new features. We hope that you enjoy them.


Hello WebMynders, full text search is now working without slowing down your WebMynd playback. There is a lag of about 5 minutes between when you look at a page and when it gets indexed for full text search, so if recent pages are not appearing when you run a text search, that is the reason.

We want to thank the people who took the time to fill out our survey; your feedback and suggestions are extraordinarily valuable in helping us figure out what features to build next. The majority of people requested a tagging system and a way to extract contact information from pages that they see. We are hard at work implementing these features now.

On a separate but related topic, I came across an interesting book today, recommended by my friend Justin: “Keeping Found Things Found: The Study and Practice of Personal Information Management” by William Jones. Dr. Jones is a professor at the Information School, University of Washington. Seeing how relevant this topic is to WebMynd, I quickly ordered the book from amazon.com and am awaiting its arrival.

Doing a little more online research about the book, I came across some interesting studies that Dr. Jones’s research group had conducted on how people keep track of the websites they visit. I was interested to find out that they identified 13 different methods that people commonly use to keep track of websites. They are (in no particular order)…

Send email to self
Send email to others
Print out the web page
Save the web page as a file
Paste the web address (URL) into a document.
Add a hyperlink to a personal web page
Do nothing to save but search again to re-access
Do nothing to save but enter the URL directly
Make a Bookmark or Favorite
Do nothing to save but access via another web site
Use Personal Information Management Software
Personal Toolbar or Links
Write down the web address (URL) on paper

The study also found that people generally have a repertoire of between five and seven keeping methods from the list above that they use weekly. That is quite a lot of ways to try to keep track of information that comes from a single source, the web. It seems like having website information spread across email accounts, note pads, bookmark folders, etc. is more confusing than helpful. These folks need to give WebMynd a try! Keep all those websites in one spot and do a WebMynd search when you need to find them again. Anyway, I look forward to reading the book and learning more about keeping found things found; that is certainly something we want WebMynd to help people with.


We want to thank all of our users for their patience as we scale the WebMynd service. Due to the large number of people installing and using the WebMynd addon we have suffered some slowdowns. We realize that WebMynd playback and search have been running slowly but rest assured that we are working around the clock to improve performance.

If you are having problems please post to our forum or email us at support@webmynd.com. We want to make your experience the best it can be. Thanks again for your patience.


We just got added to the official Firefox Addons list (AMO). It took a couple of weeks in the sandbox, where Mozilla pokes your plug-in to make sure it works properly, but we are finally on the list. The people at Mozilla are some of the best folks around and we want to thank the editors who took the time to review WebMynd and put it on the public AMO list.

Now that we have the Mozilla seal of approval, Firefox beta 3 users will be able to install WebMynd. Starting tomorrow, FF3 users can install from WebMynd.com or, if you just can’t wait, you can get it at the AMO site right away.

Happy surfing.


Those of you who have the latest version of the WebMynd plug-in may have noticed a new pull-down menu next to the WebMark star with the word “publish”. So whatever does this magical menu item do? It lets you send pages to the OpenMynd collection for the rest of the community to see.

This public collection is our first step towards letting you share your pages with friends and family even if they don’t have a WebMynd themselves. The pages you choose to publish will be available for everyone to see.

The OpenMynd pages are not the same as the virtual copies that you have in your personal WebMynd. An OpenMynd page is a static image of the page as it was when you clicked the publish button. However, there will be a link out to the live copy on the web.

We look forward to you populating the OpenMynd with some cool pages. Our hope is that the OpenMynd will grow into an interesting collection of pages that is fun for everyone to flip through.

We would love to hear what you think of the new publish feature.