Why I do not much care for Facebook

Thu 29 Jan 2009 03:17 PM

Tags: Facebook FOAF Social Graph XFN

I am not a big fan of Facebook. This has always been true, but I've never given it a lot of thought. Recently a friend challenged me to come up with reasons why I do not like Facebook, and so here is my response.

Facebook does a lot of things: pictures, status updates, friend tracking, interest sharing, and, with apps, a lot more. The problem is that for almost any of these things, there is a webapp that does it much better. Flickr does better picture management, and Twitter does better status updating. Recently Facebook has tried to implement chat, which is so bothersome to use and so often fails to work properly that it is much worse than a service like Meebo. So why is Facebook so popular?

Facebook is popular because of its social graph. If you were to use Flickr to share pictures, you would find that there are just not as many of your friends on Flickr. If you went to Twitter, you would find the same thing.

Is there a solution? Not quite, or at least not quite yet, but the best contenders I have seen are FOAF (Friend of a Friend) and XFN (XHTML Friends Network). For tech people this stuff is not new; in fact, you may think it is old hat. I am not here to advocate for either of them, just for the idea behind them. If there were a way for our social graph to be queryable, portable, and dynamic, we wouldn't need Facebook. In fact, we would be better off than we are with Facebook.
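To make "queryable" a bit more concrete: XFN stores relationships in the ordinary rel attribute of links, so a friend list published on any web page can be read back by any program. Below is a minimal sketch using only Python's standard html.parser; the names XFNParser and extract_xfn are my own, and it simply collects the XFN 1.1 values found on each link.

```python
from html.parser import HTMLParser
from typing import Dict, List

# The relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collects {href: [XFN values]} from <a> tags whose rel attribute uses XFN."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel") or ""  # rel may be missing or valueless (None)
        values = [v for v in rel.lower().split() if v in XFN_VALUES]
        if href and values:
            self.links[href] = values


def extract_xfn(html: str) -> Dict[str, List[str]]:
    """Return every XFN-annotated link in an HTML snippet."""
    parser = XFNParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Feeding it `<a href="http://jane.example.com/" rel="friend met">Jane</a>` yields `{"http://jane.example.com/": ["friend", "met"]}`, which any app could use to rebuild my friend list without asking Facebook.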

As a disclaimer, I do not hate Facebook, and there are some things that it does well, but I imagine a world in which my social graph works with any app I choose instead of having to settle for the sub-par Facebook apps.

If You Open Source It, They Will Come

Wed 10 Dec 2008 07:49 PM

Tags: Django Open Source

The Cathedral and the Bazaar

I want to talk about open source and the community and how powerful they can be. One of the most famous examples of open source is the Linux operating system, originally developed by Linus Torvalds. Its development is discussed in a book that someone recently suggested to me: The Cathedral and the Bazaar by Eric S. Raymond. It was published over 9 years ago, but it gives a lot of insight not only into the development of that time, but into development today as well.

In the second chapter of his book, which bears the same name as the book itself, Raymond talks about Cathedral development and Bazaar development. Cathedral development, referring to closed-source or restricted-source projects, is the idea that a piece of software must be painstakingly crafted, much like a cathedral: every brick perfectly placed, and only by people who totally understand the entire structure of the building.

Bazaar development, referring to open-source distributed projects like Linux, is much more like a bazaar (obviously). Anyone can set up shop there, bring their goods, and sell them. If their goods are of good quality, people will buy them. You don't have to buy all the goods; you can pick and choose. The main point is that there are fewer restrictions on getting into a bazaar.

If you actually think about it, and Raymond mentions this in his book, who would ever have believed that such a mass of disorganization could work? Hindsight is 20-20, but trying to imagine what Linus must have been thinking at the time is astounding.

My Experiences With Open Source

As some people may be aware, I started a project a couple of months ago called django-schedule. When I started development, it was like a cathedral. I spent a long time thinking about how the pieces of my code would interact, writing documentation, rewriting documentation, and even though the source was open, I still acted as a gatekeeper, scrutinizing every patch as if people who did not understand my complex system could not possibly improve it.

At the beginning of November, Yann Malet approached me with an idea for implementing recurring events. We disagreed a lot and discussed ideas, and he submitted a pretty extensive rework of the project. I again scrutinized it, and we had some more discussions. Eventually, after much reworking of the code, I re-released django-schedule with recurring events, and it was a great success. Shortly after this I added Yann as a contributor with commit access.

This was scary at the time. Someone I barely knew, and whose code I had seen only a small amount of, would be editing my project. And you know what happened? Everything worked amazingly well. The code started to blossom as his areas of expertise complemented mine. The fear dissipated, and I realized that this wasn't my project, it was your project; better yet, it was our project.

Recently, Rock Howard approached me with a fork of my project that had some bug fixes, better templating, and many other improvements, so I added him to the list of committers. Only this time, it came with much less angst. Actually, it came with relief, because I had finally found someone with a good amount of experience writing templates.

The beauty of an open source project is that it improves every day. And how much have I been working on it recently? The truth is, not very much. I am finishing up college and have been very busy with that process; once I have graduated, I will continue to help with the project. But the beauty of it is that this project is still getting better while I am not doing a thing. So thank you Yann and Rock, thank you Linus, thank you Raymond, and especially thank you everyone.

Naive Bayesian Classifier and Django

Sat 15 Nov 2008 03:47 PM

Tags: django NewsPet project

I am currently taking an introductory course in artificial intelligence, and I find it really interesting. I have learned more about search algorithms than I could have ever imagined. Recently we were instructed to create an artificial intelligence agent to do any task, as long as it uses some element of our class. Thus News Pet was created.

News Pet is a trainable (hence the name) news fetcher that will hopefully be like Pandora for news. The project has a pretty limited scope (it is mostly a proof of concept at this point), but I will offer it up as food for thought. People often subscribe to lots of RSS feeds, but they do not want to read every item that comes from these feeds, or they want to categorize the items in some way. NewsPet categorizes or trashes all the articles that come through your reader.

We will be using Naive Bayesian Classification, which I will discuss briefly below for anyone who is unfamiliar with it. Our project has three important parts: Classification, Initial Training, and Feedback (or Subsequent Learning).

Bayesian Network and Classification

Bayesian networks are directed acyclic graphs that use probability, specifically Bayes' theorem, to help calculate the likelihood of events based on partial evidence. In the graph, an edge from node B to node A means that A depends on B: the network stores P(A | B), the probability distribution of A given B (more generally, each node stores its distribution given all of its parents). Let's look at an example. You have a nuclear plant with a core. The variables are the temperature of the core (T), the reading from the temperature gauge, or perceived temperature (G), whether the gauge is faulty (F_g), whether the alarm goes off (A), and whether the alarm is faulty (F_a). The alarm is more likely to go off if the perceived temperature is high, and the gauge is more likely to be faulty if the temperature is high. So let's look at the graph.

[Figure: the Bayesian network for the nuclear plant example (BayesNetwork.jpg)]
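To show how such a network answers questions from partial evidence, here is a small sketch that computes P(T = high | A = sounds) by brute-force enumeration. The structure (T → F_g, T → G, F_g → G, G → A, F_a → A) follows the description above, but every probability in the tables is a made-up number for illustration; the post gives none.

```python
from itertools import product

# Hypothetical conditional probability tables; the numbers are illustrative only.
P_T = 0.1                          # P(core temperature is high)
P_FA = 0.1                         # P(alarm is faulty)
P_FG = {True: 0.3, False: 0.05}    # P(gauge faulty | T)
P_G = {                            # P(gauge reads high | T, F_g)
    (True, False): 0.95, (True, True): 0.2,
    (False, False): 0.05, (False, True): 0.8,
}
P_A = {                            # P(alarm sounds | G, F_a)
    (True, False): 0.99, (True, True): 0.1,
    (False, False): 0.01, (False, True): 0.1,
}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability that a boolean variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(t: bool, fg: bool, g: bool, fa: bool, a: bool) -> float:
    """Joint probability: the product of each node's probability given its parents."""
    return (bernoulli(P_T, t)
            * bernoulli(P_FG[t], fg)
            * bernoulli(P_G[(t, fg)], g)
            * bernoulli(P_FA, fa)
            * bernoulli(P_A[(g, fa)], a))


def p_hot_given_alarm() -> float:
    """P(T = high | A = sounds), summing out the hidden variables F_g, G and F_a."""
    numerator = sum(joint(True, fg, g, fa, True)
                    for fg, g, fa in product([True, False], repeat=3))
    evidence = sum(joint(t, fg, g, fa, True)
                   for t, fg, g, fa in product([True, False], repeat=4))
    return numerator / evidence
```

With these invented numbers the alarm raises the belief that the core is hot from the prior of 0.1 to roughly 0.43; enumeration is exponential in the number of variables, so it only suits toy networks like this one.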

Naive Bayesian Classifiers (NBCs) use Bayesian networks to classify documents, and they are often used for spam filtering. An NBC is trained on a batch of documents, each labeled as one that should or should not be accepted, and from these it builds a Bayesian network. It is "naive" because it assumes that the words in a document are independent of each other once you know whether the document should be accepted. When you give a new document to an NBC, it uses that document as evidence and gives you a confidence value for whether the document should be accepted.
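The train-then-score cycle described above can be sketched in a few dozen lines. This is a generic textbook two-class multinomial naive Bayes with add-one smoothing, not NewsPet's Java implementation; the class name NaiveBayes and its whitespace tokenizer are my own simplifications.

```python
import math
from collections import Counter
from typing import Dict, List


class NaiveBayes:
    """Two-class (accept / reject) multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        self.doc_counts: Dict[bool, int] = {True: 0, False: 0}

    @staticmethod
    def tokenize(text: str) -> List[str]:
        # crude tokenizer: lowercase and split on whitespace
        return text.lower().split()

    def train(self, text: str, accept: bool) -> None:
        """Add one labeled document to the counts."""
        self.word_counts[accept].update(self.tokenize(text))
        self.doc_counts[accept] += 1

    def _log_score(self, words: List[str], label: bool, vocab_size: int) -> float:
        # log P(label) + sum of log P(word | label), smoothed so unseen words are not zero
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for word in words:
            score += math.log((self.word_counts[label][word] + 1) / (total_words + vocab_size))
        return score

    def confidence(self, text: str) -> float:
        """Posterior probability, between 0 and 1, that `text` should be accepted."""
        words = self.tokenize(text)
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False])) or 1
        accept = self._log_score(words, True, vocab_size)
        reject = self._log_score(words, False, vocab_size)
        # subtract the larger log score before exponentiating to avoid underflow
        top = max(accept, reject)
        a, r = math.exp(accept - top), math.exp(reject - top)
        return a / (a + r)
```

Working in log space matters in practice: multiplying hundreds of small word probabilities directly would underflow to zero for any real article.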

Classification

Our approach to classification is nothing special. There is a list of feeds and a list of categories. Users create the categories, and each category has its own NBC. How these are trained is covered in Initial Training and Feedback. Each item from the feeds is tested against every category, and if the confidence is greater than some threshold, the item is added to that category. The default threshold will be decided after some experimentation, and it will also be customizable. If the item is not added to any category, it goes into a special category, Trash.
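The routing rule just described fits in one function. In this sketch each category's NBC is represented as any callable that returns a confidence between 0 and 1; the name route and the placeholder default threshold of 0.5 are my assumptions, since the real default is still to be decided by experiment.

```python
from typing import Callable, Dict, List

# A category's classifier: takes an item's text, returns a confidence in [0, 1].
Classifier = Callable[[str], float]


def route(item: str, categories: Dict[str, Classifier], threshold: float = 0.5) -> List[str]:
    """Return every category confident enough to claim the item, or ["Trash"] if none is."""
    matches = [name for name, classify in categories.items() if classify(item) > threshold]
    return matches or ["Trash"]
```

Because every category is tested independently, one item can land in several categories, and Trash only catches what nobody claimed.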

Initial Training

Because each category has its own Bayesian classifier, it needs to be initially trained. We have several ideas for this right now: choose from some pre-trained categories (e.g. Business, Programming), submit a batch of files to train the classifier, or use a single word or phrase. The first two are fairly straightforward, but they are not extremely helpful. The third is trickier, because it is impossible to train a Bayesian classifier with just one word. Because Google is a reliable source for finding documents based on single words, we are going to use Google search to retrieve a number of documents to train the Bayesian classifier. We will experiment to decide which approach to use; the final solution will probably let you choose which kind of training you would like.
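The single-phrase option boils down to turning search results into labeled training documents. In this sketch, search is a hypothetical stand-in for whatever web search ends up being called (the plan is Google); it takes a query and a result count and returns document texts. Supplying background documents as negative examples is my own assumption, since a two-class classifier needs some rejected documents as well.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical search hook: (query, number of results) -> list of document texts.
Search = Callable[[str, int], List[str]]


def training_set_from_phrase(
    phrase: str,
    search: Search,
    background: Iterable[str] = (),
    n: int = 20,
) -> List[Tuple[str, bool]]:
    """Build (document, accept) pairs for a new category from one word or phrase.

    Search results for the phrase become positive examples; `background`
    documents (an assumption, e.g. items the user trashed) become negative ones.
    """
    positives = [(doc, True) for doc in search(phrase, n)]
    negatives = [(doc, False) for doc in background]
    return positives + negatives
```

The resulting pairs can be fed to a category's classifier one at a time, exactly as a batch of user-submitted files would be.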

Feedback

One important part of this feed reader is that it's trainable. For every item you will have the choice to give it a thumbs up, give it a thumbs down, or say that it belongs to another category. Behind the scenes, a thumbs up or a thumbs down becomes another training case for that category's NBC. When you move an item to a new category, it becomes a training case for both categories' NBCs. This means that the NBCs will always be changing and improving.
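The three feedback actions can be sketched as operations on each category's list of training cases. The Category class and function names here are hypothetical, and treating a move as a negative case for the old category plus a positive case for the new one is my reading of "a training case for both".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Category:
    name: str
    # (item text, accept?) training cases; the category's NBC would be retrained from these
    examples: List[Tuple[str, bool]] = field(default_factory=list)


def thumbs_up(category: Category, item: str) -> None:
    category.examples.append((item, True))


def thumbs_down(category: Category, item: str) -> None:
    category.examples.append((item, False))


def move(item: str, source: Category, target: Category) -> None:
    """Moving an item trains both: a negative case for the old category, a positive for the new."""
    thumbs_down(source, item)
    thumbs_up(target, item)
```

Keeping the raw cases, rather than only updated counts, makes it easy to retrain from scratch if the classifier or tokenizer ever changes.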

Django

Many of you may be skipping right to this section to see what Django has to do with all this. I have convinced my group members that a Django-powered website would be the best choice for the user interface. The Bayesian classification will be done in the background in Java. This may be disappointing, but there were a couple of reasons for it: my project members do not know Python, and we couldn't find a good NBC library for Python. If anyone knows of one, I would be interested in hearing about it, so leave a comment.

Django Schedule

Sun 09 Nov 2008 10:29 PM

Tags: django documentation project

Django-schedule has just been released with support for recurring events. Doing this required a paradigm switch. In this post I will describe the paradigm switch and explain some features.

Events and Occurrences

The new idea is to think of an Event as a thing that a person would like to track, and an Occurrence as an instance of an event with a specific time and date. It is easiest to understand with an example. You have a 'Weekly Staff Meeting'; this is an Event. It's a meeting that happens every week. 'Tuesday's Staff Meeting' is an Occurrence: a specific instance of the Event 'Weekly Staff Meeting'. Now let's look at how this works in code.

>>> import datetime
>>> from django.contrib.auth.models import User
>>> from schedule.models import Event, Rule
>>> user = User.objects.get(username='thauber')
>>> start = datetime.datetime(2008, 1, 1, 14, 0)
>>> end = datetime.datetime(2008, 1, 1, 15, 0)
>>> rule = Rule.objects.get(name="Weekly")
>>> event = Event(title='Staff Meeting',
...               start=start,
...               end=end,
...               rule=rule,
...               description="description")
>>> event.save()
>>> event.create_relation(user)

What we just created here is an event called "Staff Meeting." Don't worry about the create_relation line; we will deal with that in Relations. Now we can get the Occurrences. Let's say that you want all occurrences of that event from today to a week from today.

>>> start = datetime.datetime.now()
>>> end = start + datetime.timedelta(days=7)
>>> event.get_occurrences(start, end)

This would return all of the occurrences of this event between start and end.
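To illustrate the Event/Occurrence split without Django, here is a standard-library sketch that expands one weekly event into the occurrences overlapping a window. It is not django-schedule's code: the real expansion is driven by the Event's Rule through dateutil's rrule, whereas this hypothetical weekly_occurrences helper hard-codes a one-week step.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def weekly_occurrences(
    event_start: datetime,
    event_end: datetime,
    window_start: datetime,
    window_end: datetime,
) -> List[Tuple[datetime, datetime]]:
    """(start, end) of each weekly repetition that overlaps [window_start, window_end)."""
    duration = event_end - event_start
    occurrences = []
    start = event_start
    while start < window_end:
        end = start + duration
        if end > window_start:  # this repetition overlaps the window
            occurrences.append((start, end))
        start += timedelta(weeks=1)
    return occurrences
```

For the Staff Meeting above (starting Tuesday, January 1, 2008, 14:00 to 15:00), a window from January 7 to January 21 yields two occurrences, on January 8 and January 15, each still one hour long.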

Periods

So now you have a list of events, and you would like all of the occurrences for that list. You can do this with the Period class.

>>> from schedule.periods import Period
>>> events = Event.objects.get_for_object(user)
>>> period = Period(events, start, end)
>>> period.get_occurrences()

If you are wondering why there is a class for this, there are two reasons.

1) It is useful to know which events start in this period, end in this period, or just continue through this period. To deal with this there is a function, get_occurrence_partials, which returns what I like to call Occurrence Partials: Occurrences relevant to a discrete period of time. Each element in the returned list is a dictionary such as {'event': event, 'class': 0}, where the classes are as follows:

  • 0: The event begins in this period
  • 1: The event begins and ends in this period
  • 2: The event doesn't begin or end in this period, but it exists in this period (AKA it continues during this period)
  • 3: The event ends during this period
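The four classes can be computed from an occurrence's start and end alone. The sketch below is my own illustration, not the get_occurrence_partials source; it treats both the period and the occurrence as half-open intervals, so an occurrence that ends exactly at the period's end counts as ending inside it.

```python
from datetime import datetime
from typing import Optional


def classify(occ_start: datetime, occ_end: datetime,
             period_start: datetime, period_end: datetime) -> Optional[int]:
    """Occurrence-partial class (0 to 3) for half-open intervals; None if there is no overlap."""
    if occ_start >= period_end or occ_end <= period_start:
        return None  # the occurrence lies entirely outside the period
    begins = period_start <= occ_start
    ends = occ_end <= period_end
    if begins and ends:
        return 1  # begins and ends in this period
    if begins:
        return 0  # begins in this period
    if ends:
        return 3  # ends in this period
    return 2  # started before and ends after: it continues through the period
```

For a day period of May 20, 2008, a 14:00 to 15:00 meeting is class 1, an overnight event starting at 22:00 is class 0, and a three-day conference spanning the day is class 2.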

2) It can be subclassed so that special functionality can be added to special periods. Some subclasses included out of the box are Month, Week, and Day. These subclasses have some specific functionality that you may find helpful; for example, Month has get_weeks, which returns the Week periods for that Month period. Month, Week, and Day are all initialized with a date or a datetime object.

>>> from schedule.periods import Month
>>> date = datetime.datetime(2008, 5, 20)
>>> month = Month(date)
>>> month.start
datetime.datetime(2008, 5, 1, 0, 0)
>>> month.end
datetime.datetime(2008, 6, 1, 0, 0)

Notice that the end of a period is not included in the period.
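The exclusive end is what makes month boundaries simple: a month runs from its first midnight up to, but not including, the next month's first midnight. Here is a standard-library sketch of that computation; month_bounds is a hypothetical helper, not part of django-schedule.

```python
from datetime import date, datetime
from typing import Tuple


def month_bounds(day: date) -> Tuple[datetime, datetime]:
    """Half-open [start, end) of the calendar month containing `day` (a date or datetime)."""
    start = datetime(day.year, day.month, 1)
    if day.month == 12:
        end = datetime(day.year + 1, 1, 1)  # December rolls over into next January
    else:
        end = datetime(day.year, day.month + 1, 1)
    return start, end
```

Because one month's end is exactly the next month's start, consecutive periods tile time with no gaps and no overlaps, and the membership test is simply `start <= moment < end`.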

To see more information on the Period class you should view the source.

Rules

Rules are how you define the recurrence pattern of an Event. This uses rrule from the dateutil module (python-dateutil, which is not part of the Python standard library). For more information on rrule, see its documentation. Rule is a model, so it can be created through the admin interface. As of now the fields are:

Name
The name of the recurrence pattern (e.g. Weekly, Every Other Month)
Description
A more verbose definition of the recurrence pattern.
Frequency
Defines the frequency set for the rrule. Must be one of YEARLY, MONTHLY, WEEKLY, DAILY, HOURLY, MINUTELY, or SECONDLY.
Params
This field holds the params that allow you to customize the rrule. It is a list of key-value pairs separated by semicolons (;), where each key is separated from its value by a colon (:). Each value must be an integer or a comma-separated list of integers. An example would be count:2;byweekday:0,1,2; (see the source for more help).
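To make the params format concrete, here is a standard-library sketch that turns such a string into keyword arguments. It is my own illustration, not django-schedule's parser, and the name parse_rule_params is hypothetical. The resulting dict could then be passed to dateutil, for example `rrule(WEEKLY, dtstart=start, **params)`.

```python
from typing import Dict, List, Union


def parse_rule_params(params: str) -> Dict[str, Union[int, List[int]]]:
    """Parse 'key:value;key:v1,v2;' into {'key': value, 'key2': [v1, v2]}."""
    result: Dict[str, Union[int, List[int]]] = {}
    for pair in params.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip the empty piece left by a trailing semicolon
        key, _, value = pair.partition(":")
        numbers = [int(v) for v in value.split(",")]
        # a single integer stays a scalar; several become a list
        result[key.strip()] = numbers[0] if len(numbers) == 1 else numbers
    return result
```

With the example above, `count:2;byweekday:0,1,2;` becomes `{"count": 2, "byweekday": [0, 1, 2]}`: two occurrences, on Mondays, Tuesdays, or Wednesdays (dateutil numbers weekdays from Monday = 0).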

Eventually the admin will be easier to work with for this model, and it will come with some built-in Rules, like Weekly, Monthly, Yearly, Every Weekday, etc.

Relations

There is a built-in relationship table for relating events to generic objects. This also works with calendars. You do not need to worry about the relationship table, as it all happens behind the scenes. Let's say you want to relate a calendar to a Group, which represents a group of users. This is really simple to do.

>>> group = Group.objects.get(name="Pythonistas")
>>> cal = Calendar.objects.get(name="Pythonistas' Calendar")
>>> cal.create_relation(group)
# Now to get that calendar
>>> Calendar.objects.get_calendars_for_object(group)

Both Calendar and Event have create_relation functions. If you know that there should be only one Calendar, you can use get_calendar_for_object. It will return one Calendar or raise Calendar.DoesNotExist. If you want there to be only one calendar, but you don't know whether it exists yet, you can use get_or_create_calendar.

>>> Calendar.objects.get_or_create_calendar(group, name="Pythonistas' Calendar")

As you can see, there is an optional keyword argument, name. If the Calendar needs to be created, it will be given that name.

Conclusion

There is some work that still needs to be done. I would like upgraded forms and templatetags, and I am always looking for more features to be implemented. If you have any comments, you can let us know at the Django-schedule page.

A special thanks to Yann Malet for his help getting event recurrence working.

UPDATE: Fixed some typos; see yml's and Guenter's posts below.

My New Design

Mon 03 Nov 2008 11:06 AM

Tags: Blog Django Personal

I have finally redesigned (and fixed) my blog. I understand this is a long-awaited, momentous occasion, so without further ado, here it is, soon to be a hot spot of information with a wealth of insight. One new feature, which many blogs have had before mine, is the Link, where I can link to cool things I find on the internet. Some features haven't been implemented yet, such as the buttons on the right, post detail pages, tag pages, a blogroll, and a 'find me on other websites' section.