<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ramblings of Doug Warren</title>
	<atom:link href="http://dougwarren.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://dougwarren.org</link>
	<description>Ramblings Revisited</description>
	<lastBuildDate>Wed, 06 Jul 2011 05:57:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Amazon Android App Store Free App of the Day RSS Feed in Django</title>
		<link>http://dougwarren.org/2011/07/amazon-android-app-store-free-app-of-the-day-rss-feed-in-django/</link>
		<comments>http://dougwarren.org/2011/07/amazon-android-app-store-free-app-of-the-day-rss-feed-in-django/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 05:57:06 +0000</pubDate>
		<dc:creator>Doug Warren</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[rss]]></category>

		<guid isPermaLink="false">http://dougwarren.org/?p=118</guid>
		<description><![CDATA[A showcase of technologies to implement an RSS feed that shows the Amazon Android Store Free App of the Day.]]></description>
			<content:encoded><![CDATA[<p><span class="initialcap">I</span>&#8217;ve been working off and on with Django now for a little over a year, but haven&#8217;t actually published anything yet.  Most of the projects I&#8217;ve tackled are rather large, so I have nothing really to show for it.  Last night, however, I came up with an idea for the perfect small Django project that would be simple to implement and actually quite useful.</p>
<p>Amazon&#8217;s Android App Store has the concept of a Free App of the Day.  The idea is that if you check their marketplace every day, they&#8217;ll reward you with the opportunity to get a normally paid app for free.  However, this of course requires you to remember to check it each day, something I&#8217;m rather bad at.  What I&#8217;m not bad at, however, is visiting my RSS reader of choice at least once a day.  So the only question then is how to get the free app into an RSS format.  (Spoiler Alert: You can find the RSS page at <A HREF="http://rss.dougwarren.org/AmazonFreeAppFeed/">http://rss.dougwarren.org/AmazonFreeAppFeed/</A>)</p>
<h3>Django to the Rescue</h3>
<p>This gave me the opportunity to try out and showcase some technologies I&#8217;ve been wanting to use for a while.  In a previous project I parsed HTML with <a href="http://www.stonesoup.com/">StoneSoup</a>, but it&#8217;s not very well supported and <a href="http://lxml.de/lxmlhtml.html">lxml.html</a> has been benchmarked[<a href="#1"><sup>1</sup></A>] as being superior in every way.  I&#8217;ve also talked a lot to co-workers about how easily <a href="http://pypi.python.org/pypi/virtualenv">virtualenv</a> keeps your Python code separate and gets rid of dependency conflicts.  And I&#8217;ve been meaning to look into <a href="http://pypi.python.org/pypi/django-celery">celery</a> for asynchronous task resolution.  (Particularly, I liked the idea of using celery as a cron[<a href="#2"><sup>2</SUP></A>] as I didn&#8217;t like django-command-extension&#8217;s <a href="http://packages.python.org/django-extensions/jobs_scheduling.html">runjobs</a> system.  However, for the sake of finishing this in a single evening, I cut that last dependency.  I left the code in place to come back to it in the future, but for now this is powered by cron.)  Finally, of course, it would be my first public facing <a href="http://www.djangoproject.com">Django</a> project, albeit a very trivial and minimal one.</p>
<h3>Installing Software</h3>
<p>The first step is to take care of the actual installation of software.  I&#8217;ll make the project root, set up virtualenv, and install Django:</p>
<pre class="brush: bash; title: ; notranslate">
[dwarren@thebigwave ~]$ cd /home/dougwarren/
[dwarren@thebigwave dougwarren]$ mkdir rss
[dwarren@thebigwave dougwarren]$ cd rss
[dwarren@thebigwave rss]$ virtualenv -p /usr/local/bin/python2.7 --no-site-packages .
[dwarren@thebigwave rss]$ . bin/activate
(rss)[dwarren@thebigwave rss]$ vi requirements.txt
</pre>
<p>The (rss) in the prompt is a reminder of which virtualenv environment is currently active.  In my bash profile I have several aliases that call deactivate before activating another virtualenv so I can quickly switch from environment to environment.</p>
<p>Add into the requirements file the following projects:</p>
<pre class="brush: bash; title: ; notranslate">
django
django-extensions
requests
lxml
ipython
</pre>
<p>Now I&#8217;ll install the software and set up our version control system.  I&#8217;ve been using <a href="http://www.git-scm.com">git</a> lately, but I may spend some time with <a href="http://mercurial.selenic.com/">Mercurial</a> soon to get a better understanding of the pros and cons of each.</p>
<pre class="brush: bash; title: ; notranslate">
(rss)[dwarren@thebigwave rss]$ pip install -r requirements.txt
(rss)[dwarren@thebigwave rss]$ git init .
(rss)(master) [dwarren@thebigwave rss]$ vi .gitignore
</pre>
<p>Into the .gitignore file I&#8217;ll add the list of files and directories that should not be added to source control:</p>
<pre class="brush: bash; title: ; notranslate">
bin/
include/
lib/
db/
migrations/
share/
*.pyc
</pre>
<h3> Starting out with Django </h3>
<p>Next, now that there&#8217;s a base of the project, I&#8217;ll check in what&#8217;s there and start a new project called &#8216;apps&#8217;.  This is actually a point that I wish could be different.  From above, I have a directory structure that looks like:</p>
<pre class="brush: plain; title: ; notranslate">
/home/dougwarren/rss
                /bin
                /lib
                /src
                /share
</pre>
<p>Now Django-admin won&#8217;t start a project in an existing directory, so I have to have another subdirectory off of rss for the project.  I&#8217;d rather install the project in the rss directory and have the apps off of it.  If anyone knows an easy way to accomplish this let me know!</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave rss]$ git add .gitignore requirements.txt
(rss)(master) [dwarren@thebigwave rss]$ git commit -m 'initial commit'
(rss)(master) [dwarren@thebigwave rss]$ django-admin.py startproject apps
(rss)(master) [dwarren@thebigwave rss]$ cd apps
(rss)(master) [dwarren@thebigwave apps]$ django-admin.py startapp amazonfeed
(rss)(master) [dwarren@thebigwave apps]$ chmod a+x manage.py
(rss)(master) [dwarren@thebigwave apps]$ mkdir db
(rss)(master) [dwarren@thebigwave apps]$ git add * amazonfeed/*
(rss)(master) [dwarren@thebigwave apps]$ git commit -m 'initial django baseline'
(rss)(master) [dwarren@thebigwave apps]$ vi settings.py
</pre>
<h3> Exploring Django projects and apps </h3>
<p>The previous commands have made a new directory, &#8216;apps&#8217;, off of rss; inside it are all of the Django files for the site that I&#8217;m creating.  Of particular note are manage.py and settings.py.  Manage.py is a Python script that will be used to interact with the Django internals.  One of the first steps I take is making it directly executable, because it&#8217;s a lot easier to type <code>./manage.py</code> than <code>python manage.py</code> all the time.  Settings.py is used to specify Django-specific settings for this installation.  In particular I&#8217;m going to be making the following changes:</p>
<pre class="brush: python; title: ; notranslate">
# Django settings for apps project.
from os.path import abspath, dirname, basename, join

DEBUG = False
TEMPLATE_DEBUG = DEBUG

ROOT_PATH = abspath(dirname(__file__))
PROJECT_NAME = basename(ROOT_PATH)

ADMINS = (
    ('Doug Warren', 'rss@dougwarren.org'),
)

MANAGERS = ADMINS

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': join(ROOT_PATH, 'db', 'rss.db'),# Or path to database file if using sqlite3.
        'USER': '',                      # Not used with sqlite3.
        'PASSWORD': '',                  # Not used with sqlite3.
        'HOST': '',                      # Set to empty string for localhost. Not used with sqlite3.
        'PORT': '',                      # Set to empty string for default. Not used with sqlite3.
    }
}
...
INSTALLED_APPS = (
    'django.contrib.sites',
    'amazonfeed',
)
</pre>
<p>Two things of note here.  First, I always try to make my projects as relocatable as possible.  As such, I never hardcode a path if I can avoid it at all.  The use of basename() and join() will enable me to make a development copy of the same project on the same machine just by checking it out of git.  For one of my other projects I run 3 different versions on the same VPS; the only difference is the Apache config.  (See below for an example.)</p>
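<p>As a rough illustration of why this keeps the project relocatable, here is a minimal sketch that derives the same values from two different checkout locations.  The helper derive_paths() and the second checkout directory are hypothetical; they only stand in for what settings.py does with __file__:</p>
<pre class="brush: python; title: ; notranslate">
from os.path import abspath, dirname, basename, join

def derive_paths(settings_file):
    # Mirror the settings.py logic: everything hangs off the directory
    # that contains the settings file, so nothing is hardcoded.
    root_path = abspath(dirname(settings_file))
    return {
        'ROOT_PATH': root_path,
        'PROJECT_NAME': basename(root_path),
        'DB_NAME': join(root_path, 'db', 'rss.db'),
    }

# Two hypothetical checkouts of the same repository
production = derive_paths('/home/dougwarren/rss/apps/settings.py')
development = derive_paths('/home/dougwarren/rss-dev/apps/settings.py')

print(production['DB_NAME'])
print(development['DB_NAME'])
</pre>
<p>Each checkout ends up with its own database file under its own db directory, without any edit to the settings file.</p>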
<p>Now, for the second point I&#8217;m going to contradict the first.  The Django sites app has a default site called &#8216;example.com&#8217;, and the Django syndication code uses the sites app to get the atom link.  So it will need to be set to the proper URL.  I&#8217;ll handle that after the database has been created.</p>
<h3> Adding Model Data </h3>
<p>The previous commands have created the app amazonfeed, I&#8217;ve added it to the INSTALLED_APPS list so Django knows it exists, and I&#8217;ve even defined a database that Django can use.  Now to add the code that will describe the tables in the database.</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave apps]$ cd amazonfeed
(rss)(master) [dwarren@thebigwave amazonfeed]$ vi models.py
</pre>
<pre class="brush: python; title: ; notranslate">
from django.db import models
from django.db.models import ForeignKey, DateField, CharField, URLField, TextField

class Vendor(models.Model):
    name = CharField(max_length=128)

    def __unicode__(self):
        return self.name

class App(models.Model):
    name = CharField(max_length=128)
    url = URLField(verify_exists=False)
    image_location = URLField(verify_exists=False)
    vendor = ForeignKey(Vendor)
    description = TextField()
    date = DateField(auto_now=True)

    def __unicode__(self):
        return self.name
</pre>
<p>I&#8217;ve kept things fairly simple here but still somewhat normalized.  The main entry of the Free App will be the App, and I&#8217;ll keep track of its name, the URL where it can be found, the image shown for the app, who makes it, what Amazon has to say about it, and when it was last seen.  The vendor data will live in another table, so if one vendor is blessed enough to have multiple free apps of the day I won&#8217;t duplicate that data.  I probably should have taken the App ID out of the URL field, but I&#8217;m not certain at this point how portable that will be in the future.  I also originally thought of writing my own markup (and hence taking the image_location), but at this point I just display the description from a separate page.  The date field is the date that the App was last seen by the scraper, so any time the model is saved the date will be updated.  Finally, for both the vendor and the app itself, the __unicode__ special method, which is used to display a representation of the object, will simply return the name.</p>
<p>Now that the database models have been described, I can create the actual database.  This is done through the manage.py script discussed previously.  &#8216;syncdb&#8217; will scan through all of the models in the INSTALLED_APPS and create tables for them.  If tables already exist it will not alter them.  There is, however, a widely used third-party app called <a href="http://south.aeracode.org/">South</a> that will handle migrations for you.  (This is a slight failing of Django in my opinion, in that we&#8217;re in the 2nd decade of the 21st century and database migrations still aren&#8217;t considered part of the core project.)</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave amazonfeed]$ cd ..
(rss)(master) [dwarren@thebigwave apps]$ ./manage.py syncdb
</pre>
<p>I mentioned previously that the sites database will contain the example.com entry.  I&#8217;ll need to change that to be the site where the app is running from or some of the fields won&#8217;t come out right.  In developing other Django apps I&#8217;ve frequently had to delete and restart the database.  As such I prize repeatability.  For this next task I could have specified a <a href="https://code.djangoproject.com/wiki/Fixtures">fixture</a> to add a 2nd site ID, and updated the settings.py entry for SITE_ID to be 2.  Instead, though, I&#8217;ll have a small script that will change the existing site with ID 1:</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave apps]$ vi sites.py
</pre>
<pre class="brush: python; title: ; notranslate">
from django.contrib.sites.models import Site
my_site = Site.objects.get(pk=1)
my_site.domain = 'rss.dougwarren.org'
my_site.name = &quot;Doug's RSS Feeds&quot;
my_site.save()
</pre>
<p>And now to execute it I turn back to manage.py, whose shell subcommand lets me execute Python code with all of the paths correctly set.  (See below for doing so outside of manage.py.)</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave apps]$ ./manage.py shell &lt; sites.py
(rss)(master) [dwarren@thebigwave apps]$ cd amazonfeed
(rss)(master) [dwarren@thebigwave amazonfeed]$ vi tasks.py
</pre>
<h3> Scrape the Page </h3>
<p>As mentioned in the introduction, I originally intended to use celery as a crontab, and hence the name of the scraper being tasks.  I may still do so in a followup.  However for now it will just contain a single function that will be run when the script is run:</p>
<pre class="brush: python; title: ; notranslate">
from lxml.html import fromstring, tostring
import requests
import re
from amazonfeed.models import Vendor, App

free_app_location = 'http://www.amazon.com/mobile-apps/b/ref=topnav_storetab_mas?node=2350149011'

def getfreeappdata():
    &quot;&quot;&quot; Get the data on the current free app and insert it into the database &quot;&quot;&quot;
    r = requests.get(free_app_location)

    amazon_url = 'http://www.amazon.com{0}'
    html = fromstring(r.content)

    app_html = html.cssselect('span.fad-widget-footer-title a')[0]
    app_name = app_html.text
    app_url = amazon_url.format(app_html.get('href'))

    vendor_html = html.cssselect('span.fad-widget-footer-vendor')[0]
    # Strip only the leading 'by ' so vendor names containing 'by ' survive
    vendor_name = re.sub(r'^\s*by\s+', '', vendor_html.text)

    image_html = html.cssselect('div.fad-widget-large-artwork img')[0]
    image_location = image_html.get('src')

    description_request = requests.get(app_url)

    description_html = fromstring(description_request.content)
    description = description_html.cssselect('div.aplus')[0]

    # Create Django objects
    vendor = Vendor.objects.get_or_create(name=vendor_name)[0]

    # Check to see if this app already exists
    app_query = App.objects.filter(name=app_name)

    # Update the time on the current app
    if app_query.count() != 0:
        app = app_query[0]
    else:
        # Or create a new one
        app = App(name=app_name,
                url = app_url,
                image_location=image_location,
                vendor=vendor,
                description=tostring(description),
                )

    app.save()

if __name__ == '__main__':
    getfreeappdata()
</pre>
<p>The code is fairly self-explanatory.  The first third is getting the page from Amazon, the middle third concerns isolating the variables that I care about, and the final portion creates the database objects.  A few things to note though.  First, there&#8217;s absolutely no error detection or evasion going on.  Scrapers are nasty dirty things.  If the site changes too much it will fail.  And when it does fail, I want to be notified in an E-Mail right away.  I don&#8217;t want to try to recover from the unexpected here; the unexpected is that a 3rd party changed the page from under me, and that&#8217;s not going to be recoverable.  Second, I&#8217;m looking forward a bit to the time when Amazon lists the same app a second time.  When that happens the existing row is saved again, so the date field will get updated and the app will move back to the top of the feed.</p>
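<p>If the bare IndexError from an empty selector result ever proves too cryptic in a failure mail, one option is a tiny helper that fails just as hard but says what was missing.  This is only a sketch: first_match() is a hypothetical name, and the plain lists below stand in for what cssselect() returns.  It also assumes the usual cron behaviour of mailing a job&#8217;s output, traceback included, to the crontab owner, which depends on the local mail setup:</p>
<pre class="brush: python; title: ; notranslate">
def first_match(matches, what):
    # cssselect() returns a list; indexing an empty list raises a bare
    # IndexError.  This hypothetical helper raises a more descriptive
    # error instead, and still makes no attempt to recover.
    if not matches:
        raise LookupError('scraper found no element for: ' + what)
    return matches[0]

# Stand-ins for selector results (plain lists instead of lxml elements)
print(first_match(['Some App'], 'app title'))
try:
    first_match([], 'vendor name')
except LookupError as error:
    print(error)
</pre>
<p>In the scraper the try/except would be left out on purpose, so the uncaught exception ends the run and the traceback becomes the notification.</p>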
<p>The next step is getting this task to be run at a scheduled time, so I&#8217;ll add it to the crontab.</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave amazonfeed]$ crontab -e
</pre>
<pre class="brush: bash; title: ; notranslate">
VIRTUAL_ENV=/home/dougwarren/rss
PATH=/home/dougwarren/rss/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/dwarren/bin
PYTHONPATH=/home/dougwarren/rss:/home/dougwarren/rss/apps
DJANGO_SETTINGS_MODULE=apps.settings

0 1 * * * python /home/dougwarren/rss/apps/amazonfeed/tasks.py
</pre>
<p>The only thing to note here is that the PYTHONPATH specifies both the apps directory and the root of the virtualenv.  It seems odd to me that both are required, but they are.  (I would think just the apps directory should be sufficient.)  I specified the cron job to fire at 1AM PST as it seems that the free apps rotate at midnight PST.</p>
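<p>A likely explanation for needing both entries is that two different import styles are in play: the settings module is named apps.settings, which can only be found from the directory that contains apps, while tasks.py imports amazonfeed.models as a top-level package, which can only be found from inside apps.  The sketch below reproduces that with a throwaway directory tree of empty stand-in files; the helper can_import() is hypothetical:</p>
<pre class="brush: python; title: ; notranslate">
import importlib
import os
import sys
import tempfile

# Build a throwaway copy of the layout: rss/apps/settings.py and
# rss/apps/amazonfeed/models.py (empty stand-ins, not the real files).
rss_root = tempfile.mkdtemp()
apps_dir = os.path.join(rss_root, 'apps')
os.makedirs(os.path.join(apps_dir, 'amazonfeed'))
for relative in ('__init__.py', 'settings.py',
                 os.path.join('amazonfeed', '__init__.py'),
                 os.path.join('amazonfeed', 'models.py')):
    open(os.path.join(apps_dir, relative), 'w').close()

def can_import(name):
    # True if the dotted module name resolves on the current sys.path
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

# 'apps.settings' needs the parent of apps/ on the path ...
sys.path.insert(0, rss_root)
settings_ok = can_import('apps.settings')

# ... while 'amazonfeed.models' needs apps/ itself.
models_before = can_import('amazonfeed.models')
sys.path.insert(0, apps_dir)
models_after = can_import('amazonfeed.models')

print(settings_ok)
print(models_before)
print(models_after)
</pre>
<p>Importing everything as apps.amazonfeed would remove the need for the second entry, at the cost of tying the app to the project name.</p>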
<h3> Django Feed </h3>
<p>Now that the model has created the database, and the script has populated the database, it&#8217;s time to read the database and output an RSS feed:</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave amazonfeed]$ vi feeds.py
</pre>
<pre class="brush: python; title: ; notranslate">
from django.contrib.syndication.views import Feed
from amazonfeed.models import Vendor, App
from amazonfeed.tasks import free_app_location

class AmazonFeed(Feed):
    title = 'Latest Free App of the Day'
    description = 'Latest Amazon Free App of the Day'
    link = free_app_location

    def items(self):
        return App.objects.order_by('-date')[:5]

    def link(self, obj):
        return free_app_location

    def item_link(self, item):
        return item.url

    def item_title(self, item):
        return &quot;{0} by {1}&quot;.format(item, item.vendor)

    def item_description(self, item):
        return item.description
</pre>
<p>This will return the 5 latest Apps sorted by date descending along with where they can be gotten from, and the description of the App.  You&#8217;ll note in the title the __unicode__() representation is being used to print both the App and the Vendor. The next step is to map it into the url scheme:</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave amazonfeed]$ cd ..
(rss)(master) [dwarren@thebigwave apps]$ vi urls.py
</pre>
<pre class="brush: python; title: ; notranslate">
from django.conf.urls.defaults import patterns, include, url
from apps.amazonfeed.feeds import AmazonFeed

urlpatterns = patterns('',
    # Anchor both patterns: an empty regex would match every path
    (r'^$', AmazonFeed()),
    (r'^AmazonFreeAppFeed/$', AmazonFeed()),
)
</pre>
<p>I&#8217;ve published /AmazonFreeAppFeed/ as the canonical URL for the project, but for now I&#8217;m also allowing / to access it as well.  Mostly because I wanted to do this entire project without having to define any templates or write any HTML.</p>
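<p>One thing to keep in mind with regex-based URL patterns is that the path is searched rather than fully matched, so an empty or unanchored pattern accepts far more than just /.  The sketch below imitates that lookup with the standard re module only; first_matching() is a hypothetical stand-in for the resolver, and the paths are written without the leading slash, the way the patterns see them:</p>
<pre class="brush: python; title: ; notranslate">
import re

# Request paths as the URL patterns see them (leading slash removed)
paths = ['', 'AmazonFreeAppFeed/', 'no/such/page/']

def first_matching(patterns, path):
    # Return the first pattern that matches, the way an ordered list
    # of URL patterns is consulted; None means no route was found.
    for pattern in patterns:
        if re.search(pattern, path):
            return pattern
    return None

unanchored = [r'', r'AmazonFreeAppFeed/$']
anchored = [r'^$', r'^AmazonFreeAppFeed/$']

for path in paths:
    print('{0!r}: {1!r} vs {2!r}'.format(
        path,
        first_matching(unanchored, path),
        first_matching(anchored, path)))
</pre>
<p>With the unanchored list every path, including one that does not exist, is answered by the first entry; with the anchored list only / and /AmazonFreeAppFeed/ resolve.</p>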
<p>At this point everything is done from the Django end.  The only thing left is to get the webserver to serve the pages.  I use Apache, and I have a small WSGI template that I replicate over and over:</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave rss]$ sudo vi /etc/httpd/conf/httpd.conf
</pre>
<pre class="brush: xml; title: ; notranslate">
&lt;VirtualHost *:80&gt;
        DocumentRoot &quot;/home/dougwarren/rss/apps&quot;
        ServerName rss.dougwarren.org
        Alias /static/ /home/dougwarren/rss/apps/static-final/
        WSGIScriptAlias / /home/dougwarren/rss/apps/rss.wsgi
        WSGIProcessGroup wynter
&lt;/VirtualHost&gt;
</pre>
<p>I set a wildcard DNS entry on the dougwarren.org domain to the address of my VPS.  As such, whenever I wish, I can add new hostnames to Apache and start serving content from them.  The last thing to do is to set up the wsgi file that was specified above:</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave apps]$ vi rss.wsgi
</pre>
<pre class="brush: python; title: ; notranslate">
import sys
import site
import os

cur_path = os.path.dirname(__file__)
sys.path.append(cur_path)
base_path = os.path.abspath(os.path.join(cur_path,&quot;..&quot;))

sys.path.append(base_path)
prev_sys_path = list(sys.path)

# add the site-packages of our virtualenv as a site dir
site.addsitedir(os.path.join(base_path,'lib','python2.7','site-packages'))
site.addsitedir(os.path.join(base_path,'src'))

# reorder sys.path so new directories from the addsitedir show up first
new_sys_path = [p for p in sys.path if p not in prev_sys_path]
for item in new_sys_path:
    sys.path.remove(item)
sys.path[:0] = new_sys_path

# import from down here to pull in possible virtualenv django install
from django.core.handlers.wsgi import WSGIHandler
os.environ['DJANGO_SETTINGS_MODULE'] = 'apps.settings'
application = WSGIHandler()
</pre>
<p>This wsgi file is similar in spirit to the changes made to settings.py: again no actual paths are specified, and everything is relative to where the wsgi file itself lives.  It&#8217;s based off of a snippet I found on-line somewhere, but I lost the attribution at some point.  If you know where it&#8217;s from please let me know so I can update it.</p>
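<p>The reordering step in the middle of that file is easier to follow on a plain list.  The sketch below applies the same three lines to a made-up search path; move_new_entries_first() is a hypothetical wrapper, and the system paths are only examples:</p>
<pre class="brush: python; title: ; notranslate">
def move_new_entries_first(path_list, previous):
    # Same reordering as the wsgi file, applied to any list: entries
    # that were not present before are moved to the front, keeping
    # their relative order.
    new_entries = [p for p in path_list if p not in previous]
    for entry in new_entries:
        path_list.remove(entry)
    path_list[:0] = new_entries
    return path_list

# Hypothetical search path before and after site.addsitedir() ran
before = ['/usr/lib/python2.7', '/usr/lib/python2.7/site-packages']
after = before + ['/home/dougwarren/rss/lib/python2.7/site-packages']

reordered = move_new_entries_first(list(after), before)
print(reordered)
</pre>
<p>Putting the virtualenv entries first means a Django installed inside the virtualenv wins over any copy in the system site-packages.</p>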
<p>The only thing left to do is restart Apache, and commit the changes!</p>
<pre class="brush: bash; title: ; notranslate">
(rss)(master) [dwarren@thebigwave apps]$ sudo /etc/rc.d/init.d/httpd restart
(rss)(master) [dwarren@thebigwave apps]$ git add settings.py urls.py rss.wsgi sites.py amazonfeed/models.py amazonfeed/feeds.py amazonfeed/tasks.py
(rss)(master) [dwarren@thebigwave apps]$ git commit -m 'Final commit'
</pre>
<p>Well, that&#8217;s it!  The Amazon Free Apps of the day are now parsed and I hopefully won&#8217;t miss any in the future, and if you point your RSS reader at <A HREF="http://rss.dougwarren.org/AmazonFreeAppFeed/">http://rss.dougwarren.org/AmazonFreeAppFeed/</A> maybe you won&#8217;t either.</p>
<ol>
<li> <a name="1">&nbsp;</a><a href="http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/">Python HTML Parser Performance</a></li>
<li> <a name="2">&nbsp;</a><a href="http://bitkickers.blogspot.com/2010/07/djangocelery-quickstart-or-how-i.html">Django/Celery Quickstart (or, how I learned to stop using cron and love celery)</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://dougwarren.org/2011/07/amazon-android-app-store-free-app-of-the-day-rss-feed-in-django/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quantity vs Quality?</title>
		<link>http://dougwarren.org/2010/11/quantity-vs-quality/</link>
		<comments>http://dougwarren.org/2010/11/quantity-vs-quality/#comments</comments>
		<pubDate>Fri, 05 Nov 2010 23:49:09 +0000</pubDate>
		<dc:creator>Doug Warren</dc:creator>
				<category><![CDATA[Philosophy]]></category>

		<guid isPermaLink="false">http://dougwarren.org/?p=80</guid>
		<description><![CDATA[&#8220;The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality.</p>
<p>His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pound of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot &#8211; albeit a perfect one &#8211; to get an “A”.</p>
<p>Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work &#8211; and learning from their mistakes &#8211; the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.&#8221;<br />
- <cite>Art &amp; Fear: Observations On the Perils (and Rewards) of Artmaking</cite></p>
]]></content:encoded>
			<wfw:commentRss>http://dougwarren.org/2010/11/quantity-vs-quality/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Sometimes to progress you must move backwards</title>
		<link>http://dougwarren.org/2010/06/sometimes-to-progress-forward-you-must-move-backwards/</link>
		<comments>http://dougwarren.org/2010/06/sometimes-to-progress-forward-you-must-move-backwards/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 03:56:38 +0000</pubDate>
		<dc:creator>Doug Warren</dc:creator>
				<category><![CDATA[Fitness]]></category>
		<category><![CDATA[fitness]]></category>

		<guid isPermaLink="false">http://dougwarren.org/?p=51</guid>
		<description><![CDATA[There is a concept in resistance training of the plateau. The idea being that after weeks or months of straight hopefully linear progress all gains in increasing weight or increasing the numbers of reps stop. There are many reasons for a plateau and almost everyone has their own theory. The ones that I believe the [...]]]></description>
			<content:encoded><![CDATA[<p><span class="initialcap">T</span>here is a concept in resistance training of the plateau.  The idea is that after weeks or months of steady, hopefully linear progress, all gains in weight or in the number of reps stop.  There are many reasons for a plateau and almost everyone has their own theory.  The ones that I believe in the most are Central Nervous System Adaptation, Over Training, and Bad Form.  Over the past several weeks, I believe I&#8217;ve suffered from the first and third, so let&#8217;s take a look at those separately.</p>
<h3>Central Nervous System Adaptation</h3>
<p>Since I started resistance training in October of last year, I haven&#8217;t missed a single day.  Three times a week initially, moving up to four recently with added depletion workouts when I switched to a <a href='http://en.wikipedia.org/wiki/Cyclic_ketogenic_diet'>Cyclic Ketogenic Diet</a>, it&#8217;s been the same sets of back squats, the same sets of dead-lifts, and the same sets of bench presses week after week.  After a period of time your central nervous system will no longer adapt to the demands placed on it and you must change to move forward.  In my case, I knew that I had a week&#8217;s vacation coming up at Disney World.  Unlike previous vacations, I made no plans to find a local gym and I didn&#8217;t pack protein powder in my luggage; instead I planned on eating three meals a day rather than my traditional 6, and the only exercise I got was hours of low impact cardio from walking through the parks again and again.  When I returned I weighed myself the day after resuming the low carb portion of the CKD, and I weighed the same as I had the Monday before I left.</p>
<h3>Bad Form</h3>
<p>The second major problem that I suffered from is that after the loss of my work-out partner, I had no one to &#8216;keep me honest&#8217;.  I was close to a few goals in the different compound lifts and I grew sloppy in my desire to reach them.  For bench presses I increased the weights if I was able to reach all of the reps in my set, regardless of how shaky or close to failure I was.  For squats, I had the squat turn into a good morning coming out of the hole on the first rep, but I was so close to a goal I increased the weight anyway.  Finally, for dead-lifts I&#8217;d increase the weights even when I missed the last rep of a set.  In each case I made my goal, but ran into a plateau almost immediately afterwards.  For this issue, and likewise for Over Training, there&#8217;s only one real solution.</p>
<h3>De-Load</h3>
<p>(Also known as the backoff, or reset)  If you reach a plateau in an exercise particularly in one of the major compound exercises where week after week you reach the same weight for the same reps or even fall back a rep or two, the number one way to break this is to reduce the weight by 20% and start increasing the amount again on a weekly basis.</p>
<p>In my case it was only the compound lifts that were stalling (However it was all three of them.)  I&#8217;m still making gains to every accessory and that feels good, but more importantly after resetting the weight I was able to focus on the form again.  I&#8217;m convinced now that in the absence of a partner watching the form the most important thing to pay attention to while lifting is the barbell speed.  That in addition to the number of reps made in the set will determine if I increase a lift for the next session.</p>
<p>Within weeks I will be at the same place I was before my de-load, and the week after that I expect to surpass it.  (3 weeks for bench, 2 weeks for squats, 3 weeks for dead-lifts.)  Because sometimes to progress you must move backwards.</p>
]]></content:encoded>
			<wfw:commentRss>http://dougwarren.org/2010/06/sometimes-to-progress-forward-you-must-move-backwards/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OAuth and Web2Py Part 1</title>
		<link>http://dougwarren.org/2010/06/oauth-and-web2py-part-1/</link>
		<comments>http://dougwarren.org/2010/06/oauth-and-web2py-part-1/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 15:36:25 +0000</pubDate>
		<dc:creator>Doug Warren</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[web2py]]></category>
		<category><![CDATA[oauth]]></category>

		<guid isPermaLink="false">http://dougwarren.org/?p=28</guid>
		<description><![CDATA[Many Website integrations done today involve Authorization and Authentication via an open protocol known as]]></description>
			<content:encoded><![CDATA[<p><span class="initialcap">M</span>any Website integrations done today involve Authorization and Authentication via an open protocol known as <a href="http://www.oauth.org">OAuth</a>.  While web2py doesn&#8217;t natively support authorization via OAuth, in the next two articles I will demonstrate first how to create and verify an OAuth ticket, and then, in the second article, how to integrate this with the built-in web2py auth system.</p>
<h3>OAuth overview</h3>
<p>First, a little bit about OAuth.  OAuth is defined in <A HREF="http://tools.ietf.org/html/rfc5849">RFC-5849</A> as a method for clients to gain access to server resources on behalf of a resource owner.  In this nomenclature the web2py application is a client or, as it was called in the original OAuth specification, a consumer (of the resource).  The visitor to the application is the resource owner or User, and the remote connection that provides the resource is the service provider or server.</p>
<p>OAuth uses a three-part handshake to verify the credentials of a connection.  If an application wishes to make a request on behalf of a user with a 3rd party (Google, Twitter, etc&#8230;), the application and the 3rd party first establish a set of client credentials consisting of a key and a shared secret.  Using these credentials the application requests a temporary set of credentials called the Request Token that identifies a transaction to take place in the future.  The application then redirects the end user to the server, giving the request token as an argument to the URL.  The end user identifies themselves to the service, and on authentication the end user is redirected back to the application website, either to a predetermined callback page or to a page specified in the redirect.  At this point the callback request carries the OAuth token (and, in OAuth 1.0a, a verifier), which the application then uses, together with the request token secret, to connect to the remote server and validate it.  This last step returns an access token.  Until the access token expires, it may be cached to link a user with the permitted resources.</p>
<p>By design neither the application developer personally nor the application itself knows anything about the credentials that the user submitted to the remote system.  The application does not know any passwords, nor does it know any e-mail addresses or other mechanisms to identify the user.  It only knows that, given this access token, the remote server will respond with requested information about a specific user: to the <b>remote system</b> the user has been authenticated, and the <b>application</b> has been authorized to make requests.</p>
<h3>Acquiring OAuth:</h3>
<p>You can get a Python OAuth library straight from its source on <A HREF="http://github.com/simplegeo/python-oauth2">GitHub</A> and install it as a site-package.  It also relies on httplib2, which can be acquired from <A HREF="http://code.google.com/p/httplib2/">httplib2</a>&#8217;s website.  You can simply copy the python2/httplib2 and oauth2 directories into the web2py/site-packages directory.</p>
<p>For the example application I will be creating a Twitter client.  It&#8217;s not the most original thing ever, but it&#8217;s a functional test.  First register with Twitter&#8217;s <A HREF="http://twitter.com/oauth_clients">OAuth Clients</A> page.  On this form, select a browser application, Read &#038; Write access, indicate that your app does want to use Twitter for authentication, and specify a callback URL as site/app/controller/function.  In the example I am using http://127.0.0.1:8000/twitter/default/callback.  A page will then be displayed that shows the consumer key and secret, as well as the three URLs used to connect to the server: the Request token URL, Access token URL, and Authorize URL.  These are common to every OAuth service that I&#8217;ve seen.</p>
<h3>Creating The Application</h3>
<p>Create a new app through the administrative control panel and name it twitter.  Begin editing models/db.py by appending to the end of the file:</p>
<pre class="brush: python; title: ; notranslate">
consumer_key = &quot;PASTED FROM TWITTER&quot;
consumer_secret = &quot;PASTED FROM TWITTER&quot;

request_token_url = 'https://twitter.com/oauth/request_token'
access_token_url = 'https://twitter.com/oauth/access_token'
authorize_url = 'https://twitter.com/oauth/authorize'
</pre>
<p>Now that the global read-only data is in place, edit the controllers/default.py file to load the needed libraries:</p>
<pre class="brush: python; title: ; notranslate">
from oauth2 import Client, Consumer, Token

try:
    from urlparse import parse_qs, parse_qsl
except ImportError:
    from cgi import parse_qs, parse_qsl

import time
import gluon.contrib.simplejson
</pre>
<p>The parse_qsl trick is straight from the OAuth library.  parse_qs and parse_qsl were added to urlparse in Python 2.6 (and in Python 3 that module was renamed urllib.parse); in earlier versions they live in the cgi module, so the above import maintains backwards compatibility, as is the way of web2py. </p>
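<p>As a minimal sketch of what parse_qsl is used for here, the snippet below parses a made-up token response into a dict.  The token values are placeholders, and the body is built with urlencode purely so the example is self-contained; the import chain also tries urllib.parse so that it runs on Python 3.</p>
<pre class="brush: python; title: ; notranslate">
try:
    from urllib.parse import parse_qsl, urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2
    try:
        from urlparse import parse_qsl  # Python 2.6 and 2.7
    except ImportError:
        from cgi import parse_qsl  # Python 2.5 and earlier

# Made-up values standing in for what a token endpoint would send back.
fake_body = urlencode([('oauth_token', 'abc123'),
                       ('oauth_token_secret', 's3cr3t')])

# parse_qsl yields (name, value) pairs, which dict() turns into a lookup table.
token = dict(parse_qsl(fake_body))
print(token['oauth_token'])
</pre>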
<h3>OAuth Tickets</h3>
<p>There are three states that a user can be in when it comes to an OAuth connection.  They can have no tokens at all, in which case a request token will need to be generated and handed to them; they could be giving the application an OAuth token for it to validate; or they could be fully authenticated and authorized and have a valid access token associated with them.  To tackle the states one at a time:</p>
<p>In the default.py controller file, edit the index method as:</p>
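<p>Before walking through them, the three states can be summarised as a small helper.  This is only a sketch: oauth_state and the state names are hypothetical, and it assumes the tokens are kept in a dict-like session under the same key names used later in this article.</p>
<pre class="brush: python; title: ; notranslate">
def oauth_state(session):
    # session is any dict-like object holding the tokens gathered so far.
    if session.get('access_token'):
        return 'authorized'   # handshake complete, API calls can be made
    if session.get('request_token'):
        return 'pending'      # user was sent to the provider, not yet verified
    return 'anonymous'        # nothing yet, a request token must be fetched


print(oauth_state({}))
print(oauth_state({'request_token': {'oauth_token': 'abc'}}))
</pre>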
<pre class="brush: python; title: ; notranslate">
def index():
    consumer = Consumer(key=consumer_key, secret=consumer_secret)

    client = Client(consumer)

    resp, content = client.request(request_token_url, &quot;GET&quot;)
    if resp['status'] != '200':
        redirect(URL(r=request, f='maintenance'))

    # Turn response into dict
    request_token = dict(parse_qsl(content))

    # Store it in a session
    session.request_token = request_token

    redirect_location = &quot;%s?oauth_token=%s&quot; % (authorize_url, request_token['oauth_token'])
    redirect(redirect_location)
</pre>
<blockquote>
<h5>A Note on Redirects</h5>
<p>web2py&#8217;s redirect() function works by raising a Python exception: all code in the action stops executing at that point and an HTTP redirect header (a 303 by default) is sent to the client.  From web2py&#8217;s point of view, when the user follows the redirect a new request and response object are created, as it is an entirely new connection.  This enables the application to control the work-flow in a dynamic fashion.
</p></blockquote>
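<p>The exception-based behaviour can be sketched in plain Python.  This is a toy model, not web2py&#8217;s actual implementation: the Redirect class, the redirect function and dispatch are all stand-ins written for this example.</p>
<pre class="brush: python; title: ; notranslate">
class Redirect(Exception):
    # Toy stand-in for the exception a framework raises to abort an action.
    def __init__(self, location):
        Exception.__init__(self, location)
        self.location = location


def redirect(location):
    raise Redirect(location)


def action(log):
    log.append('before redirect')
    redirect('/twitter/default/maintenance')
    log.append('after redirect')  # never reached


def dispatch(func):
    # Framework-like wrapper: run the action, turn Redirect into a response.
    log = []
    try:
        func(log)
    except Redirect as exc:
        return {'location': exc.location, 'log': log}
    return {'location': None, 'log': log}


response = dispatch(action)
print(response['location'])
print(response['log'])
</pre>
<p>Because the exception unwinds the stack, nothing after the redirect call runs, which is why the controller code in this article does not need an else branch after each redirect.</p>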
<p>What I&#8217;ve done here is create a consumer object using the consumer key and secret, then request a request token for this transaction.  If anything goes wrong with this request, the user is redirected to a maintenance page; without a Twitter token and with no way to get one, the application can&#8217;t do anything.  However, if one was retrieved properly (as it would be in almost every case), it is converted to a dict and stored in the current session for later retrieval.  In the future it will be stored in the auth database instead.  What is being stored in the session?  Just two pieces of data:<br />
oauth_token     :       tPmOLbed75wgvqVKZXmStShhNIBSmaGBjIOBABGHHo<br />
oauth_token_secret      :       THE SECRET REMOVED</p>
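<p>The redirect location in index() is built with plain string formatting, which works for the tokens Twitter hands out.  A slightly more defensive sketch, assuming a hypothetical helper named build_authorize_url, lets urlencode escape the token value:</p>
<pre class="brush: python; title: ; notranslate">
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_authorize_url(authorize_url, oauth_token):
    # urlencode escapes any characters that are not safe in a query string.
    return authorize_url + '?' + urlencode({'oauth_token': oauth_token})


print(build_authorize_url('https://twitter.com/oauth/authorize', 'abc 123'))
</pre>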
<p>Now access the page with a web browser.  You will be asked to log in via a Twitter log in page and then you will be redirected to a page that doesn&#8217;t yet exist, in my case: http://127.0.0.1:8000/twitter/default/callback?oauth_token=UkGtfUfvQ4rJlNIn35WhHi8UBNXo9EPVMuzvOvcavM where the latter part is the OAuth token returned from the log in.  This oauth_token now needs to be validated with the remote server and turned into an access token to complete the third part of the handshake.  Doing this requires the callback function to be implemented:</p>
<pre class="brush: python; title: ; notranslate">
def callback():
    if session.request_token:
        consumer = Consumer(key=consumer_key, secret=consumer_secret)

        token = Token(session.request_token['oauth_token'],
            session.request_token['oauth_token_secret'])

        client = Client(consumer, token)

        resp, content = client.request(access_token_url, &quot;GET&quot;)
        if resp['status'] != '200':
            redirect(URL(r=request, f='maintenance'))

        access_token = dict(parse_qsl(content))
        session.access_token = access_token

    redirect(URL(r=request, f='index'))
</pre>
<p>At this point there is a valid access_token in the session and calls can be made using it.  Each OAuth server can return additional data about the session along with the access_token.  However, this is not required and is completely specific to the server being connected to.  For example, Twitter will return two pieces of data: session.access_token['screen_name'] is the screen name of the logged in person, and session.access_token['user_id'] is the user id.  Now that the access_token has been created, new requests can be made to gather more data.  Add a new block to the start of index() like so:</p>
<pre class="brush: python; title: ; notranslate">
    if session.access_token:
        consumer = Consumer(key=consumer_key, secret=consumer_secret)
        token = Token(session.access_token['oauth_token'],
            session.access_token['oauth_token_secret'])
        client = Client(consumer, token)

        resp, content = client.request('http://api.twitter.com/1/statuses/home_timeline.json','GET')

        if resp['status'] != '200':
            if resp['status'] == '401':
                session.clear()
                redirect(URL(r=request, f='index'))
            redirect(URL(r=request, f='maintenance'))

        return dict(statuses=gluon.contrib.simplejson.loads(content))
</pre>
<p>This block of code is something that will be seen repeatedly when dealing with OAuth requests: create a consumer, create a token, create a client, make a request with the client, verify the result of the request, and decode the response into a dict.  An additional check ensures that the token is still valid.  If it&#8217;s not (a 401 response), all data associated with the session is cleared and the login workflow begins again by redirecting the user back to the index page.  Here a request was made for JSON-encoded data; one could also ask Twitter for XML-encoded data and parse it with an XML reader.  To my knowledge there is no difference in content between the two encodings.  I prefer to request JSON just in case I need to display the raw data on the client and manipulate it via JavaScript.</p>
<p>By the end of the controller&#8217;s index() execution, any status updates that the user would see on the front page of Twitter are loaded into the statuses key and can be displayed in the view.  To do so, edit views/default/index.html as:</p>
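<p>Since the same request, verify and decode steps recur, they could be factored into a helper.  The sketch below is one possible shape, not part of the oauth2 library: fetch_json, TokenExpired and ServiceUnavailable are hypothetical names, it raises exceptions instead of calling web2py&#8217;s redirect() so it can run anywhere, and FakeClient with its example.invalid URL exists only to exercise it without a network.</p>
<pre class="brush: python; title: ; notranslate">
import json


class TokenExpired(Exception):
    pass


class ServiceUnavailable(Exception):
    pass


def fetch_json(client, url):
    # client is anything with a request(url, method) method that returns
    # a (response_headers, body) pair.
    resp, content = client.request(url, 'GET')
    if resp['status'] == '401':
        raise TokenExpired(url)        # caller should restart the login flow
    if resp['status'] != '200':
        raise ServiceUnavailable(url)  # caller should show a maintenance page
    return json.loads(content)


class FakeClient(object):
    # Canned responses so the helper can be exercised without a network.
    def __init__(self, status, body):
        self.status = status
        self.body = body

    def request(self, url, method):
        return {'status': self.status}, self.body


body = json.dumps([{'text': 'hello'}])
statuses = fetch_json(FakeClient('200', body), 'http://example.invalid/timeline.json')
print(statuses[0]['text'])
</pre>
<p>In a controller, the caller would catch TokenExpired to clear the session and redirect to index, and ServiceUnavailable to redirect to the maintenance page.</p>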
<pre class="brush: python; title: ; notranslate">
{{if statuses:}}
{{tab = TABLE(TR(TH('Tweeter'),TH('Tweet')))}}
{{for status in statuses: }}
	{{tab.append(TR(TD(status['user']['screen_name']), TD(status['text'])))}}
	{{pass}}
{{=tab}}
{{pass}}
</pre>
<p>This ends the initial post on how to use OAuth with web2py.  In the next post I will discuss integrating with the web2py auth system by associating an existing user with an OAuth session, and finally using OAuth completely in place of the auth logins.  I will also touch on decorators and global functions that make your life easier.</p>
]]></content:encoded>
			<wfw:commentRss>http://dougwarren.org/2010/06/oauth-and-web2py-part-1/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Controller wide auth in web2py</title>
		<link>http://dougwarren.org/2010/06/controller-wide-auth-in-web2py/</link>
		<comments>http://dougwarren.org/2010/06/controller-wide-auth-in-web2py/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 15:24:07 +0000</pubDate>
		<dc:creator>Doug Warren</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[web2py]]></category>

		<guid isPermaLink="false">http://dougwarren.org/?p=3</guid>
		<description><![CDATA[Lately I&#8217;ve been spending a lot of time working on a home project that uses web2py. It&#8217;s a really powerful MVC based framework written in Python. Part of what I love about it is the speed that you can prototype code and see it on a website. It allows you to quickly mock up the [...]]]></description>
			<content:encoded><![CDATA[<p><span class="initialcap">L</span>ately I&#8217;ve been spending a lot of time working on a home project that uses <a href="http://www.web2py.com/">web2py</a>.  It&#8217;s a really powerful <a href="http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC</a>-based framework written in Python.  Part of what I love about it is the speed with which you can prototype code and see it on a website.  It allows you to quickly mock up the interaction of several database tables, and after a weekend of working with it I was able to re-implement a project that took me three weeks to do in C++.  However, as I worked more with it I noticed that a lot of its more powerful features are either not documented, documented only in comments in the source code, or only documented in passing on the <a href="http://groups.google.com/group/web2py">mailing-list</a>.  Part of that is because development is so fast-paced, part because development is somewhat distributed in that there are many separate places where knowledge lives, and part because the main documentation is a book that only gets updated once a year at best.</p>
<p>So one of my goals is to add to this problem.  <img src='http://dougwarren.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   As I have read the past several months of archives of the mailing list I&#8217;ve kept notes of what&#8217;s <i>interesting</i> in the problems that come up and the solutions offered.  Over time I hope to document these bits of knowledge and provide links to running examples of how they can be used.</p>
<p>The first of these, however, is one that I came up with myself the other day.  I have an app in mind with a controller that assumes the user is logged in before taking any action.  Other controllers allow some actions while not logged in, but this controller was deep in the work-flow and each function within it was assumed to have a valid user.  What I had done initially was to decorate each function in the controller with an auth check such as:</p>
<pre class="brush: python; title: ; notranslate">
@auth.requires_login()
def index():
    return dict(message=&quot;You are logged in&quot;)
</pre>
<p>The problem of course is that as the number of functions in the controller grew, having to decorate each function became very error-prone.  It also turned out that there was a small class of functions where being logged in wasn&#8217;t actually a requirement and the end-user should be able to access the page while not logged in.  So after a few false starts, I settled on:</p>
<pre class="brush: python; title: ; notranslate">
import urllib

# List of functions to be called without being logged in
# (the trailing comma makes this a tuple rather than a plain string)
whitelist = ('chat',)

if not auth.basic() and not auth.is_logged_in() \
   and not auth.environment.request.function in whitelist:
    request = auth.environment.request
    next = URL(r=request,args=request.args, vars=request.get_vars)
    redirect(auth.settings.login_url + '?_next='+urllib.quote(next))
</pre>
<p>Placing the above snippet in your controller file will allow the chat function to be executed without being logged in, but any other function request will redirect the viewer to the login url and then back to their original destination.  </p>
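<p>The decision made by that snippet can be sketched as two plain functions that run outside web2py.  The names needs_login and login_redirect are hypothetical, and the paths are placeholders.  The sketch also shows why a one-element whitelist needs a trailing comma: without it, Python sees a plain string and the in test becomes a substring match.</p>
<pre class="brush: python; title: ; notranslate">
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def needs_login(function_name, logged_in, whitelist):
    # True when the request should be bounced to the login page.
    return not logged_in and function_name not in whitelist


def login_redirect(login_url, next_url):
    # Quote the destination so its own query string survives the round trip.
    return login_url + '?_next=' + quote(next_url)


whitelist = ('chat',)

print(needs_login('chat', False, whitelist))
print(needs_login('hat', False, whitelist))
# With ('chat') the whitelist is the string 'chat', so 'hat' slips through.
print(needs_login('hat', False, ('chat')))
print(login_redirect('/app/default/user/login', '/app/work/index?a=1'))
</pre>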
]]></content:encoded>
			<wfw:commentRss>http://dougwarren.org/2010/06/controller-wide-auth-in-web2py/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Goodbye and hello, as always</title>
		<link>http://dougwarren.org/2010/06/goodbye-and-hello-as-always/</link>
		<comments>http://dougwarren.org/2010/06/goodbye-and-hello-as-always/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 15:17:23 +0000</pubDate>
		<dc:creator>Doug Warren</dc:creator>
				<category><![CDATA[blog]]></category>

		<guid isPermaLink="false">http://dougwarren.org/?p=16</guid>
		<description><![CDATA[Shortly after I began blogging there was a catastrophic failure on my hosting provider. I lost 4 personal websites, 2 of the 3 SVN repositories I use were corrupted beyond recovery, and well generally bad things happened. I&#8217;ve spent a lot of my free time lately recreating what was lost and getting everything re-implemented. In [...]]]></description>
			<content:encoded><![CDATA[<p><span class="initialcap">S</span>hortly after I began blogging there was a catastrophic failure at my hosting provider.  I lost 4 personal websites, 2 of the 3 SVN repositories I use were corrupted beyond recovery, and, well, generally bad things happened.  I&#8217;ve spent a lot of my free time lately recreating what was lost and getting everything re-implemented, in some cases in a <a href="http://www.thebigwave.net/wiki/Category:Areas">far better format</a>.  Last night I finally had the time to bring the blog back.  At the moment the theme is recovered and some of the plugins are, but the articles aren&#8217;t back yet.  I hope to get those restored soon, but in the meantime there&#8217;s new content coming as well about what I&#8217;ve learned recently.  </p>
]]></content:encoded>
			<wfw:commentRss>http://dougwarren.org/2010/06/goodbye-and-hello-as-always/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

