[Previous] How To Get Popular | Home | [Next] Silly Studies and Food Fads

Programmer Productivity

Yannis has proposed Yannis's Law which states that programmer productivity doubles every 6 years. He gets the figure from a project that took a week or two in 1972, but would now take an hour or two. I just did it in twenty minutes using a plain text editor (with python syntax highlighting) and a unix terminal. My Python is rusty.

The KWIC index system accepts an ordered set of lines, each line is an ordered set of words, and each word is an ordered set of characters. Any line may be "circularly shifted" by repeatedly removing the first word and appending it at the end of the line. The KWIC index system outputs a listing of all circular shifts of all lines in alphabetical order. This is a small system. Except under extreme circumstances (huge data base, no supporting software), such a system could be produced by a good programmer within a week or two.
Here's my code:

#!/usr/bin/env python
def main():
    f = open("kwic.txt", "rU")
    out = open("kwic-output.txt", "w")
    final = []
    for line in f:
        words = line.split()
        count = len(words)
        for i in xrange(count):
            final.append(makestr(words))
            cycle(words)        
    final.sort()
    for ele in final:
        out.write(ele + "\n")
        
def makestr(li):
    s = ""
    first = 1
    for ele in li:
        if first == 1:
            first = 0
            s += ele
        else:
            s += " " + ele
    return s
    
def cycle(li):
    tmp = li[0]
    del li[0]
    li.append(tmp)
    return li

if __name__ == '__main__': main()


By the way, if someone knows a more elegant way to avoid having an extra space in makestr, let me know. I'm aware of the option of deleting the first character after making the string, but I don't consider that very nice either.

Elliot Temple on February 20, 2006

Messages (4)

Unix (sh/ksh/bash):

while read line; do n=$(echo "$line" | rs -h); n=${n#* }; echo "$line" | rs -yg1 $n $((n+1)); done | sed -e 's/^[^ ]* *//' -e 's/ */ /g' | sort -df

Note: there are supposed to be two spaces before the * in the sed patterns, but blogger won't let me use <pre>.


kps at 5:32 PM on February 20, 2006 | #33 | reply | quote

I think you might want the string method "join" for the space problem. See how I used it below. Thanks for the fun problem, by the way!

Underscores for leading spaces:

def lineToCycles(line):
____toks = line.split(' ')
____def rot(first):
________return ' '.join(toks[first:] + toks[:first])
____return [rot(i) for i in range(len(toks))]

def kwic(lines):
____result = [];
____for line in lines:
________result += lineToCycles(line)
____result.sort()
____return result

# For instance,

print kwic(['blah foo bar', 'one two three'])

# prints ['bar blah foo', 'blah foo bar', 'foo bar blah', 'one two three', 'three one two', 'two three one']


Ben at 5:52 PM on February 20, 2006 | #34 | reply | quote

Oh right, Python has a join function built in already :-)

I decided it'd be convenient to avoid an extra space if combing recursively and wrote this in Scheme:

(define makestr
__(lambda (li)
____(cond ((null? li) "")
______((null? (cdr li)) (car li))
______((string-append (car li) " " (makestr (cdr li)))))))


Elliot at 6:23 PM on February 20, 2006 | #35 | reply | quote

You can get shorter...
def line_to_cycles(line):
____toks = line.split(' ')
____return [" ".join(toks[i:] + toks[:i])
________________for i in range(len(toks))]

def kwic(lines):
____return sorted(line_to_cycles(line) for line in lines)

-T.


Taro at 7:01 PM on February 20, 2006 | #36 | reply | quote

Want to discuss this? Join my forum.

(Due to multi-year, sustained harassment from David Deutsch and his fans, commenting here requires an account. Accounts are not publicly available. Discussion info.)