The KWIC index system accepts an ordered set of lines, each line is an ordered set of words, and each word is an ordered set of characters. Any line may be "circularly shifted" by repeatedly removing the first word and appending it at the end of the line. The KWIC index system outputs a listing of all circular shifts of all lines in alphabetical order. This is a small system. Except under extreme circumstances (huge data base, no supporting software), such a system could be produced by a good programmer within a week or two.

Here's my code:
#!/usr/bin/env python

def main():
    f = open("kwic.txt", "rU")
    out = open("kwic-output.txt", "w")

    final = []
    for line in f:
        words = line.split()
        count = len(words)
        for i in xrange(count):
            final.append(makestr(words))
            cycle(words)

    final.sort()
    for ele in final:
        out.write(ele + "\n")

def makestr(li):
    s = ""
    first = 1
    for ele in li:
        if first == 1:
            first = 0
            s += ele
        else:
            s += " " + ele
    return s

def cycle(li):
    tmp = li[0]
    del li[0]
    li.append(tmp)
    return li

if __name__ == '__main__': main()

By the way, if someone knows a more elegant way to avoid having an extra space in makestr, let me know. I'm aware of the option of deleting the first character after making the string, but I don't consider that very nice either.
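
On the makestr question: one common approach is str.join, which puts the separator only between elements, so there is no leading or trailing space to remove. Below is a minimal sketch of that idea, combined with list slicing to build each circular shift without mutating the word list. It is written for Python 3 (range rather than xrange), and the names circular_shifts and kwic are made up for this illustration; they are not part of the code above. File handling is left out so the core logic can be tried on its own.

```python
def circular_shifts(words):
    """Return every circular shift of a word list, each joined by single spaces."""
    # words[i:] + words[:i] moves the first i words to the end.
    # " ".join inserts a space only between words, so no extra space appears.
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def kwic(lines):
    """Return all circular shifts of all lines as one sorted list of strings."""
    shifts = []
    for line in lines:
        # split() with no argument drops surrounding whitespace and newlines.
        shifts.extend(circular_shifts(line.split()))
    # Plain string sort, the same ordering list.sort() gives (case-sensitive).
    return sorted(shifts)


if __name__ == "__main__":
    for entry in kwic(["the quick fox", "a dog"]):
        print(entry)
```

With this, makestr(words) reduces to " ".join(words). Note that the sort is case-sensitive, as in the original, so uppercase words sort before lowercase ones; passing key=str.lower to sorted would be one way to change that if a case-insensitive index is wanted.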