all.txt was designed to be easy to process. The general format is:

----------Diary: $diary_poster ----------
----------url: $diary_url ----------
----------topic: $diary_topic ----------
----------date: $diary_date ----------
$diary_body
----------end Diary: $diary_poster ----------
----------comment: $first_comment_poster ----------
----------url: $first_comment_url ----------
----------topic: $first_comment_topic ----------
----------date: $first_comment_date ----------
$first_comment_body
----------end comment: $first_comment_poster ----------
----------comment: $second_comment_poster ----------
----------url: $second_comment_url ----------
----------topic: $second_comment_topic ----------
----------date: $second_comment_date ----------
$second_comment_body
----------end comment: $second_comment_poster ----------
[snip]
----------Diary: $second_diary_poster ----------
----------url: $second_diary_url ----------
----------topic: $second_diary_topic ----------
----------date: $second_diary_date ----------
$second_diary_body
----------end Diary: $second_diary_poster ----------
etc.
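
Because every record opens with the same four marker lines, the headers can be flattened into one line per record. The following is only a sketch: it assumes each marker sits on its own line and that the markers always appear in the order poster, url, topic, date. The sample file, its posters and its example.org URLs are made up for illustration, and summarize is a hypothetical helper name.

```shell
#!/bin/sh
# Build a tiny sample in the all.txt format (made-up posters and URLs).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
----------Diary: alice ----------
----------url: http://example.org/story/1 ----------
----------topic: First diary ----------
----------date: Mon Jan 24, 2011 at 04:11:16 PM EST ----------
Diary body text.
----------end Diary: alice ----------
----------comment: bob ----------
----------url: http://example.org/comments/1/1#1 ----------
----------topic: First comment ----------
----------date: Mon Jan 24, 2011 at 07:15:21 PM EST ----------
Comment body text.
----------end comment: bob ----------
EOF

# Print one tab-separated line per record: kind, poster, url, topic.
summarize() {
    awk '
        # Remove the leading "----------key: " and the trailing " ----------".
        function strip(line) {
            sub(/^----------[A-Za-z]+: /, "", line)
            sub(/ *----------$/, "", line)
            return line
        }
        /^----------(Diary|comment): / {
            kind = ($0 ~ /^----------Diary/) ? "diary" : "comment"
            poster = strip($0)
        }
        /^----------url: /   { url = strip($0) }
        /^----------topic: / { topic = strip($0) }
        # The date marker is assumed to be the last header line of a record.
        /^----------date: /  { print kind "\t" poster "\t" url "\t" topic }
    ' "$1"
}

summarize "$sample"
```

Running summarize against the real all.txt instead of the sample should give one line per diary or comment, which is convenient input for cut, sort and similar tools.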

Some examples:

Show the meta information:
$ grep "^----------" all.txt | head
----------Diary: Vampire Zombie Abu Musab al Zarqawi ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/161116/052 ----------
----------topic: It's actually been fairly warm lately ----------
----------date: Mon Jan 24, 2011 at 04:11:16 PM EST ----------
----------end Diary: Vampire Zombie Abu Musab al Zarqawi ----------
----------comment: king of fools ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/1#1 ----------
----------topic: it's been ridiculously cold here - a few days ago ----------
----------date: Mon Jan 24, 2011 at 07:15:21 PM EST ----------
----------end comment: king of fools ----------

Show the diary posters:
$ grep "^----------Diary" all.txt | head
----------Diary: Vampire Zombie Abu Musab al Zarqawi ----------
----------Diary: modus ----------
----------Diary: modus ----------
----------Diary: Nimey ----------
----------Diary: hugin ----------
----------Diary: United Fools ----------
----------Diary: GreyGhost ----------
----------Diary: hugin ----------
----------Diary: jxg ----------
----------Diary: donnalee ----------

Show the comment posters:
$ grep "^----------comment" all.txt | head
----------comment: king of fools ----------
----------comment: Brogdel ----------
----------comment: Ruston Rustov ----------
----------comment: cockskin horsesuit ----------
----------comment: Nimey ----------
----------comment: hugin ----------
----------comment: osm ----------
----------comment: Ezra Loomis Pound ----------
----------comment: Blarney ----------
----------comment: modus ----------

Count the number of diaries:
$ grep -c "^----------Diary" all.txt
6000

Count the number of comments:
$ grep -c "^----------comment" all.txt
65300

Count the number of diaries by modus:
$ grep -c "^----------Diary: modus" all.txt
26

Count the number of comments by Ruston Rustov:
$ grep -c "^----------comment: Ruston Rustov ----------" all.txt
9
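
The per-poster counts above can be generalized into a ranking of all posters. This is a sketch under the same assumption that each marker occupies its own line. The sample data is made up and top_posters is a hypothetical helper name.

```shell
#!/bin/sh
# Small made-up sample: one diary by alice, two comments by bob, one by carol.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
----------Diary: alice ----------
----------url: http://example.org/story/1 ----------
----------topic: First diary ----------
----------date: Mon Jan 24, 2011 at 04:11:16 PM EST ----------
Diary body text.
----------end Diary: alice ----------
----------comment: bob ----------
----------url: http://example.org/comments/1/1#1 ----------
----------topic: First comment ----------
----------date: Mon Jan 24, 2011 at 07:15:21 PM EST ----------
Comment body text.
----------end comment: bob ----------
----------comment: carol ----------
----------url: http://example.org/comments/1/2#2 ----------
----------topic: Second comment ----------
----------date: Mon Jan 24, 2011 at 07:20:00 PM EST ----------
Another comment body.
----------end comment: carol ----------
----------comment: bob ----------
----------url: http://example.org/comments/1/3#3 ----------
----------topic: Third comment ----------
----------date: Mon Jan 24, 2011 at 07:30:00 PM EST ----------
Yet another comment body.
----------end comment: bob ----------
EOF

# List posters by number of records, most active first.
# $1 = file, $2 = record kind: "Diary" or "comment".
top_posters() {
    grep "^----------$2: " "$1" |
        sed -e "s/^----------$2: //" -e 's/ *----------$//' |
        sort | uniq -c | sort -rn
}

top_posters "$sample" comment
```

Against the real file, "top_posters all.txt Diary | head" would show the most prolific diarists. Note that the exact column padding of uniq -c differs between implementations.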

Show all urls (diaries and comments):
$ grep "^----------url:" all.txt | head
----------url: http://www.kuro5hin.org/story/2011/1/24/161116/052 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/1#1 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/2#2 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/3#3 ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/16276/4552 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/1#1 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/2#2 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/3#3 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/4#4 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/11#11 ----------

Show only diary urls:
$ grep "^----------url:" all.txt | grep -v "#" | head
----------url: http://www.kuro5hin.org/story/2011/1/24/161116/052 ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/16276/4552 ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/211032/954 ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/212548/441 ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/221724/773 ----------
----------url: http://www.kuro5hin.org/story/2011/1/25/13283/9516 ----------
----------url: http://www.kuro5hin.org/story/2011/1/25/193718/873 ----------
----------url: http://www.kuro5hin.org/story/2011/1/25/19819/2990 ----------
----------url: http://www.kuro5hin.org/story/2011/1/25/201931/094 ----------
----------url: http://www.kuro5hin.org/story/2011/1/25/211656/574 ----------

Show only comment urls:
$ grep "^----------url:" all.txt | grep "#" | head
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/1#1 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/2#2 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/3#3 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/1#1 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/2#2 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/3#3 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/4#4 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/11#11 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/5#5 ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/8#8 ----------

Show diary and comment topics:
$ grep "^----------topic:" all.txt | head
----------topic: It's actually been fairly warm lately ----------
----------topic: it's been ridiculously cold here - a few days ago ----------
----------topic: We've had a bad winter already. ----------
----------topic: oh no! an inch! ----------
----------topic: Vintage Crawford #4 ----------
----------topic: BAHAHA!!! ----------
----------topic: Kill yourself ----------
----------topic: crawford eagerly awaits JDK 1.4 ----------
----------topic: i would expect more ----------
----------topic: about 2000 words more, in fact .nt ----------

Show diary topics only:
$ grep -A2 "^----------Diary" all.txt | grep "^----------topic" | head
----------topic: It's actually been fairly warm lately ----------
----------topic: Vintage Crawford #4 ----------
----------topic: Vintage Crawford #5 ----------
----------topic: ATTN: channel ----------
----------topic: assholes charging $20 for a 30mg roxy ----------
----------topic: No time to fool around! ----------
----------topic: GOP: The party of small government ----------
----------topic: Praise jesus ----------
----------topic: Best programming language for noobs? ----------
----------topic: Boehner looks like he's gonna cry already ----------

Show comment topics only:
$ grep -A2 "^----------comment" all.txt | grep "^----------topic" | head
----------topic: it's been ridiculously cold here - a few days ago ----------
----------topic: We've had a bad winter already. ----------
----------topic: oh no! an inch! ----------
----------topic: BAHAHA!!! ----------
----------topic: Kill yourself ----------
----------topic: crawford eagerly awaits JDK 1.4 ----------
----------topic: i would expect more ----------
----------topic: about 2000 words more, in fact .nt ----------
----------topic: honestly, this is unfair ----------
----------topic: It's a study in contrasts. Consider it art. $ ----------

Show posting times:
$ grep "^----------date:" all.txt | head
----------date: Mon Jan 24, 2011 at 04:11:16 PM EST ----------
----------date: Mon Jan 24, 2011 at 07:15:21 PM EST ----------
----------date: Tue Jan 25, 2011 at 01:39:08 AM EST ----------
----------date: Tue Jan 25, 2011 at 10:29:25 AM EST ----------
----------date: Mon Jan 24, 2011 at 04:27:06 PM EST ----------
----------date: Mon Jan 24, 2011 at 04:46:45 PM EST ----------
----------date: Mon Jan 24, 2011 at 04:47:11 PM EST ----------
----------date: Mon Jan 24, 2011 at 05:07:41 PM EST ----------
----------date: Mon Jan 24, 2011 at 05:22:18 PM EST ----------
----------date: Tue Jan 25, 2011 at 06:09:15 AM EST ----------

Show diary posting times only:
$ grep -A3 "^----------Diary" all.txt | grep "^----------date:" | head
----------date: Mon Jan 24, 2011 at 04:11:16 PM EST ----------
----------date: Mon Jan 24, 2011 at 04:27:06 PM EST ----------
----------date: Mon Jan 24, 2011 at 09:10:32 PM EST ----------
----------date: Mon Jan 24, 2011 at 09:25:48 PM EST ----------
----------date: Mon Jan 24, 2011 at 10:17:24 PM EST ----------
----------date: Tue Jan 25, 2011 at 01:28:03 PM EST ----------
----------date: Tue Jan 25, 2011 at 07:37:18 PM EST ----------
----------date: Tue Jan 25, 2011 at 07:08:19 PM EST ----------
----------date: Tue Jan 25, 2011 at 08:19:31 PM EST ----------
----------date: Tue Jan 25, 2011 at 09:16:56 PM EST ----------

Show comment posting times only:
$ grep -A3 "^----------comment" all.txt | grep "^----------date:" | head
----------date: Mon Jan 24, 2011 at 07:15:21 PM EST ----------
----------date: Tue Jan 25, 2011 at 01:39:08 AM EST ----------
----------date: Tue Jan 25, 2011 at 10:29:25 AM EST ----------
----------date: Mon Jan 24, 2011 at 04:46:45 PM EST ----------
----------date: Mon Jan 24, 2011 at 04:47:11 PM EST ----------
----------date: Mon Jan 24, 2011 at 05:07:41 PM EST ----------
----------date: Mon Jan 24, 2011 at 05:22:18 PM EST ----------
----------date: Tue Jan 25, 2011 at 06:09:15 AM EST ----------
----------date: Mon Jan 24, 2011 at 06:21:44 PM EST ----------
----------date: Mon Jan 24, 2011 at 07:59:22 PM EST ----------

Show diary meta-data:
$ grep -A3 "^----------Diary: " all.txt | head -20
----------Diary: Vampire Zombie Abu Musab al Zarqawi ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/161116/052 ----------
----------topic: It's actually been fairly warm lately ----------
----------date: Mon Jan 24, 2011 at 04:11:16 PM EST ----------
--
----------Diary: modus ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/16276/4552 ----------
----------topic: Vintage Crawford #4 ----------
----------date: Mon Jan 24, 2011 at 04:27:06 PM EST ----------
--
----------Diary: modus ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/211032/954 ----------
----------topic: Vintage Crawford #5 ----------
----------date: Mon Jan 24, 2011 at 09:10:32 PM EST ----------
--
----------Diary: Nimey ----------
----------url: http://www.kuro5hin.org/story/2011/1/24/212548/441 ----------
----------topic: ATTN: channel ----------
----------date: Mon Jan 24, 2011 at 09:25:48 PM EST ----------
--

Show comment meta-data:
$ grep -A3 "^----------comment: " all.txt | head -20
----------comment: king of fools ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/1#1 ----------
----------topic: it's been ridiculously cold here - a few days ago ----------
----------date: Mon Jan 24, 2011 at 07:15:21 PM EST ----------
--
----------comment: Brogdel ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/2#2 ----------
----------topic: We've had a bad winter already. ----------
----------date: Tue Jan 25, 2011 at 01:39:08 AM EST ----------
--
----------comment: Ruston Rustov ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/161116/052/3#3 ----------
----------topic: oh no! an inch! ----------
----------date: Tue Jan 25, 2011 at 10:29:25 AM EST ----------
--
----------comment: cockskin horsesuit ----------
----------url: http://www.kuro5hin.org/comments/2011/1/24/16276/4552/1#1 ----------
----------topic: BAHAHA!!! ----------
----------date: Mon Jan 24, 2011 at 04:46:45 PM EST ----------
--

And so on.
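
The examples so far only read the marker lines. The body of a single diary or comment can probably be pulled out as well, using its url as the key and the "end" marker as the stop. This is a rough sketch: it assumes urls are unique and that no body line itself starts with a topic or date marker. The sample data is made up and body_of is a hypothetical helper name.

```shell
#!/bin/sh
# Tiny made-up sample in the all.txt format.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
----------Diary: alice ----------
----------url: http://example.org/story/1 ----------
----------topic: First diary ----------
----------date: Mon Jan 24, 2011 at 04:11:16 PM EST ----------
Diary body, line one.
Diary body, line two.
----------end Diary: alice ----------
----------comment: bob ----------
----------url: http://example.org/comments/1/1#1 ----------
----------topic: First comment ----------
----------date: Mon Jan 24, 2011 at 07:15:21 PM EST ----------
Comment body text.
----------end comment: bob ----------
EOF

# Print the body of the record whose url equals $2.
# $1 = file, $2 = url exactly as it appears between the markers.
body_of() {
    awk -v want="$2" '
        # A url marker turns printing on only for the requested record.
        /^----------url: / {
            u = $0
            sub(/^----------url: /, "", u)
            sub(/ *----------$/, "", u)
            found = (u == want)
            next
        }
        # Any end marker closes the current record.
        /^----------end (Diary|comment): / { found = 0; next }
        # Skip the remaining header lines of the record.
        /^----------(topic|date): / { next }
        found { print }
    ' "$1"
}

body_of "$sample" "http://example.org/comments/1/1#1"
```

With the real file, the second argument would be one of the urls listed by the grep examples, for instance a story url to read a whole diary entry.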