C# vs. Clojure vs. Ruby & Scala

Short preface: at a job interview, Zach Cox was asked to aggregate the words and word counts from a bunch of files into two output files, one sorted alphabetically and one sorted by word count. He did this in Ruby and Scala, which led Lau Bjørn Jensen to do the same thing in Clojure, which in turn apparently sparked other people to do it in Java, Python, etc.

Inspired by the aforementioned problem, and an extended train ride home (thank you, Danish National Railways!!), I decided to see what a C# (v. 3) version could look like:

namespace NewsReader
{
  using System;
  using System.IO;
  using System.Linq;
  using System.Text.RegularExpressions;
 
  class Program
  {
    static void Main()
    {
      const string dir = @"c:\temp\20_newsgroups";
      var stopwatch = System.Diagnostics.Stopwatch.StartNew();
      var regex = new Regex(@"\w+", RegexOptions.Compiled);
 
      var list = (from filename in Directory.GetFiles(dir, "*.*", SearchOption.AllDirectories)
                  from match in regex.Matches(File.ReadAllText(filename).ToLower()).Cast<Match>()
                  let word = match.Value
                  group word by word
                  into aggregate
                    select new
                             {
                               Word = aggregate.Key,
                               Count = aggregate.Count(),
                               Text = string.Format("{0}\t{1}", aggregate.Key, aggregate.Count())
                             })
        .ToList();
 
      File.WriteAllLines(@"c:\temp\words-ordered-by-count.txt", list.OrderBy(c => c.Count).Select(c => c.Text).ToArray());
      File.WriteAllLines(@"c:\temp\words-ordered-by-word.txt", list.OrderBy(c => c.Word).Select(c => c.Text).ToArray());
 
      Console.WriteLine("Elapsed: {0:0.0} seconds", stopwatch.Elapsed.TotalSeconds);
    }
  }
}

At 35 lines and a run time of 10.2 seconds (on my Intel Core 2 laptop with 4 GB RAM), this is, I think, a pretty clear and performant alternative to the other languages mentioned.
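For anyone who wants to see what the grouping produces without downloading the newsgroup files, here is a minimal sketch that runs the same kind of LINQ query over a couple of in-memory strings. The WordCountSketch namespace, the WordCounter helper and the sample sentences are made up for illustration and are not part of the program above; the sketch also returns a dictionary instead of writing files.

```csharp
namespace WordCountSketch
{
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Text.RegularExpressions;

  // Hypothetical helper for illustration; not part of the original program.
  public static class WordCounter
  {
    static readonly Regex WordRegex = new Regex(@"\w+", RegexOptions.Compiled);

    // Returns a map from lower-cased word to its number of occurrences
    // across all the given texts.
    public static Dictionary<string, int> Count(IEnumerable<string> texts)
    {
      return (from text in texts
              from match in WordRegex.Matches(text.ToLower()).Cast<Match>()
              group match.Value by match.Value
              into aggregate
              select aggregate)
        .ToDictionary(g => g.Key, g => g.Count());
    }
  }

  class Program
  {
    static void Main()
    {
      // Made-up sample input standing in for the contents of the files.
      var counts = WordCounter.Count(new[] { "The cat sat", "the dog saw the cat" });

      // Same tab-separated "word<TAB>count" layout as the output files.
      foreach (var pair in counts.OrderBy(p => p.Key))
        Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
    }
  }
}
```

With this sample input, "the" is counted three times and "cat" twice; sorting by p.Value instead of p.Key gives the by-count ordering.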

Posted Wednesday, January 6th, 2010 under c#, clojure, ruby, scala, snippet.

2 comments

  1. Beautiful!

  2. The LINQ code looks very interesting to a non-Microsofter…