Linux – Counting code lines with the Unix find command

Here we use the Unix find command in conjunction with cat and wc to get a picture of the number of code lines in a directory hierarchy.

In “Finding distinct file types with the Unix find command” we used find to generate an overview of the types and counts of files lurking in an unfamiliar codebase. While this gives us a good picture of the languages and tools involved in maintaining it, there’s an important sense of scale missing from that information – there’s a world of difference between a hundred 50-line PHP files and a hundred 5000-line PHP files, after all.

The wc command’s -l option gives us a count of the lines in a particular file or in its standard input. So we could just cat all the files of the types we’re interested in and pipe the result to wc -l to get the line count for an individual directory:-

cat *.php *.js *.html  | wc -l
cat: *.js: No such file or directory
   356

Here we’re looking for PHP, HTML or JS files and we’ve found 356 lines. The particular directory we’re looking in doesn’t contain any JS files, so the shell hands the unmatched *.js pattern to cat as a literal file name, hence the rather distracting error message. We can clean that up by redirecting the errors to /dev/null:-

cat *.php *.js *.html 2>/dev/null | wc -l
   356
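As a rough, self-contained illustration of the same idea, the sketch below builds a throwaway directory and counts its lines. The file names and contents are invented for the demonstration, and it assumes a bash-like shell with default globbing behaviour.

```shell
# Build a scratch directory with a PHP and an HTML file but no JS files.
demo=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$demo/index.php"
printf 'one\ntwo\n' > "$demo/page.html"

# Count the lines. The unmatched *.js pattern reaches cat as a literal name,
# so the resulting complaint is discarded with 2>/dev/null.
single_dir_count=$(cd "$demo" && cat *.php *.js *.html 2>/dev/null | wc -l)

# Unquoted echo trims the padding some versions of wc put before the number.
echo $single_dir_count   # prints 5

rm -rf "$demo"
```

Note that 2>/dev/null hides every error from cat, including ones about unreadable files, so it is worth running the command once without it.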

This is fine if we’re interested in a single directory and not its subdirectories. If we want everything under the current directory then we need to use find first:-

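find . -type f \( -name "*.php" -or -name "*.html" \
         -or -name "*.js" \) | xargs cat | wc -l
  1241

Here we’re using multiple -name rules combined with -or to match each file type we’re interested in, and sending all the file names to xargs cat to have them all spooled to wc -l. The -name rules are grouped in escaped parentheses, \( and \), because find’s implicit “and” binds more tightly than -or; without the grouping, -type f would apply only to the *.php rule. Our command now picks up the extra code lines in the subdirectories.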
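The sketch below tries the recursive count on a throwaway tree. The tree layout is invented for the demonstration; it includes a directory whose name contains a space and another whose name ends in .js, two cases that can trip up a plain find | xargs pipeline. It assumes a find and xargs that support -print0 and -0, as the GNU and BSD versions do.

```shell
# Scratch tree: two real source files plus a directory whose name ends in .js.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir" "$demo/vendor.js"
printf 'a\nb\n' > "$demo/index.php"
printf 'c\nd\ne\n' > "$demo/sub dir/app.js"
printf 'not code\n' > "$demo/vendor.js/README"

# \( ... \) groups the -name tests so that -type f applies to all of them;
# -print0 and xargs -0 keep names containing spaces in one piece.
tree_count=$(find "$demo" -type f \
    \( -name "*.php" -or -name "*.html" -or -name "*.js" \) -print0 |
    xargs -0 cat | wc -l)

echo $tree_count   # prints 5

rm -rf "$demo"
```

Only index.php (2 lines) and app.js (3 lines) are counted: the vendor.js directory is excluded by -type f, and its README does not match any of the name patterns.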

To get an overview of a large codebase or website it would be good to run this command in each top-level directory to capture the count of code lines in each of them and all their subdirectories. We can use find’s -depth option to get a list of all those top-level directories to work with:-

find . -type d -depth 1
./connection
./control
./images
./portal
./public
./resources
./shared

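Here we’re getting a list of all directories (-type d) at the top level (-depth 1). Note that giving -depth a number is a feature of BSD find, as shipped with macOS; GNU find, as found on most Linux systems, does not accept it, so there you would substitute -mindepth 1 -maxdepth 1 wherever -depth 1 appears. Now all we need to do is send that list to our earlier command.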
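For reference, here is a small sketch of the -mindepth 1 -maxdepth 1 form, which both GNU and BSD find accept. The directory names are made up for the demonstration, and sort is added only to give a predictable order.

```shell
# Scratch tree with three top-level directories and some nested ones.
demo=$(mktemp -d)
mkdir -p "$demo/control/admin" "$demo/portal" "$demo/shared/lib/util"

# -mindepth 1 skips the starting directory itself; -maxdepth 1 stops find
# from descending any further, so the nested directories are not listed.
top_dirs=$(cd "$demo" && find . -mindepth 1 -maxdepth 1 -type d | sort)

echo "$top_dirs"

rm -rf "$demo"
```

This lists ./control, ./portal and ./shared, one per line, and leaves out the nested admin, lib and util directories.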

This is where it gets a little tricky. We can’t hand our pipeline to find’s -exec option directly, as -exec runs a single command without involving a shell and so can’t cope with the | used to chain multiple commands. We could get round this by creating a little script containing our line count command, but this would be something of a kludge. Nor can we send the output directly to something like xargs, as we need to supply the directory name as a parameter to our find command.

The trick here is to use find’s -exec option, but use it to run a shell which will indirectly run our command:-

find . -type d -depth 1 -exec bash -c 'echo $0' {} \;
./connection
./control
./images
./portal
./public
./resources
./shared

The output might look the same, but we’re generating it in an importantly different way. For every match find locates, we’re executing a shell command via bash -c, which in this case simply echoes the directory name. We pass that name in by placing {} after the command string; bash -c assigns the first argument following the command string to $0, which is why the command echoes $0 rather than $1.
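How bash -c numbers its arguments is easy to check in isolation. The sketch below first shows which argument lands in $0 and which in $1, then shows a commonly used variant that fills $0 with a throwaway name (here _) so that the directory arrives as $1. The directory names are invented, and -mindepth 1 -maxdepth 1 is used in place of -depth 1 so that the example runs under both GNU and BSD find.

```shell
# The first argument after the command string becomes $0, the next one $1.
args_seen=$(bash -c 'echo "$0 then $1"' first second)
echo "$args_seen"   # prints: first then second

# Variant: a placeholder fills $0 and the matched directory arrives as $1.
demo=$(mktemp -d)
mkdir "$demo/control" "$demo/portal"
dirs_seen=$(cd "$demo" && find . -mindepth 1 -maxdepth 1 -type d \
    -exec bash -c 'echo "$1"' _ {} \; | sort)
echo "$dirs_seen"

rm -rf "$demo"
```

Either form works. The placeholder variant has the small advantage that $0 keeps its usual role as the name shown in the shell’s error messages.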

Since we’re running a shell command for each match, we can now combine it with the line counting command we created earlier:-

find . -type d -depth 1 -exec bash -c \
    'echo "$0" $(find "$0" -type f \( -name "*.php" -or -name "*.html" \
       -or -name "*.js" \) | xargs cat | wc -l)' {} \;
./connection 254
./control 5860
./images 0
./portal 68385
./public 212
./resources 97032
./shared 381
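For comparison, the same report can be produced without nesting one find inside another. The sketch below swaps in a different technique, a shell function plus a for loop over the top-level directories, in place of find -exec bash -c. The names count_lines and report_tree are hypothetical helpers, and the scratch tree is invented for the demonstration. One difference to be aware of: the ./*/ glob skips hidden directories, which find would include.

```shell
# count_lines (hypothetical helper): print the total number of lines in the
# PHP, HTML and JS files beneath the directory given as $1.
count_lines() {
    find "$1" -type f \( -name "*.php" -or -name "*.html" -or -name "*.js" \) \
        -exec cat {} + | wc -l
}

# report_tree (hypothetical helper): print each top-level directory under $1
# alongside its line count.
report_tree() {
    (
        cd "$1" || exit 1
        for dir in ./*/; do
            dir=${dir%/}                        # drop the trailing slash
            echo "$dir" $(count_lines "$dir")   # unquoted to trim wc padding
        done
    )
}

# Scratch tree standing in for a real codebase.
demo=$(mktemp -d)
mkdir -p "$demo/control/admin" "$demo/images"
printf '1\n2\n3\n' > "$demo/control/index.php"
printf '1\n2\n' > "$demo/control/admin/menu.js"
printf 'placeholder\n' > "$demo/images/logo.png"

report=$(report_tree "$demo")
echo "$report"

rm -rf "$demo"
```

On this scratch tree the report reads ./control 5 followed by ./images 0. Using -exec cat {} + instead of xargs cat means cat is never started when a directory contains no matching files, so the count is simply 0.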
