Line Feeds

The three main computer operating systems in use today (unfortunately) diverged long ago in their treatment of line-endings. In most documents, the author has some control over how the information they have written is presented. Of major importance is the notion of a line-ending. When a written document is encoded by a computer, special characters signal where one line ends and the next begins. These invisible control characters are one specific area of computing that frustrates many computer users to this day.

Under DOS (Windows/PC) the end of a line of text is signalled using the ASCII code sequence CarriageReturn,LineFeed, alternately written as CR,LF or the bytes 0x0D,0x0A. On the classic Macintosh platform (Mac OS 9 and earlier), only the CR character is used. Under UNIX, the opposite is true and only the LF character is used. Advanced document encoding formats such as HTML, PDF and XML do not rely on such crude techniques to signal the end of one line of text and the beginning of the next, thus avoiding this entire class of problems (among others).
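To see which convention a piece of text uses, the three endings can be counted directly. The following is a minimal sketch rather than part of the conversion scripts; the name count_line_endings and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count each line-ending convention found in a string.
# Returns a hash reference with the keys 'dos', 'mac' and 'unix'.
sub count_line_endings {
    my ($text) = @_;
    my %count = ( dos => 0, mac => 0, unix => 0 );

    # The alternation tries CR,LF first so that the pair is not
    # counted as one lone CR followed by one lone LF.
    while ( $text =~ /(\r\n|\r|\n)/g ) {
        if    ( $1 eq "\r\n" ) { $count{dos}++ }
        elsif ( $1 eq "\r" )   { $count{mac}++ }
        else                   { $count{unix}++ }
    }
    return \%count;
}

# Illustrative sample containing all three conventions.
my $sample = "one\r\ntwo\rthree\nfour\r\n";
my $found  = count_line_endings($sample);
print "dos=$found->{dos} mac=$found->{mac} unix=$found->{unix}\n";
```

A file whose counts are spread across more than one key has mixed line-endings, which is usually the sign that an editor has imposed its own convention on part of it.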

Conversion programs

As luck would have it, a very powerful tool, Perl, exists and offers many, many ways of solving this simple problem. In keeping with the perl motto of "There's More Than One Way To Do It", I provide several samples of just how this could be done. All of these convert to/from the unix format of a single LF. These examples are written around the context of having a number of Java source code files that need to be converted due to the insanity of a particular tool you may have used to edit the files, which forced its notion of line-endings upon you. In perl the CarriageReturn character is represented by "\r" and LineFeed (aka NewLine) is represented as "\n".

One Command Line

The simplest perl script is this one:

perl -pi -e 's/\r\n/\n/;' *.java

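This does the reverse:

perl -pi -e 's/\n/\r\n/;' *.java

Run the reverse only on files that currently have unix line-endings: a line that already ends in CR,LF would come out ending in CR,CR,LF.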
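For experimenting without touching any files, the same substitutions can be applied to strings in memory. The sketch below is not taken from the scripts on this page; to_unix and to_dos are hypothetical helper names. It goes a little further than the one-liners in that it also treats a lone CR as a line-ending, and either function can be applied repeatedly without changing the result any further.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert CR,LF and lone CR line-endings to a single LF.
sub to_unix {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # a CR, optionally followed by LF, becomes LF
    return $text;
}

# Convert any line-ending to CR,LF. Normalising to LF first means an
# existing CR,LF is never turned into CR,CR,LF.
sub to_dos {
    my ($text) = @_;
    $text = to_unix($text);
    $text =~ s/\n/\r\n/g;
    return $text;
}

# Illustrative input with mixed endings.
my $mixed = "one\r\ntwo\rthree\n";
my $dos   = to_dos($mixed);
print "to_dos is safe to repeat\n" if to_dos($dos) eq $dos;
```

Because these work on a whole string rather than line by line, they need the /g flag, which the line-at-a-time one-liners can do without.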

Two Lines

If you wish to be a little more complicated, you can do the same in two lines of perl. This enables you to simply name the file(s) you wish to convert on the command line. It would be used like so: dos2unix-2line *.java

Here is what dos2unix-2line looks like:

#!/usr/bin/perl -pi
s/\r\n/\n/;
  

Here is what unix2dos-2line looks like:

#!/usr/bin/perl -pi
s/\n/\r\n/;
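Both short versions lean on two perl switches: -p wraps the code in a loop that reads each line, runs the substitution and prints the result, while -i makes that output replace the original file (with no extension given, no backup is kept). As a rough illustration of what the switches do, here is the loop written out by hand using perl's $^I variable. The name dos2unix_in_place is hypothetical, and this is a sketch of the idea rather than an exact expansion of what perl generates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip the CR from CR,LF line-endings in each
# named file, editing the files in place.
sub dos2unix_in_place {
    my (@paths) = @_;
    return unless @paths;    # with no files, <> would wait on standard input

    # local puts @ARGV and $^I back as they were when the sub returns.
    local @ARGV = @paths;    # the files that <> will read, one after another
    local $^I   = '';        # same as -i with no extension: keep no backup

    while ( my $line = <> ) {    # -p supplies a loop much like this one
        $line =~ s/\r\n/\n/;
        print $line;             # printed into the file being rewritten
    }
    return;
}

# Convert whatever files were named on the command line.
dos2unix_in_place(@ARGV);
```

Setting $^I to something like '.bak' instead of the empty string would keep a backup copy of each file, which is the same as writing -i.bak on the command line.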
  

Thirty Lines

To do it "right" takes much more complicated code. The tragic thing about this version is that it is only marginally more readable than the previous two versions yet contains fifteen times as many lines as the longer of the two. In bytes, it is over 20 times larger.

Nonetheless, here is the code for dos2unix:

#!/usr/bin/perl -w
#
# A script to convert a number of java source 
# files from dos line ending format to unix line format.
#
# WARNING: THIS SCRIPT DOES NOT CHECK FOR PRE-EXISTING 
#          FILES, USE WITH CAUTION.
#
# Usage: dos2unix <DIRECTORY>
#

$directory = shift @ARGV;
$directory = '.' unless $directory;
chdir( $directory ) || die "Unable to enter directory '$directory'.\n$!\n";

@files = <*.java>;
$| = 1;
$linesFixed = 0;
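foreach my $file ( @files ) {
    print "$file\t";
    # Keep the original as a backup, then read the backup and write a
    # fresh copy under the original name.
    rename( $file, "$file.bak") || die "Unable to rename $file\n$!\n";
    open(INPUT, "<", "$file.bak") || die "Unable to read $file.bak\n$!\n";
    open(OUTPUT, ">", $file) || die "Unable to write $file\n$!\n";
    while(<INPUT>) {
	if ( s/\r\n/\n/ ) {
	    $linesFixed++;
	}
	print OUTPUT;
    }
} continue {
    # Report the count for this file and reset it for the next one.
    print "($linesFixed)\n";
    $linesFixed = 0;
    close INPUT;
    close OUTPUT;
}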


  

Its matching partner, unix2dos, has only one line that differs:

	if ( s/\r\n/\n/ ) {
  

becomes

	if ( s/\n/\r\n/ ) {
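Since the two scripts differ by a single substitution, they could also be folded into one routine that takes the direction as an argument. The following is a hedged sketch of that idea, not the author's script: the name convert_file, the 'unix'/'dos' argument and the command-line usage are all assumptions made for illustration. It uses lexical filehandles, checks every open, and in the 'dos' direction skips lines that already end in CR,LF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert one file, keeping the original as "$file.bak".
# $direction is 'unix' (CR,LF becomes LF) or 'dos' (LF becomes CR,LF).
# Returns the number of lines changed.
sub convert_file {
    my ($file, $direction) = @_;
    my $fixed = 0;

    rename( $file, "$file.bak" ) or die "Unable to rename $file\n$!\n";
    open( my $in,  '<', "$file.bak" ) or die "Unable to read $file.bak\n$!\n";
    open( my $out, '>', $file )       or die "Unable to write $file\n$!\n";
    binmode $in;     # binmode stops perl translating line-endings itself
    binmode $out;    # on platforms (such as Windows) where it otherwise would

    while ( my $line = <$in> ) {
        my $changed = $direction eq 'dos'
            ? ( $line =~ s/(?<!\r)\n/\r\n/ )    # leave existing CR,LF alone
            : ( $line =~ s/\r\n/\n/ );
        $fixed++ if $changed;
        print {$out} $line;
    }
    close $in;
    close $out or die "Unable to finish writing $file\n$!\n";
    return $fixed;
}

# Assumed usage: perl convert.pl unix|dos FILE...
if ( @ARGV ) {
    my $direction = shift @ARGV;
    die "Direction must be 'unix' or 'dos'\n"
        unless $direction =~ /^(?:unix|dos)$/;
    printf "%s\t(%d)\n", $_, convert_file( $_, $direction ) for @ARGV;
}
```

Like the original, this overwrites any pre-existing .bak file without checking, so the same warning applies.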
  

More information

More information about line-feeds and character sets can be found on the internet.

The code on this page is all public domain and may be used, abused and modified in any way the reader sees fit. As we live in such a litigious society, the author must disclaim any liability for your use of this educational material.

Comments welcome.

  • Thanks to Andrey Salaev who kindly pointed out the CR/LF being backwards in the introduction.
  • Thanks to Richard Copley who provided significant edits to the introductory paragraph.


    Last updated: 2004-08-23