
How to replace text in a file with huge lines


This is my problem: I have a huge XML file (150MB) in which I want to rename some of the nodes. Consider the example:

<root><prefix_name1>1</prefix_name1><prefix_name2>text</prefix_name2></root>

The creators of this document were not aware of namespaces, so they used different prefixes in the element names instead. My goal is to rename every prefixed tag by removing the "prefix_" part of its name. The solution looked simple, but it isn’t.

To begin with, I tried sed with the expression "s/prefix_//g". This didn’t work, because sed on AIX only accepts up to 4096 bytes per line (I read somewhere that some sed versions don’t have this limit, but I couldn’t spend the whole day trying versions of sed). Using perl also resulted in out-of-memory errors (I have only very basic perl knowledge, so there may be a way of handling such big lines).

So I had to come up with a set of steps to make this change and keep the file in its original format (all contents on one line). The steps were:

  • Used the tr utility to translate every ">" character into a newline, printing a ">" back at the end of each resulting line: tr '>' '\012' < "$1" | while IFS= read -r line; do printf '%s>\n' "$line"; done > "$1.new" (IFS= and read -r keep leading whitespace and backslashes intact).
  • Renamed the tags by removing the prefix, using either sed or perl with the replacement pattern "s/prefix_//g".
  • Deleted all the "\n" (newline) characters from the resulting file to get back to the one-line format. I used perl to read from one file and write to another, applying "s/\n//g" to the content.
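The three steps can also be chained into a single pipeline. What follows is only a minimal sketch, not the exact commands I ran: the function name strip_prefix is made up for illustration, and it assumes the file is one line of XML that ends with ">" (optionally followed by a single trailing newline) and that no piece between two ">" characters exceeds the sed line limit. Instead of re-appending ">" in a shell loop, it turns each ">" into a newline and back again.

```shell
#!/bin/sh
# strip_prefix INPUT OUTPUT  (hypothetical helper name)
# Assumes INPUT is a single line of XML whose last non-newline character is '>'.
strip_prefix() {
    tr -d '\n' < "$1" |        # drop the trailing newline so it is not turned into '>' later
        tr '>' '\n' |          # step 1: split into short lines, one per '>'
        sed 's/prefix_//g' |   # step 2: remove the prefix on every line
        tr '\n' '>' > "$2"     # step 3: join the lines back, restoring each '>'
}

# Usage: strip_prefix huge.xml huge.new.xml
```

Note that the output has no trailing newline, and that, like the original pattern, this also removes "prefix_" wherever it appears in text content, not only in tag names.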

I’m sure that this is not the best solution and that there must be some utility out there that I can use, so if you have any ideas, just leave a comment.


Programming Perl


I have finally decided to learn Perl. I like to be constantly learning new programming languages, and I’ve been putting off my introduction to Perl for a long, long time.
I want to develop a TWiki plugin to track the progress of our product testing and, since TWiki is written in Perl, this will be my opportunity to learn and practice the language.
Besides using some ideas and code from TWiki’s Project Planner Plugin, I also bought a Perl book, for those moments when I want to really understand things instead of just copying and pasting from the web.
I was undecided between the books "Learning Perl" and "Programming Perl", and my choice was the second one. Since somebody at work already has the first one, we can now share our books when needed (we are both Perl newbies :-)).
The book came from Barnes & Noble, with a 20% company discount.
