How to replace text in a file with huge lines

This is my problem: I have a huge XML file (150MB) in which I want to rename some of the node names. Consider the example:

<root><prefix_name1>1</prefix_name1><prefix_name2>text</prefix_name2></root>

The creators of this document were not aware of namespaces, so they decided to use different prefixes in the element names of the documents they created. My goal is to rename all the prefixed tags by removing the "prefix_" part of the name. The solution looked simple, but it isn’t.

To begin with, I tried sed with the argument "s/prefix_//g". This didn’t work, because sed on AIX only accepts up to 4096 bytes per line (I read somewhere that some sed versions don’t have this limit, but I couldn’t spend the whole day trying versions of sed). Using perl also resulted in out-of-memory errors (I have only very basic perl knowledge, so there may be a way of handling such big lines).

So, I had to come up with a set of steps to make this change and keep the file in its original format (all contents in one line). The steps were:

  • Use the tr utility to turn every ">" into a newline, then put the ">" back at the end of each resulting line: tr '>' '\012' < "$1" | while IFS= read -r line; do echo "${line}>" >> "$1.new"; done
  • Rename the tags by removing the prefix, using either sed or perl with the replacement pattern "s/prefix_//g". With the file now split into short lines, sed no longer hits its line-length limit.
  • Delete all the "\n" (newline characters) from the resulting file to get back to the one-line format. I used perl to read from one file and write to another, applying "s/\n//g" to the content.
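
The three steps above can also be chained into a single pipeline. What follows is a minimal sketch, not the exact commands used in this post: strip_prefix is a made-up helper name, and instead of the while/read loop it swaps ">" and newline with tr, so that a second swap restores the original layout. It assumes the file ends right after its last ">" (a true one-liner); with a trailing newline the result depends on how your sed treats a final line that has no newline.

```shell
#!/bin/sh
# Sketch only: strip_prefix is a hypothetical helper name, and "prefix_"
# is the literal prefix from the example document.
strip_prefix() {
    # $1 = input file, $2 = output file
    # Step 1: swap ">" and newline, so each tag ends its own short line
    #         (a newline already present in the file is parked as ">").
    # Step 2: remove the prefix, one short line at a time.
    # Step 3: swap the two characters back to get the one-line file again.
    tr '>\n' '\n>' < "$1" \
        | sed 's/prefix_//g' \
        | tr '>\n' '\n>' > "$2"
}
```

It could be called as strip_prefix huge.xml huge.xml.new, where both file names are placeholders. Because nothing is appended with echo, whitespace and backslashes inside the text content are passed through untouched.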

I’m sure that this is not the best solution, and there must be some utility out there that I can use, so if you have any idea, just leave a comment.


