
Deleting many lines of text between specified characters?


terrypin


I'm hoping one of the experts can help please. I have a text file that looks like this:

 

--- Start paste ---

 

[blackfordLane.jpg]

File name = BlackfordLane.jpg

Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\

Compression = JPEG, quality: 87, subsampling OFF

Resolution = 96 x 96 DPI

File date/time = 19/01/2012 / 15:01:23

 

- IPTC -

Object Name - s bridge over the River Thames is not a footbridge but carries pipes.

 

- COMMENT -

Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.

 

[Castle Eaton Church.jpg]

File name = Castle Eaton Church.jpg

Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\

Compression = JPEG, quality: 87, subsampling OFF

Resolution = 72 x 72 DPI

File date/time = 19/01/2012 / 14:03:55

 

- EXIF -

Make - FUJIFILM

Model - FinePix2600Zoom

Orientation - Top left

XResolution - 72

YResolution - 72

ResolutionUnit - Inch

 

- COMMENT -

Castle Eaton Church

 

[CastleEaton-2.jpg]

File name = CastleEaton-2.jpg

Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\

Compression = JPEG, quality: 75

Resolution = 0 x 0 DPI

File date/time = 18/01/2012 / 15:40:05

 

- COMMENT -

The Red Lion, Castle Eaton

A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.

 

And this is what I want to get as a result:

 

BlackfordLane.jpg

Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.

 

Castle Eaton Church.jpg

Castle Eaton Church

 

CastleEaton-2.jpg

The Red Lion, Castle Eaton

A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.

 

My first line of attack was to try for a regex that would find everything (for example) between the ']' of '[blackfordLane.jpg]' and the '-' of '- COMMENT -'. That would leave only a little tidying up, I think.

 

But after a couple of hours it has still eluded me. The best I could come up with was the following, which deletes all lines from 'File name' to 'File date/time' (with the Replace box empty):

 

File name = .*\nDirectory = .*\nCompression = .*\nResolution = .*\nImage dimensions = .*\nPrint size = .*\nColor depth = .*\nNumber of unique colors = .*\nDisk size = .*\nCurrent memory size = .*\nFile date/time = .*\n

 

But that's only part of the task and seems very inelegant.
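For comparison, here is a minimal sketch in PHP/PCRE (not TextPad's POSIX flavor, which may not support it) that deletes everything from the File name line through the File date/time line with one pattern, whatever fields happen to lie in between. The sample record is shortened and hypothetical.

```php
<?php
// Hypothetical sample: one shortened record with LF line endings.
$text = "[BlackfordLane.jpg]\n"
      . "File name = BlackfordLane.jpg\n"
      . "Compression = JPEG, quality: 87, subsampling OFF\n"
      . "File date/time = 19/01/2012 / 15:01:23\n"
      . "\n"
      . "- COMMENT -\n"
      . "Thames Path on Blackford Lane.\n";

// m: '^' matches at the start of every line.
// s: the lazy '.*?' may cross line breaks to reach the File date/time line.
// [^\r\n]* then stops at the end of that line, and \R eats its line break.
$pattern = '/^File name = .*?^File date\/time = [^\r\n]*\R/ms';

$result = preg_replace($pattern, '', $text);
echo $result;
```

The `[^\r\n]*` near the end matters: with the s flag on, a plain `.*` there would run on to the last line break in the whole text.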

 

Any suggestions please?

 

--

Terry, East Grinstead, UK


Hi, if this were in a database it would be so much easier ;)

 

Anyway, this is untested, but I think your best approach is to preg_match_all both items and then pair up the matches. The first item is the file name, which is enclosed in square brackets; try this:

 

preg_match_all('/(?<=\[)[^\]]+/', $text, $file_matches);

 

Then for the comment, which is a bit trickier: it always starts after '- COMMENT -' and ends where the next item begins with a square bracket. It can also cover multiple lines, so we can't stop at the end of the line. You could try the following, which is almost identical to the above, but you would have to make sure that there are no opening square brackets within the comment:

 

preg_match_all('/(?<=- COMMENT -)[^\[]*/', $text, $comment_matches);

 

preg_match_all places the matches into a multidimensional array, and since there is a comment for every file name you should be able to pair the matches up by position. If a file has no comment text (but still has the '- COMMENT -' heading), the pattern just matches the empty space in between. So to print the results:

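<?php
// Assumes $file_matches and $comment_matches were filled by the two
// preg_match_all() calls above.
// The file name in $file_matches[0][$i] pairs with the comment in $comment_matches[0][$i].
$count = count($file_matches[0]);
for ($i = 0; $i < $count; ++$i) {
    echo $file_matches[0][$i] . '<br/>';
    // trim() drops the line break left over after '- COMMENT -'.
    echo trim($comment_matches[0][$i]) . '<br/><br/>';
}
?>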
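Put together, this approach might look like the self-contained sketch below, run on a shortened, hypothetical two-record sample. Each negated class needs a quantifier (+ for the names, * so that an empty comment still matches) and the [ in the lookbehind must be escaped for the patterns to compile and return whole names. Pairing by index assumes every record has exactly one '- COMMENT -' heading.

```php
<?php
// Hypothetical two-record sample in the same layout as the export.
$text = "[BlackfordLane.jpg]\n"
      . "File name = BlackfordLane.jpg\n"
      . "\n"
      . "- COMMENT -\n"
      . "Thames Path on Blackford Lane.\n"
      . "\n"
      . "[CastleEaton-2.jpg]\n"
      . "File name = CastleEaton-2.jpg\n"
      . "\n"
      . "- COMMENT -\n"
      . "The Red Lion, Castle Eaton\n"
      . "A warm welcoming pub.\n";

// File names: the text after each '[' up to (not including) the next ']'.
preg_match_all('/(?<=\[)[^\]]+/', $text, $file_matches);
// Comments: everything after '- COMMENT -' up to the next '[' or the end of the text.
preg_match_all('/(?<=- COMMENT -)[^\[]*/', $text, $comment_matches);

$pairs = [];
$count = count($file_matches[0]);
for ($i = 0; $i < $count; ++$i) {
    // Pair by position; assumes one COMMENT section per record.
    $pairs[] = $file_matches[0][$i] . "\n" . trim($comment_matches[0][$i]);
}
echo implode("\n\n", $pairs), "\n";
```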

 

Hope this helps you,

Joe


Hi Terrypin,

 

I didn't have time to look at Joe's solution as I'm rushing out; I just wanted to give you a preg_replace option.

You can run this PHP code.

 

The Regex:

,(?sm)\[([^]]+.jpg)\].*?- COMMENT -(\r\n[^[]*),

 

Code:

<?php
$regex=',(?sm)\[([^]]+.jpg)\].*?- COMMENT -(\r\n[^[]*),';
$string='[blackfordLane.jpg]
File name = BlackfordLane.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 96 x 96 DPI
File date/time = 19/01/2012 / 15:01:23

- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.

- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.

[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 72 x 72 DPI
File date/time = 19/01/2012 / 14:03:55

- EXIF -
Make - FUJIFILM
Model - FinePix2600Zoom
Orientation - Top left
XResolution - 72
YResolution - 72
ResolutionUnit - Inch

- COMMENT -
Castle Eaton Church

[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 75
Resolution = 0 x 0 DPI
File date/time = 18/01/2012 / 15:40:05

- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter\'s day, with the River Thames running at the bottom of the garden.
';

$s=preg_replace($regex,'\1\2',$string);
echo '<pre>'.$s.'</pre>';
?>

 

Output:

BlackfordLane.jpg

Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.

 

Castle Eaton Church.jpg

Castle Eaton Church

 

CastleEaton-2.jpg

The Red Lion, Castle Eaton

A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
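The expression above requires \r\n after '- COMMENT -', so if the PHP file (and therefore the string literal) is saved with Unix-style \n line endings, it matches nothing and the text comes back unchanged. A more portable variant, sketched here as an assumption rather than a tested fix, swaps \r\n for PCRE's \R (any line-break sequence), escapes the dot in .jpg and drops the m flag, which this pattern doesn't need. The helper function strip_metadata and the sample records are hypothetical.

```php
<?php
// \R accepts \r\n, \n or \r; \. matches a literal dot in ".jpg".
$regex = ',(?s)\[([^]]+\.jpg)\].*?- COMMENT -(\R[^[]*),';

// Hypothetical helper: join sample lines with $eol, then run the replace.
function strip_metadata(string $eol, string $regex): string
{
    $lines = [
        '[BlackfordLane.jpg]',
        'File name = BlackfordLane.jpg',
        '',
        '- COMMENT -',
        'Thames Path on Blackford Lane.',
        '',
        '[Castle Eaton Church.jpg]',
        'File name = Castle Eaton Church.jpg',
        '',
        '- COMMENT -',
        'Castle Eaton Church',
    ];
    return preg_replace($regex, '$1$2', implode($eol, $lines));
}

$unix = strip_metadata("\n", $regex);
$windows = strip_metadata("\r\n", $regex);
echo $unix;
```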

 

I didn't have time to check the fine details, so let me know if that works for you.

:)


Thanks Joe, much appreciate that fast response. However, I realise now that I forgot an important point! My post here turns out to be somewhat OT, as I'm not a PHP user. Not even a programmer, just an end user using regex (the POSIX version, apparently) in my text editor, TextPad.

 

I had assumed that 'PHP Regex' would be close enough for me to make any necessary syntax changes. But obviously I was mistaken, as TextPad Regex looks quite different. No preg_match_all for example!

 

But your post has inspired me to come at the problem from a totally different angle. Instead of trying to find the strings I described and replace them with blanks, I should find the individual names inside the initial square brackets, giving me the filename, and then the line(s) directly after '- COMMENT -' and before the following open square bracket. Both of those should be fairly easy, I think. The challenge, however, is how to do this for all such pairs?
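In PCRE terms, finding each name together with its comment in one pass is what preg_match_all with the PREG_SET_ORDER flag does: it returns one array per record, so the pairing takes care of itself. A minimal PHP sketch on hypothetical sample data (it won't run in TextPad, but it shows the shape of the pattern):

```php
<?php
// Hypothetical sample: two records with LF line endings.
$text = "[BlackfordLane.jpg]\nFile name = BlackfordLane.jpg\n\n- COMMENT -\n"
      . "Thames Path on Blackford Lane.\n\n"
      . "[CastleEaton-2.jpg]\nFile name = CastleEaton-2.jpg\n\n- COMMENT -\n"
      . "The Red Lion, Castle Eaton\nA warm welcoming pub.\n";

// Group 1: the name inside the brackets.
// Group 2: the comment lines, up to the next '[' or the end of the text.
$pattern = '/\[([^\]]+)\].*?- COMMENT -\R([^\[]*)/s';

// PREG_SET_ORDER gives one array per record, so each name stays with its comment.
preg_match_all($pattern, $text, $records, PREG_SET_ORDER);

$out = '';
foreach ($records as $record) {
    $out .= $record[1] . "\n" . rtrim($record[2]) . "\n\n";
}
echo $out;
```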

 

BTW, this is the first forum I've joined (I use scores of them) that makes me complete image verification and answer test questions even though I've already registered! Bit OTT, isn't it?  :)

 

--

Terry, East Grinstead, UK


Ah, well I'm afraid I have very little knowledge of POSIX regex as I've only learnt PCRE (do note that the pattern syntax differs between POSIX and PCRE). That's because the POSIX regex functions were deprecated in PHP as of version 5.3.0, and as a result this board is ultimately PCRE-only now too. I have never used TextPad, so I can't help you there either. Maybe you could try a TextPad forum, if there is one?

 

I think it's the spammers who have made you do image verification etc. When I changed my registered email they deactivated my account till I verified my new one... what if I'd made a typo! haha.

 

Good luck,

Joe


The challenge then however is how to do this for all such pairs?

 

Hi again Terry,

 

If you don't have PHP, for the simple REPLACE approach I gave you above, I'd use a program that has regex search-and-replace capabilities.

Two that I like: EditPad Pro, Aba Search and Replace.

 

There's also some regex replace functionality in some Adobe programs (Dreamweaver, InDesign). The regex flavor there is probably strong enough for the expression I gave you, which is fairly simple.

 

Some of the IDEs have regex functionality: Code::Blocks, NetBeans. I haven't fully tested them.

 

Let me know if you need any help with the two linked tools or the Adobe tools.

 

 


Thanks Playful, appreciate your help.

 

The Regex in TextPad seems pretty good, with the usual Find/Replace functionality:

http://dl.dropbox.com/u/4019461/TextPad-Regex-1.jpg

 

But, as I said, its repertoire and syntax seem radically different. In the expression you suggested:

',(?sm)\[([^]]+.jpg)\].*?- COMMENT -(\r\n[^[]*),'

I don't recognise/understand:

- the commas

- (?sm)

- [^]

- \r (I think it means CR, but why do I need it? Isn't a Return, '\n', sufficient?)

- * on its own, instead of '.*'

 

As mentioned, I'm just a Regex novice, so maybe much of this is obvious stuff to you!

 

--

Terry, East Grinstead, UK

 

 


Hi Terry,

 

The Regex in TextPad seems pretty good,

 

From what you sent, I'd say very basic.  ;)  But maybe there's more.

 

The expression I sent is meant to work with a full-blown regex flavor.

 


 

The commas are delimiters; they're part of the PHP code I sent you. If you're not using PHP (although this is the phpfreaks forum), omit the commas when you paste the expression into your tool. Without them it works in RegexBuddy, for instance.
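A quick way to see that the delimiters aren't part of the regex itself: in PHP the same expression can be wrapped in commas, slashes or tildes and it matches identically. The subject string is a sample taken from the thread.

```php
<?php
// Sample subject from the thread.
$subject = '[Castle Eaton Church.jpg]';

// One expression, three different PHP delimiters: , / and ~
$patterns = [',\[([^]]+)\],', '/\[([^]]+)\]/', '~\[([^]]+)\]~'];

$names = [];
foreach ($patterns as $pattern) {
    preg_match($pattern, $subject, $m);
    $names[] = $m[1];
}
print_r($names); // each entry is "Castle Eaton Church.jpg"
```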

 

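(?sm) turns on the "dot matches newline" (s) and "multiline" (m) modes. With s, the dot also matches line breaks; with m, ^ and $ match at the start and end of every line. This expression doesn't use ^ or $, so only s actually matters here.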
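A small PHP sketch of what each mode changes, on made-up sample strings:

```php
<?php
// Made-up sample strings.
$text = "- COMMENT -\nCastle Eaton Church";
$list = "[a.jpg]\nx\n[b.jpg]";

// s: without it '.' stops at "\n", so COMMENT.*Church cannot reach line 2.
$dotWithout = preg_match('/COMMENT.*Church/', $text);
$dotWith    = preg_match('/(?s)COMMENT.*Church/', $text);

// m: without it '^' only matches at the very start of the string;
// with it, '^' also matches just after every line break.
$caretWithout = preg_match_all('/^\[/', $list);
$caretWith    = preg_match_all('/(?m)^\[/', $list);

var_dump($dotWithout, $dotWith, $caretWithout, $caretWith); // int(0) int(1) int(1) int(2)
```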

 

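[^[] means any single character that is not an opening square bracket (the caret at the start of a character class stands for NOT). Likewise, [^]] means any character that is not a closing square bracket.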
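The two negated classes from the expression, tried on sample strings in PHP. Note that a negated class matches line breaks too, with or without the s flag:

```php
<?php
// [^[]* : a run of characters that are not '[' (line breaks included),
// so it stops where the next record's '[' begins.
preg_match('/[^[]*/', "Castle Eaton Church\n\n[CastleEaton-2.jpg]", $m);
$comment = $m[0];   // "Castle Eaton Church\n\n"

// [^]]+ : one or more characters that are not ']'.
preg_match('/\[([^]]+)\]/', '[CastleEaton-2.jpg]', $m);
$name = $m[1];      // "CastleEaton-2.jpg"
```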

 

\r is a carriage return. Whether you need \r\n or just \n depends on the line endings in your file: Windows text files use \r\n.
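A sketch of the difference, using made-up strings: a literal \r\n only matches Windows line endings, while PCRE's \R escape (not part of the original expression) accepts \r\n, \n or \r. In flavors without \R, \r?\n is a common substitute.

```php
<?php
// Made-up strings: the same text with Windows and Unix line endings.
$crlf = "- COMMENT -\r\nCastle Eaton Church";
$lf   = "- COMMENT -\nCastle Eaton Church";

// A literal \r\n only matches the Windows-style text.
$strict = [preg_match('/COMMENT -\r\n/', $crlf), preg_match('/COMMENT -\r\n/', $lf)]; // [1, 0]

// \R matches \r\n, \n or \r, so one pattern covers both.
$any = [preg_match('/COMMENT -\R/', $crlf), preg_match('/COMMENT -\R/', $lf)]; // [1, 1]

// \r?\n makes the carriage return optional, for flavors without \R.
$optional = [preg_match('/COMMENT -\r?\n/', $crlf), preg_match('/COMMENT -\r?\n/', $lf)]; // [1, 1]
```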

 

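* means "zero or more of the preceding item". That's what it means in .* (zero or more of any character) and in [^[]* (zero or more characters that are not an opening square bracket).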
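The "zero" in "zero or more" is what lets [^[]* succeed even when a comment is empty. A sketch on a hypothetical record with no comment text, compared with +, which means one or more:

```php
<?php
// Hypothetical record whose comment section is empty.
$text = "- COMMENT -[Next.jpg]";

// * : zero or more non-'[' characters, so an empty comment still matches.
$star = preg_match('/- COMMENT -([^[]*)/', $text, $m);   // 1
$comment = $m[1];                                        // ""

// + : one or more, so the same text fails to match.
$plus = preg_match('/- COMMENT -([^[]+)/', $text);      // 0
```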

 

Hope this helps, don't hesitate to ask more.

 

 


Thanks for the follow-ups. Still experimenting. Will report back when I have a clearer picture. I suspect TextPad (for the basic regex) plus my macro program (for the iteration across multiple lines) might be my best approach.

 

I'm also determined to master more of Regex itself!

 

--

Terry, East Grinstead, UK

