;===============================================================================
;
; Text File Successive Duplicate Line Remover
;
; This script copies a file, omitting duplicate lines from the copy. It can
; compare entire lines, or only a column (character position) range, so a
; line need not be entirely identical to the preceding one to be considered
; a duplicate. For example, you could compare lines based only on positions
; 5 to 10.
;
; Deletion is done only on "successive" duplicates. This means that the
; lines must follow one another (or be part of a series of duplicates).
; Thus, the following would be cleaned up:
;
; AAAAAAAAAAAAAA
; AAAAAAAAAAAAAA
; BBBBBBBBBBBBBB
;
; ... yielding one line of A's and one line of B's. The following example
; would NOT be altered:
;
; AAAAAAAAAAAAAA
; BBBBBBBBBBBBBB
; AAAAAAAAAAAAAA
;
; There is indeed duplication here, but it is not successive duplication.
; This script could be modified to handle non-successive duplication, but
; that would require saving the entire input file in an array, which is
; beyond the scope of this demonstration.
;
; This script works only on CRLF-terminated (Windows/DOS) text files,
; though modifying it to work on Unix/Linux and Mac files would be very
; easy.
;
; This script is designed for use with the Parse-O-Matic Power Tool,
; which is available from www.parse-o-matic.com.
;
;===============================================================================
; Config Section
;===============================================================================
Config
  $CfgEnableOptionX = 'Y'
  $CfgCaptionX = '&FromColumn'
  $CfgHintX = 'Starting column number (blank = start of line)'
  $CfgEnableOptionY = 'Y'
  $CfgCaptionY = '&ToColumn'
  $CfgHintY = 'Ending column number (blank = use entire line)'
  $CfgEnableOptionZ = 'Y'
  $CfgCaptionZ = '&Log?'
  $CfgHintZ = 'Enter Y to copy deleted lines to log; N otherwise'
  $CfgCopyright = 'Copyright © 2005-2008 by Pyroto, Inc.'
  $CfgVersion = '1.00.00'
  $CfgProgrammer = 'Kevin Beck'
  $CfgEmail = 'info' $40 'parse-o-matic.com' ; Note anti-spam tactic
  $CfgLicense = 'This script may be used by anyone who has a valid ' >>
                'Advanced Scripting License from Pyroto, Inc.' >>
                ', or is evaluating one of our ' >>
                'Parse-O-Matic products (for up to 30 days).'
End
;===============================================================================
; TaskInit Step
;===============================================================================
TaskInit
  ;-----------------------------------------------------------------------------
  ; Check options
  ;-----------------------------------------------------------------------------
  Call CheckOption $OptionX '/FromColumn'
  FromCol = CheckOption
  Call CheckOption $OptionY '/ToColumn'
  ToCol = CheckOption
  ; A blank ToColumn (0) means "use entire line", so only check the range
  ; when an end column was actually specified
  Begin ToCol <> 0
    If ToCol #< FromCol Stop '"FromColumn" must not be greater than "ToColumn"'
  End
  If $OptionZ = '' $OptionZ = 'N'
  If 'YN' ~ $OptionZ Stop 'Please set the Log? option to Y or N' >>
                          $0A$0D$0A$0D >>
                          'Y saves all deleted lines to the log file' $0A$0D >>
                          'N does not do this'
  ;-----------------------------------------------------------------------------
  ; Handy constants
  ;-----------------------------------------------------------------------------
  NoMatch = $0A$0D ; This cannot be a line in a CRLF-delimited text file
End
;===============================================================================
; FileInit Step
;===============================================================================
FileInit
  ;-----------------------------------------------------------------------------
  ; Are we logging deletions?
  ;-----------------------------------------------------------------------------
  Begin $OptionZ = 'Y'
    LogMsgLF
    LogMsg '-------------'
    LogMsg 'Deleted Lines'
    LogMsg '-------------'
    NumDeleted = 0
  End
  ;-----------------------------------------------------------------------------
  ; Set LastFragment in case multiple files are being copied (using wildcards).
  ; Note that this script sends all lines to the same output file. This could
  ; be easily changed, using $CfgDefaultOFN = '' and the OutFile command.
  ;-----------------------------------------------------------------------------
  LastFragment = NoMatch
End
;===============================================================================
; FileDone Step
;===============================================================================
FileDone
  ;-----------------------------------------------------------------------------
  ; Are we logging deletions?
  ;-----------------------------------------------------------------------------
  Begin $OptionZ = 'Y'
    If NumDeleted = 0 LogMsg 'No deletions'
  End
  ;-----------------------------------------------------------------------------
  ; Reset LastFragment so that the first line of the next input file (if any)
  ; is never compared with the last line of this one.
  ;-----------------------------------------------------------------------------
  LastFragment = NoMatch
End
;===============================================================================
; Main Step
;===============================================================================
; If $Data = '' Done ; Uncomment this line to ignore null lines
;-------------------------------------------------------------------------------
; Assess starting column
;-------------------------------------------------------------------------------
If FromCol <> 0 PosnFrom = FromCol ; A start column was specified
Otherwise PosnFrom = 0 ; We use zero so null lines are also seen
If $Data Len< PosnFrom Call OutDone ; FromCol exceeds length of input line
;-------------------------------------------------------------------------------
; Assess ending column
;-------------------------------------------------------------------------------
Begin ToCol <> 0
  PosnTo = ToCol ; An end column was specified
  If $Data Len< PosnTo Call OutDone ; ToCol exceeds length of input line
Else
  PosnTo = Len $Data ; No end column was specified
End
;-------------------------------------------------------------------------------
; Compare with the previous line
;-------------------------------------------------------------------------------
TestFragment = Cols $Data PosnFrom PosnTo ; Get the fragment
Begin TestFragment = LastFragment
  ;-----------------------------------------------------------------------------
  ; It's the same as the last fragment; log it if we're doing that
  ;-----------------------------------------------------------------------------
  Begin $OptionZ = 'Y'
    LogMsg $Data
    Inc NumDeleted
  End
  Done
End
;-------------------------------------------------------------------------------
; This is different, so output it and remember it
;-------------------------------------------------------------------------------
LastFragment = TestFragment
OutEnd $Data
Done
;===============================================================================
; Subroutines
;===============================================================================
Procedure CheckOption
  MyOption = Parse CheckOption '>*/' '' 'Cut'
  TrimChar CheckOption
  Begin CheckOption = ''
    CheckOption = 0
    Exit
  End
  TestNum = Numeric CheckOption
  If TestNum = 'N' Stop '"' MyOption '" must be blank (i.e. empty) or a number'
  If CheckOption #< 0 Stop '"' MyOption '" may not be a negative number'
End

Procedure OutDone
  OutEnd $Data
  LastFragment = NoMatch
  Done
End
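
For readers who do not have Parse-O-Matic at hand, the logic of the script above can be sketched in Python. This is only an illustration, not part of the script: the function name `remove_successive_duplicates` and its parameters are hypothetical. It assumes that columns are 1-based and inclusive, that 0 stands for a blank option (as CheckOption returns), and that a starting column of 0 means "from the start of the line".

```python
def remove_successive_duplicates(lines, from_col=0, to_col=0):
    """Return (kept, deleted) for a list of lines without line terminators.

    A line is dropped when its compared fragment equals the fragment of the
    line just before it. from_col/to_col are 1-based inclusive columns;
    0 means "not specified" (assumption mirroring the script's options).
    """
    no_match = object()        # sentinel that can never equal a real fragment
    last_fragment = no_match
    kept, deleted = [], []
    for line in lines:
        # Lines too short for the requested range are always output, and they
        # reset the comparison (this mirrors the OutDone procedure).
        if len(line) < from_col or (to_col and len(line) < to_col):
            kept.append(line)
            last_fragment = no_match
            continue
        start = max(from_col, 1) - 1          # convert to a 0-based index
        end = to_col if to_col else len(line)  # blank ToColumn = whole line
        fragment = line[start:end]
        if fragment == last_fragment:
            deleted.append(line)               # successive duplicate: skip it
            continue
        last_fragment = fragment
        kept.append(line)
    return kept, deleted
```

Calling `remove_successive_duplicates(lines)` compares whole lines, while `remove_successive_duplicates(lines, 5, 10)` compares only positions 5 to 10, as in the example given in the header comment. The `deleted` list plays the role of the log written when the Log? option is Y.
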
Here is some good data for testing this script (copy only the actual lines of
text, not the blank lines).
AAAAAAAAAAAAA
AAAAAAAAAAAAA
AAAAAAAAAAAAAAA
BBBBAAAAAAAAA
BBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCC
BBBBBBBBB
BBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDD
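
As a rough cross-check of what to expect from the test data above, the same successive-duplicate rule can be expressed with Python's `itertools.groupby`, which groups consecutive items that share a key. This is a sketch under stated assumptions and not the script's own method: `dedupe` and `key_for` are hypothetical names, columns are assumed to be 1-based and inclusive, and 0 stands for a blank option.

```python
from itertools import groupby

# The test data above; repeated lines are named once so they are identical.
A = "AAAAAAAAAAAAA"
B = "BBBBBBBBBBBBBBBBBB"
C = "CCCCCCCCCCCCCCCCCC"
D = "DDDDDDDDDDDDDDDDDD"
TEST_DATA = [A, A, "AAAAAAAAAAAAAAA", "BBBBAAAAAAAAA", B, B, C,
             "BBBBBBBBB", B, C, C, D, D, D, D]


def dedupe(lines, from_col=0, to_col=0):
    """Keep the first line of each run of successive lines with equal fragments."""
    def key_for(line):
        # A line too short for the requested range is never a duplicate:
        # a fresh object() compares equal to nothing but itself.
        if len(line) < from_col or (to_col and len(line) < to_col):
            return object()
        start = max(from_col, 1) - 1   # 1-based column to 0-based index
        end = to_col or len(line)      # 0 means "through the end of the line"
        return line[start:end]

    return [next(group) for _, group in groupby(lines, key=key_for)]


whole_line = dedupe(TEST_DATA)        # compare entire lines
first_four = dedupe(TEST_DATA, 1, 4)  # compare columns 1 to 4 only

print(len(whole_line), "lines kept when comparing entire lines")
print(len(first_four), "lines kept when comparing columns 1 to 4")
```

Comparing entire lines keeps 9 of the 15 lines, since only exact successive repeats are dropped. Comparing columns 1 to 4 keeps 6, because lines such as BBBBAAAAAAAAA and the run of B's that follows share the same first four characters.
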
   