tr(P)                                                     tr(P)





NAME
       tr - translate characters

SYNOPSIS
       tr [-c | -C][-s] string1 string2

       tr -s [-c | -C] string1

       tr -d [-c | -C] string1

       tr -ds [-c | -C] string1 string2


DESCRIPTION
       The  tr  utility  shall  copy  the standard input to the
       standard  output  with  substitution  or   deletion   of
       selected  characters.   The  options  specified  and the
       string1 and string2 operands shall control  translations
       that occur while copying characters and single-character
       collating elements.
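
       As a minimal sketch of the copy-with-substitution behavior
       described above (the sample input is made up for
       illustration), each character of string1 is replaced by
       the character at the same position in string2, and every
       other character is copied unchanged:

```shell
# Hypothetical sample input: a->x, b->y, c->z, everything else passes through.
out=$(printf 'aabbcc-dd\n' | tr abc xyz)
printf '%s\n' "$out"
```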

OPTIONS
       The tr utility shall conform  to  the  Base  Definitions
       volume  of  IEEE Std 1003.1-2001,  Section 12.2, Utility
       Syntax Guidelines.

       The following options shall be supported:

       -c     Complement  the  set  of  values   specified   by
              string1. See the EXTENDED DESCRIPTION section.

       -C     Complement  the  set  of  characters specified by
              string1. See the EXTENDED DESCRIPTION section.

       -d     Delete all occurrences of input  characters  that
              are specified by string1.

       -s     Replace  instances  of repeated characters with a
              single character, as described  in  the  EXTENDED
              DESCRIPTION section.


OPERANDS
       The following operands shall be supported:

       string1, string2

              Translation  control  strings.  Each string shall
              represent a set of  characters  to  be  converted
              into an array of characters used for the transla-
              tion. For  a  detailed  description  of  how  the
              strings   are   interpreted,   see  the  EXTENDED
              DESCRIPTION section.


STDIN
       The standard input can be any type of file.

INPUT FILES
       None.

ENVIRONMENT VARIABLES
       The following environment  variables  shall  affect  the
       execution of tr:

       LANG   Provide a default value for the internationaliza-
              tion variables that are unset or null.  (See  the
              Base  Definitions volume of IEEE Std 1003.1-2001,
              Section 8.2, Internationalization  Variables  for
              the  precedence of internationalization variables
              used to determine  the  values  of  locale  cate-
              gories.)

       LC_ALL If  set to a non-empty string value, override the
              values  of  all  the  other  internationalization
              variables.

       LC_COLLATE

              Determine  the  locale  for the behavior of range
              expressions and equivalence classes.

       LC_CTYPE
              Determine the locale for  the  interpretation  of
              sequences  of  bytes  of  text data as characters
              (for example, single-byte as  opposed  to  multi-
              byte characters in arguments) and the behavior of
              character classes.

       LC_MESSAGES
              Determine the  locale  that  should  be  used  to
              affect the format and contents of diagnostic mes-
              sages written to standard error.

       NLSPATH
              Determine the location of  message  catalogs  for
              the processing of LC_MESSAGES.
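
       Because character classes and ranges depend on the locale
       variables listed above, a script that needs predictable
       results may want to pin the locale for a single
       invocation. The following is only a sketch; the choice of
       the C locale and the sample input are assumptions:

```shell
# Assumption: the C (POSIX) locale is wanted for this one command only.
out=$(printf 'posix tr\n' | LC_ALL=C tr '[:lower:]' '[:upper:]')
printf '%s\n' "$out"
```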


ASYNCHRONOUS EVENTS
       Default.

STDOUT
       The  tr output shall be identical to the input, with the
       exception of the specified transformations.

STDERR
       The standard error shall be  used  only  for  diagnostic
       messages.

OUTPUT FILES
       None.

EXTENDED DESCRIPTION
       The  operands  string1 and string2 (if specified) define
       two arrays of characters. The constructs in the  follow-
       ing  list  can  be used to specify characters or single-
       character collating elements. If any of  the  constructs
       result  in  multi-character collating elements, tr shall
       exclude, without  a  diagnostic,  those  multi-character
       elements from the resulting array.

       character
              Any character not described by one of the conven-
              tions below shall represent itself.

       \octal Octal sequences can be used to represent  charac-
              ters   with   specific  coded  values.  An  octal
              sequence shall consist of a backslash followed by
              the longest sequence of one, two, or three-octal-
              digit characters (01234567). The  sequence  shall
              cause  the value whose encoding is represented by
              the one, two, or three-digit octal integer to  be
              placed  into  the array. If the size of a byte on
              the system is greater than nine bits,  the  valid
              escape  sequence  used  to  represent  a  byte is
              implementation-defined.   Multi-byte   characters
              require  multiple,  concatenated escape sequences
              of this type, including the leading '\' for  each
              byte.

       \character
              The backslash-escape sequences in the Base
              Definitions volume of IEEE Std 1003.1-2001, Table
              5-1, Escape Sequences and Associated Actions
              ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v')
              shall be supported. The results of using any other
              character, other than an octal digit, following the
              backslash are unspecified.

       c-c    In  the POSIX locale, this construct shall repre-
              sent the range of collating elements between  the
              range  endpoints  (as long as neither endpoint is
              an octal sequence of the form \octal), inclusive,
              as defined by the collation sequence. The charac-
              ters or collating elements in the range shall  be
              placed   in  the  array  in  ascending  collation
              sequence. If the  second  endpoint  precedes  the
              starting  endpoint  in the collation sequence, it
              is unspecified whether  the  range  of  collating
              elements  is  empty, or this construct is treated
              as invalid.  In  locales  other  than  the  POSIX
              locale,  this construct has unspecified behavior.

       If either or both  of  the  range  endpoints  are  octal
       sequences  of  the form \octal, this shall represent the
       range of specific coded values  between  the  two  range
       endpoints, inclusive.

       [:class:]
              Represents all characters belonging to the defined
              character class, as defined by the current setting
              of the LC_CTYPE locale category. The following
              character class names shall be accepted when
              specified in string1:

              alnum   blank   digit   lower   punct   upper
              alpha   cntrl   graph   print   space   xdigit

       In addition, character class expressions of the form
       [:name:] shall be recognized in those locales where the
       name keyword has been given a charclass definition in
       the LC_CTYPE category.

       When both the -d and -s options are  specified,  any  of
       the  character class names shall be accepted in string2.
       Otherwise, only character class names lower or upper are
       valid in string2 and then only if the corresponding
       character class (upper and lower, respectively) is
       specified in the same relative position in string1. Such
       a specification shall be interpreted as a request for
       case conversion. When [:lower:] appears in string1 and
       [:upper:] appears in string2, the arrays shall contain
       the characters from the toupper mapping in the LC_CTYPE
       category of the current locale. When [:upper:] appears
       in string1 and [:lower:] appears in string2, the arrays
       shall contain the characters from the tolower mapping in
       the LC_CTYPE category of the current locale. The first
       character from each mapping pair shall be in  the  array
       for  string1  and the second character from each mapping
       pair shall be in the array for string2 in the same rela-
       tive position.

       Except  for case conversion, the characters specified by
       a character class expression  shall  be  placed  in  the
       array in an unspecified order.

       If  the name specified for class does not define a valid
       character class in the current locale, the  behavior  is
       undefined.

       [=equiv=]
              Represents all characters or collating elements
              belonging to the same equivalence class as equiv,
              as defined by the current setting of the
              LC_COLLATE locale category. An equivalence class
              expression shall be allowed only in string1, or
              in string2 when it is being used by the combined
              -d and -s options. The characters belonging to
              the equivalence class shall be placed in the
              array in an unspecified order.

       [x*n]  Represents n repeated occurrences of the character
              x. Because this expression is used to map
              multiple characters to one, it is only valid when
              it occurs in string2. If n is omitted or is zero,
              it shall be interpreted as large enough to extend
              the string2-based sequence to the length of the
              string1-based sequence. If n has a leading zero,
              it shall be interpreted as an octal value.
              Otherwise, it shall be interpreted as a decimal
              value.
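
       The constructs in the list above can be sketched as
       follows. This is an illustration rather than a normative
       example: the sample inputs are invented, the POSIX locale
       is forced because ranges are only specified there, and
       the octal values assume an ASCII-based character set.

```shell
# Assumption: force the POSIX locale, since ranges are only specified there.
LC_ALL=C
export LC_ALL

# \octal: 011 is the coded value of <tab> in ASCII-based character sets.
octal=$(printf 'a\tb\n' | tr '\011' ':')
# \character: the same translation written with an escape sequence.
escape=$(printf 'a\tb\n' | tr '\t' ':')
# c-c: a range placed in the array in ascending collation sequence.
range=$(printf 'abcxyz\n' | tr a-c A-C)
# Range with \octal endpoints: coded values 141 to 143 (a to c in ASCII).
octrange=$(printf 'abcxyz\n' | tr '\141-\143' A-C)
# [:class:]: lower in string1 paired with upper in string2 is case conversion.
upper=$(printf 'hello\n' | tr '[:lower:]' '[:upper:]')
# [x*n]: two copies of x, then y, so a->x, b->x, c->y.
repeat=$(printf 'abc\n' | tr abc '[x*2]y')
# A leading zero makes n octal: 010 means eight copies of x.
octrep=$(printf 'abcdefghi\n' | tr a-i '[x*010]y')

printf '%s\n' "$octal" "$escape" "$range" "$octrange" "$upper" "$repeat" "$octrep"
```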


       When the -d option is not specified:

              Each input character found in the array specified
              by  string1 shall be replaced by the character in
              the same relative position in the array specified
              by  string2.  When the array specified by string2
              is shorter than the one specified by string1, the
              results are unspecified.

              If the -C option is specified, the complements of
              the characters specified by string1 (the  set  of
              all  characters  in the current character set, as
              defined by the current setting of LC_CTYPE,
              except   for  those  actually  specified  in  the
              string1 operand) shall be placed in the array  in
              ascending  collation  sequence, as defined by the
              current setting of LC_COLLATE.

              If the -c option is specified, the complement  of
              the  values  specified by string1 shall be placed
              in the array in ascending order by binary  value.

              Because the order of the characters specified by
              character class expressions or equivalence class
              expressions is undefined, such expressions should
              only be used if the intent is to map several
              characters into one. An exception is case
              conversion, as described previously.
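
       A small sketch of translation with a complemented
       string1, assuming the POSIX locale and an invented input
       line: every value outside a-z and <newline> is mapped to
       one character, with [_*] extending string2 to the
       required length.

```shell
# Assumption: POSIX locale, so a-z is the 26 lowercase ASCII letters.
# Everything that is NOT a lowercase letter or newline becomes '_'.
out=$(printf 'ab 12 cd\n' | LC_ALL=C tr -c 'a-z\n' '[_*]')
printf '%s\n' "$out"
```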

       When the -d option is specified:

              Input characters found in the array specified  by
              string1 shall be deleted.

              When  the  -C  option  is  specified with -d, all
              characters  except  those  specified  by  string1
              shall  be  deleted.   The contents of string2 are
              ignored, unless the -s option is also  specified.

              When the -c option is specified with -d, all val-
              ues except those specified by  string1  shall  be
              deleted.   The   contents  of  string2  shall  be
              ignored, unless the -s option is also  specified.

              The  same  string  cannot be used for both the -d
              and the -s option; when both options  are  speci-
              fied,   both  string1  (used  for  deletion)  and
              string2 (used for squeezing) shall be required.
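
       The deletion rules above might be exercised as in the
       following sketch. The inputs are invented for
       illustration and the variable names are arbitrary.

```shell
# -d alone: delete every 'l' and 'o' from the input.
deleted=$(printf 'hello world\n' | tr -d lo)
# -c with -d: delete everything except alphanumerics and newline.
kept=$(printf 'a1!b2?\n' | tr -cd '[:alnum:]\n')
# -d with -s: string1 ('x') is deleted, then runs of string2 (' ') are squeezed.
both=$(printf 'a  bxxc   d\n' | tr -ds x ' ')

printf '%s\n' "$deleted" "$kept" "$both"
```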

       When the -s option is specified, after any deletions  or
       translations have taken place, repeated sequences of the
       same character shall be replaced by  one  occurrence  of
       the  same  character,  if  the character is found in the
       array specified by the last operand. If the last operand
       contains  a character class, such as the following exam-
       ple:


              tr -s '[:space:]'

       the last operand's array shall contain all of the  char-
       acters  in that character class. However, in a case con-
       version, as described previously, such as:


              tr -s '[:upper:]' '[:lower:]'

       the last operand's array shall contain only those  char-
       acters  defined  as the second characters in each of the
       toupper or tolower character pairs, as appropriate.
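
       As an illustrative sketch of the two squeezing cases just
       described (sample inputs are assumptions): with a single
       operand, runs of any one space-class character collapse
       to that character; with case conversion, only the
       resulting uppercase characters are squeezed.

```shell
# One operand: three spaces become one space, two tabs become one tab.
squeezed=$(printf 'a   b\t\tc\n' | tr -s '[:space:]')
# Case conversion first (aabbCC -> AABBCC), then squeeze the uppercase set.
converted=$(printf 'aabbCC\n' | tr -s '[:lower:]' '[:upper:]')

printf '%s\n' "$squeezed" "$converted"
```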

       An empty string used for  string1  or  string2  produces
       undefined results.

EXIT STATUS
       The following exit values shall be returned:

        0     All input was processed successfully.

       >0     An error occurred.
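
       A script can check these exit values as sketched below.
       Treating a missing string1 operand as an error is an
       assumption about the implementation, based on the
       SYNOPSIS requiring that operand.

```shell
# Successful run: the pipeline's status is that of tr.
printf 'abc\n' | tr a-c A-C >/dev/null
ok=$?

# Assumed failure case: no operands at all; diagnostics are discarded.
bad=0
tr </dev/null >/dev/null 2>&1 || bad=$?

printf 'ok=%s bad=%s\n' "$ok" "$bad"
```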


CONSEQUENCES OF ERRORS
       Default.

       The following sections are informative.

APPLICATION USAGE
       If necessary, string1 and string2 can be quoted to avoid
       pattern matching by the shell.

       If an ordinary digit (representing itself) is to  follow
       an  octal sequence, the octal sequence must use the full
       three digits to avoid ambiguity.
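
       For instance, in the following sketch (input invented for
       illustration) string1 is the control character with
       octal value 001 followed by the ordinary digit 1; writing
       it as '\11' would instead denote the single value 011.

```shell
# string1 '\0011' is two elements: octal 001, then the literal digit '1'.
# The input is the byte 001 followed by the character '1'.
out=$(printf '\001%s\n' 1 | tr '\0011' 'AB')
printf '%s\n' "$out"
```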

       When string2  is  shorter  than  string1,  a  difference
       results  between  historical System V and BSD systems. A
       BSD system pads string2 with the last character found in
       string2.  Thus, it is possible to do the following:


              tr 0123456789 d

       which would translate all digits to the letter 'd'.
       Since this area is specifically unspecified in this vol-
       ume  of  IEEE Std 1003.1-2001, both the BSD and System V
       behaviors are allowed, but a conforming application can-
       not  rely on the BSD behavior. It would have to code the
       example in the following way:


              tr 0123456789 '[d*]'
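
       Applied to a made-up input line, the portable form can be
       tried as in this sketch:

```shell
# [d*] repeats 'd' often enough to match the ten digits in string1.
out=$(printf 'room 101, floor 7\n' | tr 0123456789 '[d*]')
printf '%s\n' "$out"
```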

       It should be noted that, despite similarities in appear-
       ance,  the  string  operands  used by tr are not regular
       expressions.

       Unlike some historical implementations, this  definition
       of  the tr utility correctly processes NUL characters in
       its input stream. NUL  characters  can  be  stripped  by
       using:


              tr -d '\000'
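
       A quick way to check this, sketched with an invented
       three-letter input containing two NUL characters, is to
       count the bytes that remain:

```shell
# Five bytes go in (a NUL b NUL c); three should come out.
count=$(printf 'a\000b\000c' | tr -d '\000' | wc -c | tr -d ' ')
text=$(printf 'a\000b\000c\n' | tr -d '\000')
printf '%s bytes: %s\n' "$count" "$text"
```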

EXAMPLES
       The  following  example  creates  a list of all words in
       file1 one per line in file2, where a word is taken to be
       a maximal string of letters.


              tr -cs "[:alpha:]" "[\n*]" <file1 >file2

       The  next example translates all lowercase characters in
       file1 to uppercase and writes the  results  to  standard
       output.


              tr "[:lower:]" "[:upper:]" <file1

       This  example  uses  an  equivalence  class  to identify
       accented variants of the base character  'e'  in  file1,
       which  are  stripped of diacritical marks and written to
       file2.


              tr "[=e=]" e <file1 >file2

RATIONALE
       In some early proposals, an explicit option -n was added
       to  disable  the  historical  behavior  of stripping NUL
       characters from the input. It was considered that  auto-
       matically  stripping  NUL  characters from the input was
       not correct functionality.  However, the removal  of  -n
       in a later proposal does not remove the requirement that
       tr correctly process NUL characters in its input stream.
       NUL characters can be stripped by using tr -d '\000'.

       Historical implementations of tr differ widely in syntax
       and behavior. For  example,  the  BSD  version  has  not
       needed   the   bracket  characters  for  the  repetition
       sequence. The tr utility syntax is based more closely on
       the System V and XPG3 model while attempting to accommo-
       date historical BSD implementations. In the case of  the
       short string2 padding, the decision was to unspecify the
       behavior and preserve System V and XPG3  scripts,  which
       might  find  difficulty with the BSD method. The assump-
       tion was made that BSD users of tr have to make accommo-
       dations  to  meet  the  syntax defined here. Since it is
       possible to use the repetition sequence to duplicate the
       desired  behavior,  whereas  there  is  no simple way to
       achieve the System V method, this was  the  correct,  if
       not desirable, approach.

       The  use  of octal values to specify control characters,
       while having historical precedents, is not portable. The
       introduction  of escape sequences for control characters
       should provide the necessary portability. It  is  recog-
       nized  that  this  may  cause some historical scripts to
       break.

       An early proposal included support  for  multi-character
       collating  elements.   It was pointed out that, while tr
       does employ some syntactical elements from REs, the  aim
       of  tr  is  quite different; ranges, for example, do not
       have a similar meaning ("any of the chars in the range
       matches", versus "translate each character in the range
       to the output counterpart"). As a result, the previously
       included  support for multi-character collating elements
       has been removed. What remains  are  ranges  in  current
       collation order (to support, for example, accented char-
       acters), character classes, and equivalence classes.

       In XPG3 the [:class:] and [=equiv=] conventions are
       shown with double brackets, as in RE syntax. However, tr
       does not implement RE principles; it just borrows part
       of the syntax. Consequently, [:class:] and [=equiv=]
       should be regarded as syntactical elements on a par with
       [x*n], which is not an RE bracket expression.

       The standard developers will consider changes to tr that
       allow it to translate characters between different char-
       acter  encodings,  or they will consider providing a new
       utility to accomplish this.

       On historical  System  V  systems,  a  range  expression
       requires enclosing square-brackets, such as:


              tr '[a-z]' '[A-Z]'

       However, BSD-based systems did not require the brackets,
       and this convention is used here to avoid breaking large
       numbers of BSD scripts:


              tr a-z A-Z

       The  preceding  System  V  script  will continue to work
       because the brackets, treated as regular characters, are
       translated to themselves. However, any System V script
       that relied on "a-z" representing the three characters
       'a', '-', and 'z' has to be rewritten as "az-".
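
       The compatibility argument can be sketched as follows,
       assuming the POSIX locale and invented inputs: the
       bracketed and unbracketed forms give the same result, and
       the reordered "az-" form names three literal characters.

```shell
# Assumption: POSIX locale, where the a-z range is specified.
LC_ALL=C
export LC_ALL

# System V style: '[' and ']' are ordinary characters mapped to themselves.
sysv=$(printf '[abc]\n' | tr '[a-z]' '[A-Z]')
# BSD style: no brackets needed around the range.
bsd=$(printf '[abc]\n' | tr a-z A-Z)
# Literal a, z, and '-': the hyphen is placed last so it is not a range.
literal=$(printf 'a-z b\n' | tr 'az-' 'AZ_')

printf '%s\n' "$sysv" "$bsd" "$literal"
```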

       The  ISO POSIX-2:1993  standard  had  a  -c  option that
       behaved similarly to the -C option, but did  not  supply
       functionality  equivalent  to the -c option specified in
       IEEE Std 1003.1-2001.  This meant that historical  prac-
       tice  of  being  able  to  specify tr -d\200-\377 (which
       would delete all bytes with the top bit set) would  have
       no  effect because, in the C locale, bytes with the val-
       ues octal 200 to octal 377 are not characters.

       The earlier  version  also  said  that  octal  sequences
       referred to collating elements and could be placed adja-
       cent to each other  to  specify  multi-byte  characters.
       However,  it  was  noted  that  this  caused ambiguities
       because tr would not be able to  tell  whether  adjacent
       octal  sequences  were  intending  to specify multi-byte
       characters   or   multiple   single   byte   characters.
       IEEE Std 1003.1-2001   specifies  that  octal  sequences
       always refer to single byte binary values.

FUTURE DIRECTIONS
       None.

SEE ALSO
       sed

COPYRIGHT
       Portions of this text are reprinted  and  reproduced  in
       electronic  form  from  IEEE  Std  1003.1, 2003 Edition,
       Standard for Information Technology -- Portable  Operat-
       ing System Interface (POSIX), The Open Group Base Speci-
       fications Issue 6, Copyright (C) 2001-2003 by the Insti-
       tute  of  Electrical  and Electronics Engineers, Inc and
       The Open Group. In the event of any discrepancy  between
       this  version  and  the original IEEE and The Open Group
       Standard, the original IEEE and The Open Group  Standard
       is the referee document. The original Standard can be
       obtained online at
       http://www.opengroup.org/unix/online.html .



POSIX                         2003                        tr(P)