SED - Using sed to modify files¶
Overview and Introduction¶
Sed is the Stream EDitor. It performs requested text transformations on a stream of input, as part of a shell pipeline. (It can also edit a file in-place.)
UNIX offers many text manipulation utilities, such as sed, tr, and awk. Sed can
be extremely useful if you can identify a string which you would like to transform.
If you have a dozen SLURM batch job scripts which contain a file path string, sed can be used to easily modify that path string to conform to a new reality.
The purpose of this document is to teach you how to do that specific task.
You can modify a whole mass of files in a directory tree by using other
UNIX utilities like find and xargs to create a list of files for sed to work
on.
Sed's primary limitation is that it works one line at a time, so you cannot easily change a string which wraps across multiple lines (unless you know that the string is split in a consistent fashion). It IS possible but requires a significant amount of regular expression knowledge -- see here
Change name of the day of the week
Here we see sed operating in a pipeline, reading material from its standard input (stdin), possibly modifying it if a match is found, and printing the result to its standard output (stdout). (If no match is found, sed simply prints what was passed to it.)
% date
Fri Dec 26 19:28:49 EST 2025
% date | sed -e 's/Fri/Friday/'
Friday Dec 26 19:29:32 EST 2025
Structure of a sed program string
-e 's/find_pattern/replace_pattern/optional_chars'
- "-e" means a program will follow
- 'programs need to be surrounded by single quote marks'
- "s" means substitution
- "/" is the "delimiter character" (this does not need to be a forward * slash)
- "find_pattern" is the regular expression (RE) to look for in each line
- "replace_pattern" is the replacement RE
- "optional_chars" - a variety of letters, which can be combined to indicate
actions to take or modifiers for the transformation process. Examples:
- 'i' means case-insensitive
- 'g' means change all matches found within a line
Sed can be given a large number of combinations of arguments and programs. The above program string is just one form, and we describe it because it is how you can use sed to modify file path strings.
Sed works with regular expressions (RE). RE's are formulas which follow certain rules. Some characters represent themselves, while others, such as square brackets [ ], ^ and $ have special meanings.
Sed uses basic regular expressions (BRE) unless you specify
the -E flag. Then sed works with extended, modern regular expressions.
(For details, see the GNU sed manual's RE
section.)
An excellent resource for Regular Expressions is
https://regex101.com, which includes a test
interpreter. You can provide sample text and a regular expression, and it
will show you whether it finds any matches.
Change multiple things with one command (click to expand sections like this)
In the first example, we see sed making two transformations.
The programs are implemented in the order specified, so beware of stepping on your own toes, where the first program changes strings such that second or third programs then fail to do what you expect.
% date
Fri Dec 26 19:28:49 EST 2025
% date | sed -e 's/2025/2032/' -e 's/Fri/Freakday/'
Freakday Dec 26 19:37:23 EST 2032
Change multiple occurences per line, case-insensitive
Each program is executed once, so only the first occurence of a match in a line is replaced, unless you add a 'g' after the third delimiter character.
% echo "Bob susan franky-frank bobby" | sed -e 's/bob/paper/gi'
paper susan franky-frank paperby
Chg a pattern inline in a file while making a backup with suffix '.bak'
% sed -i.bak 's/LogDenied/LogApproved/' filename
filename.bak which contains the original
contents, and filename, which contains either the original contents or
altered contents if any occurrences of "LogDenied" were found. (If they
were found, only the first match on each line was replaced because there is
no trailing 'g'.)
Note: there are no spaces between the "-i" and the ".bak" suffix name.
Modifying SLURM batch job scripts with sed¶
In the following sed examples, the original file is named sample.sh and contains:
#!/bin/bash
# SBATCH --mem=40GB
# SBATCH --partition=sas
cd /cms01/data/55548/deeper/stilldeeper/further
for i in sample1 sample3 sample7
do
/users/55548/c-jxu123-55548/bin/myscript ${i}.sas7bdat
done
This C-SUB script contains three things which need to be modified to work in JADE:
- In the C-SUB, Controlled Unclassified Information (CUI) was stored in
/cms01/data/<dua>/
In JADE, CMS CUI is stored in/data/cms/<cdua>/cui/ - In the C-SUB, home directories had the form
/users/<dua>/<username>/
In JADE, they have the form/users/<comm>/<cdua>/<username>/ - In the C-SUB, the SAS program only ran if it were launched in a "sas" job * partition.
In JADE, each community's partitions' names begin with their community letter and a dash, so the CMS SAS partition is namedc-sas.
Replace /cms01/data/55548/ with /data/cms/c55548/cui/
Those strings contain forward slashes, so we change our sed program delimiter character to “(“.
sed -i.bak 's(/cms01/data/55548/(/data/cms/c55548/cui/(g' sample.sh
sample.sh.bak
The modified resulting file (named sample.sh) contains:
#!/bin/bash
# SBATCH --mem=40GB
# SBATCH --partition=sas
cd /data/cms/c55548/cui/deeper/stilldeeper/further
for i in sample1 sample3 sample7
do
/users/55548/c-jxu123-55548/bin/myscript ${i}.sas7bdat
done
Correct the home directory path
We can correct the home directory path in two ways -- give its exact full path, or hide the leading path elements using the ~ character.
In UNIX, ~ is replaced by the shell with the path to someone's home
directory. If used for yourself, you can say either ~ or include your
username, e.g. ~c-jxu123-55548. If used for someone else's home directory,
then you always need to include their username, e.g. ~that-bob-guy
Use the full path
sed -i.bak2 's(/users/55548/c-jxu123-55548(/users/cms/c55548/c-jxu123-55548(g' sample.sh
sample.sh would contain /users/cms/c55548/c-jxu123-55548/bin/myscript ${i}.sas7bdat
Use a tilde
sed -i.bak2 's(/users/55548/c-jxu123-55548(~(g' sample.sh
sample.sh would contain ~/bin/myscript ${i}.sas7bdat
Correct the partition name to be c-sas
sed -i.bak3's(--partition=sas(--partition=c-sas(' sample.sh
If you know that you've already made a backup of the original file, you can
drop the file suffix to -i from second and later sed commands.
Note!! Using the -i argument will always result in updating files
modification times, whether the file contained a matching pattern or not.
So even files which whose content was not actually modified wind up looking
changed according to an ls -l command.
If you applied the three sed commands mentioned above, you would wind up with:
The original, unmodified file is found at sample.sh.bak while modified intermediate
files are named sample.sh.bak2 and sample.sh.bak3
The final sample.sh contains:
#!/bin/bash
# SBATCH --mem=40GB
# SBATCH --partition=c-sas
cd /data/cms/c55548/cui/deeper/stilldeeper/further
for i in sample1 sample3 sample7
do
~/bin/myscript ${i}.sas7bdat
done
Modifying many files at once¶
Using a list of files found in a text file¶
You can make a list of file pathnames and process them with sed
using a for loop. This allows you to be certain about the files
you might change.
How to do that
Create a plain text file with one file on each line. The path to that file
needs to either be absolute or, if relative, be correct relative to the
directory in which you execute your for loop command.
./myproj/sample15/dog.R
/data/cms/c34345/joe/sample97/deer.R
./myproj/sample15/mouse.sh
sed:
% for i in `cat myinput.txt` ; do echo $i ; sed -i.orig 's/pat1/pat2/gi' ${i};done
You can also use other commands to create a list of files in a file, then edit that list to include only the files you want to modify. Or simply verify for yourself that you're listing only the right type of files.
How to do that
ls -1 */script* journey/*.scr > myinput.txt
nano myinput.txt
% for i in `cat myinput.txt` ; do echo $i ; sed -i.orig 's/pat1/pat2/gi' ${i};done
Using the find command¶
The find command will print the names of files which match the criteria you
specify. Find's first argument is a location in the file system. By default find
will start a recursive search starting at that point and proceeding downwards. That
location can be "." if you want to start in the current working directory.
The following command will change a string, if found, in as many files which exist that match the search criteria.
find ~/project1 -type f -name “*\.html” -print0 | \
xargs -0 sed -i.orig 's/badstring/mygoodstring/g'
That long command is broken down into two lines to improve readability using an escape character.
Escaping special characters
The bash shell you use gives certain characters special meanings. If you want
to prevent them from being interpreted in the normal fashion, usually you
preceed that character with a \. This is called "escaping" a
character.
The Enter key is one of those special characters. If you escape that key, then bash will allow you to continue a command onto the next line of the terminal window, where you indicate to bash that it should execute the command now with a final Enter.
You also need to escape characters inside of regular expressions, which has it's own list of special characters, include . The string "*.html" includes a backslash to escape the period, which in REs stands for "any single character", not a period!
Breaking it down, left to right:
- start looking at the directory named "project1" inside of your home directory
- only consider files (not directories)
- only consider files ending in ".html"
- print names of matching files, in a way that works better with names containing spaces
- pass those names to the
xargsprogram - with the "-0" flag, which works with the
findprogram's "-print0" flag - and execute the sed program.
- Because the sed flag "-i.orig" is given, changes will be made to the named files where they are located. Backups will be made of each file processed.
Find all bash scripts matching certain criteria, and fix a path
find ~/project1 -type f -iname "run_test*" -print0 | xargs -0 file | \
grep -i Bourne-Again | sed -e 's(: .*((' | \
xargs sed -i 's(/cms01/incoming(/transfer/in/cms(g'
This example finds some files using a case-insensitive name, runs the file
command on them, searches for ones that find reports to be Bourne-Again,
uses sed to trim off everything file printed after the file name, then
has xargs execute the final sed
Find command examples¶
You might find combinations of these useful when trying to identify the files you want to modify with sed, or control which files are considered.
Print out the names of bash scripts in your home directory
Note that this command does NOT correctly deal with filenames containing spaces!
You should use instead the example above where "find -print0" feeds into "xargs
-0"
find ~ -type f -exec file {} \; | grep -i Bourne-Again | sed -e 's(: .*(('
Print out the sizes of SLURM job output files owned by user c-jxu123-55548:
find /users/cms/c55548/shared -type f -name "*.out" -exec du -sh {} \;
Print the names of files modified after noon of 9/11/2025
% touch -t 202509111200 911-noon
% find . -newer 911-noon -print
Print the names of files modified in February of this year:
find . -newermt "2025-02-01" -a ! -newermt "2025-02-28" -print
Print the names of files created in February of this year:
find . -newerct "2025-02-01" -a \! -newerct "2025-02-28" -print
Print the names of executable files modified in the last 24 hours:
find . -mtime 0 -executable
Print out the names of files containing either "School" or "semester":
find . -name "*School*" -o -name "*semester*" -print
Print out the names of files containing both "School" and "semester":
find . -name "*School*" -a -name "*semester*" -print
Print out the names of files containing either "School" or "semester" but NOT susyq:
find . \( -name "*School*" -o -name "*semester*" \) -a \! -name 'susyq' -print
Print the names of files except those found in directories named "bak":
find . \( -type d -name bak -prune \) -o -type f -print
Print out the names of every directory in your home directory down to the third level:
find ~ -type d -maxdepth 3
Print out the names of every immediate subdirectory of your home directory:
find ~ -type d -maxdepth 1
Print the names of files named doghouse using a case-insensitive form of -name:
find /some_dir/susy -iname doghouse -print
Useful sed commands¶
For grins, in case any of them strike your interest.
Delete completely blank lines
sed '/^$/d'
Delete commented-out lines (only where # is the first character of a line)
sed '/^#/d'
Delete commented-out lines (where # is the first non-whitespace character of a line)
sed '/^[[:space:]]*#/d'
Delete white space (spaces and tabs)
sed '/^[[:space:]]*$/d'
Convert tabs into 6 spaces
sed '/\t/ /'
Delete the first seven line
sed '7d'
Delete specified lines (5 thru 10, and 12)
sed -i -e '5,10d;12d' filename