I have been meaning to note down my Unix checklist of commands (for OS X) that are very handy for basic operations on data. I will update this post as and when I remember or come across something that fits here. These *nix commands were tested specifically on Mac OS.
uniq - This is the Unix command for filtering repeated lines. It is primarily used to remove duplicates from a file, amongst other things. uniq only compares adjacent lines, so the file has to be sorted first for it to work.
Consider file test which contains the following
Count occurrences of each item
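A minimal sketch of the count, assuming made-up sample data because the original contents of test are not reproduced here (the fruit names and the workdir variable are my own placeholders). The awk step only trims the left padding that uniq -c adds, which varies between systems.

```shell
# Hypothetical sample data standing in for the post's "test" file.
workdir=$(mktemp -d)
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > "$workdir/test"

# sort puts identical lines next to each other so uniq -c can count each group.
counts=$(sort "$workdir/test" | uniq -c | awk '{print $1, $2}')
echo "$counts"
# 3 apple
# 2 banana
# 1 cherry
```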
Print only duplicate items in file
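Something along these lines should do it. The sample data and the workdir variable are assumptions of mine, not the original file.

```shell
# Hypothetical sample data standing in for the post's "test" file.
workdir=$(mktemp -d)
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > "$workdir/test"

# -d prints one copy of each line that appears more than once.
dups=$(sort "$workdir/test" | uniq -d)
echo "$dups"
# apple
# banana
```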
Print only unique lines
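A small sketch, again on made-up data (the file contents and the workdir variable are placeholders I picked for illustration).

```shell
# Hypothetical sample data standing in for the post's "test" file.
workdir=$(mktemp -d)
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > "$workdir/test"

# -u prints only the lines that occur exactly once.
singles=$(sort "$workdir/test" | uniq -u)
echo "$singles"
# cherry
```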
Consider test now contains
Remove duplicates case-insensitively.
This file is not sorted, though, so it has to be sorted before running uniq. The -i flag makes the comparison case-insensitive.
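Here is a rough sketch of the sort-then-uniq step. The mixed-case sample data is my own assumption, and I am using sort -f so that lines differing only in case end up adjacent. Which spelling of each duplicate survives depends on the sort order of your locale.

```shell
# Hypothetical unsorted, mixed-case data standing in for the post's "test" file.
workdir=$(mktemp -d)
printf 'Apple\nbanana\napple\nBANANA\ncherry\n' > "$workdir/test"

# sort -f folds case while sorting; uniq -i then treats Apple/apple as the same line.
deduped=$(sort -f "$workdir/test" | uniq -i)
echo "$deduped"
```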
Convert all upper case in fileA to lower case and output as fileB
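One way to do this is with tr, sketched below. The contents of fileA and the workdir variable are invented for the example.

```shell
# Hypothetical contents for fileA.
workdir=$(mktemp -d)
printf 'Hello WORLD\nUNIX Rocks\n' > "$workdir/fileA"

# tr maps every upper-case character to its lower-case counterpart.
tr '[:upper:]' '[:lower:]' < "$workdir/fileA" > "$workdir/fileB"
cat "$workdir/fileB"
# hello world
# unix rocks
```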
Compare two files and keep strings present in fileA but not in fileB
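A sketch using comm, which I am assuming is the tool intended here. It expects both inputs to be sorted, and the file contents below are made up.

```shell
# Hypothetical, already sorted inputs.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/fileA"
printf 'banana\ncherry\ndate\n' > "$workdir/fileB"

# -2 hides lines unique to fileB, -3 hides lines common to both.
only_a=$(comm -23 "$workdir/fileA" "$workdir/fileB")
echo "$only_a"
# apple
```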
Compare two files and keep strings present in fileB but not in fileA
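The mirror image, sketched with comm on the same kind of made-up, sorted inputs (file contents are my assumption).

```shell
# Hypothetical, already sorted inputs.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/fileA"
printf 'banana\ncherry\ndate\n' > "$workdir/fileB"

# -1 hides lines unique to fileA, -3 hides lines common to both.
only_b=$(comm -13 "$workdir/fileA" "$workdir/fileB")
echo "$only_b"
# date
```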
Compare two files and keep only strings which are present in both files
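For the intersection, a comm sketch might look like this. The inputs are invented and must be sorted for comm to behave.

```shell
# Hypothetical, already sorted inputs.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/fileA"
printf 'banana\ncherry\ndate\n' > "$workdir/fileB"

# -1 and -2 hide the lines unique to each file, leaving only the shared ones.
both=$(comm -12 "$workdir/fileA" "$workdir/fileB")
echo "$both"
# banana
# cherry
```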
The primary purpose of sed is string or pattern replacement.
Consider the following file as input
1. Replacing or substituting a string
By default, the sed command replaces only the first occurrence of the pattern in each line; it won't replace the second, third, ... occurrence in the line.
Here the "s" specifies the substitution operation. The "/" are delimiters. The "unix" is the search pattern and the "linux" is the replacement string.
If you miss a delimiter, the expression errors out.
2. Replacing the nth occurrence of a pattern in a line
Use the /1, /2 etc. flags to replace the first, second, ... occurrence of a pattern in a line. The below command replaces the second occurrence of the word "unix" with "linux" in a line.
Here is the first occurrence, which is the default option
And the third occurrence
To replace all the occurrences use 'g' (global replacement)
To make the search case insensitive: sed on Mac does not have a flag for this, but you can use a plain regex to achieve it (for example a bracket expression such as [Uu]nix). For example, modify file.txt to the below
How to find a string in all the files contained in a directory. You could use grep or find.
To find/replace multiple strings use the -e flag.
To replace a string that begins with a pattern, use the regex for it along with sed
To remove whitespace characters at the end of the line
Unix command to know if your file has whitespace or tab characters
Unix command to remove BOM (Byte Order Mark) characters from your file
Open the file in vim in binary mode using the -b flag to verify whether it has a BOM, and then remove it
Use the -i flag to overwrite the existing file and create a backup of the original file. For example, to remove all white spaces in a file
This will create a backup file called file.txt.bak with the original file contents and overwrite file.txt with no spaces
To remove only the trailing spaces in a line use " *$" (a space followed by *$). The * character means "any number of the previous character" and $ refers to the end of the line.
Verify the trailing whitespaces are removed with :set list in vim
To replace a blank line with something else. You can match a blank line by specifying an end-of-line immediately after a beginning-of-line, i.e. with ^$
To remove tabs at the end of a line. Ex: Add a tab to the end of the first line, so :set list will show ^I
To create a tab in your sed command, use ctrl + v and then ctrl + i
Consider file test which contains the following
To extract the content after firstname
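The sed operations described above can be sketched roughly as follows. Everything about the input is my assumption: the contents of file.txt, the layout of the firstname line (a key, a space, then the value) and the variable names are placeholders, since the original samples are not reproduced here. I use the -i.bak spelling because it behaves the same with the sed that ships on Mac and with GNU sed.

```shell
# Hypothetical input standing in for the post's file.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'unix is great. unix is free. unix is fun.\nlearn Unix today   \n' > file.txt

# Default: only the first "unix" on each line is replaced.
first=$(sed 's/unix/linux/' file.txt)

# A numeric flag picks the nth occurrence on each line (here the second).
second=$(sed 's/unix/linux/2' file.txt)

# g replaces every occurrence on each line.
all=$(sed 's/unix/linux/g' file.txt)

# Case-insensitive match without a flag: spell out both cases in brackets.
anycase=$(sed 's/[Uu]nix/linux/g' file.txt)

# Several substitutions in one pass with -e.
multi=$(sed -e 's/unix/linux/g' -e 's/great/good/' file.txt)

# In-place edit with a backup: strip trailing spaces, keep the original
# contents in file.txt.bak.
sed -i.bak 's/ *$//' file.txt

# Assumed layout for the firstname case: key, one space, value.
printf 'firstname John\nlastname Doe\n' > test
name=$(sed -n 's/^firstname //p' test)
echo "$name"
# John
```

The -n plus p combination in the last command prints only the lines where the substitution actually happened, which is what turns a replace into an extract.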
Total occurrences of searchStr in current directory
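One possible way to get this number with grep, sketched on a throwaway directory (the file names and contents are invented). The -o flag prints each match on its own line, so counting lines counts matches rather than matching lines.

```shell
# Hypothetical directory with a few files to search.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'searchStr and searchStr again\nnothing here\n' > a.txt
printf 'one searchStr\n' > b.txt
printf 'no match\n' > c.txt

# -r recurses from the current directory, -o emits one line per match.
# tr strips the padding that wc adds on some systems.
total=$(grep -ro 'searchStr' . | wc -l | tr -d ' ')
echo "$total"
# 3
```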
Total number of files where searchStr occurs in current directory
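A sketch of the file count, again on an invented directory. Here -l lists each matching file once, however many matches it contains.

```shell
# Hypothetical directory with a few files to search.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'searchStr and searchStr again\nnothing here\n' > a.txt
printf 'one searchStr\n' > b.txt
printf 'no match\n' > c.txt

# -l prints only the names of files that contain at least one match.
files=$(grep -rl 'searchStr' . | wc -l | tr -d ' ')
echo "$files"
# 2
```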
To get an exact word match use the -w flag.
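For illustration, on a made-up word list (the file name words.txt and its contents are my own):

```shell
# Hypothetical word list.
workdir=$(mktemp -d)
printf 'cat\nconcatenate\ncat food\ncategory\n' > "$workdir/words.txt"

# -w only matches "cat" when it stands as a whole word.
exact=$(grep -w 'cat' "$workdir/words.txt")
echo "$exact"
# cat
# cat food
```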
Recursively replace the string original with replacement in all files under the OS X directory mydir (excludes hidden files and folders)
The regex excludes all hidden files and folders, which is particularly important if you want to avoid messing up your .DS_Store or .git files unknowingly.
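A sketch of how this could look with find and sed. The directory layout is invented, and the exact regex is my assumption of what such an exclusion looks like: it rejects any path that has a component starting with a dot. On Mac the in-place flag without a backup is written sed -i '', while GNU sed wants plain -i, so I use -i.bak (which both accept) and delete the backups afterwards.

```shell
# Hypothetical directory tree with visible and hidden files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p mydir/sub mydir/.git
printf 'original text\n' > mydir/a.txt
printf 'the original\n' > mydir/sub/b.txt
printf 'original\n' > mydir/.hidden
printf 'original\n' > mydir/.git/config

# Skip every path containing "/." (hidden files and anything inside
# hidden folders), then edit the remaining regular files in place.
find mydir -type f ! -regex '.*/\..*' -exec sed -i.bak 's/original/replacement/g' {} +

# Remove the backups that sed left next to the edited files.
find mydir -type f -name '*.bak' ! -regex '.*/\..*' -exec rm {} +

cat mydir/a.txt
# replacement text
```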
If you use zsh then the following would also work
This isn't excluding hidden files, though. The **/*(D.) glob is basically the zsh way of saying: recursively go through all subdirectories and match all regular files, dotfiles included.