SAS Remove Special Characters from String

How to check for, identify and remove non-alphanumeric characters when working with strings in Base SAS code.

About

The SAS code below is one way to remove non-alphanumeric characters from a string. It uses the prxchange() function to remove any characters not specified in the Perl regular expression argument.

Code

Below is the sample SAS code showing you how to use the prxchange() and prxmatch() functions in a data step.

Output

Explanation

At line 18 we use prxmatch() to check whether our string contains any special characters according to the regular expression passed as argument one, /[^A-Z 0-9]/i. The function returns the position of the first match, or 0 if there is none, so SAS treats the result as true or false in a condition. Based on this result we update the column special_char_found.
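As a rough illustration of the prxmatch() check described above, here is a minimal sketch. It is not the original listing, so its line numbers will not match the ones referenced in this post. The dataset names work.sample and work.flagged, the sample values, and the 'Y'/'N' flag values are assumptions made for the example.

```sas
/* Hypothetical sample data; the values are illustrative only */
data work.sample;
    length string $40;
    infile datalines truncover;
    input string $char40.;
    datalines;
Hello World 123
Hello, World!
price_is_#42
;
run;

data work.flagged;
    set work.sample;
    length special_char_found $1;
    /* prxmatch() returns the position of the first character that is   */
    /* NOT a letter, digit or space, or 0 when no such character exists */
    if prxmatch('/[^A-Z 0-9]/i', string) > 0 then special_char_found = 'Y';
    else special_char_found = 'N';
run;
```

Because the space is part of the character class, the trailing blanks SAS pads onto a fixed-length character variable do not count as special characters.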

Then, if the prxmatch() condition is true, we use prxchange() twice: first to extract all special characters (populating special_chars_list), and second to create a new string without any special characters (clean_string). Note that the difference between the two is the use of the ^ symbol in the regular expression: with ^ the character class matches everything except the listed characters, and without it the class matches the listed characters themselves.
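The two prxchange() calls can be sketched as follows. This is an illustrative reconstruction rather than the original code; the dataset name work.cleaned and the sample string are assumptions, while special_chars_list and clean_string follow the column names used in the explanation.

```sas
data work.cleaned;
    length string special_chars_list clean_string $40;
    /* Hypothetical input value */
    string = 'Hello, World! #42';

    if prxmatch('/[^A-Z 0-9]/i', string) then do;
        /* No ^ in the class: delete letters, digits and spaces, */
        /* which leaves only the special characters              */
        special_chars_list = prxchange('s/[A-Z 0-9]//i', -1, string);

        /* ^ in the class: delete everything that is NOT a letter, */
        /* digit or space, which leaves the cleaned string         */
        clean_string = prxchange('s/[^A-Z 0-9]//i', -1, string);
    end;
run;
```

The second argument of -1 tells prxchange() to keep replacing until the end of the string instead of stopping after the first match. For the sample value above, special_chars_list should hold `,!#` and clean_string should hold `Hello World 42`.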

Additionally, if you want to add a character to keep, for example an underscore, the code from line 30 onwards shows you how. Likewise, if you want to remove spaces from the string and keep only underscores, you could update the regular expression to look like this: /[^A-Z0-9_]/i.
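A small sketch of both variations, under the same caveat that it is not the original listing. The dataset name work.keep_underscore, the variable names clean_keep_both and clean_no_spaces, and the sample string are hypothetical.

```sas
data work.keep_underscore;
    length string clean_keep_both clean_no_spaces $40;
    /* Hypothetical input value */
    string = 'first_name: Ann Lee!';

    /* Underscore added to the class: letters, digits, spaces and _ are kept */
    clean_keep_both = prxchange('s/[^A-Z 0-9_]//i', -1, string);

    /* Space dropped from the class: spaces are removed, _ is still kept */
    clean_no_spaces = prxchange('s/[^A-Z0-9_]//i', -1, string);
run;
```

With the sample value, clean_keep_both should be `first_name Ann Lee` and clean_no_spaces should be `first_nameAnnLee`. Any other character you want to preserve can be added inside the square brackets in the same way.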

Links

  • SAS prxchange() documentation link.
  • SAS prxmatch() documentation link.
  • SAS home page link.
