Monthly Archives: December 2017

#Excel: Extracting an Array of Words From a Sentence

 

The following formula will create an array of words from a string (sentence).

= ArrayFromSDS(TEXTJOIN(“”,,IF(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)=” “,” “,IFERROR(CHAR(64+MATCH(MID(SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“),ROW(INDIRECT(“1:”&LEN(A1))),1),CHAR(64+ROW(INDIRECT(“$1:$26″))),0)),””))))

Here is an explanation of how it works.

The array used in the 3rd argument of the TEXTJOIN function starts with the 1st part of the IF formula, shown below.

IF(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)=” “,” “,

which keeps any space from the string in cell A1. The rest of the IF formula

IFERROR(CHAR(64+MATCH(MID(SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“),ROW(INDIRECT(“1:”&LEN(A1))),1),CHAR(64+ROW(INDIRECT(“$1:$26″))),0)),””)

returns only letters from the string in A1. The 1st argument of the MATCH function in this construction,

MID(SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“), ROW(INDIRECT(“1:”&LEN(A1))),1)

is very similar to the 1st part of the IF function, but it has one important difference.

Instead of using the string from cell A1, the formula SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“) is used instead. The reason for doing this is that the MATCH function recognizes the * and ? symbols as wildcard searches. So, if the 2nd argument of the MATCH function does NOT contain a * or ?, the character “A” will be returned instead (if not removed from the core string).

The 2nd argument of the MATCH function is

CHAR(64+ROW(INDIRECT(“$1:$26”)))

which returns an array of letters from A TO Z.

The result of the MATCH function is an array with numbers from 1-26 for positions in the string with letters and “” if not. For example, if the string in A1 is “ AAx,d a.”x~y*z”. c?e! ”, then the MATCH array will return

{1;1;24;#N/A;4;#N/A;1;#N/A;#N/A;24;#N/A;25;#N/A;26;#N/A;#N/A;#N/A;3;#N/A;5;#N/A}

This array is particularly useful in this specific case, since the numbers can be converted to the letters in the string by using the CHAR function (along with the IFERROR function to turn errors to an empty string). That converts the array to

{“A”;”A”;”X”;””;”D”;” “;”A”;””;””;”X”;””;”Y”;””;”Z”;””;””;” “;”C”;””;”E”;””}

Now, this array can be used as the main argument in the TEXTJOIN function to afford the string

AAXD AXYZ CE

Now, using the ArrayFromSDS user-defined function (shown below)

Function ArrayFromSDS(MyString As String)

ArrayFromSDS = Split(MyString, ” “)

End Function

produces this array of words.

{“AAXD”,”AXYZ”,”CE”}

HTH!

 

Advertisements