#Excel: Extracting an Array of Words From a Sentence

 

The following formula will create an array of words from a string (sentence).

= ArrayFromSDS(TEXTJOIN(“”,,IF(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)=” “,” “,IFERROR(CHAR(64+MATCH(MID(SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“),ROW(INDIRECT(“1:”&LEN(A1))),1),CHAR(64+ROW(INDIRECT(“$1:$26″))),0)),””))))

Here is an explanation of how it works.

The array used in the 3rd argument of the TEXTJOIN function starts with the 1st part of the IF formula, shown below.

IF(MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1)=” “,” “,

which keeps any space from the string in cell A1. The rest of the IF formula

IFERROR(CHAR(64+MATCH(MID(SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“),ROW(INDIRECT(“1:”&LEN(A1))),1),CHAR(64+ROW(INDIRECT(“$1:$26″))),0)),””)

returns only letters from the string in A1. The 1st argument of the MATCH function in this construction,

MID(SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“), ROW(INDIRECT(“1:”&LEN(A1))),1)

is very similar to the 1st part of the IF function, but it has one important difference.

Instead of using the string from cell A1, the formula SUBSTITUTE(SUBSTITUTE(A1,”?”,”-“),”*”,”-“) is used instead. The reason for doing this is that the MATCH function recognizes the * and ? symbols as wildcard searches. So, if the 2nd argument of the MATCH function does NOT contain a * or ?, the character “A” will be returned instead (if not removed from the core string).

The 2nd argument of the MATCH function is

CHAR(64+ROW(INDIRECT(“$1:$26”)))

which returns an array of letters from A TO Z.

The result of the MATCH function is an array with numbers from 1-26 for positions in the string with letters and “” if not. For example, if the string in A1 is “ AAx,d a.”x~y*z”. c?e! ”, then the MATCH array will return

{1;1;24;#N/A;4;#N/A;1;#N/A;#N/A;24;#N/A;25;#N/A;26;#N/A;#N/A;#N/A;3;#N/A;5;#N/A}

This array is particularly useful in this specific case, since the numbers can be converted to the letters in the string by using the CHAR function (along with the IFERROR function to turn errors to an empty string). That converts the array to

{“A”;”A”;”X”;””;”D”;” “;”A”;””;””;”X”;””;”Y”;””;”Z”;””;””;” “;”C”;””;”E”;””}

Now, this array can be used as the main argument in the TEXTJOIN function to afford the string

AAXD AXYZ CE

Now, using the ArrayFromSDS user-defined function (shown below)

Function ArrayFromSDS(MyString As String)

ArrayFromSDS = Split(MyString, ” “)

End Function

produces this array of words.

{“AAXD”,”AXYZ”,”CE”}

HTH!

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s