String Splitting and Trimming


Elixir for Programmers, Second Edition

I was confused by the String.split function in yesterday’s exercises. The documentation says that the trim: true option removes empty strings ("") from the resulting list.

The important word here is “empty”, not “blank” or “whitespace”. My Python brain was still reading trim (or strip) as an option that strips whitespace from the start and end of the source string.

For example:

iex(1)> "  bobby  jones  " |> String.split("", trim: true)
[" ", " ", "b", "o", "b", "b", "y", " ", " ", "j", "o", "n", "e", "s", " ", " "]
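To see what trim: true actually does, and how to get the whitespace stripping my Python brain expected, here is a small script. It is my own sketch, not from the course. It only uses String.split/1, String.split/3, String.trim/1 and String.graphemes/1 from the standard library; the variable names are just for illustration.

```elixir
input = "  bobby  jones  "

# Splitting on "" with trim: true drops the empty strings (here only the
# two at the edges); the space graphemes are still in the list.
graphemes_with_spaces = String.split(input, "", trim: true)

# Splitting on a single space: adjacent spaces produce "" entries...
with_empties = String.split(input, " ")
# ...and trim: true removes every "" from the result.
words = String.split(input, " ", trim: true)

# String.split/1 splits on runs of Unicode whitespace and ignores leading
# and trailing whitespace, similar to Python's str.split() with no argument.
words_by_whitespace = String.split(input)

# For strip-then-split-into-characters behavior, trim the string first,
# then split it into graphemes.
stripped_graphemes = input |> String.trim() |> String.graphemes()

IO.inspect(graphemes_with_spaces)
IO.inspect(with_empties)          # ["", "", "bobby", "", "jones", "", ""]
IO.inspect(words)                 # ["bobby", "jones"]
IO.inspect(words_by_whitespace)   # ["bobby", "jones"]
IO.inspect(stripped_graphemes)    # ["b", "o", "b", "b", "y", " ", " ", "j", "o", "n", "e", "s"]
```

The split on " " is where trim: true earns its keep: the empty strings come from separators that sit next to each other or at the ends of the string, not from the whitespace itself.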

What I found more confusing is the behavior when splitting a string into graphemes without the trim: true option:

iex(2)> "  bobby  jones  " |> String.split("")
["", " ", " ", "b", "o", "b", "b", "y", " ", " ", "j", "o", "n", "e", "s", " ",
 " ", ""]

iex(3)> "" |> String.split("")
["", ""]

Splitting an empty string into graphemes returns two empty strings.
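A shorter input makes the pattern easier to see. This is my own illustration: it compares splitting on "" with and without trim: true, and against String.graphemes/1, which is what I would reach for when I only want the graphemes.

```elixir
# Without trim, splitting on "" adds an extra "" at each end.
with_edges = String.split("abc", "")
empty_edges = String.split("", "")

# trim: true drops those empty strings.
trimmed = String.split("abc", "", trim: true)
trimmed_empty = String.split("", "", trim: true)

# String.graphemes/1 never produces the edge "" entries.
graphemes = String.graphemes("abc")
graphemes_empty = String.graphemes("")

IO.inspect(with_edges)       # ["", "a", "b", "c", ""]
IO.inspect(empty_edges)      # ["", ""]
IO.inspect(trimmed)          # ["a", "b", "c"]
IO.inspect(trimmed_empty)    # []
IO.inspect(graphemes)        # ["a", "b", "c"]
IO.inspect(graphemes_empty)  # []
```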

I found a discussion on the Elixir Forum where @sergio asked this question and @whatyouhide from the Elixir core team answered:

“You can think of it as there’s an empty string between every pair of graphemes in a string. There’s also an empty string between the start of the string and the first grapheme, sort of. It also helps with having some properties that hold true for String.split/2.”
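The answer doesn't say which properties it has in mind. One that I believe the edge strings preserve is the part count: if "" counts as occurring at each of the n + 1 boundaries of an n-grapheme string (start, between each pair, end), then, as with any ordinary separator, the number of parts is occurrences + 1, which is n + 2. This is my interpretation, not something stated in the forum thread, and part_count_matches is a hypothetical helper for the check.

```elixir
# Hypothetical check (my interpretation, not from the forum answer):
# n graphemes -> n + 1 "occurrences" of "" -> n + 2 parts.
part_count_matches = fn string ->
  length(String.split(string, "")) == String.length(string) + 2
end

# The same "occurrences + 1" rule for an ordinary separator:
# "  bobby  jones  " contains 6 spaces and splits into 7 parts.
space_parts = String.split("  bobby  jones  ", " ")

for s <- ["", "a", "bobby", "  bobby  jones  "] do
  IO.inspect({s, part_count_matches.(s)})
end

IO.inspect(length(space_parts))  # 7
```

Under this reading, ["", ""] for the empty string is just the one-boundary case of the same rule, rather than a special case.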

All notes and comments are my own opinion. Follow me at @rgacote@genserver.social