RTF stripper
Have you ever needed to convert an RTF file to plain text?
One option (and the easiest one) is to use the RichTextBox control: set its Rtf property and read its Text property.
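As a minimal sketch of that control-based route, assuming a project that can reference System.Windows.Forms (the class and method names here are hypothetical, chosen only for illustration):

```csharp
using System.Windows.Forms;

public static class RichTextBoxStripper // hypothetical helper name
{
    // Lets the WinForms control parse the RTF and returns the plain text it holds.
    public static string ToPlainText(string rtf)
    {
        // The control is never shown; it is only used as a parser and disposed right away.
        using (var box = new RichTextBox())
        {
            box.Rtf = rtf;   // the control rejects input it cannot parse as RTF
            return box.Text; // plain text without control words or groups
        }
    }
}
```

Calling RichTextBoxStripper.ToPlainText(rtf) is all there is to it, which is why this is the easy option whenever a Windows Forms reference is acceptable.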
If, however, you're not in a UI project, it smells to add a reference to a controls assembly just for some simple parsing or, to be more precise, some simple stripping.
Since one of my colleagues needed such functionality, we ended up converting an old C function into the following.
// Requires: using System; using System.Text;
public static String Strip(String rtf)
{
    if (rtf == null || rtf.Length < 4) return "";
    int length = rtf.Length;

    // The text body starts at the first \pard; everything before it is header data.
    int start = rtf.IndexOf(@"\pard", StringComparison.Ordinal);
    if (start < 1) return "";

    StringBuilder text = new StringBuilder();
    bool slash = false;         // a backslash was seen and no space has followed it yet
    bool figure_opened = false; // an opening brace was seen and no space has followed it yet
    bool figure_closed = false; // a closing brace was seen and no space has followed it yet
    bool first_space = false;   // the next space only ends a control word or brace; later spaces are plain text

    for (int j = start; j < length; j++)
    {
        char ch = rtf[j];
        if (ch == '\\') // we are looking at a backslash
        {
            first_space = true;
            slash = true;
        }
        if (ch == '{')
        {
            first_space = true;
            figure_opened = true;
        }
        if (ch == '}')
        {
            first_space = true;
            figure_closed = true;
        }

        // A space ends the pending control word or brace, except directly after \datafield.
        if (ch == ' ' && !StartsAt(rtf, j - 10, @"\datafield"))
        {
            slash = false;
            figure_opened = false;
            figure_closed = false;
        }

        // The escapes \{, \} and \\ stand for the literal characters '{', '}' and '\'.
        if (ch == '\\' && j + 1 < length
            && (rtf[j + 1] == '{' || rtf[j + 1] == '}' || rtf[j + 1] == '\\'))
        {
            slash = false;
            figure_opened = false;
            figure_closed = false;
            first_space = false;
            text.Append(rtf[j + 1]);
            j++;
            continue;
        }

        // "\par " starts the next line of text (the trailing space rules out "\pard").
        if (StartsAt(rtf, j, "\\par "))
        {
            slash = false;
            figure_opened = false;
            figure_closed = false;
            first_space = false;
            text.Append('\n');
            j += 4;
            continue;
        }

        // Skip the quoted target of a HYPERLINK field.
        if (StartsAt(rtf, j, "HYPERLINK"))
        {
            int open = rtf.IndexOf('"', j);
            int close = open < 0 ? -1 : rtf.IndexOf('"', open + 1);
            if (close < 0) break; // unterminated target: stop instead of reading past the end
            j = close + 1;
            continue;
        }

        // Plain text: no control word or brace is pending. Raw line feeds belong to the
        // RTF source layout, not to the text, and the character before HYPERLINK is dropped.
        if (!slash && !figure_opened && !figure_closed && ch != '\n'
            && !StartsAt(rtf, j + 1, "HYPERLINK"))
        {
            if (!first_space)
            {
                text.Append(ch);
            }
            else
            {
                first_space = false;
            }
        }
    }
    return text.ToString();
}

// True if token occurs in s exactly at index (ordinal comparison, bounds-checked).
private static bool StartsAt(String s, int index, String token)
{
    return index >= 0 && index + token.Length <= s.Length
        && String.CompareOrdinal(s, index, token, 0, token.Length) == 0;
}
Hope it saves somebody half an hour.
Cheers, Stefan