Updated: 5/5/17
- Better handling of diacritics in sample
I’ve just discovered what a Slug is:
Some systems define a slug as the part of a URL that identifies a page in human-readable keywords.
It is usually the end part of the URL, which can be interpreted as the name of the resource, similar to the basename in a filename or the title of a page. The name is based on the use of the word slug in the news media to indicate a short name given to an article for internal use.
I needed to know this because I’m participating in the RealWorld example projects, writing a back end in ASP.NET Core.
The API spec kept saying slug, and I had a moment of “ohhh, that’s what that is.” Anyway, I needed to be able to generate one. Stack Overflow to the rescue: https://stackoverflow.com/questions/2920744/url-slugify-algorithm-in-c
Also, stripping accents from characters in a lot of languages isn’t straightforward, so I used one of the best-effort implementations from this related SO question: https://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
Now, here’s my Slug generator:
// https://stackoverflow.com/questions/2920744/url-slugify-algorithm-in-c
// https://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class Slug
{
    public static string GenerateSlug(this string phrase)
    {
        string str = phrase.RemoveDiacritics().ToLower();
        // invalid chars
        str = Regex.Replace(str, @"[^a-z0-9\s-]", "");
        // convert multiple spaces into one space
        str = Regex.Replace(str, @"\s+", " ").Trim();
        // cut and trim
        str = str.Substring(0, str.Length <= 45 ? str.Length : 45).Trim();
        str = Regex.Replace(str, @"\s", "-"); // hyphens
        return str;
    }

    public static string RemoveDiacritics(this string text)
    {
        // decompose accented characters, then drop the combining marks
        var s = new string(text.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray());
        return s.Normalize(NormalizationForm.FormC);
    }
}
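To see why RemoveDiacritics works, and where it stops working, here is a small stand-alone sketch. The class name DiacriticsDemo and the helper StripMarks are made up for illustration; the helper uses the same normalize-and-filter technique as above. FormD splits an accented letter into a base letter plus a combining mark, and the filter drops the marks. Letters with no canonical decomposition, such as 'Ł', pass through unchanged, so this is still only a best effort.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper for illustration: same technique as RemoveDiacritics.
    public static string StripMarks(string text)
    {
        // FormD turns "è" into 'e' followed by a combining grave accent.
        string decomposed = text.Normalize(NormalizationForm.FormD);

        // Keep everything except the combining (non-spacing) marks.
        var kept = decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

        // Recompose whatever is left.
        return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        string input = "Crème Brûlée";

        // The decomposed form has one extra char per accented letter.
        Console.WriteLine(input.Length);                                     // 12
        Console.WriteLine(input.Normalize(NormalizationForm.FormD).Length);  // 15

        Console.WriteLine(StripMarks(input));   // Creme Brulee

        // 'Ł' has no canonical decomposition, so it survives the filter.
        Console.WriteLine(StripMarks("Łódź"));  // Łodz
    }
}
```

In GenerateSlug a leftover character like 'ł' is then removed by the invalid-chars regex rather than mapped to 'l', which is the kind of gap the linked SO question discusses.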
I think the following code accomplishes the same result and is simpler and somewhat faster (and does not require an extra package for code pages):
public static class Slug
{
    private const int MAX_STRING_LENGTH = 45;
    private static readonly Regex InvalidCharsRegex = new Regex(@"[^a-z0-9\s-]", RegexOptions.Compiled);
    private static readonly Regex MultipleWhitespacesRegex = new Regex(@"\s+", RegexOptions.Compiled);
}
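The snippet above only shows the field declarations. As a rough sketch of how precompiled regexes like these might be used, assuming a GenerateSlug method shaped like the one in the post: the class name SlugSketch, the constant name MaxLength, and the whole method body are assumptions for illustration, not the commenter's actual code, and this version does not handle diacritics at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    private const int MaxLength = 45;

    // Built once and reused on every call instead of being re-parsed each time.
    private static readonly Regex InvalidCharsRegex =
        new Regex(@"[^a-z0-9\s-]", RegexOptions.Compiled);
    private static readonly Regex MultipleWhitespacesRegex =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Assumed method body, mirroring the steps of the post's GenerateSlug.
    public static string GenerateSlug(string phrase)
    {
        string str = phrase.ToLowerInvariant();

        // Drop everything that is not a-z, 0-9, whitespace or a hyphen.
        str = InvalidCharsRegex.Replace(str, "");

        // Collapse each run of whitespace into a single space.
        str = MultipleWhitespacesRegex.Replace(str, " ").Trim();

        // Cut to the maximum length and trim again.
        if (str.Length > MaxLength)
        {
            str = str.Substring(0, MaxLength).Trim();
        }

        // Only single spaces remain at this point, so a plain replace is enough.
        return str.Replace(' ', '-');
    }

    public static void Main()
    {
        Console.WriteLine(GenerateSlug("  Hello,   World! What's a Slug?  "));
        // hello-world-whats-a-slug
    }
}
```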
Actually, I think we’re both wrong. Neither of our solutions will properly get rid of accents.
This linked SO post has more about it: https://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
I should redo my solution.
Added a slightly better implementation.