Solving an XML Entity Deserialization Issue

December 31, 2016

I’ve recently released a new version of MyAnimeListSharp and I’d like to talk about a challenge I faced while implementing it.

The MAL (MyAnimeList.net) API returns search responses in XML rather than JSON. To make library users’ lives easier, I decided to deserialize the XML response into an object (either AnimeSearchResponse or MangaSearchResponse) for easier processing. Then, alas, I ran into a problem: the XML could not be deserialized into an object because the response contains entities that XML does not declare, such as — (&mdash;). Only a handful of entities, such as &lt; and &gt;, are predefined in XML; the HTML ones are not.
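To make the failure concrete, here is a minimal sketch that reproduces it. The Entry class and the XML fragment are made up for illustration; they are not the library's real types or an actual MAL response. The exact exception message can differ between .NET versions, so none is shown.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for one search result; not the library's real type.
[XmlRoot("entry")]
public class Entry
{
    [XmlElement("title")]
    public string Title { get; set; }

    [XmlElement("synopsis")]
    public string Synopsis { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Made-up fragment: &mdash; is an HTML entity that XML itself does not declare.
        const string xml =
            "<entry><title>Example</title><synopsis>A &mdash; B</synopsis></entry>";

        var serializer = new XmlSerializer(typeof(Entry));
        try
        {
            using (var reader = new StringReader(xml))
            {
                serializer.Deserialize(reader);
            }
        }
        catch (InvalidOperationException ex)
        {
            // XmlSerializer reports parsing problems as InvalidOperationException;
            // the underlying XML error, if any, is in InnerException.
            Console.WriteLine(ex.InnerException?.Message ?? ex.Message);
        }
    }
}
```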

Here is an edited sample response from the MAL API for an anime search (the “synopsis” section usually contains the undeclared XML entities).

 

Hacking begins…

 

Here is the run-down of SearchResponseDeserializer.Deserialize.

  1. Take the response string in XML format.
  2. Disable the undeclared entity check.
  3. Deserialize.

The part I had trouble figuring out was #2, disabling the undeclared entity check. Replacing every known entity with an empty string only goes so far, and that solution is not optimal since one never knows when the XML response will start returning other unknown XML entities.

I looked for an alternative in the .NET documentation. There were no properties to set or methods to call to disable the entity check. But I found a way in a StackOverflow answer (by Sam Harwell, who is a Microsoft MVP in .NET), which discusses how to use reflection to set an internal property to bypass the entity check.

 

XmlReader does not expose the DisableUndeclaredEntityCheck property publicly, so it needs to be turned on using reflection. The property is aptly named: you can guess what it does from the name.
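The three steps can be sketched roughly as follows. This is a minimal illustration of the reflection approach, not the actual SearchResponseDeserializer code: the LenientXml helper and the Entry class are hypothetical names, and the sketch assumes that the reader returned by XmlReader.Create carries the non-public DisableUndeclaredEntityCheck property. Since that property is an internal detail, it may be missing on some runtimes, so the sketch fails loudly when it cannot find it.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for a search result entry; not the library's real type.
[XmlRoot("entry")]
public class Entry
{
    [XmlElement("synopsis")]
    public string Synopsis { get; set; }
}

// Hypothetical helper name, used only for this sketch.
public static class LenientXml
{
    public static T Deserialize<T>(string xml)
    {
        // Step 1: wrap the XML response string in a reader.
        using (var text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text))
        {
            // Step 2: find the non-public property on the concrete reader type.
            PropertyInfo property = reader.GetType().GetProperty(
                "DisableUndeclaredEntityCheck",
                BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);

            if (property == null)
            {
                // Internal members are not a stable API; do not fail silently.
                throw new NotSupportedException(
                    "DisableUndeclaredEntityCheck was not found on " + reader.GetType().FullName);
            }

            property.SetValue(reader, true, null);

            // Step 3: deserialize through the adjusted reader.
            var serializer = new XmlSerializer(typeof(T));
            return (T)serializer.Deserialize(reader);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up fragment containing an entity that XML does not declare.
        const string xml = "<entry><synopsis>A &mdash; B</synopsis></entry>";

        Entry entry = LenientXml.Deserialize<Entry>(xml);
        Console.WriteLine(entry.Synopsis);
    }
}
```

Note that this only stops the reader from rejecting the unknown entity; it does not translate &mdash; into an actual dash, so the entity itself should not be expected to show up in the deserialized text.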

I’ve never hacked my code this badly, having to set an internal property in a .NET library. What I learned from this challenge is that knowing the internals of a framework can be useful in certain scenarios, even though messing around with internal details is not a good idea most of the time.