Usage

As with the XmlSerializer, the target Type must expose a public parameterless constructor.

Usage is then a two-stage process. First, create a CsvSerializer instance for the target Type. Then call either the Serialize or the Deserialize method with a valid stream object.

The Serialize method also requires the list of items to be serialized:

IList<Person> data = GetPeople();
using (var stream = new FileStream("persons.csv", FileMode.Create, FileAccess.Write))
{    
    var cs = new CsvSerializer<Person>();
    cs.Serialize(stream, data);
}

The Deserialize method returns a list of objects of the target Type:

IList<Person> data = null;
using (var stream = new FileStream("persons.csv", FileMode.Open, FileAccess.Read))
{
    var cs = new CsvSerializer<Person>();
    data = cs.Deserialize(stream);
}

Syntactical & Functional Options

  1. Separator: The CSV Separator character (default: comma).
  2. Replacement: Replacement string if the Separator character appears in a field value (default: ((char)0x255).ToString()).
  3. NewLineReplacement: Replacement string if a NewLine appears in a field value (default: ((char)0x254).ToString()).
  4. UseLineNumbers: An additional column can be inserted into the CSV for Row Number (default: true).
  5. UseEofLiteral: An EOF Literal can be used to mark the end-of-file (default: false).
  6. IgnoreEmptyLines: If not true then exceptions will be thrown on encountering blank lines during deserialization (default: true).
  7. IgnoreReferenceTypesExceptString: Exclude reference type properties other than string (default: true).
  8. RowNumberColumnTitle: Title of the Row-Number column if UseLineNumbers is true (default: "RowNumber").

The Separator and Replacement options are exposed as the Separator and Replacement properties, so the consumer can choose a different delimiter character and set the Replacement string.

The consumer must understand the range of values of the types being serialized and choose a Replacement string that cannot occur in that data. Otherwise, legitimate occurrences of the Replacement string in normal data will be erroneously converted to the separator character on deserialization.

RowNumberColumnTitle works in conjunction with the UseLineNumbers option. UseLineNumbers inserts an additional first column containing the row number into the CSV file, and RowNumberColumnTitle sets the title of that column.

In practice, CsvIgnore was consistently applied to reference-type properties, because reference types make little sense in most CSV serialization contexts. The String type is the exception. It therefore seemed sensible to include the IgnoreReferenceTypesExceptString option, which automatically excludes all reference types other than String.
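The filter described above might be sketched as follows. This is only an illustration under assumptions, not the class's actual code: PropertyFilter, Address and Person are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class PropertyFilter
{
    // Keeps value types (int, DateTime, enums, Nullable<T>, ...) and string;
    // drops every other reference type when the flag is set.
    public static List<PropertyInfo> Select(Type type, bool ignoreReferenceTypesExceptString)
    {
        return type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                   .Where(p => !ignoreReferenceTypesExceptString
                               || p.PropertyType.IsValueType
                               || p.PropertyType == typeof(string))
                   .OrderBy(p => p.Name)
                   .ToList();
    }
}

// Hypothetical types used only for this illustration.
public class Address
{
    public string City { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public Address Home { get; set; }   // reference type: excluded when the flag is true
}
```

With the flag set to true, only Age and Name would be selected for Person; with the flag set to false, Home is selected as well.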

The requirements that prompted the creation of this class specified that the last line of the CSV file contain the string literal "EOF". This is not common CSV practice, so an option was introduced to allow for it. The implementation simply adds one additional line to the CSV text during serialization and allows for the possibility of an EOF line during deserialization.
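A minimal sketch of that behaviour, working on lines already split from the CSV text, could look like this. EofLiteral and its members are hypothetical helpers for illustration and are not part of the class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EofLiteral
{
    public const string Marker = "EOF";

    // Serialization side: append the marker as one extra line.
    public static List<string> Append(IEnumerable<string> lines)
    {
        var result = lines.ToList();
        result.Add(Marker);
        return result;
    }

    // Deserialization side: drop a trailing marker line if one is present.
    public static List<string> Strip(IEnumerable<string> lines)
    {
        var result = lines.ToList();
        if (result.Count > 0 && result[result.Count - 1].Trim() == Marker)
        {
            result.RemoveAt(result.Count - 1);
        }
        return result;
    }
}
```

Strip leaves the input unchanged when no marker is present, so a file written without the option can still be read.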

The IgnoreEmptyLines option allows the deserializer to ignore empty or corrupt lines of text. Otherwise an InvalidCsvFormatException is thrown.
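The two behaviours can be sketched as below. BlankLineFilter is a hypothetical helper, and FormatException stands in for InvalidCsvFormatException so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class BlankLineFilter
{
    // Returns the non-blank rows. When ignoreEmptyLines is false, a blank row is an error.
    public static List<string> Apply(IList<string> rows, bool ignoreEmptyLines)
    {
        var kept = new List<string>();
        for (int i = 0; i < rows.Count; i++)
        {
            if (string.IsNullOrWhiteSpace(rows[i]))
            {
                if (ignoreEmptyLines)
                {
                    continue; // skip the blank row silently
                }
                // Stand-in for InvalidCsvFormatException in this sketch.
                throw new FormatException("Empty line at data row index " + i);
            }
            kept.Add(rows[i]);
        }
        return kept;
    }
}
```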

Design

We would expect such a serializer to deliver the following abilities:

  1. Serialize lists of a specified Type to a flat CSV file.
  2. Reconstitute objects from the same (or similarly constructed) file.
  3. Mimic the usage of the .NET Xml Serializer, or Binary Formatter, in writing-to and reading-from a stream object.
  4. Nominate properties to Ignore from the target Type to be serialized using a CsvIgnore attribute.
  5. Allow the user to specify a delimiter character other than the default comma character.

Concerns

The following concerns represent risks that may not be apparent at the outset:

  1. The separator character may appear in the value of a property being serialized. This will cause the resulting row to appear to have more columns than expected.
  2. Likewise, a NewLine may appear in the value of a property. This will cause the resulting row to span multiple rows.

Both of these problems will render the row (if not the entire CSV file) unreadable, and so must be dealt with.

We solve both of these problems with the same approach: replace the offending character sequence with a user-specified string. On serialization, each occurrence of a problem character is replaced with the replacement string. On deserialization, each replacement string is replaced with the original character, and the original value is restored.

The user needs to be aware that, in each of these cases, the replacement string must never appear in the normal data to be serialized. Otherwise, deserialization will replace legitimate occurrences of the replacement string with the Separator or NewLine characters.
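The round trip and the collision hazard can be illustrated with a small sketch. FieldEscaper is a hypothetical helper, and the placeholders "|" and "~" are chosen for readability only; they are not the library defaults.

```csharp
using System;

public static class FieldEscaper
{
    // Serialization side: swap the separator and line breaks for placeholder strings.
    public static string Escape(string value, char separator,
                                string replacement, string newLineReplacement)
    {
        return value.Replace(separator.ToString(), replacement)
                    .Replace("\r\n", newLineReplacement)
                    .Replace("\n", newLineReplacement);
    }

    // Deserialization side: restore the separator and the platform line break.
    public static string Unescape(string value, char separator,
                                  string replacement, string newLineReplacement)
    {
        return value.Replace(replacement, separator.ToString())
                    .Replace(newLineReplacement, Environment.NewLine);
    }
}
```

With these placeholders, "Smith, John" is written as "Smith| John" and restored intact. A value that already contains the placeholder, such as "a|b", is written unchanged but comes back as "a,b", which is the corruption described above.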

CSV Serialization

Given that CSV is a flat-file structure, we do not have to consider deep object serialization. Thus, associations will be rendered as a single text value.

The CSV structure also implies that the data to be serialized should be a list of objects, so that the Serialization method is to receive an IList of the target Type. The Deserialization method will return such a list.

The Class

From Design Requirements 1, 2, 3 and 5, and the discussion so far, we can define some of the main requirements for the class.

The class will:

  • Expose a delimiter character (comma by default).
  • Expose a Replacement string.
  • Reference the target Type to be serialized/deserialized by generic parameter T.
  • Build and maintain a list of PropertyInfo for the properties to be serialized/deserialized.
  • Expose a Serialization method that accepts a stream object and an IList of data.
  • Expose a Deserialization method that accepts a stream object and returns an IList of data.

We can now define this rough outline as follows:

public class CsvSerializer<T> where T : class, new()
{
   public char Separator { get; set; }

   public string Replacement { get; set; }

   private List<PropertyInfo> _properties;   

   public void Serialize(Stream stream, IList<T> data) { }

   public IList<T> Deserialize(Stream stream) { } 

   .
   .
   .
}

List of PropertyInfo

The first step carried out by the class is to assemble the list of properties that are to be rendered to CSV (in the form of a List of PropertyInfo). This step is executed on construction.

This list of properties is used for three purposes.

  1. To create the CSV header.
  2. To get instance values by reflection on serialization.
  3. To set instance values by reflection on deserialization.

public CsvSerializer()
{
   var type = typeof(T);

   var properties = type.GetProperties(BindingFlags.Public | BindingFlags.Instance 
		| BindingFlags.GetProperty | BindingFlags.SetProperty);

   _properties = (from a in properties
                  where a.GetCustomAttribute<CsvIgnoreAttribute>() == null
                  orderby a.Name
                  select a).ToList();
}

CsvIgnoreAttribute

As with XmlSerializer, we want to be able to nominate properties to exclude from serialization by decorating the property with a CsvIgnore attribute.

The CsvIgnore attribute is defined simply as follows:

public class CsvIgnoreAttribute : Attribute { }
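As an illustration, the attribute might be applied and honoured as follows. Person and CsvPropertyScanner are hypothetical names for this sketch. It uses Attribute.IsDefined rather than the generic GetCustomAttribute<T>() extension method, which was only added in .NET Framework 4.5, so this variant should also compile against earlier framework versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class CsvIgnoreAttribute : Attribute { }

// Hypothetical target type used only for this illustration.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }

    [CsvIgnore]
    public string Password { get; set; }   // excluded from the CSV
}

public static class CsvPropertyScanner
{
    // Returns the names of the public instance properties that are not
    // decorated with CsvIgnore, sorted by name.
    public static List<string> GetColumnNames(Type type)
    {
        return type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                   .Where(p => !Attribute.IsDefined(p, typeof(CsvIgnoreAttribute)))
                   .OrderBy(p => p.Name)
                   .Select(p => p.Name)
                   .ToList();
    }
}
```

For the Person type above, the resulting columns would be Age and Name.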

CSV Header

The header row for the file is a string of the property names joined by the Separator character (a comma by default). The list of properties is already sorted by name, so we can be confident that the column order will match the order of values when it comes time to render each row.

We create a private GetHeader method that returns the header row as a string:

private string GetHeader()
{
    var columns = _properties.Select(a => a.Name).ToArray();
    var header = string.Join(Separator.ToString(), columns);
    return header;
}

The Serialization Method

Once the CSV Header row is created, the serializer iterates through the enumerable list of data items, and for each item retrieves the values for all properties in the list of PropertyInfo (discussed above). The values are added to a string array which is finally joined by the Separator character.

The serialization method itself accepts a stream object and an IList of objects of Type T.

After iterating over the list and generating each row, it uses a stream writer to write the final CSV text to the stream object.

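public void Serialize(Stream stream, IList<T> data)
{
    var sb = new StringBuilder();
    var values = new List<string>();
    var separator = Separator.ToString();

    // First line: the header row of property names.
    sb.AppendLine(GetHeader());

    foreach (var item in data)
    {
        values.Clear();

        // One field per property, in the same name-sorted order as the header.
        foreach (var p in _properties)
        {
            var raw = p.GetValue(item);
            // Null becomes an empty field; separators inside a value
            // are swapped for the Replacement string.
            var value = raw == null ?
                        "" :
                        raw.ToString().Replace(separator, Replacement);
            values.Add(value);
        }
        sb.AppendLine(string.Join(separator, values.ToArray()));
    }

    // Disposing the StreamWriter also closes the underlying stream.
    using (var sw = new StreamWriter(stream))
    {
        sw.Write(sb.ToString().Trim());
    }
}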

The Deserialization Method

The Deserialization method accepts a stream object that represents a CSV text file.

We read the first line as the CSV Header row. The header is split with the Separator character, then stored in a columns array, and later indexed to reference the Properties by name.

The remaining text is split by NewLine and stored in 'rows'. Each row is then split using the Separator character. The resulting parts then represent the values to be deserialized to a new instance of the Type T. The order of these values will appear in the same order as the columns array so that we can retrieve the column name by index, and use the column name to retrieve the corresponding PropertyInfo from the PropertyInfo list.

Each string part can then be converted to the correct value by using the .NET TypeConverter for the property and stored in the new instance of the target object using the PropertyInfo SetValue method.

public IList<T> Deserialize(Stream stream)
{
    string[] columns;
    string[] rows;

    try
    {
        using (var sr = new StreamReader(stream))
        {
            columns = sr.ReadLine().Split(Separator);
            rows = sr.ReadToEnd().Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
        }
    }
    catch (Exception ex)
    {
        throw new InvalidCsvFormatException(
                "The CSV File is Invalid. See Inner Exception for more information.", ex);
    }

    var data = new List<T>();
    for (int row = 0; row < rows.Length; row++)
    {
        var line = rows[row];
        if (string.IsNullOrWhiteSpace(line))
        {
            // rows[0] is the second line of the file (the header is line 1).
            throw new InvalidCsvFormatException(string.Format(
                    "Error: Empty line at line number: {0}", row + 2));
        }

        var parts = line.Split(Separator);

        var datum = new T();
        for (int i = 0; i < parts.Length; i++)
        {
            var value = parts[i];
            var column = columns[i];

            value = value.Replace(Replacement, Separator.ToString());

            var p = _properties.First(a => a.Name == column);

            var converter = TypeDescriptor.GetConverter(p.PropertyType);
            var convertedvalue = converter.ConvertFrom(value);

            p.SetValue(datum, convertedvalue);
        }
        data.Add(datum);
    }
    return data;
}
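The conversion step can be tried in isolation with a sketch such as the one below. FieldConverter is a hypothetical helper. Note one deliberate difference from the method above: the sketch calls ConvertFromInvariantString so that its results do not depend on the current culture, whereas the method above calls ConvertFrom.

```csharp
using System;
using System.ComponentModel;

public static class FieldConverter
{
    // Looks up the TypeConverter registered for the target type and
    // parses the CSV field text with it (invariant culture).
    public static object Convert(string text, Type targetType)
    {
        var converter = TypeDescriptor.GetConverter(targetType);
        return converter.ConvertFromInvariantString(text);
    }
}
```

For example, FieldConverter.Convert("42", typeof(int)) yields the boxed integer 42, and FieldConverter.Convert("True", typeof(bool)) yields true.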

Last edited Mar 26, 2013 at 11:30 PM by cybercortex, version 3

Comments

edokt Dec 31, 2014 at 4:04 AM 
Hello
One question, please: is it compatible with .NET Framework 4.0?
When I try to compile it against this version, I get the following error on GetCustomAttribute():


Error 1 'System.Reflection.PropertyInfo' does not contain a definition for 'GetCustomAttribute' and no extension method 'GetCustomAttribute' accepting a first argument of type 'System.Reflection.PropertyInfo' could be found (are you missing a using directive or an assembly reference?) C:\Users\Administrator\Downloads\Csv.Serialization\Csv.Serialization\CsvSerializer.cs 118 14 Csv.Serialization