There’s a lot of data out there for the taking, if you only know how to get it. There is a treasure trove of government-based data sets available to the public. There’s a lot of information free for the taking you might want to know, or add to your app.
There’s also many different ways to get that data. Three data formats CSV, XML,and JSON all provide great ways to getting that data. Some data portals are one type, others you have the option of all of them. In our last lesson, we began to look at the CSV file format with a data file in the app bundle. In this lesson we’ll dig deeper into CSV files, and learn to load them from an external source.
The CSV Data Format
We don’t often have just text in a text file. Instead, we use a variety of arrangements to store other types of data than text. One of the oldest of these arrangements is the CSV, or comma separated value file. Here’s an example.
Title,data1,data2 Hello Pizza!,24,42 Ice Cream,16,32
As the name implies, we separated out three values per line by a comma. The CSV file represents this table:
Title | data1 | data2 |
---|---|---|
Hello Pizza! | 24 | 42 |
Ice Cream | 16 | 32 |
While standards exist, you’ll rarely find anyone following them. There is no real standard for CSV that everyone follows. You’ll find several variations. Fortunately there are a few rules that tend to be true. A row of data usually ends with a newline character \n
. The elements, which I’ll often call fields of the row have some character to delimit them. The most popular is the comma, but tab characters and the pipe character | show up too. Once again, it’s important to know what you are dealing with. Notice the first line of our CSV file has a description, often referred to as the header, of the column of data. That is helpful, and you’ll find it often in data sets, but it isn’t always there. When writing my own, I tend to include it. Sometimes it is the only documentation about this file.
Now consider this file:
Title,data1,data2 Hello, Pizza!,24,42 Ice Cream,16,32
Line 2 of this file has a problem. The text Hello, Pizza!uses a comma. In CSV, this means we have four fields in our first row of data, and that messes up our table.
Title | data1 | data2 | |
---|---|---|---|
Hello | Pizza! | 24 | 42 |
Ice Cream | 16 | 32 |
There are two approaches in CSV to handle this. One is to change the delimiter from a comma to something not often used as a character. Two common ones are the tab character (Often referred to a Tab-Demlimited) or a pipe |
character.
Title|data1|data2 Hello, Pizza!|24|42 Ice Cream|16|32
The most common way to handle the problem is to use quotes. Quotes in a CSV file mean this is a string and to include everything in between the quotes as a single field.
"Title","data1","data2" "Hello, Pizza!",24,42 "Ice Cream",16,32
Some implementations of code are smart enough to know quotes are strings, numbers without decimal points are Int
, and with decimal points are Double
. For reasons I’ll discuss later, I usually don’t make things this smart.
Many web data portals such as the Socrata.com platforms listed in the Open Data Network get paranoid about the whole thing and put everything in quotes.
"Title","data1","data2" "Hello, Pizza!","24","42" "Ice Cream","16","32"
We’ll be working with files of this last kind. However, any one of these are possible. One of may many rules about using data is always check the specs and context of your data set. Look at that data and code appropriately. Which leads to a disclaimer here. This demo will lead to one way of handling data. There are as many ways of handling data in CSV as there are datasets. For any project, you’ll have to do some work in getting it to work with your data set.
This by the way is why I’m not a big fan of 3rd party libraries. I simply don’t trust them. A lot of talented people did a lot of good work, but I can’t tell that without so much testing and reading of code that I’d write my own a lot faster and keep it up to date a lot easier and faster.
Biking to the Center of the Universe
The Fremont neighborhood of Seattle has proclaimed itself the Center of the Known Universe. This quirky neighborhood has its own troll, a rocket, and has a statue of Vladimir Lenin. Adobe and Google both have offices here. Fremont also has one of the few bridges between the northwest neighborhoods of Seattle and downtown. In October 2012, Seattle installed bike sensors on the sidewalks of the bridge to count the number of bicycles that cross the bridge. That data is a rather simple data set where we can see patterns on bicycle traffic over this bridge. In another post, I explained how to download data to a file, which you could then put into the app bundle. That does not give us updated information. To get updated information, we poll the website directly, which is what we’ll do in this lesson.
Setting up the Storyboard
For this project we’ll use a very simple storyboard. I could do this in the playgrounds, but I find them very slow in compiling. We’ll display to a text view. Open a new project in Xcode called CSVWebDemo. Set the language to Swift and a Universal device. Save the file and go to the storyboard. Change the background color of the view to Light gray in the attributes inspector. Find the Text View in the object library
Drag the text view to the center of the storyboard.
On the auto layout toolbar in the lower right of the storyboard, click the pin button . Pin the view 10 Points up, 0 points right, 0 points left, and 20 points down. Be sure to press Tab after each field to make sure the entry takes. Set the Update Frames to Items of New Constraints.
Click the Add 4 Constraints button. We get this:
Open the assistant editor. Control drag from the text view to the ViewController
class code and make an outlet named textView.
Close the assistant editor.
The CSVDemo Class
We’ll make a new class for our data model. Press Command-N and make a new Cocoa Touch class CSVDemo subclassing NSObject
. Add three properties to the class
var data:[[String:String]] = [] var columnTitles:[String] = [] var columnType:[String] = ["NSDate","Int","Int"]
This class will read the CSV as a text file and store it in a string. We’ll convert it to array containing a data dictionary stored as strings. To help us access that data easily, we have two additional arrays. columnTitles
is the header from the first row of the array. These are the keys for the dictionaries in data
. When we need to convert this to the actual data types for the columns, we have a second array for the type of the corresponding column. Looking at the data portal we are getting this data from, we determine those types.
If there’s API documentation, it will tell us the type:
However, you still might need to do some thinking. From this, I know my first field is an NSDate.
Since number
is a count of bikes, Int
is the best choice for a type for the other two fields.
Reading a Web File
Our first task is reading a text file as a string into the app. Add this code:
func readStringFromURL(stringURL:String)-> String!{ if let url = NSURL(string: stringURL) { do { return try String(contentsOfURL: url, usedEncoding: nil) } catch { print("Cannot load contents") return nil } } else { print("String was not a URL") return nil } }
We return an optional String
value from this method. This code could be two lines:
let url = NSURL(string: stringURL) return String(contentsOfURL: url, usedEncoding: nil)
The first line makes a NSURL
from the string we pass to it. The second line reads a text file and stores it in a string, returned in the function. It would be noce if this was all we needed, but it isn’t. Don’t trust anything when reading and writing data. Neither does iOS, and the String
initializer that reads data from an external source throws errors. We can have dozens of things go wrong between server, transmission, and client when reading data from the web. Swift forces us to use the do-catch
clause with this String
initializer. If the function throws an error, we return nil
and send a message to the console. We also check that our URL is well formed, THe NSURL
initilizaer first checks that the string we give it is really a URL If it isn’t, the method returns nil
. If it is properly formed, we enter the do-catch
clause to read the data with the String:contentsOfURL:usedEncoding
initializer.
Go to viewDidLoad
in the ViewController
class. Change viewDidLoad
to this. To make sure the URL is correct, I suggest copying and pasting this code.
override func viewDidLoad() { super.viewDidLoad() let csvDemo = CSVDemo() let stringURL = "https://data.seattle.gov/resource/4xy5-26gy.csv?$where=date%20between%20%272016-04-01T00:00:00%27%20and%20%272016-04-30T23:00:00%27&$order=date" textView.text = csvDemo.readStringFromURL(stringURL) }
We make an instance of our model and set a constant to the url we want. While I’ll go into the full anatomy of this URL in a separate lesson, there are a two important points to note. I could have written this
let stringURL = "https://data.seattle.gov/resource/4xy5-26gy.csv
which would have given me the full database of 31,000 records unsorted. That is a huge memory investment. Try to load a little as possible and try to avoid heavy processing on the device. Instead leave that heavy lifting to the server, which is what the rest of this URL is about. The API for the Socrata platform allows a subset of SQL call SoQL for retrieving data, so the filtered data is April 2016 and sorted it by date.
We used a https
in this url. Pay attention if this is a http
or https
. iOS doesn’t trust http
. Since it is a less secure protocol, iOS and Foundation will reject your request if you use a site that loads from http. I discuss this in more details in my post about web views and security if you want to know how to bypass the iOS security.
Build and run. You’ll get this
I happen to like this view this in landscape. If you want to do that press Command-Right Arrow.
You can scroll down by swiping up. Scroll to see the last row of the data
Converting the Data to CSV
The data we have is one long string. We’d like it to be accessible data. There are many data structures use can use to do that. You can make a struct or class for the data in each row. For this file I could do this.
class DataRow:NSObject{ var date:NSDate var northCount:count var southCount:count }
Then keep the rows together in an array.
var dataTable = [DataRow]
If the data will be used in user interfaces, you want to use an array to arrange the rows. Most of the views that have large sets of data like UITableView
use arrays.
In our case we’ll use dictionaries. I prefer dictionaries since it’s easy to do repetitive tasks with them. In a class or struct I’d have to directly assign everything. In a dictionary, as long as I have an array of keys, I can make a loop to do that work for me. You could, of course, use an array instead of a dictionary, but Apple uses Dictionaries in some of the other data format conversions so it’s good to get used to it.
As I discussed in the last lesson, there are several steps to converting the string to a CSV data file.
- Clean the row delimiter to be
\n
- Break the string into an array of rows
- Loop through the array of rows
- Break the row into an array
- Place that array into a structure with an element in a new array.
- Return the new array
We’ll do the same here, with two changes: Our data has quotes around all the fields. We need to remove the quotes. We want our fields to be of type NSDate
and Int
instead of strings. We’ll add to 3.1 and 3.2 some code to accomplish these tasks.
Clean the Row Delimiter
First task is to clean the rows and return the separated rows. Add the following method to CSVDemo
.
func cleanRows(stringData:String)->[String]{ var cleanFile = stringData cleanFile = cleanFile.stringByReplacingOccurrencesOfString("\r", withString: "\n") cleanFile = cleanFile.stringByReplacingOccurrencesOfString("\n\n", withString: "\n") return cleanFile.componentsSeparatedByString("\n") }
Or row delimiter is \n
. We might have some people or systems who put \r\n
, \r
, or \n\n
instead. To make sure we have the correct delimiter, we replace the string using the String
function stringByReplacingOccurrencesOfString:withString:
so they are all \n
Once they are \n
, separate into an array of rows using the componentsSeparatedByString
string function and return the result
Clean the Rows of Quotes
Next we’ll clean individual rows and make a string array of them. This works the same way as the previous method. Add this method:
func getQuoteFieldsForRow(oldString:String) -> [String]{ let delimiter = "\t" var newString = oldString.stringByReplacingOccurrencesOfString("\",\"", withString: delimiter) newString = newString.stringByReplacingOccurrencesOfString(",\"", withString: delimiter) newString = newString.stringByReplacingOccurrencesOfString("\",", withString: delimiter) newString = newString.stringByReplacingOccurrencesOfString("\"", withString: "") return newString.componentsSeparatedByString(delimiter) }
Remember there’s a reason for the quotes. We want to properly reflect commas in the data. So we take the strategy we mentioned earlier of converting quotes to a delimiter we can’t use in the string. I use tabs that delimiter. I find every combination of quotes and change them to tabs or delete them.
If you are paying attention, you’ll might be wondering something. I said earlier that you should pay attention to the context of your data. Here I have no string types in my data and the comma is not going to show up. I could cut three lines of code from this and just delete the quotes. However, I’m also not sure that data won’t change and integers start using commas. We play it safe by setting things up this way.
Putting It All Together
We can make nested loops to build our data structure. Add this method to CSVDemo
func convertCSV(stringData:String) -> [[String:AnyObject]] { let rows = cleanRows(stringData) if rows.count > 0 { data = [] columnTitles = cleanFields(rows.first!) for row in rows{ let fields = cleanFields(row) } } else { print("No data in file") } return data }
This is the basic structure of the rows loop. As long as there is one row in the file for each row, we break the row into an array of fields. If there isn’t any rows, we send an error message. We also do something special with the first row, adding its contents into columnTitles
, the dictionary keys for the other rows.
After the cleanFields
statement add this:
if fields.count != columnTitles.count {continue}
This is a check that we have a valid row. The header had the names of each row. The number of elements in the fields array should match the number in the columnTitles
array. If our current row has a different number of fields, it’s an invalid row. For an invalid row in this demo, I ignore the row by continuing the loop to the next iteration. We can get much fancier though, and warn the app or the user that there is an invalid row or even attempt to fix the row.
Under that, add the next loop
var newRow = [String:AnyObject]() for index in 0..<fields.count{ let column = columnTitles[index] let field = fields[index] } data += [newRow]
This loops through the fields. This time we have multiple arrays with the same index. The loop therefore uses an integer index
. We also have a dictionary newRow
where we’ll store the fields for this row. After the loop of the fields completes, newRow
is added as a new entry to the array data
.
I could add just one more line and this would work:
newRow[column]= field
Since the value of newRow
is an AnyObject
, the strings would store. In the last lesson we did that, storing everything in a string. I could change the string to a more useful type later in a separate function.
However for memory management reasons, it is better to convert to appropriate types now. The number 314159 is six bytes as a String
and one as an Int
. We get huge memory and speed savings in bigger files by converting. Therefore we’ll add a parser that converts and saves the data in the type we want.
With CSV, there’s several approaches to this. If you wanted to do this automatically, you could try telling the difference between a String, an Int, and a Double. Strings will have the quotes around them, Assuming we didn’t strip the quotes earlier. Doubles will have a decimal point. However that’s lot of exceptions and problems. Take the number 1,234,567. If the data contains that number, and your field delimiter is a comma, it breaks into three integers. Suppose you have these two rows
"pizza",12,15 "pasta",12.2,56
You have in one field the numbers 12 and 12.2. The field should be Double
but the system might read one as an Int and one as a Double, something any Swift developer who uses implicit typing can appreciate. As I’ll show shortly, dates cannot be converted automatically. For me, it’s too much of a headache. Unless the souce of this data publishes an API for it(and even then I’m suspicious) I skip all this and use explicit typing on my dictionary with columnType
.
Add this to your code under let field = fields[index]
.
switch columnType[index]{ case "Int": newRow[column] = Int(field) case "NSDate": guard let newField = dateFormatter.dateFromString(field) else { print ("\(field) didn\'t convert") continue } newRow[column] = newField default: //default remains a string let newField = field newRow[column] = newField }
We use a switch
on columnType
. Remember, columnType
has the following values:
var columnType:[String] = ["NSDate","Int","Int"]
columnType
sets up cases for the types we want, an Int
and NSDate
. We can add more types here to our hearts desire. We default to a String
if the field is not a type we listed here.
Converting to Int
is simple. Cast the field to Int
then store it in the newRow
dictionary with the key column
.
newRow[column] = Int(field)
The date is a bit more complicated, and a good example why I used this approach to typing. We use the NSDateFormatter
method dateFromString
. This will require us to set a specific date format. Dates have too many formats, and while standards exist, there are almost as many exceptions as standards. Is any given date represented by date MM-dd-yyyy
, MM/dd/yy
or yyyy-MM-dd
? Is the time 12-hour with AM/PM or 24 hour? How many decimal points on the seconds? What in the strong seperates the time from the date? The answers get complex if you have code figuring it out. For converting an external file to an NSDate using NSDateformatter, we have to get explicit with our date format. At the top of the CSVDemo
class add this property:
var dateFormat = "yyyy-MM-dd'T'HH:mm:ss.sss"
You can find a full table of possible values for the formatting string in the Unicode Technical Standard #35. In our first iteration, the simulator printed the date as 2016-04-01T00:00:00.000
. Set it the same using yyyy-MM-dd'T'HH:mm:ss.sss
. If anything doesn’t match this time pattern, such as skipping the three decimal point places, NSDateFormatter
will not convert it.
Just after declaring the function convertCSV
, add this code to initialize and set the properties of the NSDateFormatter
.
let dateFormatter = NSDateFormatter() dateFormatter.dateFormat = dateFormat dateFormatter.timeZone = NSTimeZone(abbreviation: "GMT")
We need to make one important but not obvious setting here: we set the time zone to GMT
so we have no correction for time zone in our data structure. Otherwise, iOS will input this as GMT time, and then convert it to your local time zone.
Our completed method looks like this:
func convertCSV(stringData:String) -> [[String:AnyObject]] { let dateFormatter = NSDateFormatter() dateFormatter.dateFormat = importDateFormat dateFormatter.timeZone = NSTimeZone(abbreviation: "GMT") let rows = cleanRows(stringData) if rows.count > 0 { data = [] columnTitles = cleanFields(rows.first!) for row in rows{ let fields = cleanFields(row) if fields.count != columnTitles.count {continue} var newRow = [String:AnyObject]() for index in 0..<fields.count{ let column = columnTitles[index] let field = fields[index] switch columnType[index]{ case "Int": newRow[column] = Int(field) case "NSDate": guard let newField = dateFormatter.dateFromString(field) else { print ("\(field) didn\'t convert") continue } newRow[column] = newField default: //default keeps as string newRow[column] = field } } data += [newRow] } } else { print("No data in file") } return data }
Making Some Useful Output
We’ve converted the data into a useful data structure. Now we can use that structure for some computations. Let’s look at the frequencies and averages of crossing the bridge. Add this code:
func printTotalsAndRatio()-> String{ var north = 0 var south = 0 for index in 1..<data.count{ let row = data[index] north += row["fremont_bridge_nb"] as! Int south += row["fremont_bridge_sb"] as! Int } let totalCrossings = north + south let totalAverageCrossings = totalCrossings / 30 let averageNorthCrossings = north / 30 let averageSouthCrossings = south / 30 let crossingRatio = Double(north) / Double(south) var displayString = "Fremont Bridge April 2016 Data\n" displayString += String(format:"North Side Count:%8i SouthSide Count%8i\n",north,south) displayString += String(format:"Total Crossings:%8i\n",totalCrossings) displayString += String(format:"Average Crossings per day:%8i\n",totalAverageCrossings) displayString += String(format:"North Side Average:%8i SouthSide Average%8i\n", averageNorthCrossings,averageSouthCrossings) displayString += String(format:"North/South ratio:%8.3f",crossingRatio) return displayString }
The first part of the method totals the number of northbound and southbound crossings of the bridge. The second part uses those totals to compute a few monthly statistics. The third creates a string presenting that information.
Go to the ViewController
class. Change viewDidLoad
to this:
override func viewDidLoad() { super.viewDidLoad() let csvDemo = CSVDemo() let stringURL = "https://data.seattle.gov/resource/4xy5-26gy.csv?$where=date%20between%20%272016-04-01T00:00:00%27%20and%20%272016-04-30T23:00:00%27&$order=date" textView.text = csvDemo.readStringFromURL(stringURL) csvDemo.convertCSV(textView.text) textView.text = csvDemo.printTotalsAndRatio() }
Build and run. We get some results.
This is a bridge used primarily by commuters to downtown Seattle. It’s therefore not surprising that there is close to an even distribution of people going both north and south.
There’s Always One More Bug
You’ll also notice the console had this:
date didn't convert
We need to make a small change to our code. Find this line in the convertCSV
method:
for index in 0..<fields.count{
This reads the title row along with everything else. We don’t want to read the header row when converting data. change the code to this:
for index in 0..<fields.count{
With this simple change, the header is excluded data, and not converted to dates and integers.
What CSV Lacks
CSV is very popular with those transferring data into or out of spreadsheets. Most spreadsheets use only a few data types and is thus easy to use. However, it fails miserably once you get to more complex data types. Some have tried an array (see the Violations column in this data base for example) but the results are rather painful to work with. Dictionary types are near impossible in CSV. What if you want to need those types in your dataset? For dealing with such complex issues our next few lessons on XML and JSON will give us solutions.
The Whole Code
ViewController.swift
// // ViewController.swift // CSVWebDemo // // Created by Steven Lipton on 5/27/16. // Copyright © 2016 MakeAppPie.Com. All rights reserved. // import UIKit class ViewController: UIViewController { @IBOutlet weak var textView: UITextView! override func viewDidLoad() { super.viewDidLoad() let csvDemo = CSVDemo() let stringURL = "https://data.seattle.gov/resource/4xy5-26gy.csv?$where=date%20between%20%272016-04-01T00:00:00%27%20and%20%272016-04-30T23:00:00%27&$order=date" textView.text = csvDemo.readStringFromURL(stringURL) csvDemo.convertCSV(textView.text) textView.text = csvDemo.printTotalsAndRatio() } }
CSVDemo.swift
// // CSVDemo.swift // CSVWebDemo // // Created by Steven Lipton on 5/27/16. // Copyright © 2016 MakeAppPie.Com. All rights reserved. // import UIKit class CSVDemo: NSObject { var data:[[String:AnyObject]] = [] var columnTitles:[String] = [] var columnType:[String] = ["NSDate","Int","Int"] var importDateFormat = "yyyy-MM-dd'T'HH:mm:ss.SSS" func cleanRows(stringData:String)->[String]{ //use a uniform \n for end of lines. var cleanFile = stringData cleanFile = cleanFile.stringByReplacingOccurrencesOfString("\r", withString: "\n") cleanFile = cleanFile.stringByReplacingOccurrencesOfString("\n\n", withString: "\n") return cleanFile.componentsSeparatedByString("\n") } func cleanFields(oldString:String) -> [String]{ let delimiter = "\t" var newString = oldString.stringByReplacingOccurrencesOfString("\",\"", withString: delimiter) newString = newString.stringByReplacingOccurrencesOfString(",\"", withString: delimiter) newString = newString.stringByReplacingOccurrencesOfString("\",", withString: delimiter) newString = newString.stringByReplacingOccurrencesOfString("\"", withString: "") return newString.componentsSeparatedByString(delimiter) } func convertCSV(stringData:String) -> [[String:AnyObject]] { //for date formatting let dateFormatter = NSDateFormatter() dateFormatter.dateFormat = importDateFormat //dateFormatter.timeZone = NSTimeZone(abbreviation: "GMT") let rows = cleanRows(stringData) if rows.count > 0 { data = [] columnTitles = cleanFields(rows.first!) for row in rows{ let fields = cleanFields(row) if fields.count != columnTitles.count {continue} var newRow = [String:AnyObject]() for index in 1..<fields.count{ let column = columnTitles[index] let field = fields[index] switch columnType[index]{ case "Int": newRow[column] = Int(field) case "NSDate": guard let newField = dateFormatter.dateFromString(field) else { print ("\(field) didn\'t convert") continue } newRow[column] = newField default: //default keeps as string newRow[column] = field } } data += [newRow] } } else { print("No data in file") } return data } func printTotalsAndRatio()-> String{ var north = 0 var south = 0 for index in 1..<data.count{ let row = data[index] north += row["fremont_bridge_nb"] as! Int south += row["fremont_bridge_sb"] as! Int } let totalCrossings = north + south let totalAverageCrossings = totalCrossings / 30 let averageNorthCrossings = north / 30 let averageSouthCrossings = south / 30 let crossingRatio = Double(north) / Double(south) var displayString = "Fremont Bridge April 2016 Data\n" displayString += String(format:"North Side Count:%8i SouthSide Count%8i\n",north,south) displayString += String(format:"Total Crossings:%8i\n",totalCrossings) displayString += String(format:"Average Crossings per day:%8i\n",totalAverageCrossings) displayString += String(format:"North Side Average:%8i SouthSide Average%8i\n", averageNorthCrossings,averageSouthCrossings) displayString += String(format:"North/South ratio:%8.3f",crossingRatio) return displayString } func readStringFromURL(stringURL:String)-> String!{ if let url = NSURL(string: stringURL) { do { return try String(contentsOfURL: url, usedEncoding: nil) } catch { print("Cannot load contents") return nil } } else { print("String was not a URL") return nil } } }
Leave a Reply