Posts Tagged ‘health’

The goal of this project was to test the feasibility of a low-cost, mobile environmental sensor system that could be used to sample air quality as needed, including in areas where conventional sensor systems may not be available or cost-effective.

Based around the low-cost Arduino microcontroller, a GPS shield for capturing geographic location was paired with a relative humidity and temperature sensor to act as a mobile data logger. The relative humidity and temperature sensors were chosen as an initial set of sensors for their low cost and ease of implementation. The sensors were mounted at the end of a five-foot-long section of one-inch PVC mast, inside a one-inch to two-inch tee fitting. The tee fitting was used to protect the sensors from direct sunlight and weather (Figure 1).


Figure 1

The entire unit is powered by a single 9-volt alkaline battery and housed inside a rigid plastic storage container to protect the equipment from handling and the elements. The 9-volt battery should power the unit for several hours, and a 9-volt lithium battery may provide even more operating time (Figure 2).


Figure 2

The software was modified from code supplied by Adafruit Industries, the makers and suppliers of the sensors and GPS unit. It reads the raw GPS data stream and parses out the $GPRMC string, reads the relative humidity and temperature sensor, then writes these values as one line to a comma-separated value text file on a microSD card on the GPS shield. Each additional reading taken after the unit is started is appended to this text file. If power to the unit is interrupted, it creates a new file when it restarts and appends data to that file.
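The logging step can be sketched in a few lines. The unit itself runs Arduino code, so the Python below is only an illustration of the idea, not the software used on the device. The function names and the column order of the output line are assumptions made for this example.

```python
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR every character between '$' and '*' and return two hex digits."""
    return format(reduce(lambda acc, ch: acc ^ ord(ch), body, 0), "02X")


def parse_gprmc(sentence: str) -> dict:
    """Split one $GPRMC sentence into named fields (values stay as strings)."""
    body, star, given = sentence.strip().lstrip("$").partition("*")
    # Only verify the checksum when the sentence actually carries one.
    if star and given.strip().upper() != nmea_checksum(body):
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    if fields[0] != "GPRMC" or len(fields) < 10:
        raise ValueError("not a complete GPRMC sentence")
    return {
        "utc_time": fields[1],   # hhmmss.sss
        "status": fields[2],     # 'A' = valid fix, 'V' = warning
        "lat": fields[3],        # ddmm.mmmm
        "lat_hemi": fields[4],   # 'N' or 'S'
        "lon": fields[5],        # dddmm.mmmm
        "lon_hemi": fields[6],   # 'E' or 'W'
        "date": fields[9],       # ddmmyy
    }


def csv_line(sentence: str, humidity_pct: float, temp_c: float) -> str:
    """Build one log line; the column order here is hypothetical."""
    fix = parse_gprmc(sentence)
    columns = [
        fix["date"], fix["utc_time"], fix["status"],
        fix["lat"], fix["lat_hemi"], fix["lon"], fix["lon_hemi"],
        f"{humidity_pct:.1f}", f"{temp_c:.1f}",
    ]
    return ",".join(columns)
```

Keeping the latitude and longitude in their raw ddmm.mmmm form at this stage matches the workflow described here, where the conversion to decimal degrees happens later in the spreadsheet.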

For the initial test, the unit was placed inside a backpack with the mast rising to approximately seven feet above the ground. The trip was made on a standard mountain bike ridden at a nominal pace. The initial test site was a road that runs north to south from West Walnut Street in Greencastle, IN to the DePauw Nature Park parking lot. There is a change in elevation of roughly fifteen feet where the road enters the forest area at the north end; otherwise the road is nearly level throughout. The distance covered was approximately 1.7 miles round trip, and the sensor unit gathered 1059 data points during the trip. I rode as close as was comfortable to the outside edges of the road, not only for safety but also to help gauge the accuracy of the GPS unit.

The unit was turned on and allowed to get a fix on the satellite constellation, then an additional five minutes was allowed to elapse so the sensors could acclimate to ambient conditions. Upon return to the starting position, the unit was turned off, the microSD card was removed, and the data file was copied off the card.

The data file was opened in Microsoft Excel and the GPRMC latitude and longitude coordinates were converted to decimal degrees for use in the GIS. This was done by extracting the whole-degree portion of each coordinate string, dividing the remaining minutes portion by 60, then adding the two back together. The temperature was also converted from degrees Celsius to degrees Fahrenheit using the standard formula. Table 1 shows a sample of the raw data file collected and Table 2 shows a sample of the finished spreadsheet.
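The same two conversions can be written outside of Excel. The sketch below is only an illustration in Python, with function names chosen for this example. It assumes the coordinates arrive in the NMEA ddmm.mmmm (latitude) and dddmm.mmmm (longitude) form, and it also applies the usual sign convention of negative values for the southern and western hemispheres.

```python
def nmea_to_decimal_degrees(value: str, hemisphere: str) -> float:
    """Convert an NMEA ddmm.mmmm / dddmm.mmmm string to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)        # everything left of the two minute digits
    minutes = raw - degrees * 100    # mm.mmmm
    decimal = degrees + minutes / 60.0
    # South latitudes and west longitudes are negative in decimal degrees.
    if hemisphere.upper() in ("S", "W"):
        decimal = -decimal
    return decimal


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Standard conversion: F = C * 9/5 + 32."""
    return temp_c * 9.0 / 5.0 + 32.0


# Made-up sample values, not taken from the actual data file.
lat = nmea_to_decimal_degrees("3938.5000", "N")
lon = nmea_to_decimal_degrees("08651.0000", "W")
temp_f = celsius_to_fahrenheit(25.0)
```

Splitting on the numeric value rather than on string positions means the same function handles both the two-digit latitude degrees and the three-digit longitude degrees.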


The resulting Microsoft Excel table was then opened in ArcGIS 10.1 and the XY data used to plot the points (Figure 3).


Figure 3

The coordinate system used by the GPS constellation is WGS 1984, and that is what was used here. A basemap of aerial imagery was then added for reference. Finally, the point data was buffered to 25 meters, the points were kriged, and then the kriged raster was clipped using the buffer (Figure 4).


Figure 4

This project has shown the feasibility of using low-cost, mobile Arduino-based devices as environmental sensor systems that could be used to sample air quality as needed and in areas where conventional sensor systems may not be available or cost-effective.

Several future improvements should be considered, including increasing the number and type of sensors, for example by adding benzene, NO2, or similar sensors. Additional or different sensors can be added or swapped in with minimal programming or hardware effort.

Another area of interest would be to outfit similar units with small solar panels and battery backup systems so that they could be placed in remote locations, thereby increasing the ability to monitor environmental pollution throughout a much wider area.


Autism Spectrum Disorder (ASD) in Indiana schoolchildren appears to follow the same trend as that reported by the CDC with roughly two percent of the children enrolled in Indiana Public Schools reported as having been diagnosed with an ASD.

For my research methods and stats modeling class this semester, we are required to write a research proposal. For this project I chose to look at autism rates in Indiana because I had the data readily at hand from an earlier request to the Indiana Department of Education (IDOE), and also because I am taking an Intro to Epidemiology class. The idea is to take a geospatial look at the distribution of children enrolled in the Indiana Public School system who have been reported as having been diagnosed with an ASD. I was able to get a polygon shapefile of the school districts in Indiana and joined that with the table sent to me by IDOE. The data from IDOE only shows the school district and a count of children reported to have an ASD. If a school district had fewer than 5 such children, no count was reported. For these districts, the null value was replaced with 2.5, which is half the minimum reported value. This is the same technique used by other projects I have worked on. This allowed me to create a quick map of the “observed” ASD counts reported for each school district, as shown in Figure 1.

Figure 1 - Observed ASD by School District


This map shows the reported values without any adjustment, just the raw numbers. It shows clusters of higher counts around Marion, Allen, St Joseph, and Vigo counties. But are there really clusters in these areas? We need to look deeper into the data and adjust these raw numbers for enrollment, since school districts with significantly more students should naturally report more children with an ASD. So in Figure 2 we have the same data mapped out using the crude rate: the number of children diagnosed per 1000 enrolled children in each school district.
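As a rough illustration, the crude rate calculation looks like this in Python. The maps here were built in a GIS, so this is only a sketch of the arithmetic. The function name, the use of None for a suppressed count, and the sample districts are all assumptions made for the example; the 2.5 fill value is the one described earlier for districts with no reported count.

```python
from typing import Optional

SUPPRESSED_FILL = 2.5  # half of the minimum reported count of 5


def crude_rate_per_1000(asd_count: Optional[float], enrollment: int) -> float:
    """ASD cases per 1000 enrolled students for one district.

    asd_count is None when the district's count was suppressed.
    """
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    count = SUPPRESSED_FILL if asd_count is None else asd_count
    return 1000.0 * count / enrollment


# Hypothetical districts: (name, reported ASD count, enrollment).
districts = [
    ("District A", 48, 3200),
    ("District B", None, 410),   # count suppressed, so 2.5 is used
]

for name, count, enrollment in districts:
    print(name, round(crude_rate_per_1000(count, enrollment), 2))
```

Note how the small district with a filled-in count can still end up with a rate that looks large or small purely because its enrollment is low, which is part of the motivation for looking beyond crude rates.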

Figure 2 - Crude ASD rate per 1000 Enrolled Students


Now we see much less variation across the state, but there still appears to be clustering in several areas. The very dark blue district up near Ft Wayne and the other nearly as dark district near Evansville are outliers and could very well be incorrectly entered counts. I would need to recheck these with the IDOE to verify. So, is this the whole story? Well, it could be, but there is another technique we can apply to look even deeper at the data. We can use an empirical Bayesian estimation method. This method takes into account the ASD rate and variance of the surrounding school districts to adjust the value for each district. We do this because we really want to see what the dispersion is across the state without the manmade district lines. In Figure 3 we see the result of the Bayesian analysis.
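To give a feel for what this kind of adjustment does, here is a minimal Python sketch of a simplified, non-spatial variant: a global empirical Bayes smoother using method-of-moments estimates. This is an assumption made for illustration and not the analysis behind Figure 3, which draws on surrounding districts rather than the whole state. The function name and the sample numbers are made up.

```python
from typing import List, Sequence


def eb_smooth(counts: Sequence[float], populations: Sequence[float]) -> List[float]:
    """Shrink each district's raw rate toward the overall rate.

    Districts with small enrollment are pulled strongly toward the overall
    rate; large districts keep most of their own raw rate.
    """
    total_pop = sum(populations)
    prior_mean = sum(counts) / total_pop              # overall rate
    raw = [c / n for c, n in zip(counts, populations)]

    # Enrollment-weighted variance of the raw rates around the overall rate.
    weighted_var = sum(
        n * (r - prior_mean) ** 2 for r, n in zip(raw, populations)
    ) / total_pop
    mean_pop = total_pop / len(populations)

    # Remove the part of the variance expected from sampling noise alone.
    prior_var = max(weighted_var - prior_mean / mean_pop, 0.0)

    smoothed = []
    for r, n in zip(raw, populations):
        denom = prior_var + prior_mean / n
        weight = prior_var / denom if denom > 0 else 0.0
        smoothed.append(prior_mean + weight * (r - prior_mean))
    return smoothed


# Hypothetical counts and enrollments for three districts.
rates = eb_smooth([2, 200, 30], [100, 10000, 1000])
```

A spatial version would replace the statewide mean and variance with values computed from each district's neighbors, which is closer to the method described above.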

Figure 3 - Bayes Adjusted ASD rate


Now we see a relatively even and fairly random pattern, meaning that ASD is fairly evenly dispersed across the state. The effect of the two previously noted outliers is clearly evident, and whether the data was reported accurately for those two locations still needs to be addressed. If I were doing this as a thesis/dissertation, I would contact the IDOE so we could compare the reported values with the previous year’s values to see if there is a significant change.

The idea for this post is to show how important it is to look critically at the charts, graphs, and maps presented in reports, and especially in the media, and at how they present the data. Are they showing us raw numbers, crude rates, or something else? The crude rate is very often used because it is quick, easy, and does a good job of reflecting the data. The Bayesian method is much more time-consuming and therefore more costly, even though it does the best job of reporting data like this because it looks at the data more spatially. It all depends on the needs of the research and the questions it is trying to answer.