{"id":1599,"date":"2013-05-05T12:55:54","date_gmt":"2013-05-05T10:55:54","guid":{"rendered":"http:\/\/sickel.net\/blogg\/?p=1599"},"modified":"2013-05-05T20:11:08","modified_gmt":"2013-05-05T18:11:08","slug":"automatic-datavalidation-in-postgres","status":"publish","type":"post","link":"http:\/\/sickel.net\/blogg\/?p=1599","title":{"rendered":"Automatic datavalidation in postgres"},"content":{"rendered":"<p>I have a <a href=http:\/\/sickel.net\/blogg\/?page_id=1538\">system<\/a> that continously logs various environmental parameters at home. Occasionally one of the sensors delivers some wrong readings and I would like to stop them from showing up in analyses and plots. To do this, I am inserting the data through a function that autmatically flags data that have too high or too low values &#8211; too high and low being defined for each physical sensor.<\/p>\n<p>The part of the database that helps me achieve this is defined as<br \/>\n<code>create table measure(<br \/>\n   id serial primary key,<br \/>\n   value float not null,<br \/>\n   sensorid integer references sensor(id),<br \/>\n   use boolean not null default true,<br \/>\n   timestamp datetime default now()<br \/>\n)<br \/>\n<\/code><br \/>\nWhen I am using data, I am reading them from the view measure_qa defined as<br \/>\n<code>create view measure_qa as<br \/>\nselect * from measure where use=true<br \/>\n<\/code><\/p>\n<p>Data about each physical sensor is stored in table sensor:<br \/>\n<code><br \/>\ncreate table sensor(<br \/>\nid serial primary key,<br \/>\ntypeid integer references type(id),<br \/>\nname varchar not null,<br \/>\nmaxvalue float not null,<br \/>\nminvalue float not null<br \/>\n)<\/code><br \/>\nand for each sensor type in table type:<br \/>\n<code><br \/>\ncreate table type(<br \/>\nid serial primary key,<br \/>\nname varchar,<br \/>\nunit varchar)<br \/>\n<\/code><\/p>\n<p>To insert data, I have defined the function addmeasurement as<br \/>\n<code><br \/>\nCREATE FUNCTION addmeasure(double precision, integer) RETURNS integer<br \/>\n    LANGUAGE plpgsql<br \/>\n    AS $_$ declare<br \/>\nsval float:=$1;<br \/>\nsid integer:=$2;<br \/>\nmx float;<br \/>\nmn float;<\/p>\n<p>begin<br \/>\nselect minvalue,maxvalue into mn,mx from sensor where id=$2;<br \/>\ninsert into measure(sensorid,value,use) values(sid,sval,mn<sval and mx >sval);<br \/>\nreturn 0;<br \/>\nend;<br \/>\n$_$;<br \/>\n<\/sval><\/code><\/p>\n<p>By calling addmeasure with the reading and the sensor id as parameters, it will check the allowed max and min values for the actual sensor and .set the use-field accordingly &#8211; if the value is within the allowed span, it is true, else it is false &#8211; and the value will not be used. It could also be possible to add in validation if the value has been fluctuating too much within a given timespan &#8211; I may implement that later on. In that case, I must consider how long time has gone since the last valid data point has been inserted and how much the data should be allowed to change per time unit.<\/p>\n<p>I could also have made the use-field a varchar field (or as a foreign key to a lookup table) &#8211; then I could have had e.g. &#8220;OK&#8221; as default value and the value for data that should be used later on, and I could have made a difference between data points that were automatically discarded and data points that I have manually marked as invalid.<\/p>\n<p>A important, but potentially difficult part is to set the max and min values right. They should trigger on as many error as possible without flagging valid data as invalid. It might be useful to know how the various sensors fails. E.g. I am using some ds18b20 temperature sensors. If they are not initiated properly, they may return either -127, +85 or 0 &deg;C. I live in a place where the outdoor temperatures may go below freezing for several month of the years, so for an outdoor sensor, I cannot automatically filter out 0&deg;C errors, so I set the limits to -50 \/ +60 which is far outside observed temperatures here but for the indoors sensors, (which never should go below 15 or above 45, I can set the limits to 2 and 40 &deg;C.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to stop bad data from propagating &#8211; automatically check what goes into a postgres database <a href=\"http:\/\/sickel.net\/blogg\/?p=1599\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[4],"tags":[],"class_list":["post-1599","post","type-post","status-publish","format-standard","hentry","category-data"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pnVtD-pN","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=\/wp\/v2\/posts\/1599","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1599"}],"version-history":[{"count":8,"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=\/wp\/v2\/posts\/1599\/revisions"}],"predecessor-version":[{"id":1603,"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=\/wp\/v2\/posts\/1599\/revisions\/1603"}],"wp:attachment":[{"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1599"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1599"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/sickel.net\/blogg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1599"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}