Has anyone else had a play with Google's new Dataset Search? I found that some of the items from our site (ORDO) are showing up, but not all. I guess that's because Google is still crawling sources and eventually everything will show up? The missing examples pass the Google Structured Data Testing Tool.
I noticed that the items already there show as being from figshare.com, and wondered whether anything could or should be done to identify them as belonging to our institutional sites (e.g. ORDO).
I realise it's early days and that we may not be high on Google's to-do list, but it seems worth thinking about.
When I used the tool on one of the items in our figshare repository, it failed the test. It would be good to hear more from the developers about how well this is currently supported.
Currently we add the required schema.org markup only for records with the type "dataset", as this is what Google Dataset Search (GDS) is mostly interested in. We have worked closely with the Google team to ensure proper coverage, but if you can't find a dataset, please let us know and we'll take a look at it right away. We don't plan to have other item types indexed for the time being.
We will soon make some improvements so that the source shown for a dataset is the actual institution, not "figshare.com".
Let me know if I can provide further details!
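To make "schema.org markup" a little more concrete, here is a minimal Python sketch of the kind of `Dataset` JSON-LD a repository page can embed in a `<script type="application/ld+json">` element. It is not Figshare's implementation: the item dictionary and its keys (`item_type`, `title`, `description`, `url`, `doi`, `institution`) are hypothetical, and whether GDS uses `publisher` to decide which source to display is an assumption. It marks up only items whose type is in `INDEXED_TYPES`, mirroring the "dataset"-only behaviour described above, and names the institution rather than "figshare.com" as publisher.

```python
import json

# Hypothetical set of item types that receive schema.org Dataset markup.
# Per the thread, only "dataset" is marked up at present.
INDEXED_TYPES = {"dataset"}


def dataset_jsonld(item: dict):
    """Return schema.org Dataset JSON-LD for an item, or None if its type is not indexed.

    The item keys used here are hypothetical placeholders, not Figshare API field names.
    """
    if item["item_type"] not in INDEXED_TYPES:
        return None
    markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": item["title"],
        "description": item["description"],
        "url": item["url"],
        "identifier": "https://doi.org/" + item["doi"],
        # Assumption: naming the institution as publisher, instead of "figshare.com".
        "publisher": {"@type": "Organization", "name": item["institution"]},
    }
    return json.dumps(markup, indent=2)
```

In this sketch, extending coverage to other item types only means adding them to `INDEXED_TYPES`; the real work is checking that items of those types actually are data.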
Hi Tudor, thanks for the update.
For what it’s worth, I think there’s a case to be made for including some other types. We encourage researchers to think of a broad range of file types as ‘data’, and Figures, Media, Filesets and Code are all Figshare item types that could be published as ‘data’ supporting publications (I agree that Papers, Preprints, Theses etc. are worth excluding).
Discoverability of stuff in Figshare is something we promote a lot internally, and saying that it will be found by data searches like Google’s is a good selling point. If not everything relevant is found, the message becomes less compelling.
Google’s ‘About’ text talks about “Datasets and related data…”, if you need any help convincing them.
I'm concerned that only the type "dataset" is being indexed as data for GDS. Isn't the "dataset" content type scheduled to change to "tabular data"? If so, this is a problem, as it discriminates against data of other types. We have researchers with non-tabular datasets (e.g. file sets) that should also carry schema.org markup so they are indexed, as they are supporting data for various publications and useful to their communities of interest.
This dataset is a good example: it's scanning data, not in tabular format, and contains multiple datasets. (I just noticed that the author chose "figure" as the item type, which is incorrect; the data supports their figures, but the data isn't a figure...)
+1 to Dan's comment. We were writing at the same time.
Thank you for the feedback, really useful! We are already considering adding schema.org markup to other item types, but first we need to make sure this won't surface the wrong content to GDS; for example, we had to clean up a considerable number of "dataset" items before the initial launch of GDS (mostly on the free version of figshare.com). As we progress with this, we might include filesets and figures, for example, in the surfaced group.
Let me know if I can clarify any other aspects of this!
+1 to institutional branding rather than the generic "figshare.com" icon, which also leads to the non-customised version of the items rather than to the institutional portal. Has there been any progress on this?
+1 to surfacing other items in addition to "datasets", especially with the new "tabular data" restriction.
Another issue is that version-specific DOIs seem to surface. Is it possible to feed through the root DOI (without any version suffix) so that people are led to the most recent version? See here (or do a GDS search for "Alasdair Rae", note the spelling). The description is from .v1, and the link goes to .v1, but the current version is .v5.
Yes, indeed it seems that it would be more useful to link the base DOI, which would redirect to the latest version. I have added this to the list of improvements for the next version.
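A minimal Python sketch of the base-DOI idea, assuming (as in the .v1/.v5 example above) that versions are always expressed as a trailing `.v<number>` segment of the DOI. The DOI below is a made-up placeholder, and institutional DOIs may use a different prefix; the pattern only looks at the end of the string.

```python
import re

# Assumption: a version is always a final ".v<digits>" segment (case-insensitive, as DOIs are).
_VERSION_SUFFIX = re.compile(r"\.v\d+$", re.IGNORECASE)


def base_doi(doi: str) -> str:
    """Strip a trailing version suffix such as ".v5", leaving the base DOI."""
    return _VERSION_SUFFIX.sub("", doi)


print(base_doi("10.6084/m9.figshare.1234567.v1"))  # 10.6084/m9.figshare.1234567
```

A DOI without a suffix is returned unchanged, so the function can be applied to every record before it is written into the markup.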
Agree with Jen about sending out the base DOI.
Me too about the base DOI. Thanks!
Hi all, I want to revisit the decision to index only the Figshare item type "dataset" (which I believe will soon be relabeled "tabular data"?) as datasets in schema.org metadata. Google's documentation specifically points out that datasets can come in multiple formats and may include file sets and images. I'm in a position where I may change the majority of our published items to "dataset" if it means they will then be indexed as data.
Our current thinking is that we should remove the "fileset" type from Figshare altogether; this is because, unlike all other record types, "fileset" describes the structure (multiple files) rather than the content, and this creates a lot of confusion.
We are planning to apply this change this year, so we won't implement any new logic regarding filesets for GDS; changing all filesets to "dataset" or another appropriate type now seems the sensible option here.
That sounds like a good start, Adrian. Fileset has always been problematic since, as you mentioned, it's about structure, not content or file type.
I don't know how feasible or desirable this may be, but it might be a good idea to make the "dataset" item type separate from the item types that refer to format, since "dataset" is very ambiguous and type-neutral.
Please see recent communication around changes to item type.