This thread is for discussing website features (including forum features) and tech stack for the New Community Website Project.
Tech stack suggestions
- Ruby on Rails or a lisp
- git or hg
- nginx (i like it better than apache. don't have experience with others)
- postgresql or sqlite (not mysql)
- Debian (over ubuntu; i don't have much experience with other linuxes)
- linode has been fine as my web host for curi.us (not rackspace)
- some sorta virtual machine or container setup for development environment (vmware, docker, vagrant, virtual box, idk what's best)
- Stripe (from reading patio11's marketing, i am under the impression that Stripe has the best tech of the payment processors)
- i don't have much experience with front end js libraries. idk what's best. if it was me i'd probably just use whatever Rails has as a default
more thoughts:
- high abstraction level web dev framework (like Rails), preferably with "convention over configuration" approach (good defaults; customize when you need to but stuff just works automatically when u don't)
- auto generate CRUD pages framework like https://activeadmin.info (which looks way better than Rails' built in scaffolding feature)
- standard markdown library (idk what but we'll want some ability to customize it, e.g. might want to display **bold** in the way curi.us does, and compatibility with lots of existing markdown editors and tools is good)
- payment processor library that abstracts over Stripe (so it's easier to switch if deplatformed)
Forum Features
Some of these are maybes, and some don't need to be in version 1. For stuff not in version 1 it can be good to have it in mind in advance so you design things in a way compatible with it.
- tagging parent posts, including multiple parents, similar to writing #1 here
- graph view showing all the comments with the parent/child links
- category tags on posts and comments, e.g. author
- search/filter views (e.g. you can see a personal blog for Joe by searching for top level posts by joe)
- smart filtering options like in a thread filter for only comments by particular users, in a date range, nested under a particular parent, or non-reply comments. and a mix of those. and for non-replies you should optionally be able to have self-replies be given the status of their parent. (so you can see my top level comments + my self reply chains to my own top level comments, but not my replies to other ppl including if i self-reply to a reply to someone else then that's treated as part of me replying to someone else)
- being able to do comment search stuff across multiple threads
- having good permalinks for some common searches that represent e.g. Joe's blog
- standard category tags, e.g. education, relationships, parenting, rationality
- user entered category tags? maybe, maybe not, idk
- subforums, which might just be category tags and allow a post to be in several
- subforums could include sections like serious discussion, "i want criticism", debate, "i want to discuss to a conclusion". not necessarily topics like econ or sci.
- DMs?
- RSS, email notifications, on site notifications
- integration with email newsletters like mailchimp so the archive "read on the web" pages for my newsletters would be on the website and then auto format and send that to mailchimp (or whoever else) to be emailed out
- rich text? maybe. would need to play nice with quoting. probably want to convert it to markdown and save it in the db as markdown so ppl who write in markdown can quote it, and limit the features to what has a markdown equivalent. i'm not sure how much users care about this or how much trouble it is. there are premade tools you can get. e.g. Rails 6 added a WYSIWYG rich text javascript text area as an included default thing.
- markdown
- post/comment preview
- anonymity features, e.g. a way to choose or generate an anonymous name that becomes your default in that one thread, and being able to pick different names for individual comments
- pay-to-read and pay-to-post sections (possibly most of the site being pay to post but publicly readable. undecided)
- patreon/subscribestar/locals type features including subscription tiers (first thought is just for CF rather than letting other ppl have their own subscribers/supporters but idk)
- gumroad type selling digital products
- post nesting view like reddit? definitely a linear chronological view like here but other views could be available too
- possibly main and side comments, or serious and casual comments, or some kinda distinction like that, so ppl can say minor things without it cluttering the page and taking equal attention. u have to be careful with this stuff tho. i might want to reply to a minor comment and have my comment be shown normally. this feature might be good only to use in specific extra serious threads, not in general, like use it for debates so the audience can comment but the main debaters stand out more
- nice quoting features that are user friendly enough for regular ppl
- text should either display with formatting visible (so you can copy/paste without losing formatting) *or*, probably better, have buttons like "show raw md" or "copy raw md to clipboard" available. though often you won't need that cuz u can have buttons like "add highlighted text to bottom of my message as a quote" or "quote this entire message in my message" (like the quote link here)
- integration with Ulysses or other markdown editors so you can write in them and post from them (Ulysses can connect to Ghost, Medium and Wordpress. maybe you can provide the same API that one of those does? and idk what APIs other editors can deal with. for curi.us i have a simple custom API for creating blog posts, marking comments as spam and a few other things and a command line ruby script that uses it and can open a markdown editor on my computer like Textmate or MacDown and use what i write. both of which are kinda bad btw. i'm trying VSCode atm which might be better)
- make it easy to post images. including whatever image is on my clipboard, not just via file upload. with curi.us i use ctrl-s as my Mac hotkey for screenshot a region to clipboard and then i have a globally hotkeyed script that will upload the image on the clipboard to my own ftp server and then write out a markdown image tag which points to the right url. posting images should be around that easy (press hotkey, select screen region, press second hotkey).
- maybe save ppl's text as they write it (only if logged in?) so they don't lose what they were writing from internet problems
- user accounts
- maybe some kinda software support to have an idea tree style discussion and collaboratively add nodes to the tree (just specifying parents for comments and having a graph view does something similar)
- maybe comment statuses you can set, e.g. refuted, focus point, comment/note/sidepoint, agreed on by debaters, low confidence. maybe these could be set separately per user.
need some UI mockups and focus, not just throw in all the features. don't want a bunch of clutter. just brainstorming. maybe we can get most of these in an elegant way and unify a few things.
- tip/donation for a specific post you liked
- something more than "tag stuff + chronological sorting within tags" to help ppl (esp me) organize their archives and help readers find stuff. automatic stuff nice but manual helps too
- exporting posts, threads, all results for a search (e.g. all my top level posts + non-reply comments including self-replies to my own non-replies) to md, html, pdf, epub, mobi, txt, rtf
- word counter
- reading time estimates at top of article
- built in RSVP reader (probably not but maybe setting to enable RSVP links by everything via spreeder or something, or a way to get RSVP-optimized pages that don't have extra words u wouldn't want showing up in ur RSVP reading like the sidebar or the post dates or the reply links)
- built in text-to-speech feature or some way to connect to ur OS's TTS
- what's ideal is if you can RSVP+TTS together in sync. Voice Dream Reader somewhat does this (not really intended but u can limit to one line of text visible at a time and set a large font size on a phone so u only get a few words at a time). i don't know anything else good at it.
- editing? on the one hand it's good if ppl can fix typos or add updates, particularly for posts that start a thread (which will sometimes be like an article or blog post kinda thing). on the other hand i don't want ppl significantly changing stuff or deleting what they said. saving version history is one approach, possibly combined with moderators forcing an older version to display by default when someone makes problematic changes
- some ppl may not like being disallowed from deleting their writing or account, and there are maybe some data privacy or copyright type issues. OTOH i basically want discussion comments to go into the permanent (archived, mirrored, etc) public record and not be ninja edited or ever withdrawn. withdrawing a discussion message is highly problematic once anyone reads it and starts thinking about it, let alone replies, tries to discuss it, quotes it, etc. i don't want authors to be able to take things away from readers and screw the readers over. i want to support readers and let them rely on content not disappearing.
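The "non-reply comments, with self-replies given the status of their parent" filter from the list above can be sketched roughly like this. It's only an illustration: the `Comment` class and function names are made up, and it assumes one parent per comment (multiple parents would need a rule for mixed cases).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comment:
    id: int
    author: str
    parent_id: Optional[int] = None  # None means it isn't a reply to any comment


def counts_as_non_reply(c: Comment, by_id: Dict[int, Comment]) -> bool:
    """True if the comment has no parent, or is a chain of self-replies
    that ends at a comment with no parent."""
    while c.parent_id is not None:
        parent = by_id[c.parent_id]
        if parent.author != c.author:
            return False  # somewhere up the chain this is a reply to someone else
        c = parent
    return True


def non_replies_by(author: str, comments: List[Comment]) -> List[int]:
    by_id = {c.id: c for c in comments}
    return [c.id for c in comments
            if c.author == author and counts_as_non_reply(c, by_id)]


thread = [
    Comment(1, "curi"),
    Comment(2, "curi", parent_id=1),  # self-reply to own top level comment
    Comment(3, "joe", parent_id=1),
    Comment(4, "curi", parent_id=3),  # reply to joe
    Comment(5, "curi", parent_id=4),  # self-reply, but part of replying to joe
]
print(non_replies_by("curi", thread))  # [1, 2]
```

Date range and "nested under a particular parent" filters would just be extra conditions in the same list comprehension.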
Storing quotes by reference
I have been thinking about whether it would be a good idea to capture and store quotes exclusively by reference. The display engine would dereference and generate the header ("At 12:01 AM on May 1, 2020, Joe Smith wrote:") and display the quotes in real time. I think this approach has advantages over storing quotes as text just like original material.
If the UI could be worked out reasonably, capture and storage by reference would eliminate common errors like misquotes, improper quote formatting, and mismatched quote headers.
Quotes could be hotlinked back to the original post for more context.
It would also be possible for an author to indicate they've changed their mind about something, and have that reflected in all quotes.
For example, suppose I write "minimum wage is good". That gets quoted (and requoted) and discussed, etc. say 100 times. A year later I change my mind, and say "minimum wage is bad". I could link that repudiation back to the original statement as an explicit repudiation, and all 100 places it was quoted could instantly have a note added that the author has repudiated and no longer holds that position.
The main downside I can see is if an original post gets deleted or altered it instantly affects all the downstream quotes as well.
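A minimal sketch of what storing by reference could look like, assuming a quote is saved as a post id plus character offsets and the header is generated at display time. All class and field names here are hypothetical, not an existing design.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Post:
    id: int
    author: str
    posted_at: datetime
    text: str
    repudiated_by: Optional[int] = None  # id of a later post retracting this one


@dataclass
class QuoteRef:
    post_id: int
    start: int  # character offsets into the source post's text
    end: int


def render_quote(ref: QuoteRef, posts: Dict[int, Post]) -> str:
    """Dereference the quote and build its header at display time."""
    src = posts[ref.post_id]
    lines = [
        f"At {src.posted_at:%Y-%m-%d %H:%M}, {src.author} wrote:",
        "> " + src.text[ref.start:ref.end],
    ]
    if src.repudiated_by is not None:
        lines.append(f"(The author has since repudiated this; see #{src.repudiated_by}.)")
    return "\n".join(lines)


text = "I think minimum wage is good."
posts = {1: Post(1, "Joe Smith", datetime(2020, 5, 1, 0, 1), text)}
start = text.index("minimum")
ref = QuoteRef(1, start, start + len("minimum wage is good"))
print(render_quote(ref, posts))
```

Because the header and the repudiation note come from the source post, every place that holds this `QuoteRef` picks up a later repudiation without being edited.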
#6 I'm more inclined to ban editing posts entirely than to set it up so edits automatically propagate to quotes... (But version history and/or some editing restrictions could work instead.)
I wouldn't mind if you could trigger something like a color in the margin, with blue = "i changed my mind", that encourages people seeing the quote to click on the source to read an update. But I think any kind of changing the text that's quoted is essentially editing the contents of someone else's post. So I'd want to limit what automatically propagates to subtle metadata only.
I think it's important to have exact control, to the character, over what you quote. setting start and end points could be OK in general but it's important to be able to use square bracket notes/paraphrases and ellipses too. also lots of quotes will be from linked websites elsewhere, videos or books.
I'd be OK with a feature where an author who used a blockquote could explicitly choose to update it to reflect a change to the original. E.g. if there were edits with version history, and your quote points to an older version, and you want to manually switch it to the new version, that's OK. idk if that feature would see much use though.
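Roughly, that could work like the sketch below: the quote keeps its own exact text and just remembers which version it came from, so edits to the source only change metadata (e.g. a margin marker) unless the quoter opts in. The names are invented for illustration and this assumes an append-only version history.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedPost:
    author: str
    versions: List[str] = field(default_factory=list)  # append-only history

    def edit(self, new_text: str) -> None:
        self.versions.append(new_text)


@dataclass
class PinnedQuote:
    source: VersionedPost
    version: int  # index into source.versions, fixed when the quote is made
    text: str     # exact characters chosen by the quoter (may include [notes] or ...)

    def source_changed(self) -> bool:
        """Metadata only: lets the UI show a marker next to the quote."""
        return self.version != len(self.source.versions) - 1

    def repin(self, new_text: str) -> None:
        """Explicit action by the quote's author to point at the latest version."""
        self.version = len(self.source.versions) - 1
        self.text = new_text


post = VersionedPost("joe", ["minimum wage is good"])
quote = PinnedQuote(post, 0, "minimum wage is good")
post.edit("minimum wage is bad")
print(quote.text, quote.source_changed())  # minimum wage is good True
```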
#2
> subforums could include sections like serious discussion, "i want criticism", debate, "i want to discuss to a conclusion". not necessarily topics like econ or sci.
Rather than subforums, another option is that these could be checkboxes that you fill in on each post.
In 2018, Elliot and I talked about ideas for an FI forum: google docs link.
- new threads with one or more parent threads and/or comments, so you can trace back a whole discussion hierarchy after sub-discussions branch off.
- a reading list type feature where u can save stuff in a queue to read later
> - a reading list type feature where u can save stuff in a queue to read later
Maybe people should just be encouraged to use pocket/instapaper/etc instead (or use read/unread in an RSS client or using email notifications). Some won't or will over-fill their Pocket with stuff from elsewhere on the web and never go through much of it. You can't save people from themselves very thoroughly though (just a bit here and there). Maybe having a read/unread (like new or unseen markers) feature on the site anyway is useful and then maybe being able to mark stuff as read again would be worthwhile as a way to do a site-specific queue.
#4 Voice Dream Reader is coming to OS X soon. I recently asked them about Big Sur functionality. They said:
> a full development of a Mac OS version is underway from our end.
- Tagging users (like @curi) to get their attention
It'd be nice if pages could auto-refresh frequently (or even better by push not pull, so ~instant) so that you could have a conversation at almost the same pace as IMs.
Tech stack
I have some familiarity with git, docker and postgres: they're all okay.
If we're going to use a lisp my recommendation would be clojure. It has a good default set of immutable data structures and nice syntax for them:
https://www.clojure.org/guides/learn/syntax
The repl works nicely and with suitable editor support you can send individual expressions to the repl without typing in the repl itself.
It has a variant that compiles to javascript: clojurescript. Clojurescript applications normally use libraries such as reagent that use react:
https://reagent-project.github.io/
You can do stuff like hot code reloading on the browser that works very reliably IME:
https://figwheel.org/
There is some relevant stuff that I haven't used.
There are facilities for sharing code between clojure and clojurescript:
https://clojure.org/guides/reader_conditionals
There are a couple of clojure web frameworks:
https://coastonclojure.com/
https://luminusweb.com/
I think it would be better to focus on having good features for public discussion rather than DMs.
#15 Coast says it's a full stack web framework. It's only 3 years old. idk how actively developed or well maintained. The author mentions Rails familiarity:
https://www.reddit.com/r/Clojure/comments/7mrqt7/coast_on_clojure_a_full_stack_framework_with_a/
> I’ll probably add validation using an existing validation library... I want something similar to rails, declarative and hopefully a data structure, I like spec but I think it might be overkill for validating data on its way from a form to a database...
Oh even more here:
https://coastonclojure.com/docs/about
> In my short web programming career, I've found two things that I really like: Clojure and Ruby on Rails. This is my attempt to put the two together in a tasteful, clojure-like way.
That is a good sign.
Luminus says "micro-framework based on a set of lightweight libraries" which I consider a bad sign (want to get more stuff done for us by the library! not make most stuff ourselves).
https://luminusweb.com/docs/database.html
they have db migrations and an API for dealing with SQL, but:
> HugSQL takes the approach similar to HTML templating for writing SQL queries. The queries are written using plain SQL, and the dynamic parameters are specified using Clojure keyword syntax. HugSQL will use the SQL templates to automatically generate the functions for interacting with the database.
oh god no, fuck that. writing raw sql templates is a bad idea.
i noticed Coast has something more high level than HTML templates with embedded clojure bits, which is good. that's actually better than Rails for which you write raw html with ruby mixed in (definitely not raw sql though, which i think is way worse than raw html). Luminus HTML templates sound like rails: raw HTML with code inserted here and there.
so Coast looks pretty good in concept. if it actually works well and is maintained/developed etc it could be good. ecosystem/libraries are my main concern re clojure.
here's the Coast intro post:
https://medium.com/hackernoon/coast-on-clojure-an-easy-full-stack-framework-fb7f0987e110
Coast's abstraction layer on SQL looks significantly lower level and more primitive compared to Rails' ActiveRecord, but maybe it's OK: https://coastonclojure.com/docs/queries
oh btw what is ur opinion about test driven development, turpentine?
Those commit dates don't look very good. Looks like you'll have to reinvent the wheel a lot more with clojure vs. rails. Hard for me to say if that's worth it for someone else with different language and framework familiarity (cuz getting used to a new framework is itself a significant cost, and cuz i don't know how much value turpentine will get out of using some of lisp's additional power features).
#18 That pic is for Coast: https://github.com/coast-framework/coast
> reinvent the wheel
Not reinvent. Recreate. Can still copy ideas from elsewhere no problem.
Clojure can use all Java libraries, right? How common or convenient is that? I'm guessing it's a bad option for web dev framework but maybe could work if needed for payments API, markdown processor, various components.
> It has a variant that compiles to javascript: clojurescript. Clojurescript applications normally use libraries such as reagent that use react:
> https://reagent-project.github.io/
> You can do stuff like hot code reloading on the browser that works very reliably IME:
> [...]
Does anyone know if server-side rendering is still required for rich client-side UIs to be scraped properly? I know vue.js has some support for server-side rendering, though I'm not sure if that's still required. I think google has been loading client-side JS to ensure dynamic content has loaded, not sure about other search providers.
Criticism of Clojure: http://xahlee.info/comp/clojure_is_hard_to_learn.html
#17 TDD is okay if you put some thought into the tests. And writing code without good test coverage is bad.
Clojure is designed to make it easy to use java code from clojure:
https://clojure.org/reference/java_interop
turpentine, what do you think are the main goals of language and library choices for this project? What do you think are the benefits and concerns/risks of Clojure? Of Rails?
If I reply to comment #25 (for example), it should be easy to know who wrote #25 while looking at my post. Maybe it should show the author name or maybe just visible by hovering. It'd be nice if you could hover and also see the title and maybe entire contents of any post on our forum that's being replied to.
community website: authentication
I didn't see any previous notes on authentication -- if there are payment features then some kind of user account will be needed.
There are two mainstream approaches that come to mind:
1. user/pass
2. oauth/openid (like 'log-in with X' buttons)
(1) has some downsides like: somewhat annoying to manage multiple accounts, somewhat cumbersome/unsafe to do community accounts naively, potential for leaking usernames/pws if the server gets compromised. On the whole these things aren't that big a deal and there are ways to solve those problems.
(2) has a decisive downside of requiring a 3rd party to act as authentication gatekeeper.
There's a relatively new option for authentication that might be worth considering: SQRL. Even if it's not appropriate for this project it's an interesting and elegant authentication system so might be of interest to some ppl.
The idea is: you run an app like 2FA (on your desktop or phone or both). To log-in you click a QR code which triggers the SQRL app (or you scan the QR code with the app/camera). You might need to unlock your SQRL client at this point and confirm the site you're logging in to. After that: you're just logged in (the SQRL client authenticates the session in the bg in <<1s). User registration is the same process; when an SQRL account that the server hasn't seen before logs in, the server adds it to the user database. (you set username, email, etc after that).
The main differences that could be decisive IMO:
* no server side secrets to leak, just public keys (which are known anyway)
* easy creation of alt accounts (SQRL clients support this natively I think)
* one std way to do authentication that works everywhere
* accounts can't be compromised unless the user's root private key is compromised (no credential stuffing)
* long term simplicity
SQRL might be a better choice if parts of the server-side stack are written in different languages. I could see multiple languages making sense if a microservice or serverless (functions-as-a-service) paradigm is used. That might be over-complicating things, though.
The whole website and forum should play nice on mobile – still look nice and be a reasonable experience to use. Need some responsive web design stuff that looks at the available pixels in the browser or screen size or whatever and can adjust a few things. (I don't think native iOS or Android apps are needed. Nor Mac or Windows. Just a website should be fine IMO.)
#27 I'd rather not rely on Google, Twitter, Facebook or Apple for authentication. I don't know if there's some way better option than them for oauth stuff.
Just storing salted hashes of passwords sounds OK to me.
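For reference, a minimal salted-hash sketch using only the Python standard library. PBKDF2 and the iteration count are assumptions picked for illustration; a real deployment might prefer whatever the chosen web framework ships with.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune for the actual server


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Store both; never store the password itself."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))    # False
```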
For analytics maybe avoiding Google Analytics would be good cuz they are semi spyware. Maybe should just use it anyway.
If something else, note: having one database row per page view can get very unwieldy (huge db table) if we start getting decent traffic. I've seen problems with that kinda thing at two companies (out of a pretty small sample size).
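One common way around the row-per-view problem is to keep a counter per page per day instead. A rough sketch with SQLite (table and column names are made up; the same idea works in postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE page_view_counts (
           path  TEXT NOT NULL,
           day   TEXT NOT NULL,
           views INTEGER NOT NULL DEFAULT 0,
           PRIMARY KEY (path, day)
       )"""
)


def record_view(conn: sqlite3.Connection, path: str, day: str) -> None:
    """Bump the counter for (path, day), creating the row on first view."""
    conn.execute(
        "INSERT OR IGNORE INTO page_view_counts (path, day, views) VALUES (?, ?, 0)",
        (path, day),
    )
    conn.execute(
        "UPDATE page_view_counts SET views = views + 1 WHERE path = ? AND day = ?",
        (path, day),
    )


for _ in range(3):
    record_view(conn, "/thread/1", "2020-11-01")
record_view(conn, "/thread/2", "2020-11-01")

rows = conn.execute("SELECT path, views FROM page_view_counts ORDER BY path").fetchall()
print(rows)  # [('/thread/1', 3), ('/thread/2', 1)]
```

The table then grows with (pages × days) rather than with traffic, at the cost of losing per-visit detail.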
From a tech perspective idk if posts and comments should be separate db tables. Maybe they're all posts but some don't have a parent and some have a flag saying they're a top level post. (I think top level posts should be able to have one or more parents – you can branch a thread into a separate thread).
My four subforum idea (main/detail/meta/other) could potentially be hidden posts which serve as root nodes (rather than category tags, or do both): parents to all top level posts in that subforum. And have their own hidden parent that is the single, ultimate root node. This would make all posts be nodes in a single, unified graph. (Graph not tree because I think posts should be able to have multiple parents. E.g. you can say you're replying to two people at once.) Having all posts organized in a nice graph could help enable elegant features for viewing sections of the graph and stuff.
Here's some indication of what I have in mind:
https://my.mindnode.com/SA4WneY4jvsqk9mQmx9eWb4tsLwcraJGqnXRxBdB
All articles are part of this graph, not a separate feature.
Then the website has various features for interacting with the graph, such as:
- subforum view: view based on any node (or set of nodes, e.g. Main + Details) and see its children as a linear list of threads in a sort order (chronological by most recent node anywhere in the subtree is probably the default, but you could have other sorts like when the root of the subtree was created or some algorithm that represents recent activity level or by total number of nodes or words in those nodes).
- blog view: view top level posts by an author as a blog (authors could be added to the graph and treated as additional parent nodes, but i think that'd be confusing rather than elegant, and authors should just be a kind of category tag or even just a special case that every node has a user that owns it so no category is even needed, though for elegance probably do want categories for every author). but you can do this with any category, e.g. "physics", not just author names. a top level post means that the root node is its grandparent. and you can include, or not, some additional lower level posts like self-replies by the author of the top level post, to the top level post, that are over 300 words. and the whole thing can be filtered, e.g. curi's top level posts over 500 words and only in the Main subforum. and you could also change what is viewed as a top level post, e.g. i want to see curi's posts that are top level under the beating bottleneck's topic (in other words, all his comments in that thread which aren't replies to other ppl in the thread). and you could do that without specifying author, so see all the top level posts in a thread (or subthread), using the blog view.
- thread view: view all posts in a thread, as a linear list or with comment nesting
- tree/graph view: view some posts as a graph (that's usually similar to a tree). maybe just title or first words shows unless you click to expand, and there'd be an expand all button too.
basically the point is make several views which can then be pointed at different parts of the graph and with different filters.
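To make "views pointed at parts of the graph" a bit more concrete, here's a small sketch of two of them. The `Node` shape, the integer timestamps and the function names are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    id: str
    author: str
    created: int  # stand-in for a timestamp
    children: List[str] = field(default_factory=list)


def subgraph(root_id: str, nodes: Dict[str, Node]) -> Set[str]:
    """All node ids reachable from root_id, inclusive (shared children visited once)."""
    seen, stack = set(), [root_id]
    while stack:
        nid = stack.pop()
        if nid not in seen:
            seen.add(nid)
            stack.extend(nodes[nid].children)
    return seen


def subforum_view(root_id: str, nodes: Dict[str, Node]) -> List[str]:
    """Children of root_id as threads, most recently active subtree first."""
    def last_activity(thread_id: str) -> int:
        return max(nodes[n].created for n in subgraph(thread_id, nodes))
    return sorted(nodes[root_id].children, key=last_activity, reverse=True)


def blog_view(root_id: str, nodes: Dict[str, Node], author: str) -> List[str]:
    """Posts directly under root_id by one author, newest first."""
    mine = [c for c in nodes[root_id].children if nodes[c].author == author]
    return sorted(mine, key=lambda c: nodes[c].created, reverse=True)


nodes = {
    "main": Node("main", "system", 0, ["t1", "t2"]),
    "t1": Node("t1", "curi", 1, ["c1"]),
    "t2": Node("t2", "joe", 2),
    "c1": Node("c1", "joe", 3),  # newest comment, bumps t1
}
print(subforum_view("main", nodes))      # ['t1', 't2']
print(blog_view("main", nodes, "curi"))  # ['t1']
```

Both functions take any node as the root, so the same code serves a subforum, a single thread, or a "series" node somewhere deeper in the graph.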
does that sound powerful and elegant and make sense?
#32 Then I could make a thread called "curi's series of articles on physics" and post only articles on physics as the top level posts by me, and then i could share the link which uses blog or subforum view (i can imagine either being nice), pointed at that thread as the root (rather than at e.g. Main), and with a search filter specified (only my posts and maybe a few other options).
Or I could write 5 comments in a row that are related, then link to a view on just that little part of the overall graph, starting from the first of my comments, so there is a permalink to just my little mini series, kinda like a tweet storm.
So several nice, generic views related to common concepts like "display a subforum", "display a series of posts like a blog" (so you see the full posts like a blog homepage, possibly with the text past the first paragraph collapsed), "display a list of articles" (might be same as subforum or not), display a thread (with all the comments), etc., plus filtering options, and make those things elegant and powerful enough to give good access to the underlying single graph of post nodes.
My thinking: The graph is directed (parent/child not symmetric relationships) and acyclic (you can pick any parent of a node, then any parent of that, and keep repeating and you will get to the root node). All nodes have at least one parent (except the root). It looks similar to a tree except that nodes may have multiple parents.
You can use the same underlying graph and nodes for multiple purposes: forum discussion, blogging, making a list, or creating and discussing an idea tree, b/c all of those can be represented as graphs. So you can just create a subgraph for whatever you want, anywhere in the graph, and it can all use the same underlying representation. Nodes can be idea tree nodes, blog posts, comments, subforums, open threads – it's all the same underneath (though we'd privilege some things, e.g. users cannot connect their nodes to the root node, only to the four main subforum nodes under it. that's probably going to be easier than getting rid of the "Other" subforum and having 3 main subforums + anything else under the root is viewed as "Other").
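A minimal sketch of those constraints, assuming posts can only be attached to nodes that already exist and parents never change afterwards (which is what keeps it acyclic). The class and ids are hypothetical.

```python
from typing import Dict, List, Set

ROOT = "root"
SUBFORUMS = ["main", "detail", "meta", "other"]


class PostGraph:
    def __init__(self) -> None:
        # Maps each node id to its list of parent ids.
        self.parents: Dict[str, List[str]] = {ROOT: []}
        for name in SUBFORUMS:
            self.parents[name] = [ROOT]

    def add_post(self, post_id: str, parents: List[str]) -> None:
        if post_id in self.parents:
            raise ValueError("duplicate id")
        if not parents:
            raise ValueError("every post needs at least one parent")
        if ROOT in parents:
            raise ValueError("users cannot attach posts directly to the root")
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise ValueError(f"unknown parents: {missing}")
        # A new node only points at existing nodes, so no cycle can form.
        self.parents[post_id] = list(parents)

    def ancestors(self, post_id: str) -> Set[str]:
        seen: Set[str] = set()
        stack = list(self.parents[post_id])
        while stack:
            nid = stack.pop()
            if nid not in seen:
                seen.add(nid)
                stack.extend(self.parents[nid])
        return seen


g = PostGraph()
g.add_post("thread-1", ["main"])
g.add_post("reply-1", ["thread-1"])
g.add_post("reply-2", ["thread-1", "reply-1"])  # replying to two posts at once
print(sorted(g.ancestors("reply-2")))  # ['main', 'reply-1', 'root', 'thread-1']
```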
Are people understanding what I'm talking about?
#34 So technically you could start a blog 20 levels deep in some thread (make a post/node called "Joe's blog" and then post your blog posts under that parent, and they'd have full support for comments nested under them), and have a link to the blog view pointed at that "Joe's blog" node and its subgraph. But don't do that cuz it'll keep bumping the thread it's in on some standard views. Stuff should be put in appropriate places to keep organized.
Mixing levels can cause issues. If I make "curi's posts about Paths Forward" as a post in the Main subforum, then people browsing the main subforum will see that rather than any of the individual posts within it. And when someone comments on any of the paths forward posts, then it bumps "curi's posts about Paths Forward" in the Main subforum as a whole rather than the specific post (but you can click through, point subforum view at my PF posts parent node, and then see each article in it like a thread with the most recently commented on ones on top). But the posts within it won't show up in the Main subforum.
There are a few ways to handle this. One is marking posts like "this is appropriate for showing as a top level forum thread". Or marking "this is an organizational node; don't display in forum view; display its children instead".
idk, would need to think it through more. cuz we do want to have some kinda normal-ish forum view implemented on the underlying graph and the underlying graph is most useful if ppl actually use it, and nest things in various ways, rather than sticking to only doing what a regular forum implementation would allow.
community website: markdown notes
One markdown feature we should add: when you copy text it should automatically have appropriate `>`s added so quoting is the default.
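The transformation itself is tiny. On the site it would presumably run in the browser on the copy event; the Python below just shows the text handling, with a made-up function name.

```python
def as_markdown_quote(text: str) -> str:
    """Prefix every line with '> ' so pasted text is already a blockquote."""
    return "\n".join("> " + line if line else ">" for line in text.splitlines())


print(as_markdown_quote("first line\n\n> nested quote"))
# > first line
# >
# > > nested quote
```

Text that was already a quote just gains another level, so nested quoting falls out for free.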
----
> - standard markdown library (idk what but we'll want some ability to customize it, e.g. might want to display **bold** in the way curi.us does, and compatibility with lots of existing markdown editors and tools is good)
I looked at rendering markdown like it is on curi.us recently. I think we'll need to write an extension for whichever markdown renderer we use (note: this could be entirely client side).
One thing I tried that didn't work was using CSS to add content before elements. I managed to get quotes to show `>` before every line, but they weren't selectable (and thus not copy-paste-able). if they could be made select-able that would make things a lot easier b/c adding markup like `>` and `*` would be trivial.
With the python framework I was using, the extensions I shortlisted as useful were:
- nl2br -- newlines are treated as hard line-breaks. curi.us, StackOverflow, and GitHub do this.
- fenced code blocks -- code blocks can be defined with triple backticks (plus allows for syntax highlighting integration)
- code hilite -- syntax highlighting
- footnotes
- tables
- toc
- extra -- a commonly included extension that has fenced code blocks, footnotes, tables, among others.
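For anyone who hasn't used it, enabling those in python-markdown looks roughly like this. The extension names are the ones the library registers; the `render` wrapper is just a placeholder name, and the import is done lazily because `markdown` is a third-party package.

```python
# Extension names as registered by Python-Markdown.
EXTENSIONS = [
    "extra",       # bundles fenced_code, footnotes and tables, among others
    "nl2br",       # newlines become hard line breaks
    "codehilite",  # syntax highlighting for code blocks
    "toc",         # table of contents
]


def render(text: str) -> str:
    """Render markdown to HTML with the shortlisted extensions enabled."""
    import markdown  # third-party: pip install markdown
    return markdown.markdown(text, extensions=EXTENSIONS)
```

Since `extra` already pulls in fenced code blocks, footnotes and tables, they don't need to be listed separately.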
some docs on writing extensions for python-markdown -- might be useful if anyone wants an idea of what is involved. I've done custom extensions for YAML before so have some guesses about how to go about it.
----
> - rich text? maybe. would need to play nice with quoting. probably want to convert it to markdown and save it in the db as markdown so ppl who write in markdown can quote it, and limit the features to what has a markdown equivalent
There are lots of options for rich-text-over-markdown that are basically plug and play, e.g.
- JS: Toast UI Editor (looks neat, includes extensions like a chart rendering plugin), prosemirror
- react: react-md-editor, react-mde (demo), react+prosemirror - this powers outline
----
> - integration with Ulysses or other markdown editors so you can write in them and post from them
I had a look at Ulysses; it looks like there are 4 integrations and the only useful one atm would be custom wordpress (because you get to set the end-point and login yourself). I don't know what the login process would be like but I'm sure it's shimmable, mb a pain tho.
I'd also like something like this. Particularly for Standard Notes or MWeb.
The bigger sticking point I thought of was: how to do replies? Here's one possible flow which is pretty basic and could be a bit nicer with URI support from Ulysses or other apps.
- click 'reply in external editor'
- user downloads .md file with metadata prepopulated in yaml frontmatter (or another format, but yaml frontmatter is popular and well supported in software). the frontmatter can have stuff like the thread, which comments/posts are being replied to, etc.
- user opens .md file in editor, copies in quotes etc, writes their post, saves file
- if posting integration is enabled then it's a one-click *post* and done
- otherwise drag and drop file back on to forum site (anywhere); the metadata in the frontmatter is enough to know where it goes. ideally a preview is shown with the author name, etc before it's permanent. (in case the person is using an alt, etc).
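A rough sketch of the download/upload ends of that flow, in Python. The field names (`thread`, `parents`, `author`) and both function names are made up for illustration, and the parser only handles flat `key: value` lines (a small subset of YAML), so a real version would want a proper YAML library:

```python
def build_reply_file(thread_id, parent_ids, author):
    """Return the text of a .md file with prepopulated frontmatter."""
    lines = [
        "---",
        f"thread: {thread_id}",
        f"parents: {', '.join(str(p) for p in parent_ids)}",
        f"author: {author}",
        "---",
        "",
        "",
    ]
    return "\n".join(lines)


def parse_reply_file(text):
    """Split an uploaded file into (metadata dict, post body).

    Only flat `key: value` lines are understood; this is an assumption
    made to keep the sketch dependency-free.
    """
    if not text.startswith("---\n"):
        raise ValueError("missing frontmatter")
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:
        raise ValueError("unterminated frontmatter")
    meta = {}
    for line in header.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # parents is a comma-separated list of post ids
    meta["parents"] = [
        int(p) for p in meta.get("parents", "").split(",") if p.strip()
    ]
    return meta, body.lstrip("\n")
```

The parsed metadata is what the preview step would show (author, thread, parents) before anything is made permanent.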
> - make it easy to post images. including whatever image is on my clipboard, not just via file upload.
GitHub has this feature (for issues, PRs, etc). I think it'd be easy to replicate.
JS: listen for paste and check mime-type (included with clipboard data) => insert placeholder text to textfield e.g. `![... uploading image j9384593 ...]()` => if pasted file has mime-type of image/{png,jpg,gif,etc} => push to API /example/upload/image => get return URL => drop in to text field replacing `![... uploading image j9384593 ...]()` or whatever gets inserted. We can use S3 or similar as a place to store images; I have cloudformation snippets to set that up with HTTPS etc in AWS in a few minutes. Then you just use the AWS SDK in whichever backend thing gets called.
----
while looking for rich markdown editors I found outline. It's an open-source 'knowledge base' -- basically a markdown-documentation authoring/management system. it has 1-2 dozen integrations, including mindmeister (a mindmapping tool, tho curi's review said "web app, iffy UI, no svg export"). it might be worth looking at bc something like this might be easier to modify to build the community site than making it all from scratch. I wouldn't bet on finding something that's good enough to be worth modifying, but it could save lots of time / effort if we did find something. (particularly server mgmt and deployment design/impl comes to mind as a possibly big time-save).
outline's default production stack is node.js, postgres, redis, S3, and nginx.
----
I also found outline.com while wondering why getoutline.com didn't have it. turns out it's sorta like archive.is but with annotations support. e.g. https://outline.com/8gMRNR
It's somehow connected to https://web.hypothes.is/ which looks like an attempt to make an app for critical thinking or something. their github org description is
> We build software to enable the annotation of the web.
#35 one way to handle "what goes in main subforum?" is by parent/child relationships. specifically:
- the node "curi's posts about Paths Forward" does NOT have main as a parent. it'd have the root as its parent or maybe better something else like an "uncategorized" node that has root as parent.
- every article nested under the "curi's posts about Paths Forward" node would have (at least) two parents. it'd have "curi's posts about Paths Forward" as a parent AND have Main as a parent to indicate it should show up as a thread in Main.
this is the same idea i used earlier when i did thread branching in my mindnode illustration. if you branch a thread to a new top level thread, you can give it Main as a parent as well as it having one or more posts from the discussion as parents which it's replying to.
that raises a question then: should it show up in multiple places? or put another way: should it NOT show up in some places? or only in a minimal way? cuz if i'm branching something to a new thread in Main, then i maybe don't want that entire discussion also to be nested in the prior thread, particularly not in multiple places. in general i might say "my comment is a reply to both X and Y" but generally not want someone to see my comment twice at once on screen. one potential way to handle is with two types of connections between nodes: visible and invisible. what is the point of invisible? would display in some views but not most? that doesn't sound right. what we really want is sometimes instead of displaying a child node, let alone entire subgraph, we just display something really small linking to it. so you can see the connection and find it but it isn't expanded. so the question is how do we know when to expand a child or not? how can that be done in a smart way that covers all cases? do users need some control over that so they can specify which connections are important (should be expanded in most views) and which are not (should be collapsed in most views)?
#31
> From a tech perspective idk if posts and comments should be separate db tables.
I think they should be the same thing. Or at least their content should be. Like all content is in the Content table and can have flags like is_post, is_reply, etc. (replies should be able to be top-level posts too, IMO; this fits with your hidden-post/sub-forum idea). Metadata for Content/Post/Reply/etc can be stored in separate tables.
> Graph not tree
I agree
#32
> Here's some indication of what I have in mind:
> https://my.mindnode.com/SA4WneY4jvsqk9mQmx9eWb4tsLwcraJGqnXRxBdB
That looks good to me. One extra thing to include mb is replying/linking across category, tho nbd.
It would be nice to have user defined stuff that's like a hidden category, like "curi's TOC posts" which is a parent of multiple relevant posts. That way someone can make a mini blog-series/book/playlist.
Maybe ppl should be able to create those lists of other ppl's posts too. Then again, isn't that roughly equivalent to a post only containing a markdown list of links to posts?
Alternatively, those user-maintained lists could be children of the posts they want to link to; that fits with the existing model/idea, and provided someone can edit and add additional parents later it'd be fit for purpose. It also means that any 'list the things replying to this post' feature would show those user-maintained list posts.
> basically the point is make several views which can then be pointed at different parts of the graph and with different filters.
> does that sound powerful and elegant and make sense?
yes and yes and yes. I have thought about the same sort of thing, particularly a rich interface that animates between different views (I think this would actually be quite easy to do with CSS animations, my guess is the biggest issue would be calculating layouts to work nicely and consistently).
> - click 'reply in external editor'
> - user downloads .md file with metadata prepopulated in yaml frontmatter (or another format, but yaml frontmatter is popular and well supported in software). the frontmatter can have stuff like the thread, which comments/posts are being replied to, etc.
> - user opens .md file in editor, copies in quotes etc, writes their post, saves file
> - if posting integration is enabled then it's a one-click *post* and done
> - otherwise drag and drop file back on to forum site (anywhere); the metadata in the frontmatter is enough to know where it goes. ideally a preview is shown with the author name, etc before it's permanent. (in case the person is using an alt, etc).
This has similarities to how I do curi.us blog posts.
I have an executable ruby script named "blog" that I use on the command line. It can list all my posts so I can find the right ID for editing.
It has options to create a new post or edit any post ID. There is an old and new way.
Old is it opens Textmate, with appropriate front matter (for new post) or front matter + existing post (for editing). I type in Textmate, then close the window and it automatically sends it back to my blog. The command line script just waits the whole time the textmate window is open.
New process i've used with lightpaper and now macdown (lightpaper isn't maintained so i switched, but macdown isn't very good and i still need to find a better light weight markdown editor. i actually liked lightpaper. RIP) is it writes a file to disk (to create or edit) and then tells my mac to open the file in my editor. after i save it, i have to run a second command to send the updated file back to the server.
I'm not sure if directly opening the editor and having the script wait for response when closing the window is possible with most editors. Maybe only advanced ones. I didn't figure out how to do it with the markdown editors I was trying. I think it only works with textmate b/c it has a command line executable (mate) instead of having to do something like this:
`open -a MacDown #{path}`
"open" is a mac shell command that tells the OS or finder to open something in the normal way. for a file it basically does the same thing as double clicking it in the finder would.
whereas my ruby for opening textmate is:
`mate -w < #{tf.path}`
the important thing is the -w (wait) option that makes the shell wait for data to come back instead of just opening it and moving on.
maybe some other power editors like vscode or atom could do something similar. i started using vscode recently (textmate is old and not maintained well) and i know it has a "code" command. ah i just checked the docs and yes vscode can do the same thing. it has a -w option to wait.
this is too techie for most ppl but is potentially useful for some. and the same thing can easily work with replies. just have to generate a new post with the front matter/metadata saying what the parent(s) is.
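For anyone who wants to try the "wait for the editor" flow, here's a minimal Python sketch of the same idea. It isn't the ruby script described above: the function name is hypothetical, and it passes the file path as an argument rather than piping it in on stdin. The only thing it relies on is an editor command that blocks until the window is closed, like `mate -w` or `code -w`:

```python
import os
import subprocess
import tempfile


def edit_in_external_editor(initial_text, editor_cmd):
    """Write text to a temp file, block until the editor exits, return the result.

    editor_cmd is a list such as ["code", "-w"] or ["mate", "-w"]; the wait
    flag is what makes the call block until the window is closed.
    """
    fd, path = tempfile.mkstemp(suffix=".md")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(initial_text)
        # Blocks here until the editor process returns.
        subprocess.run(editor_cmd + [path], check=True)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```

The returned text (front matter plus whatever was typed) is what would get sent back to the server.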
> The graph is directed (parent/child not symmetric relationships) and acyclic
with versioning of content this shouldn't be an issue (because you reply to a *version* of a post specifically), but if ppl can edit in parents later then I think it's worth-while *letting* them make cycles. Editing is useful for e.g. maintaining an up-to-date list. It's safe if there are good features around showing history and edits. (mb with a diff-like view by default). alternatively there could be some rule like 'can't reply to posts with a timestamp later than this post's original timestamp'.
if we try and prevent cycles (enforcing that would require traversing the graph somewhat) then that could be a DoS vector (b/c the graph could get complex), and we don't need to enforce it to prevent silly UI stuff. For the UI: it's easy to check for cycles as you're traversing the graph, and the client needs to pull down all that info anyway. so it's not really any more taxing on the server than it would be otherwise. Then the client can just indicate there's a loop and not do silly things rendering an infinite amount of comments or whatever.
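A minimal sketch of that traversal-time check, assuming the client already has a `children` mapping (node id to list of child ids) pulled down from the server. The function name and tuple shape are just for illustration:

```python
def linearize(children, root):
    """Depth-first walk returning (depth, node_id, is_loop) tuples.

    A node that is already on the current path is reported with
    is_loop=True and is not expanded again, so a cycle can't cause
    infinite rendering.
    """
    out = []
    path = set()  # nodes on the path from root to the current node

    def visit(node, depth):
        if node in path:
            out.append((depth, node, True))
            return
        out.append((depth, node, False))
        path.add(node)
        for child in children.get(node, []):
            visit(child, depth + 1)
        path.remove(node)

    visit(root, 0)
    return out
```

The UI can then render the `is_loop` entries as a small "loops back to #N" marker. For very deep graphs the recursion would need to be replaced with an explicit stack.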
> One markdown feature we should add: when you copy text it should automatically have appropriate `>`s added so quoting is the default.
I really dislike screwing with copy/paste by default. I think it'll cause problems. Sometimes ppl wanna copy something and use it for a different purpose, e.g. an IM or tweet, and don't want the ">". Maybe a separate hotkey for copy+quote but other workflows are better a lot of the time, e.g. "i want to reply to this" -> puts the whole thing with quotes in my textarea for posting and then i can delete parts i want to trim.
it reminds me of how copy/paste from kindle and ibooks is broken. yes the extra stuff they add to my clipboard is useful sometimes. but it's my clipboard and overall i kinda don't think it should even be possible for apps to control my clipboard, break default functionality, and i as the user have no simple way to override and get regular OS default behavior back.
> It would be nice to have user defined stuff that's like a hidden category, like "curi's TOC posts" which is a parent of multiple relevant posts. That way someone can make a mini blog-series/book/playlist.
oh hmm, mb ppl should be able to specify children of a post instead of just parents. that way you can go back later and add a categorization parent like that on top of existing posts (including someone else's existing posts). it'd need to stay out of the way in some contexts/views though. you can run into people creating cycles though. and it takes away ppl's control over what they are replying to (parents of their post). i do see a clear, non-problematic use case in adding organizational stuff, though tags could do that too.
> Alternatively, those user-maintained lists could be children of the posts they want to link to;
"i (this node) am a label/grouping for all my parents" seems kinda contrary to the meaning of the parent/child relationship, so maybe problematic conceptually.
db tables could be:
- nodes table (id, text, author, etc)
- relationships table (parent_node_id, child_node_id, created_at)
is that a good system or is there a better way to do a graph with sql tables? or does using something other than sql make sense for a graph?
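To make the question concrete, here's a sketch of those two tables in SQLite (via Python's built-in `sqlite3`), plus one way to fetch everything nested under a node with a recursive query. The column types, the composite primary key and the `descendants` helper are assumptions for illustration, not a settled schema:

```python
import sqlite3


def create_schema(conn):
    conn.executescript("""
    CREATE TABLE nodes (
        id     INTEGER PRIMARY KEY,
        author TEXT NOT NULL,
        text   TEXT NOT NULL
    );
    CREATE TABLE relationships (
        parent_node_id INTEGER NOT NULL REFERENCES nodes(id),
        child_node_id  INTEGER NOT NULL REFERENCES nodes(id),
        created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (parent_node_id, child_node_id)
    );
    """)


def descendants(conn, root_id):
    """Ids of every node reachable from root_id via parent -> child edges.

    UNION (rather than UNION ALL) drops rows already seen, so a node with
    several parents is returned once and a cycle can't loop forever.
    """
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION
            SELECT r.child_node_id
            FROM relationships r JOIN sub ON r.parent_node_id = sub.id
        )
        SELECT id FROM sub WHERE id != ? ORDER BY id
    """, (root_id, root_id))
    return [row[0] for row in rows]
```

This only shows that the graph fits in two tables and can be walked in one query; it says nothing about how it performs on a large graph.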
#35
> bumping the thread
IDK if this needs to be an issue for *our* forum. It's an issue elsewhere because recency is a big thing right? (frontpage, etc)
But that's not really how I use curi.us or FIGG. I keep track of topics and look through all new posts, disregarding the stuff I am not interested in. When a new comment is posted to a curi.us thread it only really 'bumps' it to http://curi.us/comments/recent - not like the front-page.
> But don't do that cuz it'll keep bumping the thread its in on some standard views. Stuff should be put in appropriate places to keep organized.
All that said I agree with keeping things in appropriate places. It makes stuff easier for discovery at least.
----
We could have a (hidden) main category for each user, or a hidden 'blog' category where each user gets a post automatically (or mb their account *is* the post, or at least the bio/etc they can put on their account)
----
> And when someone comments on any of the paths forward posts, then it bumps "curi's posts about Paths Forward" in the Main subforum as a whole rather than the specific post (but you can click through, point subforum view at my PF posts parent node, and then see each article in it like a thread with the most recently commented on ones on top). But the posts within it won't show up in the Main subforum.
If there are going to be very long chains of replies then I think 'bumping' has to be restricted to the most recent top-post (which might have been a reply to something else), and not anything else recursively up the graph. Parents can be ordered which means the parent at index 0 has priority (and that's how you find the most recent top-post).
----
btw I created a linearization method for DAGs some years ago that has some nice properties WRT merging histories and preserving order, starting from a root node. IDK if it was preexisting or not. It supports relative weighting between nodes, too.
if I link to a node within my post (just a regular web link, e.g. i talk about paths forward and then link several relevant articles, though we probably want some shortcuts for that for linking on-site stuff like maybe #23 for same thread or #500/23 for post 23 in thread_id 500, though that's just one idea on how to do it and not super user friendly).
anyway if i link stuff we should probably capture those links in some way in the graph, even though they are in some sense a different type of link/relationship than a "nest me under X" type link/relationship.
#37
> that raises a question then: should it show up in multiple places? or put another way: should it NOT show up in some places? or only in a minimal way?
I think it should show up in multiple places *for views where that can happen*, though showing a shorter/title-only/default-collapsed element is fine (b/c presumably the person has read it already earlier in the thread).
*I don't think you **ever** need to show them twice, though.* like either you're looking at a subgraph where it has only one parent also in the subgraph (even tho it could have multiple parents), or it has multiple parents in the subgraph so it shows up below both of them in the linearization, or if looking at a graph view then it just appears once and shows the multiple parent links.
what's a concrete case where a post would show up twice?
> If there are going to be very long chains of replies then I think 'bumping' has to be restricted to the most recent top-post (which might have been a reply to something else), and not anything else recursively up the graph. Parents can be ordered which means the parent at index 0 has priority (and that's how you find the most recent top-post).
idk. suppose we have a Main subforum that many users use, a bit like the recent comments page here, and a lot of ppl don't read everything and want to see recently active threads. and one of the threads is "working on an idea tree about the current state of the economics debate". and then it has some super nested econ graph. or there is "debating animal rights" and then ppl go back and forth for 50 replies to each other. i'd still expect that to bump the OP just like posting the 500th linear comment in a curi.us thread (except not exactly linear b/c there's frequently a parent tag or a quote indicating nesting).
Max what do you think about Clojure, Rails or other?
> what's a concrete case where a post would show up twice?
The most basic case I had in mind is I reply to two comments at once.
So like (in curi.us terms) I start my post with #46 #47 (doesn't necessarily have to be at the start) and then i talk about ideas from both and maybe even quote from both.
Then someone is looking at the thread view with nesting, basically like the reddit version of the post page for this post (reddit has visually nested reply comments rather than linear chronological like here) https://curi.us/2396-new-community-website-features-and-tech/
so 1) viewing the whole thread; 2) with nesting; and 3) my comment has two parents in the thread. then my comment would have two places to show up: nested under each comment it replies to.
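Here's a small Python sketch of that case, combined with the "show a collapsed stub the second time" idea from earlier in the thread. The `children` mapping, the function name and the "(shown above)" wording are all placeholders:

```python
def render_nested(children, root):
    """Render a nested thread view as indented lines.

    A post with several parents appears under each of them, but only the
    first occurrence is expanded; later ones become a one-line stub.
    """
    lines = []
    seen = set()

    def visit(node, depth):
        indent = "  " * depth
        if node in seen:
            lines.append(f"{indent}#{node} (shown above)")
            return
        seen.add(node)
        lines.append(f"{indent}#{node}")
        for child in children.get(node, []):
            visit(child, depth + 1)

    visit(root, 0)
    return lines
```

With post 48 replying to both 46 and 47, `render_nested({45: [46, 47], 46: [48], 47: [48]}, 45)` expands 48 under 46 and shows only a stub under 47.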
#41
>> One markdown feature we should add: when you copy text it should automatically have appropriate `>`s added so quoting is the default.
> I really dislike screwing with copy/paste by default. I think it'll cause problems. Sometimes ppl wanna copy something and use it for a different purpose, e.g. an IM or tweet, and don't want the ">". Maybe a separate hotkey for copy+quote but other workflows are better a lot of the time, e.g. "i want to reply to this" -> puts the whole thing with quotes in my textarea for posting and then i can delete parts i want to trim.
Yeah, I agree with not screwing with copy/paste (I've had more than one convo with someone about this before, come to think of it).
After posting the above I thought of a diff soln which is similar to discord. When you highlight text you get a simple little non-intrusive hover-over type thing that has a "copy quote" (and maybe "copy", too; optionally more stuff like highlight, annotate, etc).
That way default is ctrl+c, we don't have to worry about extra shortcuts (not sure how difficult they'd be + could interfere with ppls setups), and since you're selecting text the mouse cursor is right where the button is anyway, so super low overhead.
#43
> is that a good system or is there a better way to do a graph with sql tables? or does using something other than sql make sense for a graph?
One downside of SQL with a graph like this might be *really painfully intensive joins*. IDK how easy that is to avoid but feels like it could be an issue.
Depending on your structure nosql stuff can be good for graphs, but it's also easy to make a mess of it with the wrong architecture.
I did a bunch of research on efficient dynamodb (ddb) structures recently, like making use of their platform specific features like global/local secondary index and how to do efficient indexing/lookups/relational-like operations. One *big* advantage of dynamodb is that it can be super cheap. I don't think I've ever paid a cent for my ddb tables. But I pay $10/mo for a ~50mb mongodb.
Here's a ~2yr old comment/replies implementation I did using dynamodb (nosql); it uses a library that means you get model-like objects out so it looks a lot like ORM code (depending on your ORM). https://github.com/voteflux/THE-APP/blob/master/packages/api/sam-app/funcs/qanda/models.py
Here's what it looks like:
> or does using something other than sql make sense for a graph?
Well, there is graphql, but I'm not convinced it really makes things easier. I've played with it a bit and thought it had a much steeper learning curve than SQL or NoSQL stuff I've used. (on the NoSQL side I've used MongoDB, DynamoDB, Redis, and some K:V stores)
Some (potentially) decisively good things about graphql: write schema stuff in .graphql files and get autogenerated code for JS/typescript/python/ruby/whatever
It advertises a type system which is nice, esp b/c it's basically as deep as possible. that way generated code has maximum benefit across different languages/platforms. (you can use it in client-side/UI code too, I think, so multi-lang code generation is v useful and keeps stuff consistent)
I might look in to graphql a bit more. I've only used it in passing (when other projects have used it that I've built on or hacked around in)
#45
> anyway if i link stuff we should probably capture those links in some way in the graph, even though they are in some sense a different type of link/relationship than a "nest me under X" type link/relationship.
Yup, this sounds good. It's similar to the original hyperlink idea.
We can parse out those links when a post is saved. There is a 'wikilinks' extension for python-markdown that has links like [[this]] which would link to like some/wiki/this.html or whatever you set. we could easily do a similar thing, either using [[this syntax]] or whatever else we chose.
once we've parsed out those links we should store them separately to parent links.
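A sketch of what the parse-on-save step could look like in plain Python, without python-markdown. It assumes the `#23` / `#500/23` shorthand floated earlier in the thread plus `[[wikilink]]` syntax; both regexes and the function name are illustrative only:

```python
import re

# [[Some Title]] style links
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")
# #23 (same thread) or #500/23 (post 23 in thread 500); the lookbehind
# skips things like URL fragments (example.com/page#5)
POST_REF = re.compile(r"(?<![\w/])#(?:(\d+)/)?(\d+)\b")


def extract_links(text, current_thread_id):
    """Return (post_refs, wiki_titles) found in a post's markdown source.

    post_refs is a list of (thread_id, post_number) tuples.
    """
    refs = []
    for m in POST_REF.finditer(text):
        thread = int(m.group(1)) if m.group(1) else current_thread_id
        refs.append((thread, int(m.group(2))))
    titles = [m.group(1).strip() for m in WIKILINK.finditer(text)]
    return refs, titles
```

The results would go into their own table (or get their own link type), separate from the parent links.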
#52 so two linking concepts, parent/child and something like "X points to Y". should get that more conceptually clear before implementing. what exactly do they mean and how will they be used and why are there two different types. seems maybe reasonable though.
if we do this, then the "i want to make a category" use case can be done using points-to links instead of parent/child links. so no need to let ppl add a node as a parent of some existing node (which would be a way to group together multiple posts into a category). ppl can just make a post with a list of links and then we'll automatically capture everything their post points to and have some functionality related to that.
#47
>> If there are going to be very long chains of replies then I think 'bumping' has to be restricted to the most recent top-post (which might have been a reply to something else), and not anything else recursively up the graph. Parents can be ordered which means the parent at index 0 has priority (and that's how you find the most recent top-post).
> idk. suppose we have a Main subforum that many users use, a bit like the recent comments page here, and a lot of ppl don't read everything and want to see recently active threads. and one of the threads is "working on an idea tree about the current state of the economics debate". and then it has some super nested econ graph.
Hmm, I still think bumping should be restricted in its impact, but I think I get where you're coming from.
The more I think about this the more I suspect it can all be solved with various parameterized 'views', where you plug in stuff like parent topic(s), users who have participated (perhaps only recently), diversity of the graph (so you don't get 1-on-1 back and forths), pure chronological (and combined with the other params), then also linear view, tree/graph view, and so on.
Bumping is a problem (at least in part) because *other ppl choose for us how to sort posts*, so mb it goes away if we give ppl enough freedom.
#53 note: points-to linkings should be expected to have cycles! e.g. i might have multiple posts related to paths forward that say "see also" at the bottom and link each other. (this requires editing posts, otherwise posts could only ever link stuff that was made prior to it, which would avoid cycles. adding extra links at the bottom to newer, relevant posts is a good example of being able to edit posts being useful. editing is dangerous when abused though! and deleting is even more dangerous because it could orphan descendant nodes.)
#54 re views note we need great defaults so most ppl can get a pretty normal and effective experience without having to understand anything about graphs.
#48
> Max what do you think about Clojure, Rails or other?
I've been thinking a bit about this and whether I'd recommend a heterogeneous backend or not (like using multiple languages). Will put my notes on heterogeneous backends at the end.
Clojure, WRT JVM variant: IDK why you'd bother *except* to get access to JVM ecosystem. Same with Scala - why use that instead of Haskell or something else better? That could be b/c a dev knows the JVM ecosystem or b/c there's legacy to integrate with. Both are decent reasons IMO, but I don't think either of them apply here do they? mb turpentine and JVM? I personally hate working with the JVM, but mb that is b/c of what I associate it with (basic 1st year uni stuff).
Rails: ruby is dying, but it'll take ages till it's close to dead so probably not super relevant. Seems like Rails is decent, but I would expect the ecosystem has started atrophying and that'll only get worse. Mb that doesn't matter tho b/c it's still way bigger than lots of other projects. IMO ruby's future looks a bit like PHP's unless it gets majorly picked up outside Rails. I remember 10yrs ago there were lots of ruby vs python type discussions and since then the common opinion seems to be that python got used in lots of sectors (data analytics, AI, web, scripting). I might be totally wrong about ruby, though. I haven't looked in to it much.
Other: I've mostly done JS/Typescript and Python, tho a bit of haskell/purescript for web/api stuff too. I haven't found the perfect framework yet tho.
Python: django seems nice and powerful, though I've not used it for anything besides tutorial projects.
IMO the biggest factor is what do ppl know? The problem with that is that we all know different things.
We could solve that in a few ways. I have some ideas about solns to that but will post separately. Mb now is a good time for:
On heterogeneous backends: Flux's backend ATM has one main python API server/WSGI thing, but has both mongoDB and dynamoDB stuff, and it also has a typescript API using serverless.js that talks to mongodb, and there's a python module using an AWS-serverless framework - SAM. IMO the most PITA one to maintain was the old-school server one b/c I've had to migrate it multiple times and getting the environment working properly each time was painful (migrated from: server -> heroku -> elastic beanstalk + docker). However, it's also the easiest to add features to, not because it's way better or something, just because I can define everything in one place (the two serverless components have partial implementations for various things). On the whole it hasn't been super painful having multiple backends/dbs/frameworks, but I also didn't use any super integrated frameworks to start with (like Rails/Django), so maybe that helped.
Heroku was way more expensive than AWS for what you got, tho, so overall costs have dropped - atm it's like $50 USD/mo, $20 of which is a load balancer :/
#56 yup, I think a selection of defaults with curated params might work best for that. If there's a guided tour ppl can be introduced to the various default configs and what they're good for gently (no steep learning curve about graphs)
#57 Ruby is still doing well. idk about dying. Not paying close attention to the community but e.g. IIRC matz talked in some presentation about it being in the top 10 on some popular language lists and being pretty steady there not falling. IIRC Ruby 3 is coming out this month. afaik Rails still sees lots of use, has lots of job postings, etc. not the biggest but still big. php btw seems to still be popular with laravel. not at all dead. and that's despite various major flaws. i looked into this a bit a couple months ago.
I still remain of the opinion that ruby is better designed than python (including being more friendly to functional programming stuff) and the only reason to use python is if you need specific libs it offers. i also don't think using ruby should be a significant problem for ppl familiar with python or just generally for most programmers who are used to swift, js, any C, VB, java, Go, perl, lua, etc. and i see no significant advantage to using django over rails for this (i haven't checked closely but understand django to have similar features to rails).
i generally think heavyweight frameworks are good when you're doing something that fits them well like making a website. don't reinvent stuff.
my dev experience is mostly Rails 2 (i'm actually currently, finally, upgrading a codebase from Rails 2 to 6). i used a little of a lot of things before that. i did SICP early in learning to code and it still informs my thinking and i like lisp, but the main reason Clojure is being considered is that Alan is experienced with it and it looks like Alan will be the primary dev.
tech stack brainstorming
I did a bit of brainstorming around the tech stack. Particularly with a focus on how I'd design the architecture to allow for a variety of languages to be used to code it, but with reasonable flexibility so a large chunk could be done in one lang, and bits and pieces in another.
I'm toying with writing up a bit more of a formal AWS architecture thing as a conjecture for how to structure stuff. If Alan is planning to do a fair chunk of the coding then making sure it'd work for him would be a big priority (I think this should be doable b/c java integrates with lambda and there are libraries for relevant clojure things).
#53
> so two linking concepts, parent/child and something like "X points to Y"
Why not just one linking concept (X points to Y) plus 0..N flags or descriptors for what the link is:
- Parent
- Child
- Quote by reference: At position Q in X, place a quote of the text at position A to B in Y
- Footnote reference for position Q in X [at position A to B in Y]
- Author repudiation of position C to D in X [at position A to B in Y]
- Refutation of position C to D in X [at position A to B in Y]
There's probably more possibilities for link types I'm not thinking of.
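A rough Python sketch of what "one link plus 0..N flags" could look like as a data structure. All names here are hypothetical. Child is left out on the assumption that it's the same edge as Parent read in the other direction, and positions are modelled as simple (start, end) character offsets:

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Optional, Tuple


class LinkKind(Flag):
    PARENT = auto()
    QUOTE = auto()
    FOOTNOTE = auto()
    REPUDIATION = auto()
    REFUTATION = auto()


@dataclass(frozen=True)
class Link:
    """X points to Y, with zero or more descriptors for what the link is."""
    source_id: int                                 # X
    target_id: int                                 # Y
    kinds: LinkKind = LinkKind(0)                  # 0..N flags
    source_span: Optional[Tuple[int, int]] = None  # position in X
    target_span: Optional[Tuple[int, int]] = None  # position A to B in Y


# A reply that also quotes the first 40 characters of its parent.
link = Link(62, 53, LinkKind.PARENT | LinkKind.QUOTE, target_span=(0, 40))
```

A plain "X points to Y" link with no descriptors is just `Link(x, y)`.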
#61 yes it could be implemented with a flag. that's an implementation detail. conceptually we probably don't want a lot of link types b/c it adds complexity. if they are just minor superficial flags it's probably ok but having a bunch deeply integrated into the system makes it messier.
#42
> "i (this node) am a label/grouping for all my parents" seems kinda contrary to the meaning of the parent/child relationship, so maybe problematic conceptually.
I dunno. What if a node meant "i (this node) was created by Alisa as a label/grouping for all my parents"? The fact that I categorized all those nodes in a certain way obeys the parent/child relationship to an extent, because the label node reflects *my annotation on those nodes*.
Simplifying idea: one parent per node. the graph is just a tree.
want to reply to multiple things? link them. do the extra stuff with references but have the main graph/tree be simpler.
#25
> turpentine, what do you think are the main goals of language and library choices for this project?
The main goals of language and library choices are to aid with writing, maintenance and improvement of the website.
> What do you think are the benefits and concerns/risks of Clojure?
The benefits are that clojure and clojurescript are well designed languages that give you access to java and javascript libraries without having to write java and javascript.
> Of Rails?
Rails has a lot of features included. It's not clear to me how easy it would be to do something that is outside that set of features.
#65 If you have some examples of things you're not sure how to do with Ruby, I could tell you the general idea.
Ruby is a modern programming language. It generally has the features you'd expect plus extras. It encourages Object Oriented Programming a lot.
Can you give an example of some Clojure code you wrote that you think is good and shows the benefits of the Clojure? Or do you have any ideas about an example benefit Clojure would have for this project that you think might be worse in Rails?
I watched a Coast tutorial and saw data validations being specified in the code to handle the web request to create a row for a db table (basic web form). That looked bad to me because I think data validations are something that should be reusable and associated with code for dealing with that data table/type, *not* with code for handling a web request. In Rails you have an object oriented programming class, called a model, for every db table and you put your validations there as well as other code for dealing with that db table. Then the web request handling code can call those methods but doesn't need to know or worry about details like which fields should be validated in what way, or how to deal with the internals or details of that data type, it just calls high level stuff like thingie.validate() or more typically just thingie.save() and the save function runs the validations before saving.
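A minimal sketch of that split in plain Ruby (no Rails, so it runs on its own). ModelPost, validate, save and handle_create_request are made-up names for illustration, not ActiveRecord's API, and the specific rules (non-blank title, body length cap) are assumptions:

```ruby
# Plain-Ruby stand-in for a Rails-style model. All names are illustrative.
class ModelPost
  attr_reader :title, :body, :errors

  def initialize(title:, body:)
    @title = title
    @body = body
    @errors = []
  end

  # Validation rules live with the data type, so every caller reuses them.
  def validate
    @errors = []
    @errors << "title can't be blank" if title.to_s.strip.empty?
    @errors << "body is too long" if body.to_s.length > 10_000
    @errors.empty?
  end

  # save runs the validations first and refuses to store invalid data.
  # `store` is any object responding to <<, standing in for the db table.
  def save(store)
    return false unless validate
    store << self
    true
  end
end

# The request handler only calls the high-level method. It does not know
# which fields are validated or how.
def handle_create_request(params, store)
  post = ModelPost.new(title: params[:title], body: params[:body])
  post.save(store) ? "created" : "rejected: #{post.errors.join(', ')}"
end
```

The point is that a second entry point (an admin tool, an import script) would call the same save and get the same validations without copying them.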
> - standard markdown library (idk what but we'll want some ability to customize it, e.g. might want to display **bold** in the way curi.us does, and compatibility with lots of existing markdown editors and tools is good)
i like how curi shows which characters were used for the markdown stuff.
#45
> anyway if i link stuff we should probably capture those links in some way in the graph, even though they are in some sense a different type of link/relationship than a "nest me under X" type link/relationship.
Just an idea, but what if we represented links like we do on the web, where the direction of the arrow is thought of as going from the page that links to the page that is linked to? Then, if you post something and I reply to it, the link would go from my reply to your post.
Maybe that also makes it cleaner when I want to categorize a bunch of your nodes under a certain category that I came up with. It's just me making a node for that category and making links from that category to all of your nodes that I think should be in the category. Kind of like if I made a web page for the category that linked to each of your posts.
#43 curi wrote:
> db tables could be:
> - nodes table (id, text, author, etc)
> - relationships table (parent_node_id, child_node_id, created_at)
> is that a good system or is there a better way to do a graph with sql tables? or does using something other than sql make sense for a graph?
It's a good system.
SQLite supports recursive queries. That makes it convenient to query graph-like data structures:
> Recursive Common Table Expressions
> A recursive common table expression can be used to write a query that walks a tree or graph.
Postgres supports recursive queries too:
> WITH RECURSIVE search_graph(id, link, data, depth) AS (
> SELECT g.id, g.link, g.data, 1
> FROM graph g
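For intuition, here is roughly what a recursive walk over the nodes/relationships layout computes, sketched in plain Ruby instead of SQL. The sample rows and the descendants_with_depth name are made up for illustration; a real version would let the database do this work.

```ruby
# Rows of a hypothetical relationships table: [parent_node_id, child_node_id].
RELATIONSHIPS = [[1, 2], [1, 3], [2, 4], [3, 4], [4, 5]].freeze

# Returns { node_id => depth } for every node reachable from root_id.
# Breadth-first, so each node gets its shortest depth. The depths.key? check
# also keeps the walk from looping if the data ever contains a cycle.
def descendants_with_depth(root_id, relationships)
  children = Hash.new { |hash, key| hash[key] = [] }
  relationships.each { |parent, child| children[parent] << child }

  depths = {}
  queue = [[root_id, 0]]
  until queue.empty?
    node, depth = queue.shift
    next if depths.key?(node)
    depths[node] = depth
    children[node].each { |child| queue << [child, depth + 1] }
  end
  depths.delete(root_id) # the root is not its own descendant
  depths
end
```

Node 4 has two parents in the sample data, so this is a graph walk and not only a tree walk.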
https://github.com/coast-framework/coast/blob/master/docs/about.md
The readme on coast sez that the author makes breaking changes. I'm not keen on using tools that break your code when you update them so I'm ruling out Coast.
> oh god no, fuck that. writing raw sql templates is a bad idea.
Why is writing SQL templates a bad idea?
#71 Breaking changes make sense for less mature software so they can iterate on the design and get it right, so I don't blame them for it. Rails had significant breaking changes when upgrading to Rails 3 but has broken less since then. I don't think dealing with breaking changes is worse than making stuff more from scratch (and you can just decide not to upgrade), but if you find a more mature tool that you're satisfied with that certainly has advantages.
Apple does breaking changes sometimes, like not making software backwards compatible with stuff that's like 5 years old. Microsoft tries hard to avoid them, which helps some people but also means a bunch of maintenance complexity can accumulate.
#72 SQL is too low level and merits an abstraction layer over it. This helps with portability (could change databases), avoiding repetition, and uniformity (e.g. doing everything in Clojure rather than having to read sometimes Clojure and sometimes SQL mixed together in the codebase).
At a basic level, many queries are repetitive, e.g. find one, find many, create, update. So one should have software libraries that know how to do those standard concepts, similar to having software that knows what a loop is. So you may want to share your find code between different db tables. Then you run into issues like sometimes you want a limit, order or condition, so you need ways to manage those things, preferably at a higher abstraction level. So if you don't have a software layer over SQL, you either start having repetitive SQL or creating your own software layer above SQL as a variety of helper functions that isolate all the SQL in one file and try to find patterns in the queries. But a lot of the patterns are standard and well known enough that a pre-existing abstraction on them works well instead of having to figure it out yourself and then probably making some mistakes that popular libraries fixed years ago.
At a very advanced level, you might have to use raw SQL despite having a library because you're doing something too customized to fit the standard patterns the library helps with.
At a middle level is when libraries may give the most value because they solve harder yet still common problems for you. A good example is abstractions on joins. These are well known in Rails and Coast has some of the same main ones: has_many, has_one, belongs_to, and has_many through
https://coastonclojure.com/docs/relationships
So e.g. in Rails you can do something like:
class User < DB_Table_Superclass
  # define relationships that User class has to other classes that are also based on db tables
  has_many :posts # this is a many to one mapping. a post only belongs_to one user
  has_many :user_post_follows # user_post_follows table has user_id and post_id to enable a many-to-many mapping between users and posts
  has_many :followed_posts, through: :user_post_follows, source: :post # source: assumes the join model has belongs_to :post
end
User.order("subscribed_at asc").first.posts # do a SQL join, return list of posts
User.where(email: "curi@curi.us").first.followed_posts # do two SQL joins, return a list of posts
And you can define relationships in the other direction too (not required for the prior code):
class Post < DB_Table_Superclass
belongs_to :user
end
Post.find(55).user # abstract over an sql join to get a user object who wrote post 55
Post.find(55).user.posts.map {|p| p.replies(nesting_limit: 1)}.flatten.compact.uniq.map(&:user) # chaining some stuff to get a list of unique user objects who directly replied to any post by the author of post 55. no raw sql.
You may find the Coast docs easier to understand. I think this sort of abstraction is 1) better than writing sql joins yourself 2) a common, useful, relevant pattern 3) better to get from a library than recreate, since the library has a bunch of reusable knowledge for how to design it, add the right extra features, deal with edge cases, make the interface to use these features nice. E.g. libraries may have code built in, that you don't even have to think about, to help reduce the total number of db queries made.
#74
> Post.find(55).user.posts.map {|p| p.replies(nesting_limit: 1)}.flatten.compact.uniq.map(&:user)
should remove non-unique elements after getting users, not before:
Post.find(55).user.posts.map {|p| p.replies(nesting_limit: 1)}.flatten.compact.map(&:user).uniq
I think ActiveRecord (Rails' abstraction over SQL) has some feature for using group_by but I don't know the syntax offhand. This cleanly, elegantly mixes in ruby functions (flatten removes all nesting in an array that contains other arrays, compact removes nils, and uniq gets rid of duplicates, and map is also ruby) with db queries (find, user, posts, replies), which is convenient (probably slower but usually not enough to care). I think making a single SQL query to do this whole thing would be possible but messy and confusing.
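To show why the position of uniq matters, here is a small plain-Ruby sketch with Structs standing in for ActiveRecord objects. Author, ReplyRow and the sample data are made up for illustration; each reply has its own id, so two replies by the same user are different rows.

```ruby
# Stand-ins for db-backed objects. The names are hypothetical.
Author = Struct.new(:name)
ReplyRow = Struct.new(:id, :user)

ALICE = Author.new("alice")
BOB = Author.new("bob")

# What the map over posts would return: one array of replies per post,
# possibly containing nil.
REPLIES_PER_POST = [
  [ReplyRow.new(1, ALICE), ReplyRow.new(2, BOB)],
  [ReplyRow.new(3, ALICE), nil],
  []
].freeze

# uniq before map(&:user) dedupes reply rows, so a user who replied twice
# still shows up twice.
def repliers_uniq_first(replies_per_post)
  replies_per_post.flatten.compact.uniq.map(&:user)
end

# uniq after map(&:user) dedupes the users themselves.
def repliers_uniq_last(replies_per_post)
  replies_per_post.flatten.compact.map(&:user).uniq
end
```

With this data the first version gives alice, bob, alice and the second gives alice, bob.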
https://naturaily.com/blog/who-gives-f-about-rails
Some positive and negative comments about Rails from 2 years ago. Seems pretty fair to me. Addresses whether ruby or rails has become too unpopular.
I'm willing to try doing the project using Rails.
My current preference for the front end is Reframe, which is a clojurescript front end framework:
https://day8.github.io/re-frame/re-frame/
#77 Willing like a favor? Willing like convinced it's best as your own choice? I didn't think the discussion was conclusive myself so I wasn't expecting that reply. I'd like to know more about your reasoning.
I'd also want to look for a relevant ruby gem (library) for handling graph and tree data (via postgresql or possibly some other way) and make sure I find something good before choosing Rails, since that's a major feature which I have no experience with. And I'd want to hear your opinion on the forum design discussions like about organizing posts in a tree or graph, whether anything sounds like a technical issue or bad for users, any suggestions, etc.
re Reframe, I glanced at it and wasn't clear on what it's for (I haven't used React or Reagent). javascript and ajax type stuff? Would it replace Rails' views (html templates) entirely, somewhat or not at all? more broadly, what sort of division between front and back end do you have in mind? how much and what stuff would you put in each area?
PS I'm pretty busy for the next 3 weeks, possibly more.
PPS any thoughts on the goals or marketing stuff? one thing silence could mean is "i read it all; sounds good; i don't think i have anything to add". but there are a lot of other things silence could mean, so please say something. similarly, i don't know what you thought of my critical explanation about sql templates.
#78
> Willing like a favor? Willing like convinced it's best as your own choice? I didn't think the discussion was conclusive myself so I wasn't expecting that reply. I'd like to know more about your reasoning.
I am willing to use Rails in the following sense. I am convinced, partly by reading this article
https://www.flyingmachinestudios.com/programming/why-programmers-need-frameworks/
that using a framework would be a good idea. I also agreed with your explanation of the problems with sql templating. I looked at the options for back end clojure frameworks and they are all either badly documented or have authors who make breaking changes. My guess was that it will be possible to get stuff done faster with less annoyance with Rails than with the options available with clojure.
I may reconsider this as I think more about handling graph and tree data.
#55
> i might have multiple posts related to paths forward that say "see also" at the bottom and link each other. (this requires editing posts, otherwise posts could only ever link stuff that was made prior to it, which would avoid cycles.
The data for a post could have a "relevant links" section that could be added to without allowing the text of the post to be edited at all.
#31 I think a directed acyclic graph is in general a more accurate description of discussion than a tree. If this turned out to be difficult to implement we could do a tree but I think we should aim for a graph.
#79 Good article. Yeah basically it's preferable (when a good fit is available) to build on abstractions other ppl put a lot of work into instead of making your own. One thing I think he understands but didn't say directly as a main point (but did mention in passing) is that a big part of an abstraction is that you usually don't need to know what's underneath it. It lets you think at a higher level and not worry about the details. It frees up your attention away from smaller, lower level conceptual units. This is fundamentally related to Objectivism's integration of ideas into a smaller number of more powerful conceptual units (and repeating that again and again in a hierarchy or pyramid), and to similar stuff I say about learning, e.g. practicing stuff to the point of mastery so that your error rate is very low, it's very cheap (low conscious attention especially), and then you're ready to build on it (you can build a little on it early, but you're limited to only a few layers built on top of it until it's really good).
Here's another article I just received today in a Ruby newsletter:
When Should You NOT Use Rails? i think it's fair.
and it links https://rubyonrails.org/doctrine/ which is worth a look. many languages and ecosystems have themes. *programmer joy and convenience* are ruby's biggest themes. Rails has others like convention over configuration (ruby has some of that too, particularly the large standard library). and both are anti hard limits: you're allowed the power to shoot yourself in the foot ("sharp knives").
I have a mostly positive opinion of both matz and DHH, and this is important because their thinking has heavily influenced ruby and rails. They are not designed by committee, so you better have a reasonably positive opinion of the leaders or look elsewhere. (Like if someone doesn't like me, FI is the wrong place for them, but less extreme.)
#81 i have two main concerns re DAG, neither of which is really about difficulty of implementing (though making it pretty fast when there are 100k posts in the graph might also be a concern – idk how hard that is). the first is about displaying it to the user. a tree is simpler to show people and help them navigate (and also simpler to get people to input – they only pick one parent). that was my main motivation for considering having a primary tree hierarchy. but maybe a good design can make a graph work. and the second is that links create cycles. e.g. this blog post links to "New Community Website Project" which contains a link to this blog post, so that's a cycle.
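On the cycle point: if some link type is meant to stay acyclic, the check when adding a link is small. A rough sketch in plain Ruby, where edges is an in-memory hash and reachable?, creates_cycle? and add_edge are made-up names. It's recursive, so a very deep graph would want an iterative version, and it says nothing about speed at 100k posts.

```ruby
# edges maps a node id to the list of node ids it points at.
# seen guards against revisiting nodes reached by more than one path.
def reachable?(edges, from, to, seen = {})
  return true if from == to
  return false if seen[from]
  seen[from] = true
  edges.fetch(from, []).any? { |nxt| reachable?(edges, nxt, to, seen) }
end

# Adding from -> to closes a loop exactly when `from` can already be
# reached by following links starting at `to`.
def creates_cycle?(edges, from, to)
  reachable?(edges, to, from)
end

# Adds the link, or raises if the graph would stop being acyclic.
def add_edge(edges, from, to)
  raise ArgumentError, "edge #{from} -> #{to} would create a cycle" if creates_cycle?(edges, from, to)
  (edges[from] ||= []) << to
  edges
end
```

The blog post example is the two-node case: once post A links to B, adding B -> A is rejected.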
> The data for a post could have a "relevant links" section that could be added to without allowing the text of the post to be edited at all.
i think editing posts is important. e.g. i want to be able to write articles using the same tools as the forum, rather than having a separate system. and i want to be able to go back and fix typos, and sometimes make larger changes.
but i don't want ppl to make edits that are disruptive to discussion. and if we provide standard edit and delete tools like they see on facebook/reddit/etc they will just assume they can edit their messages in general, and we'll have a problem. idk what the best solution is but i'm not inclined to entirely give up on editing. maybe something with versioning. the downside of that is it adds complexity which is rarely useful.
#83
You could allow edits with versioning. In that case:
The older version(s) would continue to be available though perhaps a bit hard to get to other than when following references/quotes.
Quotes should reference a specific version rather than whatever the latest happens to be. And people who click on a quote for context would be taken to the version of the post the quote is from. When viewing older versions there should be some clear visual indication that a newer version exists, and those should be easy to get to.
Edit tools should make it clear that editing creates a new version but the old version will remain.
Delete should perhaps be renamed something like "repudiate", or just come with an explanation that you can indicate you've changed your mind or make a NULL (empty) current version of the post, but not delete the prior version(s).
I think this would allow for both an optimized current state of discussion (with corrections etc.) while also preserving discussion history.
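A minimal sketch of how that could look as data, in plain Ruby. VersionedPost and PinnedQuote are hypothetical names, versions are numbered from 1, and a repudiated post is modelled as a nil current version; all of that is assumption, not a settled design.

```ruby
# Append-only versions: editing never changes or removes an old version.
class VersionedPost
  attr_reader :versions

  def initialize(text)
    @versions = [text]
  end

  # Editing appends a new version and returns its number.
  def edit(new_text)
    @versions << new_text
    current_version_number
  end

  # "Delete" becomes an empty current version. History stays available.
  def repudiate
    edit(nil)
  end

  def current_version_number
    @versions.length
  end

  def text(version = current_version_number)
    @versions.fetch(version - 1)
  end

  # Lets the UI show a "newer version exists" notice on old versions.
  def newer_version_exists?(version)
    version < current_version_number
  end
end

# A quote pins the version it was taken from, plus a character range.
PinnedQuote = Struct.new(:post, :version, :range) do
  def text
    post.text(version)[range]
  end
end
```

A quote made against version 1 keeps returning the version 1 text after later edits, and newer_version_exists?(1) tells the viewer to offer a link to the latest version.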
> re Reframe, I glanced at it and wasn't clear on what it's for (I haven't used React or Reagent).
React is a javascript library for front end stuff. Reagent is a clojurescript wrapper for react that converts clojurescript data structures into html. Clojurescript is a variant of clojure that compiles to javascript.
React takes changes that the user makes, uses this to change a variable containing some data and then the new dom is generated by a pure function of that data and react does some optimisations to make that efficient. If you use immutable data structures then it's easier to optimise.
In clojurescript the standard data structures are all immutable. There is a library called hiccup that turns suitable clojure vectors into html. Clojurescript also has a reference type called an atom, which contains an immutable data structure. There are functions for atomically changing the atom's contents to a new immutable data structure:
https://clojure.org/reference/atoms
Reagent has a modified form of an atom that renders the parts of the dom that need to be updated when the atom changes:
https://reagent-project.github.io/
and it uses the immutability of clojure data structures to do the relevant optimisations. Clojure has a lot of functions in the standard library and some libraries for querying and manipulating data structures.
You could write a reagent application with multiple atoms all containing different bits of state that affect different bits of the dom when they change and make calls to the back end and that could easily turn into a mess. Reframe is a framework based on reagent that organises your application and provides a layer of abstraction. All of the state is kept in one atom and the dom is only ever updated from that atom. There is a queue and an event handler that organises requests, updating the reframe atom, updating the dom and other side effects.
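The single-atom-plus-queue idea can be sketched outside clojurescript too. Here is a rough Ruby version for readers who know Ruby better. This is not re-frame's API: AppStore, register, subscribe, dispatch and process_queue are all made-up names, and a frozen hash stands in for an immutable clojure map.

```ruby
# One state value, one event queue, handlers that compute the next state.
class AppStore
  attr_reader :state

  def initialize(initial_state)
    @state = initial_state.freeze
    @handlers = {}
    @queue = []
    @subscribers = []
  end

  # A handler is a pure function: (old_state, payload) -> new_state.
  def register(event_name, &handler)
    @handlers[event_name] = handler
  end

  # Views are re-run from the single state after every event.
  def subscribe(&view)
    @subscribers << view
  end

  # Dispatching only queues the event. Nothing changes yet.
  def dispatch(event_name, payload = nil)
    @queue << [event_name, payload]
  end

  # Events are handled one at a time, in order.
  def process_queue
    until @queue.empty?
      event_name, payload = @queue.shift
      @state = @handlers.fetch(event_name).call(@state, payload).freeze
      @subscribers.each { |view| view.call(@state) }
    end
  end
end
```

Because every change goes through process_queue, there is exactly one place where state changes and one place where rendering is triggered.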
https://web.archive.org/web/20120718093140/http://magicscalingsprinkles.wordpress.com/2010/01/28/why-i-wrote-arel/
This is the 11 year old blog post from the guy who wrote the key part of the current Rails database abstraction layer (still in use now). He is a SICP fan and brings it up in the post.
This post may convince you re why his way of doing a database abstraction layer is good. It impressed me. His code involves Relational Algebra which is kinda like SQL but with some advantages, in particular it's far better for composing multiple db queries together (so you can build a complex query out of parts) than SQL strings. And you don't have to know what Relational Algebra is in order to use it for database queries, with an SQL db, and have the API you're using be convenient, easy to understand, and flexible/powerful.
It took two years to integrate his code into Rails, partly due to his own neglect. (The post was written when it got added to Rails.) It wouldn't have ended up in Rails without some Rails leaders, including DHH, valuing it. Other people with good taste and judgment put effort into it.
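To make the composability point concrete, here is a toy relation object in plain Ruby. It is not Arel's or ActiveRecord's implementation or API; TinyRelation is a made-up name, and conditions are passed as trusted raw strings to keep the sketch short (a real library handles quoting and escaping).

```ruby
# Each method returns a new relation, so partial queries can be named,
# reused and combined before any SQL string exists.
class TinyRelation
  def initialize(table, wheres: [], order: nil, limit: nil)
    @table = table
    @wheres = wheres
    @order = order
    @limit = limit
  end

  def where(condition)
    TinyRelation.new(@table, wheres: @wheres + [condition], order: @order, limit: @limit)
  end

  def order(column)
    TinyRelation.new(@table, wheres: @wheres, order: column, limit: @limit)
  end

  def limit(count)
    TinyRelation.new(@table, wheres: @wheres, order: @order, limit: count)
  end

  # SQL text is only produced at the end, from the accumulated parts.
  def to_sql
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE " + @wheres.join(" AND ") unless @wheres.empty?
    sql += " ORDER BY #{@order}" if @order
    sql += " LIMIT #{@limit}" if @limit
    sql
  end
end
```

For example, recent = TinyRelation.new("posts").order("created_at DESC") can be reused as recent.where("user_id = 1").limit(5) without touching recent itself. Doing the same with SQL strings means splicing text into the right position by hand.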
#85 Rails includes front end stuff like Action Cable. I'm not familiar with it, but the headline feature appears to be that it integrates with your Rails model code and database seamlessly. There's also a gem (library) for using React with Rails.
How would you divide up front and back end? How would Rails and clojurescript interact? What would be the jobs of each? Is there something wrong with Rails' front end tools?
Suppose you do: db -> Rails -> json API to share data from db -> clojurescript front end
Then you end up duplicating a lot of code logic, because both the Rails and clojurescript parts have code that deals with the db data. An example of something that might get duplicated is data validations. More broadly, if you think of each db table as an Object Oriented Programming class, then any methods it'd have could end up duplicated in both the Rails and front end layers. (Tangentially, it's common for some data validation logic to be duplicated in both Rails code and in the database itself. I think this problem is common with other tools besides Rails too. I'm not familiar with an elegant solution but at least it's a reasonably mild problem when you don't have much logic in your db.)
Some quick thoughts about solutions while keeping the same basic architecture:
1. Thin Rails layer: Have Rails do very little. Just pass on data from the db. So there's little code for using or processing db data there.
2. Thin clojurescript layer: Limit what clojurescript does to basically only UI stuff so it never runs any significant algorithms on the db data.
3. Thick API layer: instead of an API for querying db data, it's more of a fancy API involving running complicated code logic.
4. Combine 2 and 3.
But I'm not clear on the need for and advantages of a dual architecture with an API in the middle, and making code reuse/sharing harder is a significant downside.
Turpentine, how much are you:
- busy temporarily cuz holidays
- busy in ongoing way
- not sure what to say
- looking into stuff, thinking about stuff, etc., but without talking about some of it
?
I'd like things to move faster and I'm not sure what the blockers are.
I've been looking into some stuff without talking about it.
I've been thinking that if we have graph shaped data it might be a good idea to store it in a graph database like neo4j. I can send you a pdf of a graph database book if you're interested. It looks like neo4j might be a better option than trying to write sql to represent graphs and it can be used with Rails:
https://neo4j.com/developer/ruby-course/
#89 What are a few specific advantages of a graph db over postgres?
#92 https://neo4j.com/blog/rdbms-graphs-basics-for-relational-developer/
A graph db describes relationships between different kinds of data directly. You say "post x was posted by user y" instead of having a post table and a user table and doing a join to get the posts for a user, say. This can improve performance and makes the db easier to understand.
#93 SQL is designed with joins as a primary feature. in general, using joins is fine – working as intended – and fast. e.g. joining users and posts table is no problem for postgres (and Active Record or any good abstraction layer should make it convenient from a coder pov, e.g. current_user.posts.limit(5).where(custom_conditions) in Rails). splitting things in different tables and doing joins is part of the whole database normalization concept behind SQL's design.
#70 mentioned postgres has this:
https://www.postgresql.org/docs/9.1/queries-with.html
> WITH RECURSIVE search_graph(id, link, data, depth) AS (
> SELECT g.id, g.link, g.data, 1
> FROM graph g
i'm guessing the graph db stuff has some additional features but i'm not sure which specific features you want.
my concern is having two tools and two dbs adds complexity so it needs to bring a clear win to be worth it. unless you're thinking graph db only? idk how general purpose it is. i haven't looked at your links for it yet.
some summary of site ideas (i probably forgot some stuff)
- forum with a tree or graph of posts
- modern user friendly design and appearance (should work fine for ppl who have used social media but not email, BB forums or image boards)
- some kinda reference/link concept (separate graph or integrated, idk yet)
- works well at many screen/browser sizes including phone size
- good filtering/searching options
- works as a place to put a blog or article collection (just put in the right filter and have a nice permalink)
- some kinda post tagging system (like categories)
- markdown and quoting features
- conveniently posting images and embedding vids
- some kinda user tagging and notification system (so you can @mention ppl in a post), and a good way to find and read new posts
- maybe a rich text editor (probably converts to and saves as markdown)
- multiple different views on the graph or tree (e.g. viewing a page with linear chronological comments or with nesting like reddit or as a tree)
- admin area/tools
- something to deal with editing/deleting content (don’t want people just removing stuff from discussions, so maybe version histories)
- possible for a single coder to maintain over time and add features
- payments (subscriptions like patreon, digital products like gumroad, donations, maybe tipping for posts you like)
- user accounts
- access controls (it should be possible to set read and write access differently, e.g. something could be publicly readable but only subscribers can post comments)
- chatroom, likely via discord or slack integration (optional cuz chatrooms are usable with no code support)
- private/direct messages (optional, not sure if i even care about this)
- i want some kinda feature for saying “i want attention for this post” or “i opt-in to unbounded criticism for this” or something. design isn’t figured out yet.
- file upload including large files so i could host some videos myself or upload the CF course files to sell or something like that (doesn’t need to be in version 1)
Instead of an "i want attention for this post" checkbox, maybe a more general purpose metadata system would be good. And maybe a sentence for an opt-in is better than a checkbox. E.g. writing one sentence about e.g. what response you want to the post, why you want criticism, what type of criticism, or which part you want criticism about. Maybe you could do more than one of those. And another type of metadata is a sentence about your goal. Then I or anyone could use a filter like "has attention/criticism metadata OR has goal metadata; order by recent" and use a metadata view that shows title, author, date, word count (an auto-generated meta data), as well as custom meta data.
This is just an idea. I don't want it to be overly complicated. It's important to limit and simplify the features that ask for user attention. E.g. maybe a single metadata field where they could write a sentence or multiple sentences would work better. But mixing things together (opting into criticism; asking for attention; saying goals) can be confusing or problematic too.
I would not expect people to do metadata the majority of the time. (Similarly I'd like if category tagging your posts was optional and we could automate some organizational stuff, cuz ppl don't wanna do that. Maybe asking ppl to category tag posts when starting a new thread would be worth it but idk. Also I'd want a short list of standard categories instead of ppl having to come up with their own. That's easier *and* much better for search/filtering to find stuff. Searchers prefer 10 standard categories *not* 100+ category tags, some with very few posts in them.)
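A rough sketch of the filter and metadata view described above, in plain Ruby. MetaPost, the :criticism and :goal keys, and metadata_view are hypothetical names; metadata is modelled as an optional hash of free-text sentences.

```ruby
# A post with optional free-text metadata, e.g. { goal: "..." }.
MetaPost = Struct.new(:title, :author, :created_at, :body, :metadata) do
  # Word count is auto-generated metadata, derived from the body.
  def word_count
    body.split.length
  end
end

# "has attention/criticism metadata OR has goal metadata; order by recent"
# Returns one row per matching post: standard fields plus custom metadata.
def metadata_view(posts)
  posts
    .select { |post| post.metadata.key?(:criticism) || post.metadata.key?(:goal) }
    .sort_by(&:created_at)
    .reverse
    .map do |post|
      { title: post.title, author: post.author, date: post.created_at,
        word_count: post.word_count }.merge(post.metadata)
    end
end
```

Posts with no metadata simply drop out of this view, which fits with metadata being optional most of the time.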
I think a single, unified type of connection between nodes won't work. We need a distinction between "X is a reply to Y" and "X refers to Y (e.g. has a web link to Y)". If a comment was treated like a child of everything it linked to, without differentiation, it'd be confusing.
So there either need to be two graphs (or trees), or two link types within one graph. Basically a strong(er) and weak(er) linking, or primary/secondary. A direct strong connection is for replying to something and engaging directly, specifically with it, and then a weaker connection is for mentioning or referring to stuff.
If I write a critique of an idea from BoI then my post should have a strong connection to BoI (pretend BoI was a node in our graph). But if I just quote BoI in passing while talking about something else (and link it for attribution), it should be a weak connection.
I'm thinking that two connection types is enough. Does anyone see a need for more?
For the strong connections, a tree would mean you choose one strong connection to a parent post, which is how forums in general work. I'm not sure if multiple strong connections would be useful or confusing. A reasonable sounding use case is multiple people ask similar questions and you write a reply to all of them at once. But just picking one to reply to and then referencing/mentioning the others could work too. For strong links, disallowing cycles might work. For the weak connections, I think it needs to be a directional graph that allows cycles. I think strong and weak connections will be handled significantly differently by the UI.
Users could view the pages organized by strong connections and get something that looks kinda like a regular forum, and there could be a different view for exploring the whole graph including both connection types.
Anyone have other organizational ideas that might be better than this? See any problems?
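One way to picture the two connection types, as a plain Ruby sketch. ForumNode, refer_to and outline are made-up names. It assumes the tree option for strong links: one strong parent, fixed when the node is created, so strong links can't form a cycle. Weak links are a separate list with no such restriction.

```ruby
class ForumNode
  attr_reader :id, :parent, :children, :references

  def initialize(id, parent: nil)
    @id = id
    @parent = parent   # strong link: at most one, set once at creation
    @children = []
    @references = []   # weak links: any number, cycles allowed
    parent.children << self if parent
  end

  # Weak link for mentioning or linking something in passing.
  def refer_to(other)
    @references << other unless @references.include?(other)
    self
  end

  # The forum-like view: strong links only, as an indented outline.
  def outline(depth = 0)
    ["#{'  ' * depth}#{id}"] + children.flat_map { |child| child.outline(depth + 1) }
  end
end
```

Two posts that say "see also" about each other just get a weak link in each direction. The outline never follows weak links, so the cycle doesn't affect it.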
Site Views
Some possible views. You can suggest other views (also did i forget any that were discussed above?).
**outline views**
- tree view (or something that looks pretty similar to a tree, kinda like MindNode)
- generic graph view
- email conversation tree view (similar to nested bullet point list)
**collection views**
- list of articles (according to some kinda filter). can work as e.g. a blog homepage, a forum/subforum page, or an article collection. an option should control how article text displays (all collapsed, intro showing with rest collapsed, none collapsed).
**post text views**
- *single post view* (might not need but lets people link to one specific thing more clearly than linking a page with many things that scrolls to the right thing and puts a highlight box around it like on curi.us)
- *single post + replies* (with some optional collapsing/expanding available stuff, e.g. having stuff nested more than X collapsed by default, and being able to click to collapse a post and everything nested under it). this is the standard view for one thread on a blog or BB.
**replies views**
this is a nested view within the "single post + replies" view, and possibly used elsewhere
- linear chronological (asc or desc)
- nested replies (like reddit). sorting of siblings can be linear chronological by creation date or by new descendant date
- partially flattened replies (mix btwn linear and nested – stuff past X nesting level gets flattened into linear chronological)
- any of the outline views could be used for displaying replies
linear chronological should still indicate what replies to what, e.g. it could have link(s) like #97 at the top (or better if it displays title not number, at least if there is a title, or maybe author or first words are useful, anyway the point is you can easily get some kinda labelled links to parent posts. even with nesting we still might want those)
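The partially flattened view is the least standard of these, so here is a rough sketch of one way to compute it, in plain Ruby. ThreadReply and both function names are hypothetical, and created_at is an integer only to keep the example short.

```ruby
ThreadReply = Struct.new(:id, :created_at, :children)

# All descendants of a reply, at any depth.
def all_descendants(reply)
  reply.children.flat_map { |child| [child] + all_descendants(child) }
end

# Returns [indent_level, reply] pairs in display order.
# Replies nest normally above max_depth. At max_depth, the replies and
# everything under them are merged into one chronological list.
def partially_flattened(replies, max_depth, depth = 1)
  if depth >= max_depth
    flat = replies.flat_map { |reply| [reply] + all_descendants(reply) }
    return flat.sort_by(&:created_at).map { |reply| [depth, reply] }
  end

  replies.sort_by(&:created_at).flat_map do |reply|
    [[depth, reply]] + partially_flattened(reply.children, max_depth, depth + 1)
  end
end
```

With max_depth of 1 this is the plain linear chronological view, and with a max_depth larger than the deepest reply it is the fully nested view, so one function could back all three options.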
Secondary Comments
I'd like to avoid having upvotes, likes or karma. I don't want a popularity contest atmosphere.
Having a way to save/bookmark stuff could be OK, but I figure people should use third party tools like the bookmark feature of their browser and Pocket.
An upside of upvotes is it gives people a way to engage that's lower effort, but they're engaging a little more than reading and saying/doing nothing. But I'd rather people write one sentence and would like to encourage that instead.
One way to do that is to have *two types of posts*: regular posts and "side comment" posts. That way you can make minor comments and deemphasize them. This can help reduce clutter on pages (view all the main comments only, or have main comments expanded and side comments collapsed). And it has other uses, e.g. making meta comments in a discussion. You can have like the debate itself in main comments, and then stuff like "busy 3 days then i'll respond to your argument" in a side comment. Then it's easier to read through the actual debate itself. It also helps people comment on someone else's debate without being disruptive and cluttering it up. And it makes it easier to write comments like "awesome!" or "+1" if secondary comments exist.
#99 Polls are a voting type thing that I think are OK. Not a priority feature at all, and readily available from third parties, but I'm not against it.
A to share progress bars could be useful. E.g. so people can quickly see what I'm working on and how far along it is. Similarly, a good way of sharing word counts written per day could be useful.
Just having a post with this info and updating it could work. idk if any extra features would help much.
> A to share progress
Typo. I meant: A *way* to share progress
Looking at parent posts and going back to where you were should be user friendly. Possibly a *split view* feature would help? Split view would also help with writing a reply: view the post you're replying to as well as the reply you're writing.
#96
> Instead of an "i want attention for this post" checkbox, maybe a more general purpose metadata system would be good. And maybe a sentence for an opt-in is better than a checkbox. E.g. writing one sentence about e.g. what response you want to the post, why you want criticism, what type of criticism, or which part you want criticism about. Maybe you could do more than one of those. And another type of metadata is a sentence about your goal.
Maybe also provide the ability to ask for metadata easily.
Example:
Person A makes Post X. Fills no metadata about X.
Person B sees X and wonders what Person A's goal was.
There should be an easy way for Person B to ask Person A what the goal of X was, without it getting personal and without a bunch of effort.
I can imagine not filling in a bunch of metadata initially, but going back and adding it later if someone was interested.
> I'm thinking that two connection types is enough. Does anyone see a need for more?
If metadata is attached to the link then that sounds fine. The metadata can include details like decisive refutations or related topics, and it allows for extension later.
> I'm not sure if multiple strong connections would be useful or confusing. A reasonable sounding use case is multiple people ask similar questions and you write a reply to all of them at once. But just picking one to reply to and then referencing/mentioning the others could work too. For strong links, disallowing cycles might work. For the weak connections, I think it needs to be a directional graph that allows cycles. I think strong and weak connections will be handled significantly differently by the UI.
Multiple strong connections for normal comments sounds confusing, except in one circumstance:
new topics typically (outside FI) don't have a 'parent'. The FI email group has a norm of adding `(was: ...)` for new threads with parents, though. Mb there's a use-case for new threads having multiple parents, tho. If someone wanted to write a comment with multiple strong links then they probably need to do some bridging between the topics, too. that sounds like a good reason to start a new thread.
creating strong links only at comment time / thread creation time.
#99
> One way to do that is to have *two types of posts*: regular posts and "side comment" posts. That way you can make minor comments and deemphasize them. This can help reduce clutter on pages (view all the main comments only, or have main comments expanded and side comments collapsed). And it has other uses, e.g. making meta comments in a discussion. You can have like the debate itself in main comments, and then stuff like "busy 3 days then i'll respond to your argument" in a side comment. Then it's easier to read through the actual debate itself. It also helps people comment on someone else's debate without being disruptive and cluttering it up. And it makes it easier to write comments like "awesome!" or "+1" if secondary comments exist.
Side comment on this: I like the idea of side comments. I think I'd use it.
Overall, I like the idea of conversations that are more organized than they are here or on the FI list.
Filter Persistence
If I search for posts over 5000 words, then I click on one, I probably want to see replies that are under 5000 words.
If I search for posts by curi, then click one, I might only want curi's self-replies. I might want my search filter to persist.
If I search for posts over 25 words, I probably want that filter to persist.
If I search for main comments not side comments, I probably want that to persist.
Some thought needs to go into when a search filter might automatically turn itself off and when it'd stay on screen somewhere and the user could click to cancel it.
For access controls, most nodes should inherit the access controls of their parent node (if nodes have multiple strong parent connections, how do we handle that?). Then a few nodes will be set differently by an admin. E.g. setting access controls for the root of a subforum or making a specific thread public.
It'd also be good to be able to say "make this node public but do NOT make its descendants public".
#98 My list forgot about *anonymity* features, e.g. switching names for a particular post or thread.
#109 Another feature I forgot to list is *exporting*. I want good options to turn some posts (any group you can select with searching/filtering) into txt, md, pdf, epub and probably other formats (e.g. html, mobi, rtf).
There is some concern about making it very easy for people to grab all the data. If it's an issue we could limit export size in some cases. But I think it's important to let people easily read stuff how they want to, including e.g. grabbing a ton of data to put in their Voice Dream Reader or other app. I don't think it's a huge security concern because anyone who cares very much could use a web scraper to get everything.
Rails vs. Django
https://www.reddit.com/r/rails/comments/a3w1nu/django_or_rails/
> I have 6 years of professional Django experience, and I was a tech reviewer for the newest version of Two Scoops of Django.
> I would not even consider Django for a new project right now.
> The problems that I have with Django are almost entirely fueled by the community behind it, who have embraced configuration over convention as a means to justify eschewing community standards. Because of this Django is in the dark ages compared to rails on many ways; particularly when it comes to testing and app architecture.
https://www.reddit.com/r/rails/comments/8i42zi/rails_vs_django_python/
> i recently went thru this quandry myself. i already knew a lot of python from working with flask for a couple years and needed a more feature-rich framework and django was the obvious choice but...man did i hate it. i really tried to give it a chance but it seemed like every single step there was some sacrifice i had to make (for example, the django template language is very limiting, they say it can be switched out with jinja but this is a joke as doing so makes it ridiculous and super hacky to work with contexts/variables/etc). by the time i got to setting up a way to merge all my css/javascript and serve them as one file, it became so painfully obvious that django just...isn’t fun for me. in almost every context, there is one or 2 popular add-ins that both quasi-do what you want but then fall short and you really have no alternatives besides just living with the annoyances.
Most stuff people say isn't useful. I didn't find any substantive complaints against Rails similar to these. Some people did complain non-specifically about the "magic" (some stuff is less explicit). There was also a complaint that Rails added opinionated front end tools as defaults, but afaik it's not that hard to choose others instead if you want to, so I don't see the problem. It fits *convention over configuration* to add some good default options, but you can still configure instead if you prefer.
This next post has a lot of info with meaningful specifics; it's worth reading the whole thing:
https://www.reddit.com/r/Python/comments/21dyf3/rails_programmer_here/cgc73st/
> 4) The Python community has three key philosophies that are diametrically opposed to that of the Ruby community. One, we believe there should be one, and preferably only one obvious way to accomplish a certain task. The Ruby community believes that the language should not dictate how you solve a problem. (Your frameworks dictate it instead). Two, we believe that things should be as explicit as possible. (Python doesn't have implicit return statements for example). and Three, Python programmers believe that most magic should be easily exposed and navigable by programmers, whereas Rails tends to hide as much as it can. (Readability counts)
He says he prefers Python and Django over Rails but reading his post Rails sounds better IMO. E.g. Django has a lower level ORM design and I generally think higher abstraction level tools are better (if mature, well made, suitable to your use case, etc). https://culttt.com/2014/06/18/whats-difference-active-record-data-mapper/
BTW I'm particularly impressed by Rails' Arel: http://curi.us/2396-new-community-website-features-and-tech#86 (though i'm not sure if it's relevant if we use a graph db instead of postgres).
#111 I have only used django a bit, but some of the stuff you quoted sounds right to me. there was a bunch of frustrating django stuff, and it doesn't feel very pythonic. There was lots of magic stuff where I ended up thinking something like 'okay, that's cool, but how do I do ...?' and there wasn't an obvious answer. Also, i looked into commenting systems at the same time and this rings true:
> in almost every context, there is one or 2 popular add-ins that both quasi-do what you want but then fall short and you really have no alternatives besides just living with the annoyances.
WRT python web apps: I have typically used tornado over flask or 'heavier' frameworks like django. I like its abstraction (class based over messier function based handlers), and I like that it's unopinionated about database stuff. I've used it with sqlalchemy, mongodb (via motor), and dynamodb (via pynamodb). I think some other ORMs too over the years. It's always been easy to maintain stuff, and it's clean when it comes to routing and implementing get/put/post/delete/etc logic.
curi said:
> (though i'm not sure if it's relevant if we use a graph db instead of postgres)
IMO the main reason to use graphql (as a DB backend / schema) is being able to codegen DB stuff for multiple languages. Otherwise SQL is easier and better known (not to mention more frameworks which are also more mature). If using graphql as an API ~framework, well that doesn't preclude using SQL+postgres.
If it's better for us to go with a design that's *easily* self-hostable, then I don't think multi-language makes sense.
Open source forum project in Rails:
https://github.com/discourse/discourse
might have useful stuff.
GNU GPL v2 license. i think that means if we use their code we have to make our code open source? idk details.
#112 A graph db is a database that represents data in the form of graphs rather than tables:
https://en.wikipedia.org/wiki/Graph_database
Graphql isn't a database:
https://graphql.org/faq/#is-graphql-a-database-language-like-sql
#114 You're right. I thought there were like native DBs that used graphql. I misread 'graph db' in #111.
Note: it does look like there's a small number of like 'native' graphql DBs, but nothing with the maturity of postgres. here's a saas example.
@curi, what do you think about adding paths forward as an explicit thing to the new site? Like a user can click a button named something like 'declare impasse' to fork the discussion off into a new (linked) topic, and an explicit msg is then shown in the origin thread. That new thread/topic could have resolution conditions and stuff too.
one way to support this sort of thing (and lots of other stuff too) would be to have some kind of *event* object that is listed and shown alongside replies/posts. it's not an MVP level feature, but it feels like it'd be useful to have the ability to easily add that later.
#116 I want to be wary of special cases. Can a general concept of forking threads (or more generally creating a new node with some references to nodes that already exist) be used for this or whatever else?
#117 Something like a label on a post saying it is a paths forward post might work without being too burdensome.
#117
> I want to be wary of special cases. Can a general concept of forking threads ... be used for this or whatever else?
Yes, I think so. I think forking threads is a good feature (like adding "was:" to a subject on FI), and we can signal this sort of stuff with metadata.
#118
> Something like a label on a post saying it is a paths forward post might work without being too burdensome.
Yup. I think this sort of thing would work in a linear list of posts in some thread for the general case:
> ---
> from: x
> title: tttt
> body: asdf
> ---
> thread forked to: <new thread title>
> forked by: <user>
> [preview button/link or something]
> ---
> from: y
> body: <some later reply>
> ---
Then we can add special cases only when showing an optional reason (or w/e) associated with the new topic.
I think the most common problem with special cases is when they get too difficult to maintain, or reason about, or even keep track of. This is a way to keep things pretty limited so the code can stay clean, but we also can have more custom features. Like, in a paths forward case, we can show extra material about the process, goals, etc.
In the main topic curi had a thing required: "Guided new person learning tour"; this sort of thing feels like a natural extension of that.
#119 You don't need a special forking feature like that if you have, generically, every node display a list of every other node that links to it (probably normally excluding any descendants that are displayed on the page currently, so basically links from elsewhere, though I think getting the full list should be possible).
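A minimal sketch of that generic "show every node that links here" idea, in plain Ruby with no Rails. The `LinkNode` struct, the list of `[from, to]` link pairs, and the method names are all made up for illustration; a real version would presumably be a DB query.

```ruby
require "set"

# Hypothetical in-memory model: a node has an id and an optional parent_id.
LinkNode = Struct.new(:id, :parent_id)

# All ids in the subtree rooted at root_id (the node plus its descendants).
def subtree_ids(nodes, root_id)
  children = Hash.new { |h, k| h[k] = [] }
  nodes.each { |n| children[n.parent_id] << n.id }
  seen = Set.new
  queue = [root_id]
  until queue.empty?
    id = queue.shift
    next unless seen.add?(id) # skip ids already visited
    queue.concat(children[id])
  end
  seen
end

# Ids of nodes that link to target_id, excluding anything in the subtree
# currently displayed on the page (rooted at page_root_id).
def backlinks(nodes, links, target_id, page_root_id)
  shown = subtree_ids(nodes, page_root_id)
  links.select { |from, to| to == target_id && !shown.include?(from) }
       .map(&:first)
       .uniq
end

nodes = [LinkNode.new(1, nil), LinkNode.new(2, 1), LinkNode.new(3, 2),
         LinkNode.new(4, 1), LinkNode.new(5, 4)]
# Node 3 (a descendant of 2) and node 5 (elsewhere) both link to node 2.
links = [[3, 2], [5, 2]]
p backlinks(nodes, links, 2, 2) # => [5]
```

Getting the full list instead would just mean skipping the `shown` check.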
#120 yup. I see a 'fork' button being a shortcut to like 'reply as new topic' mb with some pre-filled stuff. I think 'fork' might be a bad term and isn't clear enough. The design I have in mind lets a post be both a reply and act like a 'top level' post -- one without a parent. (or: if it has a parent, then the parent is like one of the main categories, which could also be nodes in the discussion graph.)
I think we have similar ideas in mind, but I'm not completely sure.
OK, similar ideas. But I think it's better to start with generic features and try to make them simple, elegant, powerful, and then try using and iterating them a while. Then only add more specific/parochial stuff if really needed. I think that means start with general features to create new nodes and attach them anywhere in the graph, as well as quoting. (And yeah main categories should be nodes under the root node).
idea for URLs in cf-forum:
- use posix paths, start at `/` (aka 'root')
- have few, well known, etc top level categories; e.g. `/main`, `/meta`, `/other`, `/detailed` (mb `/technical` but that implies like tech/code type stuff and isn't what I'm thinking of) -- users can post in these
- posts always go under `p/:id`, e.g. `/main/p/123`, `/other/p/84938`, etc; alt: `/p/main/123` and `/p/other/4985`
- users have a personal area for a blog under `/u/:username`
- users can connect a custom domain via a cname pointing to a dns name like `#{username}.forumname.com` (where the forum is primarily hosted at `forumname.com` and there's a wildcard dns `*.forumname.com` set)
- other users can comment on that user's posts, but can't post to the namespace themselves
- when browsing the forum under `/`, the user sees all blog posts and all posts in all categories (they could filter to just a subset or like a saved set; effectively things they 'subscribed' to)
- when browsing under a specific category/path (e.g. `/main`, `/u/max`) they see just the posts in that cat.
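To make the path scheme above concrete, here is a rough plain-Ruby sketch that classifies a path. It only covers the `/category/p/:id` variant and the `/u/:username` area; the category list, the method name `parse_forum_path`, and the returned hash shape are assumptions for illustration, not part of any existing code.

```ruby
# Top-level categories from the list above; the exact set is an assumption.
FORUM_CATEGORIES = %w[main meta other detailed].freeze

# Turn a path into a small hash describing what it points at.
# Returns nil for paths that do not fit the scheme.
def parse_forum_path(path)
  parts = path.split("/").reject(&:empty?)
  return { type: :root } if parts.empty?
  return { type: :user_blog, username: parts[1] } if parts.length == 2 && parts[0] == "u"
  return nil unless FORUM_CATEGORIES.include?(parts[0])
  return { type: :category, category: parts[0] } if parts.length == 1

  if parts.length == 3 && parts[1] == "p" && parts[2].match?(/\A\d+\z/)
    return { type: :post, category: parts[0], id: parts[2].to_i }
  end
  nil
end

p parse_forum_path("/main/p/123") # => {:type=>:post, :category=>"main", :id=>123} (exact inspect format depends on the Ruby version)
p parse_forum_path("/u/max")
```

In Rails this would more likely live in `config/routes.rb`; the sketch is only meant to show which paths the scheme accepts.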
I think /main/p/123 can just be /p/123
I'm doubtful about user sections. My initial thought was your own blog is just a site search that filters for only your own posts with a nesting limit (or with an article/comment distinction). Your user profile page could link to and/or contain multiple standard (or customized by you) searches (e.g. one with all comments, one with comments with certain filters).
A problem with sections is we end up with content in different places. If I don't follow Max, how will I see Max's posts on his blog? They are all nested under /u/max.
I figure Max should post in /main and then /u/max/blog is a particular search/filter that gathers Max content from everywhere. that way someone who just reads /main will see Max articles. so Max is participating on the main site in the usual way instead of off in a separate world.
advanced feature ideas (not for v1):
- leave browser open and page will auto-update with replies, so you can have a conversation at the speed of IMs
- auto link titling. so i can put in a youtube link or blog/article link, and it'll find the title for me. here i often go back and forth between tabs and do two separate copy/pastes, one for link and one for title, which is a minor inconvenience
- tweet or YT vid embeds
caching in new comm forum
it'd be useful to have good caching of raw data. there are some breakpoints we can meet if we do that.
particularly around responsiveness of UI, ease of doing UI features, reducing query load on the server (could be done in the client), etc.
#126 https://guides.rubyonrails.org/caching_with_rails.html
#124 I agree with this. Sounds like a much better way to do things.
I think we should support rich attachments, including video and supporting features like an html5 video player. we can restrict upload of large files to subscribers or certain user groups as an anti-dos measure.
Does anyone have thoughts on how to do permissions on threads/subforums? e.g. we might want a subforum like 'only subscribers can post & reply but publicly readable'. I have some early ideas but want to develop them a bit before posting. The main thing I'm concerned with is designing a good, general, elegant system. There are some 'brute force' type methods I can think of that are less elegant.
#131 set permissions on a node. nodes inherit from their parent unless set otherwise. read and write can be specified separately when desired (one, when unset, inherits from the other). users have some roles, tags or other settings so you can specify which groups of users can do something.
more advanced: i think it'd be good to be able to set a permission that isn't passed to descendants, so you can share something individually without sharing its descendants/subtree.
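A rough sketch of how that could look, in plain Ruby. Everything here is an assumption for illustration: the `PermNode` struct, roles as symbols, `own_read` as the name for a permission that is not passed to descendants, and the reading that "one inherits from the other" means an unset `write` falls back to `read` on the same node (and vice versa).

```ruby
# Hypothetical node: read/write are lists of allowed roles, or nil when unset.
# own_read applies to this node only and is never passed to descendants.
PermNode = Struct.new(:id, :parent, :read, :write, :own_read, keyword_init: true)

# Nearest explicitly set permissions, walking up from the node toward the root.
def inherited_perms(node)
  node = node.parent while node && node.read.nil? && node.write.nil?
  return { read: [], write: [] } if node.nil?

  # When only one of read/write is set, the other falls back to it.
  { read: node.read || node.write, write: node.write || node.read }
end

def can_read?(node, role)
  return true if node.own_read&.include?(role) # one-off share, this node only

  inherited_perms(node)[:read].include?(role)
end

def can_write?(node, role)
  inherited_perms(node)[:write].include?(role)
end

root     = PermNode.new(id: 1, read: [:admin])
subforum = PermNode.new(id: 2, parent: root, read: [:subscriber], write: [:subscriber])
thread   = PermNode.new(id: 3, parent: subforum)
sample   = PermNode.new(id: 4, parent: thread, own_read: [:public])
reply    = PermNode.new(id: 5, parent: sample)

p can_read?(sample, :public) # => true
p can_read?(reply, :public)  # => false
```

The `reply` under `sample` stays private because `own_read` is checked only on the node itself, while everything else is resolved by walking up the parents.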
#132 note: only admins can set permission on nodes.
#133 In your model in #132 - if there's private/public stuff - users can implicitly set permissions on nodes via choosing the subforum/category?
Otherwise sounds good. I have been thinking about how to do stuff using tags and tag-tags. I think there's a nice way to do lots of stuff with 1 general system.
Also:
Not much yet, but a start. Here's the migrations file. Hope that's enough to give ppl an idea of what I've been thinking of so far. Crits and suggestions welcome - ctx: this is my first 'working' draft of the schema.
I have some fixtures set up, hence the weird titles/content.
Note: nodes and content_versions have authors, and authors have users. that way we can manage identities or ppl can use an anon identity, etc.
Also, I'm enjoying Rails. It's pretty nice to work in, and I think I've made more progress, faster, than I did when learning django.
I'll try adding user accounts soon and building out some of the UI elements and backend logic.
> if there's private/public stuff - users can implicitly set permissions on nodes via choosing the subforum/category?
yes. i'm thinking some subscription tier gets you access to a Private node (subforum) that you can post under, which not all members can read, which ppl are asked not to share quotes from elsewhere, and which admins won't make posts from public as free samples.
> t.boolean :is_top_post
is this meant to indicate subforum type nodes? or thread starts? either way, a default_view field would be more generic. (i think pointing any view at any node is ok, so i'm thinking of it as just a default to make things convenient so u don't have to change views after clicking on that node.)
> t.integer :genesis_id, index: true
not sure what this is
>> if there's private/public stuff - users can implicitly set permissions on nodes via choosing the subforum/category?
> yes. i'm thinking some subscription tier gets you access to a Private node (subforum) that you can post under, which not all members can read, which ppl are asked not to share quotes from elsewhere, and which admins won't make posts from public as free samples.
Yup, I think I have a clear idea of what's required.
> not to share quotes from elsewhere
Do you mean 'not to share quotes from the private forum elsewhere'? Otherwise I'm not sure exactly what you mean. It reads like 'don't quote stuff from outside the private section', but that sounds wrong to me.
>> t.boolean :is_top_post
> is this meant to indicate subforum type nodes? or thread starts? either way, a default_view field would be more generic. (i think pointing any view at any node is ok, so i'm thinking of it as just a default to make things convenient so u don't have to change views after clicking on that node.)
thread starts. I am not sure what the value of `default_view` would be. The idea of a view sounds like the results of a search/filter/sort type operation. but I'm not sure how that and the current idea I have of top_posts / thread_starts unifies with that idea.
>> t.integer :genesis_id, index: true
> not sure what this is
This is the earliest ancestor without a parent. A node's parent's parent's ... parent's parent.
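In plain Ruby terms, a sketch of what `genesis_id` stores. The `id => parent_id` hash and the method names are hypothetical, and treating a parentless node as its own genesis is an assumption.

```ruby
# Follow parent pointers until reaching a node with no parent.
# parents is a hypothetical { id => parent_id } hash (nil for no parent).
def genesis_of(parents, id)
  id = parents[id] until parents[id].nil?
  id
end

# All node ids that belong to the thread started by the given genesis node.
def thread_ids(parents, genesis)
  parents.keys.select { |id| genesis_of(parents, id) == genesis }
end

parents = { 1 => nil, 2 => 1, 3 => 2, 10 => nil, 11 => 10 }
p genesis_of(parents, 3) # => 1
p thread_ids(parents, 1) # => [1, 2, 3]
```

Storing the value in an indexed column means the walk happens once when the node is created, and fetching a whole thread becomes a single lookup on that column.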
#136 It might be worth me mentioning: the migrations file I posted doesn't include the idea of main/detailed/meta/other as nodes under a root node.
I want to try an alternate idea, too: doing that high-level organisation via tags. The way you've described permissions might make the tag-method more complex, tho. Like it's elegant if permissions are set on some siblings of main/meta/etc, e.g. main_private, main_subscribers, etc.
tho doing the high-level org via tags might fit with some search methods/systems better.
#136 #137 I think I know what you mean by `default_view` after reading http://curi.us/2395-new-community-website-project#55
> #131 set permissions on a node. nodes inherit from their parent unless set otherwise. read and write can be specified separately when desired (one, when unset, inherits from the other). users have some roles, tags or other settings so you can specify which groups of users can do something.
> more advanced: i think it'd be good to be able to set a permission that isn't passed to descendants, so you can share something individually without sharing its descendants/subtree.
Another more advanced idea would be to have permissions for the subject and author contact of nodes independent from the node content.
Example use cases:
Allow people to view the topic list (node subjects) of some areas as a marketing method for free, but then pay to read the content.
In free-to-read areas, allow paid members access to author contact information (ex: their email) but not unpaid members.
> It might be worth me mentioning: the migrations file I posted doesn't include the idea of main/detailed/meta/other as nodes under a root node.
Is there a benefit to having more than one root node in the post tree? (in other words, making it multiple separate trees)
BTW in general I think things should be inferred, calculated or inherited when possible instead of set. E.g. the best default view for a node can be inferred by its distance to root. (distance 1 = subforum, distance 2 = topic start, distance 3+ = comment). This will only need to be overridden occasionally and the forum could function initially and be tried out without an override setting.
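A minimal sketch of inferring the default view from distance to root, with an optional override. The view names, the `overrides` hash, and the `:root` view for distance 0 are assumptions; only the 1/2/3+ mapping comes from the text above.

```ruby
# Number of parent hops from the node up to the root.
# parents is a hypothetical { id => parent_id } hash (nil for the root).
def distance_to_root(parents, id)
  distance = 0
  until parents[id].nil?
    id = parents[id]
    distance += 1
  end
  distance
end

# Inferred default view; an explicit override wins when one exists.
def default_view(parents, id, overrides = {})
  return overrides[id] if overrides.key?(id)

  case distance_to_root(parents, id)
  when 0 then :root
  when 1 then :subforum
  when 2 then :topic
  else :comment
  end
end

parents = { 1 => nil, 2 => 1, 3 => 2, 4 => 3 }
p default_view(parents, 3)                  # => :topic
p default_view(parents, 4, { 4 => :topic }) # => :topic
```

With an empty `overrides` hash the forum works purely from inference, which matches the idea of trying it out before adding an override setting.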
> in general I think things should be inferred, calculated or inherited when possible instead of set. E.g. the best default view for a node can be inferred by its distance to root.
I might be a bit sensitive to this b/c of different performance requirements for different envs (like high capacity voting systems) -- i haven't worked on a webapp like this for a while. my general approach is that you have some authoritative data behind inferences/calculations/etc and then cache as required for performance. I will try relaxing that a bit b/c it will probs start to become significant WRT architecture.
> Is there a benefit to having more than one root node in the post tree? (in other words, making it multiple separate trees)
It's easier to find all the replies to a topic in one query. Or at least it was at that point in prototyping. Tho I did that via approx `select * from nodes where genesis_id = #{current_node.id}`. (note: i'm deliberately avoiding any string interpolation in queries in the actual ruby code, just easier to write it out as sql here)
IDK what the performance characteristics would be like for a traversal of a subtree (e.g. to get all replies recursively). I think -- in an SQL db -- you could do like recursive self-joins mb to get all children. IDK if that would error out or never-halt if a cycle accidentally got introduced, tho. I don't like the idea of doing multiple queries, though, from the ruby side. if there's a long chain of replies that could get laggy.
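One way to sidestep the cycle worry is to refuse to create a cycle in the first place, by checking before a parent link is saved. A rough plain-Ruby sketch, assuming a hypothetical `{ id => parent_id }` hash stands in for the nodes table:

```ruby
# True if making new_parent_id the parent of node_id would create a cycle,
# i.e. node_id is new_parent_id itself or one of its ancestors.
def creates_cycle?(parents, node_id, new_parent_id)
  current = new_parent_id
  steps = 0
  until current.nil?
    return true if current == node_id

    current = parents[current]
    steps += 1
    # Defensive bound in case the stored data already contains a cycle.
    return true if steps > parents.size
  end
  false
end

parents = { 1 => nil, 2 => 1, 3 => 2 }
p creates_cycle?(parents, 1, 3) # => true  (3 is a descendant of 1)
p creates_cycle?(parents, 3, 1) # => false (attaching a leaf under the root is fine)
```

If parent links are only ever set at creation time, a brand new node has no descendants yet, so this check would mainly matter for any later "move" feature.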
> (distance 1 = subforum, distance 2 = topic start, distance 3+ = comment). This will only need to be overridden occasionally and the forum could function initially and be tried out without an override setting.
I was thinking about not having an override setting and what that might be like. Do we need the 'rename' fork method? I was thinking you could set a title on a comment like here on curi.us -- setting a title doesn't automatically make it show up in a list of posts or something.
I feel like the 'move' fork method might be all we really need, and that can be supported via good quoting + rich links between nodes (the secondary link type)
Mb one use case for the rename-fork method (not that this is really forking) is: I want a 3+ node to be treated as a level 2 node, which would mean treating the level 2 node (actual topic start) as a level 1 node, at least some of the time. This way you could have like subthreads/subtopics, but IDK it seems like there are better ways to do this sorta thing (like via tags). One reason tags might not be so good here is permissions; there will need to be some restrictions on tags, but do we need to restrict subtopics then too? Hmm.
from curi.us/2395#54
> it'll be a lot easier to understand after having the core features working so you can try them out. it's harder to plan ahead about forking now, and unnecessary.
Yeah, going to leave this for now. I made some progress last night on having exactly 1 root node with children main/meta/etc, so will play with that more today to get a prototype working.
>> more advanced: i think it'd be good to be able to set a permission that isn't passed to descendants, so you can share something individually without sharing its descendants/subtree.
> Another more advanced idea would be to have permissions for the subject and author contact of nodes independent from the node content.
> Example use cases:
> Allow people to view the topic list (node subjects) of some areas as a marketing method for free, but then pay to read the content.
> In free-to-read areas, allow paid members access to author contact information (ex: their email) but not unpaid members.
I think this would be mostly simple to do with a decent+hierarchical permissions system design. Provided it's structured right, a lot of those finer-grain permissions should typically be inherited, then stuff like public-subject + private-body should be easy. I'm not so sure about one-shot permissions tho; like exactly how they'd be represented without reaching for another tag/flag/property (which would also mean checking for it on any call that needs to check permissions.)
#142 I was talking about long term saved data. Caching is a separate issue and is fine.
> IDK what the performance characteristics would be like for a traversal of a subtree
It sounds like multiple root nodes would be a premature performance optimization, and I think it's unlikely to be the right optimization if one is needed – there are tools designed for performant graph traversal without giving up the conceptual elegance of having stuff connected together.
> I think -- in an SQL db -- you could do like recursive self-joins mb to get all children.
Postgres has graph traversal functions which were discussed above, and there are other options like neo4j, which turpentine brought up.
#132
> more advanced: i think it'd be good to be able to set a permission that isn't passed to descendants, so you can share something individually without sharing its descendants/subtree.
i think this might naturally fall out of an elegant design:
if posts are a child of main/etc which is a child of root, it's elegant to use a system where a node has permissions for itself being different to permissions for the children. like main should only be editable by admin, but it should also specify that topics can be created by anyone. topics can be edited by the author (and admin) and should specify that replies can be made by everyone.
most of the time the permissions for children will be set by the method that handles the node creation, so normal users won't be able to do weird stuff.
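A rough sketch of the "permissions for the node itself vs. permissions for its children" split, in plain Ruby. The struct names, the role symbols (`:admin`, `:author`, `:anyone`), and `create_topic` are all hypothetical.

```ruby
# Hypothetical node with two separate permission sets:
#   edit_roles   - who may edit this node itself
#   create_roles - who may create children under this node
ForumNode = Struct.new(:title, :author, :edit_roles, :create_roles, keyword_init: true)
ForumUser = Struct.new(:name, :roles, keyword_init: true)

# Does the user match any role in the list, relative to this node?
def role_match?(roles, node, user)
  roles.any? do |role|
    case role
    when :anyone then true
    when :author then node.author == user.name
    else user.roles.include?(role)
    end
  end
end

def can_edit?(node, user)
  role_match?(node.edit_roles, node, user)
end

def can_add_child?(node, user)
  role_match?(node.create_roles, node, user)
end

# The creation method sets the child's permissions, so normal users never pick them.
def create_topic(parent, user, title)
  raise "not allowed" unless can_add_child?(parent, user)

  ForumNode.new(title: title, author: user.name,
                edit_roles: [:author, :admin], create_roles: [:anyone])
end

max   = ForumUser.new(name: "max", roles: [])
main  = ForumNode.new(title: "main", author: "admin", edit_roles: [:admin], create_roles: [:anyone])
topic = create_topic(main, max, "hello")
p can_edit?(main, max)  # => false
p can_edit?(topic, max) # => true
```

Here `main` is editable only by an admin but lets anyone start topics, and each topic is editable by its author or an admin while letting anyone reply.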
> Postgres has graph traversal functions which were discussed above, and there are other options like neo4j, which turpentine brought up.
I have a basic impl of WITH RECURSIVE based on an example in SQLite docs.
One thing I note with https://cff.au.ngrok.io/6 is that the node_ids come back sorted from a WITH REC query -- even tho I didn't include any sorting logic and none was included in the raw SQL query printed by `rails s` stdout.
That might mean that rendering a tree structure isn't as straightforward. My manual-recursive logic (in the template) comes out nicer.
This image shows what I mean -- check the order of titles in the TOC (WITH REC) vs comments (naive)
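If the rows come back in id order, one option is to rebuild the display order in Ruby after the single query. A minimal sketch, assuming the query yields `[id, parent_id]` pairs; the method name `tree_order` and the sample rows are made up.

```ruby
# Depth-first ordering with depth, suitable for rendering an indented tree.
# rows is a flat list of [id, parent_id] pairs in whatever order the DB returned.
def tree_order(rows, root_id)
  children = Hash.new { |h, k| h[k] = [] }
  rows.each { |id, parent_id| children[parent_id] << id }

  ordered = []
  visited = {}
  stack = [[root_id, 0]]
  until stack.empty?
    id, depth = stack.pop
    next if visited[id] # guards against an accidental cycle

    visited[id] = true
    ordered << [id, depth]
    # Push children in reverse so the first child is popped (rendered) first.
    children[id].reverse_each { |child| stack.push([child, depth + 1]) }
  end
  ordered
end

rows = [[1, nil], [2, 1], [3, 1], [4, 2], [5, 4], [6, 3]]
tree_order(rows, 1).each { |id, depth| puts "#{'  ' * depth}node #{id}" }
```

This keeps it to one DB query while still giving the nested order that the manual-recursive template produces. Siblings appear in the order the rows arrived, so sorting by something like creation time would need to happen before grouping.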
#143
> Do we need the 'rename' fork method?
I think we do. I am seeing how everything can/will work together more clearly now.
There are some cool things like project management that are more intuitive with rename-forks. (e.g. breaking down tasks and having subtasks; you list the subtasks by using the 'index' view on the main task)
I have a highly general system of tagging, I think. It feels like a good, generic way to handle most metadata-related issues -- and it's easy to extend and provide some user-level functionality.
here are some notes I drew up:
btw, if anyone is colorblind: i use color to separate content in a fairly ad-hoc manner, please lmk if there's any issues and will try some other stuff. maybe nbd.
Some notes on usage:
some notes on selecting text in CFF. we want to show rich content, but also be able to work in plaintext. How do we do copying to the clipboard without changing how ctrl+c works (or making it obvious that something nonstandard will happen and make sure that the standard way is still possible plus low effort).
this idea is to show an additional UI element when the user selects text. there's a lot of great features we could "hide" behind that, which only show up if the user selects text. ofc the specific behaviour of that UI element could be done lots of different ways and have user prefs, etc.
some people select text while reading, so we need a way for this to be noninvasive, but that doesn't matter much for the default I think. (mb incognito browsers would be an issue if not logged in?)
I think it'd be good to have a way to 'reply' to someone without posting a msg. Sorta like the way that reactions are used on Discord or other apps. A reply can be visually expensive in the UI and sometimes ppl avoid stuff that they feel is ~insubstantial.
Basically I'm thinking discord emote reactions would be a good thing to include.
> I think it'd be good to have a way to 'reply' to someone without posting a msg. Sorta like the way that reactions are used on Discord or other apps. A reply can be visually expensive in the UI and sometimes ppl avoid stuff that they feel is ~insubstantial.
> Basically I'm thinking discord emote reactions would be a good thing to include.
I'd prefer a more verbal forum culture which is more differentiated from social media. I don't want a popularity contest. I don't want people to judge their posts (or anyone else's) by how many likes (or smiley face or thumbs up emoji reactions) they get, or worry about that at all.
One of the problems reactions are trying to solve, besides enabling popularity contests, is to post something that doesn't ask for or merit much attention. It lets you communicate in a way that's smaller than a regular post.
So what I want to try is two types of nodes: regular and minor. Minor nodes will have a different (less emphasized) visual appearance, and can be collapsed or removed by a search filter. This will let people say things like "+1" without it cluttering up the thread or asking for attention in the regular way. It will also let people say things like "going to bed; will reply more tomorrow" and other side comments.
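A rough sketch of the regular/minor idea, assuming a boolean `minor` flag on each node (the `Post` struct, `visible_posts`, and the flag name are made up for illustration, not taken from the codebase):

```ruby
# Hypothetical in-memory stand-in for a forum node; `minor` is an assumed flag name.
Post = Struct.new(:id, :body, :minor, keyword_init: true)

# Returns the posts to display. Minor posts are kept unless the reader
# has asked to hide them (e.g. via a search filter).
def visible_posts(posts, hide_minor: false)
  hide_minor ? posts.reject(&:minor) : posts
end

THREAD = [
  Post.new(id: 1, body: "Long argument about tagging", minor: false),
  Post.new(id: 2, body: "+1", minor: true),
  Post.new(id: 3, body: "going to bed; will reply more tomorrow", minor: true),
  Post.new(id: 4, body: "Detailed reply", minor: false),
].freeze

p visible_posts(THREAD, hide_minor: true).map(&:id) # => [1, 4]
```

The same flag could drive the less emphasized styling, so filtering and display would share one source of truth.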
some sql recursive stuff
Just noting down some SQL recursive stuff.
A node and its parents:
WITH RECURSIVE
  node_and_parents(id, parent_id) AS (
    SELECT id, parent_id
    FROM nodes
    WHERE id = #{node.id}
    UNION ALL
    SELECT n.id, n.parent_id
    FROM node_and_parents np, nodes n
    WHERE np.parent_id = n.id
  )
SELECT * FROM nodes n, node_and_parents np WHERE n.id = np.id
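For comparison, here's the same walk in plain Ruby over an in-memory hash. It's only meant to show which rows the recursive query returns; `PARENT_OF` and the sample ids are invented, and the cycle check is an extra that the SQL doesn't have.

```ruby
# id => parent_id; nil marks the root. Sample data, not from the real DB.
PARENT_OF = { 1 => nil, 2 => 1, 3 => 2, 4 => 2, 5 => 4 }.freeze

# Returns the node id followed by each ancestor id up to the root,
# mirroring the ids the recursive CTE produces.
def node_and_parents(id, parent_of = PARENT_OF)
  chain = []
  while id
    raise ArgumentError, "cycle detected at node #{id}" if chain.include?(id)
    chain << id
    id = parent_of[id]
  end
  chain
end

p node_and_parents(5) # => [5, 4, 2, 1]
```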
Here's some more on github (in this case for finding authz tags)
https://github.com/fi-tools/cf-forum/blob/891f0c02953497479168d01a014233491d493787/app/models/node.rb#L160-L201
I think i might have been wrong about SQL. I used to be anti SQL since it was trendy.
I've implemented recursive node visibility entirely in SQL. I started prototyping it as like a monstrous ~40 line SQL query, but since managed to refactor it down to a few views (some are still decently large queries though).
My big reservation about going with a method like this is performance -- like DB performance might be bad.
On the other hand, the reason I really like this is that we get visibility of nodes directly from the query result itself, and don't need to do more processing or more queries.
I think we could optimize performance issues by adding indexes over the views (and cut down the views to minimal size, too). That will increase storage space and write operation times, but IMO that'll usually be worth it since a forum is read-intensive, not write-intensive. maybe that's an issue for DOS attacks, but I think those are soluble with other more traditional methods.
you can test it out on https://cff.au.ngrok.io with accounts cfsub@xk.io and cfgen@xk.io (both pws are `hunter2`). cfsub is the 'subscriber' test acct, and cfgen is a general acct with no special privileges.
Most of the files in this directory are views I created to do the visibility-via-db-query thing; here are some interesting ones:
https://github.com/fi-tools/cf-forum/blob/max-proto/db/views/node_authz_reads_v01.sql
https://github.com/fi-tools/cf-forum/blob/max-proto/db/views/user_groups_v01.sql
https://github.com/fi-tools/cf-forum/blob/max-proto/db/views/node_with_children_v01.sql
https://github.com/fi-tools/cf-forum/blob/max-proto/db/views/node_with_ancestors_v01.sql
Note: I wrote some views to make querying a node's ancestors and descendants easier too -- using the recursive stuff mentioned above.
I was ambiguous about 'node visibility' -- i mean read permissions.
some notes on permissions. main point is that permissions are inherited until they're replaced. then only the new permissions that are set persist.
Looking for feedback. ATM doing stuff like subject-only visibility might be hard, but mb there's an easy way to add that feature. I haven't thought about it much.
Permissions for read/write will be the main ones. Read implies 'read this node and all children until new permissions are set'. Write implies 'can create nodes under this node until new permissions are set'.
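A minimal sketch of the "inherited until replaced" rule, assuming read permissions are stored as a list of group names on the node where they're set. `PARENTS`, `READ_TAGS`, `effective_read_groups`, and `can_read?` are hypothetical names and the data is made up; the real implementation does this in SQL views.

```ruby
# Sample tree: id => parent_id (nil is the root). All data here is made up.
PARENTS = { 1 => nil, 2 => 1, 3 => 2, 4 => 3, 5 => 1 }.freeze

# Read permissions set directly on a node: id => groups allowed to read.
# Only subtree roots carry an entry; everything else inherits.
READ_TAGS = { 1 => ["all"], 2 => ["subscribers"] }.freeze

# Walks up from `id` and returns the groups from the closest node that has
# read permissions set. Returns [] if no ancestor sets any.
def effective_read_groups(id, parents = PARENTS, tags = READ_TAGS)
  while id
    return tags[id] if tags.key?(id)
    id = parents[id]
  end
  []
end

# True when the user belongs to at least one group that may read the node.
def can_read?(user_groups, id)
  (effective_read_groups(id) & user_groups).any?
end

p can_read?(["all"], 5)                 # => true  (inherits from node 1)
p can_read?(["all"], 4)                 # => false (node 2 replaced the permissions)
p can_read?(["all", "subscribers"], 4)  # => true
```

Write permissions would presumably work the same way with a second tag table.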
I think 'wiki'/group editing of a post would mb be a good feature. but I think it'd be better to implement that differently to the main 'write' permission, like via an `authz_collaborator` permission or something. mb with a different prefix, but nbd. would be useful in project situations.
#157 I forgot to mention in the diagram - only the *root of each highlighted subtree* has permissions set. its children (with the same permissions) do not have a permission tag set. in the image, there are 3 permission tags set.
A side effect that's nice: if you want a private forum, setting read:group_name on the root node will automatically make everything public. If you removed the 'all' group, too (sorta hardcoded atm), then there'd be no way for an anon user to see anything.
I found a tool - dbdiagram.io - that draws diagrams of your DB schema (and it can generate one for schema.rb too). This is the current DB schema (it doesn't show views or anchored_id/target_id links)
Advanced feature idea:
Whenever someone posts an external link, we save an archive copy of the linked page.
Alternative: send the link to https://archive.vn and we just save a link to their copy.
#160 we might be able to integrate fairly easily. That'd be a good first step. Doing the archive ourselves would require considerable work to avoid edge cases. Good idea.
#157 performance might be an issue -- with 25k fake posts using 'forum_index' view:
Completed 200 OK in 847024ms (Views: 95.3ms | ActiveRecord: 591057.2ms | Allocations: 362538)
This was using postgres. The postgres queries seemed much faster than the ruby stuff; like the ruby process was pinned at 100% of a cpu core when I checked it.
I tried to materialize the views in postgres but I ran in to problems with that: like the views not being refreshed, and one of the views threw an error when I tried to manually refresh.
Checking SQLite now. I noticed, when playing with postgres the other day, SQLite queries seemed generally faster than pg queries, sometimes by 1-2 orders of magnitude. that might have been an edge case tho.
#162 The SQLite version is definitely faster atm. Although the fake data should start from the same random seed, I think there might be some ruby concurrency stuff going on that breaks deterministicness. There was one node with 17k descendants which I think took the majority of the time.
Completed 200 OK in 615943ms (Views: 72100.2ms | ActiveRecord: 39129.7ms | Allocations: 983299)
I noted lots of the queries took like 20000ms on postgres and like 1000-1500ms on sqlite. Not sure if that was affected by one node ending up with a majority of the fake nodes.
Here's an image of the page I'm loading for clarity.
#163 I don't think SQLite is going to turn out to be the right tool.
#164 I agree.
I tested MySQL to get another datapoint, and it's much faster than postgres and sqlite:
Completed 200 OK in 307293ms (Views: 111.5ms | ActiveRecord: 42400.1ms | Allocations: 375583)
I'm not sure what error is in those comparisons, but they're similar orders of magnitude.
I've got logs for all the queries and things, and I have some ideas about how to get some good improvements easily. The main idea is to look through model/view code for things that have re-usable nested results. like when I pass nodes to 'deeper' views -- e.g. for nested comments or the trees. I think I could do something there to avoid ~exponential complexity. (I think it's exponential-ish, not exactly sure. it might be polynomial)
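One way the "re-usable nested results" idea could look: fetch the rows once, index them by parent, and let the nested rendering read from that index instead of querying again at each level. This is a sketch with invented rows and hypothetical helper names (`children_index`, `render_tree`), not the current view code.

```ruby
# Rows as they might come back from one `SELECT id, parent_id, title` query.
# The data is invented for the example.
ROWS = [
  { id: 1, parent_id: nil, title: "root" },
  { id: 2, parent_id: 1,   title: "first reply" },
  { id: 3, parent_id: 1,   title: "second reply" },
  { id: 4, parent_id: 2,   title: "nested reply" },
].freeze

# Group once, so each node's children are a hash lookup instead of a query.
def children_index(rows)
  rows.group_by { |row| row[:parent_id] }
end

# Returns the outline as indented lines; recursion only touches the in-memory index.
def render_tree(index, parent_id = nil, depth = 0)
  index.fetch(parent_id, []).flat_map do |row|
    ["#{'  ' * depth}#{row[:title]}"] + render_tree(index, row[:id], depth + 1)
  end
end

puts render_tree(children_index(ROWS))
```

With this shape the work is roughly one pass to build the index plus one visit per node, however deep the nesting goes.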
#165 well, I generated some more, and interesting, data.
### drawing subtree/0
| database | total ms | views ms | activerecord ms | allocations |
|---|---|---|---|---|
| mysql | 530420 | 63520 | 22575 | 56137161 |
| postgres | 326062 | 34266 | 35771 | 28341259 |
| sqlite | 311967 | 39161 | 7005 | 28383508 |
The difference in allocations for mysql is interesting. It's curious that things were a lot closer with this test. (same databases+records as before).
We'll need to make some UI type decisions soon. e.g. if we should develop with vue.js or similar. stuff like markdown editing depends on it.
speaking of markdown, there are some architectural-ish decisions to make. server-side rendering or client-side; what gets stored in the DB; how to handle "rich" content elements like links to other nodes (handle them as part of normal md rendering or do stuff like automatically create links between nodes, etc).
I think i'll start creating issues for features pretty soon, mb use github issues on fi-tools/cf-forum? that seems easiest. if we do PRs (seems good) then github issues have strong integrations. discussion can still happen here, but presumably some would happen there, too.
SQLite
#164 curi wrote:
> I don't think SQLite is going to turn out to be the right tool.
Why not? I'd pick SQLite over any other database for my own projects, unless I expected a large number of writes per second. A large number of writes per second could come from, e.g., thousands of concurrent users all making updates to the database at the same time or code that does some kind of event logging with thousands of events per second.
http://www.sqlite.org/draft/whentouse.html :
> SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
How well SQLite works for a web site depends also on whether the hits cause database reads or writes. I expect SQLite can handle as many reads per second any other database. The issue is with writes.
#168 *as any other database
#168 It looks like sqlite supports full text search, too.
One reason, mb, not to use sqlite is its smaller feature-set. like it doesn't support materialized views. postgres has many more features (even than other DBs; like mysql deliberately has a smaller feature-set to optimise speed).
That said, it hasn't really been an issue for me so far, like I've used recursive CTEs and views and it seems to work fine. There's an occasional thing I've run in to, but none have been a big deal so far. sqlite is definitely easier to work with than postgres or mysql (the only other sql backends i've tried).
I think i might try converting the current codebase to use neo4j and see how that performs with 25k nodes. I was looking at some intro docs and it seems low-overhead atm.
A lot of our queries are traversal based, and so should be more performant and consistent with neo4j.
However, thinking about how things are structured atm, I think there might be some significant optimisation possible in the view logic which would be easy enough to do with the current backend.
I think my method should go something like:
* figure out what the constraint is atm. i guess that it's one of: rendering lots of stuff, the way rendering is done atm (recursive, has lots of queries), DB structure, or DB query structure.
* only after that should i decide what to optimise. if lots of data in templates is the issue then DB stuff won't help. if it's view rendering methodology then changing from recursive to linear could help, otherwise it might be queries triggered by the render and moving the queries up-front might be enough. if those aren't the constraint and the db is then I should consider neo4j.
one simple way to check out potential gains from neo4j is to just do a bunch of inserts (using faker, which I have set up). If that is substantially faster then it's a good indicator that moving to neo4j is worth considering.
#171 I'm not clear on what you mean by a bunch of inserts. I would guess that what's relevant is whether it will work faster for the kind of load you were discussing in
https://curi.us/2396-new-community-website-features-and-tech/reply/171#163
#172 nvm I didn't read closely enough.
i think my tag system is basically the same thing as what's in this image. re-implementing a graph system in sql sounds like it'll throw up more problems than using a proper graph db from the start.
is neo4j suitable for use as our only database, or would we still have an sql database for some stuff?
Some neo4j crits:
https://twitter.com/joshsusser/status/956386647393759232
todo: check out whether that holds up
in an intro neo4j + rails vid the host mentions that you can configure your rails app to work with neo4j and activerecord, which sounds right to me. would be weird if you couldn't.
#175 I think losing SQL entirely would be a potential deal-breaker because stuff like devise is set up with that. provided there's not high-overlap between the graph and an SQL DB then any overlap should be low overhead. like 'all the nodes a user replied to' has negligible overhead b/c we already know the user_id.
if we couldn't do 2 systems at once and it meant losing devise and stuff, well IDK that's something to consider if we get there. But there are other options first.
#175 I should have added an answer to your question: we could use neo4j for everything if we wanted to. I don't think there'd be a technical problem with that.
#168 Thanks. I retract my claim that:
> I don't think SQLite is going to turn out to be the right tool.
I didn't know the distinction that it keeps up with postgres fine on reads but not for heavy writes. We won't have heavy writes.
I think looking at a graph db is a good idea.
Also I think performance tests will be more meaningful with some simplified, repeatable test cases. E.g. make a 15 level tree (~16k nodes) where every node (besides leaves) has 2 children and check performance on some specific traversals. Try again with 8 level tree and 25 level tree. Try again with a dataset with much less branching. Make sure the queries being run are written in a reasonable way so the results are actually about db performance not some html view doing a slow algorithm.
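A sketch of a generator for that kind of repeatable test data. `balanced_tree` is a made-up helper that returns `[id, parent_id]` pairs, and it counts the root as level 1. By this counting a 14-level binary tree has 16,383 nodes and a 15-level one has 32,767 (16,384 of them leaves), so it's worth pinning down which is meant. A 25-level tree with 2 children per node would be about 33.5 million nodes, so the deeper cases probably want less branching.

```ruby
# Builds a complete tree as [id, parent_id] pairs, breadth-first.
# `levels` counts the root as level 1; `branching` is children per non-leaf node.
def balanced_tree(levels:, branching: 2)
  rows = [[1, nil]]
  frontier = [1]
  next_id = 2
  (levels - 1).times do
    # Replace the frontier with the freshly created children of every node in it.
    frontier = frontier.flat_map do |parent_id|
      Array.new(branching) do
        id = next_id
        next_id += 1
        rows << [id, parent_id]
        id
      end
    end
  end
  rows
end

[8, 14, 15].each do |levels|
  puts "#{levels} levels: #{balanced_tree(levels: levels).size} nodes"
end
```

Because ids are assigned in a fixed order, the same arguments always give the same tree, which should make runs comparable across databases.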
#177 I think having two different dbs, of different types, has major complexity downsides. Lots of extra work to manage what data goes where and then merge the data back together in various cases b/c it's all related. e.g. users in sql, nodes in graph, but users own nodes.
that twitter thread is worrying re neo4j. i took a quick look at the thing it recommended:
https://github.com/bitnine-oss/agensgraph
the idea seems to be that it lets you use some graph features with postgres so you can have a single db that does everything well. that sounds nice.
#180 oh but https://github.com/bitnine-oss/agensgraph hasn't been developed much for the last 3 years. postgres 13 is out and the latest git commits are about postgres 10.
https://stackoverflow.com/questions/20776718/best-way-to-model-graph-data-in-postgresql
> Use PostgreSQL for the underlying storage and use networkX or iGraph via PL/Python for the processing engine.
> In their book "Graph Databases", Ian Robinson, Jim Webber, and Emil Eifrem make a distinction between the underlying storage and the processing engine. If you look at the answer I followed in a recent problem (see here), you will see that I'm using PostgreSQL for the underlying storage and networkX as the processing engine. The performance gain relative to my original solution was huge (and similar to the ones described in the "Graph Databases" book) and implementing it was very easy.
(links omitted from quote, and there are other notable posts on that page. i'm not sure what this post means in detail but seemed relevant)
also maybe agensgraph is actively developed but only the paid version or something. https://bitnine.net/agensgraph/ looks like an active company with e.g. recent corporate blog posts
#182 maybe they renamed to Apache AGE? not clear on what's going on. this repo has recent updates https://github.com/bitnine-oss/AgensGraph-Extension
see also https://wiki.postgresql.org/wiki/AgensGraph
> some simplified, repeatable test cases. E.g. make a 15 level tree (~16k nodes) where every node (besides leaves) has 2 children and check performance on some specific traversals.
I've got code committed (mb in 'db-tweaks' branch) that makes Faker data deterministic. It currently approximates the balanced cases you mention, but it'd be quick to implement balanced trees.
> performance tests
The constraint with queries, atm, is going to be on the JOINs and VIEWs that are done/used. Basically queries like 'only the nodes a user can see' require joining the nodes table to itself, the user table, and the tag_decls table (so you can find the closest ancestor with a permissions tag matching a user's groups). It uses several intermediate queries, many of which involve JOINs. I looked at the breakdown of one of the queries in pgadmin's explain util and it was pretty involved.
However, the same query in a graph db could be a lot cheaper b/c it doesn't involve the whole table, just the bits that we're traversing. IDK how to do 'nearest parent with a permissions tag', but 'has a permissions tag' is easy: (n:Node)-[:authz_read]->(g:Group {user_id: nil}).
Because it's schemaless, it's much lower overhead to have more meaningful classes/hashes (rather than like just strings) -- no migrations. We can also do generic strings just as easily, or add namespaces, etc.
Whether it's more efficient or not at like returning just the nodes a user can see, IDK. Any time you get into recursion/inheritance things can get sticky unless you keep results small (e.g. with a LIMIT).
On the note of LIMITs, I tried to limit the results size on like 'all children nodes' and it didn't seem to make a difference to query execution time (using SQLite).
https://age.apache.org
> Apache AGE a PostgreSQL extension that provides graph database functionality. AGE is an acronym for AgensGraph Extension, and is inspired by Bitnine's fork of PostgreSQL 10, AgensGraph, which is a multi-model database. The goal of the project is to create single storage that can handle both relational and graph model data so that users can use standard ANSI SQL along with openCypher, the Graph query language.
https://bitnine.net/agensbrowser/
> AgensBrowser is a web interface for AgensGraph to visualize and manage graph data. AgensBrowser offers the interactive visualization of graphs and enables you to query and modify graph data using Cypher and SQL on the web.
https://bitnine.net/agensgraph/
has a feature chart saying they're the best b/c you get 4 models in one db: SQL, graph, document, and key-value store.
they have both community and enterprise editions
#180
> that twitter thread is worrying re neo4j. i took a quick look at the thing it recommended:
> https://github.com/bitnine-oss/agensgraph
> the idea seems to be that it lets you use some graph features with postgres so you can have a single db that does everything well. that sounds nice.
#181
> oh but https://github.com/bitnine-oss/agensgraph hasn't been developed much for the last 3 years. postgres 13 is out and the latest git commits are about postgres 10.
Yeah, I had a look. It doesn't seem to have a ruby library - at least according to the summary site i found. I'm hesitant to pick up a DB without that b/c, even if it was easy to implement, it probably means risks/concerns wrt sanitization and stuff. IDK tho, depends.
comparison between AgensGraph and neo4j: https://db-engines.com/en/system/AgensGraph%3BNeo4j (the site has other comparisons with more dbs, too, i think)
I saw AGE too but haven't looked in to it.
#184 i think try performance tests without your tag system involved or even users, just focusing on the graph nodes and giving them a few properties to filter on (either directly in their db table or in a one-one relationship node-properties table). that way you can see if the core use case is slow (with current approach) or the slowness is due to the extras.
#188 Yeah okay. I'm hesitant to base decisions on those sort of tests b/c it sounds a bit synthetic. WRT properties on graph nodes, they currently only have: author, parent, content, and timestamps. the idea was that all the other properties could be done through tags. I think for simple filtering type stuff we'll get good performance b/c that's typical SQL stuff. 'all descendants' should be fast b/c it's basically just scanning a sorted adjacency list.
one thing i haven't tried is doing much smaller queries and more logic in ruby. like 'all descendants' and build a tree in ruby, then 'those with permission tags' and 'this user's groups' and use those to trim branches which the user doesn't have permission for.
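Roughly what that could look like, as a sketch: the constants stand in for the results of the small queries, and `readable_ids` does the trimming in Ruby. The names and data are invented. One assumption to flag: a branch is dropped as soon as a node is unreadable, so a descendant that sets more permissive read groups under a hidden ancestor would stay hidden.

```ruby
# Made-up sample data standing in for the small queries.
NODE_ROWS   = [[1, nil], [2, 1], [3, 2], [4, 1], [5, 4]].freeze # [id, parent_id]
READ_GROUPS = { 1 => ["all"], 4 => ["subscribers"] }.freeze     # 'those with permission tags'

# Returns ids the user may read, walking down from the roots and carrying the
# most recently set read groups along each branch. A branch is dropped as soon
# as its effective groups don't overlap the user's groups.
def readable_ids(user_groups, rows = NODE_ROWS, read_groups = READ_GROUPS)
  children = rows.group_by { |_id, parent_id| parent_id }
  result = []
  stack = children.fetch(nil, []).map { |id, _| [id, []] }
  until stack.empty?
    id, inherited = stack.pop
    groups = read_groups.fetch(id, inherited)
    next if (groups & user_groups).empty?
    result << id
    children.fetch(id, []).each { |child_id, _| stack << [child_id, groups] }
  end
  result.sort
end

p readable_ids(["all"])                 # => [1, 2, 3]
p readable_ids(["all", "subscribers"])  # => [1, 2, 3, 4, 5]
```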
I'm still planning on doing some perf tests using a comparable set up to what's in place now. i'd like to include stuff like visibility in the queries themselves b/c then we can only have queries that account for that (and never just like Nodes.all, where a bug could be introduced that shows private stuff). Mb I shouldn't treat that as such a big deal, tho.
is it better to have a nodes table with a parent_id field for relationships, or is it better to have a nodes table and also a relationships table? why? (this is meant as a conceptual question, not a performance tests question).
if you want a graph (nodes with multiple parents) you need the relationships table, but i currently think a posts tree is ok, though i could be wrong. (need graph for web links and maybe some other relationships, though).
reasons for posts tree not graph: simplifies design and UI. idk use cases for posts with multiple parents or how the UI would handle them. (this assumes having two separate things, a posts tree or graph and an other relationships graph, rather than storing those as a single graph. idk which is better).
> I'm hesitant to base decisions on those sort of tests b/c it sounds a bit synthetic.
i was thinking it'd provide some useful info, not tell us what decisions to make
> the idea was that all the other properties could be done through tags
why? what problem does that solve?
in general, if your code deals with specific details, then you don't benefit from putting them in an extra flexible system, b/c you can't use that flexibility – you can only use what the code knows about. an example of that is permissions – sure you can have an arbitrary meta data system but what are you gaining when the only things that are useful to write in that meta data are the specific permissions the code knows how to handle? a couple plain db fields will work just as well and be simpler. an example where you *can* benefit from flexibility is tagging posts with categories. it's useful to have tags like "Politics" or "Philosophy" while having zero code that knows what those strings mean.
the concept of "it's super flexible" can be alluring. but i think if gaining something requires new code anyway, it may not be much use. if you're doing development you always have tons of flexibility to add whatever you want anyway. whereas if you add flexibility that allows benefits when no new code is written, that's useful.
deciding what specific things you need can keep them more organized than a generic system instead of just imagining it'll be able to do anything. we don't need the ability to have any number of arbitrary user permissions. we only need a few well-chosen permissions.
>> the idea was that all the other properties could be done through tags
> why? what problem does that solve?
The flexibility means ~arbitrarily complex stuff could be built on top. IDK what future requirements will be, but I think we'd be able to represent them with the current tagging system. That means easier, standardised implementation of features and reach of management utils and less special case code. Also less risk of future incompatibility.
I hoped that it would be a good, principled, elegant foundation. (I don't think it's worked out like that, tho)
however, the conceptual downsides: we end up with SQL views for special cases, albeit somewhat modular; constraints are hard to enforce; the SQL queries are hard to reason about. we'd also probably end up wanting some UI customization around admin stuff, at least for constraint enforcement, but also consistency and ease of operation.
my method of tagging is basically built in to a decent graph db, tho. not just like superficially; it's the way graph dbs work. so that would mean simpler queries, better constraint mgmt, easier to reason about.
> deciding what specific things you need can keep them more organized than a generic system instead of just imagining it'll be able to do anything. we don't need the ability to have any number of arbitrary user permissions. we only need a few well-chosen permissions.
thought: if you have a flexible system it can still be worth making a specific system too. like we could have flags on a users record (e.g. is_sub, is_mod, is_admin), and by default an account has all set to false. if there's impl cost for building on the generic system (like permissions via tags -- which there certainly is) then it should be compared to the impl cost of the specific system. worst case you migrate from specific to generic when you need to.
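A sketch of the specific-system version, using the three flag names above. The `Account` class and the rule that mods and admins can also read subscriber content are assumptions for illustration only.

```ruby
# Hypothetical user record with the three flags; all default to false.
class Account
  attr_reader :name, :is_sub, :is_mod, :is_admin

  def initialize(name, is_sub: false, is_mod: false, is_admin: false)
    @name = name
    @is_sub = is_sub
    @is_mod = is_mod
    @is_admin = is_admin
  end

  # Specific, hard-coded rule (assumed): subscriber-only content is readable
  # by subscribers and by staff.
  def can_read_subscriber_content?
    is_sub || is_mod || is_admin
  end
end

p Account.new("cfgen").can_read_subscriber_content?               # => false
p Account.new("cfsub", is_sub: true).can_read_subscriber_content? # => true
```

In a Rails model these would just be boolean columns with `default: false`, which is where the low impl cost comes from.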
I might try comparing performance of a 'permissions in user record' approach to see what the overhead of just my tag system is.
I was thinking today that queries on just the nodes should be super fast. like it's just an array of 25k nodes (in this case) tracking each other's position in the array -- not much overhead.
#191
> the concept of "it's super flexible" can be alluring.
the phrase "chasing universality" comes to mind.
#190 You could also have a nodes table and one table for each different type of relationship between nodes. E.g., you could have these tables for relationships: is_parent_of, links_to, references. Each of those tables would have two columns, source and target, each of which is a node ID.
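An in-memory sketch of that layout, with one collection per relationship type and each row a (source, target) pair of node ids. The table names follow the post; the rows and the `targets`/`sources` helpers are invented.

```ruby
# One collection per relationship type, each row a [source, target] pair of
# node ids. The rows are invented sample data.
RELATIONSHIPS = {
  is_parent_of: [[1, 2], [1, 3], [2, 4]],
  links_to:     [[4, 1]],
  references:   [[3, 2], [4, 2]],
}.freeze

# Ids that `source` points at through the given relationship type.
def targets(type, source)
  RELATIONSHIPS.fetch(type).select { |s, _t| s == source }.map { |_s, t| t }
end

# Ids that point at `target` through the given relationship type.
def sources(type, target)
  RELATIONSHIPS.fetch(type).select { |_s, t| t == target }.map { |s, _t| s }
end

p targets(:is_parent_of, 1) # => [2, 3]
p sources(:references, 2)   # => [3, 4]
```

A posts tree would additionally need a constraint that each node appears at most once as a target in is_parent_of, while the other tables can stay many-to-many.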
db/postgres: some surprising and good results
I don't quite know how to explain these results, but I've made an interesting development.
I refactored sql that I used in views to use Arel. In doing that I found a better way to do a particular join (and what I was joining with).
In SQLite the query takes about the same time: 2000ms (actually a bit worse than before).
nodes_admin_can_read_with_parent_root 2068.22 ms | result: 6
get_nodes_readable_by 2218.54 ms | result: 25019
But postgres -- which used to be an order of magnitude worse at like 30000ms -- is now an order of magnitude better at around 300-500ms. That's including rails's overhead; the query itself is 200-300ms.
nodes_admin_can_read_with_parent_root 274.80 ms | result: 6
get_nodes_readable_by 452.95 ms | result: 25019
relevant code is here - the benchmark-db-fixes branch on fi-tools/cf-forum
You can run the benchmarks with
rails runner tools/benchmark.rb
and
rails runner -e devpg tools/benchmark.rb
(provided you have postgres set up)
some screenshots of the benchmarks with other results included:
(Note that just returning all children with no filtering is ~150ms for both)
I like the Ruby Weekly newsletter and read some stuff from today's issue:
https://rubyweekly.com/issues/536
https://github.com/hotwired/hotwire-rails
> Hotwire is an alternative approach to building modern web applications without using much JavaScript by sending HTML instead of JSON over the wire. This makes for fast first-load pages, keeps template rendering on the server, and allows for a simpler, more productive development experience in any programming language, without sacrificing any of the speed or responsiveness associated with a traditional single-page application.
> The heart of Hotwire is Turbo. A set of complimentary techniques for speeding up page changes and form submissions, dividing complex pages into components, and stream partial page updates over WebSocket. All without writing any JavaScript at all. And designed from the start to integrate perfectly with native hybrid applications for iOS and Android.
> While Turbo usually takes care of at least 80% of the interactivity that traditionally would have required JavaScript, there are still cases where a dash of custom code is required. Stimulus makes this easy with a HTML-centric approach to state and wiring.
https://www.learnhotwire.com
> Hotwire is a new approach to building applications you'd typically lean on React or Vue for - without all the Javascript. It's brought to you by the great minds behind Ruby on Rails, Basecamp, and Hey.com - so you know it puts developer productivity and happiness first.
https://github.com/hotwired/turbo
Note: Turbo was formerly called Turbolinks. There are search results under the old name, e.g.:
https://thoughtbot.com/upcase/videos/turbolinks
And Rails has Action Cable for WebSockets:
https://guides.rubyonrails.org/action_cable_overview.html
From the first email from the learnhotwire.com people:
> Hotwire helps us fulfill the dream most of us have as developers: building reactive, modern applications, quickly and easily. Frameworks like React and Vue are great, but it's not the code we love - it's the results. The truth is, we want to build our applications fast and not go crazy doing it. Unfortunately over the past few years, it feels like we have no choice but to use complex build chains, learn advance state management techniques, drown in an ocean of JSON responses, and write our code in multiple languages or dialects.
>
> We do all of this work, just to eventually render HTML in the browser.
>
> Hotwire frees us of all of this, allowing us to build the full experience in Ruby on Rails, avoiding the complex machinery, and delivery HTML directly to the browser, over the wire. We can still leverage the power of Javascript when we need it, but it is now a power tool, not something you use to build everything.
>
> Of course, most of these ideas aren't new, but rather, it leverages patterns that are battle tested in frameworks like Elixir's Phoenix (LiveView) and .NET (Blazor). There's even a pre-existing option for Ruby on Rails (Stimulus Reflex). However, Hotwire is created by the same experts, and with the same philosophies, as Ruby on Rails, Basecamp, and Hey.com.
#200 sounds pretty good. Will check it out tomorrow.
AGE / AgensGraph
> #182 maybe they renamed to Apache AGE? not clear on what's going on. this repo has recent updates https://github.com/bitnine-oss/AgensGraph-Extension
yup.
https://www.postgresql.org/about/news/announcing-age-a-multi-model-graph-database-extension-for-postgresql-2050/
> AGE, a multi-model graph database extension for PostgreSQL has been announced. **AGE is the successor to AgensGraph.** AGE will offer the same integration of SQL and Cypher without users having to discard their existing solutions, allow for a cleaner integration of AGE with PostgreSQL’s robust collection of other extensions, and expand scalability without sacrificing performance.
that page has a download link which goes to
https://github.com/bitnine-oss/AgensGraph-Extension
which then redirects you again to
https://github.com/apache/incubator-age
> The project is in alpha stage now and it is currently being developed in the form of an extension for PostgreSQL 11.
> The next Apache AGE release (0.3.0-incubating) will be available around Jan 15, 2021.
recent history:
> acdc110 11 hours ago
https://age.apache.org/
> AGE is currently being developed for the PostgreSQL 11 release and will support PostgreSQL 12 and 13 in 2021 and all the future releases of PostgreSQL.
under *Installing AGE* (no direct link)
> Docker images are available on Docker Hub and are based on the official PostgreSQL 11 Debian and Alpine images.
https://hub.docker.com/r/sorrell/agensgraph-extension
manual installation looks easy too. IDK if we lose anything by moving to postgres 11, but it's probs worth thinking about; and the right time to stdize build env if we're going to go with the current architecture. (on that note: we should probably figure out how to evaluate whether the architecture is good enough or not.)
rich text editor
Rails added a rich text editor. It's optional. At a glance it looks nice.
https://fullstackrubyonrails.com/blog/how-to-use-actiontext-in-rails-6
Demo and more info:
https://trix-editor.org
https://guides.rubyonrails.org/action_text_overview.html
It has nested quoting (click quote once on some text, then click the right indent button to add nesting levels).
But so far I haven't figured out what format it saves its data as, or what it can export to. So I don't know if it's markdown compatible or not (my initial thought is we want to save posts in markdown format in the db).
#203 I know it can output html, which we could potentially convert to markdown. That might require limiting the feature set. I don't know what the alternatives are and how good they are.
#168 I should amend what I wrote to say "transactions that write" instead of "writes". You can do a *bunch* of writes in a single transaction in SQLite and it'll be very fast. But if you need to update the database with a bunch of writes and have the results of each write be instantly available to other readers after each write completes, then that requires a lot of transactions and would be slower in SQLite than it would be in other databases.
#204 I think html -> markdown will be harder than markdown -> html. I had a look around Trix docs but couldn't see anything advertising markdown support. There are other options tho, like some pure JS libraries for rendering markdown.
mb a good way to handle things: on the edit/create page we show raw markdown with a preview; the backend only sees markdown. then we can render markdown as HTML (user configurable?) for posts/comments etc. Adding buttons to do bold, etc, should be pretty straightforward even without a library (i.e. implementing it manually ourselves)
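A rough sketch of what a "bold" button would do to the raw markdown, written in Ruby only to show the logic (in the browser this would be JS operating on the textarea's selection). The function name and the toggle-off behaviour are my own assumptions, not something from an existing library:

```ruby
# Wrap the selected range of `text` in a markdown marker (e.g. "**" for bold).
# sel_start / sel_end are character offsets (end exclusive), similar to a
# textarea's selectionStart / selectionEnd. Hypothetical helper, illustration only.
def wrap_selection(text, sel_start, sel_end, marker = "**")
  before   = text[0...sel_start]
  selected = text[sel_start...sel_end]
  after    = text[sel_end..-1]

  # If the selection is already wrapped in the marker, toggle it off.
  already_wrapped = selected.length >= marker.length * 2 &&
                    selected.start_with?(marker) &&
                    selected.end_with?(marker)
  if already_wrapped
    inner = selected[marker.length...(selected.length - marker.length)]
    return before + inner + after
  end

  before + marker + selected + marker + after
end
```

The same helper would cover italics or inline code by passing a different marker, so each button is a one-liner on top of it.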
#202 it'd be nice and more user friendly if we had a rich text input option (preferably save as markdown) in addition to markdown input.
#207 think about tech illiterate people who repeatedly screw up quoting on this site or in email. markdown input with some buttons will confuse them and make the site a lot harder for them to use than FB or twitter. a WYSIWYG option will be easier for lots of people and is expected.
#206 I found this blogpost which goes through a few attempts to find a good markdown integration. https://www.codefellows.org/blog/how-to-create-a-markdown-friendly-blog-in-a-rails-app/
The author ends up going with the ruby library https://github.com/vmg/redcarpet (which looks active). apparently it was developed at github. looks like it has a decent feature-set and is configurable (so we could add extensions to do stuff like link comments like curi.us does). the rendering is done serverside, tho, so my guess is that we'd want to cache that. IDK tho; like maybe it'd make html-injection attacks easier. mb serverside rendering would be low overhead.
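On the caching idea: a minimal sketch, assuming we key the cache on a digest of the markdown source so an edited post naturally misses the cache. `RenderCache` and the in-memory Hash are made up for illustration; the renderer is passed in as a block, so redcarpet (or anything else) could be plugged in. This says nothing about the html-injection question, which depends on how the real renderer is configured.

```ruby
require "digest"

# Caches rendered HTML keyed by a SHA-256 digest of the markdown source.
# Hypothetical class: a real setup would more likely use the Rails cache
# or a db column than an in-memory Hash.
class RenderCache
  attr_reader :misses

  # The block is the renderer: it takes markdown and returns HTML.
  def initialize(&renderer)
    @renderer = renderer
    @store = {}
    @misses = 0
  end

  def html_for(markdown)
    key = Digest::SHA256.hexdigest(markdown)
    @store.fetch(key) do
      # Only render when this exact source text has not been seen before.
      @misses += 1
      @store[key] = @renderer.call(markdown)
    end
  end
end

# Stand-in renderer for the example; NOT a real markdown renderer.
cache = RenderCache.new { |md| "<p>#{md}</p>" }
cache.html_for("some post body")
cache.html_for("some post body") # second call is served from the cache
```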
#206 i don't see much in the way of docs for trix but it looks nice so i skimmed some code:
https://github.com/basecamp/trix/blob/main/src/trix/models/document.coffee
some document export methods there:
toSerializableDocument
toString
toJSON
toConsole
and imports:
fromJSON
fromHTML
fromString
https://github.com/basecamp/trix/blob/main/src/trix/models/text.coffee
the text class has the same exports and the fromJSON import
and there's an html parser class
https://github.com/basecamp/trix/blob/main/src/trix/models/html_parser.coffee
#207
> it'd be nice and more user friendly if we had a rich text input option (preferably save as markdown) in addition to markdown input.
I think we can do that entirely client-side (in the UI) if we want. It can be the default.
Also WRT quoting I think there are win-win options like a feature that shows a 'quote' button when you select someone else's text that auto-inserts it into the reply field. it's win-win in the sense that it's useful for ppl who already know how to do good quoting.
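A sketch of the text transform behind such a 'quote' button, assuming posts are stored as markdown. Again this is Ruby just to show the logic (the real thing would run client-side), and `quote_selection` is a hypothetical name:

```ruby
# Turn the selected text into a markdown blockquote and append it to the
# current contents of the reply field. Lines that are already quoted gain
# one more nesting level ("> > ..."). Hypothetical helper, illustration only.
def quote_selection(selected, reply = "")
  quoted = selected.strip.lines.map do |line|
    body = line.chomp
    body.strip.empty? ? ">" : "> #{body}"
  end.join("\n")

  # Keep exactly one blank line between existing reply text and the quote.
  prefix = reply.empty? ? "" : reply.sub(/\n*\z/, "\n\n")

  # Trailing blank line so the user can start typing under the quote.
  "#{prefix}#{quoted}\n\n"
end
```

Because nesting falls out of just prefixing every line, quoting a quote works without any special casing.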
> markdown input with some buttons will confuse them and make the site a lot harder for them to use than FB or twitter.
IDK, like reddit has pretty basic formatting stuff from memory. or at least WRT their old UI (can't remember wrt new UI).
#209 curi.us (and FI) uses redcarpet 2.3.0 for posts (but not comments). I chose redcarpet because I found a version that was compatible with old ruby. my research on this is some years old, but i recall there are a few major alternatives.
https://www.sitepoint.com/markdown-processing-ruby/
> We’re going to focus on 4 Ruby implementations of Markdown: kramdown, maruku, rdiscount, and redcarpet.
IIRC kramdown is what's used by my jekyll setup for https://www.elliottemple.com
if you wanted browser markdown rendering (good for e.g. live preview while writing a post, though i'd prefer a nice whole js editor someone already made) then you'd google for javascript markdown libraries not ruby ones.
> IDK, like reddit has pretty basic formatting stuff from memory. or at least WRT their old UI (can't remember wrt new UI).
most people who use reddit do not know how to use their text input stuff correctly, and a ton of people do not post or post less because they don't know how. reddit is incompetent and is massively harming their business by having an awful website (in dozens of ways). a tiny, self-selected group of reddit users do know how to use it and write quite a few of the comments, but a lot of people who write comments use few to no formatting features.
#209 that blogger is confused:
> Jekyll uses Liquid as a templating language, and I was using Rails’ out-of-the-box ERB. As far as I can tell, there is no way to make those two play nicely together.
you don't make templating languages play nicely with each other. but Rails supports using multiple templating languages in one project – just name files with different extensions. i don't know if there is any issue using Liquid files with Rails, but the issue is not about getting Liquid to play nice with ERB. also templating language and markdown to html conversion are separate issues.
btw there are some command line markdown to html tools, some of which may be pretty good, and they may also be relevant for exporting forum content as pdf, epub, mobi, etc. (cuz they'll do markdown -> html -> various. maybe some is done without the html in the middle, idk.) one of the tools that looks particularly promising is https://pandoc.org/index.html which has a ruby wrapper https://github.com/xwmx/pandoc-ruby
#214 re pandoc: yeah that's the 'anything -> anything' converter. we might lose stuff going from html -> markdown, tho.
pandoc is definitely useful for exporting.
personally I strongly prefer doing posts in markdown and just wrapping that on the client side. more flexible imo and can still have a rich WYSIWYG style editor. otherwise we entirely lose plaintext editing. plus, with a rich editor, users couldn't paste in markdown. IDK how they'd paste stuff in without doing markdown->html on paste, anyway. Paste does support rich text but it'd need to be parsed. mb the rich editors support that already tho.
#213 yup, i didn't take into account selection bias there (regarding posts and posters). also my own bias of which subreddits i've browsed in the past.
#215 i don't understand what you're arguing with. i wasn't talking about not having markdown input support. i want markdown as an input. i also want a user friendly input option.
rails recursive includes
This isn't general, but i did find a way to do 'recursive' includes. One reason I did all the arel stuff (see models/node.rb) was b/c doing this sort of thing wasn't obvious. Note: it's not as good as the recursive method IMO, and doesn't consider permissions:
Node.includes(direct_children: [:direct_children, { direct_children: [:direct_children] }])
The debug output for SQL queries looks like:
Node Load (1.6ms) SELECT "nodes".* FROM "nodes" WHERE "nodes"."id" = $1 LIMIT $2 [["id", 0], ["LIMIT", 1]]
Node Load (1.0ms) SELECT "nodes".* FROM "nodes" WHERE "nodes"."parent_id" = $1 [["parent_id", 0]]
Node Load (1.2ms) SELECT "nodes".* FROM "nodes" WHERE "nodes"."parent_id" IN ($1, $2, $3, $4, $5, $6) [[nil, 2], [nil, 3], [nil, 4], [nil, 1], [nil, 14], [nil, 18]]
Node Load (1.1ms) SELECT "nodes".* FROM "nodes" WHERE "nodes"."parent_id" IN ($1, $2, $3, $4, $5, $6) [[nil, 5], [nil, 6], [nil, 15], [nil, 20], [nil, 21], [nil, 19]]
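The nested argument can be generated for any fixed number of levels instead of being written out by hand. A minimal sketch: `nested_includes` is a hypothetical helper, and I'm assuming the plain nested-hash form is equivalent for preloading to the array form above (ActiveRecord's `includes` does accept nested hashes, but this hasn't been run against the cf-forum models):

```ruby
# Build a nested eager-loading spec that follows `assoc` for `levels` levels.
#   levels = 1 -> :direct_children
#   levels = 2 -> { direct_children: :direct_children }
# Hypothetical helper, illustration only.
def nested_includes(assoc, levels)
  raise ArgumentError, "levels must be >= 1" if levels < 1

  # Start with the bare association and wrap it once per extra level.
  (levels - 1).times.reduce(assoc) { |inner, _| { assoc => inner } }
end

# Assumed usage (not run here):
#   Node.includes(nested_includes(:direct_children, 3))
```

It still only goes to a fixed depth, so it has the same limitation as the hand-written version: one extra query per level, and no permission checks.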
cf forum rails db progress
done a bunch of work on using db views and caching some stuff we can easily calculate incrementally.
using Node.insert_all! the seeds file can generate 88k nodes (forum posts) in under 3 minutes - fast enough to be useful.
using materialized views: all the permissions stuff is cached and refreshes are fast - 400ms with concurrently=false (blocks reads) and 800ms with concurrently=true.
rendering is fast: although there's some n+1 ish stuff going on, viewing a topic with 30k descendants takes less than 200ms and the SQL queries take like 4ms each. Note: this is capped at a depth of 3, and the tree has branching factor 3, so 40 nodes or so.
When I set max_depth = 4 (so ~120 nodes), this is the split between active record and views:
Completed 200 OK in 558ms (Views: 411.1ms | ActiveRecord: 139.1ms | Allocations: 248131)
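For reference, the ~40 and ~120 node figures are just the size of a full tree with branching factor 3. A quick check, assuming the topic itself counts as depth 0 and every node has exactly 3 children (the seeded data may not be perfectly uniform):

```ruby
# Number of nodes in a full tree with the given branching factor,
# counting every level from the root (depth 0) down to max_depth inclusive.
def tree_node_count(branching, max_depth)
  (0..max_depth).sum { |depth| branching**depth }
end

puts tree_node_count(3, 3) # 1 + 3 + 9 + 27
puts tree_node_count(3, 4) # 1 + 3 + 9 + 27 + 81
```

So each extra level of depth roughly triples the number of nodes rendered.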
with max_depth=3, some stats:
db progress: all pages rendering in <100ms with permissions.
some pictures / profile stats: https://github.com/fi-tools/cf-forum/pull/23#issuecomment-768514477
one thing I forgot to mention with #218 is that it's a very rails-y solution, too:
Ruby on Rails Creator Takes on JavaScript Frameworks with Hotwire
todo lists (that u can check off) and calendar stuff (e.g. notify at certain time, or recurring events like being asked a question on a schedule) are possible features.
Trying more to find something that someone else already made that could work. Anyone want to research that or make recommendations? That'd be helpful, thanks.